CN108900419A

CN108900419A - Route decision method and device based on deeply study under SDN framework

Info

Publication number: CN108900419A
Application number: CN201810945527.XA
Authority: CN
Inventors: 潘恬; 黄韬; 杨冉; 张娇; 刘江; 谢人超; 杨帆; 刘韵洁
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Fenomen array (Beijing) Technology Co.,Ltd.
Priority date: 2018-08-17
Filing date: 2018-08-17
Publication date: 2018-11-27
Anticipated expiration: 2038-08-17
Also published as: CN108900419B

Abstract

The embodiment of the invention provides the route decision method and device based on deeply study under a kind of SDN framework, the method is applied to SDN controller, including：Obtain the real-time traffic information in network；Determine the priority of every stream；The depth Q network DQN that the real-time traffic information input is trained in advance, the priority sequence flowed according to described every successively determine the routing of every stream.The embodiment of the present invention can realize the load balancing of network in the network of various topological structures, reduce the generation of network congestion, and in the network environment of the highly dynamic variation of network flow, realize the optimization of routing policy.

Description

Route decision method and device based on deeply study under SDN framework

Technical field

The present invention relates to field of communication technology, determines more particularly to the routing based on deeply study under a kind of SDN framework Plan method and device.

Background technique

For a long time, Congestion Avoidance and routing optimality always are the important research class of traffic engineering in modern communication networks Topic.With being skyrocketed through for number of users and network size, network structure becomes increasingly complex, and network congestion and routing optimality face Increasing challenge.

The flow business of highly dynamic variation in network and the flux density being unevenly distributed are cause network congestion main Reason.In order to solve network congestion, common solution mainly has：Multipath is carried out to the flow that may cause network congestion It shunts to prevent the concentration of overload caused by fluid stopping amount.Wherein, equivalent route (Equal-CostMultipathRouting Hash, ECMP) technology is exactly a kind of common Network Load Balance Technology.Specifically, the basic principle of ECMP technology is：Work as net There are when a plurality of different links, support that the network protocol of ECMP can be simultaneously using a plurality of between source address and destination address in network Link of equal value carries out the transmission of data between source address and destination address.

However, flow is only simply averagely allocated to each equal-cost link without considering flow in network by ECMP technology Distribution, it is not fully up to expectations that this causes it to show in the network with asymmetric topology and flow.With asymmetric topology In the network of structure, flow distribution is asymmetrical, and flow distribution is more unbalanced, is more difficult to subtract by ECMP technology Less or avoid the generation of network congestion.And it is dynamic in network flow height due to the generation for being difficult to reduce or avoiding network congestion In the network environment of state variation, the routing policy based on ECMP technology can not be realized optimal.

Summary of the invention

The route decision method of the embodiment of the present invention being designed to provide based on deeply study under a kind of SDN framework And device, in the network of various topological structures, to reduce the generation of network congestion, and in the highly dynamic variation of network flow In network environment, the optimization of routing policy is realized.Specific technical solution is as follows：

In a first aspect, the embodiment of the invention provides the routing decision sides based on deeply study under a kind of SDN framework Method is applied to SDN controller, the method includes：

Obtain the real-time traffic information in network；Wherein, the real-time traffic information includes：Every stream in the network Occupied link bandwidth；

Determine the priority of every stream；

The depth Q network DQN that the real-time traffic information input is trained in advance, the priority flowed according to described every are high Low sequence successively determines the routing of every stream；

Wherein, the DQN is according to sample flow information and the corresponding sample routing policy instruction of the sample flow information It gets；The sample flow information includes：The occupied link bandwidth of every sample flow, the sample routing policy packet It includes：The routing of the corresponding every sample flow of the sample flow information.

Second aspect, the embodiment of the invention provides the routing decisions based on deeply study under a kind of SDN framework to fill It sets, is applied to SDN controller, described device includes：

First obtains module, for obtaining the real-time traffic information in network；Wherein, the real-time traffic information includes： Every occupied link bandwidth of stream in the network；

First determining module, for determining the priority of every stream；

Second determining module, the depth Q network DQN for training the real-time traffic information input in advance, according to institute The priority sequence for stating every stream successively determines the routing of every stream；

The third aspect, the embodiment of the invention provides a kind of SDN controller, including processor, communication interface, memory and Communication bus, wherein the processor, the communication interface, the memory are completed each other by the communication bus Communication；

The memory, for storing computer program；

The processor when for executing the program stored on the memory, is realized described in first aspect as above The method and step of routing decision based on deeply study under SDN framework.

Fourth aspect, the embodiment of the invention provides a kind of computer readable storage medium, the computer-readable storage Instruction is stored in medium, when run on a computer, so that computer executes SDN framework described in first aspect as above Under based on deeply study routing decision method and step.

5th aspect, the embodiment of the invention provides a kind of computer program products comprising instruction, when it is in computer When upper operation, so that computer executes the routing decision based on deeply study under SDN framework described in first aspect as above Method and step.

In the embodiment of the present invention, previously according to sample flow information and the corresponding sample arm of the sample flow information The DQN obtained by Strategies Training, and then in determining network when the routing of every stream, obtaining the real-time traffic letter in network After breath, by the trained DQN of the real-time traffic information input, so that DQN is successively determined according to the priority of every stream in network The routing of every stream out.It is routed since the embodiment of the present invention is determined based on DQN network trained in advance, and DQM network training When can according to the sample data of the network of topological structure to be analyzed, therefore, the embodiment of the present invention can it is various topology knot The generation of network congestion is reduced in the network of structure, and in the network environment of the highly dynamic variation of network flow, realize routing plan Optimization slightly.

Certainly, it implements any of the products of the present invention or method must be not necessarily required to reach all the above excellent simultaneously Point.

Detailed description of the invention

In order to more clearly explain the embodiment of the invention or the technical proposal in the existing technology, to embodiment or will show below There is attached drawing needed in technical description to be briefly described.

Fig. 1 is the route decision method based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Flow chart；

Fig. 2 is the route decision method based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Another flow chart；

Fig. 3 is the route decision method based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Another flow chart；

Fig. 4 is the routing decision device based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Structure chart；

Fig. 5 is the routing decision device based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Another structure chart；

Fig. 6 is a kind of structural schematic diagram of SDN controller provided in an embodiment of the present invention.

Specific embodiment

Following will be combined with the drawings in the embodiments of the present invention, and technical solution in the embodiment of the present invention carries out clear, complete Site preparation description, it is clear that described embodiments are only a part of the embodiments of the present invention, instead of all the embodiments.It is based on Embodiment in the present invention, it is obtained by those of ordinary skill in the art without making creative efforts every other Embodiment shall fall within the protection scope of the present invention.

Scheme in order to facilitate understanding, first below to SDN (Software Defined Network, software defined network Network), DRL (Deep Reinforcement learning, deeply study) and DQN (DeepQ Network, depth Q Network) simply introduced.

SDN is a kind of novel network architecture.Different from traditional network architecture, SDN proposes the data plane of network With the thought of control planar separation.Wherein, the communication between the data plane of network and control plane can pass through a kind of opening Agreement --- Openflow agreement is realized.Based on Openflow agreement, Openflow interchanger in data plane in addition to It can carry out except common data traffic forwarding and transmission, additionally it is possible to which the real-time traffic information of the network of acquisition is uploaded to net The SDN controller of the control plane of network.SDN controller can collect the Openflow interchanger in the network area that it is managed The flow information that is uploaded simultaneously is summarized, and corresponding routing policy and forwarding are formulated according to the network traffic information being collected into Mechanism.SDN framework has many advantages compared with traditional network architecture, and the void of network function may be implemented based on SDN framework Quasi-ization (Network Function Virtualization, NFV), decouples software and hardware, and abstract network function makes the network equipment Function is no longer dependent on specialized hardware, thus sufficiently flexible shared resource.By SDN controller, network can be fully realized Global route test.This, which means that, can control the routing policy flowed in network and assignment of traffic from whole angle To solve congestion problems caused by being unevenly distributed in network due to flux density.

DRL is a kind of novel machine learning method, be combined with deep neural network (Deep Neural Network, DNN intensified learning (Reinforcement Learning, RL) method), is proposed by DeepMind.If it is intended to DRL is answered For the control problem under different scenes, need to guarantee that the control problem meets the following conditions：(1) one have clear rule and The environment of definition；(2) one the system that can be properly and timely fed back (3) one for defining the reward letter of task object Number.The flow control of network and routing decision problem meet conditions above, that is to say, that realize network using DRL Flow control and routing decision are feasible.Specifically, Markov decisior process usually can be used when RL handles a task Journey (MarkovDecisionProcess, MDPs) describes：In a determining environment E, have a state space S and One motion space A, RL will use an agency and carry out decision in environment E, any one state in state space S indicates Perceived current environment is acted on behalf of, any one movement in the A of motion space is all alternative movement at each state.When After agency executes some movement using strategy π (s) under some state, state can be shifted.And after state shifts, Environment E can be transferred to according to this next state gives one reward (reward) of agency.When agency uses strategy π (s) by original state Start to execute a series of actions, that is, after carrying out a series of state transfers, agency can obtain a progressive award Q^π(s, a).RL's Target is exactly to find an optimal tactful π^*(s), the progressive award that agency obtains can be maximized.

When RL handles task, optimal policy can be found by Q-learning.But when state space is excessively huge When, the progressive award Q under each state is solved by Q-learning^π(s, process a) can become very difficult.In order to solve This problem can be used DNN and seek progressive award Q by approximate mode^π(s, a).It is this to combine DNN and Q- The method of learning is referred to as DQN.

The primary structure of DQN be a neural network --- be referred to as Q network, Q network can using state s as input, The Q of optional movement under output state s^π(s, a) value, due to the Q of output^π(s is a) that Q network is approximate, it is therefore desirable to training Q net The parameter θ of network comes so that approximate Q^π(s, it is a) more accurate, specifically, in the training process, the value of loss function can be calculated, When the value of loss function is unsatisfactory for imposing a condition, the ginseng of Q network is updated by common back transfer and gradient descent method Number θ.As the Q that Q network is after by enough training and update, and Q network exports^π(s a) can be closer to optimal accumulation Reward Q^*(s, a), while current strategies π (s) can also approach optimal policy π^*。

In a network, when the maximum that the data traffic loads of the transmission on link or equipment are more than link bandwidth or equipment When reason ability, the propagation delay time that will lead in the link or equipment increases, and throughput degradation, simultaneous transmission data, which can generate, loses Packet, the referred to as network congestion of this case where causing network transmission performance to decline.Under normal circumstances, the congestion in network be all due to Caused by link or the overload of equipment, when data transfer loads total in network are more than the open ended upper loading limit of network When, network congestion is difficult to avoid that, in this case can only be upgrade of network hardware or by way of increasing extras To avoid network congestion.Even if but data transfer loads total in network reach far away network and can accommodate in many cases, Upper loading limit, network congestion can also occur.This network congestion is since data traffic is unevenly distributed and is led in network mostly It causes：It is certain in key position in a network due to often using basic shortest path first in conventional routing protocols A large amount of flow load is often concentrated on link or equipment, and other is in the equipment of network edge or chain road loads Then seldom, the utilization rate of Internet resources is very low.Caused network congestion in this case, can routing policy to network into Row optimization is to be reduced or avoided.

, it is clear that the optimal routing policy in a network will be believed according to network topology structure and real-time network flow Breath is to determine.When the data traffic variation in network, the optimal routing policy of network is also required to change therewith.This requires necessary The real-time traffic information of network is grasped, to optimize according to the real-time traffic information to the routing policy of network.

In order to realize the generation for reducing network congestion in the network of various topological structures, and it is highly dynamic in network flow In the network environment of variation, the optimization of routing policy is realized, the embodiment of the invention provides depth is based under a kind of SDN framework The route decision method and device of intensified learning.

It, can be under the network architecture of SDN, by being located at control plane in SDN network framework in the solution of the present invention SDN controller realize the acquisition of the real-time traffic information of network and summarize, when SDN controller is by the real-time streams of whole network After amount information summarizes, the current optimal road of network can be determined using DRL method according to the real-time traffic information of network By strategy.It is possible to further determine the current optimal routing policy of network based on the DQN in DRL.

It is provided for the embodiments of the invention the routing decision based on deeply study under a kind of SDN framework first below Method is introduced.

As shown in Figure 1, the routing decision side based on deeply study under a kind of SDN framework provided in an embodiment of the present invention Method, is applied to SDN controller, and this method may comprise steps of：

S101 obtains the real-time traffic information in network；Wherein, real-time traffic information includes：Every stream institute in network The link bandwidth of occupancy.

Method provided in an embodiment of the present invention can be applied to SDN controller.SDN controller is controlled in SDN network framework The controller of plane processed can collect the above-mentioned network of the Openflow interchanger transmission of data plane in SDN network framework Real-time traffic information, and corresponding routing policy and forwarding mechanism are formulated based on the real-time traffic information.Thus, above-mentioned network can To be a kind of communication network with SDN network framework.

Network in the present embodiment, stream and congestion problems in order to facilitate understanding, below to the model of network in the present embodiment, The routing mode of definition and stream and the congestion problems of network flowed in network is introduced.

Firstly, the model of network is：One has several communication nodes, the communication network of m physical link.It is each logical Letter node corresponds to each Openflow interchanger of data plane in SDN network framework.All communication nodes can be divided into Two kinds：Source node and forward node.Source node is the node of the generation and final received data packet in network, all in network Data packet is all generated by source node, and finally can all reach source node.In the present embodiment, the number of the source node in setting network For n, s is used₁, s₂, s₃......s_nIndicate source node.Forward node is the node of the responsible forwarding data packet in network, forwarding section Point does not generate data packet, they are only based on flow table to the data packet to be come by other node-node transmissions and are forwarded operation.

Then, the stream in network refers to：In network it is all by identical source node, eventually arrive at identical purpose section The data packet of point is classified as one kind, and such data packet collectively constitutes a stream.Wherein, the source node of any bar stream and destination node be not It can be the same node, the source node of any bar stream here refers to：The start node of this stream.Based on this, it can be concluded that In a network with n source node, be up to N=n²- n item stream.In order to quantitatively describe each stream in a network Flow demand, definition：One with node s_iFor source node, node s_jFor the stream in a link shared by normal transmission of purpose node Link bandwidth is f_{I, j}.For every stream f_{I, j}, can flow for this and be determined between institute's active node and all purposes node X alternate routing outEvery alternate routing all specifies stream f_{I, j}By source node s_iIt sets out and reaches purpose Node s_jAll links passed through.In the present embodiment, it is exactly to the mode of network progress routing decision：For every in network Item stream selects a practical routing (the referred to as routing of this stream) as this stream for it from alternate routing.Therefore, it flows Routing mode refer to：Stream realizes the transmission of data packet in stream by the specified all links of its practical routing.It needs to illustrate , it is that minimum unit carries out routing decision with stream that in this embodiment, stream, which is the minimum unit controlled in routing decision, right It is easily achieved for the SDN controller forwarded using flow table control data bag.

Finally, the congestion problems of network refer to：It is every physical link specified two for m physical link in network A parameter：The available bandwidth threshold t of maximum of link₁, t₂, t₃..., t_mWith the real-time link load value l of link₁, l₂, l₃..., l_m.Wherein, the bandwidth threshold of link is identical as the linear module of real-time link load value.In the definition of above-mentioned convection current In, by stream, the occupied link bandwidth of normal transmission is expressed as f in a link_{I, j}.So for a link k, if current shape Have a plurality of stream under state is routed across this link k, such as：There are three stream f_1,2, f_1,3, f_{Isosorbide-5-Nitrae}Be routed across link k, definition Real-time link load value l on this link k_kEqual to it is above-mentioned it is a plurality of stream in a link the occupied link bandwidth of normal transmission it With the real-time traffic load l of i.e. link k_k=f_1,2+f_1,3+f_{Isosorbide-5-Nitrae}.If l_kValue be more than link k the available bandwidth of maximum Threshold value t_k, it is considered that congestion, l has occurred on link k at this time_kValue be more than bandwidth threshold t_kDegree correspond to and sent out on link k The severity of raw congestion, the severity of congestion is higher, and the handling capacity of link k is lower, flows through the time delay of the stream of link k also just It is higher, i.e., three stream f_1,2, f_1,3, f_{Isosorbide-5-Nitrae}Transmission delay it is higher；If l_kMaximum available bandwidth of the value no more than link k Threshold value t_k, then it is assumed that congestion is not had on link k at this time, the throughput amount of link k is with l_kValue increase and linearly increase Add, the stream for simultaneously flowing through link k can be transmitted in acceptable time delay range.

The congestion problems of routing mode and network based on the definition and stream flowed in the above-mentioned model to network, network Introduction, for the routing decision problem of network, target is：There is n source node at one, in the network of m physical link, It is that every stream selects most suitable road in having obtained a certain moment network under conditions of every stream occupied link bandwidth By so that the load balancing state of network is optimal, the probability that congestion occurs in network is minimum.It is understood that if certain a period of time It carves certain stream to be not present, occupied link bandwidth is 0.

In the present embodiment, SDN controller can be all on data plane by will be located in SDN network framework The real-time traffic information for the network that Openflow interchanger is sent is collected and summarizes, to obtain the letter of the real-time traffic in network Breath.This process can be achieved by the prior art, and the present invention is secondary without repeating.

In actual use, SDN controller can periodically obtain the real-time streams in network by certain time interval Measure information.The every real-time traffic information obtained in primary network of SDN controller, so that it may believe for acquired real-time traffic Breath, carries out a routing decision.This just embody in the present embodiment when the flow information in network changes, accordingly Ground, the routing policy of network also adjust therewith.Thus, when the highly dynamic variation of flow in a network, SDN controller is in real time The real-time traffic information in network is obtained, and adjusts routing policy, can remain that the routing policy of network is optimal.

Above-mentioned time interval can be determined according to the concrete condition of network.Specifically, can be become according to the flow of network Change degree determines.If the changes in flow rate of network is very fast, above-mentioned time interval can be set to a lesser value；If net The changes in flow rate of network is unhappy, then can set above-mentioned time interval to a biggish value.

S102 determines the priority of every stream.

By a large amount of emulation experiment, inventor is observed：When to the network route with asymmetric topological by certainly When plan, to DQN, (DQN in step S102 refers to the sequence routed for all streams selection in network：Trained DQN) place Reason speed and effect have significant impact, and the processing speed of DQN can be obviously improved under some cases, and in the case of other The processing speed of DQN very slowly even is difficult to restrain.By the comparison of multiple groups emulation experiment, find DQN processing speed and Effect is related with the alternate routing of not cocurrent flow：When having one " ideal " routing in the alternate routing that certain is flowed, here " ideal " Routing refers to：This routes other stream brings on flowed through path and loads very little, i.e., this is routed on flowed through path It is not susceptible to congestion, the Route Selection for preferentially carrying out this stream can allow DQN to optimize routing decision with faster processing speed, And if the stream of " ideal " preferentially non-to those alternate routings routing is routed, the processing time of DQN can be made more It is more, and treatment effect is also very unsatisfactory.This is because：When one flow alternate routing in there is a routing to be substantially better than other When routing, DQN is easy for that the optimal routing policy of this stream can be exported, and is all streams choosing in network when in sequence Routing by when, the sequence of this stream is more forward, and DQN can more quickly export the optimal routing policy of this stream, while optimizing whole The solution space explored required for the routing policy of a network is also just smaller, and it is also easier to handle.Above-mentioned DQN processing refers to：Base The Optimization route of every stream in network is determined in trained DQN.

For these reasons, in the present embodiment, before DQN processing, determine that sequence proposes for the routing policy of stream The priority of stream determines method.In a kind of implementation, the priority of determination every stream in step S102, may include following Step：

S11, for every stream f_{I, j}, determine x alternate routing of this streamWherein, i indicates stream f_{I, j}Source node, j indicate stream f_{I, j}Destination node.

Source node and forward node in network can form a plurality of routing.So it is directed to every stream f_{I, j}, can be a plurality of Some alternate routings are first selected in routing, flow f further to select one in these alternate routings as this_{I, j}Reality Routing.

Wherein, when choosing the alternate routing of every stream, the alternate routing of every stream can satisfy the following conditions：

Condition 1：Any alternate routing of every stream is acyclic.

It is appreciated that when there are when loop in an alternate routing, it is meant that the data packet transmitted on this alternate routing It will be unable to the node that achieves the goal.

Condition 2：Every any alternate routing paths traversed flowed and other alternate routing paths traversed are endless It is exactly the same.

That is, being directed to any bar stream, other alternate routings that every alternate routing of this stream is flowed with this are not Together.Since the purpose of the present embodiment is：For any bar stream, selected in a plurality of alternate routing of this stream one it is optimal As the actual routing of this stream, so, for the ease of more a plurality of alternate routing, a plurality of alternate routing can each not phase Together.

Condition 3：The distance of any alternative path of every stream meets preset value.

For any bar stream, when selecting Optimization route for this stream, it is often desirable that the distance of this Optimization route is shorter.Institute A preset value can be set, distance is less than alternate routing of the routing of the preset value as this stream.One routing away from From referring to：The source node of this routing is (i.e.：Start node) to the distance of destination node, specifically, this can be passed through and routed Item number or other usual ways come measure this routing distance.Above-mentioned preset value can be set according to actual needs.Needle To different stream, identical preset value can be set, can also be respectively set different preset values, the present invention to this and it is unlimited It is fixed.

S12 is calculated by the following formula stream f_{I, j}The r articles alternate routingEvaluation of estimate EV_r：

Wherein, l₁, l₂, l₃..., L indicates the r articles alternate routingListen by each link, It indicates in network except stream f_{I, j}Except other stream alternate routings in, by each link l₁, l₂, l₃..., each total time of L Number；Indicate each total degreeIn maximum value.

After the alternate routing for determining every stream, each alternate routing of this stream can be evaluated.Specifically, The case where each link that every alternate routing is passed through is occupied by other streams can be evaluated.In the present embodiment, it can set The evaluation of estimate EV stated_rTo evaluate the case where each link that every alternate routing is passed through is occupied by other streams.Pass through upper commentary It is worth EV_rCalculation formula it can be concluded that such conclusion：The EV of one alternate routing_rIt is worth bigger, it is meant that this routing warp The utilization rate for the link crossed is lower, flows f_{I, j}A possibility that congestion occurs after selecting this to route is smaller.

S13 is calculated by the following formula stream f_{I, j}Priority reference value P_{I, j}：

P_{I, j}=max (E)-max (E { max (E) })

Wherein, E is indicated by flowing f_{I, j}All alternate routings evaluation of estimate composition set, E={ EV₁, EV₂, EV₃...EV_X, max (E) indicates the maximum value in the set E, E { max (E) } expression by the maximum value in the set in E The new set that max (E) is formed after removing, max (E { max (E) }) indicate the new set E maximum value in { max (E) }.

Calculate stream f_{I, j}All alternate routings evaluation of estimate after, can be further according to the evaluation of all alternate routings Value calculates stream f_{I, j}Priority reference value P_{I, j}.Priority reference value P_{I, j}It illustrates：In stream f_{I, j}All alternate routings In, the difference of the evaluation of estimate of the maximum alternate routing of evaluation of estimate and the second largest alternate routing of evaluation of estimate.That is, the priority Reference value P_{I, j}It illustrates：Flow f_{I, j}All alternate routings in most " ideal " routing better than other routing degree.Priority Reference value P_{I, j}It is bigger, flow f_{I, j}Priority it is higher.

S14, the sequence of the priority reference value flowed according to described every determine the priority of every stream； Wherein, the priority of the highest stream of priority reference value is 0, and the priority of the minimum stream of priority reference value is N-1.

It, can be by the priority reference values of all streams according to from high to low after the priority reference value for calculating all streams Sequence sorts.Sequence after sequence is exactly the sequence for being all stream selection routings.One stream priority reference value it is higher, then this The priority of item stream is bigger.So, when carrying out routing decision, the sequence of this stream is more forward.

In the present embodiment, the priority list of the highest stream of priority reference value is shown as 0, the minimum stream of priority reference value Priority list be shown as N-1.After the priority for determining every stream, can sequence according to priority from 0 to N-1, successively really The routing of fixed every stream.

S103, the depth Q network DQN that real-time traffic information input is trained in advance, according to the every priority flowed height Sequentially, the routing of every stream is successively determined；Wherein, DQN is according to sample flow information and the corresponding sample of sample flow information The training of this routing policy obtains；Sample flow information includes：The occupied link bandwidth of every sample flow, sample routing policy Including：The sample arm of the corresponding every sample flow of sample flow information by.

It, can be according to the sample flow information and sample flow information pair obtained in advance in order to determine the routing of every stream The sample routing policy answered, is trained DQN, obtains trained DQN.It in turn, can be by network after training DQN The trained DQN of real-time traffic information input so that trained DQN according to every flow priority sequence, successively Determine the routing of every stream.Wherein, the sample arm of the corresponding every sample flow of sample flow information is by may be considered every Therefore the optimal routing of sample flow may be considered the optimal of every stream by the routing of the DQN every stream determined Routing.So the process of training is exactly：Learn the optimal routing to every sample flow.Based on this, after training, by net After the trained DQN of real-time traffic information input of network, the optimal routing of each item stream in network can be exported.

Before training, can preset one for training environment.In the training environment, including a plurality of sample Stream, multiple communication nodes (including source node and forward node) and multilink further include the network flow-of a stream rank Load module passes through the corresponding relationship of network flow and link load in the available training environment of this model.Due to instruction The routing decision for practicing each sample flow in environment can change with the variation of sample flow information, so being directed to one group of sample flow Information is measured, the corresponding relationship based on network flow in training environment and link load can be in the routing for determining each sample flow During, determine the real time load of each link in training environment.It is understood that in the present solution, due to specific It is the routing that each sample flow is determined according to the sequence of priority, so after often determining the routing of a sample flow, each item The load of link can change, and the variation will affect the route-determining process of next sample flow.In the present solution, can be with Link load is regarded as to the linear accumulation of all flows in chain road.Specifically, the load of a link is to flow through this link The sum of each occupied link bandwidth of sample flow.

Based on above-mentioned preset training environment, DQN can be trained.In order to be suitable for trained DQN Each item stream in network is routed, it is identical with the network structure of network with one that training environment can be set Training network, in the training network source node number, forward node number and number of links respectively in network source save Point number, forward node number is identical with number of links, and in the training network each link bandwidth also respectively with network In each link bandwidth it is identical.It hereinafter will be described in detail the process of trained DQN.

The depth Q network DQN that real-time traffic information input is trained in advance, the priority sequence flowed according to every, It successively determines the process of the routing of every stream, one can be carried out for one group of sample flow information with reference to what is be described below The learning process of bout.

Scheme provided in an embodiment of the present invention, it is corresponding previously according to sample flow information and the sample flow information The DQN that is obtained by Strategies Training of sample arm, and then obtaining the reality in network when the routing of every stream in determining network When flow information after, by the trained DQN of the real-time traffic information input so that priority of the DQN according to every stream in network, Successively determine the routing of every stream.The embodiment of the present invention can realize that the load of network is equal in the network of various topological structures Weighing apparatus, reduces the generation of network congestion, and in the network environment of the highly dynamic variation of network flow, realize the optimal of routing policy Change.

The process of training DQN in the embodiment of the present invention is introduced below, as shown in Fig. 2, the training process of DQN can be with Include the following steps：

S201 constructs initial DQN.

In the present embodiment, in order to train DQN, initial DQN can be constructed.The structure of the initial DQN may include：Shape State input layer, at least one layer of hidden layer and movement output layer.Wherein, one group of sample flow can be believed in state input layer It is corresponding can to export this group of sample flow information in movement output layer after the processing of at least one layer of hidden layer by breath input DQN The current routing of each sample flow.This it is current routing be：DQN is under parameter current, the result that exports after once learning. In initial DQN, the value of each parameter is initial value.Trained process is exactly the parameter constantly optimized in DQN, so that ginseng Number optimization after DQN output each sample flow it is current route constantly close to the sample arm of each sample flow by.

S202 obtains sample flow information and the corresponding sample routing policy of sample flow information.

After constructing initial DQN, available sample flow information and the corresponding sample arm of sample flow information by Strategy.To be further trained according to sample flow information and the corresponding sample routing policy of sample flow information to DQN.

Due to accordingly, it is desirable to adjust the routing policy of network, that is, needing to adjust when network traffic information changes The routing of each item stream in network.So in the present embodiment, training DQN particularly directed to one group of sample flow information, instruct Experienced result is exactly：Allow DQN export the sample arm of the corresponding each sample flow of this group of sample flow information by.Due to In practical application, network traffic information can be the value of each occupied link bandwidth of item stream in one group of any network, so Training DQN when, the different sample flow information of available multiple groups, for the different sample flow information of the multiple groups respectively into Row training.In this way, when can when being routed to item stream each in network for a certain group of real-time traffic information in network First to determine to believe with the immediate sample flow information of this group of real-time traffic information, directly use for the sample flow Breath and the DQN of training determine the routing of each item stream in network corresponding to this group of real-time traffic information.

S203 obtains sample flow information input DQN according to the priority sequence of preset every sample flow The current routing of every sample flow.

The process for presetting the priority sequence of every sample flow can be with reference to every in determining network above-mentioned The process of the priority of stream.It, can be suitable according to this in each training after the priority sequence for determining every sample arm Sequence determines the current routing of each sample flow, so as to the speed of training for promotion.

In a kind of implementation, in step S203 by sample flow information input DQN, according to preset every sample flow Priority sequence, obtain the current routing of every sample flow, may comprise steps of：

S21 is constituted initial state information with sample flow information, the initial value of link load vector and priority 0；Its In, link load vector is the vector being made of the link loading value of each of the links in preset training environment, any bar link Link loading value be：By the sum of each occupied link bandwidth of sample flow of this link.

In the present embodiment, one bout will be known as a learning process of one group of sample flow information.At each time In conjunction, DQN executes movement by original state, then carries out all processes of a series of state transfer until terminating state As shown in Figure 3.In each bout, one group of status information of every input will export one-off by DQN, the output action table Show：A current routing is determined for a sample flow.After one bout, which outputs the last one movement, to be terminated, in this time The whole movements exported in conjunction are meant that：Define the current routing of all sample flows.

In each bout, the status information inputted every time can be made of three parts：1, sample flow information, i.e., respectively The occupied link bandwidth of sample flow, is expressed as：f_1,2, f_1,3, f_{Isosorbide-5-Nitrae}...f_{N, n-1}；2, link load vector is expressed as (l₁, l₂, l₃...l_m)；3, priority value, for determining the sequence of each sample flow.Specifically, a system of the sample flow information in one bout In column state migration procedure, it will not change as state shifts, this is because conducted in the study of one bout Routing decision be for this group of sample flow information and the routing decision that carries out.Link load vector illustrates：Currently The loading condition of each chain road under state, the link load vector are continually changing with state transfer.Every next state After transfer, the variation of link load vector by movement that the link load vector sum Last status of Last status is exported Lai It determines.Priority indicates：The priority of routing decision is carried out to each sample stream, while being also used to determine the suitable of each state Sequence.

In the present embodiment, the value of the priority in initial state information is set as 0, later after every execution one-off, shape The value of the priority of new state after state transfer is increased by 1.

In initial state information, due to determining to route for any sample flow not yet, so link load to Amount is 0 vector.

Initial state information is inputted DQN, the current routing for the sample flow that output priority is 0 by S22.

After initial state information is inputted DQN, the sample flow that DQN can be 0 based on current parameter output priority is (simple Referred to as sample flow 0) current routing.Specifically, DQN can be selected in a plurality of alternate routing determined in advance for sample flow 0 One, the current routing as sample flow 0.Wherein it is determined that the mode of the alternate routing of sample flow 0, can with reference to it is aforementioned really Determine the mode of the alternate routing of every stream in network.

S23 updates current ink load according to the current routing for the sample flow that initial state information and priority are 0 Vector, and priority is enabled to increase by 1.

In process shown in Fig. 3, the movement exported after each input state information can all influence in next status information Link load vector.This is because：When a stream f has been determined_{I, j}Routing r_{I, j}Afterwards, this stream f_{I, j}The link flowed through Link load is changed.Therefore, it is possible to first according to 0 institute of sample flow in the current routing of sample flow 0 and initial state information The link bandwidth of occupancy calculates the increased load of each link institute that sample flow 0 flows through to it, then by the increased load of institute with The link loading value for each link that sample flow 0 is flowed through in the link load vector of initial state information is added, after obtaining update Link load vector.The updated link load vector can be used as the link load vector in NextState information.

S24 sets s=1 ..., N-1, executes following steps a1-a3 according to the ascending sequence circulation of s, output is preferential Grade is the current routing of the sample flow of 1~N-1, wherein the quantity of N expression sample flow：

a1：S-th of state letter is constituted with sample flow information, updated link load vector and current priority Breath.

This step can refer to step S21.

a2：S-th of status information is inputted into DQN, output priority is the current routing of the sample flow of s.

This step can refer to step S22.

a3：According to the current routing for the sample flow that s-th of status information and priority are s, current ink load is updated Vector, and priority is enabled to increase by 1.

This step can refer to step S23.

Above-mentioned steps a1-a3 is executed by circulation, so that it may be sequentially output the current of the sample flow that priority is 1~N-1 Routing.Working as all sample flows has been determined that out when outputing the last item sample flow after the current routing of i.e. sample flow N-1 Preceding routing.The current routing of all sample flows may further be compared with the Optimization route of all sample flows, to optimize The parameter of DQN.

S204 calculates pre-set loss function according to the current routing of every sample flow and sample routing policy Value.

During training DQN, a loss function can be preset.Every galley proof can be measured by the loss function The gap of the current routing of this stream and the sample arm of every sample flow between.

In a kind of implementation, on the basis of implementation (i.e. step S21-S24) in step S203, step S204 In current routing according to every sample flow and sample routing policy, calculate the value of pre-set loss function, can be with Include the following steps：

S31 calculates object chain according to the current routing for the sample flow that N-1 status information and priority are N-1 Road load vector；Wherein, Target Link load vector includes：Every chain in the corresponding preset training environment of sample flow information The real-time link load value on road.

After determining the current routing that priority is the sample flow of N-1, the current routing of all sample flows has been determined that out. Therefore, it is possible to calculate the real-time link load value of each of the links in training environment, Target Link load vector is formed, further According to the Target Link load vector, to evaluate the load balancing state of training environment.Wherein, Target Link load vector is calculated Mode can refer to step S23.

S32 calculates the corresponding reward function value MLV of sample flow information according to Target Link load vector.

Using calculated Target Link load vector, the load balancing state of training environment can be evaluated, with Convenient for further being optimized to the routing policy of training environment.The load balancing state of training environment is evaluated, also It is：Evaluate the learning outcome of one bout.Wherein, the load balancing state of training environment refers to：Each link in training environment On loading condition.

Due to the purpose of the present invention is：It is reduced as far as the occurrence probability of network congestion and the degree of congestion.Specifically, Need clear two kinds of demands：1, as the link loading value l of any bar chain road in training environment_kMaximum lower than this link can Bandwidth threshold t_kWhen, it needs to make link loading value l_kAs far as possible far from bandwidth threshold t_k；2, as link loading value l_kIt is more than Bandwidth threshold t_kWhen, it needs to make link loading value l_kAs close as bandwidth threshold t_k.In order to realize both demands, need The relationship first quantitatively described in training environment between the link loading value and bandwidth threshold of each link defines one here The maximum load value (maximumloading Value, MLV) of a training environment, expression formula is：

MLV=min ((t₁-l₁), (t₂-l₂), (t₃-l₃)...(t_m-l_m))

Wherein, l₁, l₂, l₃..., l_mRespectively indicate link 1,2,3 ..., the real-time link load value of m, t₁, t₂, t₃..., t_mRespectively indicate link 1,2,3 ..., the bandwidth threshold of m.

MLV is indicated：In training environment between the bandwidth threshold of the link of pack heaviest and real-time link load value Difference.When MLV value be timing, illustrate that the real-time link load value of all links in training environment is respectively less than bandwidth threshold, instruct Practicing in environment does not have congestion, and the value of MLV is bigger at this time, then it is assumed that the load in training environment is more balanced.And when MLV's When value is negative, illustrate that the real-time link load value of at least one link in training environment has been more than bandwidth threshold, training environment In congestion has occurred, the value of MLV is smaller at this time, means that the congestion in training environment is more serious.

It, can be using MLV as reward function, by this in the training of DQN based on the said circumstances that MLV can be indicated Reward function evaluates the load balancing state of training environment.That is, the evaluation can use a reward function value To indicate.Each positive reward function value indicates：It rewards in one bout, the movement that DQN is exported；Each negative reward Functional value indicates：It punishes in one bout, the movement that DQN is exported.When the study by multi-round, so that DQN is gradually learned Can how output action after obtaining bigger reward function value, illustrates to train completion.It, can be with based on the DQN that training is completed For the real-time traffic information of network, optimal routing policy is provided.

It is understood that in the expression formula of MLV, selection is using being most worth rather than mean value describes the congestion of training environment Situation, be due to：Network congestion is often caused by network load is uneven, it is therefore desirable to may cause network load mistake to any The routing policy that degree is concentrated is punished.By measuring the link that loading condition is worst in network, can easily judge The quality of routing policy, and the routing policy and bad routing policy that can make it difficult to distinguish using average value.

In the present embodiment, sample flow can be calculated by the calculation formula of above-mentioned MLV according to Target Link load vector The corresponding reward function value MLV of information is measured, further to carry out to the learning outcome of one bout according to reward function value MLV Evaluation.

S33 calculates the value of pre-set loss function according to reward function value and sample routing policy.

In the present embodiment, it can be calculated by the following formula and set in advance according to reward function value and sample routing policy The value for the loss function set：

L (θ)=E [(MLV+ γ max_aQ (s ', a ' | θ)-Q (s, a | θ)²]

Wherein, L (θ) indicates that loss function, MLV indicate reward function value, and γ indicates that discount factor, 0≤γ≤1, θ indicate The current network parameter of DQN, and Q (s, a | θ) it indicates after initial state information s is inputted DQN, export the current road of every sample flow The progressive award obtained after, a indicate the current routing of every sample flow, max_aQ (s ', a ' | θ) it indicates according to sample arm by plan The optimal progressive award slightly determined.

Specifically, the next state shifted after s ' expression execution movement a, max_aQ (s ', a ' | θ) indicate that state s ' is corresponding Current sample flow all alternate routings corresponding to maximum value in progressive award, the corresponding current sample of a ' expression state s ' All alternative roads of this stream.

Wherein it is possible to which above-mentioned progressive award Q (s, a | θ) is calculated by reward function value MLV.Specifically, passing through The method that reward function value MLV calculates progressive award Q (s, a | θ), and the optimal accumulation determined according to sample routing policy The method of reward belongs to the prior art.The present invention is herein without repeating.

S205 adjusts the network parameter of DQN, and return when the value of loss function calculated is not less than the first preset value It returns and sample flow information input DQN is obtained into every sample flow according to the priority sequence of preset every sample flow The step of current routing.

When the value of loss function calculated is not less than the first preset value, illustrate that the training effect of DQN has not been reached yet Expected effect, thus the network parameter of adjustable DQN, return to step S203.Specifically, back transfer can be used The network parameter of DQN is adjusted with gradient descent method.

Certainly, the first preset value can be set according to actual needs.

S206 terminates training when the value of loss function calculated is lower than the first preset value, obtains training completion DQN。

When the value of loss function calculated is lower than the first preset value, illustrate that the training effect of DQN has reached expected Effect, training can terminate, to obtain the DQN of trained completion.That is, the network ginseng of the DQN completed based on training Number, can export the Optimization route of each item stream in network.

In addition, in the present embodiment, on the basis of embodiment shown in Fig. 1, can also include the following steps：

S104 (not shown), the routing flowed according to every, updates local flow table；Updated flow table is sent to Each openflow interchanger, so that each openflow interchanger carries out the data in the network according to updated flow table Corresponding operating.

SDN controller can formulate corresponding routing policy and forwarding mechanism according to the network traffic information being collected into. Thus after the routing for determining every stream, the routing that SDN controller can directly be flowed according to every updates local flow table. Updated flow table can be then sent to each openflow interchanger of data plane in SDN network framework, so that respectively Openflow interchanger carries out corresponding operating according to updated flow table, to the data in the network.For example, in network Data packet is forwarded operation.Each openflow interchanger can realize data according to the optimal routing policy of network as a result, Transmission.

The above-mentioned routing flowed according to every, can be achieved by the prior art, this hair the step of updating local flow table It is bright herein without repeating.

Corresponding to above method embodiment, the embodiment of the invention provides be based on deeply under a kind of SDN framework to learn Routing decision device, be applied to SDN controller, as shown in figure 4, the apparatus may include：

First obtains module 401, for obtaining the real-time traffic information in network；Wherein, the real-time traffic packet It includes：Every occupied link bandwidth of stream in the network；

First determining module 402, for determining the priority of every stream；

Second determining module 403, the depth Q network DQN for training the real-time traffic information input in advance, according to The priority sequence of every stream successively determines the routing of every stream；

Wherein, the DQN is according to sample flow information and the corresponding sample routing policy instruction of the sample flow information It gets；The sample flow information includes：The occupied link bandwidth of every sample flow, the sample routing policy packet It includes：The sample arm of the corresponding every sample flow of the sample flow information by.

Scheme provided in an embodiment of the present invention, it is corresponding previously according to sample flow information and the sample flow information The DQN that is obtained by Strategies Training of sample arm, and then obtaining the reality in network when the routing of every stream in determining network When flow information after, by the trained DQN of the real-time traffic information input so that priority of the DQN according to every stream in network, Successively determine the routing of every stream.It is routed since the embodiment of the present invention is determined based on DQN network trained in advance, and DQM Can be according to the sample data of the network of topological structure to be analyzed, therefore when network training, the embodiment of the present invention can be each The generation of network congestion is reduced in the network of kind topological structure, and in the network environment of the highly dynamic variation of network flow, it is real The optimization of existing routing policy.

Further, on the basis of the embodiment shown in fig. 4, as shown in figure 5, one kind provided by the embodiment of the present invention Under SDN framework based on deeply study routing decision device, can also include：

Module 501 is constructed, for constructing initial DQN；

Second obtains module 502, for obtaining sample flow information and the corresponding sample arm of the sample flow information By strategy；

Third determining module 503 is used for by DQN described in the sample flow information input, according to preset every described The priority sequence of sample flow obtains the current routing of every sample flow；

Computing module 504, for according to every sample flow current routing and the sample routing policy, meter Calculate the value of pre-set loss function；

First processing module 505, for when the value of loss function calculated is not less than the first preset value, described in adjustment The network parameter of DQN, and trigger the third determining module 503；

Second processing module 506, for terminating instruction when the value of loss function calculated is lower than first preset value Practice, obtains the DQN of training completion.

Optionally, the third determining module 503 may include：

Construction unit is constituted just for the initial value and priority 0 with the sample flow information, link load vector Beginning status information；Wherein, the link load vector is made of the link loading value of each of the links in preset training environment Vector, the link loading value of any bar link is：By the sum of each occupied link bandwidth of sample flow of this link；

First output unit, for the initial state information to be inputted the DQN, the sample flow that output priority is 0 Current routing；

Updating unit, the current road for the sample flow for being 0 according to the initial state information and the priority By updating current ink load vector, and priority is enabled to increase by 1；

Second output unit, for setting s=1 ..., N-1 executes following steps according to the ascending sequence circulation of s A1-a3, output priority are the current routing of the sample flow of 1~N-1, wherein the N indicates the quantity of sample flow：

a1：S-th of state is constituted with the sample flow information, updated link load vector and current priority Information；

a2：S-th of status information is inputted into the DQN, output priority is the current routing of the sample flow of s；

a3：According to the current routing for the sample flow that s-th of status information and the priority are s, update current Link load vector, and priority is enabled to increase by 1.

Optionally, the computing module 504 may include：

First computing unit, the current road for the sample flow for being N-1 according to N-1 status information and priority By calculating Target Link load vector；Wherein, the Target Link load vector includes：The sample flow information is corresponding The physical link load value of each of the links in the preset training environment；

Second computing unit, for it is corresponding to calculate the sample flow information according to the Target Link load vector Reward function value MLV；

Third computing unit, for being preset according to the reward function value and the sample routing policy, calculating Loss function value.

Optionally, the second computing unit is specifically used for being calculated by the following formula according to the Target Link load vector The corresponding reward function value MLV of the sample flow information：

MLV=min ((t₁-l₁), (t₂-l₂), (t₃-l₃)...(t_m-l_m))

Wherein, l₁, l₂, l₃..., l_mRespectively indicate link 1,2,3 ..., the physical link load value of m, t₁, t₂, t₃..., t_mRespectively indicate link 1,2,3 ..., the bandwidth threshold of m；

Third computing unit is specifically used for according to the reward function value and the sample routing policy, by following Formula calculates the value of pre-set loss function：

L (θ)=E [(MLV+ γ max_aQ (s ', a ' | θ)-Q (s, a | θ)²]

Wherein, L (θ) indicates that the loss function, MLV indicate the reward function value, and γ indicates discount factor, 0≤γ ≤ 1, θ indicate the current network parameter of the DQN, Q (s, a | θ) it indicates after the initial state information s is inputted the DQN, The progressive award obtained after the current routing of every sample flow is exported, a indicates the current routing of every sample flow, max_aQ (s ', a ' | θ) indicate the optimal progressive award determined according to sample routing policy.

Optionally, the first determining module 402 may include：

First determination unit, for being directed to every stream f_{I, j}, determine x alternate routing of this streamWherein, i indicates the stream f_{I, j}Source node, j indicates the stream f_{I, j}Destination node；

4th computing unit, for being calculated by the following formula the stream f_{I, j}The r articles alternate routingEvaluation of estimate EV_r：

Wherein, l₁, l₂, l₃..., L indicates the r articles alternate routingEach link passed through,It indicates in the network except the stream f_{I, j}Except other stream alternate routings in, by each item Link l₁, l₂, l₃..., each total degree of L；Indicate each total degree In maximum value；

5th computing unit is calculated by the following formula the stream f_{I, j}Priority reference value P_{I, j}：

P_{I, j}=max (E)-max (E { max (E) })

Wherein, E is indicated by the stream f_{I, j}All alternate routings evaluation of estimate composition set, E={ EV₁, EV₂, EV₃...EV_X, max (E) indicates the maximum value in the set E, E { max (E) } expression by the maximum value in the set in E The new set that max (E) is formed after removing, max (E { max (E) }) indicate the new set E maximum value in { max (E) }；

Second determination unit, the sequence of the priority reference value for flowing according to described every are determined described every The priority of item stream；Wherein, the priority of the highest stream of priority reference value be 0, the minimum stream of priority reference value it is preferential Grade is N-1.

Optionally, the alternate routing of every stream meets the following conditions：

Any alternate routing of every stream is acyclic；

Any alternate routing paths traversed and other alternate routing paths traversed of every stream are incomplete It is identical；

The distance of any alternative path of every stream meets preset value.

In addition, the embodiment of the invention also provides a kind of SDN controllers, as shown in fig. 6, connecing including processor 601, communication Mouth 602, memory 603 and communication bus 604, wherein processor 601, communication interface 602, memory 603 pass through communication bus 604 complete mutual communication,

Memory 603, for storing computer program；

Processor 601 when for executing the program stored on memory 603, realizes SDN any in above-described embodiment Route decision method based on deeply study under framework.

The communication bus that above-mentioned SDN controller is mentioned can be Peripheral Component Interconnect standard (Peripheral Component Interconnect, abbreviation PCI) bus or expanding the industrial standard structure (Extended Industry Standard Architecture, abbreviation EISA) bus etc..The communication bus can be divided into address bus, data/address bus, control Bus processed etc..Only to be indicated with a thick line in figure convenient for indicating, it is not intended that an only bus or a type of total Line.

Communication interface is for the communication between above-mentioned electronic equipment and other equipment.

Memory may include random access memory (Random Access Memory, abbreviation RAM), also may include Nonvolatile memory (non-volatile memory), for example, at least a magnetic disk storage.Optionally, memory may be used also To be storage device that at least one is located remotely from aforementioned processor.

Above-mentioned processor can be general processor, including central processing unit (Central Processing Unit, Abbreviation CPU), network processing unit (Network Processor, abbreviation NP) etc.；It can also be digital signal processor (Digital Signal Processing, abbreviation DSP), specific integrated circuit (Application Specific Integrated Circuit, abbreviation ASIC), field programmable gate array (Field-Programmable Gate Array, Abbreviation FPGA) either other programmable logic device, discrete gate or transistor logic, discrete hardware components.

In another embodiment provided by the invention, a kind of computer readable storage medium is additionally provided, which can Read storage medium in be stored with instruction, when run on a computer so that computer execute it is any in above-described embodiment Route decision method based on deeply study under SDN framework.

In the above-described embodiments, can come wholly or partly by software, hardware, firmware or any combination thereof real It is existing.When implemented in software, it can entirely or partly realize in the form of a computer program product.Computer program product Including one or more computer instructions.When loading on computers and executing computer program instructions, all or part of real estate Raw process or function according to the embodiment of the present invention.Computer can be general purpose computer, special purpose computer, computer network, Or other programmable devices.Computer instruction may be stored in a computer readable storage medium, or from a computer Readable storage medium storing program for executing to another computer readable storage medium transmit, for example, computer instruction can from a web-site, Computer, server or data center by wired (such as coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (such as Infrared, wireless, microwave etc.) mode transmitted to another web-site, computer, server or data center.Computer Readable storage medium storing program for executing can be any usable medium or include one or more usable medium collection that computer can access At the data storage devices such as server, data center.Usable medium can be magnetic medium, (for example, floppy disk, hard disk, magnetic Band), optical medium (for example, DVD) or semiconductor medium (such as solid state hard disk Solid State Disk (SSD)) etc..

It should be noted that, in this document, relational terms such as first and second and the like are used merely to a reality Body or operation are distinguished with another entity or operation, are deposited without necessarily requiring or implying between these entities or operation In any actual relationship or order or sequence.Moreover, the terms "include", "comprise" or its any other variant are intended to Non-exclusive inclusion, so that the process, method, article or equipment including a series of elements is not only wanted including those Element, but also including other elements that are not explicitly listed, or further include for this process, method, article or equipment Intrinsic element.In the absence of more restrictions, the element limited by sentence " including one ... ", it is not excluded that There is also other identical elements in the process, method, article or equipment for including element.

Each embodiment in this specification is all made of relevant mode and describes, same and similar portion between each embodiment Dividing may refer to each other, and each embodiment focuses on the differences from other embodiments.Especially for device/ For SDN controller/storage medium embodiment, since it is substantially similar to the method embodiment, so being described relatively simple, phase Place is closed to illustrate referring to the part of embodiment of the method.

The above is merely preferred embodiments of the present invention, it is not intended to limit the scope of the present invention.It is all in this hair Any modification, equivalent replacement, improvement and so within bright spirit and principle, are included within the scope of protection of the present invention.

Claims

1. the route decision method based on deeply study under a kind of SDN framework, which is characterized in that be applied to software defined network Network SDN controller, the method includes：

Obtain the real-time traffic information in network；Wherein, the real-time traffic information includes：Shared by every stream in the network Link bandwidth；

Determine the priority of every stream；

The depth Q network DQN that the real-time traffic information input is trained in advance, it is suitable according to the described every priority flowed height Sequence successively determines the routing of every stream；

Wherein, the DQN is to be obtained according to sample flow information and the corresponding sample arm of the sample flow information by Strategies Training It arrives；The sample flow information includes：The occupied link bandwidth of every sample flow, the sample routing policy include：Institute State the sample arm of the corresponding every sample flow of sample flow information by.

2. the method according to claim 1, wherein the training process of the DQN includes：

Construct initial DQN；

Obtain sample flow information and the corresponding sample routing policy of the sample flow information；

DQN described in the sample flow information input is obtained according to the priority sequence of preset every sample flow To the current routing of every sample flow；

According to the current routing of every sample flow and the sample routing policy, pre-set loss function is calculated Value；

When the value of loss function calculated is not less than the first preset value, the network parameter of the DQN is adjusted, and described in return DQN described in the sample flow information input is obtained into institute according to the priority sequence of preset every sample flow The step of stating the current routing of every sample flow；

When the value of loss function calculated is lower than first preset value, terminate training, obtains the DQN of training completion.

3. according to the method described in claim 2, pressing it is characterized in that, described by DQN described in the sample flow information input According to the priority sequence of preset every sample flow, the current routing of every sample flow is obtained, including：

Initial state information is constituted with the sample flow information, the initial value of link load vector and priority 0；Wherein, The link load vector is the vector being made of the link loading value of each of the links in preset training environment, any bar link Link loading value be：By the sum of each occupied link bandwidth of sample flow of this link；

The initial state information is inputted into the DQN, the current routing for the sample flow that output priority is 0；

According to the current routing for the sample flow that the initial state information and the priority are 0, current ink load is updated Vector, and priority is enabled to increase by 1；

Set s=1 ..., N-1, according to s it is ascending sequence circulation execute following steps a1-a3, output priority be 1~ The current routing of the sample flow of N-1, wherein the N indicates the quantity of sample flow：

a1：S-th of state letter is constituted with the sample flow information, updated link load vector and current priority Breath；

a3：According to the current routing for the sample flow that s-th of status information and the priority are s, current ink is updated Load vector, and priority is enabled to increase by 1.

4. according to the method described in claim 3, it is characterized in that, the current routing according to every sample flow, with And the sample routing policy, the value of pre-set loss function is calculated, including：

According to the current routing for the sample flow that N-1 status information and priority are N-1, calculate Target Link load to Amount；Wherein, the Target Link load vector includes：It is every in the corresponding preset training environment of the sample flow information The real-time link load value of link；

According to the Target Link load vector, the corresponding reward function value MLV of the sample flow information is calculated；

According to the reward function value and the sample routing policy, the value of pre-set loss function is calculated.

5. according to the method described in claim 4, it is characterized in that,

According to the Target Link load vector, it is calculated by the following formula the corresponding reward function value of the sample flow information MLV：

MLV=min ((t₁-l₁), (t₂-l₂), (t₃-l₃)...(t_m-l_m))

Wherein, l₁, l₂, l₃..., l_mRespectively indicate link 1,2,3 ..., the real-time link load value of m, t₁, t₂, t₃..., t_m Respectively indicate link 1,2,3 ..., the bandwidth threshold of m；

According to the reward function value and the sample routing policy, it is calculated by the following formula pre-set loss letter Several values：

L (θ)=E [(MLV+ γ max_aQ (s ', a ' | θ)-Q (s, a | θ)²]

Wherein, L (θ) indicates that the loss function, MLV indicate the reward function value, and γ indicates discount factor, 0≤γ≤1, θ Indicate the current network parameter of the DQN, and Q (s, a | θ) it indicates after the initial state information s is inputted the DQN, export institute The progressive award obtained after the current routing of every sample flow is stated, a indicates the current routing of every sample flow, max_aQ (s ', a ' | θ) indicate the optimal progressive award determined according to sample routing policy.

6. the method according to claim 1, wherein the determination it is described every stream priority, including：

For every stream f_{I, j}, determine x alternate routing of this streamWherein, i indicates the stream f_{I, j}Source node, j indicates the stream f_{I, j}Destination node；

It is calculated by the following formula the stream f_{I, j}The r articles alternate routingEvaluation of estimate EV_r：

Wherein, l₁, l₂, l₃..., L indicates the r articles alternate routingEach link passed through, It indicates in the network except the stream f_{I, j}Except other stream alternate routings in, by each link l₁, l₂, l₃..., each total degree of L；Indicate each total degreeIn maximum Value；

It is calculated by the following formula the stream f_{I, j}Priority reference value P_{I, j}：

P_{I, j}=max (E)-max (E { max (E) })

The sequence of the priority reference value flowed according to described every determines the priority of every stream；Wherein, preferentially The priority of the grade highest stream of reference value is 0, and the priority of the minimum stream of priority reference value is N-1.

7. according to the method described in claim 6, it is characterized in that, the alternate routing of every stream meets the following conditions：

Any alternate routing of every stream is acyclic；

Any alternate routing paths traversed and other alternate routing paths traversed of every stream are not exactly the same；

The distance of any alternative path of every stream meets preset value.

8. the routing decision device based on deeply study under a kind of SDN framework, which is characterized in that it is applied to SDN controller, Described device includes：

First obtains module, for obtaining the real-time traffic information in network；Wherein, the real-time traffic information includes：It is described Every occupied link bandwidth of stream in network；

First determining module, for determining the priority of every stream；

Second determining module, the depth Q network DQN for training the real-time traffic information input in advance, according to described every The priority sequence of item stream successively determines the routing of every stream；

Wherein, the DQN is to be obtained according to sample flow information and the corresponding sample arm of the sample flow information by Strategies Training It arrives；The sample flow information includes：The occupied link bandwidth of every sample flow, the sample routing policy include：Institute State the routing of the corresponding every sample flow of sample flow information.

9. device according to claim 8, which is characterized in that described device further includes：

Module is constructed, for constructing initial DQN；

Second obtains module, for obtaining sample flow information and the corresponding sample routing policy of the sample flow information；

Third determining module is used for by DQN described in the sample flow information input, according to preset every sample flow Priority sequence obtains the current routing of every sample flow；

Computing module, for according to every sample flow current routing and the sample routing policy, calculating set in advance The value for the loss function set；

First processing module, for adjusting the net of the DQN when the value of loss function calculated is not less than the first preset value Network parameter, and trigger the third determining module；

Second processing module, for terminating training, obtaining when the value of loss function calculated is lower than first preset value The DQN that training is completed.

10. a kind of SDN controller, which is characterized in that including processor, communication interface, memory and communication bus, wherein institute It states processor, the communication interface, the memory and completes mutual communication by the communication bus；

The memory, for storing computer program；

The processor when for executing the program stored on the memory, realizes side as claimed in claim 1 to 7 Method step.