Disclosure of Invention
The invention provides a spectrum resource allocation method for advance reservation requests in a cloud-fog elastic optical network, which addresses the spectrum resource allocation problem of cross-data-center transmission services in the existing Internet of Things, such as data backup, application data synchronization and virtual machine migration. The method comprises the following steps:
S1, for a service request r, k shortest candidate paths of the service request r are calculated by using a shortest path algorithm, wherein the service request r is characterized by the number of services it carries, a source node $s$, a destination node $d$, an arrival time $t_a$ and a deadline $t_d$;
S2, dividing the spectrum resources of each link into time slices and spectrum slots, obtaining the path resource matrix corresponding to the shortest candidate path in step S1 according to the state of each time spectrum unit, and obtaining, according to the path resource matrix, the number $n_t$ of time slices and the number $n_f$ of spectrum slots required to process the service request r;
S3, according to the number $n_t$ of time slices and the number $n_f$ of spectrum slots obtained in step S2, determining, by using a reinforcement learning algorithm, the action A allocated to the service request r in the path resource matrix obtained in step S2, and obtaining the corresponding reward R according to the action A;
S4, determining, according to the reward R obtained in step S3, whether the allocation scheme under the shortest candidate path is valid; if so, recording the allocation scheme under the shortest candidate path and the corresponding reward R, and then executing step S5; if not, executing step S5 directly;
the allocation scheme includes the start time $t_s$ at which the service request r is scheduled, the shortest candidate path, and the number $n_t$ of time slices and the number $n_f$ of spectrum slots required to process the service request r;
S5, traversing the k shortest candidate paths in turn according to the method of steps S2 to S4, and selecting the allocation scheme that yields the maximum reward R as the allocation scheme of the service request r.
The beneficial effects of the invention are as follows: for an incoming advance reservation request, the invention first finds k shortest candidate paths by using a shortest path method, then traverses each candidate path and calculates the available spectrum resources corresponding to it. Different service times and numbers of spectrum slots can be allocated to each service request; a deep neural network then selects an allocation scheme, a reward is obtained for each allocation scheme, and the optimal allocation scheme is decided according to the reward. The method has good robustness: it can select a suitable routing path for all advance reservation request services and allocate the optimal service time and spectrum resources to each advance reservation request, thereby maximizing the utilization of spectrum resources and reducing the blocking rate and the initial delay of service requests.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to fig. 1 to 5 in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
A spectrum resource allocation method for advance reservation requests in a cloud-fog elastic optical network, as shown in fig. 1, includes the following steps:
S1, for a service request r, k shortest candidate paths of the service request r are calculated by using a shortest path algorithm, wherein the service request r is characterized by the number of services it carries, a source node $s$, a destination node $d$, an arrival time $t_a$ and a deadline $t_d$;
the service request r is an advance reservation request service, and each shortest candidate path is composed of one or more links.
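Step S1 does not name a particular shortest path algorithm. As a minimal sketch only, the following Python function enumerates up to k loop-free paths in order of increasing cost by best-first expansion of partial paths. The adjacency-dictionary layout of `graph`, the link weights and the function name are assumptions made for illustration; a production system would more likely use Yen's algorithm, which scales better.

```python
import heapq


def k_shortest_paths(graph, s, d, k):
    """Return up to k loop-free (cost, path) pairs from s to d, cheapest first.

    graph: dict mapping node -> {neighbor: link weight} (hypothetical layout).
    Weights must be non-negative for the ordering guarantee to hold.
    """
    found = []
    # Each heap entry is (cost so far, partial path). Because the cheapest
    # partial path is always expanded first, complete paths are popped in
    # non-decreasing order of cost.
    heap = [(0, [s])]
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == d:
            found.append((cost, path))
            continue
        for neighbor, weight in graph[node].items():
            if neighbor not in path:  # keep every candidate path loop-free
                heapq.heappush(heap, (cost + weight, path + [neighbor]))
    return found
```

With hop counts as weights (every link weight set to 1), the returned paths are the k candidate paths with the fewest hops.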
S2, dividing the spectrum resources of each link into time slices and spectrum slots, obtaining the path resource matrix corresponding to each shortest candidate path in step S1 according to the state of each time spectrum unit, and obtaining, according to the path resource matrix, the number $n_t$ of time slices and the number $n_f$ of spectrum slots required to process the service request r, which specifically comprises the following steps:
S21, dividing each link on the shortest candidate path into time slices and spectrum slots, establishing time spectrum units based on the time slices and the spectrum slots, and determining the state of each time spectrum unit on the link;
the expression of the state of the time spectrum unit is as follows:
wherein $S_{(t,f)}$ represents the state of the time spectrum unit $u_{(t,f)}$, and the time spectrum unit $u_{(t,f)}$ is composed of the t-th time slice and the f-th spectrum slot.
S22, determining the link resource matrix of the link according to the state of each time spectrum unit on the link obtained in step S21;
the expression of the link resource matrix is:
in the formula, $U_l$ represents the link resource matrix of the link $l$, the corresponding matrix entry represents the state of the time spectrum unit $u_{(T,F)}$ on the link $l$, $T$ represents the number of time slices on the link $l$, and $F$ represents the number of spectrum slots on the link $l$.
S23, determining the link resource matrix of each link on the shortest candidate path according to the methods of S21 and S22, and determining the path resource matrix of the shortest candidate path according to the link resource matrices;
the expression of the path resource matrix is as follows:
in the formula, $U_P$ represents the path resource matrix of the shortest candidate path $P$, $L$ represents the set of all links included in the shortest candidate path $P$, and the corresponding matrix entry represents the state of the time spectrum unit $u_{(T,F)}$ over all links in the shortest candidate path $P$.
The path resource matrix represents the state of each time spectrum unit in the shortest candidate path, and the available spectrum resources in the shortest candidate path can be quickly identified according to the path resource matrix.
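How the link resource matrices are combined into a path resource matrix can be sketched as follows. The sketch assumes, for illustration only, that a state value of 1 means occupied and 0 means free, that rows index time slices and columns index spectrum slots, and that a time spectrum unit is available on the path only if it is free on every link of the path. The function name is hypothetical.

```python
def path_resource_matrix(link_matrices):
    """Combine per-link matrices U_l into one path matrix U_P.

    Assumed encoding (not confirmed by the text): each matrix is a list of
    T rows (time slices) with F entries (spectrum slots); 1 = occupied,
    0 = free. A unit is occupied on the path if any link occupies it.
    """
    num_slices = len(link_matrices[0])
    num_slots = len(link_matrices[0][0])
    return [
        [int(any(U[t][f] for U in link_matrices)) for f in range(num_slots)]
        for t in range(num_slices)
    ]
```

Under this encoding the free units of the path are exactly the zero entries of the result, which is what makes available resources quick to identify.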
S24, calculating the service duration $\Delta t$ required by the service request r according to the path resource matrix, and calculating, according to the service duration $\Delta t$, the number $n_t$ of time slices and the number $n_f$ of spectrum slots required by the service request r;
the calculation formula of the number $n_t$ of time slices is as follows:
wherein $\tau$ represents the size of a time slice, and $\Delta t$ represents the service duration of the service request r;
the service duration $\Delta t$ is obtained by trying different start times and processing the service request r with the available spectrum resources in the path resource matrix, subject to the following constraints:
$\max \Delta t = t_d - t_a$;
$t_a \le t_s \le t_d$;
$\tau \le \Delta t \le t_d - t_s$;
in the formula, $t_s$ represents the start time at which the service request r is scheduled;
the calculation formula of the service duration $\Delta t$ is as follows:
$\Delta t = t_e - t_s$;
in the formula, $t_e$ represents the end time at which the service request r is scheduled;
the calculation formula of the number $n_f$ of spectrum slots is as follows:
in the formula, $F_{slot}$ represents the capacity of one spectrum slot, GB represents the guard bandwidth, and $[\cdot]$ denotes rounding.
In this embodiment, the capacity $F_{slot}$ of one spectrum slot is 12.5 GHz, and the size $\tau$ of a time slice is one hour.
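The constraints on the start time and service duration in step S24 can be illustrated with a short enumeration. This is a sketch under stated assumptions: the number of time slices is taken to be the service duration divided by $\tau$ and rounded up, start times and durations are restricted to multiples of $\tau$, and the function names are hypothetical. The number of spectrum slots is deliberately left out.

```python
import math


def num_time_slices(delta_t, tau=1.0):
    # Assumption: round up so the whole service duration is covered.
    return math.ceil(delta_t / tau)


def feasible_schedules(t_a, t_d, tau=1.0):
    """List (t_s, delta_t, n_t) triples on a slice-aligned grid that satisfy
    t_a <= t_s <= t_d and tau <= delta_t <= t_d - t_s."""
    schedules = []
    steps = int((t_d - t_a) // tau)
    for i in range(steps):
        t_s = t_a + i * tau
        # Largest whole number of slices that still ends by the deadline.
        max_slices = int((t_d - t_s) // tau)
        for n in range(1, max_slices + 1):
            delta_t = n * tau
            schedules.append((t_s, delta_t, num_time_slices(delta_t, tau)))
    return schedules
```

For a request with arrival time 0, deadline 3 and one-hour slices, this yields six candidate (start time, duration) pairs, the longest of which has a duration equal to $t_d - t_a$.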
S3, according to the number $n_t$ of time slices and the number $n_f$ of spectrum slots obtained in step S2, determining, by using a reinforcement learning algorithm, the action A allocated to the service request r in the path resource matrix obtained in step S2, and obtaining the corresponding reward R according to the action A;
in this embodiment, the reinforcement learning algorithm is a DQN algorithm, and step S3 includes the following steps:
S31, as shown in fig. 2, establishing a resource environment according to the path resource matrix established in step S2, establishing a request environment corresponding to the resource environment according to the number $n_t$ of time slices and the number $n_f$ of spectrum slots required by the service request r, and combining the resource environment and the request environment to obtain the environment state S.
S32, inputting the environment state S obtained in step S31 into the evaluate network of the DQN algorithm to obtain the action A, wherein the action A represents the position in the path resource matrix to which the service request r is to be allocated.
S33, evaluating the position and calculating the corresponding reward R according to the reward mechanism.
The reward mechanism of the reward R is as follows:
in the formula, SRU represents the spectrum resource utilization value, and TSAE represents the time spectrum allocation efficiency. The smaller the spectrum resource utilization value SRU, the better, since more resources are left for subsequent requests; therefore, the smaller the SRU, the larger the reward R. The larger the time spectrum allocation efficiency TSAE, the better, since it indicates less spectrum fragmentation in the path resource matrix, i.e., more available resources.
The calculation formula of the spectrum resource utilization value SRU is as follows:
$SRU = (t_e - t_s) \times n_t \times h(r)$;
where $h(r)$ represents the number of route hops from the source node $s$ to the destination node $d$;
the calculation formula of the time spectrum allocation efficiency TSAE is as follows:
$TSAE = C_s \times R_i \times TF_c$;
in the formula, $C_s$ represents the size of the cluster, $R_i$ represents the resource idleness, and $TF_c$ represents the time spectrum continuity;
in addition to the time spectrum continuity, the calculation of the time spectrum allocation efficiency TSAE takes into account two further factors, the cluster and the resource idleness, so that spectrum fragmentation can be reduced and the spectrum resources are utilized to the maximum extent.
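The two products above can be written directly as small helper functions. The following is only an illustrative sketch: the function and parameter names are hypothetical, and the input values used in the usage note are made up for the arithmetic and are not taken from the figures.

```python
def sru(t_s, t_e, n_t, hops):
    """Spectrum resource utilization value: (t_e - t_s) * n_t * h(r)."""
    return (t_e - t_s) * n_t * hops


def tsae(c_s, r_i, tf_c):
    """Time spectrum allocation efficiency: C_s * R_i * TF_c."""
    return c_s * r_i * tf_c
```

For example, a request scheduled from $t_s = 2$ to $t_e = 5$ with $n_t = 3$ over a 4-hop path gives SRU = 3 × 3 × 4 = 36, and a cluster size of 10 with an idleness of 0.5 and a continuity of 1.2 gives TSAE = 6.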
The cluster is formed by connecting the position allocated to the service request r with the surrounding time spectrum units, and the size $C_s$ of the cluster is the number of time spectrum units in the cluster; the resource idleness $R_i$ represents the fraction of time spectrum units in the path resource matrix that are free. As shown in fig. 4, if the service request is allocated available block 1, cluster 1 is formed, and the size of cluster 1 is $C_s = 64$; if the service request is allocated available block 2, cluster 2 is formed, and the size of cluster 2 is $C_s = 17$; since available block 1 and available block 2 contain the same number of time spectrum units, the resource idleness is the same in both cases, $R_i = 0.32$.
The calculation formula of the time spectrum continuity $TF_c$ is as follows:
in the formula, the two block-count terms represent the number of available spectrum blocks along the time axis and along the spectrum axis, respectively, and $num_{2u}$ represents the number of two consecutive spectrum units (counted along the time axis and the spectrum axis, respectively).
The time spectrum continuity $TF_c$ reflects the degree of spectrum fragmentation in the path resource matrix; for the example shown in fig. 5, $TF_c = 1.08$.
S4, determining, according to the reward R obtained in step S3, whether the allocation scheme under the shortest candidate path is valid; if so, recording the allocation scheme under the shortest candidate path and the corresponding reward R, and then executing step S5; if not, executing step S5 directly; the allocation scheme includes the start time $t_s$ at which the service request r is scheduled, the shortest candidate path, and the number $n_t$ of time slices and the number $n_f$ of spectrum slots required to process the service request r.
Whether the position allocated in step S3 is already occupied can be determined from the sign of the reward R: if the reward R is positive, the allocation scheme is valid, and if the reward R is negative, the allocation scheme is invalid.
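The occupancy test behind this validity check can be sketched as follows. The sketch assumes that the action identifies the first time slice and first spectrum slot of a rectangular block of $n_t$ by $n_f$ units, and that 0 marks a free unit; both points, and the function name, are assumptions for illustration.

```python
def block_is_free(U_P, t0, f0, n_t, n_f):
    """Return True if the n_t x n_f block starting at (t0, f0) lies inside
    the path resource matrix U_P and every unit in it is free (0)."""
    num_slices, num_slots = len(U_P), len(U_P[0])
    if t0 < 0 or f0 < 0 or t0 + n_t > num_slices or f0 + n_f > num_slots:
        return False  # the block would fall outside the matrix
    return all(
        U_P[t][f] == 0
        for t in range(t0, t0 + n_t)
        for f in range(f0, f0 + n_f)
    )
```

An environment built this way could return a negative reward whenever the check fails, which matches the sign convention described above.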
Preferably, after the allocation scheme under the shortest candidate path is recorded, the environment state $S$ is synchronously updated according to the action A to obtain a new environment state $S\_$, and the experience $(S, A, R, S\_)$ is stored in the experience pool of the evaluate network; it is then judged whether the set time for updating the network parameters has been reached; if so, the network parameters are updated, and if not, step S5 is executed directly.
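An experience pool of this kind is commonly implemented as a fixed-capacity buffer with uniform random sampling. The class below is a minimal sketch under that assumption; the class name, the capacity parameter and the eviction of the oldest experience are illustrative choices that the text does not specify.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of (S, A, R, S_) experiences."""

    def __init__(self, capacity):
        # deque with maxlen drops the oldest experience once the pool is full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the pool size
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random rather than replaying experiences in order reduces the correlation between consecutive training samples.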
As shown in fig. 3, the DQN algorithm includes two networks, an evaluate network and a target network; the evaluate network is used to calculate the estimated Q value, denoted $Q_{evaluate}$; the target network is used to calculate the actual Q value, denoted $Q_{target}$. According to the set time for updating the network parameters, the evaluate network and the target network periodically extract a batch of experiences $(S, A, R, S\_)$ from the experience pool; the evaluate network obtains $Q_{evaluate}(S, A)$ from the environment state $S$, the target network obtains $Q_{target}(S\_, A\_)$ from the new environment state $S\_$, and the loss function is then calculated from the two Q values, wherein $A\_$ denotes the new action estimated from the new environment state $S\_$.
The loss function is the mean square error $L$ between $Q_{evaluate}(S, A)$ and $Q_{target}(S\_, A\_)$, and the specific formula is as follows:
$L = E\left[\left(Q_{target}(S\_, A\_) - Q_{evaluate}(S, A)\right)^2\right]$;
the evaluate network updates the network parameters by adopting a gradient descent method, and the target network copies the updated parameters of the evaluate network, which is the prior art and is not described in detail in this embodiment.
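The loss and the parameter copy can be illustrated without a deep learning framework. The sketch below treats the Q values of a sampled batch as plain lists of numbers and the network parameters as a dictionary; both representations and the function names are assumptions for illustration, and the gradient descent step itself is omitted.

```python
def mse_loss(q_target, q_evaluate):
    """Mean of (Q_target - Q_evaluate)^2 over a batch of Q values."""
    if len(q_target) != len(q_evaluate):
        raise ValueError("both batches must have the same length")
    squared = [(qt - qe) ** 2 for qt, qe in zip(q_target, q_evaluate)]
    return sum(squared) / len(squared)


def sync_target(evaluate_params, target_params):
    """Overwrite the target network parameters with a copy of the
    evaluate network parameters (hypothetical dict representation)."""
    target_params.clear()
    target_params.update(evaluate_params)
```

For a batch with target values [1.0, 2.0] and estimated values [0.0, 0.0], the loss is (1 + 4) / 2 = 2.5.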
S5, traversing the k shortest candidate paths in turn according to the method of steps S2 to S4, and then taking the allocation scheme that yields the maximum reward R in the elastic optical network as the spectrum resource allocation scheme of the service request r.
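The final selection over the k candidate paths amounts to taking the recorded scheme with the largest reward. The sketch below assumes that each record is a (scheme, reward) pair, that only schemes with a positive reward count as valid, and that a request with no valid scheme is treated as blocked and yields `None`; the record layout, the dictionary keys and the `None` convention are assumptions for illustration.

```python
def select_best_scheme(records):
    """Return the valid allocation scheme with the maximum reward.

    records: list of (scheme, reward) pairs, one per candidate path tried.
    Returns None when no scheme has a positive reward (request blocked).
    """
    valid = [(scheme, reward) for scheme, reward in records if reward > 0]
    if not valid:
        return None
    best_scheme, _ = max(valid, key=lambda item: item[1])
    return best_scheme
```

A scheme here could be a dictionary such as `{"path": ..., "t_s": ..., "n_t": ..., "n_f": ...}`, mirroring the contents of the allocation scheme listed in step S4.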
The invention firstly establishes a two-dimensional resource model of frequency domain and time domain facing the service of the advance reservation request, carries out interaction with the environment through reinforcement learning, scores the frequency spectrum resource allocation scheme to optimize the allocation of frequency spectrum resources, and then updates the state of the corresponding time frequency spectrum unit according to the determined allocation scheme to prepare for the arrival of the next service request.
Deep reinforcement learning (DRL) has shown the potential to learn successful strategies for combinatorial and distributed problems. The invention relies on feedback and rewards obtained from the environment, through which the DQN algorithm learns an optimization strategy step by step, and is therefore well suited to decision-making problems. In the study of the static spectrum allocation strategy for advance reservation request services, the optimal solution computed by integer linear programming (ILP), the DRDA method and three traditional heuristic algorithms are compared to test the performance of DRDA on the static RSA problem; the simulation results show that the performance of the DRDA method is very close to the optimal solution computed by ILP. In the study of the dynamic spectrum allocation strategy, the invention proposes the time spectrum allocation efficiency (TSAE) metric for measuring the state of available resources in an elastic optical network, and uses the DQN algorithm to score allocation schemes. Simulation tests and large-scale network experiments compare DRDA with three traditional heuristic algorithms in terms of average TSAE, request blocking rate and average initial delay; the results show that the DRDA method has good robustness and, compared with the three heuristic algorithms, achieves the lowest blocking rate while keeping a lower initial delay.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.