CN111246320B - Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network - Google Patents

Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network

Info

Publication number
CN111246320B
CN111246320B (granted publication of application CN202010016994.1A)
Authority
CN
China
Prior art keywords
network
service request
wavelength
topology
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010016994.1A
Other languages
Chinese (zh)
Other versions
CN111246320A (en)
Inventor
朱睿杰
李世华
李亚飞
吕培
徐明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202010016994.1A priority Critical patent/CN111246320B/en
Publication of CN111246320A publication Critical patent/CN111246320A/en
Application granted granted Critical
Publication of CN111246320B publication Critical patent/CN111246320B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • H04Q2011/0007Construction
    • H04Q2011/0011Construction using wavelength conversion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/0073Provisions for forwarding or routing, e.g. lookup tables
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/0075Wavelength grouping or hierarchical aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/009Topology aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Optical Communication System (AREA)

Abstract

The invention provides a deep reinforcement learning traffic grooming method for cloud-fog elastic optical networks, comprising the following steps: compute the shortest path of a service request with a shortest-path algorithm; convert the service path and the wavelength-sliced network topology into pictures; extract features from all pictures with a convolutional neural network, classify them with a softmax classifier, and assign the service request to the corresponding wavelength; if the assigned wavelength has available resources, the allocation succeeds, otherwise all wavelengths are traversed with a first-fit method to allocate the request; evaluate the allocation with a reinforcement learning algorithm, update the network state of the topology, and generate the shortest-path topology picture of the next service request; update the convolutional neural network each time at least three service request allocations have been completed. By continuously updating the network through reinforcement learning, the invention lets all services make full use of the ports, transceivers and amplifiers in the network, thereby reducing its total energy consumption.

Description

Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network
Technical Field
The invention relates to the technical field of elastic optical networks and cloud-fog communication, in particular to a deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network.
Background
Cloud computing transports all data to centralized data centers for analysis, storage and processing, and excels at providing a wide range of services. With the explosive growth of global Internet-of-Things devices, however, the massive data they generate are ill-suited to cloud processing, and the redundant transmission causes excessive delay, posing a serious challenge to current communication networks. To meet the Internet of Things' demand for large amounts of low-delay computing and to compensate for the shortcomings of traditional cloud computing, fog computing has emerged. Fog computing deploys many fog nodes and concentrates data processing and applications in devices at the network edge, yielding faster processing and more efficient results; it is therefore the best candidate for processing such data, offering low delay, high security, better user experience and higher power efficiency.
An Elastic Optical Network (EON) is a promising network infrastructure for communication between fog nodes and cloud data centers: it abstracts the resources of the underlying physical network into a resource pool for cloud-fog computing, performs resource allocation and management on a virtual network, and can provide flexible and efficient services. To exploit this flexibility while fully utilizing the underlying physical resources, traffic grooming was developed: multiple fine-grained IP flows can be flexibly aggregated into the optical layer over existing lightpaths, and spectrum is allocated flexibly according to the requested bandwidth. Especially with the development of physical-layer devices such as sliceable optical transponders and sliceable optical amplifiers, traffic grooming can achieve higher power efficiency.
Traffic grooming directs different bandwidth requests onto the same wavelength to save resources and energy. The total energy consumption consists mainly of three parts, IP ports, transceivers and amplifiers, which are modeled as follows:
IP port: the basic energy consumption of a 400 Gbps port is taken to be 560 W; the total port energy consumption is denoted E_IPT (W).
Optical transceiver: the energy consumption depends on the line rate of the service request; for each unit of line rate the consumption is 1.683 W (parameter η = 1.683 W/Gbps). The calculation formulas are:

E_OPT^i = η · TR_i (1)

E_OPT = Σ_{i=1}^{N_OPT} E_OPT^i (2)

where TR denotes the transmission rate, N_OPT is the number of optical transceivers, E_OPT^i is the energy consumption of the i-th transceiver, and E_OPT is the total transceiver energy consumption. Line rates of 40 Gbps and 100 Gbps are considered in the present invention.
Optical amplifier: the basic power consumption of each optical amplifier is μ = 100 W, and the additional power depends on the line rate of the service request: 25 W for 40 Gbps and 50 W for 100 Gbps. The energy consumption of the amplifiers is:

E_OPR^i = μ + θ_i (3)

E_OPR = Σ_{i=1}^{N_OPR} E_OPR^i (4)

where θ is the additional energy consumption, N_OPR is the number of optical amplifiers, E_OPR^i is the power consumption of the i-th amplifier, and E_OPR is the total amplifier energy consumption.
Therefore, the total energy consumption is: E_TG(W) = E_IPT(W) + E_OPT(W) + E_OPR(W) (5).
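As a concrete illustration, the three-part energy model above can be sketched in a few lines of Python. The constants (560 W per 400 Gbps port, η = 1.683 W/Gbps, μ = 100 W, and the 25 W/50 W extras) come from the model; the example port and device counts are hypothetical:

```python
# Sketch of the traffic-grooming energy model, Eqs. (1)-(5).
ETA = 1.683                     # W per Gbps of transceiver line rate
MU = 100.0                      # base power of one optical amplifier, W
EXTRA = {40: 25.0, 100: 50.0}   # extra amplifier power by line rate, W
PORT_W = 560.0                  # base power of one 400 Gbps IP port, W

def total_energy(n_ports, transceiver_rates, amplifier_rates):
    """E_TG = E_IPT + E_OPT + E_OPR, Eq. (5)."""
    e_ipt = n_ports * PORT_W                               # IP ports
    e_opt = sum(ETA * tr for tr in transceiver_rates)      # Eq. (2)
    e_opr = sum(MU + EXTRA[tr] for tr in amplifier_rates)  # Eq. (4)
    return e_ipt + e_opt + e_opr

# Hypothetical example: one 400G port, two 40G and one 100G transceiver,
# two amplifiers carrying 40G traffic.
print(total_energy(1, [40, 40, 100], [40, 40]))
```

With these inputs the terms are 560 W + 302.94 W + 250 W, so lower-rate groomed traffic directly reduces the transceiver and amplifier terms.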
In existing research, only fixed traffic grooming strategies, or simple strategies relying on manual feature extraction, have been applied; a truly adaptive traffic grooming strategy has not been achieved. Meanwhile, the effectiveness of Deep Reinforcement Learning (DRL) in solving large-scale tasks has been verified.
Disclosure of Invention
To address the technical problems that processing massive Internet-of-Things data in the cloud incurs high delay and that elastic optical networks consume much energy, the invention provides a deep reinforcement learning traffic grooming method for cloud-fog elastic optical networks.
To achieve this purpose, the technical scheme of the invention is realized as follows. A deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network comprises the following steps:
Step one: for a service request r = (s, d, t), calculate the shortest path of r with a shortest-path algorithm, and convert the service path of r and the wavelength-sliced network topology into pictures; here s and d denote the source and destination nodes, and t denotes the bandwidth requirement of r;
Step two: extract the features of all pictures from step one with a convolutional neural network, classify them with a softmax classifier, and assign the service request to the corresponding wavelength according to the classification result;
Step three: if the assigned wavelength has available resources, the service request is allocated successfully; if not, all wavelengths are traversed with a first-fit method to allocate the request r, and a reward value is obtained according to the energy consumption saved;
Step four: after each service request is allocated, evaluate step three with a reinforcement learning algorithm to produce a value, update the network state of the topology, and generate the shortest-path topology picture of the next service request;
Step five: repeat steps one to four, updating the convolutional neural network from the network states, actions, reward values and values each time at least three service request allocations have been completed.
In step one, the bandwidth resources of each link from source node s to destination node d are divided into 5 parts by wavelength. When a service request arrives it may be assigned to any wavelength; only the state of the chosen wavelength changes, i.e. the service request is allocated to that wavelength and the port, transceiver, amplifier and bandwidth occupancy at the corresponding positions are updated.
The service path and the wavelength-sliced network topology are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and dots of different colors and sizes are drawn according to the occupancy of ports, transceivers and amplifiers. The picture of one wavelength of the wavelength-sliced topology is drawn in three stages: first, nodes are drawn as solid black dots at the given node coordinates; next, links are drawn in different colors according to the given connectivity and the bandwidth occupancy on all links of the current wavelength's topology; finally, ports and transceivers are represented by small dots and amplifiers by large dots, likewise drawn in different colors according to their occupancy. The topology picture of the service path is drawn in the same way.
The convolutional neural network in step two is the lightweight network MobileNetV3, which decomposes standard convolutional layers into depthwise and pointwise convolutions: the first convolutional layer has kernel size 3, stride 2 and padding 1; the second part is a stack of 15 block layers with fixed input/output channels, kernel sizes and strides; the third layer has kernel size 1 and stride 1; the fourth layer is an average pooling layer with kernel size 7; dimensionality is finally reduced by two 1×1 convolutional layers.
The features extracted by MobileNetV3 are fed into a softmax classifier to obtain a probability distribution over actions; the higher an action's probability, the more likely the corresponding wavelength is selected.
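A minimal sketch of this classification step: given the classifier's raw class scores (the logits below are hypothetical), softmax yields a probability distribution over the 5 wavelengths, and the chosen action is the most probable wavelength:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical scores for the 5 wavelength classes:
probs = softmax([1.2, 0.3, 2.5, 0.3, 0.1])
wavelength = max(range(len(probs)), key=probs.__getitem__)
print(wavelength)  # index of the most probable wavelength
```

During training the wavelength would typically be sampled from this distribution rather than taken greedily, so that less probable actions are still explored.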
The activation function of MobileNetV3 is:

H-Swish(x) = x · ReLU6(x + 3) / 6 (6)

where x is the input of the activation layer and ReLU6 is the commonly used ReLU activation clipped at 6; the last layer of MobileNetV3 has no activation function.
The available resources in step three are the idle resources in the ports, transceivers, amplifiers and bandwidth at the positions corresponding to the current service request in that wavelength's topology. The first-fit method traverses all wavelengths in order of their indices and finds the first with available resources for allocation; the reward value is calculated from the impact of the assigned wavelength on network energy consumption.
The reinforcement learning algorithm in step four is the Actor-Critic algorithm, comprising an Actor network and a Critic network that share one neural network. The Actor network is responsible for grooming the service to the right place, and the Critic network judges the quality of the action and produces a value. The network state of the topology is the network features extracted by MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of grooming each service: the fewer resources the service occupies after entering the topology network, the larger the reward.
In step four the network state of the topology is updated according to the allocation of the service request: the changed wavelength is redrawn while the topology pictures of the other wavelengths remain unchanged, and the topology picture for the next service request is drawn with the method of step one.
The convolutional neural network in step five is updated by computing the total loss:

l_v = (R_i − V(s; θ))^2 (7)

l_a = −log π(a|s; θ) · (R_i − V(s; θ)) (8)

l_t = l_v·c_v + l_a + e·c_e (9)

where R_i is the total reward value, V(s; θ) is the value function, s is the network state, θ are the network parameters, l_v is the mean squared error between the total reward and the value function, and l_a is the policy loss: the cross entropy of the policy π weighted by the difference between the total reward and the value function; e is the entropy, which evaluates the spread of action probabilities; l_t is the total loss, and c_v and c_e are the coefficients of the value loss and entropy, respectively.
The network parameters θ are updated by gradient descent.
The beneficial effects of the invention are as follows. For a static point-to-point service request, the service and the wavelength-sliced network topology are converted into pictures; picture features are extracted to assign the service to a wavelength; if that wavelength has available resources the allocation succeeds, otherwise all wavelengths are traversed until the service can be allocated. Each successful allocation earns a reward value according to the energy consumption saved, and the network is continuously updated through reinforcement learning. The invention converts each service and each wavelength of the network into pictures, using different sizes, shapes and colors to represent the occupancy of ports, transceivers and amplifiers at different positions and lines of different colors to represent link bandwidth occupancy, and adopts a convolutional neural network to automatically extract effective features of the network topology. To make traffic grooming more intelligent, the invention adopts reinforcement learning so that all services are allocated successfully with lower total energy consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a network topology diagram after conversion according to the present invention, wherein (a) is a network topology diagram of one wavelength sliced by wavelength, and (b) is a service topology diagram.
Fig. 3 is a flow chart of the core part of the algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network includes the steps of:
the method comprises the following steps: for a service request r ═ s, d, t, (s, d, t), s and d represent the source node and the destination node respectively, t represents the bandwidth requirement of the service, and the Shortest Path of the service request r is calculated by a Shortest Path algorithm (Dijkstra short Path, DSP). The traffic path and the wavelength sliced network topology are then converted into the form of a picture.
The bandwidth resources of each link in the elastic optical network are divided into 5 parts by wavelength, all with the same initial state. When a service request arrives it may be assigned to any wavelength; only the state of the chosen wavelength changes, i.e. the service is allocated to that wavelength and the port, transceiver, amplifier and bandwidth occupancy at the corresponding positions are updated. When depicting the NSFNET (National Science Foundation Network) topology, nodes are represented by black dots, differences in bandwidth usage on a link by 11 lines of different colors, and the colored dots near the nodes represent the ports, transceivers and amplifiers, as shown in fig. 2. Each randomly generated service comprises a source node, a destination node and the required bandwidth, and the shortest path between source and destination is calculated with the DSP algorithm.
The network topology for each wavelength slice and the service path are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and then dots of different colors and sizes are drawn according to the occupancy of the ports, transceivers and amplifiers.
As shown in fig. 2, in describing the NSFNET (national science foundation network) network, fig. 2(a) shows a graph of one of the wavelengths of the network topology sliced by wavelength, the nodes are first drawn with black solid dots according to the coordinates of a given network node, and then the links are drawn with different colors according to the connectivity of the given link and the occupation of bandwidth on all links in the network topology at the current wavelength. Finally, the ports and transceivers are represented by smaller dots, the amplifiers by larger dots, and likewise drawn in different colors according to different occupancy conditions. The same method is used for drawing the service topological graph.
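The actual pictures are rendered as described above; the following sketch only illustrates the idea of encoding one wavelength's state into an RGB raster. The color ramp, image size and the tiny 3-node topology are hypothetical, and each link is colored only at its midpoint for brevity:

```python
# Sketch: encode one wavelength slice of the topology as an RGB raster.
WIDTH = HEIGHT = 32

def blank_image():
    return [[(255, 255, 255) for _ in range(WIDTH)] for _ in range(HEIGHT)]

def occupancy_color(used_fraction):
    """Map link bandwidth occupancy in [0, 1] to one of 11 colors (green -> red)."""
    level = round(used_fraction * 10)          # 11 discrete levels, as in the text
    return (level * 25, 250 - level * 25, 0)

def draw_wavelength(nodes, links):
    """nodes: {id: (x, y)}; links: {(u, v): used_fraction of bandwidth}."""
    img = blank_image()
    for (u, v), frac in links.items():
        (x1, y1), (x2, y2) = nodes[u], nodes[v]
        mx, my = (x1 + x2) // 2, (y1 + y2) // 2
        img[my][mx] = occupancy_color(frac)    # link color at the midpoint only
    for x, y in nodes.values():
        img[y][x] = (0, 0, 0)                  # node as a solid black dot
    return img

img = draw_wavelength({0: (2, 2), 1: (10, 4), 2: (20, 8)},
                      {(0, 1): 0.0, (1, 2): 1.0})
print(img[2][2], img[3][6], img[6][15])
```

A real renderer would rasterize full line segments and add the small/large port, transceiver and amplifier dots, producing one such image per wavelength plus one for the service path.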
Step two: the convolutional neural network is used for extracting the characteristics of all pictures, and the softmax classifier is used for classifying the pictures to determine the wavelength to which the service request is allocated.
In step two the convolutional neural network is the lightweight MobileNetV3, which decomposes standard convolutions into depthwise and pointwise convolutions, greatly increasing speed. As shown in fig. 3, the topology pictures of the 5 wavelengths and the service topology picture are fed into MobileNetV3. Its first convolutional layer has kernel size 3, stride 2 and padding 1, followed by 15 block layers with fixed input/output channels, kernel sizes and strides. The output dimension is then 7×7×160; the next layer has kernel size 1 and stride 1, followed by an average pooling layer with kernel size 7 and a final dimensionality reduction by two 1×1 convolutional layers. Note that the last layer has no activation function, because an activation after the dimensionality reduction would destroy the extracted features. The extracted features are then fed into a softmax classifier to obtain a probability distribution over actions; the higher an action's probability, the more likely the corresponding wavelength is selected.
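The depthwise/pointwise decomposition that MobileNetV3 relies on can be illustrated independently of any framework. This sketch applies a depthwise 3×3 convolution (one kernel per channel) followed by a pointwise 1×1 convolution (mixing channels); all shapes and weights are hypothetical toy values, far smaller than the real network:

```python
def depthwise_conv3x3(x, kernels):
    """x: [C][H][W]; kernels: one 3x3 kernel per channel. 'Valid' padding, stride 1."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for c in range(C):
        k = kernels[c]
        ch = [[sum(x[c][i + di][j + dj] * k[di][dj]
                   for di in range(3) for dj in range(3))
               for j in range(W - 2)] for i in range(H - 2)]
        out.append(ch)
    return out

def pointwise_conv1x1(x, weights):
    """weights: [C_out][C_in]; mixes channels at each spatial position."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for o in range(len(weights))]

# Toy input: 2 channels of 4x4 ones; all-ones kernels. Each depthwise output
# element is 9, and the single pointwise filter sums both channels to 18.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
ones3 = [[1.0] * 3 for _ in range(3)]
y = pointwise_conv1x1(depthwise_conv3x3(x, [ones3, ones3]), [[1.0, 1.0]])
print(y)  # [[[18.0, 18.0], [18.0, 18.0]]]
```

The point of the decomposition is cost: a standard 3×3 convolution couples all input and output channels at once, while the depthwise+pointwise pair does far fewer multiplications for the same receptive field, which is why the paper's method can classify six images per service request quickly.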
The H-Swish function

H-Swish(x) = x · ReLU6(x + 3) / 6 (6)

is used as the activation function, which improves network accuracy compared with ReLU. Here x is the input of the activation layer; H-Swish, an improvement over ReLU adopted here, is the activation function of MobileNetV3.
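Equation (6) translates directly into code; a quick sketch showing that H-Swish vanishes for x ≤ −3 and becomes the identity for x ≥ 3:

```python
def relu6(x):
    """ReLU clipped at 6, the building block of H-Swish."""
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    """H-Swish(x) = x * ReLU6(x + 3) / 6, Eq. (6): a cheap piecewise Swish."""
    return x * relu6(x + 3.0) / 6.0

for v in (-4.0, 0.0, 1.0, 4.0):
    print(v, h_swish(v))
```

Because it uses only clipping and multiplication instead of a sigmoid, H-Swish is inexpensive on the hardware where the lightweight network is meant to run.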
Step three: if the allocated wavelength has available resources, that is, it is detected that there are idle resources in the port, transceiver, amplifier and bandwidth at the position corresponding to the current service request in the network topology of the wavelength, the service is successfully allocated, and if there are no available resources, the service request r is allocated by traversing all wavelengths according to a First-time adaptation (FF) method. Finally, whatever the method by which a service request r is distributed, a reward value is obtained in accordance with the reduced energy consumption.
For the wavelength chosen by the classifier, if the ports, transceivers, amplifiers and bandwidth on that wavelength have idle resources, the current request r can be allocated, i.e. the corresponding port, transceiver, amplifier and bandwidth resources at r's positions on that wavelength become occupied. If it cannot satisfy the request, the FF method is used: all wavelengths are traversed in order of their indices and the first one with available resources is chosen. Finally, a reward value (Reward) is obtained from table 1 according to the impact of the assigned wavelength on network energy consumption.
TABLE 1. Correspondence table for calculating the reward value (reproduced as an image in the original publication).
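The First-Fit fallback of step three can be sketched as follows; the `has_free_resources` predicate is a hypothetical stand-in for the port/transceiver/amplifier/bandwidth check described above:

```python
def first_fit(wavelengths, has_free_resources):
    """Traverse wavelengths in index order and return the index of the first
    one whose state has free resources for the request, or None if all are full."""
    for idx, state in enumerate(wavelengths):
        if has_free_resources(state):
            return idx
    return None

# Hypothetical 5-wavelength state: True = free resources along the request's path.
states = [False, False, True, True, False]
print(first_fit(states, lambda s: s))  # 2
```

In the method this routine only runs when the CNN's chosen wavelength is full, so the learned policy and the deterministic fallback together guarantee every request is placed whenever any wavelength can carry it.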
Step four: and after the distribution of each service is finished, evaluating the behavior in the step three by using a reinforcement learning algorithm to generate a value, updating the state of the topological network, and generating the shortest path topological graph of the next service request.
The reinforcement learning algorithm is the Actor-Critic (AC) algorithm, comprising an Actor network and a Critic network that share one neural network. The Actor network is responsible for grooming the service to the right place so as to reduce network energy consumption, and the Critic network judges the quality of the action. The network state is the features extracted by MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of grooming each service: the fewer resources the service occupies after entering the topology network, the larger the reward; otherwise the reward is smaller or even a penalty. The value is the Critic network's evaluation of the corresponding action.
After each service request is allocated, the currently selected action is evaluated: the extracted features are fed into the Critic network to obtain a value in preparation for the subsequent network update. The state of the topology network is updated according to the service allocation, i.e. the changed wavelength is redrawn while the topology pictures of the other wavelengths remain unchanged, and the topology picture of the next service request is drawn with the same method as in step one.
Step five: and repeating the steps one-four, and updating the neural network according to the network state, the action, the reward value and the value after the five service request distribution is completed.
The specific update method is as follows. The total loss is computed from equations (7), (8) and (9) to update the network:

l_v = (R_i − V(s; θ))^2 (7)

l_a = −log π(a|s; θ) · (R_i − V(s; θ)) (8)

l_t = l_v·c_v + l_a + e·c_e (9)

where R_i is the total reward value, V(s; θ) is the value function, s is the network state, θ are the network parameters, l_v is the mean squared error between the total reward and the value function, and l_a is the policy loss: the cross entropy of the policy π weighted by the difference between the total reward and the value function. The network parameters θ are finally updated by gradient descent. The entropy e is introduced to evaluate the spread of action probabilities; when it converges to a certain value, a good strategy has been learned, so that all services can be groomed efficiently and energy-effectively. l_t is the total loss; c_v and c_e, the coefficients of the value loss and entropy, default to 0.5 and 0.01.
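Under the formulas above, the total loss for a single transition can be computed as in this sketch. The reward, value and probability numbers are hypothetical; c_v = 0.5 and c_e = 0.01 are the defaults from the text:

```python
import math

C_V, C_E = 0.5, 0.01  # default value-loss and entropy coefficients

def a2c_loss(total_reward, value, probs, action):
    """Total loss l_t = l_v * c_v + l_a + e * c_e for one transition, Eqs. (7)-(9)."""
    advantage = total_reward - value
    l_v = advantage ** 2                                   # Eq. (7): value loss
    l_a = -math.log(probs[action]) * advantage             # Eq. (8): policy loss
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return l_v * C_V + l_a + entropy * C_E                 # Eq. (9)

# Hypothetical transition: reward-to-go 1.5, predicted value 1.0,
# 5-wavelength policy, wavelength 2 chosen.
print(a2c_loss(1.5, 1.0, [0.1, 0.1, 0.6, 0.1, 0.1], 2))
```

Note the sign conventions: a positive advantage makes −log π(a|s) shrink as the chosen action's probability grows, and the entropy term keeps the policy from collapsing too early; in a full implementation l_t would be minimized by gradient descent over θ.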
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network, characterized by comprising the following steps:
Step one: for a service request r = (s, d, t), calculate the shortest path of r with a shortest-path algorithm, and convert the service path of r and the wavelength-sliced network topology into pictures; where s and d denote the source and destination nodes, and t denotes the bandwidth requirement of r;
the service path and the wavelength-sliced network topology are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and dots of different colors and sizes are drawn according to the occupancy of ports, transceivers and amplifiers; the picture of one wavelength of the wavelength-sliced topology is drawn by first drawing nodes as solid black dots at the given node coordinates, then drawing links in different colors according to the given connectivity and the bandwidth occupancy on all links of the current wavelength's topology, and finally representing ports and transceivers by small dots and amplifiers by large dots, likewise drawn in different colors according to their occupancy; the topology picture of the service path is drawn in the same way;
step two: extracting features of all the pictures in step one with a convolutional neural network, classifying them with a softmax classifier, and allocating the service request to the corresponding wavelength according to the classification result;
step three: if the allocated wavelength has available resources, the service request is allocated successfully; if no resources are available, traversing all the wavelengths according to a first-fit method to allocate the service request r, and obtaining a reward value according to the energy consumption saved;
step four: after each service request has been allocated, evaluating step three with a reinforcement learning algorithm to produce a value, updating the network state of the topology, and generating the shortest-path topology picture for the next service request;
step five: repeating steps one to four, and updating the convolutional neural network according to the network state, the action, the reward value and the value each time the allocation of at least three service requests has been completed.
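The image encoding described in step one can be sketched as follows. This is an illustrative sketch only: the function name, the color scheme (red channel for occupancy) and the 64×64 resolution are assumptions, not the patent's drawing rules; it shows how one wavelength slice's nodes and links might be rasterized into an RGB array for the convolutional network.

```python
import numpy as np

def render_wavelength_topology(coords, links, size=64):
    """Rasterize one wavelength slice of the topology into an RGB image.

    coords: dict node -> (x, y) with x, y in [0, 1]
    links:  list of (u, v, occupancy) with occupancy in [0, 1]
    """
    img = np.full((size, size, 3), 255, dtype=np.uint8)  # white canvas

    def px(p):  # map unit coordinates to pixel indices
        return int(p[0] * (size - 1)), int(p[1] * (size - 1))

    for u, v, occ in links:  # draw each link as a sampled line segment
        (x0, y0), (x1, y1) = px(coords[u]), px(coords[v])
        for t in np.linspace(0.0, 1.0, 2 * size):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            # red channel encodes bandwidth occupancy on this wavelength
            img[y, x] = (int(255 * occ), 0, int(255 * (1 - occ)))

    for n, p in coords.items():  # nodes drawn last, as black solid dots
        x, y = px(p)
        img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 0

    return img
```

One such array would be produced per wavelength slice plus one for the service path, giving the picture set that step two feeds to the feature extractor.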
2. The method according to claim 1, wherein in step one the bandwidth resources of each link from the source node s to the destination node d are divided into 5 parts by wavelength; when a service request arrives, any one of the wavelengths is selected for allocation and only the state of the current wavelength is changed, that is, the service request is allocated to the current wavelength, and the occupancy of the ports, transceivers, amplifiers and bandwidth at the corresponding positions is changed.
3. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 1 or 2, wherein the convolutional neural network in step two is the lightweight convolutional neural network MobileNetV3, which decomposes the standard convolution layer into a depthwise convolution and a pointwise convolution: the first convolution layer has a convolution kernel of 3, a stride of 2 and a padding of 1; the second part consists of 15 block layers whose input and output channels, convolution kernels and strides are specified; the third layer has a convolution kernel of 1 and a stride of 1; the fourth layer is an average pooling layer with a kernel of 7; dimension reduction is then performed by two 1×1 convolution layers.
4. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 3, wherein the features extracted by the lightweight convolutional neural network MobileNetV3 are input into a softmax classifier to obtain a probability distribution over actions; the higher the probability of an action, the more likely the wavelength corresponding to that action is to be selected.
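The classification step of claim 4 amounts to a softmax over per-wavelength logits. A minimal sketch (the final-layer matrix `weights` stands in for the classifier's parameters and is an assumption, not the patent's trained model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pick_wavelength(features, weights):
    """Map extracted features to a probability distribution over the
    5 wavelengths and return the most probable one."""
    logits = features @ weights            # one logit per wavelength
    probs = softmax(logits)
    return int(np.argmax(probs)), probs
```

During training the action would typically be sampled from `probs` rather than taken greedily, so lower-probability wavelengths are still explored.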
5. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 3, wherein the activation function of the lightweight convolutional neural network MobileNetV3 is:
h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(z) = min(max(z, 0), 6)
where x represents the input of the activation function layer, and ReLU() is a commonly used activation function; the last layer of the lightweight convolutional neural network MobileNetV3 has no activation function.
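The MobileNetV3 hard-swish activation referred to in claim 5 can be written out directly; `relu6` is the standard ReLU clipped at 6:

```python
import numpy as np

def relu6(x):
    """ReLU clipped at 6: min(max(x, 0), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    """Hard-swish, the MobileNetV3 activation: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0
```

For x ≥ 3 the function passes its input through unchanged, and for x ≤ −3 it outputs zero, approximating the smooth swish at much lower cost.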
6. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to any one of claims 1, 4 or 5, wherein the available resources in step three are the idle ports, transceivers, amplifiers and bandwidth at the positions corresponding to the current service request in the network topology of a wavelength; the first-fit method traverses all the wavelengths in order of their wavelength indices and allocates the request to the first wavelength that has available resources; and the reward value is calculated according to the influence of the allocated wavelength on the network energy consumption.
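The first-fit traversal of claim 6 can be sketched as below; the resource-dictionary layout (free ports, transceivers, amplifiers and bandwidth per wavelength) is an illustrative assumption, not the patent's data structure:

```python
def first_fit(wavelengths, demand):
    """Scan wavelengths in index order and return the index of the first
    one whose free resources cover the request, or None if blocked.

    wavelengths: list of dicts, e.g.
        {"ports": 2, "transceivers": 2, "amplifiers": 1, "bandwidth": 40}
    demand: dict with the same keys giving what the request needs.
    """
    for idx, free in enumerate(wavelengths):
        if all(free.get(k, 0) >= v for k, v in demand.items()):
            return idx      # first wavelength that fits the request
    return None             # no wavelength can carry the request
```

In the method above this fallback only runs when the wavelength chosen by the classifier lacks the needed resources.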
7. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 6, wherein the reinforcement learning algorithm in step four is the Actor-Critic algorithm, which comprises an Actor network and a Critic network that share one neural network; the Actor network is responsible for grooming the traffic to the right place, and the Critic network evaluates the quality of an action to obtain a value; the network state of the topology is the network features extracted by the lightweight convolutional neural network MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of each traffic grooming: the fewer resources a service occupies after entering the topology network, the larger the reward value.
8. The method according to claim 7, wherein in step four the network state of the topology is updated according to the allocation of the service request, that is, only the changed wavelength is redrawn while the topologies of the other wavelengths remain unchanged, and the topology picture of the next service request is drawn by the method in step one.
9. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to any one of claims 1, 7 or 8, wherein the convolutional neural network in step five is updated by calculating the total loss:
l_v = (R_i − V(s, θ))²
l_a = −log π(a_i | s_i; θ) · (R_i − V(s, θ))
l_t = l_v · c_v + l_a + e · c_e
where R_i represents the total reward value; V(s, θ) represents the value function, with s the network state and θ the network parameters; l_v is the mean square error between the total reward value and the value function; l_a is the cross entropy between the policy function and the difference of the total reward value and the value function; e is the entropy, which evaluates the probability spread of the actions; l_t denotes the total loss; and c_v and c_e are the coefficients of the value loss and the entropy, respectively;
and updating the network parameter theta by a gradient descent method.
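The total-loss update of claim 9 can be sketched with the standard Actor-Critic forms of the three terms. Since the patent publishes l_v, l_a and e only as image placeholders, the squared-error, policy-gradient and entropy expressions below follow the common A2C conventions, and the default coefficients c_v and c_e are assumptions:

```python
import numpy as np

def actor_critic_loss(R, V, pi, action, c_v=0.5, c_e=0.01):
    """Total loss l_t = l_v * c_v + l_a + e * c_e.

    R:      total (discounted) reward for the step
    V:      critic's value estimate V(s, theta)
    pi:     action probabilities from the softmax classifier
    action: index of the wavelength that was chosen
    """
    advantage = R - V
    l_v = advantage ** 2                      # value (critic) loss
    l_a = -np.log(pi[action]) * advantage     # policy (actor) loss
    e = -np.sum(pi * np.log(pi))              # entropy of the policy
    return l_v * c_v + l_a + e * c_e
```

Gradient descent on this scalar with respect to θ then updates the shared Actor-Critic network, as the claim states.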
CN202010016994.1A 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network Active CN111246320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016994.1A CN111246320B (en) 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network

Publications (2)

Publication Number Publication Date
CN111246320A CN111246320A (en) 2020-06-05
CN111246320B (en) 2021-09-07

Family

ID=70866541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016994.1A Active CN111246320B (en) 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network

Country Status (1)

Country Link
CN (1) CN111246320B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383846B (en) * 2020-11-13 2022-06-21 国网河南省电力公司信息通信公司 Cloud-fog elastic optical network-oriented spectrum resource allocation method for advance reservation request
CN114584865A (en) * 2020-11-18 2022-06-03 中兴通讯股份有限公司 Single service resource allocation method, device, computer equipment and medium
CN114584871B (en) * 2022-04-28 2022-08-05 华南师范大学 Spectrum allocation method, device, storage medium and equipment of elastic optical network

Citations (1)

CN104486094A (en) * 2014-12-15 2015-04-01 西安电子科技大学 Multicast traffic grooming method based on physical topology light-tree correction

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN100361445C (en) * 2004-12-17 2008-01-09 电子科技大学 Integrated service leading method for WDM optical network
US9942128B2 (en) * 2013-11-29 2018-04-10 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for elastic optical networking
US10574381B2 (en) * 2015-03-30 2020-02-25 British Telecommunications Public Limited Company Optical network design and routing
CN109547876A (en) * 2018-12-29 2019-03-29 重庆邮电大学 A kind of adaptive guard rank method under elastic optical network twin failure
CN109905784B (en) * 2019-01-16 2021-10-15 国家电网有限公司 Service reconstruction method and equipment for optical network wavelength allocation
CN109617809A (en) * 2019-01-21 2019-04-12 Zhongtian Broadband Technology Co., Ltd. Regenerator selection, placement and traffic grooming method in elastic optical network
CN110633790B (en) * 2019-09-19 2022-04-08 郑州大学 Method and system for measuring residual oil quantity of airplane oil tank based on convolutional neural network


Similar Documents

Publication Publication Date Title
CN111246320B (en) Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network
CN113193984B (en) Air-space-ground integrated network resource mapping method and system
CN109068391B (en) Internet of vehicles communication optimization algorithm based on edge calculation and Actor-Critic algorithm
CN110753319B (en) Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN103428805B (en) The virtual mapping method of a kind of wireless network based on link anti-interference
CN111953547B (en) Heterogeneous base station overlapping grouping and resource allocation method and device based on service
CN111585811B (en) Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN115665227B (en) Universal heterogeneous integrated computing network resource intelligent adaptation network architecture and method
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
Yu et al. A deep learning based RSA strategy for elastic optical networks
CN114828095A (en) Efficient data perception layered federated learning method based on task unloading
CN113849313A (en) Energy-saving method for deploying computing task chain in cloud-edge elastic optical network
CN113676407A (en) Deep learning driven flow optimization mechanism of communication network
CN111245701B (en) Link priority virtual network mapping method based on maximum weighted matching
CN108833486B (en) Hybrid dynamic task scheduling method for complex vehicle-mounted fog computing system environment
CN116112981A (en) Unmanned aerial vehicle task unloading method based on edge calculation
CN115499365A (en) Route optimization method, device, equipment and medium
CN114745386B (en) Neural network segmentation and unloading method in multi-user edge intelligent scene
CN116112934A (en) End-to-end network slice resource allocation method based on machine learning
CN113676917B (en) Game theory-based energy consumption optimization method for unmanned aerial vehicle hierarchical mobile edge computing network
CN115190543A (en) Edge side live video transmission scheduling method in dense wireless network
Zhu et al. Deep reinforced energy efficient traffic grooming in fog-cloud elastic optical networks
Wang et al. LRA-3C: Learning based resource allocation for communication-computing-caching systems
CN110138670B (en) Load migration method based on dynamic path
CN114051272A (en) Intelligent routing method for dynamic topological network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant