CN111246320B - Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network - Google Patents

Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network

Info

Publication number
CN111246320B
CN111246320B (granted publication of application CN202010016994.1A)
Authority
CN
China
Prior art keywords
network
service request
wavelength
topology
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010016994.1A
Other languages
Chinese (zh)
Other versions
CN111246320A (en)
Inventor
朱睿杰
李世华
李亚飞
吕培
徐明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202010016994.1A priority Critical patent/CN111246320B/en
Publication of CN111246320A publication Critical patent/CN111246320A/en
Application granted granted Critical
Publication of CN111246320B publication Critical patent/CN111246320B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0005Switch and router aspects
    • H04Q2011/0007Construction
    • H04Q2011/0011Construction using wavelength conversion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/0073Provisions for forwarding or routing, e.g. lookup tables
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/0075Wavelength grouping or hierarchical aspects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04QSELECTING
    • H04Q11/00Selecting arrangements for multiplex systems
    • H04Q11/0001Selecting arrangements for multiplex systems using optical switching
    • H04Q11/0062Network aspects
    • H04Q2011/009Topology aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Optical Communication System (AREA)

Abstract

The invention provides a deep reinforcement learning traffic grooming method for cloud-fog elastic optical networks, comprising the following steps: compute the shortest path of a service request with a shortest-path algorithm; convert the service path and the wavelength-sliced network topology into pictures; extract features from all pictures with a convolutional neural network, classify them with a softmax classifier, and assign the service request to the corresponding wavelength; if the assigned wavelength has available resources, the allocation succeeds, otherwise all wavelengths are traversed with a first-fit method to allocate the request; evaluate the allocation with a reinforcement learning algorithm, update the network state of the topology, and generate the shortest-path topology picture of the next service request; update the convolutional neural network each time at least three service request allocations have been completed. By continuously updating the network through reinforcement learning, the invention lets all services make full use of the ports, transceivers and amplifiers in the network, thereby reducing its total energy consumption.

Description

Deep reinforcement learning traffic grooming method in cloud-fog elastic optical network
Technical Field
The invention relates to the technical field of elastic optical networks and cloud-fog communication, in particular to a deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network.
Background
Cloud computing transports all data to centralized data centers for analysis, storage and processing, and excels at providing a wide range of services. With the explosive growth of global Internet-of-Things devices, however, the massive data they generate are ill-suited to cloud processing, and the redundant transmission causes excessive delay, posing a serious challenge to current communication networks. To meet the Internet of Things' demand for large amounts of low-delay computing and to compensate for the shortcomings of traditional cloud computing, fog computing has emerged. Fog computing deploys many fog nodes and concentrates data processing and applications in devices at the network edge, yielding faster processing and more efficient results; it is therefore the best candidate for processing such data, offering low delay, high security, better user experience and higher power efficiency.
An Elastic Optical Network (EON) is a promising network infrastructure for communication between fog nodes and cloud data centers: it abstracts the resources of the underlying physical network into a resource pool for cloud-fog computing, performs resource allocation and management on a virtual network, and can provide flexible and efficient services. To exploit this flexibility while fully utilizing the underlying physical resources, traffic grooming was developed: multiple fine-grained IP flows can be flexibly aggregated into the optical layer over existing lightpaths, and spectrum is allocated flexibly according to the requested bandwidth. Especially with the development of physical-layer devices such as sliceable optical transponders and sliceable optical amplifiers, traffic grooming can achieve higher power efficiency.
Traffic grooming directs different bandwidth requests onto the same wavelength to save resources and energy. The total energy consumption consists mainly of three parts, IP ports, transceivers and amplifiers, which are modeled as follows:
IP port: the basic energy consumption of a 400 Gbps port is taken to be 560 W; the total port energy consumption is denoted E_IPT (W).
Optical transceiver: the energy consumption depends on the line rate of the service request; for each unit of line rate the consumption is 1.683 W (parameter η = 1.683 W/Gbps). The calculation formulas are:

E_OPT^i = η · TR_i (1)

E_OPT = Σ_{i=1}^{N_OPT} E_OPT^i (2)

where TR denotes the transmission rate, N_OPT is the number of optical transceivers, E_OPT^i is the energy consumption of the i-th transceiver, and E_OPT is the total transceiver energy consumption. Line rates of 40 Gbps and 100 Gbps are considered in the present invention.
Optical amplifier: the basic power consumption of each optical amplifier is μ = 100 W, and the additional power depends on the line rate of the service request: 25 W for 40 Gbps and 50 W for 100 Gbps. The energy consumption of the amplifiers is:

E_OPR^i = μ + θ_i (3)

E_OPR = Σ_{i=1}^{N_OPR} E_OPR^i (4)

where θ is the additional energy consumption, N_OPR is the number of optical amplifiers, E_OPR^i is the power consumption of the i-th amplifier, and E_OPR is the total amplifier energy consumption.
Therefore, the total energy consumption is: E_TG(W) = E_IPT(W) + E_OPT(W) + E_OPR(W) (5).
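As a concrete illustration, the three-part energy model above can be sketched in a few lines of Python. The constants (560 W per 400 Gbps port, η = 1.683 W/Gbps, μ = 100 W, and the 25 W/50 W extras) come from the model; the example port and device counts are hypothetical:

```python
# Sketch of the traffic-grooming energy model, Eqs. (1)-(5).
ETA = 1.683                     # W per Gbps of transceiver line rate
MU = 100.0                      # base power of one optical amplifier, W
EXTRA = {40: 25.0, 100: 50.0}   # extra amplifier power by line rate, W
PORT_W = 560.0                  # base power of one 400 Gbps IP port, W

def total_energy(n_ports, transceiver_rates, amplifier_rates):
    """E_TG = E_IPT + E_OPT + E_OPR, Eq. (5)."""
    e_ipt = n_ports * PORT_W                               # IP ports
    e_opt = sum(ETA * tr for tr in transceiver_rates)      # Eq. (2)
    e_opr = sum(MU + EXTRA[tr] for tr in amplifier_rates)  # Eq. (4)
    return e_ipt + e_opt + e_opr

# Hypothetical example: one 400G port, two 40G and one 100G transceiver,
# two amplifiers carrying 40G traffic.
print(total_energy(1, [40, 40, 100], [40, 40]))
```

With these inputs the terms are 560 W + 302.94 W + 250 W, so lower-rate groomed traffic directly reduces the transceiver and amplifier terms.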
In existing research, only fixed traffic grooming strategies, or simple strategies relying on manual feature extraction, have been applied; a truly adaptive traffic grooming strategy has not been achieved. Meanwhile, the effectiveness of Deep Reinforcement Learning (DRL) in solving large-scale tasks has been verified.
Disclosure of Invention
To address the technical problems that processing massive Internet-of-Things data in the cloud incurs high delay and that elastic optical networks consume much energy, the invention provides a deep reinforcement learning traffic grooming method for cloud-fog elastic optical networks.
To achieve this purpose, the technical scheme of the invention is realized as follows. A deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network comprises the following steps:
Step one: for a service request r = (s, d, t), calculate the shortest path of r with a shortest-path algorithm, and convert the service path of r and the wavelength-sliced network topology into pictures; here s and d denote the source and destination nodes, and t denotes the bandwidth requirement of r;
Step two: extract the features of all pictures from step one with a convolutional neural network, classify them with a softmax classifier, and assign the service request to the corresponding wavelength according to the classification result;
Step three: if the assigned wavelength has available resources, the service request is allocated successfully; if not, all wavelengths are traversed with a first-fit method to allocate the request r, and a reward value is obtained according to the energy consumption saved;
Step four: after each service request is allocated, evaluate step three with a reinforcement learning algorithm to produce a value, update the network state of the topology, and generate the shortest-path topology picture of the next service request;
Step five: repeat steps one to four, updating the convolutional neural network from the network states, actions, reward values and values each time at least three service request allocations have been completed.
In step one, the bandwidth resources of each link from source node s to destination node d are divided into 5 parts by wavelength. When a service request arrives it may be assigned to any wavelength; only the state of the chosen wavelength changes, i.e. the service request is allocated to that wavelength and the port, transceiver, amplifier and bandwidth occupancy at the corresponding positions are updated.
The service path and the wavelength-sliced network topology are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and dots of different colors and sizes are drawn according to the occupancy of ports, transceivers and amplifiers. The picture of one wavelength of the wavelength-sliced topology is drawn in three stages: first, nodes are drawn as solid black dots at the given node coordinates; next, links are drawn in different colors according to the given connectivity and the bandwidth occupancy on all links of the current wavelength's topology; finally, ports and transceivers are represented by small dots and amplifiers by large dots, likewise drawn in different colors according to their occupancy. The topology picture of the service path is drawn in the same way.
The convolutional neural network in step two is the lightweight network MobileNetV3, which decomposes standard convolutional layers into depthwise and pointwise convolutions: the first convolutional layer has kernel size 3, stride 2 and padding 1; the second part is a stack of 15 block layers with fixed input/output channels, kernel sizes and strides; the third layer has kernel size 1 and stride 1; the fourth layer is an average pooling layer with kernel size 7; dimensionality is finally reduced by two 1×1 convolutional layers.
The features extracted by MobileNetV3 are fed into a softmax classifier to obtain a probability distribution over actions; the higher an action's probability, the more likely the corresponding wavelength is selected.
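A minimal sketch of this classification step: given the classifier's raw class scores (the logits below are hypothetical), softmax yields a probability distribution over the 5 wavelengths, and the chosen action is the most probable wavelength:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical scores for the 5 wavelength classes:
probs = softmax([1.2, 0.3, 2.5, 0.3, 0.1])
wavelength = max(range(len(probs)), key=probs.__getitem__)
print(wavelength)  # index of the most probable wavelength
```

During training the wavelength would typically be sampled from this distribution rather than taken greedily, so that less probable actions are still explored.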
The activation function of MobileNetV3 is:

H-Swish(x) = x · ReLU6(x + 3) / 6 (6)

where x is the input of the activation layer and ReLU6 is the commonly used ReLU activation clipped at 6; the last layer of MobileNetV3 has no activation function.
The available resources in step three are the idle resources in the ports, transceivers, amplifiers and bandwidth at the positions corresponding to the current service request in that wavelength's topology. The first-fit method traverses all wavelengths in order of their indices and finds the first with available resources for allocation; the reward value is calculated from the impact of the assigned wavelength on network energy consumption.
The reinforcement learning algorithm in step four is the Actor-Critic algorithm, comprising an Actor network and a Critic network that share one neural network. The Actor network is responsible for grooming the service to the right place, and the Critic network judges the quality of the action and produces a value. The network state of the topology is the network features extracted by MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of grooming each service: the fewer resources the service occupies after entering the topology network, the larger the reward.
In step four the network state of the topology is updated according to the allocation of the service request: the changed wavelength is redrawn while the topology pictures of the other wavelengths remain unchanged, and the topology picture for the next service request is drawn with the method of step one.
The convolutional neural network in step five is updated by computing the total loss:

l_v = (R_i − V(s; θ))^2 (7)

l_a = −log π(a|s; θ) · (R_i − V(s; θ)) (8)

l_t = l_v·c_v + l_a + e·c_e (9)

where R_i is the total reward value, V(s; θ) is the value function, s is the network state, θ are the network parameters, l_v is the mean squared error between the total reward and the value function, and l_a is the policy loss: the cross entropy of the policy π weighted by the difference between the total reward and the value function; e is the entropy, which evaluates the spread of action probabilities; l_t is the total loss, and c_v and c_e are the coefficients of the value loss and entropy, respectively.
The network parameters θ are updated by gradient descent.
The beneficial effects of the invention are as follows. For a static point-to-point service request, the service and the wavelength-sliced network topology are converted into pictures; picture features are extracted to assign the service to a wavelength; if that wavelength has available resources the allocation succeeds, otherwise all wavelengths are traversed until the service can be allocated. Each successful allocation earns a reward value according to the energy consumption saved, and the network is continuously updated through reinforcement learning. The invention converts each service and each wavelength of the network into pictures, using different sizes, shapes and colors to represent the occupancy of ports, transceivers and amplifiers at different positions and lines of different colors to represent link bandwidth occupancy, and adopts a convolutional neural network to automatically extract effective features of the network topology. To make traffic grooming more intelligent, the invention adopts reinforcement learning so that all services are allocated successfully with lower total energy consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a network topology diagram after conversion according to the present invention, wherein (a) is a network topology diagram of one wavelength sliced by wavelength, and (b) is a service topology diagram.
Fig. 3 is a flow chart of the core part of the algorithm.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network includes the steps of:
the method comprises the following steps: for a service request r ═ s, d, t, (s, d, t), s and d represent the source node and the destination node respectively, t represents the bandwidth requirement of the service, and the Shortest Path of the service request r is calculated by a Shortest Path algorithm (Dijkstra short Path, DSP). The traffic path and the wavelength sliced network topology are then converted into the form of a picture.
The bandwidth resources of each link in the elastic optical network are divided into 5 parts by wavelength, all with the same initial state. When a service request arrives it may be assigned to any wavelength; only the state of the chosen wavelength changes, i.e. the service is allocated to that wavelength and the port, transceiver, amplifier and bandwidth occupancy at the corresponding positions are updated. When depicting the NSFNET (National Science Foundation Network) topology, nodes are represented by black dots, differences in bandwidth usage on a link by 11 lines of different colors, and the colored dots near the nodes represent the ports, transceivers and amplifiers, as shown in fig. 2. Each randomly generated service comprises a source node, a destination node and the required bandwidth, and the shortest path between source and destination is calculated with the DSP algorithm.
The network topology for each wavelength slice and the service path are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and then dots of different colors and sizes are drawn according to the occupancy of the ports, transceivers and amplifiers.
As shown in fig. 2, in describing the NSFNET (national science foundation network) network, fig. 2(a) shows a graph of one of the wavelengths of the network topology sliced by wavelength, the nodes are first drawn with black solid dots according to the coordinates of a given network node, and then the links are drawn with different colors according to the connectivity of the given link and the occupation of bandwidth on all links in the network topology at the current wavelength. Finally, the ports and transceivers are represented by smaller dots, the amplifiers by larger dots, and likewise drawn in different colors according to different occupancy conditions. The same method is used for drawing the service topological graph.
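The actual pictures are rendered as described above; the following sketch only illustrates the idea of encoding one wavelength's state into an RGB raster. The color ramp, image size and the tiny 3-node topology are hypothetical, and each link is colored only at its midpoint for brevity:

```python
# Sketch: encode one wavelength slice of the topology as an RGB raster.
WIDTH = HEIGHT = 32

def blank_image():
    return [[(255, 255, 255) for _ in range(WIDTH)] for _ in range(HEIGHT)]

def occupancy_color(used_fraction):
    """Map link bandwidth occupancy in [0, 1] to one of 11 colors (green -> red)."""
    level = round(used_fraction * 10)          # 11 discrete levels, as in the text
    return (level * 25, 250 - level * 25, 0)

def draw_wavelength(nodes, links):
    """nodes: {id: (x, y)}; links: {(u, v): used_fraction of bandwidth}."""
    img = blank_image()
    for (u, v), frac in links.items():
        (x1, y1), (x2, y2) = nodes[u], nodes[v]
        mx, my = (x1 + x2) // 2, (y1 + y2) // 2
        img[my][mx] = occupancy_color(frac)    # link color at the midpoint only
    for x, y in nodes.values():
        img[y][x] = (0, 0, 0)                  # node as a solid black dot
    return img

img = draw_wavelength({0: (2, 2), 1: (10, 4), 2: (20, 8)},
                      {(0, 1): 0.0, (1, 2): 1.0})
print(img[2][2], img[3][6], img[6][15])
```

A real renderer would rasterize full line segments and add the small/large port, transceiver and amplifier dots, producing one such image per wavelength plus one for the service path.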
Step two: the convolutional neural network is used for extracting the characteristics of all pictures, and the softmax classifier is used for classifying the pictures to determine the wavelength to which the service request is allocated.
In step two the convolutional neural network is the lightweight MobileNetV3, which decomposes standard convolutions into depthwise and pointwise convolutions, greatly increasing speed. As shown in fig. 3, the topology pictures of the 5 wavelengths and the service topology picture are fed into MobileNetV3. Its first convolutional layer has kernel size 3, stride 2 and padding 1, followed by 15 block layers with fixed input/output channels, kernel sizes and strides. The output dimension is then 7×7×160; the next layer has kernel size 1 and stride 1, followed by an average pooling layer with kernel size 7 and a final dimensionality reduction by two 1×1 convolutional layers. Note that the last layer has no activation function, because an activation after the dimensionality reduction would destroy the extracted features. The extracted features are then fed into a softmax classifier to obtain a probability distribution over actions; the higher an action's probability, the more likely the corresponding wavelength is selected.
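The depthwise/pointwise decomposition that MobileNetV3 relies on can be illustrated independently of any framework. This sketch applies a depthwise 3×3 convolution (one kernel per channel) followed by a pointwise 1×1 convolution (mixing channels); all shapes and weights are hypothetical toy values, far smaller than the real network:

```python
def depthwise_conv3x3(x, kernels):
    """x: [C][H][W]; kernels: one 3x3 kernel per channel. 'Valid' padding, stride 1."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for c in range(C):
        k = kernels[c]
        ch = [[sum(x[c][i + di][j + dj] * k[di][dj]
                   for di in range(3) for dj in range(3))
               for j in range(W - 2)] for i in range(H - 2)]
        out.append(ch)
    return out

def pointwise_conv1x1(x, weights):
    """weights: [C_out][C_in]; mixes channels at each spatial position."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)] for o in range(len(weights))]

# Toy input: 2 channels of 4x4 ones; all-ones kernels. Each depthwise output
# element is 9, and the single pointwise filter sums both channels to 18.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
ones3 = [[1.0] * 3 for _ in range(3)]
y = pointwise_conv1x1(depthwise_conv3x3(x, [ones3, ones3]), [[1.0, 1.0]])
print(y)  # [[[18.0, 18.0], [18.0, 18.0]]]
```

The point of the decomposition is cost: a standard 3×3 convolution couples all input and output channels at once, while the depthwise+pointwise pair does far fewer multiplications for the same receptive field, which is why the paper's method can classify six images per service request quickly.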
The H-Swish function

H-Swish(x) = x · ReLU6(x + 3) / 6 (6)

is used as the activation function, which improves network accuracy compared with ReLU. Here x is the input of the activation layer; H-Swish, an improvement over ReLU adopted here, is the activation function of MobileNetV3.
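Equation (6) translates directly into code; a quick sketch showing that H-Swish vanishes for x ≤ −3 and becomes the identity for x ≥ 3:

```python
def relu6(x):
    """ReLU clipped at 6, the building block of H-Swish."""
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    """H-Swish(x) = x * ReLU6(x + 3) / 6, Eq. (6): a cheap piecewise Swish."""
    return x * relu6(x + 3.0) / 6.0

for v in (-4.0, 0.0, 1.0, 4.0):
    print(v, h_swish(v))
```

Because it uses only clipping and multiplication instead of a sigmoid, H-Swish is inexpensive on the hardware where the lightweight network is meant to run.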
Step three: if the allocated wavelength has available resources, that is, it is detected that there are idle resources in the port, transceiver, amplifier and bandwidth at the position corresponding to the current service request in the network topology of the wavelength, the service is successfully allocated, and if there are no available resources, the service request r is allocated by traversing all wavelengths according to a First-time adaptation (FF) method. Finally, whatever the method by which a service request r is distributed, a reward value is obtained in accordance with the reduced energy consumption.
For the wavelength chosen by the classifier, if the ports, transceivers, amplifiers and bandwidth on that wavelength have idle resources, the current request r can be allocated, i.e. the corresponding port, transceiver, amplifier and bandwidth resources at r's positions on that wavelength become occupied. If it cannot satisfy the request, the FF method is used: all wavelengths are traversed in order of their indices and the first one with available resources is chosen. Finally, a reward value (Reward) is obtained from table 1 according to the impact of the assigned wavelength on network energy consumption.
TABLE 1. Correspondence table for calculating the reward value (reproduced as an image in the original publication).
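The First-Fit fallback of step three can be sketched as follows; the `has_free_resources` predicate is a hypothetical stand-in for the port/transceiver/amplifier/bandwidth check described above:

```python
def first_fit(wavelengths, has_free_resources):
    """Traverse wavelengths in index order and return the index of the first
    one whose state has free resources for the request, or None if all are full."""
    for idx, state in enumerate(wavelengths):
        if has_free_resources(state):
            return idx
    return None

# Hypothetical 5-wavelength state: True = free resources along the request's path.
states = [False, False, True, True, False]
print(first_fit(states, lambda s: s))  # 2
```

In the method this routine only runs when the CNN's chosen wavelength is full, so the learned policy and the deterministic fallback together guarantee every request is placed whenever any wavelength can carry it.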
Step four: and after the distribution of each service is finished, evaluating the behavior in the step three by using a reinforcement learning algorithm to generate a value, updating the state of the topological network, and generating the shortest path topological graph of the next service request.
The reinforcement learning algorithm is the Actor-Critic (AC) algorithm, comprising an Actor network and a Critic network that share one neural network. The Actor network is responsible for grooming the service to the right place so as to reduce network energy consumption, and the Critic network judges the quality of the action. The network state is the features extracted by MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of grooming each service: the fewer resources the service occupies after entering the topology network, the larger the reward; otherwise the reward is smaller or even a penalty. The value is the Critic network's evaluation of the corresponding action.
After each service request is allocated, the currently selected action is evaluated: the extracted features are fed into the Critic network to obtain a value in preparation for the subsequent network update. The state of the topology network is updated according to the service allocation, i.e. the changed wavelength is redrawn while the topology pictures of the other wavelengths remain unchanged, and the topology picture of the next service request is drawn with the same method as in step one.
Step five: and repeating the steps one-four, and updating the neural network according to the network state, the action, the reward value and the value after the five service request distribution is completed.
The specific update method is as follows. The total loss is computed from equations (7), (8) and (9) to update the network:

l_v = (R_i − V(s; θ))^2 (7)

l_a = −log π(a|s; θ) · (R_i − V(s; θ)) (8)

l_t = l_v·c_v + l_a + e·c_e (9)

where R_i is the total reward value, V(s; θ) is the value function, s is the network state, θ are the network parameters, l_v is the mean squared error between the total reward and the value function, and l_a is the policy loss: the cross entropy of the policy π weighted by the difference between the total reward and the value function. The network parameters θ are finally updated by gradient descent. The entropy e is introduced to evaluate the spread of action probabilities; when it converges to a certain value, a good strategy has been learned, so that all services can be groomed efficiently and energy-effectively. l_t is the total loss; c_v and c_e, the coefficients of the value loss and entropy, default to 0.5 and 0.01.
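Under the formulas above, the total loss for a single transition can be computed as in this sketch. The reward, value and probability numbers are hypothetical; c_v = 0.5 and c_e = 0.01 are the defaults from the text:

```python
import math

C_V, C_E = 0.5, 0.01  # default value-loss and entropy coefficients

def a2c_loss(total_reward, value, probs, action):
    """Total loss l_t = l_v * c_v + l_a + e * c_e for one transition, Eqs. (7)-(9)."""
    advantage = total_reward - value
    l_v = advantage ** 2                                   # Eq. (7): value loss
    l_a = -math.log(probs[action]) * advantage             # Eq. (8): policy loss
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return l_v * C_V + l_a + entropy * C_E                 # Eq. (9)

# Hypothetical transition: reward-to-go 1.5, predicted value 1.0,
# 5-wavelength policy, wavelength 2 chosen.
print(a2c_loss(1.5, 1.0, [0.1, 0.1, 0.6, 0.1, 0.1], 2))
```

Note the sign conventions: a positive advantage makes −log π(a|s) shrink as the chosen action's probability grows, and the entropy term keeps the policy from collapsing too early; in a full implementation l_t would be minimized by gradient descent over θ.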
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A deep reinforcement learning traffic grooming method in a cloud-fog elastic optical network, characterized by comprising the following steps:
Step one: for a service request r = (s, d, t), calculate the shortest path of r with a shortest-path algorithm, and convert the service path of r and the wavelength-sliced network topology into pictures; where s and d denote the source and destination nodes, and t denotes the bandwidth requirement of r;
the service path and the wavelength-sliced network topology are converted into pictures as follows: nodes and links are drawn according to the node positions and link connectivity, and dots of different colors and sizes are drawn according to the occupancy of ports, transceivers and amplifiers; the picture of one wavelength of the wavelength-sliced topology is drawn by first drawing nodes as solid black dots at the given node coordinates, then drawing links in different colors according to the given connectivity and the bandwidth occupancy on all links of the current wavelength's topology, and finally representing ports and transceivers by small dots and amplifiers by large dots, likewise drawn in different colors according to their occupancy; the topology picture of the service path is drawn in the same way;
step two: extracting features of all the pictures in step one with a convolutional neural network, classifying them with a softmax classifier, and allocating the service request to the corresponding wavelength according to the classification result;
step three: if the allocated wavelength has available resources, the service request is allocated successfully; if no resources are available, traversing all the wavelengths according to a first-fit method to allocate the service request r, and obtaining a reward value according to the energy consumption saved;
step four: after each service request has been allocated, evaluating step three with a reinforcement learning algorithm to produce a value, updating the network state of the topology, and generating the shortest-path topology picture for the next service request;
step five: repeating steps one to four, and updating the convolutional neural network according to the network state, the action, the reward value and the value each time the allocation of at least three service requests has been completed.
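The image encoding described in step one can be sketched as follows. This is an illustrative sketch only: the function name, the color scheme (red channel for occupancy) and the 64×64 resolution are assumptions, not the patent's drawing rules; it shows how one wavelength slice's nodes and links might be rasterized into an RGB array for the convolutional network.

```python
import numpy as np

def render_wavelength_topology(coords, links, size=64):
    """Rasterize one wavelength slice of the topology into an RGB image.

    coords: dict node -> (x, y) with x, y in [0, 1]
    links:  list of (u, v, occupancy) with occupancy in [0, 1]
    """
    img = np.full((size, size, 3), 255, dtype=np.uint8)  # white canvas

    def px(p):  # map unit coordinates to pixel indices
        return int(p[0] * (size - 1)), int(p[1] * (size - 1))

    for u, v, occ in links:  # draw each link as a sampled line segment
        (x0, y0), (x1, y1) = px(coords[u]), px(coords[v])
        for t in np.linspace(0.0, 1.0, 2 * size):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            # red channel encodes bandwidth occupancy on this wavelength
            img[y, x] = (int(255 * occ), 0, int(255 * (1 - occ)))

    for n, p in coords.items():  # nodes drawn last, as black solid dots
        x, y = px(p)
        img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 0

    return img
```

One such array would be produced per wavelength slice plus one for the service path, giving the picture set that step two feeds to the feature extractor.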
2. The method according to claim 1, wherein in step one the bandwidth resources of each link from the source node s to the destination node d are divided into 5 parts by wavelength; when a service request arrives, any one of the wavelengths is selected for allocation and only the state of the current wavelength is changed, that is, the service request is allocated to the current wavelength, and the occupancy of the ports, transceivers, amplifiers and bandwidth at the corresponding positions is changed.
3. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 1 or 2, wherein the convolutional neural network in step two is the lightweight convolutional neural network MobileNetV3, which decomposes the standard convolution layer into a depthwise convolution and a pointwise convolution: the first convolution layer has a convolution kernel of 3, a stride of 2 and a padding of 1; the second part consists of 15 block layers whose input and output channels, convolution kernels and strides are specified; the third layer has a convolution kernel of 1 and a stride of 1; the fourth layer is an average pooling layer with a kernel of 7; dimension reduction is then performed by two 1×1 convolution layers.
4. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 3, wherein the features extracted by the lightweight convolutional neural network MobileNetV3 are input into a softmax classifier to obtain a probability distribution over actions; the higher the probability of an action, the more likely the wavelength corresponding to that action is to be selected.
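The classification step of claim 4 amounts to a softmax over per-wavelength logits. A minimal sketch (the final-layer matrix `weights` stands in for the classifier's parameters and is an assumption, not the patent's trained model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def pick_wavelength(features, weights):
    """Map extracted features to a probability distribution over the
    5 wavelengths and return the most probable one."""
    logits = features @ weights            # one logit per wavelength
    probs = softmax(logits)
    return int(np.argmax(probs)), probs
```

During training the action would typically be sampled from `probs` rather than taken greedily, so lower-probability wavelengths are still explored.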
5. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 3, wherein the activation function of the lightweight convolutional neural network MobileNetV3 is:
h-swish(x) = x · ReLU6(x + 3) / 6, where ReLU6(z) = min(max(z, 0), 6)
where x represents the input of the activation function layer, and ReLU() is a commonly used activation function; the last layer of the lightweight convolutional neural network MobileNetV3 has no activation function.
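The MobileNetV3 hard-swish activation referred to in claim 5 can be written out directly; `relu6` is the standard ReLU clipped at 6:

```python
import numpy as np

def relu6(x):
    """ReLU clipped at 6: min(max(x, 0), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    """Hard-swish, the MobileNetV3 activation: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0
```

For x ≥ 3 the function passes its input through unchanged, and for x ≤ −3 it outputs zero, approximating the smooth swish at much lower cost.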
6. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to any one of claims 1, 4 or 5, wherein the available resources in step three are the idle ports, transceivers, amplifiers and bandwidth at the positions corresponding to the current service request in the network topology of a wavelength; the first-fit method traverses all the wavelengths in order of their wavelength indices and allocates the request to the first wavelength that has available resources; and the reward value is calculated according to the influence of the allocated wavelength on the network energy consumption.
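The first-fit traversal of claim 6 can be sketched as below; the resource-dictionary layout (free ports, transceivers, amplifiers and bandwidth per wavelength) is an illustrative assumption, not the patent's data structure:

```python
def first_fit(wavelengths, demand):
    """Scan wavelengths in index order and return the index of the first
    one whose free resources cover the request, or None if blocked.

    wavelengths: list of dicts, e.g.
        {"ports": 2, "transceivers": 2, "amplifiers": 1, "bandwidth": 40}
    demand: dict with the same keys giving what the request needs.
    """
    for idx, free in enumerate(wavelengths):
        if all(free.get(k, 0) >= v for k, v in demand.items()):
            return idx      # first wavelength that fits the request
    return None             # no wavelength can carry the request
```

In the method above this fallback only runs when the wavelength chosen by the classifier lacks the needed resources.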
7. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to claim 6, wherein the reinforcement learning algorithm in step four is the Actor-Critic algorithm, which comprises an Actor network and a Critic network that share one neural network; the Actor network is responsible for grooming the traffic to the right place, and the Critic network evaluates the quality of an action to obtain a value; the network state of the topology is the network features extracted by the lightweight convolutional neural network MobileNetV3, the action is the selected wavelength, and the reward value corresponds to the result of each traffic grooming: the fewer resources a service occupies after entering the topology network, the larger the reward value.
8. The method according to claim 7, wherein in step four the network state of the topology is updated according to the allocation of the service request, that is, only the changed wavelength is redrawn while the topologies of the other wavelengths remain unchanged, and the topology picture of the next service request is drawn by the method in step one.
9. The deep reinforcement learning traffic grooming method in the cloud-fog elastic optical network according to any one of claims 1, 7 or 8, wherein the convolutional neural network in step five is updated by calculating the total loss:
l_v = (R_i − V(s, θ))²
l_a = −log π(a_i | s_i; θ) · (R_i − V(s, θ))
l_t = l_v · c_v + l_a + e · c_e
where R_i represents the total reward value; V(s, θ) represents the value function, with s the network state and θ the network parameters; l_v is the mean square error between the total reward value and the value function; l_a is the cross entropy between the policy function and the difference of the total reward value and the value function; e is the entropy, which evaluates the probability spread of the actions; l_t denotes the total loss; and c_v and c_e are the coefficients of the value loss and the entropy, respectively;
and updating the network parameter theta by a gradient descent method.
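The total-loss update of claim 9 can be sketched with the standard Actor-Critic forms of the three terms. Since the patent publishes l_v, l_a and e only as image placeholders, the squared-error, policy-gradient and entropy expressions below follow the common A2C conventions, and the default coefficients c_v and c_e are assumptions:

```python
import numpy as np

def actor_critic_loss(R, V, pi, action, c_v=0.5, c_e=0.01):
    """Total loss l_t = l_v * c_v + l_a + e * c_e.

    R:      total (discounted) reward for the step
    V:      critic's value estimate V(s, theta)
    pi:     action probabilities from the softmax classifier
    action: index of the wavelength that was chosen
    """
    advantage = R - V
    l_v = advantage ** 2                      # value (critic) loss
    l_a = -np.log(pi[action]) * advantage     # policy (actor) loss
    e = -np.sum(pi * np.log(pi))              # entropy of the policy
    return l_v * c_v + l_a + e * c_e
```

Gradient descent on this scalar with respect to θ then updates the shared Actor-Critic network, as the claim states.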
CN202010016994.1A 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network Active CN111246320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016994.1A CN111246320B (en) 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network

Publications (2)

Publication Number Publication Date
CN111246320A CN111246320A (en) 2020-06-05
CN111246320B (en) 2021-09-07

Family

ID=70866541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016994.1A Active CN111246320B (en) 2020-01-08 2020-01-08 Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network

Country Status (1)

Country Link
CN (1) CN111246320B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112383846B (en) * 2020-11-13 2022-06-21 国网河南省电力公司信息通信公司 Cloud-fog elastic optical network-oriented spectrum resource allocation method for advance reservation request
CN114584865A (en) * 2020-11-18 2022-06-03 中兴通讯股份有限公司 Single service resource allocation method, device, computer equipment and medium
CN114584871B (en) * 2022-04-28 2022-08-05 华南师范大学 Spectrum allocation method, device, storage medium and equipment of elastic optical network

Citations (1)

CN104486094A (en) * 2014-12-15 2015-04-01 西安电子科技大学 Multicast traffic grooming method based on physical topology light-tree correction

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN100361445C (en) * 2004-12-17 2008-01-09 电子科技大学 Integrated service leading method for WDM optical network
US9942128B2 (en) * 2013-11-29 2018-04-10 Telefonaktiebolaget Lm Ericsson (Publ) Method and apparatus for elastic optical networking
US10574381B2 (en) * 2015-03-30 2020-02-25 British Telecommunications Public Limited Company Optical network design and routing
CN109547876A (en) * 2018-12-29 2019-03-29 重庆邮电大学 A kind of adaptive guard rank method under elastic optical network twin failure
CN109905784B (en) * 2019-01-16 2021-10-15 国家电网有限公司 Service reconstruction method and equipment for optical network wavelength allocation
CN109617809A (en) * 2019-01-21 2019-04-12 Zhongtian Broadband Technology Co., Ltd. Regenerator selection, placement and traffic grooming method in elastic optical network
CN110633790B (en) * 2019-09-19 2022-04-08 郑州大学 Method and system for measuring residual oil quantity of airplane oil tank based on convolutional neural network


Similar Documents

Publication Publication Date Title
CN111246320B (en) Deep reinforcement learning flow dispersion method in cloud-fog elastic optical network
CN113193984B (en) Air-space-ground integrated network resource mapping method and system
CN109068391B (en) Internet of vehicles communication optimization algorithm based on edge calculation and Actor-Critic algorithm
CN110753319B (en) Heterogeneous service-oriented distributed resource allocation method and system in heterogeneous Internet of vehicles
CN103428805B (en) The virtual mapping method of a kind of wireless network based on link anti-interference
CN111953547B (en) Heterogeneous base station overlapping grouping and resource allocation method and device based on service
CN111585811B (en) Virtual optical network mapping method based on multi-agent deep reinforcement learning
CN115665227B (en) Universal heterogeneous integrated computing network resource intelligent adaptation network architecture and method
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
Yu et al. A deep learning based RSA strategy for elastic optical networks
CN114828095A (en) Efficient data perception layered federated learning method based on task unloading
CN113849313A (en) Energy-saving method for deploying computing task chain in cloud-edge elastic optical network
CN113676407A (en) Deep learning driven flow optimization mechanism of communication network
CN111245701B (en) Link priority virtual network mapping method based on maximum weighted matching
CN108833486B (en) Hybrid dynamic task scheduling method for complex vehicle-mounted fog computing system environment
CN116112981A (en) Unmanned aerial vehicle task unloading method based on edge calculation
CN115499365A (en) Route optimization method, device, equipment and medium
CN114745386B (en) Neural network segmentation and unloading method in multi-user edge intelligent scene
CN116112934A (en) End-to-end network slice resource allocation method based on machine learning
CN113676917B (en) Game theory-based energy consumption optimization method for unmanned aerial vehicle hierarchical mobile edge computing network
CN115190543A (en) Edge side live video transmission scheduling method in dense wireless network
Zhu et al. Deep reinforced energy efficient traffic grooming in fog-cloud elastic optical networks
Wang et al. LRA-3C: Learning based resource allocation for communication-computing-caching systems
CN110138670B (en) Load migration method based on dynamic path
CN114051272A (en) Intelligent routing method for dynamic topological network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant