CN115102906A - Load balancing method based on deep reinforcement learning drive - Google Patents

Load balancing method based on deep reinforcement learning drive

Info

Publication number
CN115102906A
CN115102906A
Authority
CN
China
Prior art keywords
wise
network
link
node
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210700058.1A
Other languages
Chinese (zh)
Inventor
吴立军
曾祥云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210700058.1A priority Critical patent/CN115102906A/en
Publication of CN115102906A publication Critical patent/CN115102906A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/12 Avoiding congestion; Recovering from congestion
    • H04L 47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14 Network analysis or design
    • H04L 41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a load balancing method driven by deep reinforcement learning, which comprises the steps of obtaining a network topology structure, switch characteristics and data link characteristics; generating node-wise graph characteristics according to the network topology structure and the switch characteristics; generating edge-wise graph characteristics according to the network topology structure and the data link characteristics; and constructing and training a BRGCN model, and acquiring a routing path according to the node-wise graph characteristics and the edge-wise graph characteristics by using the trained BRGCN model. The invention combines deep reinforcement learning with the graph convolutional neural network and the recurrent neural network and applies the combination to the load balancing method, so that the model can make decisions according to state information, takes the structure and topological relations of the network into account as decision factors, gains the ability to process time-series information, and thereby achieves better performance.

Description

Load balancing method based on deep reinforcement learning drive
Technical Field
The invention relates to the technical field of network optimization of an SDN data center, in particular to a load balancing method based on deep reinforcement learning drive.
Background
Since the beginning of the 21st century, information technology has developed ever faster, the number of internet users worldwide has increased remarkably, the amount of data in the network has grown explosively, and the internet has entered a new era. In particular, over the last 10 years, the various activities of an ever-growing number of internet users have produced information data such as voice, text and pictures on an increasingly large scale. The development of the internet and the growth of its user base have also given rise to a large number of internet companies that provide various application services to users, and the traffic and resources consumed by the operation of these services depend on the support of data center networks. Because the dynamics of a data center network introduce large uncertainty, a load balancing algorithm routes the flows in the network over appropriately selected paths so that the load on the network's data links is balanced and the stability of the network is ensured.
In the SDN architecture, the controller can detect network information in real time; by deploying SDN in a data center network, the network can be managed according to its global information. Owing to the characteristics of SDN and the multi-path structure of the data center network, the controller can select an appropriate routing path for each flow according to the global state of the network. Therefore, after SDN is introduced into the data center network, load balancing algorithms have considerably more room for improvement. However, owing to the cost of computing routing paths and the dynamic nature of network traffic, load balancing algorithms in such networks still suffer from problems such as slow response and heavy computation.
Accordingly, a technical solution is desired to overcome or at least alleviate at least one of the above-mentioned drawbacks of the prior art.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a load balancing method based on deep reinforcement learning driving.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a deep reinforcement learning driven load balancing method comprises the following steps:
s1, acquiring a network topology structure, switch characteristics and data link characteristics;
s2, generating node-wise graph characteristics according to the network topology structure and the switch characteristics;
s3, generating edge-wise graph characteristics according to the network topology structure and the data link characteristics;
s4, constructing and training a BRGCN model, and acquiring a routing path according to the node-wise graph characteristics and the edge-wise graph characteristics by using the trained BRGCN model.
Optionally, in step S1:
acquiring the network topology structure specifically comprises obtaining the connection relationships between the switches and the data links;
acquiring the switch characteristics specifically comprises obtaining the load information and flow table utilization information of each switch, together with one-hot codes of the source switch address and the destination switch address;
acquiring the data link characteristics specifically comprises obtaining the delay, packet loss rate and link utilization information of the data links in the network.
Optionally, step S2 specifically includes:
and constructing a node-wise graph structure by taking the switches as nodes and taking data links between the switches as edges.
Optionally, step S3 specifically includes:
and constructing an edge-wise graph structure by taking the data links as nodes and the public switches between the data links as edges.
Optionally, the BRGCN network model constructed in step S4 specifically includes:
a node-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
an edge-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
and the outputs of the node-wise recurrent graph neural network and the edge-wise recurrent graph neural network are concatenated at the end and connected to a layer of fully connected neural network.
Optionally, the training of the BRGCN network model in step S4 specifically includes:
acquiring a network state after the routing path is executed;
calculating rewards according to the network state after the routing path is executed;
storing the network state and the reward after the routing path is executed to an experience pool;
and selecting a group of experiences from the experience pool by adopting a random continuous sampling strategy to calculate action values, and updating the model parameters by using a loss function based on the action values.
Optionally, the calculation manner of the reward is:
[The reward formula is presented as an image in the original publication; its variables are defined below.]
wherein lr_i denotes the utilization of the i-th link, Ave_lr denotes the mean of the current link utilizations, Cor_lr denotes the link utilization correction coefficient, α denotes the link utilization evaluation weight coefficient, Tol_ade denotes the mean change of the link average delay, Met_ade denotes the link average delay at the current time, Mle_ade denotes the link average delay at the previous time, Cor_ade denotes the link average delay correction coefficient, β denotes the link average delay evaluation weight coefficient, Tol_apl denotes the mean change of the link average packet loss rate, Met_apl denotes the link average packet loss rate at the current time, Mle_apl denotes the link average packet loss rate at the previous time, Cor_apl denotes the link average packet loss rate correction coefficient, γ denotes the link average packet loss rate evaluation weight coefficient, Tlo_al denotes the mean change of the link average load, Met_al denotes the link average load at the current time, Mle_al denotes the link average load at the previous time, Cor_al denotes the link average load correction coefficient, θ denotes the link average load evaluation weight coefficient, and L_num denotes the number of links.
Optionally, the action value is calculated by:
y_i = r_i + δ·Q′(s′_i, a_max(Q(s′_i, a_i, ω)); ω′)
wherein y_i denotes the action value, r_i denotes the reward, δ denotes the attenuation factor, Q denotes the Q network, Q′ denotes the target network, s′_i denotes the network state at the next time, a_i denotes the action, ω denotes the parameters of the Q network, and ω′ denotes the parameters of the target network.
Optionally, the action-value-based loss function is specifically:
L(ω) = (1/M) Σ_{i=1}^{M} (y_i − Q(s_i, a_i, ω))²
wherein M denotes the number of samples, y_i denotes the action value, Q denotes the Q network, s_i denotes the state at the current time, a_i denotes the action, and ω denotes the parameters of the Q network.
Optionally, the obtaining of the routing path according to the node-wise graph feature and the edge-wise graph feature by using the trained BRGCN network model in step S4 specifically includes:
constructing a node-wise feature matrix by utilizing the switch features, constructing a node-wise adjacency matrix and a node-wise degree matrix according to the node-wise graph structure, constructing an edge-wise feature matrix by utilizing the data link features, and constructing an edge-wise adjacency matrix and an edge-wise degree matrix according to the edge-wise graph structure;
obtaining an action value table from the trained BRGCN network model according to the node-wise feature matrix, the node-wise adjacency matrix, the node-wise degree matrix, the edge-wise feature matrix, the edge-wise adjacency matrix and the edge-wise degree matrix;
and selecting an action according to the action value table by using a greedy strategy as the routing path of the flow.
The invention has the following beneficial effects:
1. The method combines deep reinforcement learning with the graph convolutional neural network and the recurrent neural network and applies them to the load balancing algorithm, so that the model can make decisions according to state information, takes the structure and topological relations of the network into account as decision factors, gains the ability to process time-series information, and thereby achieves better performance.
2. For the optimization target of load balancing, the method uses the variance of the link utilizations as the main reward factor, which maintains the long-term stability of the network state better than using the change in the maximum link utilization.
3. The method uses a more comprehensive network state, including multidimensional characteristics of the switches and data links as well as the topological structure of the network, ensuring that the model makes decisions based on a more complete view of the state.
Drawings
Fig. 1 is a schematic flowchart of a deep reinforcement learning-driven load balancing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a node-wise graph structure and an edge-wise graph structure in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a BRGCN network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a node-wise recurrent graph neural network structure according to an embodiment of the present invention;
fig. 5 is an exemplary structural diagram of an electronic device in the embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of the embodiments; to those skilled in the art, various changes may be made without departing from the spirit and scope of the invention as defined in the appended claims, and everything produced using the inventive concept of the present invention is protected.
As shown in fig. 1, an embodiment of the present invention provides a deep reinforcement learning driven load balancing method, including the following steps S1 to S4:
s1, acquiring a network topology structure, switch characteristics and data link characteristics;
in an alternative embodiment of the present invention, in step S1:
the method specifically comprises the steps of obtaining a connection relation between a switch and a data link;
the method specifically comprises the steps of obtaining load information of the switch, flow table utilization rate information, and one-hot codes of a source switch address and a destination switch address;
the obtaining of the data link characteristics specifically includes obtaining delay, packet loss rate, and link utilization information of the data link in the network.
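As a concrete illustration of how such a state could be assembled before being handed to the model, the following sketch builds a per-switch and a per-link feature matrix with NumPy. The feature layout (in particular, encoding the source/destination one-hot codes as one indicator component per switch) and all numeric values are assumptions made for illustration; they are not prescribed by the patent.

```python
# Illustrative sketch (assumed layout): per-switch and per-link feature matrices.
import numpy as np

num_switches, num_links = 6, 6
switch_load = np.random.rand(num_switches)       # switch load information
flow_table_util = np.random.rand(num_switches)   # flow table utilization
src, dst = 0, 5                                  # current flow's source/destination switch

one_hot_src = np.eye(num_switches)[src]          # one-hot code of source switch address
one_hot_dst = np.eye(num_switches)[dst]          # one-hot code of destination switch address

# Node-wise feature matrix: one row per switch.
node_features = np.column_stack([switch_load, flow_table_util, one_hot_src, one_hot_dst])

# Edge-wise feature matrix: one row per data link [delay, packet loss rate, utilization].
link_features = np.random.rand(num_links, 3)

print(node_features.shape, link_features.shape)  # (6, 4) (6, 3)
```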
S2, generating node-wise graph characteristics according to the network topology structure and the switch characteristics;
in an optional embodiment of the present invention, step S2 specifically includes:
and constructing a node-wise graph structure by taking the switches as nodes and taking data links between the switches as edges.
S3, generating edge-wise graph characteristics according to the network topology structure and the data link characteristics;
in an optional embodiment of the present invention, step S3 specifically includes:
and constructing an edge-wise graph structure by taking the data links as nodes and the public switches between the data links as edges.
As shown in FIG. 2, taking the structure on the left as an example, each node a–f represents a switch and each edge A–F represents a data link between switches. The modeled node-wise structure is shown in the left diagram and the edge-wise structure in the right diagram. The node-wise structure diagram (left) has 6 nodes a–f and 6 edges A–F. When the edge-wise structure (right) is constructed, the original edges A–F are regarded as the nodes of the new graph, and the common switches between them are regarded as the edges of the new graph, so a switch shared by several links gives rise to several edges of the new graph. The new graph therefore has 6 nodes A–F, corresponding to the 6 edges of the original graph; meanwhile, it has 10 edges c1–c6, d and e1–e3, which correspond respectively to the shared nodes c, d and e of the original graph, each new edge representing the mapping of a pair of original edges that share a common node.
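Since the edge-wise structure described above is exactly the line graph of the node-wise structure, it can be derived mechanically. The following is a minimal sketch using networkx; the switch names and the topology are placeholders chosen only to show the construction.

```python
# Minimal sketch: node-wise graph (switches = nodes, links = edges) and its
# edge-wise counterpart obtained as the line graph, in which two links are
# adjacent whenever they share a common switch.
import networkx as nx

node_wise = nx.Graph()
node_wise.add_edges_from([
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("c", "d"), ("c", "e"), ("d", "e"),
])

edge_wise = nx.line_graph(node_wise)   # nodes of edge_wise are the links above

print(sorted(edge_wise.nodes()))
print(edge_wise.number_of_nodes(), edge_wise.number_of_edges())
```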
S4, constructing and training a BRGCN model, and acquiring a routing path according to the node-wise graph characteristics and the edge-wise graph characteristics by using the trained BRGCN model.
In an optional embodiment of the present invention, the BRGCN network model constructed in step S4 specifically includes:
a node-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
an edge-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
and the outputs of the node-wise recurrent graph neural network and the edge-wise recurrent graph neural network are concatenated at the end and connected to a layer of fully connected neural network.
As shown in FIG. 3 and FIG. 4, the deep neural network in the BRGCN network model is constructed by combining graph convolutional neural networks and recurrent neural networks. The node-wise recurrent graph neural network has the same structure as the edge-wise recurrent graph neural network: each comprises three graph convolutional layers and three recurrent layers, with one recurrent layer connected behind each graph convolutional layer.
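To make the two-branch structure concrete, the following is a rough PyTorch sketch of a BRGCN-style network: each branch alternates three graph convolutional layers with three recurrent (GRU) layers, the two branch outputs are concatenated, and a fully connected layer produces the action value table. Layer widths, the choice of GRU cells, the mean pooling over nodes and the normalization used in the graph convolution are all assumptions; the patent does not give these details.

```python
# Rough sketch (assumed hyperparameters) of a BRGCN-style two-branch model in PyTorch.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph convolution: relu(D^-1/2 (A+I) D^-1/2 X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))
        d_inv_sqrt = torch.diag(a_hat.sum(1).pow(-0.5))
        return torch.relu(self.lin(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x))

class RecurrentGCNBranch(nn.Module):
    """Three GCN layers, each followed by a GRU that runs over the time axis."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        dims = [in_dim, hid_dim, hid_dim]
        self.gcns = nn.ModuleList([GCNLayer(d, hid_dim) for d in dims])
        self.grus = nn.ModuleList([nn.GRU(hid_dim, hid_dim) for _ in dims])

    def forward(self, x_seq, adj):
        # x_seq: (T, N, F) -- a short history of graph feature snapshots.
        h = x_seq
        for gcn, gru in zip(self.gcns, self.grus):
            h = torch.stack([gcn(h[t], adj) for t in range(h.size(0))])
            h, _ = gru(h)                     # recurrence over the T snapshots
        return h[-1]                          # per-node embedding at the latest step

class BRGCN(nn.Module):
    def __init__(self, node_dim, edge_dim, hid_dim, num_actions):
        super().__init__()
        self.node_branch = RecurrentGCNBranch(node_dim, hid_dim)
        self.edge_branch = RecurrentGCNBranch(edge_dim, hid_dim)
        self.head = nn.Linear(2 * hid_dim, num_actions)

    def forward(self, node_x, node_adj, edge_x, edge_adj):
        n = self.node_branch(node_x, node_adj).mean(dim=0)   # pool over switches
        e = self.edge_branch(edge_x, edge_adj).mean(dim=0)   # pool over links
        return self.head(torch.cat([n, e]))                  # action value table
```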
The step S4 of training the BRGCN network model specifically includes:
Firstly, acquiring the network state after the routing path is executed; the network state comprises a node-wise feature matrix, a node-wise adjacency matrix and a node-wise degree matrix, which together serve as the input of the node-wise recurrent graph neural network. The network state further comprises an edge-wise feature matrix, an edge-wise adjacency matrix and an edge-wise degree matrix, which together serve as the input of the edge-wise recurrent graph neural network.
Then calculating reward according to the network state after executing the routing path; the calculation mode of the reward is as follows:
[The reward formula is presented as an image in the original publication; its variables are defined below.]
wherein lr_i denotes the utilization of the i-th link, Ave_lr denotes the mean of the current link utilizations, Cor_lr denotes the link utilization correction coefficient, α denotes the link utilization evaluation weight coefficient, Tol_ade denotes the mean change of the link average delay, Met_ade denotes the link average delay at the current time, Mle_ade denotes the link average delay at the previous time, Cor_ade denotes the link average delay correction coefficient, β denotes the link average delay evaluation weight coefficient, Tol_apl denotes the mean change of the link average packet loss rate, Met_apl denotes the link average packet loss rate at the current time, Mle_apl denotes the link average packet loss rate at the previous time, Cor_apl denotes the link average packet loss rate correction coefficient, γ denotes the link average packet loss rate evaluation weight coefficient, Tlo_al denotes the mean change of the link average load, Met_al denotes the link average load at the current time, Mle_al denotes the link average load at the previous time, Cor_al denotes the link average load correction coefficient, θ denotes the link average load evaluation weight coefficient, and L_num denotes the number of links.
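Because the reward formula itself is only available as an image in the published text, the following sketch merely combines the terms defined above in one plausible way: the variance of the link utilizations as the main penalty, plus weighted, coefficient-corrected changes of the average delay, average packet loss rate and average load. The sign convention, the placement of the correction coefficients and the default weights are guesses for illustration only.

```python
# Assumed reward shape, built only from the variables defined above;
# the exact combination used in the patent is not reproduced here.
import numpy as np

def reward(lr, met_ade, mle_ade, met_apl, mle_apl, met_al, mle_al,
           cor_lr=1.0, cor_ade=1.0, cor_apl=1.0, cor_al=1.0,
           alpha=0.4, beta=0.2, gamma=0.2, theta=0.2):
    lr = np.asarray(lr)                        # per-link utilization, length L_num
    util_var = np.mean((lr - lr.mean()) ** 2)  # variance of link utilization
    tol_ade = met_ade - mle_ade                # change of link average delay
    tol_apl = met_apl - mle_apl                # change of link average packet loss rate
    tlo_al = met_al - mle_al                   # change of link average load
    return -(alpha * util_var / cor_lr
             + beta * tol_ade / cor_ade
             + gamma * tol_apl / cor_apl
             + theta * tlo_al / cor_al)
```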
Then storing the network state and the reward after the routing path is executed to an experience pool;
And finally, a group of experiences is selected from the experience pool by adopting a random continuous sampling strategy to calculate action values, and the model parameters are updated using a loss function based on the action values. The action value is calculated as follows:
y_i = r_i + δ·Q′(s′_i, a_max(Q(s′_i, a_i, ω)); ω′)
wherein y_i denotes the action value, r_i denotes the reward, δ denotes the attenuation factor, Q denotes the Q network, Q′ denotes the target network, s′_i denotes the network state at the next time, a_i denotes the action, ω denotes the parameters of the Q network, and ω′ denotes the parameters of the target network.
The loss function based on the action value is specifically as follows:
L(ω) = (1/M) Σ_{i=1}^{M} (y_i − Q(s_i, a_i, ω))²
wherein M denotes the number of samples, y_i denotes the action value, Q denotes the Q network, s_i denotes the state at the current time, a_i denotes the action, and ω denotes the parameters of the Q network.
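A minimal sketch of one parameter update following these two formulas is given below: the target y_i uses the Q network to pick the next action and the target network to evaluate it, and the loss is the mean squared difference over the sampled batch. The state representation (a tuple of the six matrices), the optimizer and the batch handling are assumptions.

```python
# Sketch of one update step with the action-value target and mean-squared loss (PyTorch).
import torch

def update(q_net, target_net, optimizer, batch, delta=0.95):
    # batch: list of (state, action, reward, next_state); each state is the tuple
    # of tensors the BRGCN-style model expects as input.
    losses = []
    for s, a, r, s_next in batch:
        with torch.no_grad():
            a_max = q_net(*s_next).argmax()               # action selected by Q
            y = r + delta * target_net(*s_next)[a_max]    # evaluated by target Q'
        losses.append((y - q_net(*s)[a]) ** 2)
    loss = torch.stack(losses).mean()                      # (1/M) * sum of squared errors
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```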
In step S4, acquiring a routing path according to the node-wise graph feature and the edge-wise graph feature using the trained BRGCN network model specifically includes:
constructing a node-wise feature matrix by utilizing the switch feature, constructing a node-wise adjacency matrix and a node-wise degree matrix according to a node-wise graph structure, constructing an edge-wise feature matrix by utilizing the data link feature, and constructing an edge-wise adjacency matrix and an edge-wise degree matrix according to an edge-wise graph structure;
obtaining an action value table from the trained BRGCN model according to the node-wise feature matrix, the node-wise adjacency matrix, the node-wise degree matrix, the edge-wise feature matrix, the edge-wise adjacency matrix and the edge-wise degree matrix. As shown in FIG. 3 and FIG. 4, specifically, the node-wise feature matrix, the node-wise adjacency matrix and the node-wise degree matrix are used as the input of the node-wise recurrent graph neural network, and the edge-wise feature matrix, the edge-wise adjacency matrix and the edge-wise degree matrix are used as the input of the edge-wise recurrent graph neural network. After processing by the two recurrent graph neural networks, the network state is turned into two groups of features, namely the outputs of the two branches; the two groups of features are concatenated, and finally the action value table is output through the fully connected neural network.
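For illustration, a forward pass of the BRGCN-style sketch given earlier could look as follows; the tensor shapes (a history of T = 4 snapshots of a 6-switch, 6-link network) and the identity adjacency matrices are placeholders.

```python
# Example forward pass reusing the BRGCN class from the sketch above.
import torch

model = BRGCN(node_dim=4, edge_dim=3, hid_dim=32, num_actions=8)
node_x = torch.rand(4, 6, 4)    # (T, switches, switch features)
edge_x = torch.rand(4, 6, 3)    # (T, links, link features)
node_adj = torch.eye(6)         # placeholder node-wise adjacency matrix
edge_adj = torch.eye(6)         # placeholder edge-wise adjacency matrix

action_values = model(node_x, node_adj, edge_x, edge_adj)
print(action_values.shape)      # torch.Size([8]) -- one value per candidate path
```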
An action is then selected according to the action value table by a greedy strategy as the routing path of the flow. The greedy strategy is specifically as follows: if the greediness is e, the model selects and executes the action with the highest value in the action value table with probability e, and randomly selects an action to execute with probability 1−e.
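The selection rule just described can be sketched in a few lines; the default greediness value is an assumption.

```python
# Greedy strategy sketch: with probability e execute the highest-valued action,
# otherwise execute a randomly chosen action.
import random

def select_action(action_values, e=0.9):
    if random.random() < e:
        return max(range(len(action_values)), key=lambda i: action_values[i])
    return random.randrange(len(action_values))
```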
After the SDN controller is started, the network is initialized first, and basic information of all switches and links in the network and a network topology structure formed by the physical devices are obtained.
According to the topological structure of the network, the switches are regarded as nodes and the data links among the switches as edges to model the node-wise graph structure; the data links are regarded as nodes and the common switches among the data links as edges to model the edge-wise graph structure.
The invention uses the SDN controller to constantly monitor the state of the whole network: query messages are sent to the switches through the SDN communication protocol, the switches return reply messages, and the switch characteristics and data link characteristics of the network are obtained by parsing the reply messages.
Each time a new flow arrives at the network, the SDN controller sends the current network state to the BRGCN network model. After the model analyzes the network state, it decides a reasonable routing path, and the current flow is then routed according to that path.
The deep-reinforcement-learning-driven load balancing method of the present application, which combines the graph convolutional neural network and the recurrent neural network, is further described below by way of an example; it should be understood that the example does not constitute any limitation to the present application.
In this embodiment, the hardware platform is implemented by a Dell Precision T7920 tower workstation, and is programmed using Java and Python languages.
We use Java to implement the SDN controller, which performs state detection, routing control, data processing and reward calculation for the entire network. Whenever a new flow arrives at the network, the controller sends the latest network state to the model and then routes the flow according to the routing path decided by the model. After the flow is routed, the controller collects the new network state, calculates the reward, and sends the (state, action, reward, new state) tuple to the experience pool as an experience.
We implemented the deep reinforcement learning model using Python. During operation, the model receives requests from the controller together with the state data of the network, which comprise a node-wise feature matrix, a node-wise adjacency matrix, a node-wise degree matrix, an edge-wise feature matrix, an edge-wise adjacency matrix and an edge-wise degree matrix. The state data are input into the two recurrent graph neural networks for processing, and after concatenation and the fully connected layer an action value table is output; the model then decides a routing path from the action value table by the greedy strategy and sends it to the SDN controller for execution. The controller collects the new state after routing, calculates the reward, and sends the experience to the model's experience pool. Once the stored experiences reach a certain scale, the model adopts the random continuous sampling strategy to obtain a group of experiences and update the parameters of the neural network.
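The experience pool and the random continuous sampling strategy used in this embodiment could be organized as in the sketch below, where a random starting index is drawn and a consecutive slice of stored experiences is returned. The buffer capacity and batch size are assumptions, and this is only one reading of "random continuous sampling".

```python
# Sketch of an experience pool with random continuous sampling.
import random
from collections import deque

class ExperiencePool:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # bounded pool of experiences

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample_continuous(self, batch_size=32):
        # Draw a random starting index, then return a consecutive group of experiences.
        if len(self.buffer) < batch_size:
            return []
        start = random.randrange(len(self.buffer) - batch_size + 1)
        return [self.buffer[start + i] for i in range(batch_size)]
```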
The load balancing method driven by deep reinforcement learning of the present application can realize intelligent routing of flows in an SDN data center network and achieve load balancing; compared with methods such as ECMP, it improves the maximum link utilization index by more than 20% and is also superior to ECMP-like methods on indexes such as delay and packet loss rate.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to implement the load balancing method based on deep reinforcement learning driving.
The present application also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the load balancing method based on deep reinforcement learning driving can be implemented.
Fig. 5 is an exemplary block diagram of an electronic device capable of implementing the deep reinforcement learning driven load balancing method according to an embodiment of the present application.
As shown in fig. 5, the electronic device includes an input device 501, an input interface 502, a central processor 503, a memory 504, an output interface 505, and an output device 506. The input interface 502, the central processing unit 503, the memory 504 and the output interface 505 are connected to each other through a bus 507, and the input device 501 and the output device 506 are connected to the bus 507 through the input interface 502 and the output interface 505, respectively, and further connected to other components of the electronic device. Specifically, the input device 501 receives input information from the outside and transmits the input information to the central processor 503 through the input interface 502; the central processor 503 processes input information based on computer-executable instructions stored in the memory 504 to generate output information, temporarily or permanently stores the output information in the memory 504, and then transmits the output information to the output device 506 through the output interface 505; the output device 506 outputs the output information to the outside of the electronic device for use by the user.
That is, the electronic device shown in fig. 5 may also be implemented to include: a memory storing computer executable instructions; and one or more processors that, when executing the computer executable instructions, may implement the deep reinforcement learning driven load balancing method described in conjunction with fig. 1.
In one embodiment, the electronic device shown in fig. 5 may be implemented to include: a memory 504 configured to store executable program code; one or more processors 503 configured to execute the executable program code stored in the memory 504 to perform the load balancing method based on deep reinforcement learning driving in the above-described embodiments.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps. A plurality of units, modules or devices recited in the device claims may also be implemented by one unit or overall device by software or hardware.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks identified in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The Processor in this embodiment may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable gate array (FPGA) or other Programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the apparatus/terminal device by running or executing the computer programs and/or modules stored in the memory, as well as by invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
In this embodiment, the module/unit integrated with the apparatus/terminal device may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, all or part of the flow in the method according to the embodiments of the present invention may also be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of the embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in the jurisdiction. Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application.
The foregoing is merely a preferred embodiment of this invention, which is intended to be illustrative, not limiting; it will be appreciated by those skilled in the art that many changes, modifications and equivalents can be made thereto within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A load balancing method based on deep reinforcement learning drive is characterized by comprising the following steps:
s1, acquiring a network topology structure, switch characteristics and data link characteristics;
s2, generating node-wise graph characteristics according to the network topology structure and the switch characteristics;
s3, generating edge-wise graph characteristics according to the network topology structure and the data link characteristics;
s4, constructing and training a BRGCN model, and acquiring a routing path according to the node-wise graph characteristics and the edge-wise graph characteristics by using the trained BRGCN model.
2. The load balancing method based on deep reinforcement learning driving according to claim 1, wherein in step S1:
acquiring the network topology structure specifically comprises obtaining the connection relationships between the switches and the data links;
acquiring the switch characteristics specifically comprises obtaining the load information and flow table utilization information of each switch, together with one-hot codes of the source switch address and the destination switch address;
acquiring the data link characteristics specifically comprises obtaining the delay, packet loss rate and link utilization information of the data links in the network.
3. The load balancing method based on deep reinforcement learning driving according to claim 1, wherein the step S2 specifically includes:
and constructing a node-wise graph structure by taking the switches as nodes and taking data links between the switches as edges.
4. The load balancing method based on deep reinforcement learning driving according to claim 1, wherein the step S3 specifically includes:
and taking the data links as nodes and the public switch among the data links as edges to construct an edge-wise graph structure.
5. The load balancing method based on deep reinforcement learning driving as claimed in claim 1, wherein the BRGCN network model constructed in step S4 specifically includes:
a node-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
an edge-wise recurrent graph neural network in which three graph convolutional neural network layers and three recurrent neural network layers are arranged alternately;
and the outputs of the node-wise recurrent graph neural network and the edge-wise recurrent graph neural network are concatenated at the end and connected to a layer of fully connected neural network.
6. The load balancing method based on deep reinforcement learning driving as claimed in claim 1, wherein the training of the BRGCN network model in step S4 specifically includes:
acquiring a network state after the routing path is executed;
calculating rewards according to the network state after the routing path is executed;
storing the network state and the reward after the routing path is executed to an experience pool;
and selecting a group of experiences from the experience pool by adopting a random continuous sampling strategy to calculate action values, and updating the model parameters using a loss function based on the action values.
7. The deep reinforcement learning drive-based load balancing method according to claim 6, wherein the reward is calculated by:
[The reward formula is presented as an image in the original publication; its variables are defined below.]
wherein lr_i denotes the utilization of the i-th link, Ave_lr denotes the mean of the current link utilizations, Cor_lr denotes the link utilization correction coefficient, α denotes the link utilization evaluation weight coefficient, Tol_ade denotes the mean change of the link average delay, Met_ade denotes the link average delay at the current time, Mle_ade denotes the link average delay at the previous time, Cor_ade denotes the link average delay correction coefficient, β denotes the link average delay evaluation weight coefficient, Tol_apl denotes the mean change of the link average packet loss rate, Met_apl denotes the link average packet loss rate at the current time, Mle_apl denotes the link average packet loss rate at the previous time, Cor_apl denotes the link average packet loss rate correction coefficient, γ denotes the link average packet loss rate evaluation weight coefficient, Tlo_al denotes the mean change of the link average load, Met_al denotes the link average load at the current time, Mle_al denotes the link average load at the previous time, Cor_al denotes the link average load correction coefficient, θ denotes the link average load evaluation weight coefficient, and L_num denotes the number of links.
8. The load balancing method based on deep reinforcement learning driving according to claim 6, wherein the action value is calculated by:
y_i = r_i + δ·Q′(s′_i, a_max(Q(s′_i, a_i, ω)); ω′)
wherein y_i denotes the action value, r_i denotes the reward, δ denotes the attenuation factor, Q denotes the Q network, Q′ denotes the target network, s′_i denotes the network state at the next time, a_i denotes the action, ω denotes the parameters of the Q network, and ω′ denotes the parameters of the target network.
9. The deep reinforcement learning drive-based load balancing method according to claim 6, wherein the action value-based loss function is specifically:
L(ω) = (1/M) Σ_{i=1}^{M} (y_i − Q(s_i, a_i, ω))²
wherein M denotes the number of samples, y_i denotes the action value, Q denotes the Q network, s_i denotes the state at the current time, a_i denotes the action, and ω denotes the parameters of the Q network.
10. The load balancing method based on deep reinforcement learning driving as claimed in claim 1, wherein the step S4 of obtaining the routing path according to the node-wise graph feature and the edge-wise graph feature by using the trained BRGCN network model specifically includes:
constructing a node-wise feature matrix by utilizing the switch feature, constructing a node-wise adjacency matrix and a node-wise degree matrix according to a node-wise graph structure, constructing an edge-wise feature matrix by utilizing the data link feature, and constructing an edge-wise adjacency matrix and an edge-wise degree matrix according to an edge-wise graph structure;
obtaining an action value table from the trained BRGCN model according to the node-wise feature matrix, the node-wise adjacency matrix, the node-wise degree matrix, the edge-wise feature matrix, the edge-wise adjacency matrix and the edge-wise degree matrix;
and selecting an action according to the action value table by a greedy strategy as the routing path of the flow.
CN202210700058.1A 2022-06-20 2022-06-20 Load balancing method based on deep reinforcement learning drive Pending CN115102906A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210700058.1A CN115102906A (en) 2022-06-20 2022-06-20 Load balancing method based on deep reinforcement learning drive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210700058.1A CN115102906A (en) 2022-06-20 2022-06-20 Load balancing method based on deep reinforcement learning drive

Publications (1)

Publication Number Publication Date
CN115102906A true CN115102906A (en) 2022-09-23

Family

ID=83292787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210700058.1A Pending CN115102906A (en) 2022-06-20 2022-06-20 Load balancing method based on deep reinforcement learning drive

Country Status (1)

Country Link
CN (1) CN115102906A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020254924A1 (en) * 2019-06-16 2020-12-24 Way2Vat Ltd. Systems and methods for document image analysis with cardinal graph convolutional networks
US20220124543A1 (en) * 2021-06-30 2022-04-21 Oner Orhan Graph neural network and reinforcement learning techniques for connection management
CN113572697A (en) * 2021-07-20 2021-10-29 电子科技大学 Load balancing method based on graph convolution neural network and deep reinforcement learning
US20220027792A1 (en) * 2021-10-08 2022-01-27 Intel Corporation Deep neural network model design enhanced by real-time proxy evaluation feedback
CN114254738A (en) * 2021-12-16 2022-03-29 中国人民解放军战略支援部队信息工程大学 Double-layer evolvable dynamic graph convolution neural network model construction method and application
CN114567598A (en) * 2022-02-25 2022-05-31 重庆邮电大学 Load balancing method and device based on deep learning and cross-domain cooperation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
WILE SEHERY et al.: "Load balancing in data center networks with folded-Clos architectures", 《PROCEEDINGS OF THE 2015 1ST IEEE CONFERENCE ON NETWORK SOFTWARIZATION (NETSOFT)》, 4 June 2015 (2015-06-04) *
XING LV et al.: "Traffic Network Resilience Analysis Based On The GCN-RNN Prediction Model", 《2019 INTERNATIONAL CONFERENCE ON QUALITY, RELIABILITY, RISK, MAINTENANCE, AND SAFETY ENGINEERING (QR2MSE)》, 5 March 2020 (2020-03-05) *
ZENG X et al.: "Deep reinforcement learning with graph convolutional networks for load balancing in sdn-based data center networks", 《2021 18TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP)》, 19 January 2022 (2022-01-19) *
ZHAO L et al.: "A temporal graph convolutional network for traffic prediction", 《IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS》, 22 August 2019 (2019-08-22) *
TONG Zonghe; YUAN Lining; WANG Yang: "Theory and Application of Graph Convolutional Neural Networks", Information Technology and Informatization, no. 02, 28 February 2020 (2020-02-28) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116155819A (en) * 2023-04-20 2023-05-23 北京邮电大学 Method and device for balancing load in intelligent network based on programmable data plane
CN116155819B (en) * 2023-04-20 2023-07-14 北京邮电大学 Method and device for balancing load in intelligent network based on programmable data plane

Similar Documents

Publication Publication Date Title
CN113572697B (en) Load balancing method based on graph convolution neural network and deep reinforcement learning
CN109635989B (en) Social network link prediction method based on multi-source heterogeneous data fusion
US9672465B2 (en) Solving vehicle routing problems using evolutionary computing techniques
US7537523B2 (en) Dynamic player groups for interest management in multi-character virtual environments
JPH0693680B2 (en) Route selection method in data communication network
CN110619082B (en) Project recommendation method based on repeated search mechanism
CN113194034A (en) Route optimization method and system based on graph neural network and deep reinforcement learning
JP2013190218A (en) Route search method, route search device, and program
Liu et al. Effects of information heterogeneity in Bayesian routing games
CN115529316A (en) Micro-service deployment method based on cloud computing center network architecture
CN115102906A (en) Load balancing method based on deep reinforcement learning drive
CN105634974A (en) Route determining method and apparatus in software-defined networking
CN116310667B (en) Self-supervision visual characterization learning method combining contrast loss and reconstruction loss
Du et al. GAQ-EBkSP: a DRL-based urban traffic dynamic rerouting framework using fog-cloud architecture
CN112566093A (en) Terminal relation identification method and device, computer equipment and storage medium
CN113886073A (en) Edge data processing method, system, device and medium
CN111191778B (en) Deep learning network processing method, device and compiler
CN111770152B (en) Edge data management method, medium, edge server and system
CN113726692B (en) Virtual network mapping method and device based on generation of countermeasure network
CN115865607A (en) Distributed training computing node management method and related device
CN114422453B (en) Method, device and storage medium for online planning of time-sensitive stream
CN113905066B (en) Networking method of Internet of things, networking device of Internet of things and electronic equipment
CN109543725A (en) A kind of method and device obtaining model parameter
CN106600062A (en) Method for calculating the shortest path of single source in multi-region-cross complex network diagram
Araújo et al. Lagrangian relaxation for maximum service in multicast routing with QoS constraints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination