CN111817975A

CN111817975A - Hybrid intra-network dynamic load balancing method, device and system

Info

Publication number: CN111817975A
Application number: CN202010720378.4A
Authority: CN
Inventors: 姚海鹏; 买天乐; 忻向军
Original assignee: China Communications Communication Network Technology Co ltd; Beijing University of Posts and Telecommunications
Current assignee: China Communications Communication Network Technology Co ltd; Beijing University of Posts and Telecommunications
Priority date: 2020-07-23
Filing date: 2020-07-23
Publication date: 2020-10-23
Anticipated expiration: 2040-07-23
Also published as: CN111817975B

Abstract

The invention provides a method, a device and a system for dynamic load balancing in a hybrid network, and relates to the technical field of communications. The method is applied to a first distributed switch and comprises the following steps: after performing an action based on a local policy, sending parameter information of the first distributed switch to a centralized platform; receiving policy correction information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of a second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and updating the local policy based on the policy correction information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy, where the preset policy update formula introduces a baseline mechanism. The embodiment of the invention enables all distributed switches to cooperate while keeping execution distributed, effectively avoids learning difficulties, and improves optimization efficiency.

Description

Hybrid intra-network dynamic load balancing method, device and system

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method, an apparatus, and a system for dynamic load balancing in a hybrid network.

Background

Existing distributed switches act independently and do not communicate with one another, so they cannot cooperate well. Moreover, with existing distributed switches, existing load balancing methods have difficulty evaluating how much the action executed by each distributed switch actually contributes to the global return, which makes optimization difficult. Existing load balancing methods therefore suffer from two technical problems: the distributed switches cannot cooperate with one another, and the local policies in the distributed switches are difficult to optimize.

Disclosure of Invention

The invention aims to provide a method, a device and a system for dynamic load balancing in a hybrid network, so as to solve the technical problems in the prior art that distributed switches cannot cooperate and that the local policies in the distributed switches are difficult to optimize.

In a first aspect, the present invention provides a method for dynamic load balancing in a hybrid network, applied to a first distributed switch, including: after performing an action based on a local policy, sending parameter information of the first distributed switch to a centralized platform; receiving policy correction information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of a second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and updating the local policy based on the policy correction information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy, where the preset policy update formula introduces a baseline mechanism.

Further, updating the local policy based on the policy correction information and the preset policy update formula includes: performing a calculation based on the policy correction information and a preset baseline formula to obtain baseline information; and updating the local policy based on the baseline information and the preset policy update formula.

Further, the parameter information of the first distributed switch includes observation information of the first distributed switch and action information of the first distributed switch; before the action is performed based on the local policy, the method includes: acquiring the observation information and the action information of the first distributed switch; and determining the observation information and the action information of the first distributed switch as the parameter information.

In a second aspect, the present invention provides a method for dynamic load balancing in a hybrid network, applied to a centralized platform, including: receiving parameter information of a first distributed switch, sent by the first distributed switch after it performs an action based on a local policy; determining policy correction information based on the parameter information of the first distributed switch and the parameter information of a second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and sending the policy correction information to the first distributed switch, so that the first distributed switch updates the local policy based on the policy correction information and a preset policy update formula, where the preset policy update formula introduces a baseline mechanism.

Further, the method includes: sending the policy correction information to the second distributed switch, so that the second distributed switch updates its own local policy based on the policy correction information and the preset policy update formula.

In a third aspect, the present invention provides a hybrid intra-network dynamic load balancing apparatus applied to a first distributed switch, including: a first sending unit, configured to send parameter information of the first distributed switch to a centralized platform after an action is performed based on a local policy; a first receiving unit, configured to receive policy correction information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of a second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and an updating unit, configured to update the local policy based on the policy correction information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy, where the preset policy update formula introduces a baseline mechanism.

In a fourth aspect, the present invention provides a hybrid intra-network dynamic load balancing apparatus applied to a centralized platform, including: a second receiving unit, configured to receive parameter information of a first distributed switch, sent by the first distributed switch after it performs an action based on a local policy; a determining unit, configured to determine policy correction information based on the parameter information of the first distributed switch and the parameter information of a second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and a second sending unit, configured to send the policy correction information to the first distributed switch, so that the first distributed switch updates the local policy based on the policy correction information and a preset policy update formula, where the preset policy update formula introduces a baseline mechanism.

In a fifth aspect, the present invention further provides a hybrid intra-network dynamic load balancing system, including the first distributed switch according to the first aspect, the centralized platform according to the second aspect, and the second distributed switch.

In a sixth aspect, the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the hybrid intra-network dynamic load balancing method described above when executing the computer program.

In a seventh aspect, the present invention further provides a computer-readable medium storing non-volatile program code executable by a processor, where the program code causes the processor to execute the hybrid intra-network dynamic load balancing method described above.

The invention provides a dynamic load balancing method in a hybrid network, applied to a first distributed switch, comprising: after performing an action based on a local policy, sending parameter information of the first distributed switch to the centralized platform; receiving policy correction information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of the second distributed switch, where the second distributed switch denotes all distributed switches in the network other than the first distributed switch; and updating the local policy based on the policy correction information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy, where the preset policy update formula introduces a baseline mechanism. In the embodiment of the invention, the first distributed switch may be any distributed switch in the network and communicates with the centralized platform, as do all the other distributed switches, so that the distributed switches can cooperate. In addition, introducing the baseline mechanism effectively avoids the learning difficulty and improves optimization efficiency.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic diagram of a load balancing method in the prior art;

FIG. 2 is a framework diagram of a centralized platform and distributed switches;

FIG. 3 is a flowchart of a method for dynamic load balancing in a hybrid network according to an embodiment of the present invention;

FIG. 4 is a flowchart of a hybrid network load balancing method;

FIG. 5 is a convergence diagram of a hybrid intra-network dynamic load balancing method according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating different network topology environments according to an embodiment of the present invention;

FIG. 7 is a flowchart of another hybrid intra-network dynamic load balancing method according to an embodiment of the present invention;

FIG. 8 is a flowchart of a hybrid intra-network dynamic load balancing method according to another embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a hybrid intra-network dynamic load balancing apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of another hybrid intra-network dynamic load balancing apparatus according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a hybrid intra-network dynamic load balancing system according to an embodiment of the present invention.

Icon:

11-first sending unit; 12-first receiving unit; 13-updating unit; 14-second receiving unit; 15-determining unit; 16-second sending unit; 10-first distributed switch; 20-centralized platform; 30-second distributed switch.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Load balancing refers to distributing traffic evenly over the different available paths of a network to avoid congestion, and how well it works has a great impact on network transmission performance. As shown in FIG. 1, existing load balancing methods fall into two categories: load balancing based on a centralized architecture (SDN), shown in (a) of FIG. 1, and load balancing based on a distributed architecture (ECMP), shown in (b) of FIG. 1. Centralized load balancing continuously collects the global network state through a centralized network control plane (the SDN controller) and dynamically adjusts link loads according to congestion in the network. Distributed load balancing (e.g., ECMP) lets each switch distribute received traffic directly across multiple egress links to balance the load.
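For the distributed case, a common ECMP realization hashes each flow's header fields to choose one of the equal-cost egress links, so all packets of a flow follow one path while different flows spread out. The sketch below is illustrative only: it assumes a five-tuple flow key and uses CRC32 as a stand-in for a switch's (typically vendor-specific) hash function; `pick_egress` is a hypothetical name, not something defined in this document.

```python
import zlib
from typing import Sequence, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]


def pick_egress(flow: FlowKey, egress_links: Sequence[str]) -> str:
    """Map a flow to one of several equal-cost egress links.

    CRC32 stands in for the switch's hash; hashing the whole five-tuple
    keeps every packet of the same flow on the same link.
    """
    key = "|".join(str(field) for field in flow).encode()
    return egress_links[zlib.crc32(key) % len(egress_links)]


links = ["port1", "port2", "port3", "port4"]
flow = ("10.0.0.1", "10.0.1.7", 43512, 80, "tcp")
print(pick_egress(flow, links))
```

Because the hash ignores current link load, this static variant does not by itself react to congestion.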

In distributed load balancing, the control plane and the forwarding plane are tightly coupled in each network device, and each node has only a local view of the network and local control capability. In this setting, applying artificial intelligence and machine learning algorithms can lead to serious non-convergence, especially when the global optimum is the convergence target. In recent years, SDN architectures have decoupled the control plane from network hardware and provide a centralized control plane for overall network control. With machine learning algorithms deployed in a centralized controller, network hardware devices (which may be called agents, e.g., switches) can interact with the whole network environment and converge to a globally optimal solution. However, although centralized optimization is clearly necessary for intelligent network control, fully centralized artificial intelligence control incurs large overhead. This overhead includes both the communication between the distributed network element nodes (i.e., switches) and the centralized control plane, and the computation for training and inference on every packet. In a centralized architecture, constructing a forwarding path for a single data flow requires programming all routers; the routing calculation for each flow requires complex computation over the global network state, and the controller must recalculate the forwarding logic every time the network state changes. For large-scale, highly dynamic networks, this communication pressure and computational burden can therefore introduce excessive response latency. As the two extremes of system organization, "centralized" and "distributed" each have inherent weaknesses and strengths.
The distributed architecture tends to prevent learning results from converging but forwards and processes packets faster, whereas centralized learning with a global view and control tends to incur excessive communication and computational overhead.

In summary, deploying load balancing capability in distributed switching nodes (i.e., switches) forms a multi-agent system, and enabling cooperation among the agents is the key to running the system efficiently. Accordingly, the hybrid intra-network dynamic load balancing method is a hybrid reinforcement learning method that mainly addresses two problems of distributed agents:
1. The moving target problem: single-agent reinforcement learning models trained independently often share too little information between switches, so the switches cannot cooperate well and the load balancing effect is poor.
2. The credit assignment problem: because each agent (i.e., switch) is trained on the global reward, conventional reinforcement learning methods have difficulty evaluating how much the action taken by each agent actually contributes to the global reward, which makes optimization difficult.

Based on this, embodiments of the present invention provide a method, an apparatus and a system for dynamic load balancing in a hybrid network. To address problem 1, they adopt a "centralized learning, distributed execution" framework, as shown in FIG. 2: a centralized network platform (the centralized platform for short) collects state data of the whole network and continuously corrects the local policies of the distributed switches, so that the distributed switches cooperate. To address problem 2, introducing a baseline mechanism effectively avoids the learning difficulty and improves optimization efficiency.

For the convenience of understanding the embodiment, a detailed description will be first given of a hybrid intra-network dynamic load balancing method disclosed in the embodiment of the present invention.

Example 1:

In accordance with an embodiment of the present invention, an embodiment of a method for dynamic load balancing in a hybrid network is provided. The steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

Fig. 3 is a flowchart of a hybrid intra-network dynamic load balancing method according to an embodiment of the present invention, which is applied to a first distributed switch, and as shown in fig. 3, the method includes the following steps:

Step S101: after performing an action based on the local policy, send the parameter information of the first distributed switch to the centralized platform.

Step S102: receive the policy correction information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of the second distributed switch.

In an embodiment of the invention, the second distributed switch denotes all distributed switches in the network other than the first distributed switch.

Step S103: update the local policy based on the policy correction information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy; the preset policy update formula introduces a baseline mechanism.

In the method provided by the invention, applied to a first distributed switch, the first distributed switch first sends its parameter information to the centralized platform after performing an action based on its local policy; it then receives policy correction information that the centralized platform determines from the parameter information of the first distributed switch and of the second distributed switch (all other distributed switches in the network); finally, it updates its local policy based on the policy correction information and the preset policy update formula, which introduces a baseline mechanism, and executes its next action based on the updated local policy. The first distributed switch may be any distributed switch in the network; it communicates with the centralized platform, as do all the other distributed switches, so the centralized platform can coordinate them. In addition, introducing the baseline mechanism effectively avoids the learning difficulty and improves optimization efficiency.
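The exchange in steps S101 to S103 can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes a tabular softmax local policy π(a_i | o_i), assumes the policy correction information arrives as the global Q values Q(s, (a_i', a_{-i})) for every candidate action of switch i, and replaces the centralized platform with a hypothetical stub (`StubPlatform`) that returns fixed, made-up values. All class names are illustrative.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParameterInfo:
    """Report sent in step S101: the switch's local observation o_i and action a_i."""
    switch_id: int
    observation: int
    action: int


@dataclass
class PolicyCorrection:
    """Assumed shape of the S102 reply: Q(s, (a_i', a_-i)) for each candidate a_i'."""
    q_values: List[float]


class StubPlatform:
    """Hypothetical stand-in for the centralized platform; returns fixed Q values."""

    def __init__(self, q_values: List[float]):
        self.q_values = q_values
        self.reports: List[ParameterInfo] = []

    def report(self, info: ParameterInfo) -> PolicyCorrection:
        self.reports.append(info)
        return PolicyCorrection(q_values=list(self.q_values))


class DistributedSwitch:
    def __init__(self, switch_id: int, n_actions: int, lr: float = 0.1):
        self.switch_id = switch_id
        self.n_actions = n_actions
        self.lr = lr
        # Tabular softmax policy pi(a_i | o_i): one preference vector per observation.
        self.prefs: Dict[int, List[float]] = {}

    def policy(self, obs: int) -> List[float]:
        z = self.prefs.setdefault(obs, [0.0] * self.n_actions)
        m = max(z)  # subtract the max for numerical stability
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return [v / total for v in e]

    def step(self, obs: int, platform: StubPlatform) -> int:
        probs = self.policy(obs)
        # Act on the local policy, then report (o_i, a_i) and receive the correction (S101, S102).
        action = random.choices(range(self.n_actions), weights=probs)[0]
        correction = platform.report(ParameterInfo(self.switch_id, obs, action))
        q = correction.q_values
        # S103: baseline b = sum_a' pi(a'|o_i) Q(s,(a',a_-i)); advantage = Q(s,a) - b.
        baseline = sum(p * qv for p, qv in zip(probs, q))
        advantage = q[action] - baseline
        # Policy-gradient step: d log pi(action) / d pref_k = 1[k == action] - pi_k.
        z = self.prefs[obs]
        for k in range(self.n_actions):
            z[k] += self.lr * advantage * ((1.0 if k == action else 0.0) - probs[k])
        return action


random.seed(0)
platform = StubPlatform(q_values=[1.0, 0.2, 0.5])
switch = DistributedSwitch(switch_id=0, n_actions=3)
for _ in range(200):
    switch.step(obs=0, platform=platform)
print(switch.policy(0))
```

Because action 0 carries the highest Q value in the stub, the update shifts probability toward it; in a deployment, the Q values would come from the centrally trained global Q function instead of a fixed list.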

In an alternative embodiment, step S103, updating the local policy based on the policy correction information and the preset policy update formula, includes the following steps:

Step 1: perform a calculation based on the policy correction information and a preset baseline formula to obtain baseline information;

Step 2: update the local policy based on the baseline information and the preset policy update formula.

In the embodiment of the present invention, assume there are N distributed switches and take the i-th distributed switch as the first distributed switch. The policy correction information is denoted Q(state, action), or Q(s, a), and the preset baseline formula is:

$$ b(s, a_{-i}) = \sum_{a_i'} \pi(a_i' \mid o_i)\, Q\big(s, (a_i', a_{-i})\big) $$

where b(s, a_{-i}) is the baseline information (also called the reference variable); a_{-i} denotes the actions of all distributed switches other than the i-th; s is the network-wide state, i.e., the set of all local states o_i; a_i is the action information of the i-th distributed switch; π(a_i | o_i) is the local policy, i.e., π(action | observation); o_i is the local state, i.e., the observation of the i-th distributed switch (also called the local observation); and Q(s, (a_i', a_{-i})) is the policy correction information obtained by fixing the actions a_{-i} of all switches other than i and varying a_i, which may be called the global Q value, i.e., the Q(s, a) described above.

The preset baseline formula characterizes the expected global Q value over all possible action choices of the i-th distributed switch; this expectation, b(s, a_{-i}), is the baseline in the conventional sense.
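For a small discrete action set, the preset baseline formula can be evaluated directly. A minimal sketch, assuming `q` is a callable that returns the global Q value Q(s, a) for the current state s and a joint-action tuple, and `pi_i` lists π(a_i' | o_i) for each candidate action of switch i; the function names and the toy Q table below are illustrative, not from the source.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_baseline(
    q: Callable[[JointAction], float],
    joint_action: JointAction,
    i: int,
    pi_i: Sequence[float],
) -> float:
    """b(s, a_-i) = sum over a_i' of pi(a_i' | o_i) * Q(s, (a_i', a_-i)).

    Only switch i's entry is varied; every other switch keeps the action it took.
    """
    total = 0.0
    for a_alt, prob in enumerate(pi_i):
        alt = list(joint_action)
        alt[i] = a_alt
        total += prob * q(tuple(alt))
    return total


def advantage(
    q: Callable[[JointAction], float],
    joint_action: JointAction,
    i: int,
    pi_i: Sequence[float],
) -> float:
    """Q(s, a) - b(s, a_-i): how much switch i's actual choice beats its own average."""
    return q(joint_action) - counterfactual_baseline(q, joint_action, i, pi_i)


# Toy global Q table for two switches with two actions each (made-up values).
q_table = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 0.0, (1, 1): 2.0}
print(counterfactual_baseline(q_table.__getitem__, (1, 0), 0, [0.5, 0.5]))  # 2.0
print(advantage(q_table.__getitem__, (1, 0), 0, [0.5, 0.5]))                # 1.0
```

Keeping a_{-i} at the other switches' actual actions is what lets Q(s, a) − b(s, a_{-i}) isolate switch i's own contribution.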

Further, the existing policy update formula is:

$$ g = \mathbb{E}_{\pi}\left[ \nabla \log \pi(a_i \mid o_i)\, Q(s, a) \right] $$

where E denotes the expectation, ∇ denotes the gradient with respect to the parameters of the local policy, and g tending toward zero indicates that the local policy has been optimized; π(a_i | o_i) is the continuously updated local policy, and Q(s, a) is the policy correction information.

Incorporating the baseline information into the existing policy update formula gives the preset policy update formula:

$$ g = \mathbb{E}_{\pi}\left[ \nabla \log \pi(a_i \mid o_i)\, \big( Q(s, a) - b(s, a_{-i}) \big) \right] $$

This introduces a baseline mechanism, which effectively addresses the reward (credit) allocation problem. According to the preset policy update formula, the update of the local policy π(a_i | o_i) on the i-th distributed switch depends on the global Q value, which is computed centrally.
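Subtracting the baseline does not bias the update. This follows from a standard policy-gradient identity, given here as background rather than taken from the source: b(s, a_{-i}) does not depend on a_i, so its term has zero expectation under π(a_i | o_i).

```latex
\mathbb{E}_{a_i \sim \pi(\cdot \mid o_i)}\!\left[\nabla \log \pi(a_i \mid o_i)\, b(s, a_{-i})\right]
  = b(s, a_{-i}) \sum_{a_i} \pi(a_i \mid o_i)\, \frac{\nabla \pi(a_i \mid o_i)}{\pi(a_i \mid o_i)}
  = b(s, a_{-i})\, \nabla \sum_{a_i} \pi(a_i \mid o_i)
  = b(s, a_{-i})\, \nabla 1 = 0
```

The expected gradient with the baseline therefore equals the expected gradient without it, while Q(s, a) − b(s, a_{-i}) measures how much switch i's chosen action improves on its own policy-weighted average.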

Specifically, the global Q value is computed by a neural network function, so it can be written Q_θ(s, a), and the network can be updated with the Adam optimizer. A neural network is one option for representing Q_θ(s, a); other models may also be used, and the embodiment of the present invention does not limit the type of model. Taking a neural network trained with the Adam optimizer as an example, the update formula is:

$$ L(\theta) = \mathbb{E}_{(s,a,s',a') \sim D}\left[ \big( Q_\theta(s, a) - y \big)^2 \right] $$

with

$$ y = r(s, a) + \gamma\, \mathbb{E}_{a' \sim \pi(s')}\left[ Q_\theta(s', a') \right] $$

In the formulas, θ is the parameter vector to be optimized in the neural network function. L(θ) is the objective function value: it measures the difference between the global Q_θ(s, a) computed by the network under parameters θ and the optimization target y, so a smaller objective value means the network predicts Q_θ(s, a) more accurately. s is the state at the current moment and s′ the state at the next moment; a is the current action and a′ the next action. Q_θ(s, a) is the Q value computed by the network for the current state s and current action a. D is a set of (s, a, s′, a′) tuples: because each tuple requires the data (s′, a′) of the moment after (s, a), the tuples are stored as they are collected, and after N time steps the expectation E_{(s,a,s′,a′)} over these N tuples is used to calculate L(θ). r(s, a) is the immediate reward obtained by taking action a in state s. γ is the discount factor, a standard parameter in reinforcement learning. π is the set of local policies of all switches, and a′ ∼ π(s′) means the next action a′ is chosen by the local policies π in state s′. E denotes the expectation. Q_θ(s′, a′) is the Q value computed by the network for the next state s′ and next action a′.
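To make the objective concrete, the sketch below fits a linear stand-in for Q_θ(s, a) with a hand-written Adam update in NumPy. It is an illustration under stated assumptions, not the network used in the embodiment: the feature vectors φ(s, a) are assumed to be precomputed, the stored next action a′ in each (s, a, s′, a′) tuple serves as a one-sample estimate of the expectation over a′ ∼ π(s′), y is held fixed when differentiating (a common semi-gradient choice), and `AdamLinearCritic` is a hypothetical name. β1 = 0.9, β2 = 0.999 and ε = 1e-8 are Adam's usual defaults; the learning rate and γ are arbitrary.

```python
import numpy as np


class AdamLinearCritic:
    """Hypothetical linear stand-in for Q_theta(s, a) = theta . phi(s, a), trained with Adam."""

    def __init__(self, dim, lr=0.05, gamma=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
        self.theta = np.zeros(dim)
        self.lr, self.gamma = lr, gamma
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = np.zeros(dim)  # Adam first-moment estimate
        self.v = np.zeros(dim)  # Adam second-moment estimate
        self.t = 0

    def q(self, phi):
        """Q_theta for a batch of feature rows phi(s, a)."""
        return phi @ self.theta

    def loss_and_grad(self, phi, reward, phi_next):
        """L(theta) = mean over the stored batch D of (Q_theta(s, a) - y)^2.

        y = r(s, a) + gamma * Q_theta(s', a'), using the stored next action a' as a
        one-sample estimate of E_{a'~pi(s')}; y is treated as a constant (semi-gradient).
        """
        y = reward + self.gamma * self.q(phi_next)
        err = self.q(phi) - y
        loss = float(np.mean(err ** 2))
        grad = 2.0 * phi.T @ err / len(err)
        return loss, grad

    def adam_step(self, grad):
        """One Adam update with bias-corrected moment estimates."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        self.theta -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


# Usage on random placeholder data: N stored tuples with 4-dimensional features.
rng = np.random.default_rng(0)
N, dim = 64, 4
phi = rng.normal(size=(N, dim))       # phi(s, a) for each stored tuple
phi_next = rng.normal(size=(N, dim))  # phi(s', a') for each stored tuple
reward = rng.normal(size=N)           # r(s, a)
critic = AdamLinearCritic(dim)
for _ in range(300):
    loss, grad = critic.loss_and_grad(phi, reward, phi_next)
    critic.adam_step(grad)
print(loss)
```

A neural network would replace `q` and the hand-derived gradient with automatic differentiation; the structure of target, loss and Adam step stays the same.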

The embodiment of the invention introduces θ so that the Q value is computed by a neural network; that is, Q_θ(s, a) becomes a function that can be computed and updated, and the Adam optimizer can continuously reduce L(θ). After the Adam optimizer updates the global Q value, the centralized platform issues the global Q value, which, through the preset policy update formula, corrects the policy π(a_i | o_i) on the first distributed switch, thereby enabling cooperation among the distributed switches.

In an alternative embodiment, the parameter information of the first distributed switch includes observation information of the first distributed switch (o_i in FIG. 4) and action information of the first distributed switch (a_i in FIG. 4). Before the action is performed based on the local policy, the method includes the following steps:

Step 10: acquire the observation information and the action information of the first distributed switch;

Step 20: determine the observation information and the action information of the first distributed switch as the parameter information.

The centralized platform is used only to compute the policy correction information, and it can be trained on the actions of all distributed switches and the global network state. The first distributed switch both updates its local policy and executes actions, and its training relies only on its own observation information (local observation); that is, it learns the local policy π(action | observation) rather than π(action | state).
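The split between what each side sees can be shown through input construction alone. A minimal sketch, assuming each local observation o_i is a small numeric vector (per-egress-link utilization here, a hypothetical choice) and actions are indices into a discrete set; the function names are illustrative. The centrally trained Q function receives the global state s (all o_i) plus the joint action, while switch i's local policy receives only o_i.

```python
from typing import List

import numpy as np


def critic_input(observations: List[np.ndarray], actions: List[int], n_actions: int) -> np.ndarray:
    """Centralized side: global state s = (o_1, ..., o_N) plus the one-hot joint action a."""
    one_hots = [np.eye(n_actions)[a] for a in actions]
    return np.concatenate(observations + one_hots)


def actor_input(observations: List[np.ndarray], i: int) -> np.ndarray:
    """Distributed side: switch i's local policy pi(a_i | o_i) sees only o_i."""
    return observations[i]


# Three switches, each observing the utilization of its two egress links (made-up values).
obs = [np.array([0.7, 0.2]), np.array([0.4, 0.4]), np.array([0.1, 0.9])]
joint_action = [0, 1, 0]
print(critic_input(obs, joint_action, n_actions=2).shape)  # (12,)
print(actor_input(obs, 1))
```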

The present application designs a hybrid (centralized + distributed) network load balancing method. As shown in FIG. 4, the method combines centralized and distributed intelligence, pairing the global optimization capability of centralized control with the fast processing and forwarding capability of distributed control. The centralized platform continuously corrects the policy of the first distributed switch, achieving centralized learning with distributed decision-making: actions are executed in a distributed manner on the first distributed switch while overall convergence is still guaranteed. As FIG. 4 shows, from the start of training, switch i (i.e., the first distributed switch) obtains its local state o_i, and throughout training it updates and learns its local policy using the preset policy update formula.

As shown in FIG. 5, the upper curve corresponds to the hybrid intra-network dynamic load balancing method of the present application and the lower curve to a conventional load balancing method. The abscissa is the number of training iterations and the ordinate is the network-wide revenue value (i.e., network bandwidth utilization). FIG. 5 shows that the method converges effectively during operation and reaches better performance than the conventional method. As shown in FIG. 6, the curve with inverted-triangle markers represents the method proposed in the present application, and the curves with circle and square markers represent two existing load balancing methods. The abscissa is the number of training iterations and the ordinate is the network-wide revenue value; FIG. 6 shows that the proposed method achieves better performance in different network topology environments.

Compared with fully centralized and fully distributed solutions, the hybrid intra-network dynamic load balancing method provided by the embodiment of the invention adopts a "centralized learning, distributed execution" architecture in which a centralized network control platform corrects the policy learning process of the first distributed switch. This framework enables cooperation and learning among all distributed switches while ensuring that the load balancing policy is executed in a distributed manner; that is, the method responds quickly to network state while achieving network-wide convergence.

Example 2:

Fig. 7 is a flowchart of another hybrid intra-network dynamic load balancing method according to an embodiment of the present invention, which is applied to a centralized platform, and as shown in fig. 7, the method includes the following steps:

Step S201: receive the parameter information of the first distributed switch, sent by the first distributed switch after it performs an action based on its local policy.

Step S202: determine the policy correction information based on the parameter information of the first distributed switch and the parameter information of the second distributed switch.

Step S203: send the policy correction information to the first distributed switch, so that the first distributed switch updates its local policy based on the policy correction information and the preset policy update formula; the preset policy update formula introduces a baseline mechanism.

In the method provided by the invention, applied to a centralized platform, the centralized platform first receives the parameter information of the first distributed switch, sent after the first distributed switch performs an action based on its local policy; it then determines policy correction information based on the parameter information of the first distributed switch and of the second distributed switch (all other distributed switches in the network); finally, it sends the policy correction information to the first distributed switch, so that the first distributed switch updates its local policy based on the policy correction information and the preset policy update formula, which introduces a baseline mechanism. The first distributed switch may be any distributed switch in the network; it communicates with the centralized platform, as do all the other distributed switches. In addition, introducing the baseline mechanism effectively avoids the learning difficulty and improves optimization efficiency.

In an optional embodiment, as shown in Fig. 8, the method further comprises:

Step S204: send the policy correction information to the second distributed switch, so that the second distributed switch updates its own local policy based on the policy correction information and the preset policy update formula.
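The text does not define how the centralized platform turns the reported parameter information into policy correction information. One plausible reading, consistent with the later reference to the credit assignment problem, is a counterfactual evaluation: the platform scores each candidate action of a given switch while holding the other switches' reported actions fixed, and returns that score vector as the correction. The Python sketch below walks through steps S201 to S204 under that assumption. The names `platform_round`, `counterfactual_values`, and `neg_max_link_load`, and the toy objective (negative load on the busiest of two paths), are hypothetical; observations are omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical centralized evaluator: maps a joint action (entry i is the
# action reported by switch i) to a scalar score, higher is better.
JointValue = Callable[[Sequence[int]], float]


def neg_max_link_load(joint_action: Sequence[int], num_paths: int = 2) -> float:
    """Toy objective (assumption): each switch sends one unit of traffic on
    the path it chose; the score is minus the load on the busiest path."""
    load = [0] * num_paths
    for path in joint_action:
        load[path] += 1
    return -float(max(load))


def counterfactual_values(joint_action: Sequence[int], switch_id: int,
                          num_actions: int, joint_value: JointValue) -> List[float]:
    """S202 (assumed form): score every alternative action of `switch_id`
    while all other switches keep the actions they reported."""
    values = []
    for alt in range(num_actions):
        trial = list(joint_action)
        trial[switch_id] = alt
        values.append(joint_value(trial))
    return values


def platform_round(reported_actions: Sequence[int], num_actions: int,
                   joint_value: JointValue) -> Dict[int, List[float]]:
    """S201: `reported_actions[i]` is the action switch i executed.
    S203/S204: one correction message per switch, for the first switch and
    for every second switch alike."""
    return {i: counterfactual_values(reported_actions, i, num_actions, joint_value)
            for i in range(len(reported_actions))}


# Both switches chose path 0, so each one's correction favours path 1.
messages = platform_round([0, 0], num_actions=2, joint_value=neg_max_link_load)
print(messages)
```

Because only the reporting switch's action varies inside `counterfactual_values`, the spread between its entries isolates that switch's contribution to the shared objective, which is one common way to approach credit assignment.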

The hybrid intra-network dynamic load balancing method provided by the embodiment of the invention enables cooperative learning among the distributed switching nodes and addresses the moving-target (non-stationarity) and credit-assignment problems that traditional single-agent reinforcement learning faces in a multi-agent environment.

Example 3:

The embodiment of the present invention provides a hybrid intra-network dynamic load balancing apparatus, mainly used to execute the hybrid intra-network dynamic load balancing method provided in Example 1. The apparatus is described in detail below.

Fig. 9 is a schematic structural diagram of a hybrid intra-network dynamic load balancing apparatus according to an embodiment of the present invention. As shown in Fig. 9, the apparatus is applied to a first distributed switch and mainly includes a first sending unit 11, a first receiving unit 12, and an updating unit 13, wherein:

a first sending unit 11, configured to send parameter information of the first distributed switch to the centralized platform after performing the action based on the local policy;

a first receiving unit 12, configured to receive policy modification information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

an updating unit 13, configured to update the local policy based on the policy modification information and a preset policy update formula, so that the first distributed switch executes its next action based on the updated local policy, where the preset policy update formula incorporates a benchmark mechanism.

The invention provides a dynamic load balancing device in a hybrid network, applied to a first distributed switch. The first sending unit 11 sends the parameter information of the first distributed switch to the centralized platform after an action has been executed based on the local policy. The first receiving unit 12 then receives the policy correction information that the centralized platform determines based on the parameter information of the first distributed switch and that of the second distributed switch, where the second distributed switch refers to all distributed switches in the network other than the first. Finally, the updating unit 13 updates the local policy based on the policy correction information and a preset policy update formula incorporating a benchmark mechanism, so that the first distributed switch executes its next action based on the updated local policy. In this embodiment, the first distributed switch is any distributed switch in the network; it communicates with the centralized platform, as do all the other distributed switches. Through the first sending unit 11 and the first receiving unit 12, the first distributed switch communicates only with the centralized platform, while the centralized platform communicates with every distributed switch. The platform can therefore correct the local policy of each switch, so that the distributed switches cooperate while policy execution remains distributed. In addition, the benchmark mechanism used by the updating unit 13 mitigates learning difficulty and improves optimization efficiency.

Optionally, the updating unit 13 includes:

a calculation module, configured to calculate reference information based on the policy correction information and a preset reference formula; and

an updating module, configured to update the local policy based on the reference information and the preset policy update formula.
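The reference formula and the policy update formula are not given in the text. As a hedged sketch, assume the local policy is a softmax over per-action preferences, the policy correction information is a vector Q of values for the switch's candidate actions, the reference (benchmark) value is the policy-weighted average b = sum over a of pi(a) * Q(a), and the update is a policy-gradient step on the advantage A = Q(a_t) - b, i.e. pref_k += learning_rate * A * (1[k = a_t] - pi(k)). This resembles the counterfactual baseline used in some multi-agent reinforcement learning methods, but it is an illustrative choice; `compute_reference`, `update_local_policy`, and `learning_rate` are hypothetical names.

```python
import math
from typing import List, Sequence


def softmax(prefs: Sequence[float]) -> List[float]:
    """Local policy: action probabilities from per-action preferences."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def compute_reference(probs: Sequence[float], correction: Sequence[float]) -> float:
    """Calculation module (assumed formula): b = sum_a pi(a) * Q(a)."""
    return sum(p * q for p, q in zip(probs, correction))


def update_local_policy(prefs: Sequence[float], taken_action: int,
                        correction: Sequence[float],
                        learning_rate: float = 0.1) -> List[float]:
    """Updating module (assumed formula): policy-gradient step on Q(a_t) - b."""
    probs = softmax(prefs)
    advantage = correction[taken_action] - compute_reference(probs, correction)
    new_prefs = []
    for k, pref in enumerate(prefs):
        # d log pi(a_t) / d pref_k = 1[k == a_t] - pi(k)
        grad = (1.0 if k == taken_action else 0.0) - probs[k]
        new_prefs.append(pref + learning_rate * advantage * grad)
    return new_prefs


# The switch took action 0, but the correction says action 1 would have
# scored higher, so preference shifts from action 0 toward action 1.
print(update_local_policy([0.0, 0.0], 0, [-2.0, -1.0]))
```

Subtracting the benchmark does not change the expected direction of the update, because b does not depend on which action was taken; it only reduces the variance of each step, which is one reason a baseline can make learning easier.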

Optionally, the parameter information of the first distributed switch includes observation information and action information of the first distributed switch. Before the action is performed based on the local policy, the apparatus is further configured to:

acquire the observation information and the action information of the first distributed switch; and

determine the observation information and the action information of the first distributed switch as the parameter information.
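The text only states that the parameter information consists of the switch's observation information and action information. The sketch below shows one way a switch might bundle and serialize them before reporting to the centralized platform. What the observation contains (here, per-egress-port utilization), what the action is (here, the index of the chosen egress port), the names `ParameterInfo` and `build_parameter_info`, and the JSON encoding are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass
class ParameterInfo:
    switch_id: int
    observation: List[float]  # assumed: utilization of each egress port, 0.0 to 1.0
    action: int               # assumed: index of the egress port chosen


def build_parameter_info(switch_id: int, port_utilization: Sequence[float],
                         chosen_port: int) -> ParameterInfo:
    """Bundle the acquired observation and the chosen action as parameter information."""
    if not 0 <= chosen_port < len(port_utilization):
        raise ValueError(f"port {chosen_port} out of range")
    return ParameterInfo(switch_id, list(port_utilization), chosen_port)


def encode(info: ParameterInfo) -> str:
    """Serialize the report for transmission to the centralized platform."""
    return json.dumps(asdict(info))


def decode(payload: str) -> ParameterInfo:
    """Platform side: rebuild the report from the received message."""
    return ParameterInfo(**json.loads(payload))


info = build_parameter_info(switch_id=3, port_utilization=[0.2, 0.9], chosen_port=0)
print(encode(info))
```

A plain dataclass keeps the report format explicit on both ends; a production system would more likely use a compact binary encoding, but the fields would be the same.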

Compared with fully centralized and fully distributed solutions, the hybrid intra-network dynamic load balancing device provided by the embodiment of the invention adopts a centralized-learning, distributed-execution architecture, in which a centralized network control platform corrects the policy learning process of the first distributed switch. This architecture enables cooperation and joint learning among all distributed switches while the load balancing policy is still executed in a distributed manner. In other words, it responds quickly to changes in network state while also achieving network-wide convergence.

Example 4:

The embodiment of the present invention provides a hybrid intra-network dynamic load balancing apparatus, mainly used to execute the hybrid intra-network dynamic load balancing method provided in Example 2 above. The apparatus is described in detail below.

Fig. 10 is a schematic structural diagram of another hybrid intra-network dynamic load balancing apparatus according to an embodiment of the present invention. As shown in Fig. 10, the apparatus is applied to a centralized platform and mainly includes a second receiving unit 14, a determining unit 15, and a second sending unit 16, wherein:

a second receiving unit 14, configured to receive the parameter information of the first distributed switch, sent by the first distributed switch after it performs an action based on its local policy;

a determining unit 15, configured to determine policy modification information based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

a second sending unit 16, configured to send the policy modification information to the first distributed switch, so that the first distributed switch updates the local policy based on the policy modification information and a preset policy update formula; the preset strategy updating formula introduces a benchmark mechanism.

The invention provides a dynamic load balancing device in a hybrid network, applied to a centralized platform. The second receiving unit 14 receives the parameter information that the first distributed switch sends after executing an action based on its local policy. The determining unit 15 then determines policy correction information based on the parameter information of the first distributed switch and that of the second distributed switch, where the second distributed switch refers to all distributed switches in the network other than the first. Finally, the second sending unit 16 sends the policy correction information to the first distributed switch, so that the first distributed switch updates its local policy based on that information and the preset policy update formula, which incorporates a benchmark mechanism. In this embodiment, the first distributed switch is any distributed switch in the network; it communicates with the centralized platform, as do all the other distributed switches. The second receiving unit 14 enables communication between the centralized platform and each distributed switch, and the determining unit 15 determines dynamic policy correction information, so that the local policy of each switch can be corrected and the distributed switches can cooperate while execution remains distributed. In addition, introducing the benchmark mechanism mitigates learning difficulty and improves optimization efficiency.

Further, the second sending unit 16 is further configured to send policy modification information to the second distributed switch, so that the second distributed switch updates the local policy of the second distributed switch based on the policy modification information and the preset policy update formula.

Example 5:

Fig. 11 is a schematic structural diagram of a hybrid intra-network dynamic load balancing system according to an embodiment of the present invention. As shown in Fig. 11, the system includes a first distributed switch 10, a centralized platform 20, and a second distributed switch 30.

The hybrid intra-network dynamic load balancing system provided by the embodiment of the invention enables cooperative learning between the first distributed switch and the second distributed switch, and addresses the moving-target and credit-assignment problems that traditional single-agent reinforcement learning faces in a multi-agent environment. Compared with fully centralized and fully distributed solutions, the system adopts a centralized-learning, distributed-execution architecture, in which a centralized network control platform corrects the policy learning process of the first distributed switch. This architecture enables cooperation and joint learning among all distributed switches while the load balancing policy is still executed in a distributed manner; that is, it responds quickly to network state while also achieving network-wide convergence.
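To show how the parts of the system might interact, the following self-contained toy couples two distributed switches with a centralized platform. Every detail is an assumption made for illustration: each switch routes one unit of traffic over one of two paths; the platform scores the joint action by the negative load on the busiest path and returns counterfactual values as the policy correction information; and each switch subtracts a policy-weighted benchmark before a softmax policy-gradient update. The names `Switch`, `joint_value`, `platform_corrections`, and `train` are hypothetical. Action selection stays local to each switch; only the scoring is centralized.

```python
import math
import random
from typing import List, Sequence

NUM_PATHS = 2  # toy assumption: each switch picks one of two paths per round


def softmax(prefs: Sequence[float]) -> List[float]:
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def joint_value(joint_action: Sequence[int]) -> float:
    """Platform's global view (assumed): minus the load on the busiest path."""
    load = [0] * NUM_PATHS
    for path in joint_action:
        load[path] += 1
    return -float(max(load))


def platform_corrections(joint_action: Sequence[int]) -> List[List[float]]:
    """Centralized learning: counterfactual value of each action, per switch."""
    corrections = []
    for i in range(len(joint_action)):
        row = []
        for alt in range(NUM_PATHS):
            trial = list(joint_action)
            trial[i] = alt
            row.append(joint_value(trial))
        corrections.append(row)
    return corrections


class Switch:
    """Distributed execution: each switch samples from its own local policy."""

    def __init__(self, rng: random.Random, learning_rate: float = 0.5) -> None:
        self.prefs = [0.0] * NUM_PATHS
        self.rng = rng
        self.learning_rate = learning_rate
        self.last_action = 0

    def act(self) -> int:
        probs = softmax(self.prefs)
        self.last_action = self.rng.choices(range(NUM_PATHS), weights=probs)[0]
        return self.last_action  # reported to the platform (observations omitted)

    def update(self, correction: Sequence[float]) -> None:
        probs = softmax(self.prefs)
        benchmark = sum(p * q for p, q in zip(probs, correction))
        advantage = correction[self.last_action] - benchmark
        for k in range(NUM_PATHS):
            grad = (1.0 if k == self.last_action else 0.0) - probs[k]
            self.prefs[k] += self.learning_rate * advantage * grad


def train(rounds: int = 2000, seed: int = 0) -> List[Switch]:
    rng = random.Random(seed)
    switches = [Switch(rng), Switch(rng)]
    for _ in range(rounds):
        joint_action = [s.act() for s in switches]        # act locally, then report
        corrections = platform_corrections(joint_action)  # platform scores centrally
        for switch, correction in zip(switches, corrections):
            switch.update(correction)                     # each switch updates locally
    return switches


trained = train()
print([max(range(NUM_PATHS), key=lambda k: s.prefs[k]) for s in trained])
```

Under these assumptions the two switches tend to settle on different paths, which is the balanced outcome. Neither switch observes the other's choice directly; the coordination comes entirely from the platform's correction.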

The device and the system provided by the embodiments of the invention share the implementation principle and technical effects of the method embodiments. For brevity, where the device and system embodiments omit details, refer to the corresponding content in the method embodiments.

In an optional embodiment, the present embodiment further provides an electronic device, which includes a memory and a processor, where the memory stores a computer program operable on the processor, and the processor executes the computer program to implement the steps of the method of the foregoing method embodiment.

In an alternative embodiment, the present embodiment also provides a computer readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of the above method embodiment.

In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "linked," and "connected" are to be construed broadly: for example, as a fixed, removable, or integral connection; as a mechanical or electrical connection; as a direct connection or an indirect one through an intermediate medium; or as internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.

In the description of the present embodiment, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be configured and operated in a specific orientation, and thus, should not be construed as limiting the present embodiment. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided herein, it should be understood that the disclosed method and apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical division, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some communication interfaces, and may be electrical, mechanical, or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present embodiment, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. A hybrid intra-network dynamic load balancing method, applied to a first distributed switch, comprising the following steps:

after performing an action based on a local policy, sending parameter information for the first distributed switch to a centralized platform;

receiving policy modification information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

updating the local policy based on the policy modification information and a preset policy update formula, so that the first distributed switch executes a next action based on the updated local policy; wherein a benchmark mechanism is introduced into the preset policy update formula.

2. The method of claim 1, wherein updating the local policy based on the policy modification information and a preset policy update formula comprises:

calculating reference information based on the policy modification information and a preset reference formula; and

updating the local policy based on the reference information and the preset policy update formula.

3. The method of claim 1, wherein the parameter information of the first distributed switch comprises: observation information of the first distributed switch and action information of the first distributed switch;

prior to performing the action based on the local policy, the method includes:

determining observation information of the first distributed switch and action information of the first distributed switch as the parameter information.

4. A dynamic load balancing method in a hybrid network, applied to a centralized platform, comprising the following steps:

receiving parameter information of a first distributed switch sent by the first distributed switch after performing an action based on a local policy;

determining policy correction information based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

sending the policy correction information to the first distributed switch, so that the first distributed switch updates the local policy based on the policy correction information and a preset policy update formula; wherein a benchmark mechanism is introduced into the preset policy update formula.

5. The method of claim 4, further comprising:

sending the policy correction information to the second distributed switch, so that the second distributed switch updates its local policy based on the policy correction information and the preset policy update formula.

6. A hybrid intra-network dynamic load balancing apparatus applied to a first distributed switch, comprising:

a first sending unit, configured to send parameter information of the first distributed switch to a centralized platform after performing an action based on a local policy;

a first receiving unit, configured to receive policy modification information determined by the centralized platform based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

an updating unit, configured to update the local policy based on the policy modification information and a preset policy update formula, so that the first distributed switch executes a next action based on the updated local policy; wherein a benchmark mechanism is introduced into the preset policy update formula.

7. A dynamic load balancing device in a hybrid network, applied to a centralized platform, comprising:

a second receiving unit, configured to receive parameter information of a first distributed switch sent by the first distributed switch after performing an action based on a local policy;

a determining unit, configured to determine policy modification information based on the parameter information of the first distributed switch and the parameter information of the second distributed switch; the second distributed switch is all other distributed switches except the first distributed switch in the network;

a second sending unit, configured to send the policy modification information to the first distributed switch, so that the first distributed switch updates the local policy based on the policy modification information and a preset policy update formula; wherein a benchmark mechanism is introduced into the preset policy update formula.

8. A hybrid intra-network dynamic load balancing system, comprising a first distributed switch implementing the method of any one of claims 1 to 3, a centralized platform implementing the method of any one of claims 4 to 5, and a second distributed switch.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any of claims 1 to 5 when executing the computer program.

10. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 5.