WO2024021486A1 - Load balancing method and system, and electronic device and storage medium - Google Patents

Load balancing method and system, and electronic device and storage medium

Info

Publication number
WO2024021486A1
Authority
WO
WIPO (PCT)
Prior art keywords
switch
pheromone
real
path
central controller
Prior art date
Application number
PCT/CN2022/141797
Other languages
French (fr)
Chinese (zh)
Inventor
邹晟
张翼
陈玉鹏
侯飞
Original Assignee
天翼云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天翼云科技有限公司
Publication of WO2024021486A1 publication Critical patent/WO2024021486A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • the invention relates to the field of network security technology, and specifically to a load balancing method, system, electronic equipment and storage medium.
  • embodiments of the present invention provide a load balancing method, system, electronic device, and storage medium to balance the load on the switch, reduce response time, and improve user experience.
  • an embodiment of the present invention provides a load balancing method applied to a central controller.
  • the method includes:
  • the initial weight is calculated based on the number of task flows;
  • the real-time health matrix is compared with a preset health matrix, and the load of the switch is adjusted according to the comparison result.
  • the load balancing method obtains the gradient parameters of the local model calculated locally by the switch, optimizes the global model based on the gradient parameters, and controls the switch to optimize its local model based on the global model, thereby calculating the resource usage of the switch.
  • based on the ant colony algorithm and the obtained resource usage of the switch, the optimal allocation, that is, the number of task flows on each switch, is determined; the weight of each switch is determined from its number of task flows, a health coefficient is set, and a real-time health matrix is obtained based on the switch weights.
  • by comparing the real-time health matrix with the preset health matrix, the load of the switch is adjusted, thereby achieving load balancing of the switches, reducing service response time, and improving user experience.
  • determining the number of task flows on the switch based on the sub-pheromone concentration includes:
  • the number of task flows on the switch is determined based on the optimized path.
  • the pheromone concentration is calculated using the following formula:
  • τ ij (t+1) = (1-ρ)τ ij (t) + Δτ ij (t)
  • where ρ represents the degree of pheromone volatilization, Δτ ij (t) represents the total amount of pheromone released by the ant colony on the path, and τ ij (t+1) represents the pheromone concentration on the path between switch i and switch j at time t+1.
  • obtaining the real-time health coefficient of the switch and combining the real-time health coefficient with the initial weight to obtain a real-time health matrix includes:
  • an embodiment of the present invention provides a load balancing method, applied to a switch, and the method includes:
  • the global model is obtained by the central controller by performing optimization based on the above gradient parameters;
  • the sub-pheromone concentration is determined based on the resource usage and the ant colony algorithm, and the sub-pheromone concentration is sent to the central controller to adjust the load.
  • the determination of the sub-pheromone concentration based on the usage rate of the resource and the ant colony algorithm includes:
  • the sub-pheromone concentration is calculated based on the new pheromones on the path and the ant circle model.
  • the probability of the ants moving to other switches is calculated using the following formula:
  • P k ij (t) = [τ ij (t)] α [η ij ] β / Σ s∈allowed_k [τ is (t)] α [η is ] β , if j ∈ allowed k , and 0 otherwise
  • P k ij (t) represents the probability that ant k visits switch j at the next moment, α represents the sensitivity of the ants to pheromone, β represents the sensitivity of the ant colony to pheromone, τ ij (t) represents the pheromone concentration on the path between switch i and switch j at time t, and η ij represents the heuristic factor.
  • an embodiment of the present invention provides a load balancing system, including:
  • a central controller configured to execute the load balancing method of the first aspect or any implementation of the first aspect
  • at least one switch, where the switch is connected to the central controller and is configured to perform the load balancing method of the second aspect or any implementation of the second aspect.
  • an embodiment of the present invention provides an electronic device, including a memory and a processor that are communicatively connected to each other; the memory stores computer instructions, and by executing the computer instructions the processor performs the load balancing method described in the first aspect, any implementation of the first aspect, the second aspect, or any implementation of the second aspect.
  • embodiments of the present invention provide a computer-readable storage medium that stores computer instructions, and the computer instructions are used to cause the computer to execute the load balancing method described in the first aspect, any implementation of the first aspect, the second aspect, or any implementation of the second aspect.
  • Figure 1 is a flow chart of a load balancing method according to an embodiment of the present invention
  • Figure 2 is a schematic diagram of a GRU timing performance prediction method according to an embodiment of the present invention.
  • Figure 3 is a flow chart of a load balancing method according to an embodiment of the present invention.
  • Figure 4 is a schematic diagram of weighted equal cost multipath according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a load balancing system according to an embodiment of the present invention.
  • Figure 6 is a schematic diagram of a load balancing system according to an embodiment of the present invention.
  • Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • a load balancing method is provided. It should be noted that the steps shown in the flow chart of the accompanying drawings can be executed in a computer system, such as a set of computer-executable instructions, and although a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in an order different from that described herein.
  • FIG. 1 is a flow chart of a load balancing method according to an embodiment of the present invention. The method is applied to the central controller. As shown in Figure 1, the process includes the following steps:
  • S11 Obtain the gradient parameters of the local model sent by the switch, optimize the global model according to the gradient parameters, and send the global model to the switch to determine the resource usage of the switch.
  • the switch can be a software switch, and the resources of the switch can include CPU (processor), memory and network bandwidth.
  • the software switch performs normalization and other preprocessing on the locally monitored data flow characteristic information.
  • the data flow characteristic information can include the average flow packet size from src to dst, the average flow packet size from dst to src, the minimum packet size, the maximum packet size, the average packet size, the packet transmission time, the TCP handshake time, and so on.
  • the switch uses the data flow characteristic information as a data set.
  • the data set can be divided according to a preset ratio, such as 7:3, into a training data set and a test data set respectively. At the same time, the consistency of the data distribution is maintained as much as possible, to avoid introducing additional bias through the data partitioning that would affect the final results.
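  • As an illustration of this preprocessing step, the following minimal Python sketch normalizes the monitored flow features and performs a 7:3 split; the feature count, shuffling strategy and random data are assumptions for illustration only, not details taken from the patent.

```python
# Hypothetical sketch: min-max normalization and a 7:3 train/test split of the
# locally monitored flow-feature data set described above.
import numpy as np

def normalize(features: np.ndarray) -> np.ndarray:
    """Scale each feature column into [0, 1]."""
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    return (features - col_min) / np.maximum(col_max - col_min, 1e-9)

def split_7_3(features: np.ndarray, seed: int = 0):
    """Shuffle once, then split 70% / 30% into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(features))
    cut = int(0.7 * len(features))
    return features[idx[:cut]], features[idx[cut:]]

flows = np.random.rand(1000, 7)   # stand-in for packet-size stats, transfer time, TCP handshake time
train_set, test_set = split_7_3(normalize(flows))
```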
  • the central controller sends the joint GRU (Gated Recurrent Unit) task and initial parameters to the switch, and the initial parameters can be set to 1.
  • the GRU neural network is a variant of LSTM (Long short-term memory). GRU maintains the effect of LSTM while making the structure simpler, including update gates and reset gates.
  • the update gate controls the extent to which state information from the previous moment is brought into the current state. The larger the value, the more state information from the previous moment is brought into the current state.
  • the reset gate controls the extent to which status information at the previous moment is ignored. The smaller the value, the more it is ignored.
  • the switch normalizes the divided training data set, so that features with larger values do not dominate the training.
  • the activation function ⁇ uses a Rectified Linear Unit (ReLU).
  • for timing prediction of CPU, memory and network bandwidth usage, GRU is combined with federated learning, as shown in Figure 2, which illustrates the GRU timing performance prediction method based on federated learning.
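  • A minimal sketch of such a GRU predictor is given below, assuming PyTorch; the layer sizes, the ReLU head and the three-value output (CPU, memory, bandwidth usage) are illustrative choices rather than the patented model.

```python
# Hypothetical GRU time-series predictor for per-switch resource usage.
import torch
import torch.nn as nn

class ResourceGRU(nn.Module):
    def __init__(self, n_features: int = 7, hidden: int = 32, n_resources: int = 3):
        super().__init__()
        # the GRU layer contains the update and reset gates described above
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_resources))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time_steps, n_features) of normalized flow features
        _, h_last = self.gru(x)              # h_last: (1, batch, hidden)
        return self.head(h_last.squeeze(0))  # predicted CPU / memory / bandwidth usage

model = ResourceGRU()
prediction = model(torch.randn(8, 20, 7))    # 8 windows of 20 time steps each
```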
  • the switch obtains the joint GRU task sent by the central controller. After starting the joint GRU task and initializing the system parameters, it performs calculations locally based on local data such as CPU, memory, network bandwidth, and data flow characteristic information. After the calculation is completed, the obtained gradient parameters are sent to the central controller.
  • the load balancing system may include one or more switches, and the central controller receives gradient parameters sent by at least one switch. In federated learning, each local gradient parameter is obtained through distributed training, and then the global model is optimized based on each local gradient parameter. After receiving the gradient parameters of the switch, the central controller performs an aggregation operation on these gradient parameters, focusing on efficiency, performance and other factors during the aggregation process.
  • the central controller may sometimes not wait for data upload from all switches, but select a suitable subset of switches as collection targets.
  • after the central controller aggregates the obtained gradient parameters and optimizes the global model, it sends the optimized global model to the switches participating in the GRU task.
  • the switch updates the local model based on the received global model and evaluates the performance of the local model. If the performance reaches the preset condition, that is, when the performance is good enough, the training stops and the joint modeling ends; if the performance is insufficient, the switch recalculates the gradient parameters locally and sends them to the central controller, until the local model performance finally reaches the preset condition.
  • the central controller saves the trained global model. It can calculate the initial parameters through the global model and send the initial parameters to the switch.
  • the switch calculates the usage of each of its resources, such as CPU, memory and network bandwidth usage, based on the initial parameters and the trained local model.
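  • The patent does not fix the aggregation rule used by the central controller; the sketch below assumes a simple FedAvg-style average of the uploaded gradient parameters over the selected subset of switches, applied as one optimization step to the global model.

```python
# Hypothetical federated round: average switch gradients and update the global model.
from typing import Dict, List
import torch

def aggregate(local_gradients: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Average the gradient tensors reported by the selected switches."""
    keys = local_gradients[0].keys()
    return {k: torch.stack([g[k] for g in local_gradients]).mean(dim=0) for k in keys}

def federated_round(global_model: torch.nn.Module,
                    local_gradients: List[Dict[str, torch.Tensor]],
                    lr: float = 0.01) -> None:
    """Apply the aggregated gradients to the global model (one optimization step)."""
    avg = aggregate(local_gradients)
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            if name in avg:
                param -= lr * avg[name]
    # the updated global model is then sent back to the switches participating in the GRU task
```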
  • S12 Obtain the resource usage of the switch and the sub-pheromone concentration determined by the ant colony algorithm, and determine the number of task flows on the switch based on the sub-pheromone concentration.
  • the central controller obtains the resource usage calculated by the switches and uses the ant colony algorithm to determine the number of task flows on each switch.
  • Ant colony algorithm is an artificial intelligence optimization algorithm that simulates the behavior of ants searching for food and returning to the nest in nature. It finds the optimal path through the cooperation among individual ant colonies.
  • the ant colony algorithm is a heuristic global optimization algorithm among evolutionary algorithms. It has the characteristics of distributed computing, positive information feedback and heuristic search. Its basic idea is to use the walking path of an ant to represent a feasible solution to the problem to be optimized, so that all paths of the entire ant colony constitute the solution space of the problem. Ants on shorter paths release more pheromone; as time goes by, the accumulated pheromone concentration on shorter paths gradually increases, and the number of ants choosing those paths also increases.
  • the system can also include hardware switches.
  • assume there are n software switches in the system.
  • 2 ants perform the search task.
  • the initial point of each ant is the hardware switch.
  • the initialization includes setting the upper limit of the number of iterations of the algorithm and the initial pheromone concentration.
  • the pheromone concentration can represent the efficiency of completing the task.
  • the usage rates of CPU, memory and network bandwidth are recorded as U cpu , U mem and U net respectively. In order to comprehensively consider the performance of CPU, memory and network bandwidth, the distance is replaced with the load capacity:
  • if the d ij value of the switch selected by a task flow is too large, the load on that switch is already too heavy, and selecting it will make the load of the entire system more unbalanced. Conversely, if the obtained η ij value is larger, the corresponding d ij value is smaller, that is, the current load is light, and selecting this switch to execute tasks will promote load balancing of the entire system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.
  • the value of the volatilization coefficient ρ is adjusted in the following adaptive manner:
  • the pheromone update method is improved and the elite ant system is used.
  • the global update method still uses standard ant colony optimization, while the local update method is adjusted.
  • C ij is the transmission time T ij from the hardware switch to S j , plus the actual execution time E ij of O i on S j , plus the delay time W ij from transmission to execution, that is, C ij = T ij + E ij + W ij .
  • the data volume of task O i is recorded as F i , P j represents the performance of software switch S j , and N j represents the network bandwidth of software switch S j , then:
  • T ij = F i / N j
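  • The completion-time terms above can be sketched as follows; T ij = F i / N j comes directly from the text, while the way E ij and W ij are measured and the use of average resource utilization as the load measure d ij are assumptions made only for illustration.

```python
# Hypothetical cost and heuristic computation for assigning task O_i to software switch S_j.
def transmission_time(F_i: float, N_j: float) -> float:
    """T_ij: data volume of task O_i divided by the network bandwidth of S_j."""
    return F_i / N_j

def completion_cost(F_i: float, N_j: float, E_ij: float, W_ij: float) -> float:
    """C_ij = T_ij + E_ij + W_ij (transmission + execution + transmission-to-execution delay)."""
    return transmission_time(F_i, N_j) + E_ij + W_ij

def heuristic(u_cpu: float, u_mem: float, u_net: float) -> float:
    """Hypothetical eta_ij: distance replaced by a load measure, so lightly loaded
    switches attract more ants (the patented load-capacity formula is not reproduced here)."""
    d_ij = (u_cpu + u_mem + u_net) / 3.0   # assumed load measure built from U_cpu, U_mem, U_net
    return 1.0 / max(d_ij, 1e-9)
```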
  • the ant circle model is:
  • C k represents the total completion time of ant k’s search path
  • Q represents the total amount of pheromone left on the path after completing a search
  • the optimal path found is recorded as ⁇ bs .
  • the local update formula is:
  • ⁇ ij (t) represents the total amount of pheromone released by the ant colony on the path
  • e represents the influence weight factor of π bs
  • C bs represents the completion time of the known optimal path ⁇ bs .
  • Each switch is a node, and the probability of each ant moving to a node can be calculated based on the obtained transfer function, as follows:
  • P k ij (t) represents the probability that ant k visits switch j at the next moment
  • α represents the sensitivity of the ants to pheromone
  • β represents the sensitivity of the ant colony to pheromone
  • η ij represents the heuristic factor.
  • Δτ ij (t) represents the total amount of pheromone released by the ant colony on the path, that is, the sub-pheromone concentration; Δτ k ij (t) represents the total amount of pheromone released by the k-th ant on the path; e is the influence weight factor of π bs ; and the remaining term represents the additional pheromone artificially added on the path.
  • the central controller summarizes the sub-pheromone concentrations determined by each software switch, evaluates all feasible paths according to the objective function min C max , selects the current optimal path π bs , and performs a global update of the pheromone on all paths based on the determined optimal path.
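  • A sketch of the transition rule defined above is shown below: each ant chooses its next switch among the not-yet-visited ones with probability proportional to τ ij raised to α times η ij raised to β; roulette-wheel selection is an assumed, standard way to draw from these probabilities.

```python
# Sketch of the ant transition rule among the switches in allowed_k.
import random

def transition_probabilities(i, allowed_k, tau, eta, alpha=1.0, beta=2.0):
    """P^k_ij(t) over the switches ant k has not visited yet."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in allowed_k}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def choose_next_switch(i, allowed_k, tau, eta, alpha=1.0, beta=2.0):
    """Roulette-wheel selection of the next switch (an assumed selection strategy)."""
    probs = transition_probabilities(i, allowed_k, tau, eta, alpha, beta)
    r, acc = random.random(), 0.0
    for j, p in probs.items():
        acc += p
        if r <= acc:
            return j
    return j  # fall back to the last candidate in case of floating-point rounding
```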
  • to obtain the optimal path, that is, the optimal solution to the allocation problem, the number of iterations is increased continuously until it reaches the preset upper limit, and the number of task flows on each switch under the optimal path is obtained.
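  • The overall search can be summarized by the skeleton below; build_assignment(), makespan() and update_pheromone() are placeholders for the steps described in this section, not functions defined by the patent.

```python
# Skeleton of the iterative ant colony search run by the central controller.
def ant_colony_search(n_iterations, n_ants, n_switches,
                      build_assignment, makespan, update_pheromone):
    best_path, best_cost = None, float("inf")
    for _ in range(n_iterations):                 # preset upper limit of iterations
        for k in range(n_ants):
            path = build_assignment(k)            # ant k's task-flow-to-switch choices
            cost = makespan(path)                 # C_max of this candidate assignment
            if cost < best_cost:
                best_path, best_cost = path, cost
        update_pheromone(best_path, best_cost)    # global update around the best-so-far path
    # count how many task flows the best path places on each switch
    flows_per_switch = [0] * n_switches
    for switch_id in best_path:
        flows_per_switch[switch_id] += 1
    return best_path, flows_per_switch
```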
  • the initial weight is calculated based on the number of task flows.
  • when there are multiple switches, the number of task flows on each switch is divided in turn by the greatest common divisor of the task flow counts of the switches to obtain the initial weights.
  • the initial weight is the initial value of the weighted equal-cost multipath weight configuration.
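  • A minimal sketch of this weight derivation, dividing each switch's task flow count by the greatest common divisor of all counts:

```python
# Derive initial weighted equal-cost multipath weights from task flow counts.
from functools import reduce
from math import gcd

def initial_weights(task_flow_counts):
    g = reduce(gcd, task_flow_counts)
    return [count // g for count in task_flow_counts]

print(initial_weights([40, 20, 60]))   # -> [2, 1, 3]
```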
  • the obtained initial weight of each software switch can be sent to the hardware switch.
  • the hardware switch can include a path distribution module, which can be used to determine the link between each software switch and the central controller.
  • the central controller can monitor the real-time resource usage and link health of the software switch through the monitoring module.
  • the link health can be used as the basis for the hardware switch to dynamically adaptively adjust multi-path selection.
  • the central controller monitors the switch and obtains the real-time health coefficient.
  • the real-time health coefficient can be multiplied by the initial weight to obtain the real-time health matrix.
  • the preset health matrix is (1, 1, 1,..., 1).
  • the real-time health matrix is compared with the preset health matrix.
  • according to the comparison result, the hardware switch is controlled to adjust the health coefficient of the corresponding link and to adjust the load of the software switches. For example, the software switch corresponding to a link with abnormal health status can be excluded and its load reduced or removed, so as to balance business processing across the software switches as much as possible. If a link fails, this method can also be used to interrupt data transmission on that link to avoid the risk of data loss and to determine the optimal weight configuration of the weighted equal-cost multipath.
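  • The health-matrix check can be sketched as below; the all-ones preset matrix comes from the text, while treating any coefficient below its preset value as a degraded link whose weight is scaled down (to zero on failure) is an assumed interpretation of the comparison.

```python
# Hypothetical sketch of the real-time health check and load adjustment.
def adjust_load(initial_weights, health_coefficients, preset=None):
    """Return per-switch multipath weights after comparing the real-time health matrix
    (coefficient * initial weight) against the preset health matrix (1, 1, ..., 1)."""
    preset = preset or [1.0] * len(initial_weights)
    realtime_matrix = [h * w for h, w in zip(health_coefficients, initial_weights)]
    adjusted = []
    for scaled, weight, h, p in zip(realtime_matrix, initial_weights, health_coefficients, preset):
        # an abnormal link keeps only its scaled-down share; a failed link (h == 0) is removed
        adjusted.append(scaled if h < p else weight)
    return adjusted

# e.g. switch 2's link is degraded (coefficient 0.4) and switch 3's link has failed (0.0)
print(adjust_load([2, 1, 3], [1.0, 0.4, 0.0]))   # -> [2, 0.4, 0.0]
```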
  • the load balancing method obtains the gradient parameters of the local model calculated locally by the switch, optimizes the global model based on the gradient parameters, and controls the switch to optimize its local model based on the global model, thereby calculating the resource usage of the switch.
  • based on the ant colony algorithm and the obtained resource usage of the switch, the optimal allocation, that is, the number of task flows on each switch, is determined; the weight of each switch is determined from its number of task flows, a health coefficient is set, and a real-time health matrix is obtained based on the switch weights.
  • by comparing the real-time health matrix with the preset health matrix, the load of the switch is adjusted, thereby achieving load balancing of the switches, reducing service response time, and improving user experience.
  • pheromone concentration is calculated using the following formula:
  • τ ij (t+1) = (1-ρ)τ ij (t) + Δτ ij (t)
  • ρ represents the degree of pheromone volatilization
  • Δτ ij (t) represents the sub-pheromone concentration
  • τ ij (t+1) represents the pheromone concentration on the path between switch i and switch j at time t+1.
  • the volatilization coefficient ρ is adjusted in the following adaptive manner:
  • the central controller can monitor the link health status of the software switch in real time and determine the real-time health coefficient based on the set health detection coefficient.
  • the set health monitoring coefficient is denoted ε, as follows:
  • the real-time health coefficient of each switch is obtained, which can be written as {ε 1 , ε 2 , ε 3 , ..., ε n }.
  • the initial weight of each switch is calculated from the obtained number of task flows. When there are multiple switches, the number of task flows on each switch is divided in turn by the greatest common divisor of the task flow counts of the switches to obtain the initial weights.
  • the initial weights can be written as {w′ 1 , w′ 2 , w′ 3 , ..., w′ n }.
  • the real-time health matrix obtained by multiplying the initial weights by the real-time health coefficients is [ε 1 w′ 1 , ε 2 w′ 2 , ε 3 w′ 3 , ..., ε n w′ n ].
  • Figure 4 is a schematic diagram of the weighted equal-cost multipath.
  • FIG. 3 is a flow chart of a load balancing method according to an embodiment of the present invention. The method is applied to a switch. As shown in Figure 3, the process includes the following steps:
  • the switch may be a software switch, and the data flow characteristic information may include the average flow packet size from src to dst, the average flow packet size from dst to src, the minimum packet size, the maximum packet size, the average packet size, the packet transmission time, the TCP handshake time, and so on.
  • the switch uses the data flow characteristic information as a data set.
  • the data set can be divided according to a preset ratio, such as 7:3, into a training data set and a test data set respectively. At the same time, the consistency of the data distribution is maintained as much as possible, to avoid introducing additional bias through the data partitioning that would affect the final results.
  • the central controller sends the joint GRU (Gated Recurrent Unit) task and initial parameters to the switch, and the initial parameters can be set to 1.
  • the GRU neural network is a variant of LSTM (Long short-term memory). GRU maintains the effect of LSTM while making the structure simpler, including update gates and reset gates.
  • the update gate controls the extent to which state information from the previous moment is brought into the current state. The larger the value, the more state information from the previous moment is brought into the current state.
  • the reset gate controls the extent to which status information at the previous moment is ignored. The smaller the value, the more it is ignored.
  • the switch normalizes the divided training data set, so that features with larger values do not dominate the training.
  • the activation function ⁇ uses a Rectified Linear Unit (ReLU).
  • GRU is combined with federated learning, as shown in Figure 2.
  • the switch obtains the joint GRU task sent by the central controller. After starting the joint GRU task and initializing the system parameters, it performs calculations locally based on local data such as CPU, memory, network bandwidth, and data flow characteristic information. After the calculation is completed, the obtained gradient parameters are sent to the central controller.
  • S22 Obtain the global model sent by the central controller, optimize the local model based on the global model, and calculate the resource usage based on the local model and data flow characteristic information.
  • the global model is optimized by the central controller based on gradient parameters, and the central controller receives gradient parameters sent by at least one switch.
  • each local gradient parameter is obtained through distributed training, and then the global model is optimized based on each local gradient parameter.
  • after receiving the gradient parameters of the switches, the central controller performs an aggregation operation on these gradient parameters, focusing on efficiency, performance and other factors during the aggregation process. For example, because of the heterogeneous nature of the system, the central controller may sometimes not wait for data uploads from all switches, but instead select a suitable subset of switches as collection targets. After the central controller aggregates the obtained gradient parameters and optimizes the global model, it sends the optimized global model to the switches participating in the GRU task.
  • the switch updates the local model based on the received global model and evaluates the performance of the local model. If the performance reaches the preset condition, that is, when the performance is good enough, the training stops and the joint modeling ends; if the performance is insufficient, the switch recalculates the gradient parameters locally and sends them to the central controller, until the local model performance finally reaches the preset condition.
  • the central controller saves the trained global model. It can calculate the initial parameters through the global model and send the initial parameters to the switch.
  • the switch calculates the usage of each of its resources, such as CPU, memory and network bandwidth usage, based on the initial parameters and the trained local model.
  • the central controller obtains the resource usage calculated by the switches and uses the ant colony algorithm to determine the number of task flows on each switch.
  • Ant colony algorithm is an artificial intelligence optimization algorithm that simulates the behavior of ants searching for food and returning to the nest in nature. It finds the optimal path through the cooperation among individual ant colonies.
  • the ant colony algorithm is a heuristic global optimization algorithm among evolutionary algorithms. It has the characteristics of distributed computing, positive information feedback and heuristic search. Its basic idea is to use the walking path of an ant to represent a feasible solution to the problem to be optimized, so that all paths of the entire ant colony constitute the solution space of the problem. Ants on shorter paths release more pheromone; as time goes by, the accumulated pheromone concentration on shorter paths gradually increases, and the number of ants choosing those paths also increases.
  • the system can also include hardware switches.
  • assume there are n software switches in the system.
  • 2 ants perform the search task.
  • the initial point of each ant is the hardware switch.
  • the initialization includes setting the upper limit of the number of iterations of the algorithm and the initial pheromone concentration.
  • the pheromone concentration can represent the efficiency of completing the task.
  • the usage rates of CPU, memory and network bandwidth are recorded as U cpu , U mem and U net respectively. In order to comprehensively consider the performance of CPU, memory and network bandwidth, the distance is replaced with the load capacity:
  • if the d ij value of the switch selected by a task flow is too large, the load on that switch is already too heavy, and selecting it will make the load of the entire system more unbalanced. Conversely, if the obtained η ij value is larger, the corresponding d ij value is smaller, that is, the current load is light, and selecting this switch to execute tasks will promote load balancing of the entire system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.
  • the value of the volatilization coefficient ρ is adjusted in the following adaptive manner:
  • the pheromone update method is improved and the elite ant system is used.
  • the global update method still uses standard ant colony optimization, while the local update method is adjusted.
  • C ij is the transmission time T ij from the hardware switch to S j , plus the actual execution time E ij of O i on S j , plus the delay time W ij from transmission to execution, that is, C ij = T ij + E ij + W ij .
  • the data volume of task O i is recorded as F i , P j represents the performance of software switch S j , and N j represents the network bandwidth of software switch S j , then:
  • T ij = F i / N j
  • the ant circle model is:
  • C k represents the total completion time of ant k’s search path
  • Q represents the total amount of pheromone left on the path after completing a search
  • the optimal path found is recorded as ⁇ bs .
  • the local update formula is:
  • Δτ ij (t) represents the total amount of pheromone released by the ant colony on the path, that is, the sub-pheromone concentration; Δτ k ij (t) represents the total amount of pheromone released by the k-th ant on the path
  • e is the influence weight factor of π bs
  • C bs represents the completion time of the known optimal path π bs .
  • Each switch is a node, and the probability of each ant moving to a node can be calculated based on the obtained transfer function, as follows:
  • P k ij (t) represents the probability that ant k visits switch j at the next moment
  • α represents the sensitivity of the ants to pheromone
  • β represents the sensitivity of the ant colony to pheromone
  • η ij represents the heuristic factor.
  • the central controller summarizes the sub-pheromone concentrations determined by each software switch, evaluates all feasible paths according to the objective function min C max , selects the current optimal path π bs , and performs a global update of the pheromone on all paths based on the determined optimal path.
  • to obtain the optimal path, that is, the optimal solution to the allocation problem, the number of iterations is increased continuously until it reaches the preset upper limit, and the number of task flows on each switch under the optimal path is obtained.
  • the initial weight is calculated based on the number of task flows.
  • when there are multiple switches, the number of task flows on each switch is divided in turn by the greatest common divisor of the task flow counts of the switches to obtain the initial weights.
  • the initial weight is the initial value of the weighted equal-cost multipath weight configuration.
  • the obtained initial weight of each software switch can be sent to the hardware switch.
  • the hardware switch can include a path distribution module, which can be used to determine the link between each software switch and the central controller.
  • the central controller can monitor the real-time resource usage and link health of the software switch through the monitoring module.
  • the link health can be used as the basis for the hardware switch to dynamically adaptively adjust multi-path selection.
  • the central controller monitors the switch and obtains the real-time health coefficient.
  • the real-time health coefficient can be multiplied by the initial weight to obtain the real-time health matrix.
  • the preset health matrix is (1, 1, 1,..., 1).
  • the real-time health matrix is compared with the preset health matrix.
  • according to the comparison result, the hardware switch is controlled to adjust the health coefficient of the corresponding link and to adjust the load of the software switches. For example, the software switch corresponding to a link with abnormal health status can be excluded and its load reduced or removed, so as to balance business processing across the software switches as much as possible. If a link fails, this method can also be used to interrupt data transmission on that link to avoid the risk of data loss.
  • determining the sub-pheromone concentration based on resource usage and ant colony algorithm includes the following steps:
  • the initialization includes setting the upper limit of the number of iterations of the algorithm and the initial pheromone concentration.
  • the pheromone concentration can represent the efficiency of completing the task.
  • the usage rates of CPU, memory and network bandwidth are recorded as U cpu , U mem and U net respectively. In order to comprehensively consider the performance of CPU, memory and network bandwidth, the distance is replaced with the load capacity:
  • if the d ij value of the switch selected by a task flow is too large, the load on that switch is already too heavy, and selecting it will make the load of the entire system more unbalanced. Conversely, if the obtained η ij value is larger, the corresponding d ij value is smaller, that is, the current load is light, and selecting this switch to execute tasks will promote load balancing of the entire system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.
  • the probability of ants moving to other switches is calculated using the following formula:
  • P k ij (t) represents the probability that ant k visits switch j at the next moment
  • α represents the sensitivity of the ants to pheromone
  • β represents the sensitivity of the ant colony to pheromone
  • η ij represents the heuristic factor.
  • the sub-pheromone concentration is calculated based on the new pheromone on the path and the ant circle model.
  • the value of the volatilization coefficient ρ is adjusted in the following adaptive manner:
  • the pheromone update method is improved and the elite ant system is used.
  • the global update method still uses standard ant colony optimization, while the local update method is adjusted.
  • C ij is the transmission time T ij from the hardware switch to S j , plus the actual execution time E ij of O i on S j , plus the delay time W ij from transmission to execution, that is, C ij = T ij + E ij + W ij .
  • the data volume of task O i is recorded as F i , P j represents the performance of software switch S j , and N j represents the network bandwidth of software switch S j , then:
  • T ij = F i / N j
  • the ant circle model is:
  • C k represents the total completion time of ant k’s search path
  • Q represents the total amount of pheromone left on the path after completing a search
  • C bs represents the completion time of the known optimal path ⁇ bs .
  • the optimal path found is recorded as ⁇ bs .
  • an artificially released additional pheromone term is added.
  • Δτ ij (t) represents the total amount of pheromone released by the ant colony on the path, that is, the sub-pheromone concentration; Δτ k ij (t) represents the total amount of pheromone released by the k-th ant on the path
  • e is the influence weight factor of π bs , and the remaining term represents the new pheromone artificially added on the path.
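  • A sketch of this local deposit is given below; writing each ant's contribution as Q / C k (the standard ant-cycle form) and the elite bonus as e · Q / C bs on edges of π bs is an assumption consistent with the definitions of Q, C k and C bs in this section, not a formula quoted from the patent.

```python
# Hypothetical local pheromone deposit with an elite-ant bonus on the best-so-far path.
def sub_pheromone(edge, ant_paths, ant_costs, best_path, best_cost, Q=100.0, e=2.0):
    """Delta-tau_ij: total pheromone newly deposited on one path edge in this iteration."""
    deposit = sum(Q / cost for path, cost in zip(ant_paths, ant_costs) if edge in path)
    if edge in best_path:                   # artificially released additional pheromone on pi_bs
        deposit += e * Q / best_cost
    return deposit
```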
  • a GRU timing performance prediction method based on federated learning is designed, which reduces the increase in additional delay caused by real-time training and meets the delay-sensitive requirements of short flows.
  • it takes full advantage of the high performance of software switches and uses federated learning to relieve the pressure on the central controller and reduce avoidable data transmission.
  • the load status of the links is, in effect, known in advance through prediction.
  • the distributed weighted equal-cost multipath routing method based on the optimized ant colony algorithm enables load balancing to take into account the heterogeneity of the software switch devices, adopts a more reasonable load evaluation method, improves the convergence of the algorithm, and speeds up the solution.
  • module may be a combination of software and/or hardware that implements a predetermined function.
  • the systems described in the following embodiments are preferably implemented in software, although implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • This embodiment provides a load balancing system, as shown in Figure 5, including:
  • the central controller is used to execute the load balancing method
  • At least one switch is connected to the central controller for performing a load balancing method.
  • the system includes a software switch, a hardware switch and a central controller.
  • the system is shown in Figure 6 .
  • the hardware switch includes a path distribution module that can perform multi-path selection for short flows based on weight configuration.
  • multi-path selection is dynamically and adaptively adjusted based on the link health detection results sent by the central controller after real-time monitoring.
  • the central controller includes a monitoring module, a performance prediction module and a path training module.
  • the monitoring module is used to monitor and detect the real-time network utilization and link health of each software switch node, which serves as the basis for the dynamic adaptive adjustment of multi-path selection by the hardware switch.
  • the performance prediction module can cooperate with the software switch to federally train a prediction model for CPU, memory and network bandwidth usage.
  • the path training module can cooperate with the software switch to predict the optimal weight configuration of weighted equal-cost multi-paths in a distributed manner.
  • the software switch includes a monitoring module, a performance prediction module and a path training module.
  • the monitoring module is used to monitor and record the usage of local resources of the software switch. Local resources include CPU, memory, network bandwidth, etc.
  • the local network utilization and link health status sent by the central controller can also be recorded as a data source for the performance prediction module.
  • the performance prediction module can cooperate with the central controller to calculate resource usage based on federated learning.
  • the path training module can cooperate with the central controller to compute the assigned path search results in a distributed manner.
  • the load balancing system in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (application-specific integrated circuit), a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
  • An embodiment of the present invention also provides an electronic device having the load balancing system shown in Figure 5 above.
  • Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • the electronic device may include: at least one processor 601, such as a CPU (Central Processing Unit), at least one communication interface 603, a memory 604, and at least one communication bus 602.
  • the communication bus 602 is used to realize connection communication between these components.
  • the communication interface 603 may include a display screen (Display) and a keyboard (Keyboard), and the optional communication interface 603 may also include a standard wired interface and a wireless interface.
  • the memory 604 can be a high-speed RAM memory (Random Access Memory, volatile random access memory), or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 604 may optionally be at least one storage device located remotely from the aforementioned processor 601.
  • the processor 601 can be combined with the system described in FIG. 5 , the memory 604 stores an application program, and the processor 601 calls the program code stored in the memory 604 to execute any of the above method steps.
  • the communication bus 602 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the communication bus 602 can be divided into an address bus, a data bus, a control bus, etc. For ease of presentation, only one thick line is used in Figure 7, but it does not mean that there is only one bus or one type of bus.
  • the memory 604 may include volatile memory (English: volatile memory), such as random access memory (English: random-access memory, abbreviation: RAM); the memory may also include non-volatile memory (English: non-volatile memory), such as flash memory (English: flash memory), a hard disk drive (English: hard disk drive, abbreviation: HDD) or a solid-state drive (English: solid-state drive, abbreviation: SSD); the memory 604 may also include a combination of the above types of memory.
  • the processor 601 can be a central processing unit (English: central processing unit, abbreviation: CPU), a network processor (English: network processor, abbreviation: NP) or a combination of CPU and NP.
  • the processor 601 may further include a hardware chip.
  • the above-mentioned hardware chip can be an application-specific integrated circuit (English: application-specific integrated circuit, abbreviation: ASIC), a programmable logic device (English: programmable logic device, abbreviation: PLD) or a combination thereof.
  • the above-mentioned PLD can be a complex programmable logic device (English: complex programmable logic device, abbreviation: CPLD), a field-programmable logic gate array (English: field-programmable gate array, abbreviation: FPGA), a general array logic (English: generic array logic, abbreviation: GAL) or any combination thereof.
  • memory 604 is also used to store program instructions.
  • the processor 601 can call program instructions to implement the load balancing method shown in the embodiments of this application.
  • Embodiments of the present invention also provide a non-transitory computer storage medium.
  • the computer storage medium stores computer-executable instructions.
  • the computer-executable instructions can execute the load balancing method in any of the above method embodiments.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (Flash Memory), a hard disk drive (Hard Disk Drive, abbreviation: HDD) or a solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above types of memories.


Abstract

Disclosed in the present invention are a load balancing method and system, and an electronic device and a storage medium. The method comprises: a central controller acquiring a gradient parameter of a local model sent by a switch, optimizing a global model according to the gradient parameter, and sending the global model to the switch so as to determine the usage rate of resources of the switch, wherein the resources comprise a processor, a memory and a network bandwidth; acquiring a sub-pheromone concentration, which is determined by the switch on the basis of the usage rate of the resources and an ant colony algorithm, and determining the number of task flows on the switch on the basis of the sub-pheromone concentration; acquiring a real-time health coefficient of the switch, and combining the real-time health coefficient with an initial weight to obtain a real-time health matrix, wherein the initial weight is calculated on the basis of the number of task flows; and comparing the real-time health matrix with a preset health matrix, and adjusting the load of the switch according to a comparison result. The method realizes load balancing of a switch and reduces the service response time, thereby improving the usage experience of users.

Description

A load balancing method, system, electronic device and storage medium

Technical Field

The present invention relates to the field of network security technology, and specifically to a load balancing method, system, electronic device and storage medium.

Background

With the rapid development of cloud computing, big data and artificial intelligence, the data volume of application services has grown exponentially. The traditional back-end network access layer, limited by the bandwidth bottleneck at the access layer entrance and the high cost of network hardware, can no longer cope with the resulting massive data. Emerging network technologies are constantly appearing. Software-defined networking (SDN) decouples the control layer from the data forwarding layer: the control layer manages the global network, while the data forwarding layer completes data forwarding according to the flow tables issued by the control layer, which greatly improves the flexibility of network deployment and management and achieves centralized control of data traffic. At the same time, with the vigorous development of software switches, deploying software switches on business nodes to reduce the number of network hops, and thereby network latency, has become increasingly popular. Accordingly, when facing massive data, more and more network access layers of application services adopt software routing and directly use the logical processing backend of the application service as the next hop of the hardware switch, realizing a scalable and cost-effective multi-path solution that horizontally expands the bandwidth of the network access layer entrance and solves the problem of insufficient bandwidth.
Technical Problem

Existing load balancing methods among multiple paths cannot sense path congestion or link failures, which easily causes hash collisions of multiple data flows on a path, leading to link congestion and degraded application performance. Under ordinary traffic conditions, because short flows carry little data and are processed quickly, the slight waiting time is almost negligible; however, massive short-flow scenarios can easily cause network congestion.

Technical Solution

In view of this, embodiments of the present invention provide a load balancing method, system, electronic device and storage medium to balance the load on switches, reduce response time, and improve user experience.
According to a first aspect, an embodiment of the present invention provides a load balancing method applied to a central controller. The method includes:

obtaining the gradient parameters of a local model sent by a switch, optimizing a global model according to the gradient parameters, and sending the global model to the switch to determine the usage rate of the switch's resources, where the resources include processor, memory and network bandwidth;

obtaining the sub-pheromone concentration determined by the switch based on the resource usage rate and an ant colony algorithm, and determining the number of task flows on the switch based on the sub-pheromone concentration;

obtaining a real-time health coefficient of the switch, and combining the real-time health coefficient with an initial weight to obtain a real-time health matrix, where the initial weight is calculated based on the number of task flows;

comparing the real-time health matrix with a preset health matrix, and adjusting the load of the switch according to the comparison result.

The load balancing method provided in this embodiment obtains the gradient parameters of the local model calculated locally by the switch, optimizes the global model based on the gradient parameters, and controls the switch to optimize its local model based on the global model, thereby calculating the resource usage of the switch. Based on the ant colony algorithm and the obtained resource usage of the switch, the optimal allocation, that is, the number of task flows on each switch, is determined; the weight of each switch is determined from its number of task flows, a health coefficient is set, and a real-time health matrix is obtained based on the switch weights. By comparing the real-time health matrix with the preset health matrix, the load of the switch is adjusted, thereby achieving load balancing of the switches, reducing service response time, and improving user experience.
With reference to the first aspect, in one implementation, determining the number of task flows on the switch based on the sub-pheromone concentration includes:

cyclically obtaining the sub-pheromone concentration determined by the switch based on the resource usage rate and the ant colony algorithm;

globally updating the sub-pheromone concentration to obtain the pheromone concentration, and optimizing the path;

determining the number of task flows on the switch based on the optimized path.
With reference to the first aspect, in one implementation, the pheromone concentration is calculated using the following formula:

τ ij (t+1) = (1-ρ)τ ij (t) + Δτ ij (t)

where ρ represents the degree of pheromone volatilization, Δτ ij (t) represents the total amount of pheromone released by the ant colony on the path, and τ ij (t+1) represents the pheromone concentration on the path between switch i and switch j at time t+1.
With reference to the first aspect, in one implementation, obtaining the real-time health coefficient of the switch and combining the real-time health coefficient with the initial weight to obtain the real-time health matrix includes:

detecting the path health of the switch, and determining the real-time health coefficient of the switch according to its path health;

multiplying the initial weight corresponding to the switch by the real-time health coefficient to determine the real-time health matrix.
According to a second aspect, an embodiment of the present invention provides a load balancing method applied to a switch. The method includes:

obtaining data flow characteristic information, determining gradient parameters based on the data flow characteristic information, and sending the gradient parameters to a central controller;

obtaining a global model sent by the central controller, optimizing a local model based on the global model, and calculating the resource usage rate according to the local model and the data flow characteristic information, where the global model is obtained by the central controller through optimization based on the gradient parameters;

determining a sub-pheromone concentration based on the resource usage rate and an ant colony algorithm, and sending the sub-pheromone concentration to the central controller to adjust the load.

With reference to the second aspect, in one implementation, determining the sub-pheromone concentration based on the resource usage rate and the ant colony algorithm includes:

allocating search tasks to ants based on the ant colony algorithm, and determining a heuristic factor based on the resource usage rate;

calculating the probability of an ant moving to other switches according to the heuristic factor;

when an ant moves to another switch, calculating the sub-pheromone concentration according to the newly added pheromone on the path and the ant circle model. With reference to the second aspect, in one implementation, the probability of the ant moving to other switches is calculated using the following formula:
P k ij (t) = [τ ij (t)] α [η ij ] β / Σ s∈allowed_k [τ is (t)] α [η is ] β , if j ∈ allowed k , and 0 otherwise

where P k ij (t) represents the probability that ant k visits switch j at the next moment, α represents the sensitivity of the ants to pheromone, β represents the sensitivity of the ant colony to pheromone, τ ij (t) represents the pheromone concentration on the path between switch i and switch j at time t, and η ij represents the heuristic factor. The heuristic factor describes the degree of attraction of switch j to the ants on switch i and can be expressed as η ij = 1/d ij , where d ij represents the distance between switch j and switch i, and allowed k represents the set of switches not yet visited.
According to a third aspect, an embodiment of the present invention provides a load balancing system, including:

a central controller, configured to execute the load balancing method of the first aspect or any implementation of the first aspect;

at least one switch, connected to the central controller and configured to execute the load balancing method of the second aspect or any implementation of the second aspect.

According to a fourth aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor that are communicatively connected to each other; the memory stores computer instructions, and by executing the computer instructions the processor performs the load balancing method described in the first aspect, any implementation of the first aspect, the second aspect, or any implementation of the second aspect.

According to a fifth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores computer instructions, and the computer instructions are used to cause a computer to execute the load balancing method described in the first aspect, any implementation of the first aspect, the second aspect, or any implementation of the second aspect.
Description of Drawings

In order to explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description illustrate some embodiments of the present invention, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

Figure 1 is a flow chart of a load balancing method according to an embodiment of the present invention;

Figure 2 is a schematic diagram of a GRU timing performance prediction method according to an embodiment of the present invention;

Figure 3 is a flow chart of a load balancing method according to an embodiment of the present invention;

Figure 4 is a schematic diagram of weighted equal-cost multipath according to an embodiment of the present invention;

Figure 5 is a schematic diagram of a load balancing system according to an embodiment of the present invention;

Figure 6 is a schematic diagram of a load balancing system according to an embodiment of the present invention;

Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
本发明的实施方式Embodiments of the invention
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments These are some embodiments of the present invention, rather than all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative efforts fall within the scope of protection of the present invention.
根据本发明实施例,提供了一种负载均衡方法,需要说明的是,在附图的流程图示出的步骤可以在诸如一组计算机可执行指令的计算机系统中执行,并且,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。According to an embodiment of the present invention, a load balancing method is provided. It should be noted that the steps shown in the flow chart of the accompanying drawings can be executed in a computer system such as a set of computer-executable instructions, and although the steps in the flow chart A logical order is shown, but in some cases the steps shown or described may be performed in a different order than herein.
在本实施例中提供了一种负载均衡方法,图1是根据本发明实施例的负载均衡方法的流程图,该方法应用于中央控制器,如图1所示,该流程包括如下步骤:This embodiment provides a load balancing method. Figure 1 is a flow chart of a load balancing method according to an embodiment of the present invention. The method is applied to the central controller. As shown in Figure 1, the process includes the following steps:
S11,获取交换机发送的本地模型的梯度参数,根据梯度参数优化全局模型,并将全局模型发送给交换机,以确定交换机的资源的使用率。S11: Obtain the gradient parameters of the local model sent by the switch, optimize the global model according to the gradient parameters, and send the global model to the switch to determine the resource usage of the switch.
交换机可以为软件交换机,交换机的资源可以包括CPU(处理器)、内存和网络带宽。软件交换机对本地监控的数据流特征信息进行归一化等预处理,其中数据流特征信息可以包括从src到dst的流数据包大小的平均值,从dst到src的流数据包大小的平均值,包最小值,包最大值,包平均值,包传输时间,握手时间(TCP)等。交换机将数据流特征信息作为数据集,可以将数据集按照预设比例进行划分,例如按照7:3的比例,分别作为训练数据集和测试数据集,同时尽可能保持数据分布的一致性,避免因数据划分而引入额外的偏差,对最终结果产生影响。The switch can be a software switch, and the resources of the switch can include CPU (processor), memory and network bandwidth. The software switch performs normalization and other preprocessing on the locally monitored data flow characteristic information. The data flow characteristic information can include the average value of the flow packet size from src to dst, and the average value of the flow packet size from dst to src. , minimum packet value, maximum packet value, average packet value, packet transmission time, handshake time (TCP), etc. The switch uses the data flow characteristic information as a data set. The data set can be divided according to a preset ratio, such as a 7:3 ratio, as a training data set and a test data set respectively. At the same time, the consistency of the data distribution is maintained as much as possible to avoid Additional bias is introduced due to data partitioning, which affects the final results.
The central controller sends the federated GRU (Gated Recurrent Unit) task and the initial parameters to the switch; the initial parameters may be set to 1. The GRU neural network is a variant of LSTM (Long Short-Term Memory); GRU retains the effect of LSTM while simplifying the structure, and consists of an update gate and a reset gate. The update gate controls how much of the state information from the previous moment is carried into the current state; the larger its value, the more previous state information is carried in. The reset gate controls how much of the previous state information is ignored; the smaller its value, the more is ignored. The switch normalizes the divided training data set. Since large values remain after normalization, the activation function σ uses the Rectified Linear Unit (ReLU) to prevent the gradient from vanishing. At the same time, ReLU sets some neurons to 0, which makes the network sparse, reduces the interdependence between parameters, and effectively alleviates overfitting. The ReLU function is expressed as:

ReLU(x) = max(0, x)
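For illustration only, a GRU predictor of this kind might be sketched as follows in Python (PyTorch); the layer sizes, the ReLU applied to the output head, and the names UsagePredictor and hidden_size are assumptions made for the sketch rather than details fixed by the embodiment.

```python
import torch
import torch.nn as nn

class UsagePredictor(nn.Module):
    """GRU time-series model mapping a window of normalized samples
    (flow features and resource usage) to the next CPU/memory/bandwidth usage."""
    def __init__(self, input_size: int, hidden_size: int = 32):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 3)    # predicted CPU / memory / bandwidth usage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.gru(x)                     # x: (batch, time_steps, input_size)
        last = out[:, -1, :]                     # hidden state at the last time step
        return torch.relu(self.head(last))       # ReLU keeps the predictions non-negative

# Example: 8 windows of 20 samples with 10 features each -> (8, 3) predictions
pred = UsagePredictor(input_size=10)(torch.randn(8, 20, 10))
```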
To make full use of the computing performance of the software switches, relieve the performance pressure on the central controller, and reduce the additional bandwidth overhead caused by avoidable data transmission, GRU is combined with federated learning for the time-series prediction of CPU, memory, and network bandwidth, as in the federated-learning-based GRU time-series performance prediction method shown in Figure 2.

The switch obtains the federated GRU task sent by the central controller. After starting the federated GRU task and initializing the system parameters, the switch performs the computation locally based on local data such as CPU, memory, network bandwidth, and data-flow feature information, and after the computation is completed, sends the resulting gradient parameters to the central controller. The load balancing system may include one or more switches, and the central controller receives the gradient parameters sent by at least one switch. In federated learning, the local gradient parameters are obtained through distributed training, and the global model is then optimized according to each set of local gradient parameters. After receiving the gradient parameters from the switches, the central controller aggregates them, focusing on factors such as efficiency and performance during aggregation. For example, because of the heterogeneity of the system, the central controller may sometimes not wait for the data upload of all switches, but instead select a suitable subset of switches as the collection targets. After the central controller aggregates and optimizes the global model based on the obtained gradient parameters, it sends the optimized global model to the switches participating in the GRU task. A switch updates its local model according to the received global model and evaluates the performance of the local model; if the performance reaches the preset condition, that is, the performance is good enough, training stops and the joint modeling ends; if the performance is insufficient, the switch computes the gradient parameters locally again and sends them to the central controller, until the final local model performance reaches the preset condition. The central controller saves the trained global model, can compute the initial parameters through the global model, and sends the initial parameters to the switch; the switch then computes the usage of each of its resources, such as CPU, memory, and network bandwidth, based on the initial parameters and the trained local model.
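The aggregation on the central controller can be pictured as a simple federated-averaging round. The sketch below is only an illustration under that assumption (the embodiment does not fix a particular aggregation rule), and the gradient dictionaries, the learning rate, and the subset fraction are placeholder choices.

```python
import random
from typing import Dict, List

Gradients = Dict[str, float]   # parameter name -> gradient value (flattened for illustration)

def federated_average(reports: List[Gradients]) -> Gradients:
    """Average the gradient parameters reported by the participating switches."""
    return {name: sum(g[name] for g in reports) / len(reports) for name in reports[0]}

def federated_round(global_params: Gradients, reports: List[Gradients],
                    lr: float = 0.1, subset_fraction: float = 0.5) -> Gradients:
    """One round on the controller: pick a subset of reports (it need not wait for
    every switch), average them, and update the global model sent back to the switches."""
    k = max(1, int(len(reports) * subset_fraction))
    chosen = random.sample(reports, k)
    avg = federated_average(chosen)
    return {name: global_params[name] - lr * avg[name] for name in global_params}

# Example: three switches report gradients for two parameters
reports = [{"w": 0.2, "b": -0.1}, {"w": 0.4, "b": 0.0}, {"w": 0.1, "b": -0.2}]
new_global = federated_round({"w": 1.0, "b": 0.5}, reports)
```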
S12: Obtain the sub-pheromone concentration determined by the switch based on the resource usage and the ant colony algorithm, and determine the number of task flows on the switch based on the sub-pheromone concentration.

The central controller obtains the resource usage computed by each switch and uses the ant colony algorithm to determine the number of task flows on each switch. The ant colony algorithm is an artificial-intelligence optimization algorithm that simulates the behavior of ants in nature searching for food and returning to the nest; it finds the optimal path through cooperation among the individuals of the colony. The ant colony algorithm is a heuristic global optimization algorithm among evolutionary algorithms, characterized by distributed computation, positive information feedback, and heuristic search. Its basic idea is to use the walking paths of the ants to represent feasible solutions of the problem to be optimized, so that all paths of the whole colony constitute the solution space of the problem. Ants on shorter paths release more pheromone; as time goes on, the pheromone concentration accumulated on the shorter paths gradually increases, and more and more ants choose those paths.

The traditional ant colony algorithm converges relatively slowly and is highly random, so its efficiency in seeking the optimal solution is relatively low, and it easily falls into a local optimum when solving optimization problems, thereby missing the global optimum. In a system composed of a central controller and switches, since the system may contain multiple heterogeneous switches, their computing power and network bandwidth may differ, so the system may be in a process of dynamic allocation. The load carried by each switch at each moment differs greatly; if some switches perform poorly while others perform well, large numbers of task flows tend to concentrate on the better-performing switches while the poorly performing switches may sit idle. Therefore, the pressure on the central controller needs to be shared according to the performance advantages of the switches to achieve load balancing.

In addition to the central controller and the software switches, the system may also include a hardware switch. Suppose there are n software switches in the system and a total of m ants are allocated, with m = 2n, that is, 2 ants per switch perform the search task; the initial point of each ant is the hardware switch.

After the resource usage of the switches is obtained, the relevant parameters of the ant colony algorithm are initialized; the initialization includes setting the upper limit of the number of iterations and the initial pheromone concentration, where the pheromone concentration can represent the efficiency of completing a task. The usage rates of CPU, memory, and network bandwidth are denoted U_cpu, U_mem, and U_net, respectively. To consider the performance of CPU, memory, and network bandwidth comprehensively, the distance is replaced by the loaded capacity:
Φ_ij = ω_cpu · U_cpu + ω_mem · U_mem + ω_net · U_net,  with  ω_cpu + ω_mem + ω_net = 1
where ω_cpu, ω_mem, and ω_net denote the weights of CPU, memory, and network bandwidth in the loaded capacity, respectively. Φ is used to improve the heuristic factor η of the traditional ant colony algorithm, with Φ = 1/η, that is, η_ij = 1/Φ_ij.

The smaller η_ij is, the larger the Φ_ij value of the switch chosen by the task flow, which means the load is already high and selecting this switch would make the load of the whole system more unbalanced. Conversely, the larger the obtained η_ij value, the smaller the corresponding Φ_ij value, which means the load is currently small and selecting this switch to execute tasks will promote load balancing of the whole system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.
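A minimal sketch of this heuristic, assuming the loaded capacity is the weighted sum reconstructed above; the weight values 0.4/0.3/0.3 are arbitrary placeholders, not values taken from the embodiment.

```python
def loaded_capacity(u_cpu: float, u_mem: float, u_net: float,
                    w_cpu: float = 0.4, w_mem: float = 0.3, w_net: float = 0.3) -> float:
    """Phi_ij: weighted combination of a switch's predicted CPU/memory/bandwidth usage."""
    return w_cpu * u_cpu + w_mem * u_mem + w_net * u_net

def heuristic_factor(u_cpu: float, u_mem: float, u_net: float) -> float:
    """eta_ij = 1 / Phi_ij: lightly loaded switches look more attractive to the ants."""
    return 1.0 / max(loaded_capacity(u_cpu, u_mem, u_net), 1e-9)  # guard an idle switch

# A switch at 80% CPU is less attractive than one at 20% CPU
print(heuristic_factor(0.8, 0.5, 0.6), heuristic_factor(0.2, 0.1, 0.3))
```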
Since the pheromone volatilization degree ρ has a large impact on the search performance of the algorithm (the larger ρ is, the worse the global search ability; the smaller ρ is, the worse the local search ability and the slower the convergence), the value of ρ is adjusted adaptively as follows:

Figure PCTCN2022141797-appb-000009

In addition, the pheromone update method is improved by using the elite ant system: after ant k completes a path search, the global update still uses standard ant colony optimization, while the local update is adjusted.
Let the completion time of target task O_i assigned to software switch S_j be C_ij. Then C_ij is the transmission time T_ij from the hardware switch to S_j, plus the actual execution time E_ij of O_i on S_j, plus the delay time W_ij from transmission to execution, that is:

C_ij = T_ij + E_ij + W_ij

The data volume of task O_i is denoted F_i, P_j denotes the performance of software switch S_j, and N_j denotes the network bandwidth of software switch S_j; then:

E_ij = F_i / P_j,

T_ij = F_i / N_j

Since the software switches execute tasks concurrently, the time for the system to finish all tasks is the maximum value C_max among all C_ij:

C_max = max(C_ij)

Since the overall goal of this optimization is to minimize the completion time of the task flows, the objective is:

min C_max.
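The completion-time model and the min C_max objective can be illustrated with a few lines of Python; the waiting times W below are placeholder inputs, since the embodiment does not specify how they are measured.

```python
def completion_times(F, P, N, W):
    """C_ij = T_ij + E_ij + W_ij with T_ij = F_i / N_j (transfer) and E_ij = F_i / P_j (execution)."""
    return [[F[i] / N[j] + F[i] / P[j] + W[i][j] for j in range(len(P))]
            for i in range(len(F))]

def makespan(C, assignment):
    """C_max for a given task-to-switch assignment; the optimization goal is min C_max."""
    return max(C[i][assignment[i]] for i in range(len(assignment)))

# Example: 3 tasks, 2 software switches
F = [10.0, 4.0, 6.0]                      # task data volumes F_i
P = [2.0, 1.0]                            # switch performance P_j
N = [5.0, 2.5]                            # switch bandwidth N_j
W = [[0.1, 0.3], [0.2, 0.1], [0.1, 0.1]]  # transmission-to-execution delays W_ij
print(makespan(completion_times(F, P, N, W), assignment=[0, 1, 0]))
```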
At this point, the ant-cycle model is:
Δτ_ij^k(t) = Q / C_k,  if ant k passes through the path between switch i and switch j during its search;  Δτ_ij^k(t) = 0,  otherwise
where C_k denotes the total completion time of the search path of ant k, Q denotes the total amount of pheromone left on the path after one search is completed, and Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path.

The optimal path found so far is denoted Γ_bs. When the local pheromone of this path is updated, additional artificially released pheromone is added to strengthen the positive feedback effect. The local update formula is then:
Δτ_ij(t) = Σ_{k=1}^{m} Δτ_ij^k(t) + e · Δτ_ij^bs(t)
where Δτ_ij(t) denotes the total amount of pheromone released by the ant colony on the path, Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path, e is the influence weight factor of Γ_bs, and Δτ_ij^bs(t) denotes the additional pheromone added on the path, given by the following formula:
Δτ_ij^bs(t) = Q / C_bs,  if the path between switch i and switch j belongs to Γ_bs;  Δτ_ij^bs(t) = 0,  otherwise
where C_bs denotes the completion time of the known optimal path Γ_bs.
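A sketch of this elite-ant local update, treating the pheromone increments as a matrix and following the reconstructed formulas above; the function name and the matrix representation are choices made for the sketch.

```python
def elite_local_update(num_switches, ant_paths, ant_times, best_path, best_time, Q=1.0, e=2.0):
    """Deposit Q/C_k on every edge used by ant k, plus an extra e*Q/C_bs on the edges of
    the best path found so far (elite ant system); returns delta_tau_ij(t)."""
    delta = [[0.0] * num_switches for _ in range(num_switches)]
    for path, c_k in zip(ant_paths, ant_times):
        for i, j in zip(path, path[1:]):
            delta[i][j] += Q / c_k
    for i, j in zip(best_path, best_path[1:]):
        delta[i][j] += e * Q / best_time       # extra pheromone on the known optimal path
    return delta

# Example: two ants over 3 nodes; the best tour receives the elite bonus
print(elite_local_update(3, [[0, 1, 2], [0, 2, 1]], [4.0, 5.0], [0, 1, 2], 4.0))
```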
Each switch is a node, and the probability of each ant moving to a node can be calculated from the obtained transition function, as follows:
p_ij^k(t) = ([τ_ij(t)]^α · [η_ij]^β) / Σ_{s ∈ allowed_k} ([τ_is(t)]^α · [η_is]^β),  if j ∈ allowed_k;  p_ij^k(t) = 0,  otherwise
where p_ij^k(t) denotes the probability that ant k visits switch j at the next moment, α denotes the sensitivity of an ant to pheromone, β denotes the sensitivity of the ant colony to pheromone, τ_ij(t) denotes the pheromone concentration on the path between switch i and switch j at time t, and η_ij denotes the heuristic factor, which describes how strongly switch j attracts the ants on switch i and can be expressed as η_ij = 1/d_ij, where d_ij is the distance between switch j and switch i; allowed_k denotes the set of switches not yet visited. Each ant moves to the corresponding switch node according to the calculated probability.
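The transition rule can be sketched as a roulette-wheel selection over the not-yet-visited switches, under the reconstructed formula above; the parameter values are placeholders.

```python
import random

def next_switch(i, tau, eta, allowed, alpha=1.0, beta=2.0):
    """Pick the next switch j for an ant currently at switch i, with probability
    proportional to tau[i][j]**alpha * eta[i][j]**beta over the allowed set."""
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in allowed]
    r, acc = random.uniform(0.0, sum(weights)), 0.0
    for j, w in zip(allowed, weights):
        acc += w
        if r <= acc:
            return j
    return allowed[-1]   # numerical fallback

# Example: 3 switches, uniform pheromone, switch 2 is the least loaded
tau = [[1.0] * 3 for _ in range(3)]
eta = [[0.0, 0.5, 2.0], [0.5, 0.0, 2.0], [0.5, 1.0, 0.0]]
print(next_switch(0, tau, eta, allowed=[1, 2]))
```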
When an ant moves to a new switch node, it updates the pheromone on the path it has traversed and modifies the tabu list accordingly; the sub-pheromone concentration is obtained from the local update formula of the pheromone, as follows:
Δτ_ij(t) = Σ_{k=1}^{m} Δτ_ij^k(t) + e · Δτ_ij^bs(t)
where Δτ_ij(t) denotes the total amount of pheromone released by the ant colony on the path, that is, the sub-pheromone concentration, Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path, e is the influence weight factor of Γ_bs, and Δτ_ij^bs(t) denotes the additional pheromone added on the path.

The central controller collects the sub-pheromone concentrations determined by the software switches, evaluates all feasible paths according to the objective function min C_max, selects the current optimal path Γ_bs, and performs a global update of the pheromone on all paths based on the determined optimal path. To determine the optimal path, that is, the optimal solution of the allocation problem, the number of iterations is increased continually until it reaches the preset upper limit, and the number of task flows on each switch under the optimal path is obtained.
S13: Obtain the real-time health coefficient of the switch, and combine the real-time health coefficient with the initial weight to obtain the real-time health matrix.

The initial weight is calculated from the number of task flows: when there are multiple switches, the task-flow count of each switch is divided by the greatest common divisor of the task-flow counts of the switches, yielding the initial weight, which serves as the initial weight configuration of the weighted equal-cost multipath. The obtained initial weights of the software switches may be sent to the hardware switch, which may include a path distribution module used to determine the links between the software switches and the central controller.
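For illustration, under the reading that each initial weight is the switch's task-flow count divided by the common greatest common divisor, the weights could be derived as follows.

```python
from functools import reduce
from math import gcd

def initial_weights(task_flows):
    """Initial WECMP weights: each switch's task-flow count divided by the GCD of all counts."""
    g = reduce(gcd, task_flows)
    return [n // g for n in task_flows]

# Example: task-flow counts 40, 60, 100 -> weights 2, 3, 5
print(initial_weights([40, 60, 100]))
```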
The health coefficient of a switch is set to ζ; the health coefficient mainly represents the health of each link. When ζ = 1, the link health detection is normal; when ζ = 0, the link health detection is abnormal. The central controller can monitor the real-time resource usage and link health of the software switches through a monitoring module, and the link health can serve as the basis for the hardware switch to dynamically and adaptively adjust the multipath selection. The central controller monitors the switches to obtain the real-time health coefficients, and the real-time health coefficients can be multiplied by the initial weights to obtain the real-time health matrix.

S14: Compare the real-time health matrix with the preset health matrix, and adjust the load of the switch according to the comparison result.

The preset health matrix is obtained by setting ζ_i = 1, so the preset health matrix is (1, 1, 1, ..., 1). The real-time health matrix is compared with the preset health matrix; when the values differ, the hardware switch is controlled to adjust the health coefficient of the corresponding link and the load of the software switches is adjusted. For example, the software switch corresponding to a link with abnormal health can be excluded and its load reduced or removed, which balances the service processing on the software switches to the greatest extent. If a link fails, data transmission on that link can also be interrupted in this way to avoid the risk of data loss, and the optimal weight configuration of the weighted equal-cost multipath is determined.

In the load balancing method provided by this embodiment, the gradient parameters of the local model computed locally by the switch are obtained, the global model is optimized based on the gradient parameters, and the switch is controlled to optimize the local model based on the global model, so that the resource usage of the switch is calculated. The optimal allocation, that is, the number of task flows on each switch, is determined based on the ant colony algorithm and the obtained resource usage; the weight of the switch is determined from the number of task flows, the health coefficient is set, and the real-time health matrix is obtained from the weight of the switch. The load of the switch is adjusted by comparing the real-time health matrix with the preset health matrix, thereby achieving load balancing across the switches, reducing the service response time, and improving the user experience.

In one implementation, corresponding to S12 in Figure 1, the following steps may also be included:
(1) Cyclically obtain the sub-pheromone concentration determined by the switch based on the resource usage and the ant colony algorithm.

Suppose there are n software switches in the system and a total of m ants are allocated, with m = 2n, that is, 2 ants per switch perform the search task. The probability of each ant moving to the next switch node is calculated, the ant moves to the corresponding node according to that probability, and the sub-pheromone concentration is updated; to determine the optimal path, the number of iterations needs to be increased continually.

(2) Perform a global update of the sub-pheromone concentration to obtain the pheromone concentration, and optimize the path.

The sub-pheromone concentration is obtained and used for the global update; the number of iterations is increased and the calculation is repeated, continuously optimizing the path, until the number of iterations reaches the preset upper limit and the optimal path is determined.

The pheromone concentration is calculated using the following formula:

τ_ij(t+1) = (1 - ρ) · τ_ij(t) + Δτ_ij(t)

where ρ denotes the pheromone volatilization degree, Δτ_ij(t) denotes the sub-pheromone concentration, and τ_ij(t+1) denotes the pheromone concentration on the path between switch i and switch j at time t+1.
ρ is adjusted adaptively as follows:

Figure PCTCN2022141797-appb-000023

(3) Determine the number of task flows on the switch based on the optimized path.

The number of task flows on each switch under the optimal path is obtained.
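Putting the pieces together, the controller-side iteration might look like the following sketch; build_tours and flows_per_switch are hypothetical stand-ins for the ant search and the flow-counting step, and the stopping rule is simply the preset iteration limit.

```python
def global_update(tau, delta, rho):
    """tau_ij(t+1) = (1 - rho) * tau_ij(t) + delta_tau_ij(t)."""
    n = len(tau)
    return [[(1.0 - rho) * tau[i][j] + delta[i][j] for j in range(n)] for i in range(n)]

def optimize(tau, rho, max_iters, build_tours, flows_per_switch):
    """Iterate ant searches and pheromone updates, keep the current optimal path,
    and finally report the number of task flows per switch on that path."""
    best_path, best_time = None, float("inf")
    for _ in range(max_iters):
        paths, times, delta = build_tours(tau)   # ants search and deposit sub-pheromone
        tau = global_update(tau, delta, rho)
        i = min(range(len(times)), key=times.__getitem__)
        if times[i] < best_time:
            best_path, best_time = paths[i], times[i]
    return flows_per_switch(best_path)
```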
In one implementation, corresponding to S14 in Figure 1, the following steps may also be included:

(1) Detect the path health of the switch, and determine the real-time health coefficient of the switch according to its path health.

The central controller can monitor the link health of the software switches in real time and determine the real-time health coefficient based on the configured health detection coefficient ζ, as follows:
ζ_i = 1,  if the link health detection is normal;  ζ_i = 0,  if the link health detection is abnormal
According to the real-time link health, the real-time health coefficients of the switches are obtained, which can be written as {ζ_1, ζ_2, ζ_3, ..., ζ_n}.

(2) Multiply the initial weight corresponding to each switch by its real-time health coefficient to determine the real-time health matrix.

The initial weight of a switch is calculated from the obtained number of task flows: when there are multiple switches, the task-flow count of each switch is divided by the greatest common divisor of the task-flow counts of the switches, and the initial weights can be written as {w′_1, w′_2, w′_3, ..., w′_n}. The real-time health matrix obtained by multiplying the initial weights by the real-time health coefficients is {ζ_1·w′_1, ζ_2·w′_2, ζ_3·w′_3, ..., ζ_n·w′_n}. Figure 4 is a schematic diagram of weighted equal-cost multipath.
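As a minimal sketch of this step, assuming the element-wise product reading above:

```python
def health_matrix(initial_weights, health_coeffs):
    """Element-wise product zeta_i * w'_i of the initial WECMP weights and the
    real-time link health coefficients (zeta_i is 1 for a healthy link, 0 otherwise)."""
    return [z * w for z, w in zip(health_coeffs, initial_weights)]

def unhealthy_links(initial_weights, health_coeffs):
    """Compare against the preset matrix (all zeta_i = 1): any position whose value differs
    marks a link whose software switch should have its load reduced or removed."""
    preset = health_matrix(initial_weights, [1] * len(initial_weights))
    realtime = health_matrix(initial_weights, health_coeffs)
    return [i for i, (p, r) in enumerate(zip(preset, realtime)) if p != r]

# Example: weights {2, 3, 5} with the third link unhealthy
print(health_matrix([2, 3, 5], [1, 1, 0]))    # [2, 3, 0]
print(unhealthy_links([2, 3, 5], [1, 1, 0]))  # [2]
```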
This embodiment provides a load balancing method. Figure 3 is a flowchart of a load balancing method according to an embodiment of the present invention. The method is applied to a switch. As shown in Figure 3, the process includes the following steps:

S21: Obtain the data-flow feature information, determine the gradient parameters based on the data-flow feature information, and send the gradient parameters to the central controller.

The switch may be a software switch, and the data-flow feature information may include the average flow packet size from src to dst, the average flow packet size from dst to src, the minimum packet size, the maximum packet size, the average packet size, the packet transmission time, the handshake time (TCP), and so on. The switch uses the data-flow feature information as a data set, which may be divided according to a preset ratio, for example 7:3, into a training data set and a test data set, while keeping the data distribution as consistent as possible so that the split does not introduce additional bias that would affect the final result.

The central controller sends the federated GRU (Gated Recurrent Unit) task and the initial parameters to the switch; the initial parameters may be set to 1. The GRU neural network is a variant of LSTM (Long Short-Term Memory); GRU retains the effect of LSTM while simplifying the structure, and consists of an update gate and a reset gate. The update gate controls how much of the state information from the previous moment is carried into the current state; the larger its value, the more previous state information is carried in. The reset gate controls how much of the previous state information is ignored; the smaller its value, the more is ignored. The switch normalizes the divided training data set. Since large values remain after normalization, the activation function σ uses the Rectified Linear Unit (ReLU) to prevent the gradient from vanishing. At the same time, ReLU sets some neurons to 0, which makes the network sparse, reduces the interdependence between parameters, and effectively alleviates overfitting. The ReLU function is expressed as:

ReLU(x) = max(0, x)

To make full use of the computing performance of the software switches, relieve the performance pressure on the central controller, and reduce the additional bandwidth overhead caused by avoidable data transmission, GRU is combined with federated learning for the time-series prediction of CPU, memory, and network bandwidth, as shown in Figure 2.

The switch obtains the federated GRU task sent by the central controller. After starting the federated GRU task and initializing the system parameters, it performs the computation locally based on local data such as CPU, memory, network bandwidth, and data-flow feature information, and after the computation is completed, sends the resulting gradient parameters to the central controller.

S22: Obtain the global model sent by the central controller, optimize the local model based on the global model, and calculate the resource usage from the local model and the data-flow feature information.

The global model is obtained by the central controller through optimization based on the gradient parameters; the central controller receives the gradient parameters sent by at least one switch. In federated learning, the local gradient parameters are obtained through distributed training, and the global model is then optimized according to each set of local gradient parameters. After receiving the gradient parameters from the switches, the central controller aggregates them, focusing on factors such as efficiency and performance during aggregation. For example, because of the heterogeneity of the system, the central controller may sometimes not wait for the data upload of all switches, but instead select a suitable subset of switches as the collection targets. After the central controller aggregates and optimizes the global model based on the obtained gradient parameters, it sends the optimized global model to the switches participating in the GRU task. A switch updates its local model according to the received global model and evaluates its performance; if the performance reaches the preset condition, that is, the performance is good enough, training stops and the joint modeling ends; if the performance is insufficient, the switch computes the gradient parameters locally again and sends them to the central controller, until the final local model performance reaches the preset condition. The central controller saves the trained global model, can compute the initial parameters through the global model, and sends the initial parameters to the switch; the switch then computes the usage of each of its resources, such as CPU, memory, and network bandwidth, based on the initial parameters and the trained local model.

S23: Determine the sub-pheromone concentration based on the resource usage and the ant colony algorithm, and send the sub-pheromone concentration to the central controller to adjust the load.

The central controller obtains the resource usage computed by each switch and uses the ant colony algorithm to determine the number of task flows on each switch. The ant colony algorithm is an artificial-intelligence optimization algorithm that simulates the behavior of ants in nature searching for food and returning to the nest; it finds the optimal path through cooperation among the individuals of the colony. The ant colony algorithm is a heuristic global optimization algorithm among evolutionary algorithms, characterized by distributed computation, positive information feedback, and heuristic search. Its basic idea is to use the walking paths of the ants to represent feasible solutions of the problem to be optimized, so that all paths of the whole colony constitute the solution space of the problem. Ants on shorter paths release more pheromone; as time goes on, the pheromone concentration accumulated on the shorter paths gradually increases, and more and more ants choose those paths.

The traditional ant colony algorithm converges relatively slowly and is highly random, so its efficiency in seeking the optimal solution is relatively low, and it easily falls into a local optimum when solving optimization problems, thereby missing the global optimum. In a system composed of a central controller and switches, since the system may contain multiple heterogeneous switches, their computing power and network bandwidth may differ, so the system may be in a process of dynamic allocation. The load carried by each switch at each moment differs greatly; if some switches perform poorly while others perform well, large numbers of task flows tend to concentrate on the better-performing switches while the poorly performing switches may sit idle. Therefore, the pressure on the central controller needs to be shared according to the performance advantages of the switches to achieve load balancing.

In addition to the central controller and the software switches, the system may also include a hardware switch. Suppose there are n software switches in the system and a total of m ants are allocated, with m = 2n, that is, 2 ants per switch perform the search task; the initial point of each ant is the hardware switch.

After the resource usage of the switches is obtained, the relevant parameters of the ant colony algorithm are initialized; the initialization includes setting the upper limit of the number of iterations and the initial pheromone concentration, where the pheromone concentration can represent the efficiency of completing a task. The usage rates of CPU, memory, and network bandwidth are denoted U_cpu, U_mem, and U_net, respectively. To consider the performance of CPU, memory, and network bandwidth comprehensively, the distance is replaced by the loaded capacity:
Φ_ij = ω_cpu · U_cpu + ω_mem · U_mem + ω_net · U_net,  with  ω_cpu + ω_mem + ω_net = 1
where ω_cpu, ω_mem, and ω_net denote the weights of CPU, memory, and network bandwidth in the loaded capacity, respectively. Φ is used to improve the heuristic factor η of the traditional ant colony algorithm, with Φ = 1/η, that is, η_ij = 1/Φ_ij.

The smaller η_ij is, the larger the Φ_ij value of the switch chosen by the task flow, which means the load is already high and selecting this switch would make the load of the whole system more unbalanced. Conversely, the larger the obtained η_ij value, the smaller the corresponding Φ_ij value, which means the load is currently small and selecting this switch to execute tasks will promote load balancing of the whole system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.

Since the pheromone volatilization degree ρ has a large impact on the search performance of the algorithm (the larger ρ is, the worse the global search ability; the smaller ρ is, the worse the local search ability and the slower the convergence), the value of ρ is adjusted adaptively as follows:

Figure PCTCN2022141797-appb-000029

In addition, the pheromone update method is improved by using the elite ant system: after ant k completes a path search, the global update still uses standard ant colony optimization, while the local update is adjusted.
Let the completion time of target task O_i assigned to software switch S_j be C_ij. Then C_ij is the transmission time T_ij from the hardware switch to S_j, plus the actual execution time E_ij of O_i on S_j, plus the delay time W_ij from transmission to execution, that is:

C_ij = T_ij + E_ij + W_ij

The data volume of task O_i is denoted F_i, P_j denotes the performance of software switch S_j, and N_j denotes the network bandwidth of software switch S_j; then:

E_ij = F_i / P_j,

T_ij = F_i / N_j

Since the software switches execute tasks concurrently, the time for the system to finish all tasks is the maximum value C_max among all C_ij:

C_max = max(C_ij)

Since the overall goal of this optimization is to minimize the completion time of the task flows, the objective is:

min C_max.

At this point, the ant-cycle model is:
Δτ_ij^k(t) = Q / C_k,  if ant k passes through the path between switch i and switch j during its search;  Δτ_ij^k(t) = 0,  otherwise
where C_k denotes the total completion time of the search path of ant k, Q denotes the total amount of pheromone left on the path after one search is completed, and Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path.

The optimal path found so far is denoted Γ_bs. When the local pheromone of this path is updated, additional artificially released pheromone is added to strengthen the positive feedback effect. The local update formula is then:
Δτ_ij(t) = Σ_{k=1}^{m} Δτ_ij^k(t) + e · Δτ_ij^bs(t)
where Δτ_ij(t) denotes the total amount of pheromone released by the ant colony on the path, that is, the sub-pheromone concentration, Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path, e is the influence weight factor of Γ_bs, and Δτ_ij^bs(t) denotes the additional pheromone added on the path, given by the following formula:
Δτ_ij^bs(t) = Q / C_bs,  if the path between switch i and switch j belongs to Γ_bs;  Δτ_ij^bs(t) = 0,  otherwise
where C_bs denotes the completion time of the known optimal path Γ_bs.

Each switch is a node, and the probability of each ant moving to a node can be calculated from the obtained transition function, as follows:
p_ij^k(t) = ([τ_ij(t)]^α · [η_ij]^β) / Σ_{s ∈ allowed_k} ([τ_is(t)]^α · [η_is]^β),  if j ∈ allowed_k;  p_ij^k(t) = 0,  otherwise
where p_ij^k(t) denotes the probability that ant k visits switch j at the next moment, α denotes the sensitivity of an ant to pheromone, β denotes the sensitivity of the ant colony to pheromone, τ_ij(t) denotes the pheromone concentration on the path between switch i and switch j at time t, and η_ij denotes the heuristic factor, which describes how strongly switch j attracts the ants on switch i and can be expressed as η_ij = 1/d_ij, where d_ij is the distance between switch j and switch i; allowed_k denotes the set of switches not yet visited. Each ant moves to the corresponding switch node according to the calculated probability.

When an ant moves to a new switch node, it updates the pheromone on the path it has traversed and modifies the tabu list accordingly; the sub-pheromone concentration Δτ_ij(t) is obtained from the local update formula of the pheromone.

The central controller collects the sub-pheromone concentrations determined by the software switches, evaluates all feasible paths according to the objective function min C_max, selects the current optimal path Γ_bs, and performs a global update of the pheromone on all paths based on the determined optimal path. To determine the optimal path, that is, the optimal solution of the allocation problem, the number of iterations is increased continually until it reaches the preset upper limit, and the number of task flows on each switch under the optimal path is obtained.

The initial weight is calculated from the number of task flows: when there are multiple switches, the task-flow count of each switch is divided by the greatest common divisor of the task-flow counts of the switches, yielding the initial weight, which serves as the initial weight configuration of the weighted equal-cost multipath. The obtained initial weights of the software switches may be sent to the hardware switch, which may include a path distribution module used to determine the links between the software switches and the central controller.

The health coefficient of a switch is set to ζ; the health coefficient mainly represents the health of each link. When ζ = 1, the link health detection is normal; when ζ = 0, the link health detection is abnormal. The central controller can monitor the real-time resource usage and link health of the software switches through a monitoring module, and the link health can serve as the basis for the hardware switch to dynamically and adaptively adjust the multipath selection. The central controller monitors the switches to obtain the real-time health coefficients, and the real-time health coefficients can be multiplied by the initial weights to obtain the real-time health matrix.

The preset health matrix is obtained by setting ζ_i = 1, so the preset health matrix is (1, 1, 1, ..., 1). The real-time health matrix is compared with the preset health matrix; when the values differ, the hardware switch is controlled to adjust the health coefficient of the corresponding link and the load of the software switches is adjusted. For example, the software switch corresponding to a link with abnormal health can be excluded and its load reduced or removed, which balances the service processing on the software switches to the greatest extent. If a link fails, data transmission on that link can also be interrupted in this way to avoid the risk of data loss.

In one implementation, determining the sub-pheromone concentration based on the resource usage and the ant colony algorithm includes the following steps:

(1) Allocate the search tasks of the ants based on the ant colony algorithm, and determine the heuristic factor based on the resource usage. In addition to the central controller and the software switches, the system may also include a hardware switch. Suppose there are n software switches in the system and a total of m ants are allocated, with m = 2n, that is, 2 ants per switch perform the search task; the initial point of each ant is the hardware switch.

After the resource usage of the switches is obtained, the relevant parameters of the ant colony algorithm are initialized; the initialization includes setting the upper limit of the number of iterations and the initial pheromone concentration, where the pheromone concentration can represent the efficiency of completing a task. The usage rates of CPU, memory, and network bandwidth are denoted U_cpu, U_mem, and U_net, respectively. To consider the performance of CPU, memory, and network bandwidth comprehensively, the distance is replaced by the loaded capacity:
Φ_ij = ω_cpu · U_cpu + ω_mem · U_mem + ω_net · U_net,  with  ω_cpu + ω_mem + ω_net = 1
where ω_cpu, ω_mem, and ω_net denote the weights of CPU, memory, and network bandwidth in the loaded capacity, respectively. Φ is used to improve the heuristic factor η of the traditional ant colony algorithm, with Φ = 1/η, that is, η_ij = 1/Φ_ij.

The smaller η_ij is, the larger the Φ_ij value of the switch chosen by the task flow, which means the load is already high and selecting this switch would make the load of the whole system more unbalanced. Conversely, the larger the obtained η_ij value, the smaller the corresponding Φ_ij value, which means the load is currently small and selecting this switch to execute tasks will promote load balancing of the whole system. Therefore, this improvement encourages task flows to be executed on relatively idle switches, and after multiple iterations the improved algorithm can finally achieve overall load balancing.

(2) Calculate the probability of an ant moving to another switch according to the heuristic factor.

In one implementation, the probability of an ant moving to another switch is calculated using the following formula:
p_ij^k(t) = ([τ_ij(t)]^α · [η_ij]^β) / Σ_{s ∈ allowed_k} ([τ_is(t)]^α · [η_is]^β),  if j ∈ allowed_k;  p_ij^k(t) = 0,  otherwise
where p_ij^k(t) denotes the probability that ant k visits switch j at the next moment, α denotes the sensitivity of an ant to pheromone, β denotes the sensitivity of the ant colony to pheromone, τ_ij(t) denotes the pheromone concentration on the path between switch i and switch j at time t, and η_ij denotes the heuristic factor, which describes how strongly switch j attracts the ants on switch i and can be expressed as η_ij = 1/d_ij, where d_ij is the distance between switch j and switch i; allowed_k denotes the set of switches that have not yet been visited.

(3) When an ant moves to another switch, calculate the sub-pheromone concentration based on the pheromone newly deposited on the path and the ant-cycle model.
Since the pheromone volatilization degree ρ has a large impact on the search performance of the algorithm (the larger ρ is, the worse the global search ability; the smaller ρ is, the worse the local search ability and the slower the convergence), the value of ρ is adjusted adaptively as follows:

Figure PCTCN2022141797-appb-000048

In addition, the pheromone update method is improved by using the elite ant system: after ant k completes a path search, the global update still uses standard ant colony optimization, while the local update is adjusted.

Let the completion time of target task O_i assigned to software switch S_j be C_ij. Then C_ij is the transmission time T_ij from the hardware switch to S_j, plus the actual execution time E_ij of O_i on S_j, plus the delay time W_ij from transmission to execution, that is:

C_ij = T_ij + E_ij + W_ij

The data volume of task O_i is denoted F_i, P_j denotes the performance of software switch S_j, and N_j denotes the network bandwidth of software switch S_j; then:

E_ij = F_i / P_j,

T_ij = F_i / N_j

Since the software switches execute tasks concurrently, the time for the system to finish all tasks is the maximum value C_max among all C_ij:

C_max = max(C_ij)

Since the overall goal of this optimization is to minimize the completion time of the task flows, the objective is:

min C_max.
At this point, the ant-cycle model is:
Δτ_ij^k(t) = Q / C_k,  if ant k passes through the path between switch i and switch j during its search;  Δτ_ij^k(t) = 0,  otherwise
where C_k denotes the total completion time of the search path of ant k, Q denotes the total amount of pheromone left on the path after one search is completed, and Δτ_ij^k(t) denotes the total amount of pheromone released by the k-th ant on the path.

The newly added pheromone Δτ_ij^bs(t) is given by the following formula:

Δτ_ij^bs(t) = Q / C_bs,  if the path between switch i and switch j belongs to Γ_bs;  Δτ_ij^bs(t) = 0,  otherwise

where C_bs denotes the completion time of the known optimal path Γ_bs.

The optimal path found so far is denoted Γ_bs. When the local pheromone of this path is updated, additional artificially released pheromone is added to strengthen the positive feedback effect. The local update formula is then:
τ ij(t+1) = (1-ρ)τ ij(t) + Δτ ij(t) + e·Δτ bs ij(t)
where Δτ ij(t) represents the total amount of pheromone released by the ant colony on the path (the sum of Δτ k ij(t) over all ants), that is, the sub-pheromone concentration; Δτ k ij(t) represents the total amount of pheromone released by the k-th ant on the path; e is the influence weight factor of Γ bs; and Δτ bs ij(t) represents the pheromone newly added on the path.
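Putting the pieces together, the local update above can be sketched as follows (Python; `deposits` holds the per-ant Δτ k ij(t) dictionaries from the ant-cycle rule, `best_deposit` the extra Δτ bs ij(t) on Γ bs, and e the influence weight factor; the names and data structures are assumptions):

```python
def local_update(tau, rho, deposits, best_deposit, e):
    """tau_ij(t+1) = (1 - rho) * tau_ij(t) + sum_k Delta_tau^k_ij(t)
                     + e * Delta_tau^bs_ij(t)   (elite ant system)."""
    new_tau = {edge: (1.0 - rho) * value for edge, value in tau.items()}
    for per_ant in deposits:                  # sub-pheromone from every ant
        for edge, d in per_ant.items():
            new_tau[edge] = new_tau.get(edge, 0.0) + d
    for edge, d in best_deposit.items():      # extra pheromone on the best path
        new_tau[edge] = new_tau.get(edge, 0.0) + e * d
    return new_tau
```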
In this application, a GRU-based time-series performance prediction method built on federated learning is designed, which reduces the additional delay caused by real-time training and meets the latency-sensitive requirements of short flows. At the same time, the high performance of the software switches is fully exploited: federated learning relieves the pressure on the central controller and reduces avoidable data transmission, and the load status of the links is obtained equivalently through prediction. The distributed weighted equal-cost multi-path routing method based on the optimized ant colony algorithm allows load balancing to take the heterogeneity of the software-switch devices into account, adopts a more reasonable load evaluation criterion, improves the convergence of the algorithm and speeds up the computation of the result. Real-time link health detection interrupts data transmission the moment a link fails, avoiding the risk of data loss. In scenarios with massive short-flow data, the excellent computing performance of the switch nodes is fully utilized and, by combining the differences in CPU, memory and network between switch nodes, the optimal weight configuration of each path is predicted and the corresponding weighted equal-cost multi-path selection is performed. This shortens the response time of the overall service and improves the user experience.
This embodiment further provides a load balancing system, which is used to implement the above embodiments and implementation modes; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the system described in the following embodiment is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This embodiment provides a load balancing system, as shown in Figure 5, including:
a central controller, configured to execute the load balancing method;
at least one switch, connected to the central controller and configured to execute the load balancing method.
In one implementation, the system includes software switches, a hardware switch and a central controller, as shown in Figure 6. The hardware switch includes a path distribution module that performs multi-path selection for short flows based on the weight configuration; at the same time, it dynamically and adaptively adjusts the multi-path selection according to the link health detection results sent by the central controller after real-time monitoring.
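As an illustration of the path distribution module (not part of the original disclosure; the function names and the hashing scheme are assumptions), the sketch below picks one of several equal-cost paths for a short flow in proportion to the trained weights, after scaling each weight by the real-time health coefficient reported by the central controller:

```python
import hashlib

def pick_path(flow_id, paths, weights, health):
    """Weighted equal-cost multi-path selection for a short flow.
    weights[p] is the trained weight of path p and health[p] its real-time
    health coefficient (0 disables a failed path); hashing the flow id keeps
    the packets of one flow on the same path."""
    effective = {p: weights[p] * health.get(p, 1.0) for p in paths}
    total = sum(effective.values())
    if total <= 0.0:
        raise RuntimeError("no healthy path available")
    h = int(hashlib.md5(flow_id.encode()).hexdigest(), 16) % 10**6 / 10**6
    point, cum = h * total, 0.0
    for p in paths:                  # walk the cumulative effective weights
        cum += effective[p]
        if point < cum:
            return p
    return paths[-1]
```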
The central controller includes a monitoring module, a performance prediction module and a path training module. The monitoring module monitors and detects the real-time network utilization and link health of each software-switch node, which serves as the basis for the hardware switch to dynamically and adaptively adjust the multi-path selection. The performance prediction module cooperates with the software switches to train, in a federated manner, a prediction model for CPU, memory and network bandwidth usage. The path training module cooperates with the software switches to predict, in a distributed manner, the optimal weight configuration of the weighted equal-cost multi-paths.
The software switch includes a monitoring module, a performance prediction module and a path training module. The monitoring module monitors and records the usage of the local resources of the software switch, the local resources including CPU, memory, network bandwidth and so on; it can also record the local network utilization and link health status sent by the central controller, which serve as a data source for the performance prediction module. The performance prediction module coordinates with the central controller to calculate the resource usage rates based on federated learning. The path training module coordinates with the central controller to compute, in a distributed manner, the assigned path search results.
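The cooperation between the performance prediction modules can be sketched as one round of federated averaging (a simplified, assumed implementation; a placeholder linear model stands in for the GRU model, and only gradients, never raw samples, leave the switch):

```python
import numpy as np

def switch_local_gradient(global_weights, local_X, local_y):
    """On a software switch: gradient of a placeholder linear model on the
    locally recorded resource-usage samples; only this gradient is reported."""
    pred = local_X @ global_weights
    return 2.0 * local_X.T @ (pred - local_y) / len(local_y)

def controller_aggregate(global_weights, gradients, lr=0.01):
    """On the central controller: average the gradients reported by all
    switches and update the global model, which is then sent back to them."""
    return global_weights - lr * np.mean(gradients, axis=0)
```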
The load balancing system in this embodiment is presented in the form of functional units; a unit here refers to an ASIC circuit, a processor and memory that execute one or more software or firmware programs, and/or other devices that can provide the above functions.
Further functional descriptions of the above modules are the same as in the corresponding embodiments above and are not repeated here.
An embodiment of the present invention further provides an electronic device having the load balancing system shown in Figure 5 above.
Referring to Figure 7, which is a schematic structural diagram of an electronic device provided by an embodiment of the present invention, the electronic device may include: at least one processor 601, such as a CPU (Central Processing Unit); at least one communication interface 603; a memory 604; and at least one communication bus 602. The communication bus 602 is used to implement connection and communication between these components. The communication interface 603 may include a display (Display) and a keyboard (Keyboard); optionally, the communication interface 603 may also include a standard wired interface and a wireless interface. The memory 604 may be a high-speed RAM (Random Access Memory, a volatile random access memory) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 604 may also be at least one storage device located remotely from the processor 601. The processor 601 may be combined with the system described in Figure 5; the memory 604 stores an application program, and the processor 601 calls the program code stored in the memory 604 to perform any of the above method steps.
The communication bus 602 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 602 may be divided into an address bus, a data bus, a control bus and so on. For ease of presentation, only one thick line is shown in Figure 7, but this does not mean that there is only one bus or one type of bus.
The memory 604 may include a volatile memory, for example a random-access memory (RAM); the memory may also include a non-volatile memory, for example a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 604 may also include a combination of the above types of memory.
The processor 601 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 601 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
Optionally, the memory 604 is also used to store program instructions. The processor 601 may call the program instructions to implement the load balancing method shown in the embodiments of this application.
Embodiments of the present invention also provide a non-transitory computer storage medium storing computer-executable instructions, and the computer-executable instructions can execute the load balancing method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage medium may also include a combination of the above types of memory.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

  1. A load balancing method, applied to a central controller, the method comprising:
    obtaining gradient parameters of a local model sent by a switch, optimizing a global model according to the gradient parameters, and sending the global model to the switch to determine a usage rate of resources of the switch, the resources comprising a processor, a memory and network bandwidth;
    obtaining a sub-pheromone concentration determined by the switch based on the usage rate of the resources and an ant colony algorithm, and determining a number of task flows on the switch based on the sub-pheromone concentration;
    obtaining a real-time health coefficient of the switch, and combining the real-time health coefficient with an initial weight to obtain a real-time health matrix, the initial weight being calculated based on the number of task flows;
    comparing the real-time health matrix with a preset health matrix, and adjusting a load of the switch according to a comparison result.
  2. The method according to claim 1, wherein determining the number of task flows on the switch based on the sub-pheromone concentration comprises:
    cyclically obtaining the sub-pheromone concentration determined by the switch based on the usage rate of the resources and the ant colony algorithm;
    globally updating the sub-pheromone concentration to obtain a pheromone concentration, and optimizing a path;
    determining the number of task flows on the switch based on the optimized path.
  3. The method according to claim 2, wherein the pheromone concentration is calculated using the following formula:
    τ ij(t+1) = (1-ρ)τ ij(t) + Δτ ij(t)
    where ρ represents the degree of pheromone volatilization, Δτ ij(t) represents the total amount of pheromone released by the ant colony on the path, and τ ij(t+1) represents the pheromone concentration on the path between switch i and switch j at time t+1.
  4. The method according to claim 1, wherein obtaining the real-time health coefficient of the switch and combining the real-time health coefficient with the initial weight to obtain the real-time health matrix comprises:
    detecting a path health condition of the switch, and determining the real-time health coefficient of the switch according to the path health condition of the switch;
    multiplying the initial weight corresponding to the switch by the real-time health coefficient to determine the real-time health matrix.
  5. A load balancing method, applied to a switch, the method comprising:
    obtaining data flow feature information, determining gradient parameters based on the data flow feature information, and sending the gradient parameters to a central controller;
    obtaining a global model sent by the central controller, optimizing a local model based on the global model, and calculating a usage rate of resources according to the local model and the data flow feature information, the global model being obtained by the central controller through optimization based on the gradient parameters;
    determining a sub-pheromone concentration based on the usage rate of the resources and an ant colony algorithm, and sending the sub-pheromone concentration to the central controller to adjust a load.
  6. The method according to claim 5, wherein determining the sub-pheromone concentration based on the usage rate of the resources and the ant colony algorithm comprises:
    allocating search tasks of ants based on the ant colony algorithm, and determining a heuristic factor based on the usage rate of the resources;
    calculating a probability of an ant moving to another switch according to the heuristic factor;
    when the ant moves to another switch, calculating the sub-pheromone concentration according to the pheromone newly added on the path and the ant-cycle model.
  7. The method according to claim 6, wherein the probability of the ant moving to another switch is calculated using the following formula:
    p k ij(t) = [τ ij(t)]^α · [η ij(t)]^β / Σ s∈allowed k [τ is(t)]^α · [η is(t)]^β, if j ∈ allowed k, and p k ij(t) = 0 otherwise,
    where p k ij(t) represents the probability that ant k visits switch j at the next moment, α represents the sensitivity of the ants to the pheromone, β represents the sensitivity of the ant colony to the pheromone, τ ij(t) represents the pheromone concentration on the path between switch i and switch j at time t, η ij(t) denotes the heuristic term, η ij represents the heuristic factor, which describes the degree of attraction of switch j to the ants on switch i and can be expressed as η ij = 1/d ij, d ij represents the distance between switch j and switch i, and allowed k represents the set of switches that have not yet been visited.
  8. A load balancing system, comprising:
    a central controller, configured to execute the load balancing method according to any one of claims 1-4;
    at least one switch, connected to the central controller and configured to execute the load balancing method according to any one of claims 5-7.
  9. An electronic device, comprising:
    a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions so as to perform the load balancing method according to any one of claims 1-7.
  10. A computer-readable storage medium, storing computer instructions, the computer instructions being used to cause a computer to execute the load balancing method according to any one of claims 1-7.
PCT/CN2022/141797 2022-07-29 2022-12-26 Load balancing method and system, and electronic device and storage medium WO2024021486A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210909025.8A CN115499376B (en) 2022-07-29 2022-07-29 Load balancing method, system, electronic equipment and storage medium
CN202210909025.8 2022-07-29

Publications (1)

Publication Number Publication Date
WO2024021486A1 true WO2024021486A1 (en) 2024-02-01

Family

ID=84465663

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/141797 WO2024021486A1 (en) 2022-07-29 2022-12-26 Load balancing method and system, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN115499376B (en)
WO (1) WO2024021486A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117880206A (en) * 2024-03-12 2024-04-12 深圳市艾奥科技有限公司 Load balancing method and system for Internet of things management equipment
CN118070849A (en) * 2024-02-07 2024-05-24 湖南工程学院 Method for optimizing Informer wind power prediction model based on health evaluation
CN118400748A (en) * 2024-06-07 2024-07-26 佛山市南海区大数据投资建设有限公司 Unmanned aerial vehicle base station site selection method and related equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499376B (en) * 2022-07-29 2024-01-02 天翼云科技有限公司 Load balancing method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107094115A (en) * 2017-05-19 2017-08-25 重庆邮电大学 A kind of ant group optimization Load Balance Routing Algorithms based on SDN
CN109102075A (en) * 2018-07-26 2018-12-28 联想(北京)有限公司 Gradient updating method and relevant device during a kind of distribution is trained
CN110888744A (en) * 2019-11-29 2020-03-17 杭州电子科技大学 Load balancing method based on automatic adjustment and optimization of workload
WO2021179462A1 (en) * 2020-03-12 2021-09-16 重庆邮电大学 Improved quantum ant colony algorithm-based spark platform task scheduling method
CN115499376A (en) * 2022-07-29 2022-12-20 天翼云科技有限公司 Load balancing method, system, electronic equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7327685B2 (en) * 2004-09-10 2008-02-05 Industry-Academic Cooperation Foundation, Yoosei University Apparatus for implementation of adaptive routing in packet switched networks
CN1996921B (en) * 2006-12-31 2010-11-24 华为技术有限公司 Method, route device and business network for establishing the business connection
CN103281245B (en) * 2013-04-26 2016-02-24 广东电网公司电力调度控制中心 Determine method and the device of business routed path
CN107454630B (en) * 2017-09-25 2021-02-02 中国联合网络通信集团有限公司 Load balancing method and load balancing router
CN108512772B (en) * 2018-03-09 2021-07-16 重庆邮电大学 Data center flow scheduling method based on service quality
CN108989133B (en) * 2018-08-27 2020-03-31 山东大学 Network detection optimization method based on ant colony algorithm
CN109474973A (en) * 2018-12-03 2019-03-15 上海金卓网络科技有限公司 Method, apparatus, equipment and medium are determined based on the ad hoc network path of ant group algorithm
KR102165865B1 (en) * 2019-07-22 2020-10-14 성균관대학교산학협력단 Methods and apparatuses for dynamic load balancing based on genetic-ant colony algorithm in software defined network
CN110784366B (en) * 2019-11-11 2022-08-16 重庆邮电大学 Switch migration method based on IMMAC algorithm in SDN
CN111611080B (en) * 2020-05-22 2023-04-25 中国科学院自动化研究所 Cooperative scheduling method, system and device for edge computing tasks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107094115A (en) * 2017-05-19 2017-08-25 重庆邮电大学 A kind of ant group optimization Load Balance Routing Algorithms based on SDN
CN109102075A (en) * 2018-07-26 2018-12-28 联想(北京)有限公司 Gradient updating method and relevant device during a kind of distribution is trained
CN110888744A (en) * 2019-11-29 2020-03-17 杭州电子科技大学 Load balancing method based on automatic adjustment and optimization of workload
WO2021179462A1 (en) * 2020-03-12 2021-09-16 重庆邮电大学 Improved quantum ant colony algorithm-based spark platform task scheduling method
CN115499376A (en) * 2022-07-29 2022-12-20 天翼云科技有限公司 Load balancing method, system, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QING-BIN NIE, CAI TING; WANG NING: "Application of improved ant colony algorithm in resource allocation of cloud computing", COMPUTER ENGINEERING AND DESIG, vol. 37, no. 8, 16 August 2016 (2016-08-16), pages 2016 - 2020, XP093133459 *
ZHUO-RAN SONG, LIU DONG, YOU YI, YU WEN-PENG: "Intelligent dispatching strategy considering the health condition of power equipments", POWER SYSTEM PROTECTION AND CONTROL, vol. 39, no. 20, 16 October 2011 (2011-10-16), pages 43 - 47, XP093133456 *

Also Published As

Publication number Publication date
CN115499376B (en) 2024-01-02
CN115499376A (en) 2022-12-20

Similar Documents

Publication Publication Date Title
WO2024021486A1 (en) Load balancing method and system, and electronic device and storage medium
Gai et al. Reinforcement learning-based content-centric services in mobile sensing
US10686672B2 (en) Method for generating routing control action in software-defined network and related device
EP3659305B1 (en) Proactive link load balancing to maintain quality of link
Zhou et al. A load balancing strategy of sdn controller based on distributed decision
US11582163B2 (en) System for early system resource constraint detection and recovery
US10341208B2 (en) File block placement in a distributed network
US20230047068A1 (en) Data Processing Method, Apparatus, and System
CN104092756B (en) A kind of resource dynamic distributing method of the cloud storage system based on DHT mechanism
US10404603B2 (en) System and method of providing increased data optimization based on traffic priority on connection
WO2019134197A1 (en) Method and system for selecting minimum load router based on naive bayes classifier
CN113498508A (en) Dynamic network configuration
WO2021120633A1 (en) Load balancing method and related device
CN113315716A (en) Method and equipment for training congestion control model and method and equipment for congestion control
Fröhlich et al. Smart SDN management of fog services
CN117938755B (en) Data flow control method, network switching subsystem and intelligent computing platform
CN113422812A (en) Service chain deployment method and device
CN109815204A (en) A kind of metadata request distribution method and equipment based on congestion aware
CN109951317B (en) User-driven popularity perception model-based cache replacement method
CN113672372B (en) Multi-edge collaborative load balancing task scheduling method based on reinforcement learning
US11336473B2 (en) Network and method for delivering content while minimizing congestion costs by jointly optimizing forwarding and caching strategies
Li et al. A fuzzy-based fast routing algorithm with guaranteed latency-throughput over software defined networks
CN106775942B (en) Cloud application-oriented solid-state disk cache management system and method
US20160255004A1 (en) System for dynamic selection and application of tcp congestion avoidance flavors
KR102537023B1 (en) Method for controlling network traffic based traffic analysis using AI(artificial intelligence) and apparatus for performing the method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22952915

Country of ref document: EP

Kind code of ref document: A1