CN114945024B - Method for balancing and optimizing server load based on long short-term memory network - Google Patents

Method for balancing and optimizing server load based on long short-term memory network

Info

Publication number
CN114945024B
CN114945024B (application CN202210556058.9A)
Authority
CN
China
Prior art keywords
server
load
cluster
load balancing
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210556058.9A
Other languages
Chinese (zh)
Other versions
CN114945024A (en)
Inventor
景维鹏
陈广胜
李林辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeast Forestry University filed Critical Northeast Forestry University
Priority to CN202210556058.9A priority Critical patent/CN114945024B/en
Publication of CN114945024A publication Critical patent/CN114945024A/en
Application granted granted Critical
Publication of CN114945024B publication Critical patent/CN114945024B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1023Server selection for load balancing based on a hash applied to IP addresses or costs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a server load balancing optimization method based on a long short-term memory (LSTM) network, comprising the following steps: collecting one month of data from the server cluster's working period as training data for an LSTM-based load weight module, and training the LSTM network to obtain a load weight module capable of recognizing performance differences among servers; loading the trained LSTM-based load weight module into the load balancer of the server cluster to predict server load; and, after the load balancing optimization algorithm receives the load confidence of each server, having the weighted load balancing algorithm take the current confidence as a new weight, assign the current request to a server for processing, and record the state of the current server cluster for subsequent updates of the LSTM-based load weight module. The model of the invention has a small parameter count, which increases operation speed and improves the computational efficiency of the load balancing algorithm.

Description

Method for balancing and optimizing server load based on long short-term memory network
Technical Field
The invention belongs to the technical field of network servers, and particularly relates to a server load balancing optimization method based on a long short-term memory (LSTM) network.
Background
In today's internet environment, the dramatic increase in network users has sharply increased access requests to network applications, which in turn causes a sudden rise in the pressure on backend servers processing those requests. Handling an extremely large number of network requests in a short time poses a great challenge to the server clusters of every internet enterprise. How to make backend servers handle not only daily service requests but also short bursts of highly concurrent requests, while improving service quality, is therefore a problem worth studying.
Typically, a single server has limited request-processing capacity. The simplest effective way to deal with a large number of concurrent requests is therefore to combine multiple servers into a server cluster system and serve a large user base by increasing the number of servers. This is also the approach most commonly used by internet companies, although from the user's perspective a single logical server still appears to provide the service. Solving the limit on server count introduces a new problem: when a new user request arrives, which server in the cluster should handle it? A reasonable approach is to keep the number of requests handled by each server approximately equal, i.e., to spread the load pressure relatively evenly, so that the whole cluster remains in a stable operating state. The technique in which a central system (the load balancer) decides which server a user request is sent to, ensuring stable and efficient operation of the cluster, is called load balancing.
Traditional load balancing techniques fall into two main categories: static load balancing and dynamic load balancing. Static load balancing policies do not change with the current server state; examples include the round robin algorithm (RR), the weighted round robin algorithm (WRR), and the source address hashing algorithm. The round robin algorithm assumes every server has the same processing capacity and distributes user requests to the servers in turn, cycling from the first server to the last and starting over. The method is simple and suits clusters whose user requests arrive at even intervals and whose servers share identical software and hardware configurations. Because real servers rarely achieve complete consistency in software and hardware, however, different servers often handle different business loads. To address this, the weighted round robin algorithm gives each server a weight corresponding to its request-processing capacity; when multiple requests arrive, more of them are assigned to servers with higher weights. The core of the weighted round robin algorithm is to generate a server sequence, with the number of elements in the sequence equal to the total number of servers, and to hand each incoming request to a server according to this sequence in turn. Accurately estimating the performance difference of each server remains difficult, however, a problem common to load balancing methods. The idea of the source address hash method derives from the ordinary hash algorithm: the IP address of the user's request is passed through a hash function to obtain a hash value, which is taken modulo the number of servers to obtain the serial number of the server that should receive the request.
It follows that for requests initiated by the same user, the source address hash method always distributes the request to the same server node for processing.
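The static policies above can be sketched in a few lines. This is a minimal illustration only; the server addresses and the choice of MD5 as the hash function are assumptions, not details taken from the patent:

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical cluster nodes

def round_robin(counter):
    """Plain round robin: cycle through the servers in order."""
    return servers[counter % len(servers)]

def source_address_hash(client_ip):
    """Hash the client IP, then take it modulo the server count,
    so requests from the same user always reach the same node."""
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

# RR walks the list in order; the hash method is deterministic per client.
assert round_robin(0) == "10.0.0.1" and round_robin(3) == "10.0.0.1"
assert source_address_hash("203.0.113.7") == source_address_hash("203.0.113.7")
```

The determinism of the hash mapping is exactly what makes the method fragile: if a node is removed, every client that hashed to it is remapped.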
The round robin algorithm does not consider performance differences among servers, so the performance of the entire cluster is limited by its weakest servers, and if user requests are unevenly distributed in time the cluster load becomes uneven. The weighted round robin algorithm cannot accurately estimate the performance of each server, so the weights cannot be adjusted dynamically, and once a server fails the load easily becomes unbalanced.
When a server node fails, the source address hash algorithm leaves the user requests mapped to that node unable to be processed effectively.
To introduce server state into the load balancing algorithm, researchers have proposed dynamic load balancing algorithms such as the least connections method (LC), the fastest response method, and the observation mode method. When a user request arrives, the least connections method selects the server currently handling the fewest connections to process it, dynamically keeping the number of requests handled by each server roughly equal. The fastest response method considers the response times of the different server nodes, allocating more requests to nodes with short response times and fewer to nodes with long response times. The observation mode method combines these two indicators, weighting each server's response time and connection count to assign different weights to different servers.
Both the least connections method and the fastest response method must compute the connection counts and response times of all servers when a request arrives, which introduces a certain delay; moreover, connection count and response time alone cannot accurately reflect the real-time performance state of the cluster.
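As a sketch, the least connections selection amounts to a single `min` over the nodes' live connection counts (the counts below are made-up numbers for illustration):

```python
def least_connections(active_connections):
    """Minimum connection number method (LC): return the index of the
    server currently handling the fewest active connections."""
    return min(range(len(active_connections)), key=active_connections.__getitem__)

# Server 1 has only 4 active connections, so it receives the next request.
assert least_connections([12, 4, 9]) == 1
```

Note that the counts must be gathered fresh on every request, which is precisely the per-request overhead the text describes.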
Disclosure of Invention
To solve the problem that existing load balancing algorithms cannot accurately measure the performance of a server cluster, the invention proposes a method for predicting server performance with a long short-term memory network. Because a server's working performance can change at any moment, a neural network model with a certain number of parameters is used to learn it: performance indicators of the servers at different moments are fed into the network as features, and the network's feedback mechanism yields the confidence of each server at the next moment. Server performance is thereby brought into the load distribution decision, relieving the limitations of static load balancing methods.
In addition, the method addresses the problems that dynamic load balancing algorithms have high computational complexity and cannot reasonably distribute requests to load nodes. The load balancing optimization algorithm based on the long short-term memory network accumulates early request-distribution data, takes the current server state and the request as network input simultaneously, and uses GPU (graphics processing unit) accelerated computation to relieve the load pressure caused by large numbers of requests.
The specific technical scheme is as follows:
the server load balancing optimization method based on the long short-term memory network is implemented with the existing deep learning framework PyTorch and the corresponding programming libraries; PyTorch is used mainly for the RNN-based deep learning model.
The method comprises the following specific steps:
s1, collect one month of data from the server cluster's working period as training data for the LSTM-based load weight module, wherein the training data consists of two parts:
the first part: the state of each server (such as CPU occupancy, memory occupancy, and current network bandwidth) and the information of the current user request, recorded at each request;
the second part: the server allocation result for each user request.
The first part of the data is vectorized and used as the body of the training data, and the second part is used as the training label. The long short-term memory network is trained for multiple rounds, and the model parameters are fine-tuned after each round to learn the differences among servers, finally yielding the LSTM-based load weight module.
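One way to picture the two parts of the training data: the per-server metrics and the request information become the input vector, and the logged allocation becomes the label. The field names and the [0, 1] scaling below are illustrative assumptions; the patent does not fix an exact encoding:

```python
def vectorize_sample(server_states, request_features, assigned_server):
    """Build one (input, label) training pair.

    server_states:    per-server metrics (CPU/memory occupancy, bandwidth),
                      assumed already normalized to [0, 1]
    request_features: encoded information about the current user request
    assigned_server:  index of the server the cluster actually chose
                      (the second part of the data, used as the label)
    """
    x = []
    for s in server_states:
        x.extend([s["cpu"], s["mem"], s["bw"]])
    x.extend(request_features)
    return x, assigned_server

states = [{"cpu": 0.30, "mem": 0.50, "bw": 0.20},
          {"cpu": 0.70, "mem": 0.60, "bw": 0.40}]
x, y = vectorize_sample(states, [0.1, 0.9], assigned_server=0)
assert len(x) == 2 * 3 + 2 and y == 0  # 3 metrics per server + request features
```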
S2, load the trained LSTM-based load weight module into the load balancer of the server cluster to predict server load. Each time the server cluster receives a new user request, the load balancer first vectorizes the request to generate user features, then reads and vectorizes the state of each server in the current cluster; the state vectors of all servers are weighted and averaged to obtain the cluster features of the whole cluster. The user features and cluster features are concatenated to form the input of the LSTM-based load weight module in the load balancer; the load confidence of each server in the current cluster is computed through the module's three gate structures and output to the second component of the load balancer: the load balancing optimization algorithm based on weight confidence.
The three gate structures are the forget gate, the update gate, and the output gate.
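The three gate structures perform the standard LSTM cell update. The NumPy sketch below (with randomly initialized weights, so the numbers themselves are meaningless) shows how a hidden state produced by the forget, update, and output gates could be mapped through a softmax to per-server load confidences; the output-layer design is an assumption, not a detail taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, p):
    """One LSTM cell step using the three gates named in the text."""
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h + p["bf"])  # forget gate
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h + p["bi"])  # update gate
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h + p["bg"])  # candidate state
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h + p["bo"])  # output gate
    c = f * c + i * g
    return o * np.tanh(c), c

def load_confidences(h, W_out):
    """Softmax over servers: one confidence value per node."""
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d_in, d_h, n_servers = 8, 16, 3  # feature size, hidden size, cluster size
p = {k: rng.normal(scale=0.1, size=(d_h, d_in if k[0] == "W" else d_h))
     for k in ["Wf", "Uf", "Wi", "Ui", "Wg", "Ug", "Wo", "Uo"]}
p.update({b: np.zeros(d_h) for b in ["bf", "bi", "bg", "bo"]})

h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
conf = load_confidences(h, rng.normal(size=(n_servers, d_h)))
assert np.isclose(conf.sum(), 1.0) and conf.shape == (n_servers,)
```

In practice the document says this runs inside PyTorch on a GPU; the hand-rolled cell here only makes the gate arithmetic visible.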
And S3, after the load balancing optimization algorithm receives the load confidence of each server, the weighted load balancing algorithm takes the load confidence as the reference value of the weight, assigns the current request to a server for processing, and records the state of the current server cluster for subsequent updates of the LSTM-based load weight module.
The load balancing optimization algorithm may use any of a variety of weighted load balancing algorithms, such as the weighted round robin algorithm or the weighted least connections algorithm.
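One concrete way a weighted algorithm can consume the confidences is smooth weighted round robin (the variant popularized by Nginx), with the confidences scaled to integer weights. This is an illustrative sketch of one candidate weighted algorithm, not the patent's prescribed implementation:

```python
def smooth_wrr(weights, n_requests):
    """Smooth weighted round robin: each round, add every server's weight
    to its running total, pick the largest, and subtract the weight sum
    from the winner.  Produces a well-interleaved schedule."""
    current = [0] * len(weights)
    total = sum(weights)
    order = []
    for _ in range(n_requests):
        current = [cur + w for cur, w in zip(current, weights)]
        best = max(range(len(weights)), key=current.__getitem__)
        current[best] -= total
        order.append(best)
    return order

# Confidences like (0.71, 0.15, 0.14) scaled to integer weights (5, 1, 1):
# server 0 serves 5 of every 7 requests, spread out rather than bunched.
assert smooth_wrr([5, 1, 1], 7) == [0, 0, 1, 0, 2, 0, 0]
```

The interleaving matters: a naive weighted scheme would send five consecutive requests to server 0, briefly spiking its load.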
The invention realizes the following and brings practical benefits:
1. The invention realizes a method for more accurately estimating the performance differences among servers using the LSTM, which has a certain optimizing effect on traditional load balancing algorithms.
2. The model has a small parameter count, which improves operation speed; because the GPU can be used to accelerate computation, the computational efficiency of the load balancing algorithm is further improved.
3. As data accumulates, the model may be updated dynamically.
Drawings
FIG. 1 is a framework diagram of the present invention;
fig. 2 is a flow chart of the present invention.
Detailed Description
The specific technical scheme of the invention is described with reference to the embodiments.
The framework of the LSTM-based server load balancing optimization method is shown in fig. 1, and its flowchart is shown in fig. 2:
the method comprises the following specific steps:
s1, collecting data of one month of a working period of a server cluster as training data of an LSTM-based load weight module, wherein the training data is mainly divided into two parts: the first part, each time the user requests, the state of each server (such as CPU occupancy rate, memory occupancy rate, current network bandwidth, etc.) and the user information of the current request; the second part, server allocation results for each user request. The first part of data is vectorized to be used as a main body of training data, and the second part of data is used as a training label. And performing multiple rounds of training on the long-short-period memory network, and performing fine adjustment on parameters of the model after each training to learn the difference between the servers, thereby finally obtaining the LSTM-based load weight module.
S2, loading the trained LSTM-based load weight module into a load equalizer of the server cluster to predict the server load. Each time a server cluster receives a new user request, a load balancer firstly carries out vectorization on the user request to generate user characteristics, then reads the state of each server in the current cluster and carries out vectorization, and the state vectors of all servers are weighted and averaged to obtain the cluster characteristics of the whole cluster. The user characteristics and the cluster characteristics are spliced to obtain the input of a load weight module based on LSTM in the load equalizer, the load confidence of each server in the current cluster is obtained through the calculation of three gate structures (forget gate, update gate and output gate) of the module, and the load confidence is output to a second component part of the load equalizer: load balancing optimization algorithm based on weight confidence.
And S3, after the load balancing optimization algorithm receives the load confidence coefficient of each server, the weighted load balancing algorithm takes the load confidence coefficient as a reference value of the weight, performs server allocation operation on the current request and request processing, and records the state of the current server cluster so as to update the subsequent LSTM-based load weight module.
Finally, the invention uses the Httperf and Autobench performance testing tools to simulate many users performing a large number of access operations on the server, since effective performance indicators can be obtained only under relatively realistic system pressure. The test tool sends the server multiple groups of concurrent requests of different sizes, gradually increasing the concurrency over the course of the experiment; 10 rounds of tests are run and averaged to produce the final result. The comparison methods include the round robin algorithm, the weighted round robin algorithm, the least connections method, the fastest response method, and the observation mode method. Finally, the response time of the cluster is calculated.
Table 1: response time experimental results
The experimental results are shown in table 1. It can be observed that the load balancing algorithm optimized with the LSTM achieves a certain reduction in response time, and the effect grows as the concurrency increases. Compared with the other algorithms, the neural network makes more accurate use of server node performance and improves the overall stability of the system.

Claims (3)

1. A server load balancing optimization method based on a long short-term memory network, characterized by comprising the following steps:
S1, collecting one month of data from a server cluster's working period as training data for an LSTM-based load weight module, wherein the training data is divided into two parts:
vectorizing the first part of the data to serve as the body of the training data, and using the second part as the training label; training the long short-term memory network for multiple rounds, fine-tuning the model parameters after each round to learn the differences among servers, and finally obtaining the LSTM-based load weight module;
the training data being divided into two parts:
the first part: the state of each server and the user information of the current request, recorded at each user request;
the second part: the server allocation result for each user request;
S2, loading the trained LSTM-based load weight module into a load balancer of the server cluster to predict server load;
each time the server cluster receives a new user request, the load balancer first vectorizes the request to generate user features, then reads and vectorizes the state of each server in the current cluster, and the state vectors of all servers are weighted and averaged to obtain the cluster features of the whole cluster; the user features and cluster features are concatenated to form the input of the LSTM-based load weight module in the load balancer, the load confidence of each server in the current cluster is computed through the module's three gate structures, and the load confidence is output to the second component of the load balancer: a load balancing optimization algorithm based on weight confidence;
and S3, after the load balancing optimization algorithm receives the load confidence of each server, the weighted load balancing algorithm takes the load confidence as the reference value of the weight, assigns the current request to a server for processing, and records the state of the current server cluster for subsequent updates of the LSTM-based load weight module.
2. The server load balancing optimization method based on a long short-term memory network according to claim 1, wherein in S2 the three gate structures are, respectively, a forget gate, an update gate, and an output gate.
3. The server load balancing optimization method based on a long short-term memory network according to claim 1, wherein in S3 the load balancing optimization algorithm selects from a plurality of weighted load balancing algorithms.
CN202210556058.9A 2022-05-19 2022-05-19 Method for balancing and optimizing server load based on long-term and short-term memory network Active CN114945024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210556058.9A CN114945024B (en) 2022-05-19 2022-05-19 Method for balancing and optimizing server load based on long-term and short-term memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210556058.9A CN114945024B (en) 2022-05-19 2022-05-19 Method for balancing and optimizing server load based on long-term and short-term memory network

Publications (2)

Publication Number Publication Date
CN114945024A CN114945024A (en) 2022-08-26
CN114945024B true CN114945024B (en) 2023-05-12

Family

ID=82909348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210556058.9A Active CN114945024B (en) 2022-05-19 2022-05-19 Method for balancing and optimizing server load based on long-term and short-term memory network

Country Status (1)

Country Link
CN (1) CN114945024B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117744129A (en) * 2023-09-18 2024-03-22 苏州天安慧网络运营有限公司 Intelligent operation and maintenance method and system based on CIM

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938488A (en) * 2021-09-24 2022-01-14 浙江理工大学 Load balancing method based on dynamic and static weighted polling
CN114466014A (en) * 2021-12-28 2022-05-10 天翼云科技有限公司 Service scheduling method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9235801B2 (en) * 2013-03-15 2016-01-12 Citrix Systems, Inc. Managing computer server capacity
US10348630B2 (en) * 2017-04-24 2019-07-09 Facebook, Inc. Load balancing based on load projections for events
WO2022031835A1 (en) * 2020-08-05 2022-02-10 Avesha, Inc. Networked system for real-time computer-aided augmentation of live input video stream
CN114500339B (en) * 2022-02-07 2023-07-04 北京百度网讯科技有限公司 Node bandwidth monitoring method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113938488A (en) * 2021-09-24 2022-01-14 浙江理工大学 Load balancing method based on dynamic and static weighted polling
CN114466014A (en) * 2021-12-28 2022-05-10 天翼云科技有限公司 Service scheduling method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114945024A (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN113110933B (en) System with Nginx load balancing technology
CN104580538B (en) A kind of method of raising Nginx server load balancing efficiency
CN113938488B (en) Load balancing method based on dynamic and static weighted polling
WO2021004063A1 (en) Cache server bandwidth scheduling method and device
CN108416465B (en) Workflow optimization method in mobile cloud environment
US7730086B1 (en) Data set request allocations to computers
CN110933139A (en) System and method for solving high concurrency of Web server
CN113141317A (en) Streaming media server load balancing method, system, computer equipment and terminal
CN113822456A (en) Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
CN113778683B (en) Handle identification system analysis load balancing method based on neural network
CN114945024B (en) Method for balancing and optimizing server load based on long-term and short-term memory network
CN113691594B (en) Method for solving data imbalance problem in federal learning based on second derivative
CN112217894A (en) Load balancing system based on dynamic weight
CN105872082B (en) Fine granularity resource response system based on container cluster load-balancing algorithm
Qin et al. Research on nginx dynamic load balancing Algorithm
Lorido-Botran et al. ImpalaE: Towards an optimal policy for efficient resource management at the edge
CN113268339B (en) Dynamic load balancing method and system based on differential evolution algorithm
CN116302404A (en) Resource decoupling data center-oriented server non-perception calculation scheduling method
CN116389488A (en) DNS (Domain name System) optimization bandwidth allocation method based on load balancing algorithm
CN113157431B (en) Computing task copy distribution method for edge network application environment
CN114035954A (en) Scheduling system and task scheduling system based on DDQN algorithm
CN113377544A (en) Web cluster load balancing method based on load data dynamic update rate
Guo et al. A QoS evaluation algorithm for web service ranking based on artificial neural network
Bao et al. Nginx-Based Combined Dynamic and Static Load Balancing Algorithm Research
Wenzheng et al. Novel algorithm for load balancing in cluster systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant