CN108810115B

CN108810115B - Load balancing method and device suitable for distributed database and server

Info

Publication number: CN108810115B
Application number: CN201810530688.2A
Authority: CN
Inventors: 曹晟; 蒋长红; 王靖; 邹杰成; 梅亚双
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2020-12-01
Anticipated expiration: 2038-05-29
Also published as: CN108810115A

Abstract

The invention relates to a load balancing method, device, and server suitable for distributed databases. The method comprises a distributed management node initialization stage, which completes the election of the distributed management node and the management of metadata, and a load balancing stage, which selects a suitable distributed database node for a client. Each distributed database node sends its metadata to the distributed management node at startup; the distributed management node sorts the metadata of all distributed database nodes by the weight value of each node into a metadata array, and the metadata array is stored on the distributed management node and on every distributed database node. The current load state information of all distributed database nodes is likewise stored on every distributed database node, so that when the current distributed management node goes offline, a suitable next distributed management node can be selected through the load balancing stage, and the problems of single points of failure and metadata loss are avoided.

Description

Load balancing method and device suitable for distributed database and server

Technical Field

The invention relates to the technical field of distributed databases, and in particular to a load balancing method, device, and server suitable for distributed databases.

Background

Most database systems today are distributed: multiple servers are deployed in a network cluster to store the data, which requires a method for selecting a suitable cluster server for each client. Load balancing technology balances the storage and processing resources of the servers in a cluster so that the system as a whole operates more efficiently. Realizing load balancing for distributed databases is therefore of considerable importance.

Patent publication CN107783823A provides a load balancing method and device that balance the load on hosts by automatically controlling the live migration of virtual machines, with the stated aims of timely, accurate, and efficient resource scheduling and greatly improved host resource utilization.

Patent publication CN107480254A, entitled "Online load balancing method for distributed memory database", first uses a greedy algorithm based on data backup locations to compute a new partition for the current database, so that the requests received by each server under the new partition are uniformly distributed, and then applies the new data partition to the database through online data migration. According to that publication, the method completes load balancing more quickly than the prior art and has very little influence on the execution of transaction requests.

Patent publication CN105787006A relates to a load balancing system for a database server, which is said to effectively reduce the load on the database server, increase its throughput, and improve operating efficiency, while being safe and reliable.

Patent publication CN107368575A, entitled "Load balancing distributed NewSQL database system", discloses a load-balanced distributed NewSQL database system that addresses the uneven data distribution, and the resulting load imbalance, that arise in the operation of HBase.

However, the methods and systems disclosed in the above patents all share the following shortcoming:

Running the load balancing algorithm on a standalone server readily creates a single point of failure. When the server running the load balancing algorithm fails, the whole distributed database cluster is affected and load balancing can no longer be performed. Moreover, because the metadata required for load balancing is stored only on that standalone server, the metadata is easily lost.

Disclosure of Invention

The invention aims to solve these problems in the prior art by providing a load balancing method, device, and server suitable for a distributed database. They address the single point of failure that readily occurs when the load balancing algorithm runs on a standalone server, and the loss of metadata that readily occurs when the metadata required for load balancing is stored only on that server.

To achieve the above object, in a first aspect the present invention provides a load balancing method suitable for a distributed database, applied to a distributed database cluster composed of a plurality of distributed database nodes. The method includes a distributed management node initialization stage, which completes the election of the distributed management node and the initialization of metadata, and a load balancing stage, which selects a suitable distributed database node for a client. It specifically includes the following steps:

(1) one distributed database node is selected from the distributed database cluster as the distributed management node according to a preset rule;

(2) the distributed management node receives the metadata that each distributed database node sends at startup, together with the weight value P of each distributed database node, wherein the metadata comprises the network address, the memory capacity, and the CPU (central processing unit) processing capacity of the distributed database node;

(3) the distributed management node sorts the metadata of all distributed database nodes by the weight value P of each node into a metadata array, and the metadata array is stored on the distributed management node and on every distributed database node;

(4) the distributed management node periodically receives the current load state information of each distributed database node and stores the current load state information of all distributed database nodes on every distributed database node;

(5) the distributed management node selects a suitable distributed database node for the client according to the metadata of each distributed database node in the metadata array and the load state information of each distributed database node, thereby achieving load balancing.

In a second aspect, the present invention further provides a load balancing apparatus suitable for a distributed database, applied to a distributed database cluster composed of a plurality of distributed database nodes, and specifically including:

a management node selection module, used for selecting one distributed database node from the distributed database cluster as the distributed management node according to a preset rule;

a receiving module, by which the distributed management node receives the metadata that each distributed database node sends at startup, together with the weight value P of each distributed database node, wherein the metadata comprises the network address, the memory capacity, and the CPU processing capacity of the distributed database node;

a metadata array module, by which the distributed management node sorts the metadata of all distributed database nodes by the weight value P of each node into a metadata array, the metadata array being stored on the distributed management node and on every distributed database node;

a load updating module, by which the distributed management node periodically receives the current load state information of each distributed database node and stores the current load state information of all distributed database nodes on every distributed database node;

and a database node selection module, by which the distributed management node selects a suitable distributed database node for the client according to the metadata of each distributed database node in the metadata array and the load state information of each distributed database node, thereby achieving load balancing.

In a third aspect, the present invention also provides a server, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements steps (1) to (5) of the method described above.

The one or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:

The embodiments of the invention provide a load balancing method, device, and server suitable for a distributed database. The method comprises a distributed management node initialization stage, which completes the election of the distributed management node and the initialization of metadata, and a load balancing stage, which selects a suitable distributed database node for a client. When the distributed management node goes offline, one distributed database node is selected from the distributed database cluster as the new distributed management node according to a preset rule. Like every distributed database node, the distributed management node is a server in the distributed database cluster, and only one node is selected as the management node. Each distributed database node sends its metadata to the distributed management node at startup; the distributed management node sorts the metadata of all distributed database nodes by the weight value P of each node into a metadata array, and the metadata array is stored on the distributed management node and on every distributed database node. The current load state information of all distributed database nodes is likewise stored on every distributed database node, so that when the current distributed management node goes offline, a suitable next distributed management node can be selected through the load balancing stage. Both the single point of failure and the loss of metadata are thereby avoided.

The nodes are sorted by the weight value P of each distributed database node, which takes the processor and memory information of each node into account. Each selection starts from the distributed database node that follows the previously selected node in the metadata array, which prevents a single distributed database node from handling all client requests. When a node's processor utilization or remaining memory reaches the set threshold, that node is temporarily skipped, which yields a better balancing effect.

Drawings

Fig. 1 is a general flowchart of a load balancing method suitable for a distributed database according to a first embodiment of the present invention;

Fig. 2 is a schematic flowchart of the load balancing stage of the load balancing method suitable for a distributed database according to the first embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a load balancing apparatus suitable for a distributed database according to a second embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a server according to a third embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.

The embodiments address two problems in the prior art: a single point of failure readily occurs when the load balancing algorithm runs on a standalone server, and metadata is easily lost when the metadata required for load balancing is stored only on that server.

In the invention, a distributed database node is a server that stores data, the distributed database cluster is composed of a plurality of distributed database nodes, and the distributed management node is selected from the distributed database cluster.

The embodiments of the invention provide a load balancing method, device, and server suitable for a distributed database, applied to a distributed database cluster composed of a plurality of distributed database nodes. The method comprises a distributed management node initialization stage, which completes the election of the distributed management node and the initialization of metadata, and a load balancing stage, which selects a suitable distributed database node for a client. Specifically:

When the distributed management node goes offline, one distributed database node is selected from the distributed database cluster as the new distributed management node according to a preset rule. Like every distributed database node, the distributed management node is a server in the distributed database cluster, and only one node is selected as the management node. Each distributed database node sends its metadata to the distributed management node at startup; the distributed management node sorts the metadata of all distributed database nodes by the weight value P of each node into a metadata array, and the metadata array is stored on the distributed management node and on every distributed database node. The current load state information of all distributed database nodes is likewise stored on every distributed database node, so that when the current distributed management node goes offline, a suitable next distributed management node can be selected through the load balancing stage, while the remaining nodes continue to serve as distributed database nodes storing the actual data. Both the single point of failure and the loss of metadata are thereby avoided.

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.

Example one

A load balancing method suitable for a distributed database, as shown in Fig. 1 and Fig. 2, is applied to a distributed database cluster composed of a plurality of distributed database nodes. The method includes a distributed management node initialization stage, which completes the election of the distributed management node and the initialization of metadata, and a load balancing stage, which selects a suitable distributed database node for a client. It specifically includes the following steps:

(1) One distributed database node is selected from the distributed database cluster as the distributed management node according to a preset rule. In this step, the distributed management node is a distributed database node that runs the load balancing algorithm. Because the method comprises two stages, namely the initialization stage of the distributed management node and the stage of selecting a database node for a client, the distributed management node does not become a single point of failure: when the server acting as the distributed management node goes offline, a new distributed management node is selected from the nodes of the distributed database cluster according to the preset rule. The preset rule is as follows: the distributed database nodes in the distributed database cluster are ordered by startup time; the node that started first serves as the distributed management node, and when that node goes offline, the node that started second is selected as the new distributed management node.

(2) The distributed management node receives the metadata that each distributed database node sends at startup, together with the weight value P of each distributed database node, wherein the metadata comprises the network address, the memory capacity, and the CPU (central processing unit) processing capacity of the distributed database node. The weight value P of each distributed database node is calculated according to the following formula:

P = Lc × Lu,

where Lc is the number of CPUs of the distributed database node and Lu is the total memory of the distributed database node.

(3) The distributed management node sorts the metadata of all distributed database nodes by the weight value P of each node into a metadata array, and the metadata array is stored on the distributed management node and on every distributed database node. As a result, no metadata is lost when the current distributed management node goes offline.

(4) Each distributed database node periodically sends its current load state information to the distributed management node via heartbeat, and the distributed management node stores the current load state information of all distributed database nodes on every distributed database node. The load state information comprises the node's current CPU utilization C and its current remaining memory R, in megabytes. This completes the initialization of the distributed load balancing.
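As a minimal sketch of steps (1) to (3), and not part of the disclosed embodiment, the initialization stage could be expressed in Python roughly as follows. The class and function names, the `start_time` field, the sample addresses, the megabyte unit for Lu, and the descending sort order are all assumptions made for illustration; the description states only that the array is ordered by P.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMetadata:
    """Metadata a database node reports at startup (field names are hypothetical)."""
    address: str          # network address of the node
    cpu_count: int        # Lc: number of CPUs
    total_memory_mb: int  # Lu: total memory (megabytes assumed)
    start_time: float     # startup timestamp, assumed here for the election rule

    @property
    def weight(self) -> int:
        # P = Lc x Lu, as defined in step (2)
        return self.cpu_count * self.total_memory_mb


def elect_manager(nodes):
    """Preset rule of step (1): the node that started first becomes the manager."""
    return min(nodes, key=lambda n: n.start_time)


def build_metadata_array(nodes):
    """Step (3): order the metadata by weight P (descending order is an assumption)."""
    return sorted(nodes, key=lambda n: n.weight, reverse=True)


def replicate(metadata_array, nodes):
    """Step (3): every node, the manager included, keeps its own copy of the array."""
    return {n.address: list(metadata_array) for n in nodes}


nodes = [
    NodeMetadata("10.0.0.1:5432", cpu_count=4, total_memory_mb=8192, start_time=100.0),
    NodeMetadata("10.0.0.2:5432", cpu_count=16, total_memory_mb=65536, start_time=105.0),
    NodeMetadata("10.0.0.3:5432", cpu_count=8, total_memory_mb=16384, start_time=110.0),
]
manager = elect_manager(nodes)
metadata_array = build_metadata_array(nodes)
replicas = replicate(metadata_array, nodes)
```

In this example the node at 10.0.0.1 is elected because it started first, even though it has the smallest weight; the election rule and the weight ordering are independent of each other.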

The specific process of the load balancing stage is as follows:

(5) The distributed management node selects a suitable distributed database node for the client according to the metadata of each distributed database node in the metadata array and the load state information of each distributed database node, thereby achieving load balancing. The specific process is as follows:

(51) For each distributed database node, the load flag S is set to 1 if C is greater than 0.9 or R is less than 300; otherwise S is set to 0.

(52) The metadata array is traversed and each distributed database node is checked in turn. If the metadata array contains no distributed database node, error information is returned to the client. Otherwise, the load flag S of the node currently being traversed is checked: if S is 0, the network address of that node is returned to the client, and the next traversal starts from the node that follows it; if S is 1, the node is skipped and the next distributed database node is checked. This continues until a node is returned and the algorithm ends, or error information is returned.
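Steps (51) and (52) can be illustrated with the following Python sketch, offered as one possible reading rather than the patented implementation. The names `NodeSelector` and `NoNodeAvailable`, the wrap-around at the end of the array, the limit of one check per node per request, and the treatment of a node with no reported load as overloaded are assumptions; the thresholds 0.9 and 300 MB come from step (51).

```python
class NoNodeAvailable(Exception):
    """Stands in for the error information returned to the client."""


class NodeSelector:
    CPU_LIMIT = 0.9          # C threshold from step (51)
    FREE_MEMORY_LIMIT = 300  # R threshold in megabytes from step (51)

    def __init__(self, metadata_array):
        self.metadata_array = list(metadata_array)  # addresses ordered by weight P
        self.load = {}       # address -> (C, R) from the latest heartbeat
        self.next_index = 0  # position where the next traversal starts

    def update_load(self, address, cpu_usage, free_memory_mb):
        self.load[address] = (cpu_usage, free_memory_mb)

    def load_flag(self, address):
        """Return S: 1 if the node is overloaded, otherwise 0."""
        if address not in self.load:
            return 1  # assumption: a node that has not reported its load is skipped
        cpu_usage, free_memory_mb = self.load[address]
        overloaded = (cpu_usage > self.CPU_LIMIT
                      or free_memory_mb < self.FREE_MEMORY_LIMIT)
        return 1 if overloaded else 0

    def select(self):
        count = len(self.metadata_array)
        if count == 0:
            raise NoNodeAvailable("metadata array is empty")
        # Check each node at most once, wrapping around the array (assumption).
        for offset in range(count):
            index = (self.next_index + offset) % count
            address = self.metadata_array[index]
            if self.load_flag(address) == 0:
                self.next_index = (index + 1) % count  # next traversal starts after it
                return address
        raise NoNodeAvailable("every node is overloaded")


selector = NodeSelector(["10.0.0.2:5432", "10.0.0.3:5432", "10.0.0.1:5432"])
selector.update_load("10.0.0.2:5432", cpu_usage=0.40, free_memory_mb=30000)  # S = 0
selector.update_load("10.0.0.3:5432", cpu_usage=0.95, free_memory_mb=8000)   # S = 1
selector.update_load("10.0.0.1:5432", cpu_usage=0.90, free_memory_mb=300)    # S = 0
picks = [selector.select() for _ in range(3)]
```

Because both comparisons in step (51) are strict, a node at exactly 0.9 CPU utilization with exactly 300 MB free is still eligible, as the third sample node shows.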

Viewed as a whole, the load balancing stage proceeds as follows:

After the distributed management node has been initialized, the metadata must be maintained: the distributed database nodes in the distributed database cluster periodically send their load information to the distributed management node via heartbeat. Once the cluster metadata is properly maintained, a suitable distributed database node can be selected for each client. Whenever a client sends a connection request, the distributed management node selects a suitable distributed database node and returns its network address to the client. The client then connects to that distributed database node using the returned network address and sends its specific command requests.
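The heartbeat maintenance described above might look like the following Python sketch. It is illustrative only: the class names, the in-process method call standing in for a network heartbeat, and the choice to push a fresh copy of the load table to every node on each heartbeat are assumptions, since the description does not specify the transport or the replication schedule.

```python
from dataclasses import dataclass, field


@dataclass
class LoadState:
    cpu_usage: float     # C: current CPU utilization, 0.0 to 1.0
    free_memory_mb: int  # R: remaining memory in megabytes


@dataclass
class DatabaseNode:
    address: str
    # Local replica of the load states of all nodes
    load_table: dict = field(default_factory=dict)


class ManagementNode:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.load_table = {}  # address -> latest LoadState

    def on_heartbeat(self, address, state):
        """Record one node's load and copy the whole table to every node."""
        self.load_table[address] = state
        for node in self.nodes:
            node.load_table = dict(self.load_table)  # independent copy per node


nodes = [DatabaseNode("10.0.0.1:5432"), DatabaseNode("10.0.0.2:5432")]
manager = ManagementNode(nodes)
manager.on_heartbeat("10.0.0.1:5432", LoadState(0.40, 30000))
manager.on_heartbeat("10.0.0.2:5432", LoadState(0.95, 8000))
```

After the two heartbeats, every node holds the load state of both nodes, which is the property the method relies on when a new management node takes over.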

Like every distributed database node, the distributed management node is a server in the distributed database cluster, and only one node is selected as the management node. The load information is stored on all servers, so that when the current distributed management node goes offline, the next distributed management node can be elected through the load balancing stage, and a single point of failure is avoided.
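A hedged Python sketch of this failover follows. The function name `next_manager`, the dictionary layout of the replicas, the sample load values, and the generalization of "the node that started second" to "the earliest-started node that is still online" are assumptions; removing the offline node from the candidate list is likewise an illustrative choice the description does not spell out.

```python
def next_manager(start_order, offline):
    """Return the earliest-started node that is still online.

    start_order: node addresses ordered by startup time, earliest first.
    offline: set of addresses known to be offline.
    """
    for address in start_order:
        if address not in offline:
            return address
    raise RuntimeError("no distributed database node is online")


start_order = ["10.0.0.1:5432", "10.0.0.2:5432", "10.0.0.3:5432"]
metadata_array = ["10.0.0.2:5432", "10.0.0.3:5432", "10.0.0.1:5432"]  # ordered by P
load_table = {
    "10.0.0.1:5432": (0.30, 4000),
    "10.0.0.2:5432": (0.40, 30000),
    "10.0.0.3:5432": (0.50, 9000),
}

# Every node already holds a copy of the metadata array and the load table.
replicas = {
    address: {"metadata_array": list(metadata_array), "load_table": dict(load_table)}
    for address in start_order
}

# The current manager (the first-started node) goes offline.
offline = {"10.0.0.1:5432"}
new_manager = next_manager(start_order, offline)

# The new manager resumes from its own replica; nothing has to be recovered
# from the failed node.
state = replicas[new_manager]
candidates = [a for a in state["metadata_array"] if a not in offline]
```

The point of the sketch is that the successor needs no data from the failed manager: its local replica already contains everything required to continue selecting nodes for clients.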

The servers are sorted by the weight value P of each server, which takes the processor and memory information of each server into account. Each selection starts from the distributed database node that follows the previously selected node in the metadata array, which prevents one large server from handling all client requests. When a node's processor utilization or remaining memory reaches the set threshold, that distributed database node is temporarily skipped, which yields a better balancing effect.

Based on the same inventive concept, the application provides a device corresponding to the embodiment, which is detailed in the second embodiment.

Example two

Corresponding to the first embodiment, the present embodiment provides a load balancing apparatus suitable for a distributed database, as shown in Fig. 3. The apparatus is applied to a distributed database cluster composed of a plurality of distributed database nodes and comprises the management node selection module, the receiving module, the metadata array module, the load updating module, and the database node selection module set out above.

Since the apparatus described in this embodiment is the apparatus used to implement the load balancing method suitable for a distributed database of the first embodiment of this application, a person skilled in the art can, based on the method described in the first embodiment, understand the specific implementation of the apparatus of this embodiment and its variations. A detailed description of how the apparatus implements the method is therefore omitted here. Any apparatus used by those skilled in the art to implement the method of the embodiments of the present application falls within the scope of the present application.

Based on the same inventive concept, the application provides a server corresponding to the first embodiment, which is detailed in the third embodiment.

Example three

The present embodiment provides a server, as shown in Fig. 4, including a memory, a processor, and a server program stored in the memory and capable of running on the processor, where the processor, when executing the server program, implements the steps of the method of the first embodiment.

In a specific implementation, when the processor executes the server program, any implementation of the first embodiment can be realized.

Since the server described in this embodiment is the device used to implement the load balancing method suitable for a distributed database of the first embodiment of the present application, a person skilled in the art can, based on the method described in the first embodiment, understand the specific implementation of the server and its variations. How the server implements the method of the embodiments of the present application is therefore not described in detail here. Any device used by those skilled in the art to implement the method of the embodiments of the present application falls within the scope of the present application.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

While preferred embodiments of the present invention have been described, those skilled in the art may make additional variations and modifications to those embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of those embodiments. Various changes will be apparent to those skilled in the art, and so long as they fall within the spirit and scope of the invention as defined by the appended claims, all inventions that make use of the inventive concept are protected.

Claims

1. A load balancing method suitable for a distributed database is characterized in that the load balancing method is applied to a distributed database cluster consisting of a plurality of distributed database nodes, the method comprises a distributed management node initialization stage for completing election and metadata initialization of distributed management nodes and a load balancing stage for selecting proper distributed database nodes for a client, and the method specifically comprises the following steps:

(5) the distributed management node selects a proper distributed database node for the client according to the metadata of each distributed database node in the metadata array and the load state information of each distributed database node, so that the effect of load balancing is achieved;

the preset rule in step (1) is as follows: the distributed database nodes in the distributed database cluster are ordered by startup time, the distributed database node that started first serves as the distributed management node, and when that node fails, the distributed database node that started second is selected as the new distributed management node;

the weight value P of each distributed database node in step (2) is calculated by the following formula: P = Lc × Lu, where Lc is the number of CPUs of the distributed database node and Lu is the total memory of the distributed database node;

the load state information in step (4) comprises the current CPU utilization C of the distributed database node and its current remaining memory R, in megabytes;

the specific process of the step (5) is as follows:

2. A load balancing apparatus suitable for a distributed database, characterized in that the apparatus is applied to a distributed database cluster composed of a plurality of distributed database nodes and specifically includes:

a database node selection module, by which the distributed management node selects a suitable distributed database node for the client according to the metadata of each distributed database node in the metadata array and the load state information of each distributed database node, thereby achieving load balancing;

the preset rule in the management node selection module is as follows: the distributed database nodes in the distributed database cluster are ordered by startup time, the distributed database node that started first serves as the distributed management node, and when that node fails, the distributed database node that started second is selected as the new distributed management node;

the weight value P of each distributed database node in the metadata array module is calculated according to the formula: P = Lc × Lu, where Lc is the number of CPUs of the distributed database node and Lu is the total memory of the distributed database node;

the specific selection process of the database node selection module is as follows:

(1) the load flag S is set to 1 if C is greater than 0.9 or R is less than 300, and otherwise S is set to 0, where C is the CPU utilization of the current distributed database node and R is the remaining memory of the current distributed database node, in megabytes;

(2) the metadata array is traversed and each distributed database node is checked in turn; if the metadata array contains no distributed database node, error information is returned to the client; otherwise, the load flag S of the distributed database node currently being traversed is checked: if S is 0, the network address of that distributed database node is returned to the client, and the next traversal starts from the distributed database node that follows it; if S is 1, that distributed database node is skipped and the next distributed database node is checked, until the algorithm ends or error information is returned.

3. A server comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps of:

the specific process of the step (5) is as follows: