CN111061557B - Method and device for balancing distributed memory database load

Method and device for balancing distributed memory database load

Info

Publication number
CN111061557B
CN111061557B
Authority
CN
China
Prior art keywords: node, fragments, fragment, processed, time length
Prior art date
Legal status
Active
Application number
CN201811204395.1A
Other languages
Chinese (zh)
Other versions
CN111061557A (en
Inventor
方超
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811204395.1A
Publication of CN111061557A
Application granted
Publication of CN111061557B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Abstract

The application provides a method and a device for balancing distributed memory database load, and belongs to the technical field of big data. The method comprises the following steps: when the load of the distributed memory database is to be balanced, indication information of the fragments processed by each node during load balancing can be determined based on the current number of fragments on each node, the average time length for a single fragment to be processed on its local node, and the average time length for a single fragment to be processed on a cross-network non-local node; the indication information is used to indicate the fragments coming from each node and is sent to each node, and each node can subsequently acquire the indicated fragments for processing. With the method and the device, fragment processing efficiency can be improved.

Description

Method and device for balancing distributed memory database load
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and an apparatus for balancing distributed memory database loads.
Background
With the development of computer and network technologies, distributed systems have become widespread in the big data field. A distributed memory database is commonly used as the main database of a distributed system; it keeps data mainly in memory for operation, and because its read and write speed is far higher than that of a magnetic disk, storage based on a distributed memory database is more widely applicable than disk-based storage.
In a distributed memory database, each node can process fragments (a fragment is a logical unit composed of a fixed number of rows of data). However, while the distributed memory database runs, the CPU occupancy rates of the nodes may become unequal (i.e., the load becomes unbalanced) for various reasons, for example: the distributed memory database is expanded and a newly added node holds no fragments; a node experiences network delay and misses the distribution of new data; or a node goes offline, restarts and recovers, during which part of the expired data in its previously stored fragments is removed and the distribution of new data while it was offline is missed. As a result, some nodes have a high CPU occupancy rate while others have a low one; a node with a high CPU occupancy rate processes fragments slowly and makes them wait a long time, so fragment processing efficiency is low.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for balancing distributed memory database loads. The technical scheme is as follows:
in a first aspect, a method for balancing distributed in-memory database load is provided, where the method includes:
acquiring the number of current fragments of each node in a target distributed memory database and an identifier of the current fragments, and acquiring a first time length and a second time length, wherein the first time length is the average time length of the single fragment processed on a local node, and the second time length is the average time length of the single fragment processed on a non-local node in a cross-network mode;
according to the number of the current fragments of each node, the identifier of the current fragments of each node, the first time length and the second time length, determining indication information of the fragments processed by each node in load balancing, wherein the indication information is used for indicating the fragments from each node;
and for each node, sending the indication information of the fragment of the node to the node so that the node acquires the fragment according to the indication information of the fragment of the node and processes the fragment.
Optionally, the obtaining the first duration and the second duration includes:
counting the time lengths of the first preset number of fragments which are respectively processed on the local node, and counting the time lengths of the second preset number of fragments which are respectively processed on the cross-network non-local node;
and determining that the counted average value of the time lengths of which the first preset number of fragments are respectively processed on the local node is a first time length, and determining that the counted average value of the time lengths of which the second preset number of fragments are respectively processed on the cross-network non-local node is a second time length.
Thus, the determined first duration and the second duration can be more accurate.
Optionally, the determining, according to the number of current fragments of each node, the identifier of the current fragment of each node, the first time length, and the second time length, indication information of a fragment processed by each node in load balancing, where the indication information is used to indicate a fragment from each node, includes:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of current fragments of each node, the first time length and the second time length;
and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
Optionally, the indication information includes:
a starting fragment identifier and an ending fragment identifier among the fragments from each node; or,
the number of fragments from each node and the starting fragment identifier; or,
the number of fragments from each node and the ending fragment identifier.
Optionally, the determining, according to the number of current fragments of each node, the first duration and the second duration, the number of fragments from each node in the fragments processed by each node in load balancing includes:
determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that the value of TotalCost is minimized, wherein the sum of the current numbers of fragments of all nodes in the target distributed memory database is equal to the sum of the numbers of fragments processed on all the nodes in load balancing, and TotalCost is the sum of the variances of the total time lengths for which the nodes in the target distributed memory database process fragments:
TotalCost = Σ_{k=1..n} (Cost_k − MeanCost)², where MeanCost is the average of Cost_k over the n nodes;
Cost_k is the total time length for which node k processes fragments, Cost_k = X_kk·t + Σ_{i≠k} X_ik·ct, in which X_kk·t is the time length for which node k processes the fragments stored on node k itself and Σ_{i≠k} X_ik·ct is the time length for which node k processes fragments of other nodes;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that are processed by node k, X_ik is the number of fragments stored on node i that are processed by node k, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
Optionally, the determining, in the principle of minimizing the TotalCost value, the number of fragments from each node in the fragments processed by each node in load balancing includes:
converting TotalCost into the matrix form TotalCost = X^T·H·X, wherein X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments stored on the i-th node that are processed by the j-th node, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements of X is less than or equal to the sum of the current numbers of fragments of all nodes in the target distributed memory database;
and determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that X^T·H·X takes its minimum value.
Optionally, the obtaining the number of current fragments of each node in the target distributed memory database includes:
when a data query request is received, acquiring the number of current fragments of each node in a target distributed memory database;
the method further comprises the following steps:
receiving the result of processing the fragments of each node in the target distributed memory database;
summarizing the results of processing the fragments of each node to obtain a total result;
and feeding back the total result.
In a second aspect, an apparatus for balancing load of a distributed in-memory database is provided, the apparatus including:
an obtaining module, configured to obtain the number of current fragments of each node in a target distributed memory database and the identifiers of the current fragments, and to obtain a first time length and a second time length, wherein the first time length is the average time length for a single fragment to be processed on its local node, and the second time length is the average time length for a single fragment to be processed on a cross-network non-local node;
a determining module, configured to determine, according to the number of current fragments of each node, an identifier of the current fragment of each node, the first time length, and the second time length, indication information of a fragment processed by each node in load balancing, where the indication information is used to indicate the fragment from each node;
and the sending module is used for sending the indication information of the fragments of the nodes to the nodes so that the nodes acquire the fragments according to the indication information of the fragments of the nodes for processing.
Optionally, the obtaining module is configured to:
counting the time lengths of the first preset number of fragments which are respectively processed on the local node, and counting the time lengths of the second preset number of fragments which are respectively processed on the cross-network non-local node;
and determining that the counted average value of the time lengths of the first preset number of fragments respectively processed on the local nodes is a first time length, and determining that the counted average value of the time lengths of the second preset number of fragments respectively processed on the cross-network non-local nodes is a second time length.
Optionally, the determining module is configured to:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of current fragments of each node, the first time length and the second time length;
and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
Optionally, the indication information includes:
a starting fragment identifier and an ending fragment identifier among the fragments from each node; or,
the number of fragments from each node and the starting fragment identifier; or,
the number of fragments from each node and the ending fragment identifier.
Optionally, the determining module is configured to:
determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that the value of TotalCost is minimized, wherein the sum of the current numbers of fragments of all nodes in the target distributed memory database is equal to the sum of the numbers of fragments processed on all the nodes in load balancing, and TotalCost is the sum of the variances of the total time lengths for which the nodes in the target distributed memory database process fragments:
TotalCost = Σ_{k=1..n} (Cost_k − MeanCost)², where MeanCost is the average of Cost_k over the n nodes;
Cost_k is the total time length for which node k processes fragments, Cost_k = X_kk·t + Σ_{i≠k} X_ik·ct, in which X_kk·t is the time length for which node k processes the fragments stored on node k itself and Σ_{i≠k} X_ik·ct is the time length for which node k processes fragments of other nodes;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that are processed by node k, X_ik is the number of fragments stored on node i that are processed by node k, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
Optionally, the determining module is configured to:
converting TotalCost into the matrix form TotalCost = X^T·H·X, wherein X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments stored on the i-th node that are processed by the j-th node, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements of X is less than or equal to the sum of the current numbers of fragments of all nodes in the target distributed memory database;
and determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that X^T·H·X takes its minimum value.
Optionally, the obtaining module is configured to:
when a data query request is received, acquiring the number of current fragments of each node in a target distributed memory database;
the device further comprises:
the receiving module is used for receiving the result of processing the fragments of each node in the target distributed memory database; summarizing the results of processing the fragments of each node to obtain a total result;
the sending module is further configured to feed back the total result.
In a third aspect, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the method steps of the first aspect.
In a fourth aspect, a management device is provided, comprising a processor and a memory, wherein the memory is configured to store a computer program; the processor is configured to execute the program stored in the memory, so as to implement the method steps of the first aspect.
The technical solutions provided by the embodiments of the present invention have at least the following beneficial effects:
in the embodiment of the present invention, when balancing the load of the distributed memory database, the indication information of the fragment processed by each node in load balancing may be determined based on the number of the current fragments of each node, the average duration of the processing of the single fragment on the local node, and the average duration of the processing of the single fragment on the non-local node in a cross-network manner, where the indication information is used to indicate the fragment from each node and notify each node, and each subsequent node may acquire the fragment for processing based on the indication information. In this way, the fragments to be processed calculated during load balancing are determined firstly, and then the fragments to be processed calculated during load balancing are processed by each node, so that load balancing can be ensured as much as possible, and the fragment processing efficiency is improved.
Drawings
Fig. 1 is a system diagram for balancing loads of a distributed memory database according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for balancing distributed memory database load according to an embodiment of the present invention;
fig. 3 is a schematic diagram of sending indication information according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for balancing distributed memory database load according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for balancing distributed memory database load according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a management device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
To facilitate understanding of the embodiments of the present invention, a system architecture related to the embodiments of the present invention and concepts related to the terms are first described below.
The embodiment of the invention can be applied to a distributed system, wherein the distributed system is a set of software system established on a network and mainly comprises a distributed memory database, as shown in fig. 1, the distributed memory database comprises a plurality of nodes, and the nodes can be servers.
The distributed memory database stores data in a memory for operation, and has higher reading and writing speed compared with a magnetic disk.
A local node: for a given fragment, the node on which that fragment is stored.
Spark: a distributed computing framework and a top-level project of the Apache Foundation.
SQL (Structured Query Language): a database query and programming language used to access data and to query, update and manage relational database systems.
A fragment: a data storage unit of the database, i.e., a logical unit composed of a fixed number of rows of data. It can be a region in HBase, a row group in Parquet/RCFile, and so on. When Spark is used for computation, one fragment may correspond to one Spark partition.
A quadratic form: a polynomial in n unknowns in which every term has degree two.
The load balancing mainly refers to the CPU occupancy rate balancing in the embodiment of the invention, namely the CPU occupancy rates of all the nodes are basically the same.
The embodiment of the invention provides a method for balancing distributed memory database load, wherein an execution main body of the method can be management equipment, and the management equipment can be an independent node or a certain node in the distributed memory database.
The management device may be provided with a processor, a memory and a transceiver; the processor may be configured to perform the processing for balancing the distributed memory database load, the memory may be configured to store the data required and generated in the process of balancing the distributed memory database load, and the transceiver may be configured to receive and send data.
Before implementation, it is assumed that the processing capacity of each node is the same, the network capacity of each node is consistent, and the size of the fragments is the same. The implementation of the present invention is described by taking Spark SQL as an example of a query processing engine of a distributed memory database.
An embodiment of the present invention provides a method for balancing distributed memory database loads, as shown in fig. 2, an execution flow of the method may be as follows:
step 201, obtaining the number of current fragments of each node in the target distributed memory database, and obtaining a first time length and a second time length.
The target distributed memory database is any distributed memory database, the first time length and the second time length can be stored in the management device in advance, the first time length is the average time length of a single fragment processed on a local node, namely the average time length of the single fragment processed on a node for storing the fragment, the second time length is the average time length of the single fragment processed on a non-local node in a cross-network mode, namely the average time length of the single fragment not processed on the node for storing the fragment, the second time length comprises two parts, the first part is the time length for transmitting the fragment, and the second part is the time length for processing the fragment.
In implementation, each node in the distributed memory database may periodically report its own fragment count and fragment identifiers to the management device, and the management device may store the received fragment counts, fragment identifiers and node identifiers correspondingly. Subsequently, when the management device is to balance the load of the nodes in the memory database, it may obtain the stored current fragment count and fragment identifiers of each node, and may obtain the first time length and the second time length stored in advance.
Or, the management device may send a fragment acquisition request to each node, and each node may send its own fragment number and fragment identifier to the management device, so that the management device may acquire the current fragment number and fragment identifier of each node, in order to balance the load of each node in the in-memory database. And the management apparatus may acquire the first time period and the second time period stored in advance.
It should be noted that, for a certain node, the current shard refers to the shard currently stored by the node.
Optionally, the process of obtaining the first duration and the second duration may be as follows:
counting the time lengths for which a first preset number of fragments were respectively processed on the local node, and counting the time lengths for which a second preset number of fragments were respectively processed on cross-network non-local nodes; and determining that the average of the counted time lengths for which the first preset number of fragments were respectively processed on the local node is the first time length, and determining that the average of the counted time lengths for which the second preset number of fragments were respectively processed on cross-network non-local nodes is the second time length.
The first preset number and the second preset number may be preset and stored in the management device, for example, the first preset number is 1000, the second preset number is 1500, and the first preset number and the second preset number may be the same, for example, the first preset number and the second preset number are both 1000.
In implementation, in the target distributed memory database, each node may count the time length for which the stored fragments are processed on the node itself, and report the time length to the management device, and after receiving the time length for which the first preset number of fragments are processed on the local node, the management device may calculate an average value of the first preset number of time lengths, and determine the average value as the first time length.
In the target distributed memory database, each node may also count the time lengths for which it processed fragments stored on other nodes (including the time to obtain the fragment from the other node and the time to process it after obtaining it), and report them to the management device; after receiving the time lengths for which a second preset number of fragments were processed on cross-network non-local nodes, the management device may calculate the average of these second preset number of time lengths and determine the average as the second time length.
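As a minimal illustration of this averaging step, the sketch below (Python; the sample-list structure and the preset counts are assumptions made for illustration, not details given in this application) computes the two time lengths once enough reported samples have been collected:

def average_durations(local_samples, remote_samples,
                      first_preset_count=1000, second_preset_count=1000):
    # local_samples:  durations of fragments processed on their local node
    # remote_samples: durations of fragments processed on cross-network
    #                 non-local nodes (transfer time plus processing time)
    if len(local_samples) < first_preset_count or len(remote_samples) < second_preset_count:
        return None  # keep collecting reports from the nodes
    first_duration = sum(local_samples[:first_preset_count]) / first_preset_count
    second_duration = sum(remote_samples[:second_preset_count]) / second_preset_count
    return first_duration, second_duration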
Optionally, the load balancing considered in the embodiment of the present invention is CPU balancing, so that when a data query request is received, the processing of step 201 is started to be executed:
and when a data query request is received, acquiring the current fragmentation number of each node in the target distributed memory database.
In implementation, a user may enter the content to be queried in a search input box and click the search button, which triggers sending of a data query request to the management device; the management device may then obtain the number of current fragments of each node in the target distributed memory database (the obtaining manner is described in detail above and is not repeated here). The query processing engine for this process may be Spark SQL.
In this way, since it is the processing of fragments that aggravates the load imbalance of the distributed memory database, the load balancing processing can be started only when a data query request is received, which saves processing resources and improves query performance.
Step 202, determining indication information of the fragments processed by each node during load balancing according to the number of the current fragments of each node, the identifier of the current fragments of each node, the first time length and the second time length.
In implementation, after obtaining the number of current fragments of each node, the identifier of the current fragment of each node, the first time length, and the second time length, the management device may determine, according to the number of current fragments of each node, the identifier of the current fragment of each node, the first time length, and the second time length, indication information of fragments processed by each node in load balancing, where the indication information is used to indicate the fragments from each node.
Optionally, the number of fragments from each node may be determined first, and then the indication information is determined, and the corresponding processing in step 202 may be as follows:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of current fragments of each node, the first time length and the second time length; and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
In implementation, after obtaining the number of current fragments of each node, the identifier of the current fragment of each node, the first time length and the second time length, the management device may determine the number of fragments from each node in the fragments processed by each node in load balancing. For the current fragments of a target node (any one of all the nodes), the management device may determine all nodes that are to process the current fragments of the target node and the number of those fragments each of them is to process. The nodes are then arranged by sequence number from small to large, denoted N1, N2, N3, N4, and so on, and the numbers of the target node's current fragments they are to process are denoted n1, n2, n3, n4, and so on, respectively. The node with sequence number N1 is assigned the identifiers of the first n1 fragments among the current fragments of the target node; the node with sequence number N2 is assigned the identifiers of the next n2 fragments after those first n1 fragments; the node with sequence number N3 is assigned the identifiers of the next n3 fragments after the first n1+n2 fragments; and so on, until the identifiers of the current fragments of the target node to be processed by each node are determined. By performing the same procedure for every node, the management device obtains the indication information of the fragments processed by each node in the target distributed memory database.
For example, there are three nodes: node a, node b and node c. Node a stores 10 fragments, node b stores 3 fragments and node c stores 2 fragments. To achieve load balancing, node a processes 5 of its own fragments, 0 fragments stored on node b and 0 fragments stored on node c; node b processes 3 of its own fragments, 2 fragments stored on node a and 0 fragments stored on node c; node c processes 2 of its own fragments, 3 fragments stored on node a and 0 fragments stored on node b. The sequence numbers of the nodes from small to large are node a, node b, node c. For node a, which only processes fragments stored by itself, the first 5 of its fragments can be taken, and the indication information of the fragments processed by node a indicates only those first 5 fragments. For node b, since node a does not process any of node b's fragments, the fragments of node a processed by node b are the 6th and 7th fragments; node b also processes its own 3 fragments, so the indication information of the fragments processed by node b indicates the 6th and 7th fragments of node a and the fragments stored by node b itself. For node c, the fragments of node a processed by node c are the 8th to 10th fragments; node c also processes the two fragments stored by itself, so the indication information of the fragments processed by node c indicates the 8th to 10th fragments of node a and the fragments stored by node c itself.
Alternatively, for the current fragments of a target node (any one of all the nodes), the management device may determine all nodes that are to process the current fragments of the target node and the number of those fragments each of them is to process, and arrange the nodes by sequence number from small to large, denoted N1, N2, N3, N4, and so on, with the corresponding numbers of the target node's current fragments to be processed being n1, n2, n3, n4, and so on. Supposing the target node is the node with sequence number N3, that node is first assigned the identifiers of the first n3 fragments among its own current fragments; then the node with sequence number N1 is assigned the identifiers of the next n1 fragments after those first n3 fragments; then the node with sequence number N2 is assigned the identifiers of the next n2 fragments after the first n1+n3 fragments; and so on, until the identifiers of the current fragments of the target node to be processed by each node are determined. In this way, the indication information of the fragments processed by each node in the target distributed memory database is likewise obtained.
For example, there are four nodes in total: node a, node b, node c and node d, whose sequence numbers from small to large are node a, node b, node c, node d. Node a stores 10 fragments, node b stores 3 fragments, node c stores 2 fragments and node d stores 9 fragments. To achieve load balancing, node a processes only 6 of its own fragments and 0 fragments of the other nodes; node d likewise processes only 6 of its own fragments; node b processes its own 3 fragments, 2 fragments stored on node a, 0 fragments stored on node c and 1 fragment stored on node d; node c processes its own 2 fragments, 2 fragments stored on node a, 0 fragments stored on node b and 2 fragments stored on node d.
For node a, which only processes fragments stored by itself, the first 6 of its fragments can be taken, and the indication information of the fragments processed by node a indicates only those first 6 fragments.
For node d, which also only processes fragments stored by itself, the first 6 of its fragments can be taken, and the indication information of the fragments processed by node d indicates only those first 6 fragments.
For node b, the fragments of node a processed by node b are the 7th and 8th fragments, the fragment of node d processed by node b is the 7th fragment, and node b processes its own 3 fragments; the indication information of the fragments processed by node b indicates the 7th and 8th fragments of node a, the fragments stored by node b itself, and the 7th fragment of node d.
For node c, the fragments of node a processed by node c are the 9th and 10th fragments, the fragments of node d processed by node c are the 8th and 9th fragments, and node c processes the two fragments stored by itself; the indication information of the fragments processed by node c indicates the 9th and 10th fragments of node a, the 8th and 9th fragments of node d, and the fragments stored by node c itself.
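To illustrate how these per-node counts are turned into concrete fragment identifiers, the sketch below (Python; the dictionary layout and function name are assumptions for illustration and follow the first ordering described above, in which identifiers are handed out in ascending node-sequence order) assigns each processing node a contiguous slice of every storage node's fragment list:

def build_indication_info(fragment_ids, assignment):
    # fragment_ids: {storage_node: [fragment identifiers, in order]}
    # assignment:   {storage_node: {processing_node: count of that storage
    #                node's fragments the processing node should handle}}
    # returns:      {processing_node: {storage_node: [fragment identifiers]}}
    info = {}
    for storage_node, ids in fragment_ids.items():
        offset = 0
        for processing_node in sorted(assignment.get(storage_node, {})):
            count = assignment[storage_node][processing_node]
            if count == 0:
                continue
            info.setdefault(processing_node, {})[storage_node] = ids[offset:offset + count]
            offset += count
    return info

For the three-node example above, assignment = {'a': {'a': 5, 'b': 2, 'c': 3}, 'b': {'b': 3}, 'c': {'c': 2}} reproduces the described split: node b receives the 6th and 7th fragments of node a, and node c receives the 8th to 10th fragments of node a.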
Optionally, the content of the indication information is various, and three possible contents are given as follows:
A. and the starting fragment identifier and the ending fragment identifier in the fragments from each node.
In an implementation, in step 202, after determining that the fragment of another node is to be processed, for each node, an identifier of a first fragment (i.e., a starting fragment identifier) and an identifier of a last fragment (i.e., an ending fragment identifier) of the fragment to be processed in another node may be obtained. The subsequent indication information includes the start fragment identifier and the end fragment identifier, that is, the start fragment identifier and the end fragment identifier may be used to find all the fragments to be processed.
B. The number of fragments from each node and the starting fragment identification.
In an implementation, in step 202, for each node, after determining that the fragment of another node is to be processed, an identifier of a first fragment of the fragments to be processed in another node (i.e., a starting fragment identifier) and a number of fragments of another node to be processed may be obtained. The subsequent indication information includes the starting fragment identifier and the number of fragments, that is, the starting fragment identifier and the number of fragments can be used to find all fragments to be processed.
C. The number of fragments from each node and the end fragment identification.
In an implementation, in step 202, for each node, after the fragments of another node to be processed are determined, the identifier of the last fragment among the fragments to be processed in the other node (i.e., the ending fragment identifier) and the number of fragments of the other node to be processed may be obtained. The subsequent indication information includes the ending fragment identifier and the number of fragments, which together can be used to find all the fragments to be processed.
Optionally, the number of the fragments from each node in the fragments processed by each node may be determined based on a principle that the processing time is the smallest, and the corresponding processing may be as follows:
determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that the value of TotalCost is minimized, wherein the sum of the current fragments of all nodes in the target distributed memory database is equal to the sum of the fragments processed on all nodes in load balancing, and TotalCost is the sum of the variances of the total time lengths for which the nodes in the target distributed memory database process fragments:
TotalCost = Σ_{k=1..n} (Cost_k − MeanCost)², where MeanCost is the average of Cost_k over the n nodes;
Cost_k is the total time length for which node k processes fragments, Cost_k = X_kk·t + Σ_{i≠k} X_ik·ct, in which X_kk·t is the time length for which node k processes the fragments stored on node k itself and Σ_{i≠k} X_ik·ct is the time length for which node k processes fragments of other nodes;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that are processed by node k, X_ik is the number of fragments stored on node i that are processed by node k, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
In implementation, assume there are n nodes. Taking node k (the k-th node, Node_k) as an example, during load balancing every node in the distributed memory database may process fragments stored on Node_k; after scheduling, the numbers of fragments on node k processed by the respective nodes are denoted X_k1, X_k2, ..., X_kn. Doing this for every node yields an n × n matrix whose k-th row is (X_k1, X_k2, ..., X_kn).
In this n × n matrix, each column corresponds to a processing node: the k-th column corresponds to node k, and adding up the elements of the k-th column gives the number of fragments node k is to process in load balancing. Each row corresponds to the fragments stored on one node (i.e., the current fragments on one node): adding up the elements of the k-th row gives the number of fragments stored on node k.
Assuming the first time length is denoted t and the second time length is denoted ct, during load balancing the total duration required for node k to process all the fragments assigned to it is:
Cost_k = X_kk·t + Σ_{i≠k} X_ik·ct        (2)
In formula (2), X_kk·t is the time length for which node k processes its own fragments, and Σ_{i≠k} X_ik·ct is the time length for which node k processes the fragments of other nodes.
Then the sum of the variances of the total processing durations of the nodes in the target distributed memory database can be determined:
TotalCost = Σ_{k=1..n} (Cost_k − MeanCost)²        (3)
where
MeanCost = (1/n) · Σ_{k=1..n} Cost_k        (4)
In formula (4), MeanCost represents the average, over the nodes of the target distributed memory database, of the time each node spends processing fragments during load balancing.
Under a fragment allocation that minimizes TotalCost, the processing capability of each node in the target distributed memory database is considered to be exercised in a balanced manner, and load balancing is thereby achieved.
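The cost model of formulas (2) to (4) can be written down directly; the sketch below is a minimal NumPy illustration (the function and variable names are mine, not part of this application):

import numpy as np

def total_cost(X, t, ct):
    # X[i, j]: number of fragments stored on node i that node j processes
    # t: first time length (local processing); ct: second time length (cross-network)
    n = X.shape[0]
    # Cost_k = X_kk * t + sum_{i != k} X_ik * ct  (column k is the work done by node k)
    cost = np.array([X[k, k] * t + (X[:, k].sum() - X[k, k]) * ct for k in range(n)])
    mean_cost = cost.mean()                        # MeanCost, formula (4)
    return float(((cost - mean_cost) ** 2).sum())  # TotalCost, formula (3)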
Optionally, on the principle of TotalCost minimum, the manner of calculating the number of fragments from each node in the fragments processed by each node may be as follows:
converting TotalCost into the matrix form TotalCost = X^T·H·X, where X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments stored on the i-th node that are processed by the j-th node, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements of X is less than or equal to the sum of the current numbers of fragments of all nodes in the target distributed memory database; and determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that X^T·H·X takes its minimum value.
In implementation, as can be seen from the formulas above, X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn are all unknowns, so TotalCost is a polynomial in n × n unknowns in which every term has degree two, i.e., a quadratic form in n × n unknowns. By the properties of quadratic forms, there must exist a symmetric matrix H such that:
TotalCost = X^T·H·X        (5)
In formula (5), X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T.
Thus, based on formula (5), the load balancing problem can be converted into a quadratic programming problem with constraints, that is:
take the minimum value of TotalCost, min X^T·H·X, subject to the linear constraints imposed by the nodes' fragment counts.
The matrix H is the symmetric coefficient matrix obtained by expanding TotalCost (formula (3), with formula (2) substituted in) as a quadratic form in the elements of X.
The constraints require that the fragments currently stored on each node are all assigned. Writing X_k = (X_k1, X_k2, ..., X_kn)^T and Y_k = (N_k, 0, ..., 0)^T, where N_k is the number of current fragments (i.e., stored fragments) on node k at load balancing, the assignment for each node k must be consistent with its stored count N_k, and the sum of the elements of X must not exceed the sum of the current fragment counts of all nodes.
Note that T denotes transposition.
X can then be determined by solving the quadratic program for its optimal solution. In this way, the number of fragments from each node in the fragments processed by each node is determined.
It should be noted that solving a quadratic program for its optimal solution is a conventional technique; a Lagrangian multiplier method may be used in the calculation, and it is not described here again.
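As a hedged sketch of such a solver (it relies on SciPy's general-purpose SLSQP routine instead of the explicit H matrix and Lagrangian-multiplier derivation above, relaxes the counts to real numbers and rounds at the end; every name here is an assumption for illustration):

import numpy as np
from scipy.optimize import minimize

def balance(stored, t, ct):
    # stored[i]: current fragment count of node i
    # returns an n x n matrix X with X[i, j] = fragments stored on node i
    # assigned to node j, chosen to (approximately) minimize TotalCost
    n = len(stored)

    def cost(x):
        X = x.reshape(n, n)
        per_node = np.diag(X) * t + (X.sum(axis=0) - np.diag(X)) * ct
        return ((per_node - per_node.mean()) ** 2).sum()

    # every fragment stored on node i must be assigned exactly once
    constraints = [{"type": "eq",
                    "fun": (lambda x, i=i: x.reshape(n, n)[i].sum() - stored[i])}
                   for i in range(n)]
    x0 = np.repeat(np.asarray(stored, dtype=float) / n, n)
    res = minimize(cost, x0, bounds=[(0, None)] * (n * n),
                   constraints=constraints, method="SLSQP")
    return np.rint(res.x.reshape(n, n)).astype(int)

For example, balance([10, 3, 2], t=1.0, ct=3.0) spreads the processing of the first node's surplus fragments across the other nodes so that the per-node processing durations are as even as the cross-network penalty allows.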
And 203, for each node, sending the indication information of the node fragment to the node, so that the node acquires the fragment according to the indication information of the node fragment and processes the fragment.
In implementation, as shown in fig. 3, for each node, the management device may send the indication information of the fragments of that node to the node; after receiving it, the node may use the indication information to obtain the fragments it indicates and process them.
Optionally, when the indication information is a start fragment identifier and an end fragment identifier in the fragments from each node, for a certain node, if the node is to acquire a fragment from another node for processing, the start fragment identifier and the end fragment identifier of the fragment from the another node may be determined based on the indication information, and then the fragments between the start fragment identifier and the end fragment identifier (the fragments including the fragment of the start fragment identifier and the fragment of the end fragment identifier) are acquired from a storage area of the another node for processing.
Optionally, when the indication information is the number of fragments from each node and the starting fragment identifier, for a certain node, if the node is to acquire fragments from another node for processing, the number of fragments from the other node and the starting fragment identifier may be determined based on the indication information, and then the fragment identified by the starting fragment identifier together with the (number minus one) fragments adjacent after it are acquired from the storage area of the other node for processing.
Optionally, when the indication information is the number of fragments from each node and the ending fragment identifier, for a certain node, if the node is to acquire fragments from another node for processing, the number of fragments from the other node and the ending fragment identifier may be determined based on the indication information, and then the fragment identified by the ending fragment identifier together with the (number minus one) fragments immediately before it are acquired from the storage area of the other node for processing.
It should be noted that, when acquiring the fragment of another node, the above-mentioned acquisition may be performed based on a metadata table, and a storage area of each node and a storage location of the fragment of each node are recorded in the metadata table.
It should be further noted that, in the distributed memory database, the storage area of each node is a logical storage area.
It should be noted that when acquiring fragments of other nodes for processing, the fragment data is only read from the other node for processing and is not written into it; compared with fragment migration in the prior art, this saves the time of writing the fragment data. Moreover, when the time needed to finish processing a fragment is longer than the time needed to transmit its data (i.e., the time to transmit the fragment to another node), having other idle nodes quickly finish processing the fragments reduces the CPU occupancy rate of the local node, so the load balancing capability in this scenario is stronger.
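A minimal sketch of how a node might act on indication information of the first form (start and end identifiers); the metadata-table layout and the read_fragment callback are assumptions made for illustration, not interfaces defined in this application:

def fetch_fragments(indication, metadata_table, read_fragment):
    # indication:     {storage_node: (start_id, end_id)} for this node
    # metadata_table: {storage_node: {fragment_id: storage_location}}
    # read_fragment:  reads a fragment's data from a location without
    #                 writing it anywhere else
    for storage_node, (start_id, end_id) in indication.items():
        locations = metadata_table[storage_node]
        for fragment_id in sorted(locations):
            if start_id <= fragment_id <= end_id:
                yield read_fragment(locations[fragment_id])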
Since step 201 is executed when a data query request is received, a query result is fed back to the user after step 203; the corresponding processing may be as follows:
receiving the result of processing the fragments of each node in the target distributed memory database; summarizing the results of processing the fragments of each node to obtain a total result; and feeding back the total result.
In implementation, after obtaining the fragments and performing query processing, each node may respectively feed back the query results to the management device, and after receiving the query results, the management device may collect the received query results to obtain a total result (i.e., a query result), and then feed back the collected total result to the user, so that the user can check the total result.
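A minimal sketch of this summarizing step (assuming, purely for illustration, that each node returns its query result as a list of matching rows):

def summarize(per_node_results):
    # per_node_results: iterable of result lists, one per node
    total = []
    for rows in per_node_results:
        total.extend(rows)
    return total  # the total result fed back to the user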
In the embodiment of the present invention, when balancing the load of the distributed memory database, indication information of a fragment processed by each node during load balancing may be determined based on the number of current fragments of each node, an average duration of processing of a single fragment completed on a local node, and an average duration of processing of a single fragment completed on a non-local node in a cross-network manner, where the indication information is used to indicate a fragment from each node and notify each node, and each subsequent node may acquire a fragment for processing based on the indication information. In this way, the fragments to be processed calculated during load balancing are determined, and then the fragments to be processed calculated during load balancing are processed by each node, so that load balancing can be ensured as much as possible, and the fragment processing efficiency is improved.
Based on the same technical concept, an embodiment of the present invention further provides a device for balancing a load of a distributed memory database, as shown in fig. 4, where the device includes:
an obtaining module 410, configured to obtain the number of current fragments of each node in a target distributed memory database and an identifier of the current fragments, and obtain a first duration and a second duration, where the first duration is an average duration for a single fragment to be processed on a local node, and the second duration is an average duration for a single fragment to be processed on a non-local node in a cross-network manner;
a determining module 420, configured to determine, according to the number of current fragments of each node, an identifier of the current fragment of each node, the first time length, and the second time length, indication information of a fragment processed by each node in load balancing, where the indication information is used to indicate a fragment from each node;
a sending module 430, configured to send, for each node, the indication information of the fragment of the node to the node, so that the node obtains the fragment according to the indication information of the fragment of the node and processes the fragment.
Optionally, the obtaining module 410 is configured to:
counting the time lengths of the first preset number of fragments which are respectively processed on the local node, and counting the time lengths of the second preset number of fragments which are respectively processed on the cross-network non-local node;
and determining that the counted average value of the time lengths of which the first preset number of fragments are respectively processed on the local node is a first time length, and determining that the counted average value of the time lengths of which the second preset number of fragments are respectively processed on the cross-network non-local node is a second time length.
Optionally, the determining module 420 is configured to:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of current fragments of each node, the first time length and the second time length;
and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
Optionally, the indication information includes:
a starting fragment identifier and an ending fragment identifier among the fragments from each node; or,
the number of fragments from each node and the starting fragment identifier; or,
the number of fragments from each node and the ending fragment identifier.
Optionally, the determining module 420 is configured to:
determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that the value of TotalCost is minimized, wherein the sum of the current numbers of fragments of all nodes in the target distributed memory database is equal to the sum of the numbers of fragments processed on all the nodes in load balancing, and TotalCost is the sum of the variances of the total time lengths for which the nodes in the target distributed memory database process fragments:
TotalCost = Σ_{k=1..n} (Cost_k − MeanCost)², where MeanCost is the average of Cost_k over the n nodes;
Cost_k is the total time length for which node k processes fragments, Cost_k = X_kk·t + Σ_{i≠k} X_ik·ct, in which X_kk·t is the time length for which node k processes the fragments stored on node k itself and Σ_{i≠k} X_ik·ct is the time length for which node k processes fragments of other nodes;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that are processed by node k, X_ik is the number of fragments stored on node i that are processed by node k, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
Optionally, the determining module 420 is configured to:
converting TotalCost into the matrix form TotalCost = X^T·H·X, wherein X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments stored on the i-th node that are processed by the j-th node, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements of X is less than or equal to the sum of the current numbers of fragments of all nodes in the target distributed memory database;
and determining the number of fragments from each node in the fragments processed by each node in load balancing on the principle that X^T·H·X takes its minimum value.
Optionally, the obtaining module 410 is configured to:
when a data query request is received, acquiring the number of current fragments of each node in a target distributed memory database;
as shown in fig. 5, the apparatus further includes:
a receiving module 440, configured to receive a result of processing the fragment by each node in the target distributed memory database; summarizing the results of processing the fragments of each node to obtain a total result;
the sending module 430 is further configured to feed back the total result.
In the embodiment of the present invention, when balancing the load of the distributed memory database, the indication information of the fragment processed by each node in load balancing may be determined based on the number of the current fragments of each node, the average duration of the processing of the single fragment on the local node, and the average duration of the processing of the single fragment on the non-local node in a cross-network manner, where the indication information is used to indicate the fragment from each node and notify each node, and each subsequent node may acquire the fragment for processing based on the indication information. In this way, the fragments to be processed calculated during load balancing are determined, and then the fragments to be processed calculated during load balancing are processed by each node, so that load balancing can be ensured as much as possible, and the fragment processing efficiency is improved.
It should be noted that: in the apparatus for balancing the load of the distributed memory database provided in the foregoing embodiment, when balancing the load of the distributed memory database, only the division of each function module is illustrated, and in practical applications, the function distribution may be completed by different function modules according to needs, that is, the internal structure of the apparatus is divided into different function modules, so as to complete all or part of the functions described above. In addition, the apparatus for balancing the load of the distributed memory database and the method for balancing the load of the distributed memory database provided in the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 6 is a schematic structural diagram of a management device according to an embodiment of the present invention. The management device 600 may vary considerably in configuration and performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one instruction that is loaded and executed by the processor 601 to implement the above method for balancing the load of the distributed memory database.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored in the storage medium, and when being executed by a processor, the computer program realizes the method for balancing the load of the distributed memory database.
The embodiment of the invention also provides a management device, which comprises a processor and a memory, wherein the memory is used for storing the computer program; the processor is used for executing the program stored in the memory and realizing the method for balancing the load of the distributed memory database.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A method for balancing distributed memory database load, the method comprising:
acquiring the number of current fragments of each node in a target distributed memory database and an identifier of the current fragments, and acquiring a first time length and a second time length, wherein the first time length is the average time length of the single fragment processed on a local node, and the second time length is the average time length of the single fragment processed on a non-local node in a cross-network mode;
according to the number of the current fragments of each node, the identifier of the current fragments of each node, the first time length and the second time length, determining the indication information of the fragments processed by each node in load balancing, wherein the indication information is used for indicating the fragments from each node;
and for each node, sending the indication information of the fragment of the node to the node so that the node acquires the fragment according to the indication information of the fragment of the node and processes the fragment.
2. The method of claim 1, wherein obtaining the first duration and the second duration comprises:
counting the time lengths for which each of a first preset number of fragments is processed on the local node, and counting the time lengths for which each of a second preset number of fragments is processed across the network on a non-local node;
determining the average of the counted time lengths for which the first preset number of fragments are processed on the local node as the first time length, and determining the average of the counted time lengths for which the second preset number of fragments are processed across the network on non-local nodes as the second time length.
3. The method according to claim 1, wherein said determining, according to the number of current fragments of each node, the identifier of current fragments of each node, the first duration and the second duration, the indication information of the fragments processed by each node in load balancing comprises:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of current fragments of each node, the first time length and the second time length;
and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
4. The method according to any one of claims 1 to 3, wherein the indication information comprises:
a starting fragment identifier and an ending fragment identifier of the fragments from each node; or,
the number of fragments from each node and the starting fragment identifier; or,
the number of fragments from each node and the ending fragment identifier.
5. The method according to claim 3, wherein the determining the number of the fragments from each node among the fragments processed by each node in the load balancing according to the number of the current fragments of each node, the first time duration and the second time duration comprises:
determining, on the principle that the TotalCost value is minimum, the number of fragments from each node among the fragments processed by each node during load balancing, wherein the sum of the current fragment numbers of all nodes in the target distributed memory database is equal to the sum of the numbers of fragments processed on all nodes during load balancing, and TotalCost is the sum of the variances of the total fragment-processing time lengths of the nodes in the target distributed memory database;
Cost_k is the total fragment-processing time length of node k, and Cost_k includes the time length X_kk·t for which node k processes the fragments stored on node k and the time length Σ_{i=1, i≠k}^{n} X_ik·ct for which node k processes the fragments of other nodes, that is, Cost_k = X_kk·t + Σ_{i=1, i≠k}^{n} X_ik·ct;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that node k processes, X_ik is the number of fragments stored on node i that node k processes, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
6. The method according to claim 5, wherein the determining the number of the fragments from each node in the fragments processed by each node in load balancing based on the principle that the TotalCost value is minimum comprises:
converting TotalCost into the matrix form TotalCost = X^T·H·X, wherein X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments on the i-th node that the j-th node processes, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements in X is less than or equal to the sum of the current fragment numbers of all nodes in the target distributed memory database;
determining, on the principle that X^T·H·X takes its minimum value, the number of fragments from each node among the fragments processed by each node during load balancing.
7. The method of claim 1, wherein the obtaining the current number of shards per node in the target distributed in-memory database comprises:
when a data query request is received, acquiring the number of current fragments of each node in a target distributed memory database;
the method further comprises the following steps:
receiving the result of processing the fragments of each node in the target distributed memory database;
summarizing the results of processing the fragments of each node to obtain a total result;
and feeding back the total result.
8. An apparatus for balancing distributed in-memory database load, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring the number of current fragments of each node in a target distributed memory database and the identifier of the current fragments, and acquiring a first time length and a second time length, the first time length is the average time length of the single fragments processed on a local node, and the second time length is the average time length of the single fragments processed on a non-local node in a cross-network mode;
a determining module, configured to determine indication information of a fragment processed by each node during load balancing according to the number of current fragments of each node, an identifier of the current fragment of each node, the first time length, and the second time length, where the indication information is used to indicate the fragment from each node;
and the sending module is used for sending the indication information of the fragments of the nodes to the nodes so that the nodes acquire the fragments according to the indication information of the fragments of the nodes for processing.
9. The apparatus of claim 8, wherein the obtaining module is configured to:
counting the time lengths for which each of a first preset number of fragments is processed on the local node, and counting the time lengths for which each of a second preset number of fragments is processed across the network on a non-local node;
determining the average of the counted time lengths for which the first preset number of fragments are processed on the local node as the first time length, and determining the average of the counted time lengths for which the second preset number of fragments are processed across the network on non-local nodes as the second time length.
10. The apparatus of claim 8, wherein the determining module is configured to:
determining the number of fragments from each node in the fragments processed by each node during load balancing according to the number of the current fragments of each node, the first time length and the second time length;
and determining the indication information of the fragments processed by each node according to the number of the fragments from each node in the fragments processed by each node and the identification of the current fragment of each node.
11. The apparatus according to any one of claims 8 to 10, wherein the indication information comprises:
a starting fragment identifier and an ending fragment identifier of the fragments from each node; or,
the number of fragments from each node and the starting fragment identifier; or,
the number of fragments from each node and the ending fragment identifier.
12. The apparatus of claim 10, wherein the determining module is configured to:
determining, on the principle that the TotalCost value is minimum, the number of fragments from each node among the fragments processed by each node during load balancing, wherein the sum of the current fragment numbers of all nodes in the target distributed memory database is equal to the sum of the numbers of fragments processed on all nodes during load balancing, and TotalCost is the sum of the variances of the total fragment-processing time lengths of the nodes in the target distributed memory database;
Cost_k is the total fragment-processing time length of node k, and Cost_k includes the time length X_kk·t for which node k processes the fragments stored on node k and the time length Σ_{i=1, i≠k}^{n} X_ik·ct for which node k processes the fragments of other nodes, that is, Cost_k = X_kk·t + Σ_{i=1, i≠k}^{n} X_ik·ct;
t is the first time length, ct is the second time length, n is the number of nodes in the target distributed memory database, X_kk is the number of fragments stored on node k that node k processes, X_ik is the number of fragments stored on node i that node k processes, node k is any node in the target distributed memory database, and k is greater than or equal to 1.
13. The apparatus of claim 12, wherein the determining module is configured to:
converting TotalCost into the matrix form TotalCost = X^T·H·X, wherein X = (X_11, X_12, ..., X_1n, ..., X_ij, ..., X_n1, X_n2, ..., X_nn)^T, X_ij is the number of fragments on the i-th node that the j-th node processes, i, j and n are positive integers, H is a symmetric matrix, and the sum of the elements in X is less than or equal to the sum of the current fragment numbers of all nodes in the target distributed memory database;
determining, on the principle that X^T·H·X takes its minimum value, the number of fragments from each node among the fragments processed by each node during load balancing.
14. The apparatus of claim 8, wherein the obtaining module is configured to:
when a data query request is received, acquiring the number of current fragments of each node in a target distributed memory database;
the device further comprises:
the receiving module is used for receiving the results of processing the fragments from each node in the target distributed memory database, and summarizing the results of each node to obtain a total result;
the sending module is further configured to feed back the total result.
15. A computer-readable storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-7.
16. A management device comprising a processor and a memory, wherein the memory is configured to store a computer program; the processor, configured to execute the program stored in the memory, to implement the method steps of any of claims 1-7.
CN201811204395.1A 2018-10-16 2018-10-16 Method and device for balancing distributed memory database load Active CN111061557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811204395.1A CN111061557B (en) 2018-10-16 2018-10-16 Method and device for balancing distributed memory database load

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811204395.1A CN111061557B (en) 2018-10-16 2018-10-16 Method and device for balancing distributed memory database load

Publications (2)

Publication Number Publication Date
CN111061557A CN111061557A (en) 2020-04-24
CN111061557B true CN111061557B (en) 2023-04-14

Family

ID=70296403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811204395.1A Active CN111061557B (en) 2018-10-16 2018-10-16 Method and device for balancing distributed memory database load

Country Status (1)

Country Link
CN (1) CN111061557B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527767A (en) * 2020-12-03 2021-03-19 许继集团有限公司 Method and system for completely repairing multiple region tables after restart of distributed database
CN115033390B (en) * 2022-08-09 2022-11-25 阿里巴巴(中国)有限公司 Load balancing method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5701482A (en) * 1993-09-03 1997-12-23 Hughes Aircraft Company Modular array processor architecture having a plurality of interconnected load-balanced parallel processing nodes
CN103095806A (en) * 2012-12-20 2013-05-08 中国电力科学研究院 Load balancing management system of large-power-network real-time database system
CN105915630A (en) * 2016-06-01 2016-08-31 东软集团股份有限公司 Cross-network data transmission system and load balancing scheduling method
CN105975345A (en) * 2016-05-20 2016-09-28 江苏得得空间信息科技有限公司 Video frame data dynamic equilibrium memory management method based on distributed memory
CN107480254A (en) * 2017-08-14 2017-12-15 上海交通大学 Suitable for the online load-balancing method of distributed memory database
CN107844593A (en) * 2017-11-17 2018-03-27 北京邮电大学 Video data placement method and device in a kind of Distributed Computing Platform

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180064876A (en) * 2016-12-06 2018-06-15 한국전자통신연구원 Distributed in-memory database system and method for managing database thereof


Also Published As

Publication number Publication date
CN111061557A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
US10831562B2 (en) Method and system for operating a data center by reducing an amount of data to be processed
CN109075988B (en) Task scheduling and resource issuing system and method
CN111897638B (en) Distributed task scheduling method and system
US20140289286A1 (en) System and method for performance tuning of garbage collection algorithms
CN110427386B (en) Data processing method, device and computer storage medium
CN108090153B (en) Searching method, searching device, electronic equipment and storage medium
CN108241539B (en) Interactive big data query method and device based on distributed system, storage medium and terminal equipment
US20140310712A1 (en) Sequential cooperation between map and reduce phases to improve data locality
JP5730386B2 (en) Computer system and parallel distributed processing method
US20090235250A1 (en) Management machine, management system, management program, and management method
US20140047059A1 (en) Method for improving mobile network performance via ad-hoc peer-to-peer request partitioning
CN105094981B (en) A kind of method and device of data processing
WO2019057193A1 (en) Data deletion method and distributed storage system
CN113485962B (en) Log file storage method, device, equipment and storage medium
CN111061557B (en) Method and device for balancing distributed memory database load
CN103731498A (en) Big data real-time enquiry system load balancing method based on copy selection
CN104077188A (en) Method and device for scheduling tasks
GB2584980A (en) Workload management with data access awareness in a computing cluster
CN112579692A (en) Data synchronization method, device, system, equipment and storage medium
CN106919607B (en) Data access method, device and system
US10031777B2 (en) Method and system for scheduling virtual machines in integrated virtual machine clusters
CN112148467A (en) Dynamic allocation of computing resources
CN114205354B (en) Event management system, event management method, server, and storage medium
CN113760950B (en) Index data query method, device, electronic equipment and storage medium
CN110765125A (en) Data storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant