CN111400026B - Distributed load balancing method based on master-slave backup technology

Distributed load balancing method based on master-slave backup technology

Info

Publication number
CN111400026B
Authority
CN
China
Legal status: Active
Application number
CN201911119106.2A
Other languages
Chinese (zh)
Other versions
CN111400026A
Inventor
谢在鹏
李博文
张基
朱晓瑞
徐媛媛
叶保留
毛莺池
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Application filed by Hohai University HHU
Priority to CN201911119106.2A
Publication of CN111400026A
Application granted
Publication of CN111400026B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12Simultaneous equations, e.g. systems of linear equations


Abstract

The invention discloses a distributed load balancing method based on a master-slave backup technology, which comprises the following steps: S1, dividing a node set; S2, dividing a task set; S3, distributing subtasks; S4, recording execution times; S5, calculating execution efficiency; S6, solving an allocation scheme. By modeling the task allocation problem as a linear programming problem and analyzing the abstract model, the feasible region of task allocation is solved. In combination with practical constraints, a task allocation and dynamic adjustment method with backup is provided, which aims to allocate tasks according to efficiency on a distributed cluster with backup while reducing the apparent performance coupling between nodes as much as possible. The method avoids, as far as possible, adding extra load to the backup node when the task amount of the master node increases, and thus mitigates the influence of performance differences between nodes on the overall task running time.

Description

Distributed load balancing method based on master-slave backup technology
Technical Field
The invention belongs to the field of computer communication, and particularly relates to a distributed load balancing method based on a master-slave backup technology.
Background
In distributed systems, the key way to tolerate a process failure is to organize multiple identical processes into a group: a message sent to the group is received and processed by all members, so that if one process in the group fails, another process can take over for it. When fault tolerance is required, process replication is usually used, and when the main process crashes a backup process takes over the task of the current main process. When performance must be improved, the main process is usually replicated by means of copy-and-cache extension and redundancy coding to form a process group; the redundancy allows tasks to be recovered more quickly (fast recovery of the redundant task portion based on MDS codes) and reduces the communication required by the tasks (reduced communication load based on CDC coding).
In a conventional process backup scheme, the main process running on the master node is copied in full to form a backup process, which is placed on a backup node. Because the nodes are in a full-backup relationship, strong consistency must be maintained between the master node and the backup node, and any increase in the task amount of the master node increases the task amount of the backup node correspondingly. When the performance of the master node and the backup node in a group differs, increasing the task load of the master node may cause a non-negligible difference in running time between them, which harms the real-time performance of the backup.
Disclosure of Invention
The invention aims to: model the task allocation problem as a linear programming problem, solve the feasible region of task allocation through analysis of the abstract model, and, in combination with practical constraints, provide a task allocation and dynamic adjustment method with backup.
The technical scheme is as follows: the invention provides a distributed load balancing method based on a master-slave backup technology, which comprises the following specific steps:
(1) Dividing a node set according to the constructed distributed cluster of nodes;
(2) Running tasks with a set backup level on the cluster, and dividing all tasks of the same batch into task sets;
(3) Distributing the tasks contained in each task set to the corresponding node set according to the node sets and task sets obtained in steps (1) and (2);
(4) After all nodes finish the tasks they received, collecting and recording the execution times;
(5) Acquiring the computational efficiency of each node and normalizing it;
(6) Establishing an equation system from the known constants, transforming it into a linear programming problem, and solving for the allocation scheme.
Further, the method for partitioning a node set in step (1) includes:
(1.1) Construct a distributed cluster of n nodes, denoted N = {N_0, N_1, …, N_{n-1}};
(1.2) Enumerate the combinations of the n nodes: any r nodes form a subset D_i ⊆ N with |D_i| = r, and all possible combinations D_i compose the set D = {D_1, D_2, …}; the set D therefore has C(n, r) elements; the initial allocation is an equal split, i.e. each of the C(n, r) task sets defined in step (2) initially receives |F_j| / C(n, r) tasks of a batch.
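As an illustration of step (1), the following sketch enumerates the node subsets D_i for a small cluster. It is an informal aid rather than part of the claimed method; the helper name build_node_groups and the string node labels are assumptions of this illustration.

```python
from itertools import combinations

def build_node_groups(n: int, r: int):
    """Enumerate all C(n, r) node subsets D_i of an n-node cluster (step 1.2)."""
    nodes = [f"N{q}" for q in range(n)]              # cluster = {N_0, ..., N_{n-1}}
    groups = [set(c) for c in combinations(nodes, r)]
    return nodes, groups

nodes, groups = build_node_groups(n=3, r=2)
print(len(groups), groups)   # 3 subsets: {N0,N1}, {N0,N2}, {N1,N2}
```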
Further, the method for dividing the task set in step (2) includes:
(2.1) Run the task set F, consisting of tasks of equal size, on the cluster constructed in step (1), with the backup level set to r;
(2.2) Divide the whole task set F into small batches and perform task allocation and runtime statistics once per batch; the system time step is denoted t, the batches are denoted F_j, and the computation time of each batch F_j is denoted Δt; each batch F_j is further divided into smaller task sets, denoted P_i, whose sizes are denoted |P_i|;
(2.3) All tasks of the same batch F_j are divided into C(n, r) task sets: at time step t, the batch F_j is divided in the proportions x_i(t) = |P_i(t)| / |F_j| into the C(n, r) task sets P_i; the set P = {P_1, P_2, …}, like the set D, also has C(n, r) elements.
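The batch division of step (2) can be sketched as below, assuming the batch is simply a list of task identifiers; split_batch and its rounding rule are illustrative and not prescribed by the patent.

```python
def split_batch(batch, proportions):
    """Split one batch F_j into task sets P_i sized by the proportions x_i(t) (step 2.3)."""
    sizes = [round(len(batch) * x) for x in proportions]
    sizes[-1] = len(batch) - sum(sizes[:-1])       # absorb rounding error so the sizes sum to |F_j|
    task_sets, start = [], 0
    for size in sizes:
        task_sets.append(batch[start:start + size])
        start += size
    return task_sets

batch = list(range(120))                           # 120 equally sized, independent tasks
print([len(p) for p in split_batch(batch, [1/3, 1/3, 1/3])])   # [40, 40, 40] -- the initial equal split
```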
Further, the method for distributing subtasks in step (3) includes:
(3.1) According to steps (1) and (2), two sets each containing C(n, r) elements, P and D, are obtained; take one element from each set at a time, a task set P_i and a node set D_i;
(3.2) Send the tasks contained in P_i to the nodes represented by D_i, and repeat this process until every task set P_i has been sent to all nodes of the corresponding D_i. Each node then holds the same number μ = C(n−1, r−1) of pending task sets, and each task set has been copied and sent to r different nodes.
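The pairing of task sets with node groups in step (3) can be sketched as a mapping from each node to its pending task sets; the in-process dictionary stands in for the actual network transfer and is an assumption of this illustration.

```python
def assign_task_sets(groups, task_sets):
    """Send every task set P_i to all r nodes of its node group D_i (step 3.2)."""
    inbox = {}                                     # node -> list of pending task sets
    for group, tasks in zip(groups, task_sets):
        for node in group:
            inbox.setdefault(node, []).append(tasks)
    return inbox

groups = [{"N0", "N1"}, {"N0", "N2"}, {"N1", "N2"}]            # D_1, D_2, D_3 for n = 3, r = 2
task_sets = [list(range(40)), list(range(40, 80)), list(range(80, 120))]
inbox = assign_task_sets(groups, task_sets)
print({node: len(sets) for node, sets in sorted(inbox.items())})   # every node holds mu = 2 task sets
```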
Further, the process of recording the execution time in step (4) includes:
(4.1) Wait until all nodes have completed the tasks they received (performing redundancy checks or other work), and collect the total runtime of each node N_q, denoted Δt_q;
(4.2) Denote the current time step by t and the current computational efficiency of node N_q by λ_q(t); the sizes of the task sets P_i owned by node N_q are denoted |P_i(t)|;
(4.3) Calculate with formula (1):

λ_q(t) = ( Σ_{i : N_q ∈ D_i} |P_i(t)| ) / ( |F_j| · Δt_q )    (1)

i.e. the fraction of the batch held by node N_q divided by its runtime. Each node independently calculates its own computational efficiency λ_q(t) and sends it to all other nodes.
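Formula (1), as reconstructed above, can be checked in a few lines; node_efficiency is an illustrative helper, and the numbers reuse the first round of Embodiment 1 below (node N_0 holding two 40-task sets and running for 16.7 secs).

```python
def node_efficiency(owned_sizes, batch_size, delta_t):
    """lambda_q(t): fraction of the batch held by node q, divided by its runtime (formula (1))."""
    return sum(owned_sizes) / (batch_size * delta_t)

print(round(node_efficiency([40, 40], batch_size=120, delta_t=16.7), 4))   # 0.0399, as in Embodiment 1
```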
Further, the method for calculating the execution efficiency in the step (5) comprises the following steps:
(5.1) After obtaining all λ_q(t) of the nodes, normalize λ_q(t) with formula (2):

e_q(t) = r · λ_q(t) / ( λ_0(t) + λ_1(t) + … + λ_{n−1}(t) )    (2)

Define the vector x(t) = (x_1(t), x_2(t), …, x_{C(n,r)}(t)), where x_i(t) = |P_i(t)| / |F_j|, and e(t) = (e_0(t), e_1(t), …, e_{n−1}(t));
(5.2) Record the allocation matrix as A = (a_{q,i}); with A as the coefficient matrix, x(t+1) as the variables and e(t) as the constant term, the non-homogeneous linear system (3) is obtained:

A · x(t+1) = e(t)    (3)

where the row index of the allocation matrix denotes the node subscript and the column index denotes the task-set subscript; element a_{q,i} = 1 means that node q owns task set i, and element a_{q,i} = 0 means that node q does not own task set i.
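A sketch of step (5), assuming NumPy is available: it normalizes the efficiencies with formula (2) as reconstructed above and builds the allocation matrix A of system (3). The helper names normalize and allocation_matrix are illustrative.

```python
import numpy as np

def normalize(lambdas, r):
    """e_q(t) = r * lambda_q(t) / sum_p lambda_p(t) (formula (2), as reconstructed above)."""
    lam = np.asarray(lambdas, dtype=float)
    return r * lam / lam.sum()

def allocation_matrix(nodes, groups):
    """a_{q,i} = 1 iff node q belongs to group D_i, i.e. owns task set P_i."""
    return np.array([[1 if node in group else 0 for group in groups] for node in nodes])

e = normalize([0.0399, 0.0600, 0.0466], r=2)
A = allocation_matrix(["N0", "N1", "N2"], [{"N0", "N1"}, {"N0", "N2"}, {"N1", "N2"}])
print(np.round(e, 3))   # approx. [0.545 0.819 0.636] -- matches the worked example up to rounding
print(A)                # 3x3 allocation matrix of system (3): A x(t+1) = e(t)
```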
Further, the solving of the allocation scheme in the step (6) specifically includes:
(6.1) When the number of rows equals the number of columns, n = C(n, r), the system of equations has the unique solution x(t+1) = A⁻¹ · e(t);
(6.2) When the number of columns is greater than the number of rows, n < C(n, r), the system of equations has no unique solution and is transformed into a linear programming problem as follows:
(6.2.1) First define the partition of the matrix A: its left half (the first n columns) is A_l and its right half is A_r, namely A = [A_l | A_r]; similarly, the upper half (the first n elements) of x(t+1) is x_l(t+1) and the lower half is x_r(t+1). The above system is rewritten as formula (4):

A_l · x_l(t+1) + A_r · x_r(t+1) = e(t)    (4)

Solving gives:

x_l(t+1) = A_l⁻¹ · ( e(t) − A_r · x_r(t+1) )    (5)

Requiring every element of x_l(t+1) obtained from formula (5) to be greater than 0 and taking x_r(t+1) as the free variables converts the system into the linear programming problem of formula (6), where x_{r,i}(t+1) is the i-th element of the vector x_r(t+1):

A_l⁻¹ · ( e(t) − A_r · x_r(t+1) ) > 0,  x_{r,i}(t+1) > 0    (6)

from which a solution x_r(t+1) is obtained, and x_l(t+1) then follows from formula (5);
(6.2.2) Each element x_i(t+1) of the vector x(t+1) corresponds to the proportion of one task set to the total number of tasks, so the set of values of the vector x(t+1) can be used as the allocation scheme. Using this allocation scheme, divide the tasks of each batch F_j in proportion, let t = t + 1, enter the next time step, execute step (2), and repeat the task allocation and performance estimation until all tasks are executed.
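The two cases of step (6) can be sketched as follows, assuming NumPy and SciPy are available. For the underdetermined case the sketch solves a plain feasibility linear program in place of formula (6), whose exact objective is only given in the original formula images.

```python
import numpy as np
from scipy.optimize import linprog   # assumption: SciPy is available

def next_allocation(A, e):
    """Solve A x(t+1) = e(t) for the next allocation proportions (step 6, a sketch)."""
    n_rows, n_cols = A.shape
    if n_rows == n_cols:
        return np.linalg.solve(A, e)               # (6.1): unique solution x(t+1) = A^-1 e(t)
    # (6.2): more task sets than nodes -> pick any x >= 0 with A x = e
    # (a feasibility LP standing in for formula (6)).
    res = linprog(c=np.zeros(n_cols), A_eq=A, b_eq=e, bounds=[(0, None)] * n_cols)
    if not res.success:
        raise RuntimeError("no feasible allocation found")
    return res.x

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])    # allocation matrix of Embodiment 1 below
x_next = next_allocation(A, np.array([0.545, 0.818, 0.636]))
print(np.round(x_next, 2))   # approx. [0.36 0.18 0.45] -> a next-batch split of roughly 44/22/54 tasks
```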
Beneficial effects: compared with the prior art, the invention has the following remarkable advantages: (1) tasks can be allocated according to efficiency on a distributed cluster with backup, while the apparent performance coupling between nodes is reduced as much as possible; (2) extra load on the backup node is avoided as far as possible when the task amount of the master node increases, so that the influence of performance differences between nodes on the overall task running time is mitigated.
Drawings
Fig. 1 is a flowchart of a distributed load balancing method based on a master-slave backup technology;
Fig. 2 is an example of the feasible region.
Detailed Description
Embodiments of the present invention are described below with reference to the drawings.
Embodiment 1 (described with reference to FIG. 1): suppose a distributed cluster of 3 nodes, N = {N_0, N_1, N_2}, on which 4 rounds of loop computation are to be executed, each round running 120 tasks of equal size (the tasks are independent, may be executed out of order and do not affect one another). The number of redundant backups of each task on this cluster is set to r = 2 (i.e. each task is executed twice at the same time). An embodiment of the method following the above steps is described below.

Step 1: enumerating the combinations of the 3 nodes gives the result D = {D_1, D_2, D_3} = {{N_0, N_1}, {N_0, N_2}, {N_1, N_2}}; the per-batch task-set sizes are initialized to the equal split |P_1(0)| = |P_2(0)| = |P_3(0)| = 120 / 3 = 40.

Step 2: according to the per-batch task-set sizes |P_i(t)|, the tasks of each loop batch are divided: a batch of 120 subtasks is divided into the three sets P_1, P_2, P_3 with |P_1| + |P_2| + |P_3| = 120.

Step 3: according to steps 1 and 2, two sets of 3 elements, P and D, are obtained. Take one element from each set at a time, a task set P_i and a node set D_i, and send the tasks contained in P_i to the nodes represented by D_i, i.e. every node in the subset D_i receives the same task set P_i: P_1 is sent to the nodes {N_0, N_1}, P_2 to {N_0, N_2}, and P_3 to {N_1, N_2}. At this point each node holds μ = 2 pending task sets, and each task set has been copied and sent to 2 different nodes. Proceed to step 4.

Step 4: wait for all nodes to complete the tasks they received. Each node N_q independently records the runtime of all its subtasks and sends its efficiency to all other nodes in the cluster. Taking node N_0 as an example, its total runtime is recorded as Δt_0; at the current time step t, node N_0 owns the task sets {P_1, P_2}, whose sizes are |P_1(t)| and |P_2(t)|. Compute

λ_0(t) = ( |P_1(t)| + |P_2(t)| ) / ( 120 · Δt_0 )

and send λ_0(t) to the nodes {N_1, N_2}. Proceed to step 5.

Step 5: each node N_q performs the following operations: obtain all λ_q(t) from the other nodes and normalize them with formula (2) to obtain e_q(t); define x(t) = (x_1(t), x_2(t), x_3(t)) and e(t) = (e_0(t), e_1(t), e_2(t)). The non-homogeneous linear system A · x(t+1) = e(t) is obtained, where

A =
[ 1 1 0 ]
[ 1 0 1 ]
[ 0 1 1 ]

Step 6: solve the non-homogeneous linear system for x(t+1), use it as the task allocation scheme of the next iteration, and continue with step 2 until all batches have been iterated.

Suppose that in one iteration the total runtimes of the nodes N_0, N_1, N_2 are Δt_0 = 16.7 secs, Δt_1 = 11.1 secs and Δt_2 = 14.3 secs. The computed efficiencies are λ_0(t) = 0.0399, λ_1(t) = 0.0600, λ_2(t) = 0.0466, and after normalization e_0(t) = 0.545, e_1(t) = 0.818, e_2(t) = 0.636. The equation form of the above non-homogeneous linear system is:

x_1(t+1) + x_2(t+1) = 0.545
x_1(t+1) + x_3(t+1) = 0.818
x_2(t+1) + x_3(t+1) = 0.636

Solving gives x_1(t+1) ≈ 0.364, x_2(t+1) ≈ 0.182, x_3(t+1) ≈ 0.455. When the next batch of tasks is divided, the sizes of its subtask sets are therefore approximately |P_1| = 44, |P_2| = 22 and |P_3| = 54, and the total number of tasks on each node becomes 66, 98 and 76, matching its computed performance estimates 0.0399, 0.0600 and 0.0466, which can be regarded as approximately achieving maximum resource utilization.
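A short end-to-end numeric check of Embodiment 1, assuming NumPy and the formulas as reconstructed above; the printed node loads differ from the 66/98/76 quoted in the text only by rounding.

```python
import numpy as np

delta_t = np.array([16.7, 11.1, 14.3])             # measured runtimes of N0, N1, N2 (secs)
owned_fraction = np.array([2/3, 2/3, 2/3])         # each node held 80 of the 120 tasks this round
lam = owned_fraction / delta_t                     # formula (1): approx. [0.0399 0.0601 0.0466]
e = 2 * lam / lam.sum()                            # formula (2) with r = 2: approx. [0.545 0.819 0.636]
A = np.array([[1, 1, 0],                           # a_{q,i} = 1 iff node q owns task set P_i
              [1, 0, 1],
              [0, 1, 1]])
x_next = np.linalg.solve(A, e)                     # system (3): A x(t+1) = e(t)
print(np.round(120 * x_next))                      # next sizes of P1, P2, P3: roughly [44 22 55]
print(np.round(120 * (A @ x_next), 1))             # per-node loads approx. [65.4 98.3 76.3] -> 66/98/76
```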
Embodiment 2: suppose a distributed cluster of 4 nodes, N = {N_0, N_1, N_2, N_3}, on which 4 rounds of loop computation are to be executed, each round running 120 tasks of equal size (the tasks are independent, may be executed out of order and do not affect one another). The number of redundant backups of each task on this cluster is set to r = 2 (i.e. each task is executed twice at the same time). An embodiment of the method following the above steps is described below.

Step 1: enumerating the combinations of the 4 nodes gives C(4, 2) = 6 combination schemes, denoted D_1, D_2, …, D_6, with D_1 = {N_0, N_1}, D_2 = {N_0, N_2}, D_3 = {N_0, N_3}, D_4 = {N_1, N_2}, D_5 = {N_1, N_3} and D_6 = {N_2, N_3}; the combined result is D = {D_1, …, D_6}. The per-batch task-set sizes are initialized to the equal split |P_i(0)| = 120 / 6 = 20.

Step 2: according to the per-batch task-set sizes |P_i(t)|, the tasks of each loop batch are divided: a batch of 120 subtasks is divided into the 6 sets P_1, …, P_6 with |P_1| + |P_2| + … + |P_6| = 120.

Step 3: according to steps 1 and 2, two sets of 6 elements, P and D, are obtained. Take one element from each set at a time, a task set P_i and a node set D_i, and send the tasks contained in P_i to the nodes represented by D_i, i.e. every node in the subset D_i receives the same task set P_i: P_1 is sent to the nodes {N_0, N_1}, P_2 to {N_0, N_2}, P_3 to {N_0, N_3}, and so on. At this point each node holds μ = 3 pending task sets, and each task set has been copied and sent to 2 different nodes. Proceed to step 4.

Step 4: wait for all nodes to complete the tasks they received. Each node N_q independently records the runtime of all its subtasks and sends its efficiency to all other nodes in the cluster. Taking node N_0 as an example, its total runtime is recorded as Δt_0; at the current time step t, node N_0 owns the task sets {P_1, P_2, P_3}, whose sizes are |P_1(t)|, |P_2(t)| and |P_3(t)|. Compute

λ_0(t) = ( |P_1(t)| + |P_2(t)| + |P_3(t)| ) / ( 120 · Δt_0 )

and send λ_0(t) to the nodes {N_1, N_2, N_3}. Proceed to step 5.

Step 5: each node N_q performs the following operations: obtain all λ_q(t) from the other nodes and normalize them with formula (2) to obtain e_q(t); define x(t) = (x_1(t), x_2(t), …, x_6(t)) and e(t) = (e_0(t), e_1(t), e_2(t), e_3(t)). The non-homogeneous linear system A · x(t+1) = e(t) is obtained, where

A =
[ 1 1 1 0 0 0 ]
[ 1 0 0 1 1 0 ]
[ 0 1 0 1 0 1 ]
[ 0 0 1 0 1 1 ]

Step 6: the coefficient matrix of this non-homogeneous linear system is not square and therefore not invertible, so the system cannot be solved directly and is transformed into a linear programming problem as follows.

First, the matrix A is written as the concatenation of a square matrix and an ordinary matrix; its left half (the first 4 columns) is called A_l and its right half A_r, namely A = [A_l | A_r]. Similarly, the upper half of x(t+1) is x_l(t+1) = (x_1(t+1), …, x_4(t+1)) and the lower half is x_r(t+1) = (x_5(t+1), x_6(t+1)). The system is rewritten as formula (4),

A_l · x_l(t+1) + A_r · x_r(t+1) = e(t)    (4)

and solved as

x_l(t+1) = A_l⁻¹ · ( e(t) − A_r · x_r(t+1) )    (5)

Requiring every element of x_l(t+1) to be greater than 0 and taking x_r(t+1) as the free variables converts the system into the linear programming problem of formula (6). A solution x_r(t+1) of formula (6) is substituted back into formula (5) to obtain x_l(t+1). Each element x_i(t+1) of the vector x(t+1) corresponds to the proportion of one task set to the total number of tasks, so the set of values of the vector x(t+1) can be used as the allocation scheme. Using this allocation scheme, divide the tasks of each batch F_j in proportion, let t = t + 1, enter the next time step, execute step 2, and repeat the task allocation and performance estimation until all tasks are executed.

Suppose that in one iteration the total runtimes of the nodes N_0, N_1, N_2, N_3 are Δt_0 = 16.7 secs, Δt_1 = 11.1 secs, Δt_2 = 14.3 secs and Δt_3 = 11.1 secs. The computed efficiencies are:

λ_0(t) = 0.0299, λ_1(t) = 0.0450, λ_2(t) = 0.0349, λ_3(t) = 0.0450

After normalization:

e_0(t) = 0.3862, e_1(t) = 0.5814, e_2(t) = 0.4510, e_3(t) = 0.5814

Writing x_i for x_i(t+1), the equation form of the above non-homogeneous linear system is:

x_1 + x_2 + x_3 = 0.3862
x_1 + x_4 + x_5 = 0.5814
x_2 + x_4 + x_6 = 0.4510
x_3 + x_5 + x_6 = 0.5814

The coefficient matrix is split into

A_l =
[ 1 1 1 0 ]
[ 1 0 0 1 ]
[ 0 1 0 1 ]
[ 0 0 1 0 ]

A_r =
[ 0 0 ]
[ 1 0 ]
[ 0 1 ]
[ 1 1 ]

Inverting A_l gives:

A_l⁻¹ = (1/2) ·
[  1  1 -1 -1 ]
[  1 -1  1 -1 ]
[  0  0  0  2 ]
[ -1  1  1  1 ]

Computing e(t) − A_r · x_r(t+1) gives:

( 0.3862, 0.5814 − x_5, 0.4510 − x_6, 0.5814 − x_5 − x_6 )

and computing x_l(t+1) = A_l⁻¹ · ( e(t) − A_r · x_r(t+1) ) gives:

x_1 = x_6 − 0.0324
x_2 = x_5 − 0.1628
x_3 = 0.5814 − x_5 − x_6
x_4 = 0.6138 − x_5 − x_6

Requiring every element to be greater than 0 yields the constraints:

x_6 > 0.0324
x_5 > 0.1628
x_5 + x_6 < 0.5814
x_5 + x_6 < 0.6138
x_5 > 0, x_6 > 0

These constraints have a feasible region in the first quadrant, as shown in FIG. 2. A set of feasible solutions (x_5, x_6) is taken from this region and substituted into formula (5) to obtain x_1, x_2, x_3, x_4. When the next batch of tasks is divided, the sizes of its subtask sets follow these proportions, and the total number of tasks on each node becomes 46, 70, 54 and 70, matching its computed performance estimates 0.0299, 0.0450, 0.0349 and 0.0450, which can be regarded as approximately achieving maximum resource utilization.
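The underdetermined case of Embodiment 2 can be checked with the sketch below, assuming NumPy; the point (x_5, x_6) = (0.20, 0.10) is an arbitrary pick from the feasible region of FIG. 2 for illustration, not the value used in the original description.

```python
import numpy as np

e = np.array([0.3862, 0.5814, 0.4510, 0.5814])     # normalized efficiencies of N0..N3
A = np.array([[1, 1, 1, 0, 0, 0],                  # N0 owns P1, P2, P3
              [1, 0, 0, 1, 1, 0],                  # N1 owns P1, P4, P5
              [0, 1, 0, 1, 0, 1],                  # N2 owns P2, P4, P6
              [0, 0, 1, 0, 1, 1]])                 # N3 owns P3, P5, P6
A_l, A_r = A[:, :4], A[:, 4:]                      # square left part and remaining columns
x_r = np.array([0.20, 0.10])                       # assumed feasible choice of (x_5, x_6)
x_l = np.linalg.solve(A_l, e - A_r @ x_r)          # formula (5)
x = np.concatenate([x_l, x_r])
print(bool(np.all(x > 0)))                         # True -> (x_5, x_6) lies in the feasible region
print(np.round(120 * (A @ x)))                     # per-node loads [46. 70. 54. 70.], as in the text
```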

Claims (5)

1. A distributed load balancing method based on a master-slave backup technology is characterized by comprising the following steps:
(1) Dividing a node set according to the constructed distributed cluster of nodes;
(2) Running tasks with a set backup level on the cluster, and dividing all tasks of the same batch into task sets;
(3) Distributing the tasks contained in each task set to the corresponding node set according to the node sets and task sets obtained in steps (1) and (2);
(4) After all nodes finish the tasks they received, collecting and recording the execution times;
(5) Acquiring the computational efficiency of each node and normalizing it;
(6) Establishing an equation system from the known constants, transforming it into a linear programming problem, and solving for the allocation scheme;
the method for partitioning a node set in step (1) comprises the following steps:
(1.1) Constructing a distributed cluster of n nodes, denoted N = {N_0, N_1, …, N_{n-1}};
(1.2) Enumerating the combinations of the n nodes: any r nodes form a subset D_i ⊆ N with |D_i| = r, and all possible combinations D_i compose the set D = {D_1, D_2, …}; the set D therefore has C(n, r) elements; the initial allocation is an equal split, i.e. each of the C(n, r) task sets defined in step (2) initially receives |F_j| / C(n, r) tasks of a batch;
The solving of the allocation scheme in the step (6) specifically comprises:
(6.1) When the number of rows equals the number of columns, n = C(n, r), the system of equations has the unique solution x(t+1) = A⁻¹ · e(t);
(6.2) When the number of columns is greater than the number of rows, n < C(n, r), the system of equations has no unique solution and is transformed into a linear programming problem as follows:
(6.2.1) Defining the partition of the matrix A: its left half (the first n columns) is A_l and its right half is A_r, namely A = [A_l | A_r]; similarly, the upper half (the first n elements) of the vector x(t+1) is x_l(t+1) and the lower half is x_r(t+1); the above system is rewritten as formula (4):

A_l · x_l(t+1) + A_r · x_r(t+1) = e(t)    (4)

Solving gives:

x_l(t+1) = A_l⁻¹ · ( e(t) − A_r · x_r(t+1) )    (5)

Requiring every element of x_l(t+1) obtained from formula (5) to be greater than 0 and taking x_r(t+1) as the free variables converts the system into the linear programming problem of formula (6), where x_{r,i}(t+1) is the i-th element of the vector x_r(t+1):

A_l⁻¹ · ( e(t) − A_r · x_r(t+1) ) > 0,  x_{r,i}(t+1) > 0    (6)

from which a solution x_r(t+1) is obtained, and x_l(t+1) then follows from formula (5);
(6.2.2) Each element x_i(t+1) of the vector x(t+1) corresponds to the proportion of one task set to the total number of tasks, so the set of values of the vector x(t+1) can be used as the allocation scheme; using this allocation scheme, the tasks of each batch F_j are divided in proportion, t = t + 1 is set, the next time step is entered, step (2) is executed, and the task allocation and performance estimation are repeated until all tasks are executed.
2. The distributed load balancing method based on the master-slave backup technology as claimed in claim 1, wherein the task set partitioning method in step (2) comprises:
(2.1) Running the task set F, consisting of tasks of equal size, on the cluster constructed in step (1), with the backup level set to r;
(2.2) Dividing the whole task set F into small batches and performing task allocation and runtime statistics once per batch; the system time step is denoted t, the batches are denoted F_j, and the computation time of each batch F_j is denoted Δt; each batch F_j is further divided into smaller task sets, denoted P_i, whose sizes are denoted |P_i|;
(2.3) All tasks of the same batch F_j are divided into C(n, r) task sets: at time step t, the batch F_j is divided in the proportions x_i(t) = |P_i(t)| / |F_j| into the C(n, r) task sets P_i; the set P = {P_1, P_2, …}, like the set D, also has C(n, r) elements.
3. The distributed load balancing method based on the master-slave backup technology as claimed in claim 2, wherein the method for distributing the subtasks in step (3) comprises:
(3.1) Two sets each containing C(n, r) elements, P and D, are obtained; one element is taken from each set at a time, a task set P_i and a node set D_i;
(3.2) The tasks contained in P_i are sent to the nodes represented by D_i, until every task set P_i has been sent to all nodes of the corresponding D_i; each node then holds the same number μ = C(n−1, r−1) of pending task sets, and each task set has been copied and sent to r different nodes.
4. The distributed load balancing method based on the master-slave backup technology as claimed in claim 3, wherein the step (4) of recording the execution time further comprises:
(4.1) Waiting until all nodes have completed the tasks they received (performing redundancy checks or other work), and collecting the total runtime of each node N_q, denoted Δt_q;
(4.2) Denoting the current time step by t and the current computational efficiency of node N_q by λ_q(t); the sizes of the task sets P_i owned by node N_q are denoted |P_i(t)|;
(4.3) Calculating with formula (1):

λ_q(t) = ( Σ_{i : N_q ∈ D_i} |P_i(t)| ) / ( |F_j| · Δt_q )    (1)

Each node independently calculates its own computational efficiency λ_q(t) and sends it to all other nodes.
5. The distributed load balancing method based on the master-slave backup technology as claimed in claim 4, wherein the method for calculating the execution efficiency in step (5) comprises:
(5.1) After all λ_q(t) of the nodes are obtained, λ_q(t) is normalized with formula (2):

e_q(t) = r · λ_q(t) / ( λ_0(t) + λ_1(t) + … + λ_{n−1}(t) )    (2)

The vector x(t) = (x_1(t), x_2(t), …, x_{C(n,r)}(t)), with x_i(t) = |P_i(t)| / |F_j|, and e(t) = (e_0(t), e_1(t), …, e_{n−1}(t)) are defined;
(5.2) The allocation matrix is recorded as A = (a_{q,i}); with A as the coefficient matrix, x(t+1) as the variables and e(t) as the constant term, the non-homogeneous linear system (3) is obtained:

A · x(t+1) = e(t)    (3)

wherein the row index of the allocation matrix denotes the node subscript and the column index denotes the task-set subscript; element a_{q,i} = 1 means that node q owns task set i, and element a_{q,i} = 0 means that node q does not own task set i.
CN201911119106.2A 2019-11-15 2019-11-15 Distributed load balancing method based on master-slave backup technology Active CN111400026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911119106.2A CN111400026B (en) 2019-11-15 2019-11-15 Distributed load balancing method based on master-slave backup technology


Publications (2)

Publication Number Publication Date
CN111400026A CN111400026A (en) 2020-07-10
CN111400026B true CN111400026B (en) 2023-02-28

Family

ID=71433924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911119106.2A Active CN111400026B (en) 2019-11-15 2019-11-15 Distributed load balancing method based on master-slave backup technology

Country Status (1)

Country Link
CN (1) CN111400026B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858721B (en) * 2020-08-03 2023-07-21 南京大学 Distributed computing method based on priority coding
CN113505021B (en) * 2021-05-26 2023-07-18 南京大学 Fault tolerance method and system based on multi-master-node master-slave distributed architecture


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176850A (en) * 2013-04-10 2013-06-26 国家电网公司 Electric system network cluster task allocation method based on load balancing
CN104283948A (en) * 2014-09-26 2015-01-14 东软集团股份有限公司 Server cluster system and load balancing implementation method thereof
US20160179642A1 (en) * 2014-12-19 2016-06-23 Futurewei Technologies, Inc. Replicated database distribution for workload balancing after cluster reconfiguration
CN105302649A (en) * 2015-12-03 2016-02-03 中国联合网络通信集团有限公司 Disaster recovery backup method and system
CN110190991A (en) * 2019-05-21 2019-08-30 华中科技大学 A kind of fault-tolerance approach of distributed stream processing system under more application scenarios

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SDN期末作业 (SDN final assignment); bokerr; https://www.cnblogs.com/bokers/p/8343502.html; 2018-01-24; pages 1-8 *

Also Published As

Publication number Publication date
CN111400026A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111400026B (en) Distributed load balancing method based on master-slave backup technology
CN102567080B (en) Virtual machine position selection system facing load balance in cloud computation environment
CN105740084A (en) Cloud computing system reliability modeling method considering common cause fault
Zhang et al. MrHeter: improving MapReduce performance in heterogeneous environments
Xu et al. A (DP) $^ 2 $2 SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy
CN110362780B (en) Large data tensor canonical decomposition calculation method based on Shenwei many-core processor
CN105373517A (en) Spark-based distributed matrix inversion parallel operation method
Wang et al. Heterogeneity-aware gradient coding for straggler tolerance
CN110929884A (en) Classification method and device for distributed machine learning optimization based on column division
CN114201287B (en) Method for cooperatively processing data based on CPU + GPU heterogeneous platform
CN112948123B (en) Spark-based grid hydrological model distributed computing method
CN111428192A (en) Method and system for optimizing high performance computational architecture sparse matrix vector multiplication
CN115718986B (en) Multi-core parallel time domain simulation method based on distributed memory architecture
CN112612601A (en) Intelligent model training method and system for distributed image recognition
Mercan et al. Computing sequence covering arrays using unified combinatorial interaction testing
CN111258730A (en) Task scheduling method based on competition conditions
CN111722923A (en) Heterogeneous resource calling method and device and computer readable storage medium
CN113342313B (en) Method for asynchronously updating linear classification model parameters in Spark MLlib based on parameter server
CN111858721B (en) Distributed computing method based on priority coding
Umesha et al. Optimal design of truss structures using parallel computing
JPWO2005029352A1 (en) Parallel computing method and apparatus
CN110021339B (en) Cluster parallel computing acceleration method based on protein folding calculation protein structure
Huang et al. Speedup and synchronisation overhead analysis of Gauss-Seidel type algorithms on a Sequent balance machine
Bošanský et al. Parallel Approach To Solve Of The Direct Solution Of Large Sparse Systems Of Linear Equations
Sharadapriyadarshini et al. Formulations and heuristics for scheduling in a buffer-constrained flowshop and flowline-based manufacturing cell with different buffer-space requirements for jobs: Part 2

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant