CN103226467B - Data parallel processing method, system and load balance scheduler - Google Patents


Info

Publication number
CN103226467B
CN103226467B (application number CN201310195179.6A)
Authority
CN
China
Prior art keywords
server
data
server cluster
execution sequence
load balancing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310195179.6A
Other languages
Chinese (zh)
Other versions
CN103226467A (en)
Inventor
杨树强
华中杰
贾焰
尹洪
赵辉
李爱平
陈志坤
金松昌
周斌
韩伟红
韩毅
舒琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201310195179.6A
Publication of CN103226467A
Application granted
Publication of CN103226467B
Legal status: Active
Anticipated expiration

Landscapes

  • Multi Processors (AREA)

Abstract

The embodiments of the present invention disclose a data parallel processing method, system, and load balance scheduler. In the embodiments, any server in the server cluster is capable of both executing tasks and storing data. On this basis, at the job scheduling level, the embodiments predict the overall system load balancing state under different execution orders according to a compute-localization strategy, select the execution order that optimizes the overall load balancing state, and schedule jobs in that order. At the task scheduling level, the embodiments distribute each job that enters the executing state according to the compute-localization strategy. Because the compute-localization strategy assigns each data processing task to the server that stores its corresponding data block, the same server acts both as the node storing the data block and as the node executing the task, which reduces network data transmission between server nodes and improves data processing performance.

Description

Data parallel processing method, system and load balance scheduler
Technical field
The present invention relates to the technical field of data processing, and more particularly to a data parallel processing method, system, and load balance scheduler.
Background technology
In a distributed computing environment, for example the MapReduce (hereinafter MR) parallel programming model proposed by Google, the data that a job needs to process has been divided into multiple data blocks, which are stored, in units of data blocks, on one or more server nodes. After a client submits a job, the job is divided into tasks in one-to-one correspondence with the data blocks, and these tasks are assigned to different server nodes for parallel execution. If a server node executing a task does not store the data block that the task requires, the block must be transferred over the network from the node that stores it. Therefore, how to reduce the network data transmission overhead between server nodes and improve data processing performance has become a hot research topic.
Summary of the invention
In view of this, the object of the embodiments of the present invention is to provide a data parallel processing method, system, and load balance scheduler that solve the above problem.
To achieve the above object, the embodiments of the present invention provide the following technical solutions:
A data parallel processing method based on a server cluster, where any server in the server cluster is capable of executing tasks and storing data;
the method comprising:
placing jobs submitted by users through a client into a job waiting queue, and collecting the data distribution information of each job, where the data to be processed by a job is divided into multiple data blocks stored separately on servers in the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the data blocks corresponding to the job;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the overall system load balancing state caused by distributing the jobs in the job waiting queue according to a compute-localization strategy under different execution orders, and obtaining the optimal execution order;
reordering the jobs in the job waiting queue by the optimal execution order, and scheduling jobs from the reordered waiting queue into the executing state in sequence until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
distributing each job that enters the executing state according to the compute-localization strategy, so that the servers execute the data processing tasks;
where distributing according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A data parallel processing system, comprising a server cluster and a load balance scheduler;
any server in the server cluster being capable of executing tasks and storing data;
the load balance scheduler comprising:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of each job, where the data to be processed by a job is divided into multiple data blocks stored separately on servers in the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the data blocks corresponding to the job;
a prediction unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the overall system load balancing state caused by distributing the jobs in the job waiting queue according to a compute-localization strategy under different execution orders, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the job waiting queue by the optimal execution order, and to schedule jobs from the reordered waiting queue into the executing state in sequence until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to distribute each job that enters the executing state according to the compute-localization strategy, so that the servers execute the data processing tasks;
where distributing according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
A load balance scheduler, cooperating with a server cluster in which any server is capable of executing tasks and storing data, the load balance scheduler comprising:
a preprocessing unit, configured to place jobs submitted by users through a client into a job waiting queue and to collect the data distribution information of each job, where the data to be processed by a job is divided into multiple data blocks stored separately on servers in the server cluster, each data block corresponds to one data processing task, and the data distribution information comprises the distribution of the data blocks corresponding to the job;
a prediction unit, configured to predict, when the number of jobs the server cluster is executing is less than a first threshold and according to the data distribution information, the overall system load balancing state caused by distributing the jobs in the job waiting queue according to a compute-localization strategy under different execution orders, and to obtain the optimal execution order;
a job scheduling unit, configured to reorder the jobs in the job waiting queue by the optimal execution order, and to schedule jobs from the reordered waiting queue into the executing state in sequence until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to distribute each job that enters the executing state according to the compute-localization strategy, so that the servers execute the data processing tasks;
where distributing according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and assigning each data processing task to the server that stores its corresponding data block.
It can be seen that in the embodiments of the present invention, any server in the server cluster is capable of executing tasks and storing data. On this basis, at the job scheduling level, the embodiments predict the overall system load balancing state under different execution orders according to the compute-localization strategy, select the execution order that optimizes the overall load balancing state, and schedule jobs in that order. At the task scheduling level, the embodiments distribute each job that enters the executing state according to the compute-localization strategy. Because the compute-localization strategy assigns each data processing task to the server that stores its corresponding data block, the same server acts both as the node storing the data block and as the node executing the task, which reduces network data transmission between server nodes and improves data processing performance.
Accompanying drawing explanation
To illustrate the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of MR-based data processing according to an embodiment of the present invention;
Fig. 2 is a flowchart of the data parallel processing method according to an embodiment of the present invention;
Fig. 3 is a diagram of the system load balancing state according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the global search tree according to an embodiment of the present invention;
Fig. 5a and Fig. 5b are schematic flowcharts of the heuristic search according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the data parallel processing system according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of the load balance scheduler according to an embodiment of the present invention.
Embodiment
For clarity of reference, the technical terms and abbreviations used hereinafter are explained as follows:
Compute localization: in a distributed computing environment, distributing the computational logic so that the server that processes the data (the computing node) is the same node as the server that stores that data (the storage node), thereby reducing network transmission overhead between computing nodes and storage nodes and improving data processing performance;
Data locality: the degree to which compute localization can be satisfied, i.e., whether the data required by a computation can be obtained directly on the node where the computation runs, without network transmission. In large-scale distributed computing environments, the overall degree of localization is usually expressed as the localization ratio (the percentage of computations that are fully localized);
Load balancing: in a distributed computing environment, evenly distributing load across two or more nodes (servers) to avoid overloading some of them, so as to obtain higher resource utilization and better data processing performance. The load may be computational load, I/O load, network load, etc.
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
MapReduce (hereinafter MR) is a parallel programming model proposed by Google. Its basic idea is to abstract the parallel computation of arbitrarily complex data processing requests (jobs) on large clusters into two functions, the Map function and the Reduce function. The MR model has not only proved highly effective in practice, but is also easy to learn and use, and is favored by large Internet IT enterprises.
The MR model is suitable for data-intensive computing. The data to be processed by an MR job is divided into multiple data blocks (the blocks are mutually independent and can be computed separately), and these blocks are stored on one or more server nodes.
Fig. 1 is a schematic diagram of MR-based data processing. Suppose the data to be processed by a job is the set S, divided into n mutually disjoint data subsets (data blocks) S1 to Sn, i.e., S = S1 ∪ S2 ∪ … ∪ Sn. Each computation request (job) is decomposed into a large number of map computations (map tasks) and a small number of reduce computations (reduce tasks). The map computations correspond one-to-one with the data blocks S1 to Sn and are computed independently; the reduce computations merge the map results (the intermediate results of the MR computation) and save the final result to a user-specified location. The map tasks must be assigned to different computing nodes for parallel execution; therefore, under an MR computing environment, the core problem is the scheduling of map tasks. Other similar distributed computing environments also need to schedule tasks.
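The decomposition described above (one map task per data block, a reduce phase merging the intermediate results) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; the word-count map and reduce functions are hypothetical examples.

```python
from collections import defaultdict

def run_mr_job(blocks, map_fn, reduce_fn):
    """Minimal MR sketch: one map task per data block, then a reduce
    phase that merges the intermediate key/value results."""
    intermediate = defaultdict(list)
    for block in blocks:                  # each block -> one map task
        for key, value in map_fn(block):
            intermediate[key].append(value)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}

# Hypothetical word-count job: S is split into disjoint blocks S1..Sn.
blocks = [["a", "b"], ["b", "c"], ["c", "c"]]
map_fn = lambda block: [(word, 1) for word in block]
reduce_fn = lambda key, values: sum(values)
print(run_mr_job(blocks, map_fn, reduce_fn))  # {'a': 1, 'b': 2, 'c': 3}
```

In a real cluster the per-block map calls would run in parallel on different nodes; the sequential loop here only illustrates the decomposition.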
In a distributed computing environment the following problem exists: suppose server node A executes task job1 but does not store the data block that job1 requires; the block must then be transferred over the network to node A from the node that stores it. How to reduce this network transmission overhead during computation and improve data processing performance has become a hot research topic.
In fact, existing task scheduling schemes all take a particular demand (such as load balancing) as the first objective and treat improving data locality as a secondary objective, so that in practice the localization ratio is not high.
The technical solution provided by the present invention instead takes improving data locality as the first objective and resolves the conflict between data locality and system load balancing with a new approach: while improving data locality, it optimizes the overall load balance of the system, reduces the network I/O overhead during computation, increases system throughput, and shortens the execution time of individual jobs.
In addition, current MR scheduling does not distinguish between job scheduling and task scheduling. This is because MR computation was originally used mainly for batch data processing, where generally only a few jobs execute at a time and interference between jobs is small, so no distinction was needed. When a large number of jobs execute concurrently, however, failing to separate job-level and task-level scheduling makes the scheduling scheme harder to optimize. The core of the technical solution provided by the present invention is precisely job-level scheduling plus task-level scheduling: at the job level, load balancing analysis predicts the execution order with the best overall load balance; at the task level, after a job enters the executing state it is divided into a number of Map tasks and Reduce tasks, and, following the locality principle, each map task is assigned to run on the server that holds its data.
The details are introduced below.
The technical solution provided by the present invention is based on a server cluster, and its premise is that any server in the cluster is capable of both executing tasks and storing data, so that any server can serve simultaneously as a computing node and a storage node. In other words, the solution assumes that each server in the cluster has independent storage and computing capability and that no shared storage is used. Large service providers and data centers currently adopt this pattern, namely large cluster computing environments formed by connecting many low- and mid-range servers over a network, so the assumption is reasonable.
Referring to Fig. 2, the data parallel processing method claimed by the present application comprises at least the following steps:
S1: placing jobs submitted by users through a client into the job waiting queue, and collecting the data distribution information of each job.
It should be noted that the data to be processed by a submitted job has been divided into multiple (at least two) data blocks (each block corresponding to one map task) and stored on servers in the server cluster. The data distribution information comprises the distribution of these data blocks.
S2: when the number of jobs the server cluster is executing is less than the first threshold, predicting, according to the data distribution information, the overall system load balancing state caused by distributing the jobs in the waiting queue according to the compute-localization strategy under different execution orders, and obtaining the optimal execution order.
How the first threshold is set is outside the scope of this solution; those skilled in the art can set it based on other techniques or experience. In MR technology, the first threshold can be specified by the user.
The first threshold actually represents the maximum number of jobs the server cluster can execute in parallel; if it has not been reached, the cluster still has capacity to execute more jobs.
S3: reordering the jobs in the waiting queue by the optimal execution order, and scheduling jobs from the reordered waiting queue into the executing state in sequence, until the number of jobs the cluster is executing reaches the first threshold or the waiting queue is empty.
How many jobs are scheduled into the executing state is determined by the first threshold and the number of jobs already executing.
For example, if the server cluster has 4 servers and each server can execute at most 20 jobs, the first threshold can be set to 80. If a user submits 100 jobs, 20 jobs must be placed in the waiting queue.
Suppose the cluster then completes 4 jobs, so that 76 jobs are executing, which is less than 80. At this point prediction is carried out to obtain the optimal execution order (corresponding to step S2). The 20 jobs in the waiting queue are then reordered by that order, and the first 4 jobs in the reordered queue are scheduled into the executing state (corresponding to step S3).
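The admission logic in this example (threshold 80, 76 jobs executing, so 4 jobs are admitted from the already-reordered queue) can be sketched as follows; the function name is an illustrative assumption:

```python
def admit_jobs(waiting, executing_count, first_threshold):
    """Move jobs from the (already reordered) waiting queue into the
    executing state until the first threshold is reached or the queue
    is empty (step S3)."""
    admitted = []
    while waiting and executing_count + len(admitted) < first_threshold:
        admitted.append(waiting.pop(0))
    return admitted

# Numbers from the example: 4 servers x 20 jobs -> threshold 80;
# 76 jobs executing, 20 jobs waiting -> 4 jobs admitted, 16 remain.
waiting = [f"job{i}" for i in range(1, 21)]
admitted = admit_jobs(waiting, executing_count=76, first_threshold=80)
print(len(admitted), len(waiting))  # 4 16
```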
S4: distributing each job that enters the executing state according to the compute-localization strategy, so that the servers execute the data processing tasks.
It should be noted that whenever the number of jobs the cluster is executing falls below the first threshold, steps S2 to S4 are executed again.
"Distributing according to the compute-localization strategy" in steps S2 and S4 specifically comprises: creating one data processing task for each data block of the data to be processed by the job, and assigning each data processing task to the server that stores its corresponding data block. In other words, the scheduler only looks at which server stores the data block corresponding to a map task, and schedules that map task to that server for execution.
For example, suppose the cluster contains 4 servers F1 to F4. The data of job X1 is divided into 2 data blocks stored on F1 and F2; the data of job X2 is divided into 3 data blocks stored on F1, F2, and F4. In this application, when distributing according to the compute-localization strategy, job X1 is divided into two map tasks assigned to F1 and F2 respectively, and job X2 is divided into three map tasks assigned to F1, F2, and F4.
Each server in the cluster maintains a local task queue (a task waiting queue). Map tasks assigned to a server are placed in its local queue, and the server executes the tasks in the queue on a first-in, first-out basis.
More specifically, since the data block required by a map task is already stored on a particular server, only the computational logic of that map task needs to be dispatched to that server for execution.
It should be noted that each map computation is one task, and the computational logic refers to the map function, i.e., the computation method. The computational logic is identical across the data blocks of the same job; the computational logic of different jobs may be the same or different.
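The assignment rule and the per-server FIFO queues can be sketched as follows, using the servers F1 to F4 and the jobs X1/X2 from the example above (a sketch under those assumptions, not the patented implementation):

```python
from collections import deque

def assign_by_locality(jobs, queues):
    """Compute-localization: one map task per data block, appended to the
    local FIFO task queue of the server that stores that block."""
    for job_id, block_locations in jobs.items():
        for i, server in enumerate(block_locations):
            queues[server].append((job_id, f"map{i}"))

# X1 has blocks on F1, F2; X2 has blocks on F1, F2, F4 (from the example).
queues = {s: deque() for s in ["F1", "F2", "F3", "F4"]}
jobs = {"X1": ["F1", "F2"], "X2": ["F1", "F2", "F4"]}
assign_by_locality(jobs, queues)
print([len(queues[s]) for s in ["F1", "F2", "F3", "F4"]])  # [2, 2, 0, 1]
```

Each server would then pop tasks from the head of its own deque (FIFO), executing only tasks whose data it already stores.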
In addition, as a special case, if the number of jobs the cluster is executing is below the first threshold but the waiting queue contains only one job, steps S2 and S3 are unnecessary: the job in the waiting queue is scheduled directly into the executing state, after which step S4 is executed.
It can be seen that in the embodiments of the present invention, any server in the cluster is capable of executing tasks and storing data. On this basis, at the job scheduling level, the embodiments predict the overall system load balancing state under different execution orders according to the compute-localization strategy, select the execution order that optimizes the overall load balancing state (lowest and most even load), and schedule jobs in that order.
At the task scheduling level, the embodiments assign tasks according to the compute-localization strategy, so that every Map task is dispatched to a server holding the data it requires and its execution incurs no network transmission overhead. This reduces network data transmission between server nodes and improves data processing performance.
In other embodiments of the present invention, the method may further comprise:
periodically checking the idle time of each server in the cluster;
and scheduling data processing tasks from the server with the most tasks onto servers whose idle time exceeds a second threshold.
More specifically, the data processing task at the tail of the local task queue of the server with the most tasks can be dispatched to a server whose idle time exceeds the second threshold.
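The rebalancing step above can be sketched as follows; the idle-time bookkeeping is assumed to come from the periodic check, and the function name is an illustrative assumption:

```python
from collections import deque

def rebalance(queues, idle_time, second_threshold):
    """Move the task at the tail of the busiest server's local queue to
    each server whose idle time exceeds the second threshold."""
    for server, idle in idle_time.items():
        if idle <= second_threshold:
            continue
        busiest = max(queues, key=lambda s: len(queues[s]))
        if len(queues[busiest]) > len(queues[server]):
            queues[server].append(queues[busiest].pop())  # steal from tail

queues = {"F1": deque(["t1", "t2", "t3"]), "F2": deque()}
rebalance(queues, idle_time={"F1": 0, "F2": 30}, second_threshold=10)
print(len(queues["F1"]), len(queues["F2"]))  # 2 1
```

Stealing from the tail (rather than the head) leaves the tasks the busy server is about to run untouched, at the cost of losing locality only for the migrated task.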
It should be noted that a simple scheduling scheme is: when capacity becomes available, compute the overall system load balancing state that would result from each job in the waiting queue entering the executing state, select the job with the best load balance, and schedule it into the executing state. This scheme is computationally simple and works well at first, but jobs with poor load balancing characteristics gradually accumulate, and after long-term operation the system load may become extremely unbalanced. Fig. 3 illustrates this situation; its ordinate is the load variance (from which the load imbalance can be seen) and its abscissa is time.
To avoid the situation shown in Fig. 3, the embodiments of the present invention do not predict the overall load balancing state after a single job enters the executing state; instead, they consider the overall load balancing state after all jobs enter the executing state in some execution order and select the optimal order, thereby avoiding the sharp performance decay shown in Fig. 3.
Job scheduling is described in detail below.
In job scheduling, the key is how to predict the optimal execution order. In an embodiment of the present invention, "predicting the overall system load balancing state that the jobs in the waiting queue cause under different execution orders, and obtaining the optimal execution order" in step S2 may comprise the following sub-steps:
One: constructing a global search tree. The tree comprises many search paths sharing the same root node, and each search path comprises leaf nodes. The root node represents the current load balancing state of the server cluster; the leaf nodes represent jobs in the waiting queue; different search paths represent different execution orders.
Taking 3 jobs in the waiting queue with IDs job1 to job3 as an example, the global search tree (see Fig. 4) is constructed by the following sub-steps:
Step 1: construct the first layer, which has only one root node (the start node), denoted job0.
Step 2: since N (here 3) jobs are waiting in the system, the next job to enter the executing state has N possible choices, so the second layer expands into N leaf nodes, each denoted by a job ID.
Step 3: construct the third layer. Since one job was already selected when constructing the second layer, each second-layer node expands into N − 1 (here 2) nodes to form the third layer.
Step 4: and so on, until no further expansion is possible, which completes the construction of the whole global search tree.
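The layer-by-layer expansion in Steps 1 to 4 can be sketched as follows; each root-to-leaf path is one candidate execution order (a sketch for illustration, not the patented implementation):

```python
def build_search_tree(waiting_jobs):
    """Expand the global search tree layer by layer: the root (job0) is
    the current state, and each full path enumerates one execution order."""
    paths = [["job0"]]                      # layer 1: root node only
    for _ in range(len(waiting_jobs)):      # one more layer per waiting job
        paths = [p + [j] for p in paths for j in waiting_jobs if j not in p]
    return paths

paths = build_search_tree(["job1", "job2", "job3"])
print(len(paths))   # 6, i.e. 3! execution orders
print(paths[0])     # ['job0', 'job1', 'job2', 'job3']
```

Materializing every path like this is only feasible for tiny queues, which is exactly why the text later resorts to a heuristic search instead of full enumeration.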
Each search path in the global search tree can be regarded as a job execution sequence, so searching for the optimal execution order is equivalent to searching for the optimal job execution sequence. This embodiment abstracts that search as a graph search model: based on some search algorithm, find an optimal path from the root node to a leaf node in a global search tree such as that of Fig. 4, thereby obtaining the optimal execution order.
Two: computing the load balancing predicted value of the different search paths (the load balancing predicted value characterizes the overall system load balancing state), and taking the execution order corresponding to the search path with the smallest load balancing predicted value as the optimal execution order.
It should be noted, however, that if there are N waiting jobs, there are N! search paths. Obviously, globally considering all jobs in the current waiting queue at every scheduling decision would incur a very large amount of computation. Therefore, preferably, the embodiments of the present invention compute using a heuristic search strategy.
Referring to Fig. 5a, the heuristic search strategy is as follows:
Step A: take the root node as the target node, compute its evaluation value, and use the evaluation value of the root node as the load balancing predicted value of every search path;
Step B: select the search path with the smallest load balancing predicted value as the target search path, and take its corresponding execution order as the target execution order;
Step C: judge whether the target search path still contains a leaf node whose load balancing prediction has not yet been computed. If not, take the target execution order as the optimal execution order (step E). If so, take the next leaf node after the current target node on the target search path as the new target node, compute its evaluation value as the load balancing predicted value of its search path (step D), and return to the step of selecting the path with the smallest load balancing predicted value as the target search path (step B).
Below, refer to Fig. 5 b, herein will there to be P (P=5) operation in operation waiting list, the ID of these 5 operations is job1-job5 is respectively that (P determines the number of plies height in other words conj.or perhaps of global search tree to example, global search tree the number of plies or be highly P+1), heuristic search strategy is introduced in more detail.
S501: compute the evaluation value f(0) of the root node.
S502: expand P leaf nodes from the root node, and compute the evaluation value of each leaf node.
It should be noted that in step S502 every search path in the global search tree is a target search path (because f(0) is the same for each path), and the first leaf node of each search path is a target node.
How the evaluation value is computed in detail is described later.
S503: find the node with the smallest evaluation value among the leaf nodes of the expanded search tree, expand its lower-layer nodes as target nodes, and compute their evaluation values.
Suppose that after step S502, the evaluation value of job1 in the job execution sequence job0 -> job1 -> job3 -> job5 -> job4 -> job2 is the smallest. Then the lower-layer nodes of the leaf node corresponding to job1 are expanded: they are the leaf nodes corresponding to job2, job3, job4, and job5, and the evaluation value of each is computed.
S504: repeat until the (P + 1)-th layer of the global search tree is reached, i.e., the optimal job execution sequence has been found, at which point the search ends.
How the evaluation value is calculated is introduced below.
Suppose the target node is the M-th layer node of a certain target search path. For every target node, its evaluation value f(M) can be calculated by the following function:
f(M) = g(M) + h(M) (formula one)
Wherein:
g(M) represents the sum of the load balancing values of all jobs from the root node to the target node (including the load balancing value of the initial state) when the job corresponding to the target node executes. g(M) can be calculated by the following formula:
g(M) = Σ_{j=1}^{M} LB_j (formula two)
Wherein j denotes the j-th layer node in the target search path.
In formula two, LB_j represents the load balancing predicted value of the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, and LB_1 represents the current load balancing value of the server cluster (the actual current load balancing value of the system).
Still referring to Fig. 5b, for the execution sequence job0->job1->job3->job5->job4->job2, suppose job3 is the target node; then by formula two LB_1 corresponding to job0 (the root), LB_2 corresponding to job1 (the second-layer node) and LB_3 corresponding to job3 itself are summed.
As stated before, LB_j represents the load balancing predicted value of the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state. In other words, LB_j quantifies the degree of load balance of the server cluster at the moment that job enters the execution state.
LB_j can be represented by the load variance across servers (see formula three); a value of 0 indicates that the load balance of the system is then optimal.
LB_j = Σ_{i=1}^{N} (Load_i^j − \overline{Load^j})² (formula three)
Wherein Load_i^j represents the load of the i-th server in the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state (N denotes the total number of servers in the server cluster), and \overline{Load^j} represents the average load of the server cluster at that moment, i.e. \overline{Load^j} = (Σ_{i=1}^{N} Load_i^j) / N.
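Formula three can be evaluated directly from a per-server load vector. A minimal sketch (the function name is illustrative):

```python
def load_balance_value(loads):
    """Formula three: the sum of squared deviations of each server's load
    from the cluster-average load; 0 means the cluster is perfectly balanced."""
    avg = sum(loads) / len(loads)           # average load of the cluster
    return sum((x - avg) ** 2 for x in loads)

# Initial state of the running example: five servers with 2 map tasks each.
assert load_balance_value([2, 2, 2, 2, 2]) == 0.0
```

For the loads [3, 3, 2, 2, 2] after job1 enters the execution state this yields 1.2 (average 2.4); the worked figure of 1.25 in the text appears to correspond to rounding the average to 2.5. For the loads [4, 4, 2, 3, 2] of the job3 case it yields 4, matching the text.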
In MR (MapReduce) technology each map task corresponds to one data block, all data blocks are the same size, and all map tasks of the same job share the same computational logic. Therefore, in the present embodiment the load of a server is represented by the number of map tasks assigned to it. In an actual MR system, the load of the i-th server equals the length of its map task queue (including tasks being executed and tasks waiting to execute).
Still taking the execution sequence job0->job1->job3->job5->job4->job2 as an example, suppose the server cluster contains five servers (F1-F5) and that in the initial state the local task queue length on every server is 2 (i.e. 2 map tasks each). Assume the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively, and the data corresponding to job3 is divided into 3 data blocks stored on F1, F2 and F4 respectively.
If the leaf node corresponding to job1 is the target node, then when job1 enters the execution state the load of F1 is 3, F2 is 3, F3 is 2, F4 is 2 and F5 is 2, and the g(M) corresponding to job1 is 1.25.
If instead the leaf node corresponding to job3 is the target node, then when job3 enters the execution state the load of F1 is 4, F2 is 4, F3 is 2, F4 is 3 and F5 is 2. The g(M) corresponding to job3 is then 4.
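Under the compute-localization strategy each data block of a job adds one map task on the server storing that block, so the predicted load vector of a node is simply the parent node's load vector plus the job's block placement. A sketch reproducing the two cases above (server and job names as in the example):

```python
from collections import Counter

def loads_after(loads, block_servers):
    """Predicted per-server loads once a job enters the execution state:
    each of its data blocks adds one map task on the server storing it."""
    added = Counter(block_servers)          # missing servers count as 0
    return {s: load + added[s] for s, load in loads.items()}

# Five servers F1-F5, each starting with a local task queue of length 2.
initial = {f"F{i}": 2 for i in range(1, 6)}
after_job1 = loads_after(initial, ["F1", "F2"])           # job1: blocks on F1, F2
after_job3 = loads_after(after_job1, ["F1", "F2", "F4"])  # job3 scheduled after job1
assert after_job1 == {"F1": 3, "F2": 3, "F3": 2, "F4": 2, "F5": 2}
assert after_job3 == {"F1": 4, "F2": 4, "F3": 2, "F4": 3, "F5": 2}
```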
h(M) represents the sum of the load balancing values expected to be produced when all remaining jobs execute under the ideal condition. The so-called ideal condition means that the load across the servers of the server cluster is completely even. h(M) can be calculated by the following formula:
h(M) = Σ_{j=M+1}^{P+1} lb_j (formula four)
Wherein lb_j represents the load balancing predicted value of the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, under the condition that the server load in the server cluster is completely even.
lb_j can be calculated by the following formula:
lb_j = Σ_{i=1}^{N} (l_i^j − \overline{l^j})² (formula five)
Wherein l_i^j represents the load of the i-th server in the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, under the completely even load condition.
\overline{l^j} represents the average load of the server cluster at that moment, i.e. \overline{l^j} = (Σ_{i=1}^{N} l_i^j) / N.
Likewise, l_i^j can be taken to equal the map task queue length on the i-th server under the completely even load condition.
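One plausible reading of the "completely even" condition (an assumption, since the text does not spell it out) is that the T outstanding map tasks are spread so that each server holds either floor(T/N) or ceil(T/N) of them, and lb_j is the squared deviation of that near-flat vector from the exact average T/N. A sketch:

```python
def ideal_lb(total_tasks, n_servers):
    """Formula five under the evenly-spread reading: squared deviation from
    the average when tasks are as balanced as integer counts allow."""
    base, extra = divmod(total_tasks, n_servers)
    # 'extra' servers hold base+1 tasks, the remaining servers hold base tasks
    loads = [base + 1] * extra + [base] * (n_servers - extra)
    avg = total_tasks / n_servers
    return sum((x - avg) ** 2 for x in loads)

assert ideal_lb(15, 5) == 0.0             # 15 tasks over 5 servers: flat
assert abs(ideal_lb(18, 5) - 1.2) < 1e-9  # loads 4,4,4,3,3 around 3.6
```

With the running example this reading reproduces the h(M) = 3.2 given below for job3: after job3 the remaining jobs job5, job4 and job2 bring the task total to 18, 22 and 24, giving lb values 1.2, 1.2 and 0.8.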
Still taking the execution sequence job0->job1->job3->job5->job4->job2 as an example, suppose the server cluster contains five servers (F1-F5) and that in the initial state the local task queue length on every server is 2 (i.e. 2 map tasks each).
Assume the data corresponding to job1 is divided into 2 data blocks stored on F1 and F2 respectively; the data corresponding to job3 is divided into 3 data blocks stored on F1, F2 and F4 respectively; the data corresponding to job5 is divided into 3 data blocks stored on F1, F3 and F4 respectively; the data corresponding to job4 is divided into 4 data blocks stored on F2, F3, F4 and F5 respectively; and the data corresponding to job2 is divided into 2 data blocks stored on F1 and F5 respectively.
In the execution sequence job0->job1->job3->job5->job4->job2, if the leaf node corresponding to job1 is the target node, then when job1 enters the execution state the load of F1 is 3, F2 is 3, F3 is 2, F4 is 2 and F5 is 2; the g(M) corresponding to job1 is then 1.25, the corresponding h(M) is 4.4, and the corresponding evaluation value f(M) is 5.65.
If instead the leaf node corresponding to job3 is the target node, then when job3 enters the execution state the g(M) corresponding to job3 is 4, the corresponding h(M) is 3.2, and the corresponding evaluation value f(M) is 7.2.
Corresponding to the above method, the present invention also protects a data parallel processing system. Referring to Fig. 6, the system at least comprises a server cluster 1 and a load balance scheduler 2.
Any server in the server cluster 1 has the ability both to execute tasks and to store data.
Referring to Fig. 7, the above load balance scheduler 2 can comprise:
a pre-processing unit 21, configured to put jobs submitted by users through clients into the operation waiting queue and to collect the data distribution information of the jobs;
a prediction unit 22, configured, when the number of jobs the server cluster is executing is less than a first threshold, to predict, according to the data distribution information, the overall system load balancing state caused by allocating the jobs in the operation waiting queue according to the compute-localization strategy under different execution sequences, and to obtain the optimal execution sequence;
a job scheduling unit 23, configured to re-sort the jobs in the operation waiting queue by the optimal execution sequence and to schedule the jobs of the re-sorted waiting queue into the execution state one by one in the re-sorted order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit 24, configured to allocate each job entering the execution state according to the compute-localization strategy, so that the servers execute the tasks.
For details, refer to the foregoing description herein; they are not repeated here.
In other embodiments of the present invention, the above load balance scheduler can also comprise a second task scheduling unit, configured to periodically check the idle time of each server in the server cluster and to schedule data processing tasks from the server with the most tasks onto any server whose idle time exceeds a second threshold. For details, refer to the foregoing description herein; they are not repeated here.
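The second task scheduling unit's periodic check can be sketched as follows; the data structures and threshold handling are illustrative assumptions, not the patent's concrete interface:

```python
def rebalance_once(idle_time, task_queues, idle_threshold):
    """One periodic pass: for every server idle longer than the threshold,
    move one waiting task from the server with the most queued tasks."""
    for server, idle in idle_time.items():
        if idle <= idle_threshold:
            continue
        busiest = max(task_queues, key=lambda s: len(task_queues[s]))
        # Only steal when another server actually holds a surplus of tasks.
        if busiest != server and len(task_queues[busiest]) > 1:
            task_queues[server].append(task_queues[busiest].pop())

queues = {"F1": ["t1", "t2", "t3"], "F2": []}
rebalance_once({"F1": 0.0, "F2": 9.0}, queues, idle_threshold=5.0)
assert queues == {"F1": ["t1", "t2"], "F2": ["t3"]}
```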
The embodiments of the present invention also claim the load balance scheduler of all the above embodiments.
It should be noted that the load balance scheduler can be a hardware device or a software program, and so can each unit in the load balance scheduler (for example, the pre-processing unit can be a pre-processing server, and the prediction unit can in practice be a prediction server). When the load balance scheduler is a software program, it can be arranged in any server of the server cluster.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments can be referred to one another. Since the devices provided by the embodiments correspond to the methods provided by the embodiments, their description is relatively simple; for the relevant points, refer to the description of the method parts.
It should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or device that comprises the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be realized by software plus the necessary common hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components and the like; it can of course also be realized by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like, but in many cases the former is the better embodiment. Based on such an understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a readable storage medium, such as a USB flash disk, a removable storage medium, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disc, an optical disc or any other medium capable of storing software program code, and comprises instructions for making a computer device (which can be a personal computer, a server, a network device or the like) execute the method of each embodiment of the present invention.
The above description of the provided embodiments enables those skilled in the art to realize or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not restricted to the embodiments shown herein, but is to accord with the widest scope consistent with the principles and novel features provided herein.

Claims (8)

1. A data parallel processing method, characterized in that it is based on a server cluster, any server in the server cluster having the ability both to execute tasks and to store data;
the method comprising:
putting jobs submitted by users through clients into an operation waiting queue, and collecting the data distribution information of the jobs; the data to be processed by a job being divided into multiple data blocks stored respectively on servers in the server cluster, each data block corresponding to one data processing task, and the data distribution information comprising the distribution information of the data blocks corresponding to the jobs;
when the number of jobs the server cluster is executing is less than a first threshold, predicting, according to the data distribution information, the overall system load balancing state caused by allocating the jobs in the operation waiting queue according to a compute-localization strategy under different execution sequences, and obtaining an optimal execution sequence;
re-sorting the jobs in the operation waiting queue by the optimal execution sequence, and scheduling the jobs of the waiting queue into the execution state one by one in the re-sorted order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
allocating each job entering the execution state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein allocating according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and allocating each data processing task to the server storing its corresponding data block.
2. the method for claim 1, is characterized in that, also comprises:
The free time of each server in server cluster described in periodic test;
From the server that number of tasks is maximum, data dispatching Processing tasks exceeded on the server of Second Threshold to free time.
3. The method as claimed in claim 2, characterized in that:
predicting the overall system load balancing state caused by the jobs in the operation waiting queue under different execution sequences and obtaining the optimal execution sequence comprises:
constructing a global search tree, the global search tree comprising multiple search paths sharing the same root node, each search path comprising leaf nodes; the root node characterizing the current load balancing state of the server cluster, the leaf nodes characterizing the jobs in the operation waiting queue, and different search paths characterizing different execution sequences;
calculating the load balancing predicted value of each search path, and taking the execution sequence corresponding to the search path with the minimal load balancing predicted value as the optimal execution sequence, the load balancing predicted value characterizing the overall system load balancing state.
4. The method as claimed in claim 3, characterized in that calculating the load balancing predicted value of each search path and taking the execution sequence corresponding to the search path with the minimal load balancing predicted value as the optimal execution sequence comprises:
taking the root node as the target node, calculating the evaluation value of the target node, and using the evaluation value of the root node as the load balancing predicted value of each search path;
selecting the search path with the minimal load balancing predicted value as the target search path, and taking the execution sequence corresponding to the target search path as the target execution sequence;
judging whether the target search path still contains a leaf node for which no load balancing predicted value has been calculated; if not, taking the target execution sequence as the optimal execution sequence; if so, taking the next leaf node after the current target node in the target search path as the target node, calculating its evaluation value as the load balancing predicted value of the search path it belongs to, and returning to the step of selecting the search path with the minimal load balancing predicted value as the target search path.
5. The method as claimed in claim 4, characterized in that:
the number of jobs in the operation waiting queue is P, P being a positive integer;
in the target search path, the target node is the M-th layer node, M being not less than 1 and not greater than P+1;
calculating the evaluation value of the target node comprises:
using the formula f(M) = g(M) + h(M) to calculate the evaluation value f(M) of the target node;
wherein:
g(M) = Σ_{j=1}^{M} LB_j;
h(M) = Σ_{j=M+1}^{P+1} lb_j;
j denotes the j-th layer node in the target search path;
LB_j represents the load balancing predicted value of the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, and LB_1 represents the current load balancing value of the server cluster;
lb_j represents the load balancing predicted value of the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, under the condition that the server load in the server cluster is completely even;
Load_i^j represents the load of the i-th server in the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, \overline{Load^j} represents the average load of the server cluster at that moment, and N represents the total number of servers in the server cluster;
l_i^j represents, under the condition that the server load in the server cluster is completely even, the load of the i-th server in the server cluster when the job corresponding to the j-th layer node of the target search path enters the execution state, and \overline{l^j} represents the average load of the server cluster at that moment;
\overline{Load^j} = (Σ_{i=1}^{N} Load_i^j) / N;
\overline{l^j} = (Σ_{i=1}^{N} l_i^j) / N.
6. The method as claimed in claim 5, characterized in that the load is characterized by the number of tasks.
7. A data parallel processing system, characterized by comprising a server cluster and a load balance scheduler;
any server in the server cluster having the ability both to execute tasks and to store data;
the load balance scheduler comprising:
a pre-processing unit, configured to put jobs submitted by users through clients into an operation waiting queue and to collect the data distribution information of the jobs; the data to be processed by a job being divided into multiple data blocks stored respectively on servers in the server cluster, each data block corresponding to one data processing task, and the data distribution information comprising the distribution information of the data blocks corresponding to the jobs;
a prediction unit, configured, when the number of jobs the server cluster is executing is less than a first threshold, to predict, according to the data distribution information, the overall system load balancing state caused by allocating the jobs in the operation waiting queue according to a compute-localization strategy under different execution sequences, and to obtain an optimal execution sequence;
a job scheduling unit, configured to re-sort the jobs in the operation waiting queue by the optimal execution sequence and to schedule the jobs of the re-sorted waiting queue into the execution state one by one in the re-sorted order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to allocate each job entering the execution state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein allocating according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and allocating each data processing task to the server storing its corresponding data block.
8. A load balance scheduler, characterized in that it cooperates with a server cluster, any server in the server cluster having the ability both to execute tasks and to store data; the load balance scheduler comprising:
a pre-processing unit, configured to put jobs submitted by users through clients into an operation waiting queue and to collect the data distribution information of the jobs; the data to be processed by a job being divided into multiple data blocks stored respectively on servers in the server cluster, each data block corresponding to one data processing task, and the data distribution information comprising the distribution information of the data blocks corresponding to the jobs;
a prediction unit, configured, when the number of jobs the server cluster is executing is less than a first threshold, to predict, according to the data distribution information, the overall system load balancing state caused by allocating the jobs in the operation waiting queue according to a compute-localization strategy under different execution sequences, and to obtain an optimal execution sequence;
a job scheduling unit, configured to re-sort the jobs in the operation waiting queue by the optimal execution sequence and to schedule the jobs of the re-sorted waiting queue into the execution state one by one in the re-sorted order, until the number of jobs the server cluster is executing reaches the first threshold or the waiting queue is empty;
a first task scheduling unit, configured to allocate each job entering the execution state according to the compute-localization strategy, so that servers execute the data processing tasks;
wherein allocating according to the compute-localization strategy comprises:
creating one data processing task for each data block of the data to be processed by a job, and allocating each data processing task to the server storing its corresponding data block.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310195179.6A CN103226467B (en) 2013-05-23 2013-05-23 Data parallel processing method, system and load balance scheduler


Publications (2)

Publication Number Publication Date
CN103226467A CN103226467A (en) 2013-07-31
CN103226467B true CN103226467B (en) 2015-09-30

Family

ID=48836933





