CN106502790A - A kind of task distribution optimization method based on data distribution - Google Patents

A task allocation optimization method based on data distribution

Info

Publication number
CN106502790A
CN106502790A (application CN201610890105.8A)
Authority
CN
China
Prior art keywords
task
distribution
node
global
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610890105.8A
Other languages
Chinese (zh)
Inventor
王洪添 (Wang Hongtian)
李萍 (Li Ping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Cloud Service Information Technology Co Ltd
Original Assignee
Shandong Inspur Cloud Service Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Cloud Service Information Technology Co Ltd filed Critical Shandong Inspur Cloud Service Information Technology Co Ltd
Priority to CN201610890105.8A priority Critical patent/CN106502790A/en
Publication of CN106502790A publication Critical patent/CN106502790A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a task allocation optimization method based on data distribution. The method is realized as follows: the data transmission cost of reduce tasks is evaluated according to the network distance between nodes and the weight distribution of the intermediate results; the optimal execution node set of each task is obtained from the data transmission cost of the reduce task on different nodes; and a specific task allocation strategy and algorithm are given based on the optimal execution node set. Compared with the prior art, this task allocation optimization method based on data distribution effectively reduces the data transmission caused by executing reduce tasks, cutting the network access requests of MapReduce programs by about 12% and shortening the job response time by about 9%, which makes it highly practical.

Description

Task allocation optimization method based on data distribution
Technical Field
The invention relates to the technical field of computer data integration, in particular to a highly practical task allocation optimization method based on data distribution.
Background
The explosive growth of information has pushed the internet into the big data era. Big data has become an important strategic resource and a new mode of decision-making, and cloud computing provides strong computing and storage capacity for big data processing and analysis. With the rise of big data and cloud computing, more and more companies are beginning to provide cloud services using MapReduce and Hadoop. MapReduce is a programming model proposed by Google, usually used for parallel operations on large-scale data sets; Hadoop is an open-source parallel programming framework that implements the MapReduce model together with a distributed file system (HDFS), and is characterized by high efficiency, high reliability, high fault tolerance, low cost, and scalability.
Network bandwidth has long been a bottleneck restricting the development of cloud computing, and is also one of the current research hotspots. As shown in fig. 1, a MapReduce program can be abstracted into two functions: a map function, which decomposes the input data and performs preliminary processing, and a reduce function, which summarizes the intermediate results to obtain the final result. The MapReduce framework generally schedules map tasks on the nodes storing the data blocks, which reduces data transmission and the occupation of network bandwidth. Reduce tasks, however, do not enjoy this data-locality advantage, because the input of a single reduce task usually comes from the output of multiple map tasks, and each reduce task must write its final result to HDFS; both the input and the output of the reduce function therefore occupy network bandwidth.
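As a minimal illustration of the two-phase model described above, the classic word-count job can be expressed as a map function and a reduce function. This is a sketch for exposition only (all names are illustrative), not the patented method:

```python
# Illustrative word-count job in the map/reduce style: the map function
# decomposes input lines, the reduce function summarizes intermediate results.
from collections import defaultdict

def map_fn(line):
    # Decompose the input and perform primary processing: emit (word, 1) pairs.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Summarize the intermediate results for one key.
    return key, sum(values)

def run_job(lines):
    # Shuffle: group intermediate key-value pairs by key, then reduce each group.
    groups = defaultdict(list)
    for line in lines:
        for k, v in map_fn(line):
            groups[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(run_job(["big data", "big compute"]))  # {'big': 2, 'data': 1, 'compute': 1}
```

In a real cluster the shuffle step is exactly where the intermediate results cross the network, which is why the placement of reduce tasks matters.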
On this basis, the present invention provides a task allocation optimization method based on data distribution: by reasonably choosing the nodes on which the reduce tasks are started, the network and I/O overhead caused by data transmission is reduced and the performance of the MapReduce program is improved.
Disclosure of Invention
Aiming at the above defects, the technical task of the invention is to provide a highly practical task allocation optimization method based on data distribution.
A task allocation optimization method based on data distribution is specifically realized by the following steps:
firstly, evaluating the data transmission cost of the reduce task according to the network distance between the nodes and the weight distribution condition of the intermediate result;
secondly, obtaining an optimal execution node set of each task according to the data transmission cost of the reduce task on different nodes;
and thirdly, providing a specific task allocation strategy and algorithm based on the optimal execution node set.
The network distance between the nodes specifically refers to the following: when the MapReduce program has m map tasks M_i and n reduce tasks R_j, where 0 ≤ i ≤ m and 0 ≤ j ≤ n, the input of each reduce task comes from the output of all map tasks; the intermediate results generated by the map tasks are transmitted over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes where all map tasks are located to the node where reduce task R_j is located is the total network distance TND_Rj of R_j.
For the intermediate-result weight distribution, a local prediction distribution map is restored from acquired global distribution information; the weight distribution of the intermediate results is counted and predicted at key-value-pair granularity, and the data transmission cost of the reduce task is evaluated in combination with the network distance.
The specific process of acquiring the global distribution information is as follows:
1) when the execution progress of the map stage is α, where slowstart_conf ≤ α ≤ 1, each node counts its intermediate-result key-value pairs; slowstart_conf is a user-configured parameter indicating that reduce tasks begin to execute once the proportion of completed map tasks reaches slowstart_conf;
2) when each node partitions its intermediate results according to the partition function, it counts the corresponding key-value pairs, generating a series of (k, n) tuples sorted in descending order of n;
3) a global truncation threshold θ is set: only the first θ% of the (k, n) tuple list in each local distribution map is used as the basis for constructing the global distribution map; the key-value-pair count n of the last tuple within the first θ% of a local distribution map is called the local truncation threshold of that node, and the truncated map is called the local truncation distribution map L;
4) the global distribution map G is constructed: first, the global distribution lower limit G_L and the global distribution upper limit G_U are defined; they represent, respectively, the minimum and maximum possible number of tuples for each key, as derived from the local truncation distribution maps and the local truncation thresholds; let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}, where a node whose truncated map contains (k, n) contributes n to both N_L and N_U, while a node whose truncated map does not contain k contributes 0 to N_L and its local truncation threshold to N_U;
5) the global distribution map is G = {(k, N) | k ∈ K}, where N takes the intermediate value between the upper and lower limits, i.e. N = (N_L + N_U)/2;
6) the global distribution map is corrected by prediction according to the historical distribution: the distribution deviation of any key is taken as the difference between its current distribution proportion and its historical distribution proportion, and the key with the largest deviation is selected as the correction key k_c; using (k_c, n) and the historical distribution proportion of k_c, the total number of intermediate-result key-value pairs is predicted, from which the predicted key-value-pair count of each key is derived; the corrected global distribution map is called the global prediction distribution map G_c.
The specific process of restoring the local prediction distribution map through the global distribution information comprises the following steps:
the local prediction distribution map is L_c; from the global distribution map G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n, otherwise its contribution is the local truncation threshold of that node; the number of key-value pairs N_c still to be generated is predicted from the global prediction distribution map and the global distribution map, and the tuple counts are divided proportionally according to the running progress of each map task, i.e. if the progress of a map task is p, the predicted key-value-pair count of key k in that task's remaining intermediate results is allocated in proportion to its unfinished share (1 − p).
The evaluation of the data transmission cost of the reduce task in the first step is specifically: the data transmission cost Cost_{w,r} of node w executing reduce task r is the sum of the data transmission costs of pulling the corresponding intermediate-result key-value pairs from each node, i.e. Cost_{w,r} = Σ_{(k,n)∈r_input} n · d(w, m_i), where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
The optimal execution node set obtained in step two is the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set yields the minimum data transmission cost Cost_{w,r}. The specific process is as follows:
the optimal task set R_optimal(n) of an arbitrary node n is the set of tasks, among all unexecuted reduce tasks, for which node n pulls the intermediate-result key-value pairs with the minimum data transmission cost; when the current node is not the optimal execution node of any task, the task selector allocates it a task from R_optimal(n);
when a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are first acquired in turn, and if the current node is the optimal execution node of a task, that task is returned; otherwise the skip count attribute of the task is incremented by 1, where the skip count records the number of times each task has been skipped because its optimal execution node has not requested it; if the current node is the optimal execution node of no task, the optimal execution task list of the current node is acquired and the task with the largest skip count is selected and allocated; the optimal execution nodes and optimal execution tasks are updated periodically before the map stage finishes, to ensure the timeliness of scheduling.
The task allocation optimization method based on data distribution has the following advantages that:
according to the task allocation optimization method based on data distribution, a local prediction distribution map is restored by acquiring more accurate global distribution information, and the weight distribution condition of an intermediate result is counted and predicted by taking a key value pair as a granularity; evaluating the data transmission cost of the reduce task according to the network distance between the nodes and the weight distribution condition of the intermediate result, and providing the accuracy of data perception and the network transmission cost balanced by a truncation prediction method; optimizing a distribution strategy of the reduce task in the cloud computing environment and giving a specific algorithm based on the optimal execution node set of the task and the optimal task set of the nodes; on the basis of a job-level scheduling strategy, network and I/O (input/output) expenses caused by data transmission are reduced by reasonably distributing the starting nodes of the reduce tasks, and meanwhile, the performance of a MapReduce program is improved.
Drawings
FIG. 1 is a MapReduce data flow diagram.
FIG. 2 is a schematic diagram of a Hadoop cluster network architecture.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
The Hadoop cluster adopts a master-slave architecture and a tree network topology. In a cloud computing environment, a data center usually includes a plurality of racks, each rack is equipped with a plurality of servers, and the architecture is characterized in that: the total bandwidth between nodes within the same rack is much higher than the bandwidth between nodes in different racks. The invention fully utilizes the characteristic and reduces the data transmission among the racks by reasonably distributing the starting nodes of the reduce task.
As shown in fig. 2, the task allocation optimization method based on data distribution of the present invention provides a task allocation optimization strategy with data-distribution awareness at its core and data transmission cost as the evaluation index. The strategy adopts the idea of a greedy algorithm: it computes the optimal execution node set by constructing a local prediction distribution map of the intermediate results, and reduces data transmission during reduce task execution as much as possible, thereby lowering the network and I/O overhead caused by data transmission while improving the time performance of the application and the throughput of the whole cluster.
The main contents comprise:
firstly, evaluating the data transmission cost of the reduce task according to the network distance between the nodes and the weight distribution condition of the intermediate result;
secondly, obtaining an optimal execution node set of each task according to the data transmission cost of the reduce task on different nodes;
and thirdly, providing a specific task allocation strategy and algorithm based on the optimal execution node set.
The network distance between the nodes specifically refers to the following: in a cloud computing environment, suppose a MapReduce program has m map tasks M_i (0 ≤ i ≤ m) and n reduce tasks R_j (0 ≤ j ≤ n), and the input of each reduce task comes from the output of all map tasks. Because the intermediate results generated by the map tasks must be transmitted over the network to the node running each reduce task, the sum of the distances from the nodes where all map tasks are located to the node where reduce task R_j is located is called the Total Network Distance TND_Rj of R_j. Clearly, the larger TND_Rj is, the more intermediate results must be transmitted to the reduce task and the slower the data transmission.
As shown in FIG. 2, assume a Hadoop cluster contains two racks, with N0-N9 representing its 10 slave nodes, where N0-N4 are located in rack 1 and N5-N9 in rack 2. Suppose the MapReduce program has 6 map tasks, located at nodes N0, N1, N2, N3, N5 and N6, and 4 reduce tasks, located at nodes N3, N4, N6 and N7. Assume further that the network distance from each child node to its parent in the Hadoop cluster is 1, so that the network distance between two nodes in the same rack is 2 and between two nodes in different racks is 4. The total network distance of each reduce task, TND_R0, TND_R1, TND_R2 and TND_R3, is then calculated as follows:
TND_R0 = 3×2 + 2×4 = 14;
TND_R1 = 4×2 + 2×4 = 16;
TND_R2 = 4×4 + 1×2 = 18;
TND_R3 = 4×4 + 2×2 = 20.
It can be seen that when a reduce task is located on different slave nodes, its total network distance differs. This indicates that reduce tasks also have data-locality properties; however, unlike a map task, a reduce task is more concerned with the map output on an entire rack than with the input data on a single node. The total network distance is 14 when the reduce task is located at node N3 in rack 1 and 20 when it is located at node N7 in rack 2. Reasonably choosing the start node of a reduce task can therefore reduce the overall network distance, shorten the shuffle stage, and improve the time performance of the application.
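The total network distances above can be reproduced with a short sketch, using the distances assumed in the text (0 from a node to itself, 2 within a rack, 4 across racks); the node and rack names are those of the FIG. 2 example:

```python
# Total network distance (TND) for the example cluster of FIG. 2.
def distance(a, b, rack):
    # Network distance between two nodes given a node-to-rack assignment.
    if a == b:
        return 0
    return 2 if rack[a] == rack[b] else 4

rack = {n: (1 if n in {"N0", "N1", "N2", "N3", "N4"} else 2)
        for n in ["N0", "N1", "N2", "N3", "N4", "N5", "N6", "N7", "N8", "N9"]}
map_nodes = ["N0", "N1", "N2", "N3", "N5", "N6"]  # nodes running map tasks

def tnd(reduce_node):
    # Sum of distances from every map node to the reduce node.
    return sum(distance(m, reduce_node, rack) for m in map_nodes)

for r in ["N3", "N4", "N6", "N7"]:
    print(r, tnd(r))  # N3 14, N4 16, N6 18, N7 20
```

This reproduces TND_R0 through TND_R3 and shows why the N3 placement (same rack as most map output) is cheapest.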
Besides the network distance of the reduce task, the weight distribution of the intermediate results is another important factor in measuring data transmission cost. The distribution of intermediate-result key-value pairs can be collected and counted at partition granularity, but two problems remain: 1) to reduce the delay caused by transmitting intermediate-result data, scheduling of reduce tasks generally starts before the map stage has completely finished, and the partition sizes at that moment may differ greatly from the final key-value-pair distribution, making the scheduling result inaccurate; 2) the distribution of key-value pairs usually shows some regularity, but because the partition granularity is coarse and the partition distribution depends entirely on the partition function, the final key-value-pair distribution cannot be predicted or corrected from existing knowledge even if the final partition distribution is predicted in this way. To address these problems, the invention counts and predicts the weight distribution of the intermediate results at key-value-pair granularity and evaluates the data transmission cost of the reduce task in combination with the network distance.
The number of intermediate-result key-value pairs must be counted on every node executing a map task, but because the data volume is large, having each node transmit all of its key-value-pair tuples (k, n) to the data distribution collector on the master node would consume considerable network resources and time. On the other hand, absolutely accurate key-value-pair distribution information is of limited value, so that approach is not reasonable. The invention first obtains reasonably accurate global distribution information and then restores a local prediction distribution map from the global prediction distribution map to calculate the data transmission cost of executing a task. The specific statistical process is as follows:
(1) When the execution progress of the map stage is α, where slowstart_conf ≤ α ≤ 1, each node counts its intermediate-result key-value pairs; slowstart_conf is a user-configured parameter indicating that reduce tasks begin to execute once the proportion of completed map tasks reaches slowstart_conf.
(2) When each node partitions its intermediate results according to the partition function, it counts the corresponding key-value pairs, generating a series of (k, n) tuples sorted in descending order of n.
(3) A global truncation threshold θ is set: only the first θ% of the (k, n) tuple list in each local distribution map is used as the basis for constructing the global distribution map. The key-value-pair count n of the last tuple within the first θ% of a local distribution map is called the local truncation threshold of that node, and the truncated map is called the local truncation distribution map L.
(4) The global distribution map G is constructed. First, the global distribution lower limit G_L and the global distribution upper limit G_U are defined; they represent, respectively, the minimum and maximum possible number of tuples for each key, as derived from the local truncation distribution maps and the local truncation thresholds. Let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}: a node whose truncated map contains (k, n) contributes n to both N_L and N_U, while a node whose truncated map does not contain k contributes 0 to N_L and its local truncation threshold to N_U.
(5) The global distribution map is G = {(k, N) | k ∈ K}, where N takes the intermediate value between the upper and lower limits, i.e. N = (N_L + N_U)/2.
(6) The global distribution map is corrected by prediction according to the historical distribution: the distribution deviation of any key is taken as the difference between its current distribution proportion and its historical distribution proportion, and the key with the largest deviation is selected as the correction key k_c. Using (k_c, n) and the historical distribution proportion of k_c, the total number of intermediate-result key-value pairs is predicted, from which the predicted key-value-pair count of each key is derived. The corrected global distribution map is called the global prediction distribution map G_c.
(7) The local prediction distribution map L_c is restored from the global prediction distribution map. From the global distribution map G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n; otherwise its contribution is the local truncation threshold of that node. The number of key-value pairs N_c still to be generated can be predicted from the global prediction distribution map and the global distribution map, and the tuple counts are divided proportionally according to the running progress of each map task: if the progress of a map task is p, the predicted key-value-pair count of key k in that task's remaining intermediate results is allocated in proportion to its unfinished share (1 − p).
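The truncation and bounding steps above can be sketched as follows. Note the assumptions, since the original formulas were lost in extraction: a key absent from a node's truncated list is taken to contribute 0 to the lower bound and that node's local truncation threshold to the upper bound, and the global estimate is taken as the midpoint of the two bounds. All function and variable names are illustrative:

```python
# Sketch of steps (2)-(5): local truncation, then global distribution bounds.
from collections import Counter

def local_truncation(counts, theta):
    # Keep the top theta% of (k, n) tuples, sorted by n descending.
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    keep = max(1, int(len(ranked) * theta / 100))
    kept = ranked[:keep]
    threshold = kept[-1][1]  # local truncation threshold: smallest kept count
    return dict(kept), threshold

def global_map(local_maps, theta):
    truncated = [local_truncation(m, theta) for m in local_maps]
    keys = set().union(*(t[0] for t in truncated))
    g = {}
    for k in keys:
        lo = sum(t.get(k, 0) for t, _ in truncated)       # G_L: absent key -> 0
        hi = sum(t.get(k, thr) for t, thr in truncated)   # G_U: absent -> threshold
        g[k] = (lo + hi) / 2                              # midpoint estimate (assumed)
    return g

node1 = Counter({"a": 50, "b": 30, "c": 2})
node2 = Counter({"a": 40, "c": 35, "d": 1})
print(global_map([node1, node2], theta=70))
```

With θ = 70, each node keeps its top two tuples, so key "b" (unreported by node2) gets bounds [30, 65] and estimate 47.5, illustrating how truncation trades accuracy for collection cost.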
In summary, based on the network distance between nodes and the intermediate-result weight distribution, the following can be obtained: the data transmission cost Cost_{w,r} of node w executing reduce task r is the sum of the data transmission costs of pulling the corresponding intermediate-result key-value pairs from each node, i.e. Cost_{w,r} = Σ_{(k,n)∈r_input} n · d(w, m_i), where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
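The cost formula can be sketched directly: for each map node, multiply the predicted number of key-value pairs it holds for reduce task r by its network distance to the candidate node w, and sum. The node names, distances, and counts below are toy values, not from the patent:

```python
# Cost_{w,r} = sum over r's input of n * d(w, m_i): the cost for node w to
# pull each predicted intermediate-result count n from the map node m_i.
def transmission_cost(w, r_input, d):
    # r_input: list of (map_node, predicted_pair_count) for reduce task r.
    return sum(n * d(w, m) for m, n in r_input)

# Toy distance function: 0 to self, 2 within a rack, 4 across racks.
rack = {"N0": 1, "N1": 1, "N5": 2}
d = lambda a, b: 0 if a == b else (2 if rack[a] == rack[b] else 4)

r_input = [("N0", 100), ("N1", 50), ("N5", 10)]
print(transmission_cost("N0", r_input, d))  # 0*100 + 2*50 + 4*10 = 140
print(transmission_cost("N5", r_input, d))  # 4*100 + 4*50 + 0*10 = 600
```

Evaluating this cost for every candidate node and taking the minimizers yields the optimal execution node set N_optimal(r) used in step two.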
The optimal execution node set obtained in step two is the optimal execution node set N_optimal(r) of reduce task r: executing task r on any node w in this set yields the minimum data transmission cost Cost_{w,r}. To reduce the network and I/O overhead caused by transmitting intermediate-result data, the optimal allocation scheme for reduce tasks would assign every task to its own optimal execution node, achieving the lowest overall data transmission cost. Sometimes, however, in order to meet the job response time requirements of the user's service level agreement in real time, the service provider must complete the assignment of all tasks on time. Under this constraint, some reduce tasks may not be able to execute on their optimal execution nodes.
Furthermore, whether resources are available on the optimal execution node is also one of the factors constraining task allocation. To address this, the optimal task set R_optimal(n) of an arbitrary node n is defined as the set of tasks, among all unexecuted reduce tasks, for which node n pulls the intermediate-result key-value pairs with the minimum data transmission cost. When the current node is not the optimal execution node of any task, the task selector assigns it a task from R_optimal(n), allocated specifically according to the following algorithm:
when a node requests a reduce task, firstly, an optimal execution node set of the unexecuted task is sequentially acquired, and if the current node is the optimal execution node of the task, the task is returned; otherwise, add 1 to the skipcount attribute for the task, which records the number of times each task is skipped because it does not get the best performing node request (lines 1-8). If the current node is not the optimal execution node of any task, the optimal execution task list of the current node is obtained, and the task with the largest distribution skip count is selected (lines 9-16). And the optimal execution node and the optimal execution task are periodically updated before the execution of the map stage is finished so as to ensure the real-time performance of the scheduling.
In a cloud computing environment, the task allocation optimization method can effectively reduce the data transmission caused by executing reduce tasks, cutting the network access requests of MapReduce programs by about 12% and shortening the job response time by about 9%.
The above embodiments are only specific cases of the present invention, and the protection scope of the present invention includes but is not limited to the above embodiments, and any suitable changes or substitutions that are consistent with the claims of a data distribution-based task assignment optimization method of the present invention and are made by those skilled in the art should fall within the protection scope of the present invention.

Claims (7)

1. A task allocation optimization method based on data distribution is characterized in that the implementation process is as follows:
firstly, evaluating the data transmission cost of the reduce task according to the network distance between the nodes and the weight distribution condition of the intermediate result;
secondly, obtaining an optimal execution node set of each task according to the data transmission cost of the reduce task on different nodes;
and thirdly, providing a specific task allocation strategy and algorithm based on the optimal execution node set.
2. The method for optimizing task allocation based on data distribution according to claim 1, wherein the network distance between the nodes specifically refers to: when the MapReduce program has m map tasks M_i and n reduce tasks R_j, where 0 ≤ i ≤ m and 0 ≤ j ≤ n, the input of each reduce task comes from the output of all map tasks; the intermediate results generated by the map tasks are transmitted over the network to the nodes running the reduce tasks, and the sum of the distances from the nodes where all map tasks are located to the node where reduce task R_j is located is the total network distance TND_Rj of R_j.
3. The method as claimed in claim 1, wherein for the intermediate-result weight distribution, a local prediction distribution map is restored from acquired global distribution information, the weight distribution of the intermediate results is counted and predicted at key-value-pair granularity, and the data transmission cost of the reduce task is evaluated in combination with the network distance.
4. The method for optimizing task allocation based on data distribution according to claim 3, wherein the specific process of obtaining the global distribution information is as follows:
1) when the execution progress of the map stage is α, where slowstart_conf ≤ α ≤ 1, each node counts its intermediate-result key-value pairs; slowstart_conf is a user-configured parameter indicating that reduce tasks begin to execute once the proportion of completed map tasks reaches slowstart_conf;
2) when each node partitions its intermediate results according to the partition function, it counts the corresponding key-value pairs, generating a series of (k, n) tuples sorted in descending order of n;
3) a global truncation threshold θ is set: only the first θ% of the (k, n) tuple list in each local distribution map is used as the basis for constructing the global distribution map; the key-value-pair count n of the last tuple within the first θ% of a local distribution map is called the local truncation threshold of that node, and the truncated map is called the local truncation distribution map L;
4) the global distribution map G is constructed: first, the global distribution lower limit G_L and the global distribution upper limit G_U are defined; they represent, respectively, the minimum and maximum possible number of tuples for each key, as derived from the local truncation distribution maps and the local truncation thresholds; let G_L = {(k, N_L) | k ∈ K} and G_U = {(k, N_U) | k ∈ K}, where a node whose truncated map contains (k, n) contributes n to both N_L and N_U, while a node whose truncated map does not contain k contributes 0 to N_L and its local truncation threshold to N_U;
5) the global distribution map is G = {(k, N) | k ∈ K}, where N takes the intermediate value between the upper and lower limits, i.e. N = (N_L + N_U)/2;
6) the global distribution map is corrected by prediction according to the historical distribution: the distribution deviation of any key is taken as the difference between its current distribution proportion and its historical distribution proportion, and the key with the largest deviation is selected as the correction key k_c; using (k_c, n) and the historical distribution proportion of k_c, the total number of intermediate-result key-value pairs is predicted, from which the predicted key-value-pair count of each key is derived; the corrected global distribution map is called the global prediction distribution map G_c.
5. The method for optimizing task allocation based on data distribution according to claim 4, wherein the specific process of restoring the local prediction distribution map through the global distribution information is as follows:
the local prediction distribution map is L_c; from the global distribution map G, for any key k, if (k, n) ∈ L_i then its contribution to L_c is n, otherwise its contribution is the local truncation threshold of that node; the number of key-value pairs N_c still to be generated is predicted from the global prediction distribution map and the global distribution map, and the tuple counts are divided proportionally according to the running progress of each map task, i.e. if the progress of a map task is p, the predicted key-value-pair count of key k in that task's remaining intermediate results is allocated in proportion to its unfinished share (1 − p).
6. The method for optimizing task allocation based on data distribution according to claim 5, wherein evaluating the data transmission cost of the reduce task in the first step is specifically: the data transmission cost Cost_{w,r} of node w executing reduce task r is the sum of the data transmission costs of pulling the corresponding intermediate-result key-value pairs from each node, i.e. Cost_{w,r} = Σ_{(k,n)∈r_input} n · d(w, m_i), where m_i is the node executing map task i, d(w, m_i) is the network distance between the two nodes, and r_input is the set of input key-value pairs of r.
7. The method as claimed in claim 1, wherein obtaining the optimal execution node set of each task in the second step means obtaining, for each reduce task r, the optimal execution node set N_optimal(r), such that executing task r at any node w in the set yields the minimal data transmission cost Cost_{w/r}; the specific process comprises the following steps:
the optimal task set R_optimal(n) of any node n is the set, among all reduce tasks not yet executed, of those tasks for which node n pulls the intermediate-result key-value pairs with the minimum data transmission cost; when the current node is not the optimal execution node of any task, the task selector allocates tasks to it from R_optimal(n);
when a node requests a reduce task, the optimal execution node sets of the unexecuted tasks are first acquired in sequence; if the current node is the optimal execution node of a task, that task is returned; otherwise the task's skip-count attribute is incremented by 1, where the skip count records the number of times each task has been skipped because no request from its optimal execution node was received; if the current node is not the optimal execution node of any task, the optimal execution task list of the current node is acquired and the task with the largest skip count is selected and allocated; the optimal execution nodes and optimal execution tasks are updated periodically until the map stage finishes, so as to ensure the timeliness of the scheduling.
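The request-handling policy of claim 7 can be sketched as below; the task bookkeeping, the optimal-node table, and the per-node R_optimal(n) lists are hypothetical, and the periodic refresh is omitted:

```python
# Illustrative sketch of claim 7's skip-count allocation policy. Task
# bookkeeping and the optimal-node / R_optimal(n) tables are hypothetical.

def assign_reduce_task(node, pending, optimal_node, optimal_tasks, skip_count):
    """Serve a reduce-task request from `node`.
    optimal_node[r]  -- node with minimal Cost_{node/r} for task r;
    optimal_tasks[n] -- R_optimal(n), tasks cheapest to pull at node n;
    skip_count[r]    -- times r was passed over by non-optimal requesters."""
    for r in pending:
        if optimal_node[r] == node:      # node is r's optimal executor
            pending.remove(r)
            return r
        skip_count[r] += 1               # r is skipped once more
    # node is optimal for no pending task: fall back to R_optimal(node)
    # and serve the most-skipped task first
    mine = [r for r in optimal_tasks.get(node, []) if r in pending]
    if not mine:
        return None
    r = max(mine, key=lambda t: skip_count[t])
    pending.remove(r)
    return r

pending = ["r1", "r2"]
optimal_node = {"r1": "n2", "r2": "n2"}   # n1 is optimal for no task
optimal_tasks = {"n1": ["r1", "r2"]}
skips = {"r1": 3, "r2": 1}
task = assign_reduce_task("n1", pending, optimal_node, optimal_tasks, skips)
```

In this run n1 is the optimal executor of neither task, so both skip counts are incremented and the most-skipped task ("r1") is handed out from n1's optimal task list.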
CN201610890105.8A 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution Pending CN106502790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610890105.8A CN106502790A (en) 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610890105.8A CN106502790A (en) 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution

Publications (1)

Publication Number Publication Date
CN106502790A true CN106502790A (en) 2017-03-15

Family

ID=58295238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610890105.8A Pending CN106502790A (en) 2016-10-12 2016-10-12 A kind of task distribution optimization method based on data distribution

Country Status (1)

Country Link
CN (1) CN106502790A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120151292A1 (en) * 2010-12-14 2012-06-14 Microsoft Corporation Supporting Distributed Key-Based Processes
CN102541858A (en) * 2010-12-07 2012-07-04 腾讯科技(深圳)有限公司 Data equality processing method, device and system based on mapping and protocol
CN102629219A (en) * 2012-02-27 2012-08-08 北京大学 Self-adaptive load balancing method for Reduce ends in parallel computing framework
CN103279351A (en) * 2013-05-31 2013-09-04 北京高森明晨信息科技有限公司 Method and device for task scheduling
US20130290972A1 (en) * 2012-04-27 2013-10-31 Ludmila Cherkasova Workload manager for mapreduce environments
US20160034482A1 (en) * 2014-07-31 2016-02-04 International Business Machines Corporation Method and apparatus for configuring relevant parameters of mapreduce applications
CN105589752A (en) * 2016-02-24 2016-05-18 哈尔滨工业大学深圳研究生院 Cross-data center big data processing based on key value distribution


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Jie: "Research on SLA-based MapReduce Scheduling Mechanism", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109496321A (en) * 2017-07-10 2019-03-19 欧洲阿菲尼帝科技有限责任公司 For estimating the technology of the expection performance in task distribution system
CN107506388A (en) * 2017-07-27 2017-12-22 浙江工业大学 A kind of iterative data balancing optimization method towards Spark parallel computation frames
CN109871265A (en) * 2017-12-05 2019-06-11 航天信息股份有限公司 The dispatching method and device of Reduce task
CN109947559A (en) * 2019-02-03 2019-06-28 百度在线网络技术(北京)有限公司 Optimize method, apparatus, equipment and computer storage medium that MapReduce is calculated
CN109947559B (en) * 2019-02-03 2021-11-23 百度在线网络技术(北京)有限公司 Method, device, equipment and computer storage medium for optimizing MapReduce calculation
CN113467700A (en) * 2020-03-31 2021-10-01 阿里巴巴集团控股有限公司 Data distribution method and device based on heterogeneous storage
CN113467700B (en) * 2020-03-31 2024-04-23 阿里巴巴集团控股有限公司 Heterogeneous storage-based data distribution method and device

Similar Documents

Publication Publication Date Title
CN112153700B (en) Network slice resource management method and equipment
US9201690B2 (en) Resource aware scheduling in a distributed computing environment
CN106502790A (en) A kind of task distribution optimization method based on data distribution
US20130290972A1 (en) Workload manager for mapreduce environments
CN108268318A (en) A kind of method and apparatus of distributed system task distribution
CN112306651B (en) Resource allocation method and resource borrowing method
CN103699433B (en) One kind dynamically adjusts number of tasks purpose method and system in Hadoop platform
CN110221920B (en) Deployment method, device, storage medium and system
Liu et al. Preemptive hadoop jobs scheduling under a deadline
CN109189572B (en) Resource estimation method and system, electronic equipment and storage medium
CN112463390A (en) Distributed task scheduling method and device, terminal equipment and storage medium
Li et al. An effective scheduling strategy based on hypergraph partition in geographically distributed datacenters
Chen et al. Latency minimization for mobile edge computing networks
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
CN105227601A (en) Data processing method in stream processing system, device and system
US20220300323A1 (en) Job Scheduling Method and Job Scheduling Apparatus
CN117806659A (en) ES high-availability cluster containerized deployment method and related device
CN110048966B (en) Coflow scheduling method for minimizing system overhead based on deadline
CN109150759B (en) Progressive non-blocking opportunity resource reservation method and system
Guo Ant colony optimization computing resource allocation algorithm based on cloud computing environment
CN116915869A (en) Cloud edge cooperation-based time delay sensitive intelligent service quick response method
Yassir et al. Graph-based model and algorithm for minimising big data movement in a cloud environment
CN106844037B (en) KNL-based test method and system
CN115562841A (en) Cloud video service self-adaptive resource scheduling system and method
CN105187483A (en) Method and device for allocating cloud computing resources

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170315