CN103617087A - MapReduce optimizing method suitable for iterative computations

MapReduce optimizing method suitable for iterative computations

Info

Publication number
CN103617087A
Authority
CN
China
Prior art keywords
task
node
hadoop
proceeds
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310600745.7A
Other languages
Chinese (zh)
Other versions
CN103617087B (en)
Inventor
金海
郑然
余根茂
章勤
朱磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201310600745.7A priority Critical patent/CN103617087B/en
Publication of CN103617087A publication Critical patent/CN103617087A/en
Application granted granted Critical
Publication of CN103617087B publication Critical patent/CN103617087B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a MapReduce optimization method suitable for iterative computations, applied to a Hadoop cluster system comprising a master node and a plurality of slave nodes. The method comprises the following steps: the master node receives a plurality of Hadoop jobs submitted by users; the job service process of the master node places the jobs in a job queue, where they wait to be scheduled by the job scheduler of the master node; the master node waits for task requests sent by the slave nodes; after the master node receives a task request, the job scheduler of the master node preferentially schedules localized tasks; and if the slave node that sent the task request has no localized task, prediction scheduling is performed according to the task type of the Hadoop job. The method supports traditional data-intensive applications while also supporting iterative computations transparently and efficiently; dynamic data and static data are treated separately, and the amount of transmitted data is reduced.

Description

A MapReduce optimization method suitable for iterative computation
Technical field
The invention belongs to the field of parallel computing and mass data processing, and more specifically relates to a MapReduce optimization method suitable for iterative computation.
Background technology
Since the beginning of the 21st century, the scale of data to be processed has kept growing: terabyte-scale datasets have become common, and petabyte-scale datasets have begun to appear. Data at this scale is far beyond the processing power of a single PC, and the demand for such processing power has driven the development of parallel and distributed computing platforms. Against this background, Google's MapReduce model emerged as a popular data-intensive computing model for large cluster environments.
MapReduce is a programming model for the parallel processing of large datasets (larger than 1 TB). The concepts of Map and Reduce, and their main ideas, are borrowed from functional programming, with additional features borrowed from vector programming languages. The model makes it easy for programmers with no experience in distributed parallel programming to run their programs on a distributed system. In this model, all data is organized as <key, value> pairs. To write a program, a programmer only needs to implement a Map function and a Reduce function: the Map function processes an input <key, value> pair and emits zero or more intermediate key-value pairs, and the Reduce function reads the intermediate output of the Map phase and produces zero or more final results. The MapReduce model follows a principle of relative independence: there are no data dependencies among Map tasks or among Reduce tasks.
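To make the programming model concrete, the classic word-count job below implements only a Map function and a Reduce function against the standard Hadoop MapReduce API. It is an illustrative background sketch, not part of the patented method; the class names and the simple driver are chosen for the example.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountExample {

    // Map: emit <word, 1> for every word in an input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            result.set(sum);
            context.write(word, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountExample.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}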
The design philosophy of the MapReduce model makes it well suited to batch-style computation such as log analysis and text processing. Beyond these batch applications, however, there are applications based on machine learning and pattern recognition, typically in computer vision and data mining, whose core algorithms are designed in an iterative manner. Yet the current Hadoop (the open-source implementation of the MapReduce model) cannot support iterative computation transparently and efficiently, and some characteristics of Hadoop are in fact unsuited to iterative computation. With the development of social networks, computer vision, data mining and similar fields, the data scale processed by this class of applications keeps growing, and so does the demand for a parallel computing model that can support such applications effectively.
Summary of the invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a MapReduce optimization method suitable for iterative computation. Its purpose is to improve on the basis of Hadoop so that both traditional data-intensive applications and iterative computation can be supported, the latter transparently and efficiently, and to study and realize the reduction of transmitted data from the two aspects of dynamic data and static data.
To achieve the above object, according to one aspect of the present invention, a MapReduce optimization method suitable for iterative computation is provided, which is applied to a Hadoop cluster system comprising a master node and a plurality of slave nodes; the method comprises the following steps:
(1) the master node receives a plurality of Hadoop jobs submitted by users; the job service process of the master node places the jobs in a job queue and waits for the job scheduler of the master node to schedule them;
(2) the master node waits for task requests sent by the slave nodes; after receiving a task request, the job scheduler of the master node preferentially schedules localized tasks; if the slave node that sent the task request has no localized task, prediction scheduling is performed according to the task type of the Hadoop job: a computation-type task is scheduled immediately, while a transmission-type task is delayed by a certain interval and is scheduled only when the accumulated delay reaches a delay threshold;
(3) a slave node receives a task of a Hadoop job scheduled by the master node and processes it according to the job type, which is either iterative or non-iterative; a non-iterative job is processed in the conventional Hadoop manner; for an iterative job, a Map-side shuffle process is added before the Map phase to read dynamic data for the Map tasks, in the Reduce phase the dynamic data is cached locally and handed over to the dynamic data cache component of the slave node for management, and after the job completes, the final result is stored in HDFS.
Preferably, step (2) specifically comprises the following sub-steps:
(2-1) the job service process on the master node monitors and waits for the heartbeat messages sent by the task service processes of the slave nodes; a heartbeat message carries the current running state of the slave node, specifically the total number of slots and the number of slots currently running tasks;
(2-2) on receiving a heartbeat message from a slave node, the master node computes the number of idle slots of that slave node and the average number of running slots of the whole Hadoop cluster system according to the heartbeat message; based on these results, it judges whether a task of this job needs to be assigned to the current slave node; if no task needs to be assigned, return to step (2-1), otherwise go to step (2-3);
(2-3) set a counter i=0;
(2-4) judge whether the i-th Hadoop job has a localized task on the current slave node, i.e., whether the current slave node stores an input data split (Split) of the i-th Hadoop job; if not, go to step (2-5); if so, go to step (2-11);
(2-5) set i=i+1 and judge whether i equals the number of Hadoop jobs; if so, go to step (2-7), otherwise return to step (2-4);
(2-6) set a counter j=0;
(2-7) judge whether the task type of the j-th Hadoop job is a computation-type task or a transmission-type task; for a computation-type task go to step (2-11), and for a transmission-type task go to step (2-8);
(2-8) delay the task scheduling of the j-th Hadoop job by one heartbeat interval;
(2-9) judge whether the total delay accumulated for the task scheduling of the j-th Hadoop job has reached a threshold; if so, go to step (2-11), otherwise go to step (2-10);
(2-10) set j=j+1 and judge whether j equals the number of Hadoop jobs; if so, go to step (2-12), otherwise return to step (2-1);
(2-11) dispatch the localized task of the i-th Hadoop job to the current slave node; the process then ends;
(2-12) dispatch a task of the j-th Hadoop job to the current slave node; the process then ends.
Preferably, step (2-2) is specified as follows: the number of idle slots of the current slave node equals its total number of slots minus the number of slots currently running tasks; the average number of running slots of the whole Hadoop cluster system equals the sum of the running slots of all slave nodes monitored by the tracking process divided by the number of slave nodes; if the number of idle slots of the current node equals 0, no task needs to be assigned, and likewise, if the number of running slots of the current slave node is greater than the average number of running slots of the whole Hadoop cluster system, no task needs to be assigned.
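As a rough sketch of this assignment rule, the check in step (2-2) can be expressed as follows. This is a simplified illustration, not the scheduler's actual code; the NodeStatus class and its field names are assumptions made for the example.

import java.util.List;

public final class SlotCheckSketch {

    // Per-node state as reported in a heartbeat message.
    public static final class NodeStatus {
        final int totalSlots;    // total number of slots on the slave node
        final int runningSlots;  // number of slots currently running tasks

        NodeStatus(int totalSlots, int runningSlots) {
            this.totalSlots = totalSlots;
            this.runningSlots = runningSlots;
        }

        int idleSlots() {
            return totalSlots - runningSlots;  // idle = total - running
        }
    }

    // Average number of running slots across all slave nodes in the cluster.
    static double averageRunningSlots(List<NodeStatus> slaves) {
        int sum = 0;
        for (NodeStatus s : slaves) {
            sum += s.runningSlots;
        }
        return slaves.isEmpty() ? 0.0 : (double) sum / slaves.size();
    }

    // A task is assigned only if the requesting node has idle slots and is not
    // already running more tasks than the cluster average.
    static boolean shouldAssignTask(NodeStatus node, List<NodeStatus> slaves) {
        if (node.idleSlots() == 0) {
            return false;
        }
        return node.runningSlots <= averageRunningSlots(slaves);
    }
}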
Preferably, step (3) specifically comprises the following sub-steps:
(3-1) receive a task of a Hadoop job scheduled by the master node;
(3-2) judge whether the job that the task belongs to is an iterative job or a non-iterative job; for an iterative job go to step (3-3), and for a non-iterative job go to step (3-4);
(3-3) judge whether the task of this iterative job is a Map task or a Reduce task; for a Map task go to step (3-5), and for a Reduce task go to step (3-9);
(3-4) judge whether the task of this non-iterative job is a Map task or a Reduce task; for a Map task go to step (3-8), and for a Reduce task go to step (3-9);
(3-5) judge whether this iterative job is running for the first time; if not, go to step (3-6); if so, go to step (3-7);
(3-6) the task service process of the slave node hosting the Map task process (Mapper) starts multiple data copy threads, which request via HTTP the dynamic data files computed by the Reduce task processes from the slave nodes hosting those processes, then go to step (3-8);
(3-7) the Map task process reads the initial values of the dynamic data, then go to step (3-8);
(3-8) the Hadoop cluster system decomposes the input file of the job into splits, and the Map task process processes a split, then go to step (3-14);
(3-9) the Reduce task process (Reducer) on the slave node starts data copy threads, which request via HTTP the intermediate output files of the Map task processes from the slave nodes hosting those processes; the intermediate output files are stored on the local disk of the slave node; the copied files are first placed in a memory buffer, and multiple copied files are merged into one final large file, which is sorted by key; then go to step (3-10);
(3-10) the Reduce task process reads records from the obtained large file in <key, iterator> form and executes the Reduce() method, then goes to step (3-11);
(3-11) judge whether the job is iterative or non-iterative; for an iterative job go to step (3-12), and for a non-iterative job go to step (3-13);
(3-12) the Reduce task process caches its execution results in memory through the dynamic data cache component of the slave node and spills them into a local disk file when the buffer is full, then goes to step (3-14);
(3-13) the Reduce task process writes its execution results into HDFS, then goes to step (3-14);
(3-14) the task execution ends; then return to step (3-1).
Preferably, the dynamic data files in step (3-6) are managed by the dynamic data cache component of the slave node and are kept in memory and on local disk; the copied dynamic data is also managed by the dynamic data cache component, and the Map task processes on the same slave node request the dynamic data files from the local slave node, where the data serves as the dynamic data input required by the Map task processes.
Preferably, in step (3-8), the size of a split defaults to the HDFS block size, which is configured through a configuration file; the Map task process decomposes the split into records in the <key, value> form it requires and executes the Map() method; the execution results are cached in memory and spilled to disk when the buffer is full; the spilled files record partition information, and each single spill file is first sorted by partition and then by key; if multiple spill files need to be merged into one large file, this process performs a merge sort over the spill files.
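The spill-and-merge behavior described above (sort each spill by partition and then by key, then merge-sort the spill files into one large file) can be illustrated with the simplified sketch below. The SpillRecord class and the in-memory lists standing in for spill files are assumptions made for this example; Hadoop's real spill files are binary and are handled inside the framework.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class SpillMergeSketch {

    static final class SpillRecord {
        final int partition;   // reduce partition the record belongs to
        final String key;
        final String value;

        SpillRecord(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    // Records inside a single spill file are ordered by partition, then by key.
    static final Comparator<SpillRecord> SPILL_ORDER =
            Comparator.<SpillRecord>comparingInt(r -> r.partition)
                      .thenComparing(r -> r.key);

    // Sort one in-memory buffer before it is written out as a spill file.
    static List<SpillRecord> sortSpill(List<SpillRecord> buffer) {
        List<SpillRecord> sorted = new ArrayList<>(buffer);
        sorted.sort(SPILL_ORDER);
        return sorted;
    }

    // Merge several already-sorted spill files into one large sorted output
    // with a k-way merge (each inner list stands in for one spill file).
    static List<SpillRecord> mergeSpills(List<List<SpillRecord>> spills) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> SPILL_ORDER.compare(spills.get(a[0]).get(a[1]),
                                              spills.get(b[0]).get(b[1])));
        for (int i = 0; i < spills.size(); i++) {
            if (!spills.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<SpillRecord> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            merged.add(spills.get(top[0]).get(top[1]));
            if (top[1] + 1 < spills.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}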
In general, compared with the prior art, the above technical scheme conceived by the present invention can achieve the following beneficial effects:
(1) Better data locality of tasks. Owing to step (2), the scheduling strategy proposed in the present invention strikes a better balance between the demand for task localization and the delay overhead than the delay scheduling strategy: tasks in Hadoop are conceptually divided into computation-intensive and transmission-intensive classes, and the delay time is predicted in real time in combination with the load information of the cluster network. In this way the task localization ratio is raised while the overall delay overhead of a job is effectively reduced. The present invention therefore has an obvious advantage.
(2) Lower dynamic data transmission overhead. Owing to step (3), the dynamic data caching strategy proposed by the present invention greatly reduces the cluster network transmission overhead caused by reading and writing dynamic data. Both theoretical analysis and experiments verify that the total amount of dynamic data transmitted by an iterative job is proportional only to the number of nodes on which its tasks run and has a definite upper bound, namely the number of cluster nodes. The present invention therefore has an obvious advantage.
(3) Higher cluster efficiency under multi-job and multi-user usage. Under multiple jobs and multiple users, the network resources of the cluster become the bottleneck of cluster efficiency and greatly limit its effective utilization. By optimizing the network data flow of Hadoop, the present invention reduces the cluster network transmission overhead, effectively relieves the cluster network load, reduces the competition for network resources between users and between jobs, and improves the effective utilization of the cluster under multiple jobs and multiple users. The present invention therefore has an obvious advantage.
(4) Transparent and efficient support for iterative computation. Compared with traditional Hadoop, the present invention can support both traditional batch jobs and, even better, iterative jobs, so its field of application is broader, covering, for example, social networks, computer vision and data mining. The present invention therefore has an obvious advantage.
Brief description of the drawings
Fig. 1 is the flow chart of the MapReduce optimization method suitable for iterative computation according to the present invention.
Fig. 2 is the detailed flow chart of step (2) of the present invention.
Fig. 3 is the detailed flow chart of step (3) of the present invention.
Embodiment
In order to make the objects, technical schemes and advantages of the present invention clearer, the present invention is further described below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. In addition, the technical features involved in the embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
The technical terms of the present invention are first explained and illustrated below:
Dynamic data: in an iterative computation problem, a variable whose new value is repeatedly derived, directly or indirectly, from its old value.
Static data: in an iterative computation problem, data that never changes, generally the original input data of the algorithm.
Computation-type task: a Map task whose computation time accounts for the major part of its whole processing procedure.
Transmission-type task: a Map task whose data transmission time accounts for the major part of its whole processing procedure.
Localized task: a Map task whose input data split is stored locally on the slave node.
Delay scheduling strategy: a strategy that delays the scheduling of non-localized tasks.
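The patent does not prescribe how a task is recognized as computation-type or transmission-type; as one possible, purely hypothetical illustration of the definitions above, a Map task could be classified by comparing an estimate of its input-transfer time with an estimate of its computation time, as in the sketch below. All names and estimates here are assumptions made for the example.

public final class TaskTypeSketch {

    enum MapTaskType { COMPUTATION, TRANSMISSION }

    // Classify a Map task by comparing the estimated time to transfer its input
    // split over the network with the estimated computation time.
    static MapTaskType classify(long splitBytes,
                                double networkBytesPerSecond,
                                double estimatedComputeSeconds) {
        double transferSeconds = splitBytes / networkBytesPerSecond;
        return transferSeconds > estimatedComputeSeconds
                ? MapTaskType.TRANSMISSION
                : MapTaskType.COMPUTATION;
    }

    public static void main(String[] args) {
        // Example: a 1 GB split over a 10 MB/s link (about 102 s of transfer)
        // versus roughly 10 s of computation -> transmission-type.
        MapTaskType type = classify(1024L * 1024 * 1024, 10.0 * 1024 * 1024, 10.0);
        System.out.println(type);  // prints TRANSMISSION
    }
}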
The overall idea of the present invention is to focus on multi-user, multi-job cluster environments and to reduce the network transmission load of the cluster by optimizing the static data flow and the shared data flow. For the optimization of the static data flow, the main contribution of the present invention is a prediction scheduling algorithm; for the optimization of the shared data flow, the present invention achieves its goal through a data caching strategy and an added Map-side shuffle process.
The MapReduce optimization method of the present invention suitable for iterative computation is applied to a Hadoop cluster system comprising a master (Master) node and a plurality of slave (Slave) nodes, and the method comprises the following steps (as shown in Fig. 1):
(1) the master node receives a plurality of Hadoop jobs submitted by users; the job service process (JobTracker) of the master node places the jobs in a job queue and waits for the job scheduler of the master node to schedule them;
(2) the master node waits for task requests sent by the slave nodes; after receiving a task request, the job scheduler of the master node preferentially schedules localized tasks; if the slave node that sent the task request has no localized task, prediction scheduling is performed according to the task type of the Hadoop job: a computation-type task is scheduled immediately, while a transmission-type task is delayed by a certain interval and is scheduled only when the accumulated delay reaches a delay threshold. This step specifically comprises the following sub-steps (as shown in Fig. 2):
(2-1) the job service process on the master node monitors and waits for the heartbeat messages sent by the task service processes (TaskTracker) of the slave nodes; a heartbeat message carries the current running state of the slave node, specifically the total number of slots and the number of slots currently running tasks;
(2-2) on receiving a heartbeat message from a slave node, the master node computes the number of idle slots of that slave node and the average number of running slots of the whole Hadoop cluster system according to the heartbeat message; based on these results, it judges whether a task of this job needs to be assigned to the current slave node; if no task needs to be assigned, return to step (2-1), otherwise go to step (2-3). Specifically, the number of idle slots of the current slave node equals its total number of slots minus the number of slots currently running tasks, and the average number of running slots of the whole Hadoop cluster system equals the sum of the running slots of all slave nodes monitored by the tracking process divided by the number of slave nodes; if the number of idle slots of the current node equals 0, no task needs to be assigned, and likewise, if the number of running slots of the current slave node is greater than the average number of running slots of the whole Hadoop cluster system, no task needs to be assigned;
(2-3) set a counter i=0;
(2-4) judge whether the i-th Hadoop job has a localized task on the current slave node, i.e., whether the current slave node stores an input data split (Split) of the i-th Hadoop job; if not, go to step (2-5); if so, go to step (2-11);
(2-5) set i=i+1 and judge whether i equals the number of Hadoop jobs; if so, go to step (2-7), otherwise return to step (2-4);
(2-6) set a counter j=0;
(2-7) judge whether the task type of the j-th Hadoop job is a computation-type task or a transmission-type task; for a computation-type task go to step (2-11), and for a transmission-type task go to step (2-8);
(2-8) delay the task scheduling of the j-th Hadoop job by one heartbeat interval; specifically, the heartbeat interval is the interval at which a slave node sends heartbeat messages, here 3 seconds;
(2-9) judge whether the total delay accumulated for the task scheduling of the j-th Hadoop job has reached a threshold; if so, go to step (2-11), otherwise go to step (2-10). The threshold can be configured by the cluster administrator according to the following rule: the larger the threshold, the higher the task localization ratio but the larger the delay overhead; the smaller the threshold, the lower the localization ratio but the smaller the delay overhead; the threshold defaults to 3 minutes;
(2-10) set j=j+1 and judge whether j equals the number of Hadoop jobs; if so, go to step (2-12), otherwise return to step (2-1);
(2-11) dispatch the localized task of the i-th Hadoop job to the current slave node; the process then ends;
(2-12) dispatch a task of the j-th Hadoop job to the current slave node; the process then ends.
The advantage of this step is that tasks are classified: computation-type tasks are scheduled in the default manner, while transmission-type tasks are scheduled by prediction. This both improves the localization ratio of tasks and reduces the overhead caused by delay.
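A rough sketch of this prediction scheduling decision is given below. The 3-second heartbeat and the 3-minute default threshold follow the description above, but the Job class, the pickJob method, and the way locality is looked up are simplified assumptions for this example rather than the actual scheduler implementation.

import java.util.List;

public final class PredictionSchedulingSketch {

    enum TaskType { COMPUTATION, TRANSMISSION }

    static final long HEARTBEAT_MS = 3_000L;          // 3-second heartbeat interval
    static final long DELAY_THRESHOLD_MS = 180_000L;  // 3-minute default threshold

    static final class Job {
        final TaskType taskType;
        long accumulatedDelayMs;  // total delay already applied to this job

        Job(TaskType taskType) {
            this.taskType = taskType;
        }

        boolean hasLocalSplitOn(String slaveNode) {
            // Placeholder: in the real system this asks HDFS whether the node
            // stores an input split of this job.
            return false;
        }
    }

    // Returns the job whose task should be assigned to the requesting slave
    // node, or null if the request should be deferred to a later heartbeat.
    static Job pickJob(List<Job> jobs, String slaveNode) {
        // Prefer a localized task.
        for (Job job : jobs) {
            if (job.hasLocalSplitOn(slaveNode)) {
                return job;
            }
        }
        // No localized task: predict-schedule by task type.
        for (Job job : jobs) {
            if (job.taskType == TaskType.COMPUTATION) {
                return job;  // computation-type: schedule immediately
            }
            if (job.accumulatedDelayMs >= DELAY_THRESHOLD_MS) {
                return job;  // transmission-type: threshold reached, schedule now
            }
            job.accumulatedDelayMs += HEARTBEAT_MS;  // otherwise delay one heartbeat
        }
        return null;  // defer; the slave node will ask again on its next heartbeat
    }
}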
(3) a slave node receives a task of a Hadoop job scheduled by the master node and processes it according to the job type, which is either iterative or non-iterative. A non-iterative job is processed in the conventional Hadoop manner. For an iterative job, a Map-side shuffle process is added before the Map phase to read dynamic data for the tasks of the Map phase (i.e., the Map tasks); in the Reduce phase the dynamic data is cached locally and handed over to the dynamic data cache component of the slave node for management; and after the job completes, the final result is stored in the Hadoop Distributed File System (HDFS). This step specifically comprises the following sub-steps (as shown in Fig. 3):
(3-1) receive a task of a Hadoop job scheduled by the master node;
(3-2) judge whether the job that the task belongs to is an iterative job or a non-iterative job; for an iterative job go to step (3-3), and for a non-iterative job go to step (3-4);
(3-3) judge whether the task of this iterative job is a Map task or a Reduce task; for a Map task go to step (3-5), and for a Reduce task go to step (3-9);
(3-4) judge whether the task of this non-iterative job is a Map task or a Reduce task; for a Map task go to step (3-8), and for a Reduce task go to step (3-9);
(3-5) judge whether this iterative job is running for the first time; if not, go to step (3-6); if so, go to step (3-7);
(3-6) the task service process of the slave node hosting the Map task process (Mapper) starts multiple data copy threads, which request via HTTP the dynamic data files computed by the Reduce task processes (Reducer) from the slave nodes hosting those processes, then go to step (3-8). These dynamic data files are managed by the dynamic data cache component of the slave node and are kept in memory and on local disk; the copied dynamic data is also managed by the dynamic data cache component, and the Map task processes on the same slave node request the dynamic data files from the local slave node, where the data serves as the dynamic data input required by the Map task processes.
The advantages of this sub-step are: 1) the dynamic data produced after the Reduce phase is kept locally, which avoids the overhead of writing it to HDFS; 2) the dynamic data is requested from the Reduce-phase slave nodes by the slave node hosting the Map task process and kept locally on that node, so the Map task processes request the data from the local slave node, which greatly reduces the amount of dynamic data transmitted;
(3-7) the Map task process reads the initial values of the dynamic data, then go to step (3-8). Briefly, an iterative job needs and produces dynamic data, and when the job executes for the first time the initial values of this dynamic data must be provided by the user;
(3-8) the Hadoop cluster system decomposes the input file of the job into splits, and the Map task process processes a split, then go to step (3-14). Specifically, the size of a split defaults to the HDFS block size, which is configured through a configuration file; the Map task process decomposes the split into records in the <key, value> form it requires and executes the Map() method; the execution results are cached in memory and spilled to disk when the buffer is full; the spilled files record partition information, and each single spill file is first sorted by partition and then by key; if multiple spill files need to be merged into one large file, this process performs a merge sort over the spill files;
(3-9) the Reduce task process (Reducer) on the slave node starts data copy threads, which request via HTTP the intermediate output files of the Map task processes from the slave nodes hosting those processes; the intermediate output files are stored on the local disk of the slave node; the copied files are first placed in a memory buffer, and multiple copied files are merged into one final large file, which is sorted by key; then go to step (3-10);
(3-10) the Reduce task process reads records from the obtained large file in <key, iterator> form and executes the Reduce() method, then goes to step (3-11);
(3-11) judge whether the job is iterative or non-iterative; for an iterative job go to step (3-12), and for a non-iterative job go to step (3-13);
(3-12) the Reduce task process caches its execution results in memory through the dynamic data cache component of the slave node and spills them into a local disk file when the buffer is full, then goes to step (3-14);
(3-13) the Reduce task process writes its execution results into HDFS, then goes to step (3-14);
(3-14) the task execution ends; then return to step (3-1).
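The sub-steps above rely on a per-slave-node dynamic data cache component that buffers Reduce results in memory, spills them to local disk when the buffer fills, and serves the cached dynamic data to local Map task processes in the next iteration. The sketch below is a minimal illustration of such a component under those assumptions; the class name, record format and buffer limit are invented for the example and are not the component's actual implementation.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class DynamicDataCacheSketch {

    private final Path spillDir;    // local disk directory for spilled data
    private final int bufferLimit;  // max records held in memory
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public DynamicDataCacheSketch(Path spillDir, int bufferLimit) {
        this.spillDir = spillDir;
        this.bufferLimit = bufferLimit;
    }

    // Called by a Reduce task process (step 3-12) to cache one result record.
    public synchronized void put(String key, String value) throws IOException {
        buffer.add(key + "\t" + value);
        if (buffer.size() >= bufferLimit) {
            spill();
        }
    }

    // Spill the in-memory buffer into a local disk file.
    private void spill() throws IOException {
        Path file = Files.createTempFile(spillDir, "dynamic-", ".dat");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String record : buffer) {
                out.write(record);
                out.newLine();
            }
        }
        spillFiles.add(file);
        buffer.clear();
    }

    // Called by a local Map task process (step 3-6) at the start of the next
    // iteration to read the cached dynamic data without going through HDFS.
    public synchronized List<String> readAll() throws IOException {
        List<String> records = new ArrayList<>();
        for (Path file : spillFiles) {
            records.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        records.addAll(buffer);
        return Collections.unmodifiableList(records);
    }
}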
Example:
In order to verify the feasibility and effectiveness of the present invention, the computer program written according to the invention was run under the experimental configuration shown in Table 1 below, and the invention was tested; the test results are shown in Table 2 and Table 3 below:
Table 1: experimental configuration environment
(The contents of Table 1 appear only as an image in the original publication.)
In Table 2 and Table 3 the present invention is compared with Hadoop-0.20.0 and HaLoop, using the fuzzy C-Means algorithm as the experimental workload. Table 2 compares the amount of dynamic data transmitted by the three MapReduce implementations at different experiment scales. Table 3 compares the execution times of the three MapReduce implementations for different numbers of iterations at a fixed experiment scale. The experimental results show that the present invention achieves a satisfactory improvement in both network data transmission and execution time.
Table 2: comparison of dynamic data transmission amount for fuzzy C-Means
Table 3: comparison of execution time for fuzzy C-Means
(The contents of Table 2 and Table 3 appear only as an image in the original publication.)
Those skilled in the art will readily understand that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (6)

1. A MapReduce optimization method suitable for iterative computation, applied to a Hadoop cluster system comprising a master node and a plurality of slave nodes, characterized in that the method comprises the following steps:
(1) the master node receives a plurality of Hadoop jobs submitted by users; the job service process of the master node places the jobs in a job queue and waits for the job scheduler of the master node to schedule them;
(2) the master node waits for task requests sent by the slave nodes; after receiving a task request, the job scheduler of the master node preferentially schedules localized tasks; if the slave node that sent the task request has no localized task, prediction scheduling is performed according to the task type of the Hadoop job: a computation-type task is scheduled immediately, while a transmission-type task is delayed by a certain interval and is scheduled only when the accumulated delay reaches a delay threshold;
(3) a slave node receives a task of a Hadoop job scheduled by the master node and processes it according to the job type, which is either iterative or non-iterative; a non-iterative job is processed in the conventional Hadoop manner; for an iterative job, a Map-side shuffle process is added before the Map phase to read dynamic data for the Map tasks, in the Reduce phase the dynamic data is cached locally and handed over to the dynamic data cache component of the slave node for management, and after the job completes, the final result is stored in HDFS.
2. The MapReduce optimization method according to claim 1, characterized in that step (2) specifically comprises the following sub-steps:
(2-1) the job service process on the master node monitors and waits for the heartbeat messages sent by the task service processes of the slave nodes; a heartbeat message carries the current running state of the slave node, specifically the total number of slots and the number of slots currently running tasks;
(2-2) on receiving a heartbeat message from a slave node, the master node computes the number of idle slots of that slave node and the average number of running slots of the whole Hadoop cluster system according to the heartbeat message; based on these results, it judges whether a task of this job needs to be assigned to the current slave node; if no task needs to be assigned, return to step (2-1), otherwise go to step (2-3);
(2-3) set a counter i=0;
(2-4) judge whether the i-th Hadoop job has a localized task on the current slave node, i.e., whether the current slave node stores an input data split (Split) of the i-th Hadoop job; if not, go to step (2-5); if so, go to step (2-11);
(2-5) set i=i+1 and judge whether i equals the number of Hadoop jobs; if so, go to step (2-7), otherwise return to step (2-4);
(2-6) set a counter j=0;
(2-7) judge whether the task type of the j-th Hadoop job is a computation-type task or a transmission-type task; for a computation-type task go to step (2-11), and for a transmission-type task go to step (2-8);
(2-8) delay the task scheduling of the j-th Hadoop job by one heartbeat interval;
(2-9) judge whether the total delay accumulated for the task scheduling of the j-th Hadoop job has reached a threshold; if so, go to step (2-11), otherwise go to step (2-10);
(2-10) set j=j+1 and judge whether j equals the number of Hadoop jobs; if so, go to step (2-12), otherwise return to step (2-1);
(2-11) dispatch the localized task of the i-th Hadoop job to the current slave node; the process then ends;
(2-12) dispatch a task of the j-th Hadoop job to the current slave node; the process then ends.
3. The MapReduce optimization method according to claim 2, characterized in that step (2-2) is specified as follows: the number of idle slots of the current slave node equals its total number of slots minus the number of slots currently running tasks; the average number of running slots of the whole Hadoop cluster system equals the sum of the running slots of all slave nodes monitored by the tracking process divided by the number of slave nodes; if the number of idle slots of the current node equals 0, no task needs to be assigned, and likewise, if the number of running slots of the current slave node is greater than the average number of running slots of the whole Hadoop cluster system, no task needs to be assigned.
4. The MapReduce optimization method according to claim 1, characterized in that step (3) specifically comprises the following sub-steps:
(3-1) receive a task of a Hadoop job scheduled by the master node;
(3-2) judge whether the job that the task belongs to is an iterative job or a non-iterative job; for an iterative job go to step (3-3), and for a non-iterative job go to step (3-4);
(3-3) judge whether the task of this iterative job is a Map task or a Reduce task; for a Map task go to step (3-5), and for a Reduce task go to step (3-9);
(3-4) judge whether the task of this non-iterative job is a Map task or a Reduce task; for a Map task go to step (3-8), and for a Reduce task go to step (3-9);
(3-5) judge whether this iterative job is running for the first time; if not, go to step (3-6); if so, go to step (3-7);
(3-6) the task service process of the slave node hosting the Map task process (Mapper) starts multiple data copy threads, which request via HTTP the dynamic data files computed by the Reduce task processes from the slave nodes hosting those processes, then go to step (3-8);
(3-7) the Map task process reads the initial values of the dynamic data, then go to step (3-8);
(3-8) the Hadoop cluster system decomposes the input file of the job into splits, and the Map task process processes a split, then go to step (3-14);
(3-9) the Reduce task process (Reducer) on the slave node starts data copy threads, which request via HTTP the intermediate output files of the Map task processes from the slave nodes hosting those processes; the intermediate output files are stored on the local disk of the slave node; the copied files are first placed in a memory buffer, and multiple copied files are merged into one final large file, which is sorted by key; then go to step (3-10);
(3-10) the Reduce task process reads records from the obtained large file in <key, iterator> form and executes the Reduce() method, then goes to step (3-11);
(3-11) judge whether the job is iterative or non-iterative; for an iterative job go to step (3-12), and for a non-iterative job go to step (3-13);
(3-12) the Reduce task process caches its execution results in memory through the dynamic data cache component of the slave node and spills them into a local disk file when the buffer is full, then goes to step (3-14);
(3-13) the Reduce task process writes its execution results into HDFS, then goes to step (3-14);
(3-14) the task execution ends; then return to step (3-1).
5. The MapReduce optimization method according to claim 4, characterized in that the dynamic data files in step (3-6) are managed by the dynamic data cache component of the slave node and are kept in memory and on local disk; the copied dynamic data is also managed by the dynamic data cache component, and the Map task processes on the same slave node request the dynamic data files from the local slave node, where the data serves as the dynamic data input required by the Map task processes.
6. The MapReduce optimization method according to claim 4, characterized in that in step (3-8) the size of a split defaults to the HDFS block size, which is configured through a configuration file; the Map task process decomposes the split into records in the <key, value> form it requires and executes the Map() method; the execution results are cached in memory and spilled to disk when the buffer is full; the spilled files record partition information, and each single spill file is first sorted by partition and then by key; if multiple spill files need to be merged into one large file, this process performs a merge sort over the spill files.
CN201310600745.7A 2013-11-25 2013-11-25 MapReduce optimizing method suitable for iterative computations Active CN103617087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310600745.7A CN103617087B (en) 2013-11-25 2013-11-25 MapReduce optimizing method suitable for iterative computations

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310600745.7A CN103617087B (en) 2013-11-25 2013-11-25 MapReduce optimizing method suitable for iterative computations

Publications (2)

Publication Number Publication Date
CN103617087A true CN103617087A (en) 2014-03-05
CN103617087B CN103617087B (en) 2017-04-26

Family

ID=50167790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310600745.7A Active CN103617087B (en) 2013-11-25 2013-11-25 MapReduce optimizing method suitable for iterative computations

Country Status (1)

Country Link
CN (1) CN103617087B (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104158860A (en) * 2014-07-31 2014-11-19 国家超级计算深圳中心(深圳云计算中心) Job scheduling method and job scheduling system
CN104270412A (en) * 2014-06-24 2015-01-07 南京邮电大学 Three-level caching method based on Hadoop distributed file system
CN104503820A (en) * 2014-12-10 2015-04-08 华南师范大学 Hadoop optimization method based on asynchronous starting
CN105117286A (en) * 2015-09-22 2015-12-02 北京大学 Task scheduling and pipelining executing method in MapReduce
CN105808634A (en) * 2015-01-15 2016-07-27 国际商业机器公司 Distributed map reduce network
WO2016145904A1 (en) * 2015-09-10 2016-09-22 中兴通讯股份有限公司 Resource management method, device and system
CN106354563A (en) * 2016-08-29 2017-01-25 广州市香港科大霍英东研究院 Distributed computing system for 3D (three-dimensional reconstruction) and 3D reconstruction method
CN106506255A (en) * 2016-09-21 2017-03-15 微梦创科网络科技(中国)有限公司 A kind of method of pressure test, apparatus and system
CN106547609A (en) * 2015-09-18 2017-03-29 阿里巴巴集团控股有限公司 A kind of event-handling method and equipment
CN106897133A (en) * 2017-02-27 2017-06-27 郑州云海信息技术有限公司 A kind of implementation method based on the management cluster load of PBS job schedulings
CN107122238A (en) * 2017-04-25 2017-09-01 郑州轻工业学院 Efficient iterative Mechanism Design method based on Hadoop cloud Computational frame
CN107316124A (en) * 2017-05-10 2017-11-03 中国航天系统科学与工程研究院 Extensive affairs type job scheduling and processing general-purpose platform under big data environment
CN107391250A (en) * 2017-08-11 2017-11-24 成都优易数据有限公司 A kind of controller of raising Mapreduce task Shuffle performances
CN107807983A (en) * 2017-10-30 2018-03-16 辽宁大学 A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query
CN108153583A (en) * 2016-12-06 2018-06-12 阿里巴巴集团控股有限公司 Method for allocating tasks and device, real-time Computational frame system
CN108270634A (en) * 2016-12-30 2018-07-10 中移(苏州)软件技术有限公司 A kind of method and system of heartbeat detection
CN108376104A (en) * 2018-02-12 2018-08-07 上海帝联网络科技有限公司 Node scheduling method and device, computer readable storage medium
CN108563497A (en) * 2018-04-11 2018-09-21 中译语通科技股份有限公司 A kind of efficient various dimensions algorithmic dispatching method, task server
CN109117285A (en) * 2018-07-27 2019-01-01 高新兴科技集团股份有限公司 Support the distributed memory computing cluster system of high concurrent
CN105204920B (en) * 2014-06-18 2019-07-23 阿里巴巴集团控股有限公司 A kind of implementation method and device of the distributed computing operation based on mapping polymerization
CN110297714A (en) * 2019-06-19 2019-10-01 上海冰鉴信息科技有限公司 The method and device of PageRank is obtained based on large-scale graph data collection
CN110908796A (en) * 2019-11-04 2020-03-24 北京理工大学 Multi-operation merging and optimizing system and method in Gaia system
CN111813527A (en) * 2020-07-15 2020-10-23 江苏方天电力技术有限公司 Data-aware task scheduling method
CN112148202A (en) * 2019-06-26 2020-12-29 杭州海康威视数字技术股份有限公司 Training sample reading method and device
CN107562926B (en) * 2017-09-14 2023-09-26 丙申南京网络技术有限公司 Multi-hadoop distributed file system for big data analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737114A (en) * 2012-05-18 2012-10-17 北京大学 MapReduce-based big picture distance connection query method
US20120304186A1 (en) * 2011-05-26 2012-11-29 International Business Machines Corporation Scheduling Mapreduce Jobs in the Presence of Priority Classes
CN103279328A (en) * 2013-04-08 2013-09-04 河海大学 BlogRank algorithm parallelization processing construction method based on Haloop

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120304186A1 (en) * 2011-05-26 2012-11-29 International Business Machines Corporation Scheduling Mapreduce Jobs in the Presence of Priority Classes
CN102737114A (en) * 2012-05-18 2012-10-17 北京大学 MapReduce-based big picture distance connection query method
CN103279328A (en) * 2013-04-08 2013-09-04 河海大学 BlogRank algorithm parallelization processing construction method based on Haloop

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SANGWON SEO et al.: "HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment", IEEE International Conference on Cluster Computing and Workshops, 2009, 31 December 2009 (2009-12-31), pages 1-4 *
冯新建 (FENG, Xinjian): "基于MapReduce的迭代型分布式数据处理研究" [Research on iterative distributed data processing based on MapReduce], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Masters' Theses Full-text Database, Information Science and Technology], no. 10, 15 October 2013 (2013-10-15), pages 137-20 *

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204920B (en) * 2014-06-18 2019-07-23 阿里巴巴集团控股有限公司 A kind of implementation method and device of the distributed computing operation based on mapping polymerization
CN104270412A (en) * 2014-06-24 2015-01-07 南京邮电大学 Three-level caching method based on Hadoop distributed file system
CN104158860B (en) * 2014-07-31 2017-09-29 国家超级计算深圳中心(深圳云计算中心) A kind of job scheduling method and job scheduling system
CN104158860A (en) * 2014-07-31 2014-11-19 国家超级计算深圳中心(深圳云计算中心) Job scheduling method and job scheduling system
CN104503820A (en) * 2014-12-10 2015-04-08 华南师范大学 Hadoop optimization method based on asynchronous starting
CN104503820B (en) * 2014-12-10 2018-07-24 华南师范大学 A kind of Hadoop optimization methods based on asynchronous starting
CN105808634B (en) * 2015-01-15 2019-12-03 国际商业机器公司 Distributed mapping reduction network
CN105808634A (en) * 2015-01-15 2016-07-27 国际商业机器公司 Distributed map reduce network
WO2016145904A1 (en) * 2015-09-10 2016-09-22 中兴通讯股份有限公司 Resource management method, device and system
CN106547609A (en) * 2015-09-18 2017-03-29 阿里巴巴集团控股有限公司 A kind of event-handling method and equipment
CN105117286B (en) * 2015-09-22 2018-06-12 北京大学 The dispatching method of task and streamlined perform method in MapReduce
CN105117286A (en) * 2015-09-22 2015-12-02 北京大学 Task scheduling and pipelining executing method in MapReduce
CN106354563A (en) * 2016-08-29 2017-01-25 广州市香港科大霍英东研究院 Distributed computing system for 3D (three-dimensional reconstruction) and 3D reconstruction method
CN106354563B (en) * 2016-08-29 2020-05-22 广州市香港科大霍英东研究院 Distributed computing system for 3D reconstruction and 3D reconstruction method
CN106506255B (en) * 2016-09-21 2019-11-05 微梦创科网络科技(中国)有限公司 A kind of method, apparatus and system of pressure test
CN106506255A (en) * 2016-09-21 2017-03-15 微梦创科网络科技(中国)有限公司 A kind of method of pressure test, apparatus and system
CN108153583A (en) * 2016-12-06 2018-06-12 阿里巴巴集团控股有限公司 Method for allocating tasks and device, real-time Computational frame system
CN108153583B (en) * 2016-12-06 2022-05-13 阿里巴巴集团控股有限公司 Task allocation method and device and real-time computing framework system
CN108270634A (en) * 2016-12-30 2018-07-10 中移(苏州)软件技术有限公司 A kind of method and system of heartbeat detection
CN108270634B (en) * 2016-12-30 2021-08-24 中移(苏州)软件技术有限公司 Heartbeat detection method and system
CN106897133B (en) * 2017-02-27 2020-09-29 苏州浪潮智能科技有限公司 Implementation method for managing cluster load based on PBS job scheduling
CN106897133A (en) * 2017-02-27 2017-06-27 郑州云海信息技术有限公司 A kind of implementation method based on the management cluster load of PBS job schedulings
CN107122238A (en) * 2017-04-25 2017-09-01 郑州轻工业学院 Efficient iterative Mechanism Design method based on Hadoop cloud Computational frame
CN107316124A (en) * 2017-05-10 2017-11-03 中国航天系统科学与工程研究院 Extensive affairs type job scheduling and processing general-purpose platform under big data environment
CN107391250A (en) * 2017-08-11 2017-11-24 成都优易数据有限公司 A kind of controller of raising Mapreduce task Shuffle performances
CN107562926B (en) * 2017-09-14 2023-09-26 丙申南京网络技术有限公司 Multi-hadoop distributed file system for big data analysis
CN107807983A (en) * 2017-10-30 2018-03-16 辽宁大学 A kind of parallel processing framework and design method for supporting extensive Dynamic Graph data query
CN107807983B (en) * 2017-10-30 2021-08-24 辽宁大学 Design method of parallel processing framework supporting large-scale dynamic graph data query
CN108376104B (en) * 2018-02-12 2020-10-27 上海帝联网络科技有限公司 Node scheduling method and device and computer readable storage medium
CN108376104A (en) * 2018-02-12 2018-08-07 上海帝联网络科技有限公司 Node scheduling method and device, computer readable storage medium
CN108563497B (en) * 2018-04-11 2022-03-29 中译语通科技股份有限公司 Efficient multi-dimensional algorithm scheduling method and task server
CN108563497A (en) * 2018-04-11 2018-09-21 中译语通科技股份有限公司 A kind of efficient various dimensions algorithmic dispatching method, task server
CN109117285A (en) * 2018-07-27 2019-01-01 高新兴科技集团股份有限公司 Support the distributed memory computing cluster system of high concurrent
CN110297714A (en) * 2019-06-19 2019-10-01 上海冰鉴信息科技有限公司 The method and device of PageRank is obtained based on large-scale graph data collection
CN110297714B (en) * 2019-06-19 2023-05-30 上海冰鉴信息科技有限公司 Method and device for acquiring PageRank based on large-scale graph dataset
CN112148202A (en) * 2019-06-26 2020-12-29 杭州海康威视数字技术股份有限公司 Training sample reading method and device
CN112148202B (en) * 2019-06-26 2023-05-26 杭州海康威视数字技术股份有限公司 Training sample reading method and device
CN110908796A (en) * 2019-11-04 2020-03-24 北京理工大学 Multi-operation merging and optimizing system and method in Gaia system
CN111813527A (en) * 2020-07-15 2020-10-23 江苏方天电力技术有限公司 Data-aware task scheduling method
CN111813527B (en) * 2020-07-15 2022-06-14 江苏方天电力技术有限公司 Data-aware task scheduling method

Also Published As

Publication number Publication date
CN103617087B (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN103617087A (en) MapReduce optimizing method suitable for iterative computations
Gu et al. SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters
Cho et al. Natjam: Design and evaluation of eviction policies for supporting priorities and deadlines in mapreduce clusters
Kc et al. Scheduling hadoop jobs to meet deadlines
Li et al. Map-Balance-Reduce: An improved parallel programming model for load balancing of MapReduce
US9323580B2 (en) Optimized resource management for map/reduce computing
Yang et al. Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds
WO2021254135A1 (en) Task execution method and storage device
CN108153589B (en) Method and system for data processing in a multi-threaded processing arrangement
CN106293944A (en) System and optimization method is accessed based on nonuniformity I/O under virtualization multi-core environment
Bok et al. An efficient MapReduce scheduling scheme for processing large multimedia data
Gu et al. Improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms
Wang et al. Actcap: Accelerating mapreduce on heterogeneous clusters with capability-aware data placement
Shi et al. MapReduce short jobs optimization based on resource reuse
Han et al. Energy efficient VM scheduling for big data processing in cloud computing environments
US10579419B2 (en) Data analysis in storage system
Irandoost et al. Mapreduce data skewness handling: a systematic literature review
Shabeera et al. Optimising virtual machine allocation in MapReduce cloud for improved data locality
Slagter et al. SmartJoin: a network-aware multiway join for MapReduce
Li et al. Performance optimization of computing task scheduling based on the Hadoop big data platform
Yu et al. Sasm: Improving spark performance with adaptive skew mitigation
Liu et al. KubFBS: A fine‐grained and balance‐aware scheduling system for deep learning tasks based on kubernetes
Liu et al. Run-time dynamic resource adjustment for mitigating skew in mapreduce
Mian et al. Managing data-intensive workloads in a cloud
Zhao et al. A holistic cross-layer optimization approach for mitigating stragglers in in-memory data processing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant