CN103440246A - Intermediate result data sequencing method and system for MapReduce

Info

Publication number: CN103440246A
Application number: CN2013103059318A
Authority: CN
Inventors: 王猛; 杨毅; 王谦
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2013-07-19
Filing date: 2013-07-19
Publication date: 2013-12-11

Abstract

The invention provides a method and a system for sorting intermediate result data for MapReduce. The method comprises the following steps: obtaining, from a mapping task server, a plurality of intermediate result data generated by mapping tasks; dividing the intermediate result data into N groups according to the partitions to which the intermediate result data belong; sorting the intermediate result data in the N groups with N threads respectively; and writing the N groups of sorted intermediate result data from memory to a local disk. The method according to the embodiment of the invention effectively reduces the time spent sorting the intermediate data before the intermediate result data of MapReduce are written to disk, and effectively improves sorting efficiency. The invention also provides a system for sorting intermediate result data for MapReduce.

Description

Method and system for sorting intermediate result data for MapReduce

Technical field

The present invention relates to the field of cloud computing technology, and in particular to a method and a system for sorting intermediate result data for MapReduce.

Background technology

In the current cloud computing field, MapReduce is a simple, popular yet powerful programming model for parallel computation over large-scale data sets (larger than 1 TB). MapReduce is very widely applied, for example to distributed grep, distributed sorting, web link graph reversal, per-machine term vectors, web access log analysis, inverted index construction, document clustering, machine learning and so on.

MapReduce does not require the computing nodes of a cluster to be mainframes; ordinary machines suffice. The data that MapReduce operates on are massive data distributed over many machines (nodes). These data are all stored in a distributed file system, and the file system exposes interfaces for streaming operations on files. The core idea of the MapReduce programming model is divide and conquer. A job written with the MapReduce framework usually consists of two parts, the mapping (Map) stage and the reduction (Reduce) stage. The logic of these two stages is as follows:

Map (mapping) stage: the framework executes a plurality of mapping tasks (map tasks), which read the corresponding data to be processed from the distributed file system, apply the map function to each piece of data, and write the results to the local disk in sorted order.

Reduce (reduction) stage: the framework executes one or more reduce tasks, which obtain over the network all the data produced by the Map stage, apply the reduce function to process them further, and write the results to the distributed file system.

The main part of the Map stage is the user-defined map function. Its input is a (key, value) pair in one domain, and its output is a list of (key, value) pairs in another domain.

Map(k1,v1)→list(k2,v2)

The map function can be applied to all input data in parallel, so that it generates a series of (k2, v2) pairs for each input (k1, v1). The reduce function of the Reduce stage is also user-defined. The MapReduce framework collects all (k2, v2) pairs with the same k2 into one group and distributes the groups to reduce tasks for processing according to a certain rule. The reduce function is applied to each group and generates zero or more values.

Reduce(k2,list(v2))→list(v3)
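The two signatures above can be illustrated with a minimal single-process sketch in Python. It is not taken from the patent: the word-count functions and the helper name run_job are hypothetical, and the grouping step only imitates what the framework does between the two stages.

```python
from collections import defaultdict

def map_fn(k1, v1):
    # Map(k1, v1) -> list(k2, v2): k1 is a line number, v1 the line text.
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, values):
    # Reduce(k2, list(v2)) -> list(v3): sum the counts of one word.
    return [sum(values)]

def run_job(records):
    # Stand-in for the framework: group all (k2, v2) pairs by k2,
    # then apply the reduce function to each group in key order.
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, vs) for k2, vs in sorted(groups.items())}

result = run_job([(0, "a b a"), (1, "b c")])
```

Here result maps each word to a one-element list holding its count, which matches the "zero or more values" shape of the reduce output.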

As shown in Figure 2, the output helper classes of MapTask all inherit from MapOutputCollector. If the Mapper is followed by a reduce task, the system uses the derived class MapOutputBuffer for output. This output is called the intermediate result of the map. The intermediate results may be very large and need to be sorted before being handed to reduce; that is, the output of each map is sorted in advance, so that reduce can perform a multiway merge over the already sorted map outputs, which makes the input of reduce globally ordered. On the map side, the intermediate results are temporarily held in an in-memory buffer, but the capacity of this buffer is limited. When the buffer is full, or its usage reaches a certain threshold, the spill process is triggered: the data in the buffer are flushed to the hard disk, producing temporary files spill0, spill1 and so on. The data of these temporary files must first be sorted in memory before being written to the hard disk, so each spill file is itself an ordered arrangement of the data of one spill. When the map process finishes and all data have been output, MapOutputBuffer performs a multiway merge over the spill files on the hard disk, producing one ordered final file, file.out, together with the index file corresponding to it.
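The spill-then-merge flow described above can be sketched as follows. This is a simplified illustration rather than Hadoop code: in-memory lists stand in for the spill files, and the names spill_runs, merge_runs and buffer_limit are hypothetical. The merge uses heapq.merge from the Python standard library, which performs a multiway merge of already sorted inputs.

```python
import heapq

def spill_runs(records, buffer_limit):
    """Split the map output into sorted runs, one per simulated spill."""
    runs, buffer = [], []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= buffer_limit:   # buffer full: sort in memory, then spill
            runs.append(sorted(buffer))
            buffer = []
    if buffer:                            # final spill when the map task ends
        runs.append(sorted(buffer))
    return runs

def merge_runs(runs):
    """Multiway merge of the sorted runs into one ordered output."""
    return list(heapq.merge(*runs))

records = [("d", 1), ("a", 1), ("c", 1), ("b", 1), ("e", 1)]
runs = spill_runs(records, buffer_limit=2)   # plays the role of spill0, spill1, ...
file_out = merge_runs(runs)                  # plays the role of file.out
```

Because every run is sorted before it is "written", the final merge only has to interleave the runs, which is why the sort inside each spill sits on the critical path of the map task.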

As shown in Figure 3, during the map process the records emitted by the user program are received by the MapOutputBuffer class. This class holds a byte array named buffer, which stores the serialized bytes of the key and value of each record; keys and values are stored contiguously and compactly. There are also two int arrays that store metadata about each record. The first, named kvindices (the intermediate data index), stores the partition, keystart and valstart of the key/value pair of each record. The storage position of each record in the buffer and the partition it belongs to are described by such a triple; each triple consists of three int variables, and the triples are laid out one after another in the int array: <partition, keystart, valstart><partition, keystart, valstart>... The second int array, named kvoffsets (the partition position index), serves as an index into kvindices. Each record corresponds to exactly one int value in it, and this value points to the position in kvindices of the first int, the partition, of the record's triple. This forms a two-level index relation: record sequence number (kvoffsets) → record metadata (kvindices) → actual record data (kvbuffer). During sorting the actual data in kvbuffer do not need to be moved. It is only necessary to follow the index pointers to read the keys of the records being compared and to reflect the comparison result in the kvoffsets sequence-number array. Only kvoffsets is sorted; the physical positions of the data in kvindices and kvbuffer do not change, and the changed positions of the elements in kvoffsets represent the result of the sort.
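The two-level index can be modelled in a few lines of Python. This is a hedged sketch of the layout described above, not the Hadoop implementation: the helper names collect and key_of are hypothetical, a Python list stands in for each int array, and for brevity all records are placed in partition 0 so that only keys are compared.

```python
kvbuffer = bytearray()   # serialized key/value bytes, stored back to back
kvindices = []           # flat triples: partition, keystart, valstart
kvoffsets = []           # one int per record: position of its triple in kvindices

def collect(partition, key, value):
    """Append one record and its metadata (hypothetical helper)."""
    keystart = len(kvbuffer)
    kvbuffer.extend(key)
    valstart = len(kvbuffer)
    kvbuffer.extend(value)
    kvoffsets.append(len(kvindices))
    kvindices.extend([partition, keystart, valstart])

def key_of(offset):
    """Follow kvoffsets -> kvindices -> kvbuffer to read a record's key."""
    keystart, valstart = kvindices[offset + 1], kvindices[offset + 2]
    return bytes(kvbuffer[keystart:valstart])

for key in (b"pear", b"apple", b"fig"):
    collect(0, key, b"1")

snapshot = bytes(kvbuffer)
kvoffsets.sort(key=key_of)                    # only the offsets are permuted
sorted_keys = [key_of(o) for o in kvoffsets]
```

After the sort, kvbuffer and kvindices are byte-for-byte unchanged; the new order exists only as the permutation held in kvoffsets.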

After sorting, the byte arrays of the key and value are read from the buffer in the order given by kvoffsets and written to the spill file. This process has a problem, however. In most cases sorting is much slower than the rate at which the user program outputs records, so the buffer easily becomes full before the spill thread has finished its work. The user thread that writes data then blocks until the spill completes and the data in memory are released, which causes a certain delay.

Summary of the invention

The present invention aims to solve at least one of the technical defects described above.

To this end, one object of the present invention is to propose a method for sorting intermediate result data for MapReduce that effectively reduces the time spent sorting the intermediate data before the intermediate result data of MapReduce are written from memory to the hard disk, and effectively improves sorting efficiency.

Another object of the present invention is to propose a system for sorting intermediate result data for MapReduce.

To achieve the above objects, an embodiment of the first aspect of the present invention discloses a method for sorting intermediate result data for MapReduce, comprising the following steps: obtaining, from a mapping task server, a plurality of intermediate result data produced by mapping tasks; dividing the plurality of intermediate result data into N groups according to the partitions to which they belong; sorting the intermediate result data in the N groups with N threads respectively; and writing the N groups of sorted intermediate result data from memory to a local disk.

The method for sorting intermediate result data for MapReduce according to the embodiment of the present invention alleviates the delay that arises in the map output process when records are output faster than the spill can sort them, so that the buffer fills up and writers are forced to wait for the spill to complete and release memory. The method applies the divide-and-conquer idea to split the sorting of the intermediate data into several parts. Because the grouping is based on the partition, the comparison logic is simplified, and because multiple threads perform the sorting, sorting is greatly accelerated and the write (spill) time is reduced. This alleviates the delay caused when the buffer fills up during a spill and writers have to wait for the spill to complete. Specifically, at spill time all records to be sorted are grouped by partition, and the records inside each partition are sorted by a separate thread. For example, when the sorting method of the embodiment of the present invention was tested with the wordcount example shipped with Hadoop, running three threads sped up the map by about 30% on average.

In addition, the method for sorting intermediate result data for MapReduce according to the above embodiment of the present invention may also have the following additional technical features:

In some examples, the method further comprises: a reduce task server obtains the N groups of sorted intermediate result data from the local disk and performs reduce task processing on the N groups of intermediate result data.

In some examples, dividing the plurality of intermediate result data into N groups according to the partitions to which they belong further comprises: creating a two-dimensional partition index array; merging the partitions to which the plurality of intermediate result data belong and storing them in the first-dimension storage space of the two-dimensional partition index array; storing each partition position index in the second-dimension storage space of the two-dimensional partition index array; and, according to the plurality of partitions stored in the first-dimension storage space, sorting the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

In some examples, the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space are sorted by one thread.

An embodiment of the second aspect of the present invention provides a system for sorting intermediate result data for MapReduce, comprising a local server and a mapping server. The mapping server is configured to execute mapping tasks to generate a plurality of intermediate result data. The local server is configured to obtain the plurality of intermediate result data from the mapping server, divide the plurality of intermediate result data into N groups according to the partitions to which they belong, sort the intermediate result data in the N groups with N threads respectively, and write the N groups of sorted intermediate result data from memory to a local disk.

The system for sorting intermediate result data for MapReduce according to the embodiment of the present invention alleviates the delay that arises in the map output process when records are output faster than the spill can sort them, so that the buffer fills up and writers are forced to wait for the spill to complete and release memory. The system applies the divide-and-conquer idea to split the sorting of the intermediate data into several parts. Because the grouping is based on the partition, the comparison logic is simplified, and because multiple threads perform the sorting, sorting is greatly accelerated and the write (spill) time is reduced. This alleviates the delay caused when the buffer fills up during a spill and writers have to wait for the spill to complete. Specifically, at spill time all records to be sorted are grouped by partition, and the records inside each partition are sorted by a separate thread. For example, when the sorting method of the embodiment of the present invention was tested with the wordcount example shipped with Hadoop, running three threads sped up the map by about 30% on average.

In addition, the system for sorting intermediate result data for MapReduce according to the above embodiment of the present invention may also have the following additional technical features:

In some examples, the system further comprises a reduce server, which is configured to obtain the N groups of sorted intermediate result data from the local disk of the local server and to perform reduce task processing on the N groups of intermediate result data.

In some examples, the local server dividing the plurality of intermediate result data into N groups according to the partitions to which they belong comprises: creating a two-dimensional partition index array; merging the partitions to which the plurality of intermediate result data belong and storing them in the first-dimension storage space of the two-dimensional partition index array; storing each partition position index in the second-dimension storage space of the two-dimensional partition index array; and, according to the plurality of partitions stored in the first-dimension storage space, sorting the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

In some examples, the local server sorts the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space with one independent thread.

Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flow chart of a method for sorting intermediate result data for MapReduce according to an embodiment of the present invention;

Fig. 2 is a flow diagram of writing the intermediate result data produced by MapReduce from memory to the hard disk in the prior art;

Fig. 3 is a schematic diagram of the index structure used in the prior art to sort the intermediate result data produced by MapReduce before they are written from memory to the hard disk;

Fig. 4 is a schematic diagram of the index structure used by the method for sorting intermediate result data for MapReduce according to an embodiment of the present invention to sort the intermediate result data produced by MapReduce before they are written from memory to the hard disk; and

Fig. 5 is a schematic diagram of a system for sorting intermediate result data for MapReduce according to an embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and cannot be construed as limiting it.

In the description of the present invention, it should be understood that orientation or positional terms such as "longitudinal", "transverse", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention.

In the description of the present invention, it should be noted that, unless otherwise specified and defined, the terms "mounted", "connected" and "coupled" should be understood broadly. For example, a connection may be mechanical or electrical, may be internal to two elements, and may be direct or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the specific circumstances.

The method and system for sorting intermediate result data for MapReduce according to the embodiments of the present invention are described below with reference to the drawings.

Fig. 1 is a flow chart of a method for sorting intermediate result data for MapReduce according to an embodiment of the present invention. As shown in Fig. 1, the method for sorting intermediate result data for MapReduce according to an embodiment of the present invention comprises the following steps:

Step S101: a plurality of intermediate result data produced by mapping tasks are obtained from the mapping task server.

Step S102: the plurality of intermediate result data are divided into N groups according to the partitions to which they belong.

As a concrete example, dividing the plurality of intermediate result data into N groups according to the partitions to which they belong further comprises:

1. Create a two-dimensional partition index array.

2. Merge the partitions to which the plurality of intermediate result data belong and store them in the first-dimension storage space of the two-dimensional partition index array.

3. Store each partition position index in the second-dimension storage space of the two-dimensional partition index array.

4. According to the plurality of partitions stored in the first-dimension storage space, sort the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

Specifically, as shown in Figure 3, in the background-art approach to sorting the intermediate result data when they are written (spilled) from memory to the hard disk, the sort uses the partition position index (kvoffsets) as the index, compares the actual key data it points to in the memory buffer, and then swaps the index positions in kvoffsets. As a result, all records of the current spill are ordered first by partition and then, within each partition, by the comparator logic defined for the key. The whole process is performed by a single thread. Assuming the total number of records is n and quicksort is used, the time complexity is O(n log n).
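The single-threaded, two-step comparison of the background art can be sketched as below. This is an illustrative model only: the records list, the compare function and the comparisons counter are hypothetical, Python's built-in sort replaces quicksort, and plain tuples stand in for the buffer and its indexes.

```python
from functools import cmp_to_key

# (partition, key) pairs standing in for the records of one spill.
records = [(1, "b"), (0, "z"), (1, "a"), (0, "c")]
comparisons = {"partition": 0, "key": 0}

def compare(i, j):
    comparisons["partition"] += 1          # step 1: compare the partitions
    if records[i][0] != records[j][0]:
        return records[i][0] - records[j][0]
    comparisons["key"] += 1                # step 2: compare the keys
    a, b = records[i][1], records[j][1]
    return (a > b) - (a < b)

# Only the positions are sorted, as with kvoffsets; the records stay in place.
offsets = sorted(range(len(records)), key=cmp_to_key(compare))
ordered = [records[i] for i in offsets]
```

Every call pays for the partition comparison, and the key comparison is added whenever the two records share a partition, which is the two-step cost the grouping by partition is meant to remove.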

Under this comparison rule, every pairwise comparison of elements requires two steps of comparison logic, and all records must be sorted in one pass. If n is large and the comparison logic is complex, no particular sorting algorithm performs well. If instead the records to be sorted are divided into r parts and each part is sorted separately, then, assuming parts of roughly equal size, the time complexity of sorting one part is O((n/r) log(n/r)), and for all r parts it is O(n log(n/r)). That is, the smaller the data scale, the faster the sort, and the improvement in sorting speed is nonlinear. For the records here, the most suitable approach is to divide them into r parts before sorting according to the number of partitions. The sort then only needs to compare keys within the same partition and no longer needs to compare partitions, so the two-step logic is reduced to one step. Moreover, once the records have been divided by partition, each partition can use a separate thread to sort the data belonging to it; the partitions do not interfere with one another, so multi-threaded parallel sorting can be achieved.
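A small numeric estimate may help make the complexity argument concrete. The sketch below assumes an idealized cost of n·log2(n) comparisons for one sort and r parts of exactly equal size; both are simplifying assumptions rather than figures from the patent, and the function names are hypothetical.

```python
import math

def single_sort_cost(n):
    # Idealized comparison count for sorting n records in one pass.
    return n * math.log2(n)

def partitioned_sort_cost(n, r):
    # Idealized total for sorting r equal parts of n / r records each.
    part = n / r
    return r * part * math.log2(part)

n, r = 1 << 20, 4
whole = single_sort_cost(n)              # 20 * 2**20 comparisons
split = partitioned_sort_cost(n, r)      # 18 * 2**20 comparisons
per_thread = split / r                   # work of one thread if the r sorts run in parallel
ratio = split / whole
```

Under these assumptions the total work drops only to 90% of the single sort, so most of the benefit comes from the parts being sorted concurrently and from each comparison being cheaper, not from the reduced comparison count alone.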

More specifically, for example, an auxiliary two-dimensional array is added to the MapOutputBuffer class as a new index, namely the two-dimensional partition index array, denoted partitionIndexes[][]. The first dimension is the partition, and the second dimension holds the indexes, in the intermediate data index kvindices, of the records belonging to that partition. Each time a record is written, the count of the partition to which the record belongs is incremented by 1. When a spill starts, the number of elements in the second dimension of partitionIndexes is initialized from the partition counts. The range of kvoffsets to be spilled is then traversed, and the index of each record in kvindices is inserted into the array of partitionIndexes[][] that corresponds to the record's partition. In this way, new indexes and mapping relations are established in partitionIndexes[][] for all the records to be sorted in the partition position index kvoffsets.
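Building the auxiliary index can be sketched as follows. The metadata values are made up for illustration, Python lists stand in for the int arrays, and the names partition_counts and fill are hypothetical; in the described scheme the counts are maintained as records are written, whereas here they are recomputed for brevity.

```python
# Prior-art metadata for five records: flat <partition, keystart, valstart> triples.
kvindices = [1, 0, 4,   0, 5, 8,   1, 9, 12,   2, 13, 15,   0, 16, 19]
kvoffsets = [0, 3, 6, 9, 12]      # position of each record's triple in kvindices
num_partitions = 3

# Per-partition record counts (recomputed here instead of updated on every write).
partition_counts = [0] * num_partitions
for off in kvoffsets:
    partition_counts[kvindices[off]] += 1

# At spill start: size the second dimension from the counts, then walk
# kvoffsets once and file each record's kvindices position under its partition.
partition_indexes = [[0] * count for count in partition_counts]
fill = [0] * num_partitions       # next free slot in each partition's row
for off in kvoffsets:
    p = kvindices[off]
    partition_indexes[p][fill[p]] = off
    fill[p] += 1
```

The grouping is a single linear pass over kvoffsets, so it adds O(n) work in exchange for removing the partition comparison from every later sort comparison.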

Step S103: the intermediate result data in the N groups are sorted by N threads respectively. For example, as shown in Figure 4, a plurality of threads (such as Thread0 to Thread3) are created to sort the records inside each partition; it suffices to sort the two-dimensional array partitionIndexes.
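The per-partition sort of step S103 can be sketched with one worker per partition. The keys table and the sort_partition helper are hypothetical stand-ins for reading keys out of the buffer, and ThreadPoolExecutor from the Python standard library plays the role of Thread0 to Thread3. In CPython the global interpreter lock limits the speed-up of CPU-bound threads, so the sketch shows the structure rather than the performance of the scheme.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup from a record's kvindices position to its key.
keys = {0: "pear", 3: "kiwi", 6: "apple", 9: "fig", 12: "date"}
partition_indexes = [[3, 12], [0, 6], [9]]   # rows grouped by partition

def sort_partition(indexes):
    # One-step comparison: only keys are compared, never partitions.
    indexes.sort(key=keys.__getitem__)

# One worker per partition; the rows are disjoint, so no locking is needed.
with ThreadPoolExecutor(max_workers=len(partition_indexes)) as pool:
    list(pool.map(sort_partition, partition_indexes))
```

Each worker touches only its own row of the index, which is what allows the partitions to be sorted without interfering with one another.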

Step S104: the N groups of sorted intermediate result data are written from memory to the local disk. In one embodiment of the present invention, the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space are sorted by one thread. For example, after the sorting in step S103 is complete, partitionIndexes is used as the index for output to the hard disk: starting from subscript 0 of the first dimension, which represents partition0, if a partition has data, the corresponding key/value pairs in the memory buffer are written to the hard disk in turn.
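The output pass of step S104 can be sketched as below. An in-memory io.BytesIO object stands in for the spill file, the tab and newline framing is an arbitrary choice for readability, and the segments list (partition, start offset, length) is an assumption added for illustration rather than a structure named in the text.

```python
import io

# Hypothetical records keyed by their kvindices position: (key bytes, value bytes).
records = {0: (b"pear", b"1"), 3: (b"kiwi", b"2"), 6: (b"apple", b"3")}
partition_indexes = [[3], [6, 0], []]   # each row is already sorted by key

def write_spill(out, partition_indexes, records):
    """Write partitions in order, starting from subscript 0 of the first dimension."""
    segments = []
    for p, indexes in enumerate(partition_indexes):
        start = out.tell()
        for off in indexes:               # an empty partition writes nothing
            key, value = records[off]
            out.write(key + b"\t" + value + b"\n")
        segments.append((p, start, out.tell() - start))
    return segments

spill = io.BytesIO()
segments = write_spill(spill, partition_indexes, records)
```

Walking the first dimension in order yields output that is sorted by partition and, within each partition, by key, which is the same order the single-threaded two-step sort produced.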

In one embodiment of the present invention, the method further comprises: a reduce task server obtains the N groups of sorted intermediate result data from the local disk and performs reduce task processing on them. That is, the reduce task (Reduce Task) is performed on the sorted intermediate result data stored on the local hard disk.

Fig. 5 is a schematic diagram of a system for sorting intermediate result data for MapReduce according to an embodiment of the present invention. As shown in Fig. 5, the system 500 for sorting intermediate result data for MapReduce according to an embodiment of the present invention comprises a local server 510 and a mapping server 520.

The mapping server 520 is configured to execute mapping tasks to generate a plurality of intermediate result data. The local server 510 is configured to obtain the plurality of intermediate result data from the mapping server 520, divide the plurality of intermediate result data into N groups according to the partitions to which they belong, sort the intermediate result data in the N groups with N threads respectively, and write the N groups of sorted intermediate result data from memory to a local disk.

As a concrete example, the local server 510 dividing the plurality of intermediate result data into N groups according to the partitions to which they belong comprises: creating a two-dimensional partition index array; merging the partitions to which the plurality of intermediate result data belong and storing them in the first-dimension storage space of the two-dimensional partition index array; storing each partition position index in the second-dimension storage space of the two-dimensional partition index array; and, according to the plurality of partitions stored in the first-dimension storage space, sorting the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

Further, as shown in Figure 4, a plurality of threads (such as Thread0 to Thread3) are created to sort the records inside each partition; it suffices to sort the two-dimensional array partitionIndexes.

In one embodiment of the present invention, the local server 510 sorts the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space with one thread. For example, partitionIndexes is used as the index for output to the hard disk: starting from subscript 0 of the first dimension, which represents partition0, if a partition has data, the corresponding key/value pairs in the memory buffer are written to the hard disk in turn.

As shown in Fig. 5, the system 500 for sorting intermediate result data for MapReduce according to the embodiment of the present invention further comprises a reduce server 530. The reduce server 530 is configured to obtain the N groups of sorted intermediate result data from the local disk of the local server 510 and to perform reduce task processing on them. That is, the reduce task (Reduce Task) is performed on the sorted intermediate result data stored on the local hard disk.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims

1. A method for sorting intermediate result data for MapReduce, characterized in that it comprises the following steps:

obtaining, from a mapping task server, a plurality of intermediate result data produced by mapping tasks;

dividing the plurality of intermediate result data into N groups according to the partitions to which the plurality of intermediate result data belong;

sorting the intermediate result data in the N groups with N threads respectively; and

writing the N groups of sorted intermediate result data from memory to a local disk.

2. The method for sorting intermediate result data for MapReduce according to claim 1, characterized in that it further comprises:

a reduce task server obtaining the N groups of sorted intermediate result data from the local disk and performing reduce task processing on the N groups of intermediate result data.

3. The method for sorting intermediate result data for MapReduce according to claim 1, characterized in that dividing the plurality of intermediate result data into N groups according to the partitions to which the plurality of intermediate result data belong further comprises:

creating a two-dimensional partition index array;

merging the partitions to which the plurality of intermediate result data belong and storing them in the first-dimension storage space of the two-dimensional partition index array;

storing each partition position index in the second-dimension storage space of the two-dimensional partition index array; and

according to the plurality of partitions stored in the first-dimension storage space, sorting the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

4. The method for sorting intermediate result data for MapReduce according to claim 3, characterized in that the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space are sorted by one thread.

5. A system for sorting intermediate result data for MapReduce, characterized in that it comprises a local server and a mapping server, wherein

the mapping server is configured to execute mapping tasks to generate a plurality of intermediate result data; and

the local server is configured to obtain the plurality of intermediate result data from the mapping server, divide the plurality of intermediate result data into N groups according to the partitions to which the plurality of intermediate result data belong, sort the intermediate result data in the N groups with N threads respectively, and write the N groups of sorted intermediate result data from memory to a local disk.

6. The system for sorting intermediate result data for MapReduce according to claim 1, characterized in that it further comprises:

a reduce server, configured to obtain the N groups of sorted intermediate result data from the local disk of the local server and to perform reduce task processing on the N groups of intermediate result data.

7. The system for sorting intermediate result data for MapReduce according to claim 5, characterized in that the local server dividing the plurality of intermediate result data into N groups according to the partitions to which the plurality of intermediate result data belong comprises: creating a two-dimensional partition index array; merging the partitions to which the plurality of intermediate result data belong and storing them in the first-dimension storage space of the two-dimensional partition index array; storing each partition position index in the second-dimension storage space of the two-dimensional partition index array; and, according to the plurality of partitions stored in the first-dimension storage space, sorting the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space.

8. The system for sorting intermediate result data for MapReduce according to claim 7, characterized in that the local server sorts the plurality of partition position indexes corresponding to each partition that are stored in the second-dimension storage space with one independent thread.