CN105608046A - Multi-core processor architecture based on MapReduce programming model - Google Patents

Multi-core processor architecture based on MapReduce programming model

Info

Publication number
CN105608046A
Authority
CN
China
Prior art keywords
processor
merge
mapreduce
data
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510954803.5A
Other languages
Chinese (zh)
Inventor
肖昊
张华娟
吴宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201510954803.5A
Publication of CN105608046A
Pending legal status (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a multi-core processor architecture based on the MapReduce programming model, which accelerates the processing of heavy data streams at computing nodes in a computer network and simplifies the complexity of parallel programming on those nodes. In the processor, the input data is processed in turn by a Split segmentation and scheduling module, a MapReduce block pipeline module, and a Merge module before being output. The modules are connected by a multi-layer on-chip interconnect. The Split segmentation and scheduling module divides the input data into small blocks and controls the MapReduce block pipeline module to process them. Each block pipeline in the MapReduce block pipeline module consists of a Map processor, a Reduce processor, and two ping-pong data buffers; the Map processor and the Reduce processor perform MapReduce processing on the data in the buffers in ping-pong fashion. Finally, the Merge module uses a hierarchical structure to merge and sort the MapReduce results and obtain the final result.

Description

Multi-core processor architecture based on MapReduce programming model
Technical field
The invention belongs to the field of multi-core processor acceleration technology, and in particular relates to a multi-core processor architecture based on the MapReduce programming model.
Background technology
In recent years, Internet of Things (IoT) technology has developed rapidly. It combines various sensors with intelligent processing and uses technologies such as cloud computing and pattern recognition to analyze and process the massive information gathered from sensors and extract meaningful data, in order to meet the needs of applications such as commerce, security, and city management. As IoT applications expand and deepen, the number of sensors grows geometrically and the amount of collected information grows explosively, bringing with it the problem of processing massive data. At present, the IoT mostly adopts a centralized storage and processing model based on large servers and data centers: sensor nodes are responsible for collecting data, the data are stored centrally in a large data center, and servers then analyze and process the data. However, the hardware resources of a server have a limited capacity. To keep servers running safely and reliably, one must either continually add more and newer servers, or limit the number of devices that access the servers at the same time; the former costs a great deal of money and power, and the latter sacrifices performance. Clearly, neither is a good solution.
Therefore, the concept of a decentralized IoT has emerged. It is an autonomous IoT centered on node devices: the cloud-computing model in which servers or PCs serve as computing nodes is replaced by a model in which embedded devices serve as computing nodes, thereby realizing an energy-efficient IoT. On the other hand, MapReduce is currently the most widely adopted parallel programming model for big-data processing. It was first proposed by Google and borrows ideas from functional programming, abstracting complex parallel computation on large clusters into two functions: a Map function and a Reduce function. The Map function is responsible for decomposing a task, and the Reduce function merges the results of the decomposed subtasks. The MapReduce parallel programming model is easy to learn: it hides the low-level implementation details and thus reduces the difficulty of parallel programming, freeing programmers from tedious parallelization work and allowing them to write simple, efficient parallel programs with ease.
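By way of illustration, the two-function abstraction can be sketched in software as a map step followed by a group-and-reduce step; the template signatures and the toy example below are assumptions chosen for clarity and are not part of the invention.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Generic skeleton of the MapReduce abstraction: Map decomposes each record
// into <key, value> pairs, Reduce merges the values collected for each key.
// (Illustrative sketch only; names and signatures are assumptions.)
template <typename Record, typename Key, typename Value>
std::map<Key, Value> map_reduce(
    const std::vector<Record>& records,
    const std::function<std::vector<std::pair<Key, Value>>(const Record&)>& map_fn,
    const std::function<Value(const Key&, const std::vector<Value>&)>& reduce_fn) {
    // Group step: collect all values emitted for the same key.
    std::map<Key, std::vector<Value>> groups;
    for (const Record& r : records)
        for (const auto& kv : map_fn(r))
            groups[kv.first].push_back(kv.second);
    // Reduce step: merge each key's values into a single result.
    std::map<Key, Value> result;
    for (const auto& g : groups)
        result[g.first] = reduce_fn(g.first, g.second);
    return result;
}

int main() {
    // Toy use of the skeleton: count the total number of characters per key.
    std::vector<std::string> lines = {"map reduce", "map"};
    auto counts = map_reduce<std::string, std::string, int>(
        lines,
        [](const std::string& line) {
            std::vector<std::pair<std::string, int>> out;
            out.push_back({"chars", static_cast<int>(line.size())});
            return out;
        },
        [](const std::string&, const std::vector<int>& vals) {
            int sum = 0;
            for (int v : vals) sum += v;
            return sum;
        });
    std::cout << "chars " << counts["chars"] << "\n";  // prints: chars 13
    return 0;
}
```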
Based on the above development trends in hardware platforms and programming models, how to realize efficient MapReduce processing on embedded platforms, that is, how to accelerate the MapReduce programming model in hardware on embedded platforms, has become an urgent problem in the IoT field. Three acceleration schemes currently exist: the first accelerates MapReduce processing with general-purpose embedded multi-core processors (such as the ARM multi-core series); the second accelerates MapReduce processing with GPUs and commercial heterogeneous multi-core processors (such as IBM's Cell multi-core processor); the third implements MapReduce processing in FPGA hardware. Schemes one and two both use commercial multi-core processors to accelerate MapReduce. Whether a general-purpose multi-core processor, a GPU, or a heterogeneous multi-core processor such as the Cell, none of them is designed specifically for MapReduce, so the acceleration such schemes provide for MapReduce applications is limited and their power consumption is high. Scheme three implements the MapReduce application as a hardware circuit; its performance improves over processor-based software schemes, but the hardware development cycle is long, and once the application changes the hardware must be redesigned, which is very inflexible.
Unlike the three embedded MapReduce acceleration schemes above, the invention discloses a dedicated multi-core processor architecture based on the MapReduce programming model. Compared with general-purpose processors and commercial heterogeneous multi-core processors such as GPUs, the multi-core processor architecture of the invention is better matched to the data flow of MapReduce itself and can improve the processing speed of massive data; compared with the FPGA scheme, the multi-core processor scheme of the invention has the advantage of being software-programmable and is therefore easy to develop and use.
Summary of the invention
The technical problem to be solved by the invention is the slow processing speed of general-purpose processors when performing big-data computation. A multi-core processor architecture based on the MapReduce programming model is proposed to accelerate the processing of heavy data streams at computing nodes and to simplify the complexity of parallel programming on those nodes.
To solve the above technical problem, the invention adopts the following technical scheme:
A multi-core processor architecture based on the MapReduce programming model comprises a Split module, a global memory, several MapReduce block pipeline modules, and a Merge module;
The Split module stores the input data in the global memory, divides it into data blocks, distributes the blocks to the MapReduce block pipeline modules, and controls the operation of each MapReduce block pipeline module;
Each MapReduce block pipeline module performs pipelined processing on the data blocks it receives, transforming the raw input data into <key, values> pairs as defined by the MapReduce programming model and outputting them to the Merge module;
The Merge module merges the <key, value> results output by the MapReduce block pipeline modules that share the same key, and outputs the merged result.
As a further refinement of the multi-core processor architecture based on the MapReduce programming model of the invention, each MapReduce block pipeline module comprises a Map processor, a Reduce processor, and two data buffers, and the Map processor, the Reduce processor, and the two data buffers are connected by a shared AHB bus;
The two data buffers receive the data distributed by the Split module;
The Map processor converts the raw data in a data buffer that has received data from the Split module into <key, value> pairs according to the MapReduce programming model, and then triggers the Reduce processor to perform subsequent processing on the converted <key, value> pairs in that data buffer;
The Reduce processor merges the <key, value> pairs in the data buffer processed by the Map processor that share the same key, converting them into the <key, values> pair form.
As a further refinement of the multi-core processor architecture based on the MapReduce programming model of the invention, the Merge module comprises one top-level Merge processor and several sub-Merge processors; each sub-Merge processor is connected by a shared data bus to the data buffers of at least two MapReduce block pipeline modules, and the top-level Merge processor is connected to the global memory and to each sub-Merge processor by a shared data bus;
Each sub-Merge processor merges the <key, values> pairs with the same key that the Reduce processors have produced in the data buffers connected to it, and stores the merged <key, values> pairs in the global memory over the shared data bus;
The top-level Merge processor merges the <key, values> pairs in the global memory produced by the sub-Merge processors that share the same key, and outputs the merged result.
As a further refinement of the multi-core processor architecture based on the MapReduce programming model of the invention, the Split module, the Map processors, the Reduce processors, the top-level Merge processor, and the sub-Merge processors are connected by a dedicated control bus;
The Split module notifies the Map processor in each MapReduce block pipeline to start working by passing a message over the control bus;
Each Map processor notifies its Reduce processor to start working by passing a message over the control bus;
Each Reduce processor notifies the sub-Merge processors to start working by passing a message over the control bus;
Each sub-Merge processor notifies the top-level Merge processor to start working by passing a message over the control bus.
As a further refinement of the multi-core processor architecture based on the MapReduce programming model of the invention, the two data buffers cooperate with the Map processor and the Reduce processor in ping-pong fashion.
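By way of illustration, the notification chain over the dedicated control bus might be modeled as below; the message format, the core identifiers, and the control_bus_send routine are assumptions introduced for this sketch and are not defined by the patent.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical message format for the dedicated inter-core control bus.  The
// claims only state that work-start notifications are passed as messages over
// this bus; the fields and identifiers below are illustrative assumptions.
enum class Cmd : std::uint8_t { StartMap, StartReduce, StartSubMerge, StartTotalMerge };

struct CtrlMsg {
    Cmd          cmd;       // which stage should start working
    std::uint8_t pipeline;  // index of the MapReduce block pipeline concerned
    std::uint8_t buffer;    // which ping-pong data buffer holds the data
};

// Stand-in for the low-level control-bus primitive (assumed, not specified by
// the patent); here it simply logs the notification.
void control_bus_send(std::uint8_t dst_core, const CtrlMsg& msg) {
    std::printf("core %u <- cmd %u (pipeline %u, buffer %u)\n",
                static_cast<unsigned>(dst_core), static_cast<unsigned>(msg.cmd),
                static_cast<unsigned>(msg.pipeline), static_cast<unsigned>(msg.buffer));
}

int main() {
    // Notification chain of the claims: Split -> Map -> Reduce -> sub-Merge
    // -> total Merge, each stage triggering the next once its work is done.
    control_bus_send(/*Map core*/ 1, {Cmd::StartMap, 0, 0});
    control_bus_send(/*Reduce core*/ 2, {Cmd::StartReduce, 0, 0});
    control_bus_send(/*sub-Merge core*/ 3, {Cmd::StartSubMerge, 0, 0});
    control_bus_send(/*total Merge core*/ 4, {Cmd::StartTotalMerge, 0, 0});
    return 0;
}
```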
The invention adopts the above technical scheme and achieves the following technical effects:
1. The invention uses a distributed multi-core processor architecture to implement MapReduce applications. Compared with general-purpose multi-core processors, the architecture of the invention better matches the distributed data-flow characteristics of the MapReduce model and effectively increases the speed at which the processor handles MapReduce applications;
2. The invention uses programmable processors as the processing units of the Split, Map, Reduce, and Merge functional modules of the MapReduce model. Compared with hardware implementations such as FPGAs, the multi-core processor scheme of the invention is more flexible;
3. The MapReduce block pipeline module of the invention adopts pipelined processing based on data blocks and uses ping-pong data buffers, achieving streaming processing of data blocks;
4. The Merge module of the invention adopts a multi-level Merge scheme: the sub-Merge processors first process the results in groups and the top-level Merge processor then processes them, effectively improving the throughput of the Merge module;
5. The invention separates data from control in the inter-core interconnect, transmitting data over multi-level AHB buses and inter-core control information over a dedicated inter-core interconnect, effectively improving data throughput.
Brief description of the drawings
Fig. 1 is the architecture and flow diagram of the multi-processor system-on-chip based on the MapReduce programming model;
Fig. 2 is a schematic diagram of the architecture and data flow of the Split module;
Fig. 3 is an architecture diagram of a MapReduce block pipeline;
Fig. 4 is a schematic diagram of a WordCount example using the MapReduce programming model;
Fig. 5 is an architecture diagram of the back-end Merge module;
Fig. 6 is a data flow diagram of the back-end Merge module.
Detailed description of the invention
The invention is further described below with reference to the accompanying drawings and in conjunction with a concrete application program.
Referring to Fig. 1, the invention discloses a multi-core processor architecture based on the MapReduce programming model in which clusters of processors are connected by a hierarchical interconnect. As shown in Fig. 1, the processor consists of three main parts: the Split module, which splits the input data and distributes tasks; the MapReduce block pipeline (MapReduce Block-Pipe, abbreviated MRBP) module, composed of several block pipelines, which processes the split data blocks into the <key, value> data pairs defined by the MapReduce programming model, where key denotes the key and value denotes the count of that key; and the back-end Merge module, which merges the <key, value> data output by the MRBP module so that all data with the same key are combined into a single <key, value> pair.
Specifically, the Split module of the processor transfers the input data from the host system to the MRBP module and schedules the block pipelines in the MRBP module. When the processor operates, the Split module transfers the input data to the global memory, divides the input data into small data blocks according to the architecture configuration parameters of the processor (such as the number of block pipelines and the buffer size in each block pipeline), distributes the blocks to the buffers of the block pipelines in the MRBP module, and triggers the block pipelines in the MRBP module to start working. The MRBP module is the core processing unit of the processor: following the MapReduce programming model, it processes the raw input data into <key, values> data pairs. Finally, the Merge module processes the results of the MRBP module, combining the <key, values> data that share the same key and outputting the result.
To increase the processing speed of data blocks and simplify the implementation of the system, the multi-core processor of the invention uses multiple AHB buses to connect the Split module, the MRBP module, and the back-end Merge module, and these buses carry the MapReduce data. At the same time, to relieve data congestion on the AHB buses, the invention separates the control information exchanged among the Split module, the MRBP module, and the back-end Merge module from the MapReduce data flow, and transmits the control information between processors over a separate dedicated inter-processor communication bus. Figs. 2-6 describe the module architectures of the invention in more detail.
Referring to Fig. 2, the architecture and data flow of the Split module of the invention are described in detail. Its core is the Split processor unit (which may be a general-purpose processor or a customized application-specific processor), which loads and splits the input data and controls the execution of each MRBP module. Specifically, as shown by dashed line 1 in Fig. 2, the Split processor unit obtains the input data from the host system and stores the data in the global memory. Because the amount of data to be processed is large, the global memory can be implemented with high-capacity DDR. As shown by dashed lines 1 to 5 in Fig. 2, the Split processor unit divides the data in the global memory into data blocks of a certain size according to the system architecture configuration (such as the number of MRBPs, the size of the input data, and the size of the data buffers), and distributes them to the data buffers in each MRBP module (Fig. 2 shows two MapReduce block pipelines as an example). After the Split processor finishes transferring the data blocks, it immediately triggers the MRBP modules to start working through the inter-processor control information; dashed lines 6 and 7 in Fig. 2 represent the trigger signals between processors.
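By way of illustration only, one possible splitting and distribution policy consistent with the above description is sketched below; the configuration fields, the round-robin assignment, and the function names are assumptions introduced for this sketch and are not mandated by the patent.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Assumed architecture configuration parameters named in the description:
// the number of MRBP block pipelines and the size of each data buffer.
struct SplitConfig {
    std::size_t num_pipelines;   // number of MapReduce block pipelines
    std::size_t buffer_bytes;    // capacity of one ping-pong data buffer
};

struct Block {
    std::size_t offset;          // offset of the block in global memory
    std::size_t length;          // block length in bytes
    std::size_t pipeline;        // MRBP pipeline the block is assigned to
};

// Divide the input (already resident in global memory) into buffer-sized
// blocks and assign them to pipelines round-robin.  This is one plausible
// policy; the patent does not fix a specific splitting rule.
std::vector<Block> split_input(std::size_t input_bytes, const SplitConfig& cfg) {
    std::vector<Block> blocks;
    std::size_t pipe = 0;
    for (std::size_t off = 0; off < input_bytes; off += cfg.buffer_bytes) {
        blocks.push_back({off, std::min(cfg.buffer_bytes, input_bytes - off), pipe});
        pipe = (pipe + 1) % cfg.num_pipelines;
    }
    return blocks;
}

int main() {
    // Example: 10 KB of input, two block pipelines, 4 KB buffers.
    for (const Block& b : split_input(10 * 1024, {2, 4 * 1024}))
        std::printf("offset %zu, length %zu -> pipeline %zu\n", b.offset, b.length, b.pipeline);
    return 0;
}
```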
Referring to Fig. 3, the MRBP module of the invention is described in detail. The MRBP module is the core processing part of the multi-core processor system architecture. Following the MapReduce programming model, the MRBP module converts the raw input data blocks into <key, values> data pairs. As shown in Fig. 3, the multi-core processor system architecture of the invention consists of several MRBPs. Each MRBP consists of a Map processor, a Reduce processor, and two ping-pong data buffers, connected by a shared AHB bus, with control information transmitted over a dedicated inter-core interconnect bus of the multi-core processor. Data buffer 1 and data buffer 2 work in ping-pong fashion and are shared by the Map processor and the Reduce processor. The two ping-pong data buffers allow the Map processor and the Reduce processor to work independently and simultaneously, alternately processing the data in buffer 1 and buffer 2, thereby increasing the speed at which the MRBP processes the data stream. Within an MRBP, control signals between the Map processor and the Reduce processor are likewise transmitted over the inter-core communication network of the multi-core processor. Whenever new data are cached in either data buffer 1 or data buffer 2, the Map processor is triggered by the Split module and starts working. When the Map processor finishes processing the data in one data buffer, the Reduce processor is triggered to continue processing the data in that buffer, while the Map processor can simultaneously process the data in the other data buffer. This keeps the MRBP pipeline running smoothly and improves pipeline throughput.
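The ping-pong cooperation between the Map processor and the Reduce processor can be illustrated with the following minimal sketch, in which the placeholder functions map_block and reduce_block stand in for the work of the two cores; the alternation is shown sequentially here, whereas on the actual hardware the two stages run concurrently on separate cores and synchronize over the control bus.

```cpp
#include <array>
#include <cstdio>
#include <vector>

// Placeholder stage functions; on the real processor these run on the Map and
// Reduce cores and exchange <key, value> pairs through the shared buffer.
void map_block(std::vector<int>& buffer)    { buffer.push_back(1); std::printf("map fills a buffer\n"); }
void reduce_block(std::vector<int>& buffer) { buffer.clear();      std::printf("reduce drains a buffer\n"); }

int main() {
    // Two ping-pong data buffers shared by the Map and Reduce processors.
    std::array<std::vector<int>, 2> buffers;

    // Steady state of the block pipeline: while the Reduce processor drains
    // the buffer the Map processor finished last, the Map processor fills the
    // other buffer.
    const int num_blocks = 4;
    for (int block = 0; block < num_blocks; ++block) {
        int ping = block % 2;        // buffer currently filled by Map
        int pong = 1 - ping;         // buffer currently drained by Reduce
        map_block(buffers[ping]);
        if (block > 0) reduce_block(buffers[pong]);
    }
    reduce_block(buffers[(num_blocks - 1) % 2]);   // drain the final buffer
    return 0;
}
```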
As shown in Fig. 4, take the application that counts the occurrences of each word in a text (WordCount) as an example. Following the MapReduce programming model, the Map processor unit first processes the raw data block, producing intermediate <key, value> pairs, and sorts these pairs by key. The Reduce processor then further merges the results of the Map processor, producing the merged <key, value> pairs. The multi-core processor of the invention provides the user with an execution platform based on the MapReduce programming model; the user can program the Map processor and the Reduce processor according to the specific algorithm.
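A minimal software sketch of the WordCount example of Fig. 4 is given below; user code of this kind would be written for the Map and Reduce processors, but the function names and the host-side test data are illustrative assumptions rather than the patent's programming interface.

```cpp
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Map stage: emit <word, 1> for every word in the block, then sort by key,
// matching the intermediate result shown in Fig. 4.
std::vector<KV> word_count_map(const std::string& block) {
    std::vector<KV> pairs;
    std::istringstream in(block);
    std::string word;
    while (in >> word) pairs.push_back({word, 1});
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}

// Reduce stage: merge the sorted pairs, summing the values of equal keys.
std::vector<KV> word_count_reduce(const std::vector<KV>& sorted_pairs) {
    std::vector<KV> merged;
    for (const KV& kv : sorted_pairs) {
        if (!merged.empty() && merged.back().first == kv.first)
            merged.back().second += kv.second;
        else
            merged.push_back(kv);
    }
    return merged;
}

int main() {
    const std::string block = "deer bear river car car river deer car bear";
    for (const KV& kv : word_count_reduce(word_count_map(block)))
        std::cout << kv.first << " " << kv.second << "\n";
    // prints: bear 2, car 3, deer 2, river 2
    return 0;
}
```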
Referring to Figs. 5 and 6, the Merge module of the invention is described in detail. The Merge module further merges the results of all the Reduce processors in the MRBP module: all <key, values> data in the results that share the same key are combined into a single <key, values> pair and stored in the global memory. To improve the throughput of the Merge module, the invention adopts a two-level Merge mechanism. The block pipelines in the MRBP module are divided into groups of equal size, each group containing two or more pipelines, and the data buffers of each group of block pipelines are connected to a fixed sub-Merge processor by a shared AHB bus. Each sub-Merge processor further merges the results of the block pipelines connected to it (i.e., the results of the Reduce processors in those block pipelines) and stores its result in the global memory. The second level consists of a single top-level Merge processor, which further merges the results stored in the global memory by all the first-level sub-Merge processors and outputs the final result. In the example shown in Fig. 5 there are four block pipelines: MRBP1 and MRBP2 form one group, and MRBP3 and MRBP4 form another. Merge1 and Merge2 are first-level sub-Merge processors; Merge1 is responsible for the data blocks in MRBP1 and MRBP2, and Merge2 is responsible for the data blocks in MRBP3 and MRBP4. Both Merge1 and Merge2 save their results to the global memory. Merge3 is the top-level Merge processor of the two-level Merge structure; it further merges the results of the first-level Merge processors (Merge1 and Merge2) into the final result. The invention implements the two-level Merge mechanism with multi-level AHB buses: each first-level Merge processor is connected to the data buffers of its MRBPs by a shared AHB bus, and the global memory is connected to and shared by all first-level sub-Merge processors. Likewise, the control information between the sub-Merge processors and the top-level Merge processor is transmitted over the inter-core communication network of the multi-core processor.
Fig. 6 describes the processing flow of the Merge module of the invention in more detail. Block pipelines MRBP1 and MRBP2 form one group, handled by sub-Merge processor Merge1 in the Merge module; block pipelines MRBP3 and MRBP4 form another group, handled by sub-Merge processor Merge2; Merge3 is the top-level Merge processor in the Merge module. Because the time spent by the Split module partitioning and transferring the data is small, the pipelines in the MRBP module can all start working almost simultaneously. After the data have been processed by the MRBPs, the sub-Merge processors (Merge1 and Merge2) are triggered; they merge the results of their block pipelines (Merge1 handles MRBP1 and MRBP2, Merge2 handles MRBP3 and MRBP4) and store the merged results in the global memory. Finally, the top-level Merge processor (Merge3) merges all the results.
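The two-level merge can be illustrated with the following sketch, in which the same key-wise merging routine plays the role of the sub-Merge processors (Merge1, Merge2) at the first level and of the top-level Merge processor (Merge3) at the second level; the data layout and the toy inputs are assumptions made for the example.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Merge several partial result lists, summing the values of identical keys.
// The same routine serves both levels: the sub-Merge processors apply it to
// the Reduce outputs of their pipelines, and the top-level Merge processor
// applies it to the sub-Merge results fetched from global memory.
std::vector<KV> merge_results(const std::vector<std::vector<KV>>& partials) {
    std::map<std::string, int> acc;               // keyed accumulation
    for (const auto& part : partials)
        for (const KV& kv : part) acc[kv.first] += kv.second;
    return {acc.begin(), acc.end()};
}

int main() {
    // Reduce outputs of four hypothetical block pipelines (toy data).
    std::vector<KV> mrbp1 = {{"car", 2}, {"deer", 1}};
    std::vector<KV> mrbp2 = {{"bear", 1}, {"car", 1}};
    std::vector<KV> mrbp3 = {{"deer", 1}, {"river", 2}};
    std::vector<KV> mrbp4 = {{"bear", 1}};

    // Level 1: sub-Merge processors, one per group of two pipelines.
    std::vector<KV> merge1 = merge_results({mrbp1, mrbp2});   // MRBP1 + MRBP2
    std::vector<KV> merge2 = merge_results({mrbp3, mrbp4});   // MRBP3 + MRBP4

    // Level 2: the top-level Merge processor combines the sub-Merge results.
    for (const KV& kv : merge_results({merge1, merge2}))
        std::cout << kv.first << " " << kv.second << "\n";
    // prints: bear 2, car 3, deer 2, river 2
    return 0;
}
```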
Compared with the Split module, the Merge module has higher computational complexity and its processing takes longer. The first-level Merge processing is relatively short and its results are already stored in the global memory, so even if the second-level Merge processing grows longer as the sub-Merge results accumulate, the MapReduce pipelines can continue to run smoothly, unaffected by the time consumed by the second-level Merge processor. The two-level Merge scheme is therefore necessary to guarantee that the block pipelines run smoothly. More importantly, the hardware structure of the invention can be adjusted according to the target algorithm and the size of the data set, using more Merge levels to obtain further acceleration.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, should not be interpreted in an idealized or overly formal sense.
The detailed description above further explains the objects, technical scheme, and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the invention and does not limit the invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall be included within the protection scope of the invention.

Claims (5)

1. A multi-core processor architecture based on the MapReduce programming model, characterized in that it comprises a Split module, a global memory, several MapReduce block pipeline modules, and a Merge module;
The Split module stores the input data in the global memory, divides it into data blocks, distributes the blocks to the MapReduce block pipeline modules, and controls the operation of each MapReduce block pipeline module;
Each MapReduce block pipeline module performs pipelined processing on the data blocks it receives, transforming the raw input data into <key, values> pairs as defined by the MapReduce programming model and outputting them to the Merge module;
The Merge module merges the <key, value> results output by the MapReduce block pipeline modules that share the same key, and outputs the merged result.
2. The multi-core processor architecture based on the MapReduce programming model according to claim 1, characterized in that each MapReduce block pipeline module comprises a Map processor, a Reduce processor, and two data buffers, and the Map processor, the Reduce processor, and the two data buffers are connected by a shared AHB bus;
The two data buffers receive the data distributed by the Split module;
The Map processor converts the raw data in a data buffer that has received data from the Split module into <key, value> pairs according to the MapReduce programming model, and then triggers the Reduce processor to perform subsequent processing on the converted <key, value> pairs in that data buffer;
The Reduce processor merges the <key, value> pairs in the data buffer processed by the Map processor that share the same key, converting them into the <key, values> pair form.
3. The multi-core processor architecture based on the MapReduce programming model according to claim 2, characterized in that the Merge module comprises one top-level Merge processor and several sub-Merge processors; each sub-Merge processor is connected by a shared data bus to the data buffers of at least two MapReduce block pipeline modules, and the top-level Merge processor is connected to the global memory and to each sub-Merge processor by a shared data bus;
Each sub-Merge processor merges the <key, values> pairs with the same key that the Reduce processors have produced in the data buffers connected to it, and stores the merged <key, values> pairs in the global memory over the shared data bus;
The top-level Merge processor merges the <key, values> pairs in the global memory produced by the sub-Merge processors that share the same key, and outputs the merged result.
4. The multi-core processor architecture based on the MapReduce programming model according to claim 3, characterized in that the Split module, the Map processors, the Reduce processors, the top-level Merge processor, and the sub-Merge processors are connected by a dedicated control bus;
The Split module notifies the Map processor in each MapReduce block pipeline to start working by passing a message over the control bus;
Each Map processor notifies its Reduce processor to start working by passing a message over the control bus;
Each Reduce processor notifies the sub-Merge processors to start working by passing a message over the control bus;
Each sub-Merge processor notifies the top-level Merge processor to start working by passing a message over the control bus.
5. The multi-core processor architecture based on the MapReduce programming model according to claim 2, characterized in that the two data buffers cooperate with the Map processor and the Reduce processor in ping-pong fashion.
CN201510954803.5A 2015-12-17 2015-12-17 Multi-core processor architecture based on MapReduce programming model Pending CN105608046A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510954803.5A CN105608046A (en) 2015-12-17 2015-12-17 Multi-core processor architecture based on MapReduce programming model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510954803.5A CN105608046A (en) 2015-12-17 2015-12-17 Multi-core processor architecture based on MapReduce programming model

Publications (1)

Publication Number Publication Date
CN105608046A true CN105608046A (en) 2016-05-25

Family

ID=55987993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510954803.5A Pending CN105608046A (en) 2015-12-17 2015-12-17 Multi-core processor architecture based on MapReduce programming model

Country Status (1)

Country Link
CN (1) CN105608046A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011120791A1 (en) * 2010-03-31 2011-10-06 International Business Machines Corporation Transmission of map-reduce data based on a storage network or a storage network file system
CN102662639A (en) * 2012-04-10 2012-09-12 南京航空航天大学 Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method
CN104536937A (en) * 2014-12-30 2015-04-22 深圳先进技术研究院 Big data appliance realizing method based on CPU-GPU heterogeneous cluster

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294589A (en) * 2016-07-28 2017-01-04 中国联合网络通信集团有限公司 Multilist data processing method based on MapReduce and system
CN108519981A (en) * 2018-02-01 2018-09-11 四川大学 A kind of decentralization data sharing method of highly effective and safe
CN108519981B (en) * 2018-02-01 2022-04-12 四川大学 Cross-chain intelligent contract cooperation possibility evaluation method
CN108804377A (en) * 2018-04-24 2018-11-13 桂林长海发展有限责任公司 A kind of bus task processing method and system

Similar Documents

Publication Publication Date Title
Ma et al. Garaph: Efficient {GPU-accelerated} graph processing on a single machine with balanced replication
Shao et al. Simba: Scaling deep-learning inference with multi-chip-module-based architecture
US8959138B2 (en) Distributed data scalable adaptive map-reduce framework
CN111433758B (en) Programmable operation and control chip, design method and device thereof
CN102135950B (en) On-chip heterogeneous multi-core system based on star type interconnection structure, and communication method thereof
CN102508816B (en) Configuration method applied to coarse-grained reconfigurable array
CN104820657A (en) Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
CN108563808A (en) The design method of heterogeneous reconfigurable figure computation accelerator system based on FPGA
CN107329828A (en) A kind of data flow programmed method and system towards CPU/GPU isomeric groups
CN109656861A (en) A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus
CN105608046A (en) Multi-core processor architecture based on MapReduce programming model
Senger et al. BSP cost and scalability analysis for MapReduce operations
Li et al. The overview of big data storage and management
Yang et al. ISOSceles: Accelerating sparse CNNs through inter-layer pipelining
Zhang et al. EPA: The effective pipeline architecture for CNN accelerator with high performance and computing efficiency based on FPGA
CN104299170B (en) Intermittent energy source mass data processing method
Liang et al. A Survey on Spatio-temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications
CN103093446A (en) Multi-source image fusion device and method based on on-chip system of multiprocessor
Say et al. A reconfigurable computing platform for real time embedded applications
CN103116527A (en) Super-large-scale barrier synchronization method based on network controller
CN106909433B (en) A kind of D-S evidence theory algorithm accelerated method based on Zynq Series FPGA
Zhang et al. An FPGA-based neural network overlay for ADAS supporting multi-model and multi-mode
Ueno et al. VCSN: Virtual circuit-switching network for flexible and simple-to-operate communication in HPC FPGA cluster
CN110688209B (en) Binary tree-based large-window access flow scheduling buffer structure and method
Zhang et al. An effective 2-dimension graph partitioning for work stealing assisted graph processing on multi-FPGAs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160525