CN106371919B - A shuffle data caching method based on the map-reduce computation model - Google Patents

A shuffle data caching method based on the map-reduce computation model

Info

Publication number
CN106371919B
CN106371919B (granted from application CN201610712705.5A)
Authority
CN
China
Prior art keywords
shuffling
reduction
node
data
mapping
Prior art date
Legal status
Active
Application number
CN201610712705.5A
Other languages
Chinese (zh)
Other versions
CN106371919A (en)
Inventor
付周望
王一丁
戚正伟
管海兵
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610712705.5A priority Critical patent/CN106371919B/en
Publication of CN106371919A publication Critical patent/CN106371919A/en
Application granted granted Critical
Publication of CN106371919B publication Critical patent/CN106371919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management


Abstract

The invention discloses a shuffle data caching method based on the map-reduce computation model. The map-reduce computing framework sends the per-task division of a map-reduce job through an interface to a shuffle cache master; after the shuffle cache master receives the task division data, it attaches a timestamp and stores the data in local memory. The shuffle cache master then uses a random algorithm to build a one-to-three mapping from each reduce task in the division to nodes of the cluster, and stores the mapping as a hash table in the shuffle cache master's memory, among other steps. The invention improves the computational performance of distributed computing frameworks based on the map-reduce model, spares users from inefficient manual checkpointing, and enhances the robustness of the distributed computing framework.

Description

A shuffle data caching method based on the map-reduce computation model
Technical field
The present invention relates to the fields of distributed computer systems and distributed computing frameworks. Specifically, it provides a distributed, memory-based cache for shuffle data in the map (Map)-reduce (Reduce) computation model, thereby improving the performance and robustness of the computing framework.
Background technique
The map-reduce computation model and the distributed computing systems designed on it, such as Spark and Hadoop, are the mainstream big-data distributed systems of today. In computation based on this model, a shuffle (Shuffle) phase separates the map phase from the reduce phase. All current designs persist shuffle data by writing it to disk before transmitting it. Because disk performance is far below that of memory, this imposes a considerable performance cost on the computing system.
Meanwhile, computing frameworks of this type guarantee fault tolerance either through disk (Hadoop) or by requiring users to insert checkpoints manually (Spark). Because these fault-tolerance mechanisms are interleaved with the computation logic, they not only fail to make full use of existing hardware features but also impair the performance of the computation itself.
Although some memory-based distributed file systems exist, they target the data blocks themselves, and the volume of data blocks is often far larger than that of shuffle data, so a large amount of memory is required as support. Against this background, the present invention provides a distributed, memory-based shuffle data caching method that eliminates the performance cost of shuffle transmission and of disk-based fault-tolerance mechanisms, improving the performance and robustness of the computing framework.
Summary of the invention
The present invention targets distributed computing systems based on the map-reduce model. By caching shuffle transmission data in the memory of the distributed system, it eliminates the performance cost of shuffle transmission and of disk-based fault-tolerance mechanisms. The technical solution of the invention is as follows:
A shuffle data caching method for the map-reduce computation model comprises the following steps:
Step 1: The map-reduce computing framework sends the per-task division of a map-reduce job through an interface to the shuffle cache master. Each task here contains the ID of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After the master receives it, it attaches a timestamp and stores it in local memory.
Step 2: After the shuffle cache master receives the job division data, it uses a random algorithm to build a one-to-three mapping from each reduce task to nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the primary node and the remaining two are backup nodes. The mapping from reduce tasks to nodes is stored as a hash table in the master's memory.
Step 3: The computing framework schedules one of the nodes to execute a map task. After the node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of the map task to the memory space of the local shuffle cache executor process, and returns immediately to indicate that the task has completed.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it partitions the data by reduce task into multiple shuffle-reduce data blocks according to the default partitioning scheme for shuffle data (specified by the computing framework) and keeps them in memory. A map task usually produces a number of data blocks equal to or less than the number of reduce tasks.
Step 5: The executor requests the mapping table of reduce tasks to nodes from the shuffle cache master (this request is made only once in an entire map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent. According to the master's mapping table of reduce tasks to nodes, the executor distributes the shuffle-reduce data blocks partitioned in step 4 to the three remote nodes corresponding to each reduce task. When transmitting a shuffle-reduce data block, the executor labels it as a primary or backup copy according to the primary/backup node assignment in step 2.
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label marks a primary copy, the node keeps the block in memory; if it marks a backup copy, the node writes the block to hard disk.
Step 7: Steps 3 to 6 are repeated until all map tasks of the job have finished; then proceed to step 8.
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks through the interface of the shuffle cache system.
Step 9: The computing framework schedules reduce tasks according to their distribution. It first chooses the primary node and dispatches the reduce task to that node. If the primary node has failed, go to step 10; otherwise go to step 11.
Step 10: The computing framework selects one of the backup nodes and sends the reduce task to it. If both backup nodes have failed simultaneously, the task fails and an error is thrown, terminating all steps.
Step 11: While executing on a node, the reduce task obtains its data from the local shuffle cache executor through the interface.
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If the corresponding data is in memory it is returned directly; otherwise the executor fetches the corresponding data from disk and returns it.
Step 13: The reduce task starts computing after receiving the data.
Step 14: Steps 9 to 13 are repeated until all reduce tasks have finished; the map-reduce job then ends.
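The one-to-three assignment of step 2 and its hash-table representation can be sketched as follows. This is a minimal illustration under assumed names (`assign_reduce_tasks` and the record fields are not from the patent); the patent specifies only that three random nodes are chosen per reduce task, with one primary and two backups, and that the master timestamps its records:

```python
import random
import time

def assign_reduce_tasks(reduce_task_ids, cluster_nodes, seed=None):
    """One-to-three mapping of reduce tasks to cluster nodes (step 2):
    each reduce task gets three distinct random nodes, the first as
    primary and the other two as backups, stored in a hash table."""
    rng = random.Random(seed)
    table = {}
    for task_id in reduce_task_ids:
        primary, backup_a, backup_b = rng.sample(cluster_nodes, 3)
        table[task_id] = {
            "primary": primary,
            "backups": [backup_a, backup_b],
            "timestamp": time.time(),  # the master timestamps each record
        }
    return table
```

Because the three nodes are drawn without replacement, the primary and both backup copies of each reduce task's data always land on distinct machines; this sketch therefore assumes a cluster of at least three nodes.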
Cache replacement policy of the shuffle cache system:
Because the memory resources of each node are limited, the shuffle cache system may occupy only a fixed amount of memory space (configurable via a configuration file) so as not to affect performance during task execution. However, as tasks execute continuously, the memory of the primary and backup nodes fills up with cached shuffle-reduce data. To save memory, the shuffle cache system provides an earliest-cached, first-evicted policy, which follows these steps.
Step 1: An executing node of the shuffle cache system detects that its remaining memory is insufficient.
Step 2: That node sends an eviction request to the shuffle cache master.
Step 3: After the shuffle cache master receives the eviction request, it consults its local memory records to find the shuffle transmission dependency ID of the earliest cached map-reduce job and the primary nodes of all reduce tasks of that job.
Step 4: The shuffle cache system broadcasts that shuffle transmission dependency ID to the primary nodes of all shuffle cache systems in the cluster.
Step 5: After receiving the shuffle transmission dependency ID, each executing node deletes the corresponding data blocks from its own memory.
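The earliest-cached, first-evicted policy can be sketched as a master-side routine. The class and method names here are illustrative (the patent defines no API): on an eviction request, the master picks the job with the oldest timestamp and broadcasts its dependency ID to the affected primary nodes (steps 3-4), which then delete the blocks (step 5):

```python
import time

class ShuffleCacheMaster:
    """Minimal sketch of the earliest-cached, first-evicted policy."""

    def __init__(self):
        # shuffle-dependency ID -> (cache timestamp, primary nodes of its reduce tasks)
        self.jobs = {}

    def register_job(self, dep_id, primary_nodes):
        self.jobs[dep_id] = (time.time(), set(primary_nodes))

    def handle_eviction_request(self, broadcast):
        """Find the earliest cached job and broadcast its dependency ID
        to the primary nodes holding its blocks (steps 3-4)."""
        if not self.jobs:
            return None
        dep_id = min(self.jobs, key=lambda d: self.jobs[d][0])
        _, primaries = self.jobs.pop(dep_id)
        broadcast(dep_id, primaries)  # nodes delete the blocks on receipt (step 5)
        return dep_id
```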
The recovery policy by which the shuffle cache system supports the robustness of the computing framework:
A map-reduce computing framework goes through a large number of map-reduce passes when executing a complete workflow. With the cooperation of the shuffle cache system, the computing framework no longer needs to checkpoint intermediate data manually. If a failure occurs during task execution, computation can resume directly from the data of the most recent map-reduce pass, which greatly reduces recovery time and improves computational performance. The policy follows these steps.
Step 1: A run-time error occurs in the computing framework.
Step 2: Following the user's execution logic, the computing framework searches backwards for the most recently persisted data.
Step 3: During the search, the computing framework queries the shuffle cache system through the interface for a corresponding shuffle data backup.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts from the computing framework's own fault-tolerance mechanism.
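The backward search of this recovery policy can be sketched as a small routine; every function name here is an illustrative stand-in for the framework's real interfaces:

```python
def recover(lineage_steps, has_shuffle_backup, framework_recover, restart_from):
    """Walk the user's execution logic from the most recent step backwards
    (step 2), querying the shuffle cache for a backup at each step (step 3).
    Restart from the first step that has a backup (step 4); otherwise fall
    back to the framework's own fault tolerance (step 5)."""
    for step in reversed(lineage_steps):
        if has_shuffle_backup(step):
            return restart_from(step)
    return framework_recover()
```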
Compared with the prior art, the beneficial effects of the present invention are: it improves the computational performance of distributed computing frameworks based on the map-reduce model (such as Spark and Hadoop), spares users from inefficient manual checkpointing, and enhances the robustness of the distributed computing framework.
Brief description of the drawings
Fig. 1: Architecture diagram
Fig. 2: Map task operation diagram
Fig. 3: Primary-node reduce task execution diagram
Fig. 4: Backup-node reduce task execution diagram
Fig. 5: Task division information
Fig. 6: Shuffle cache master record information
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the drawings. This embodiment is implemented on the premise of the technical solution and algorithm of the present invention and gives a detailed implementation and concrete operating procedures, but the applicable platform is not limited to the following embodiment. The concrete operating platform of this example is a small cluster composed of two ordinary servers, each equipped with 64-bit Ubuntu Server 14.04.1 LTS and 8 GB of memory. The concrete development of the invention is based on the source code of Apache Spark 1.6 as an illustration; other map-reduce distributed computing frameworks such as Hadoop are equally applicable. It is first necessary to modify the Spark source code so that it transmits shuffle data through the interface of this method.
The present invention deploys a caching system in the distributed computing cluster and modifies a small part of the distributed computing framework's code to call the interface of this method, realizing distributed multi-copy memory/disk caching of shuffle data in map-reduce computation. With the support of this method, the performance and robustness of existing distributed computing frameworks based on the map-reduce model can be improved. The design is based on the architecture in Fig. 1: the master of the shuffle cache system uses the Paxos protocol to maintain the consistency and robustness of its own state. Meanwhile, a shuffle executor is deployed on each node of the distributed computing cluster to handle the transmission of shuffle data and to provide interfaces for the worker processes of the distributed computing framework. In the map phase, as shown in Fig. 2, the distributed computing framework transfers shuffle data through the interface to the memory of the local shuffle executor; the shuffle executor then partitions the data according to parameters such as the number of tasks of the map-reduce job, selects backup nodes, and backs the data up. In the reduce phase, the distributed computing framework requests data directly from the local shuffle executor through the interface. When the primary node is working, the data comes from the executor's memory, as shown in Fig. 3. If the primary node fails, the task is scheduled to a backup node and the data is read from that node's hard disk, as shown in Fig. 4.
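The map-phase partitioning performed by the shuffle executor can be sketched as follows; this is a minimal illustration assuming a plain hash partitioner as a stand-in for the framework-specified scheme, with hypothetical names throughout:

```python
def partition_shuffle_output(records, num_reduce_tasks, partitioner=hash):
    """Split a map task's shuffle output into per-reduce-task blocks.
    The partitioner stands in for the framework-specified scheme."""
    blocks = {}
    for key, value in records:
        rid = partitioner(key) % num_reduce_tasks
        blocks.setdefault(rid, []).append((key, value))
    # At most num_reduce_tasks blocks result: a map task produces a
    # number of blocks equal to or less than the number of reduce tasks.
    return blocks
```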
Referring first to Fig. 1, the architecture diagram: as shown, the overall architecture of the invention is a typical master-slave architecture, in which the master consists of one working master and two backup masters. They guarantee state consistency through the Paxos protocol, thereby avoiding a collapse of the whole system caused by a crash of the working master. In addition, a shuffle cache executor is deployed on every slave-node server. The modified Spark computing framework must also be deployed in the same cluster.
When the Spark computing framework starts working, once a task containing a shuffle transmission is submitted by a user, the shuffle cache system enters the working steps described in the claims, providing Spark with acceleration and robustness support for shuffle transmission. The whole process is fully transparent to the Spark user.
Because this embodiment makes full use of memory to cache data, at the end of the map tasks a reduce task can read the required data directly from the memory of the local shuffle cache executor, accelerating the entire distributed computation.
Meanwhile, when an error occurs in some step of a Spark run and recovery is needed, the framework can search backwards for data cached by the shuffle cache system and then start recovery from there, accelerating recovery and improving the robustness of the whole system.
A shuffle data caching method based on the map-reduce computation model comprises the following steps:
Cooperative operation flow of the computing framework and the shuffle cache system:
Step 1: The map-reduce computing framework sends the per-task division of a map-reduce job through an interface to the shuffle cache master. Each task here contains the ID of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After the master receives it, it attaches a timestamp and stores it in local memory, as shown in Fig. 5.
Step 2: After the shuffle cache master receives the task division data, it uses a random algorithm to build a one-to-three mapping from each reduce task to nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the primary node and the remaining two are backup nodes. The mapping from reduce tasks to nodes is stored as a hash table in the master's memory, and the master timestamps each record. The specific saved information is shown in Fig. 6.
Step 3: The computing framework schedules one of the nodes to execute a map task. After the node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of the map task to the memory space of the local shuffle cache executor process, and returns immediately to indicate that the task has completed.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it partitions the data by reduce task into multiple shuffle-reduce data blocks according to the default partitioning scheme for shuffle data (specified by the computing framework) and keeps them in memory. A map task usually produces a number of data blocks equal to or less than the number of reduce tasks.
Step 5: The executor requests from the shuffle cache master the mapping table of reduce tasks to nodes together with the information shown in Fig. 6 (this step is executed only once in an entire map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent.
Step 6: According to the master's mapping table of reduce tasks to nodes, the executor distributes the shuffle-reduce data blocks partitioned in step 4 to the three remote nodes corresponding to each reduce task.
Step 7: When transmitting a shuffle-reduce data block, the executor labels it as a primary or backup copy according to the primary/backup node assignment in step 2.
Step 8: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label marks a primary copy, the node keeps the block in memory; if it marks a backup copy, the node writes the block to hard disk.
Step 9: Steps 3 to 8 are repeated until all map tasks of the job have finished; then proceed to step 10.
Step 10: Before scheduling, the computing framework queries the distribution of all reduce tasks through the interface of the shuffle cache system.
Step 11: The computing framework schedules reduce tasks according to their distribution. It first chooses the primary node and dispatches the reduce task to that node. If the primary node has failed, go to step 12; otherwise go to step 13.
Step 12: The computing framework selects one of the backup nodes and sends the reduce task to it. If both backup nodes have failed simultaneously, the task fails and an error is thrown, terminating all steps.
Step 13: While executing on a node, the reduce task obtains its data from the local shuffle cache executor through the interface.
Step 14: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If the corresponding data is in memory it is returned directly; otherwise the executor fetches the corresponding data from disk and returns it.
Step 15: The reduce task starts computing after receiving the data.
Step 16: Steps 11 to 15 are repeated until all reduce tasks have finished; proceed to step 17.
Step 17: The map-reduce job ends.
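The executor's read path in steps 14-15 (memory first, disk backup as fallback) can be sketched as below; the class layout, field names, and on-disk format are illustrative assumptions, not the patent's implementation:

```python
import os
import pickle

class ShuffleCacheExecutor:
    """Sketch of the per-node executor's store and read paths."""

    def __init__(self, spill_dir):
        self.memory = {}            # reduce-task ID -> block (primary copies)
        self.spill_dir = spill_dir  # backup copies are written to hard disk

    def _disk_path(self, task_id):
        return os.path.join(self.spill_dir, f"{task_id}.blk")

    def store(self, task_id, block, is_primary):
        if is_primary:
            self.memory[task_id] = block      # primary copy stays in memory
        else:
            with open(self._disk_path(task_id), "wb") as f:
                pickle.dump(block, f)         # backup copy goes to disk

    def fetch(self, task_id):
        if task_id in self.memory:            # memory hit: return directly
            return self.memory[task_id]
        with open(self._disk_path(task_id), "rb") as f:
            return pickle.load(f)             # miss: read the disk backup
```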
Cache replacement policy of the shuffle cache system:
Because the memory resources of each node are limited, the shuffle cache system may occupy only a fixed amount of memory space (configurable via a configuration file) so as not to affect performance during task execution. However, as tasks execute continuously, the memory of the primary and backup nodes fills up with cached shuffle-reduce data. To save memory, the shuffle cache system provides an earliest-cached, first-evicted policy, which follows these steps.
Step 1: An executing node of the shuffle cache system detects that its remaining memory is insufficient.
Step 2: That node sends an eviction request to the shuffle cache master.
Step 3: After the shuffle cache master receives the eviction request, it consults its local memory records to find the shuffle transmission dependency ID of the earliest cached map-reduce job and the primary nodes of all reduce tasks of that job.
Step 4: The shuffle cache system broadcasts that shuffle transmission dependency ID to the primary nodes of all shuffle cache systems in the cluster.
Step 5: After receiving the shuffle transmission dependency ID, each executing node deletes the corresponding data blocks from its own memory.
The recovery policy by which the shuffle cache system supports the robustness of the computing framework:
A map-reduce computing framework goes through a large number of map-reduce passes when executing a complete workflow. With the cooperation of the shuffle cache system, the computing framework no longer needs to checkpoint intermediate data manually. If a failure occurs during task execution, computation can resume directly from the data of the most recent map-reduce pass, which greatly reduces recovery time and improves computational performance. The policy follows these steps.
Step 1: A run-time error occurs in the computing framework.
Step 2: Following the user's execution logic, the computing framework searches backwards for the most recently persisted data.
Step 3: During the search, the computing framework queries the shuffle cache system through the interface for a corresponding shuffle data backup.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts from the computing framework's own fault-tolerance mechanism.
On the basis of this embodiment, Spark benchmark programs such as Word Count were used to verify the correctness of the invention; at the same time, the invention achieves performance improvements of varying degrees over master-branch Spark on different benchmark programs.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative labour. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of the present invention shall fall within the scope of protection determined by the claims.

Claims (3)

1. A shuffle data caching method based on the map-reduce computation model, characterized in that the method comprises the following steps:
Step 1: The map-reduce computing framework sends the per-task division of a map-reduce job through an interface to the shuffle cache master; after the shuffle cache master receives the task division data, it attaches a timestamp and stores the data in local memory;
Step 2: The shuffle cache master uses a random algorithm to build a one-to-three mapping from each reduce task in the task division data to nodes of the cluster, and stores the mapping as a hash table in the shuffle cache master's memory; each reduce task corresponds to three random nodes, one of which is the primary node and the remaining two are backup nodes; Step 3: The computing framework schedules one of the nodes to execute a map task; after the node has executed the map task, it calls the interface of the caching system to send the shuffle data of the map task to the memory space of the local shuffle cache executor process, and returns at the same time to indicate that the task execution is complete;
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it partitions the data by reduce task into multiple shuffle-reduce data blocks according to the default partitioning scheme for shuffle data and keeps them in memory;
Step 5: The local shuffle cache executor requests the mapping table of reduce tasks to nodes from the shuffle cache master and, according to that mapping table, distributes the shuffle-reduce data blocks partitioned in step 4 to the three remote nodes corresponding to each reduce task, adding a primary-copy or backup-copy label to each data block according to the primary/backup node assignment in step 2;
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label; if the label marks a primary copy, the node keeps the block in memory; if the label marks a backup copy, the node writes the block to hard disk; if the memory space of the primary node is insufficient at this point, the shuffle data eviction step of the shuffle cache system is triggered; then proceed to step 7;
Step 7: Steps 3 to 6 are repeated until all map tasks of the job have finished; then proceed to step 8;
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks through the interface of the shuffle cache system;
Step 9: The computing framework schedules reduce tasks according to their distribution: it first chooses the primary node and dispatches a reduce task to that node; if the primary node has failed, go to step 10; otherwise go to step 11;
Step 10: The computing framework selects one of the backup nodes and sends the reduce task to it; if both backup nodes have failed simultaneously, the task fails, an error is thrown, and all steps terminate;
Step 11: While executing on a node, the reduce task obtains its data from the local shuffle cache executor through the interface;
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory; if the corresponding data is in memory it is returned directly to the task; otherwise the executor fetches the corresponding data from disk and returns it;
Step 13: The reduce task starts computing after receiving the data;
Step 14: Steps 9 to 13 are repeated until all reduce tasks have finished; the map-reduce job then ends.
2. The shuffle-data caching method based on the map-reduce computation model according to claim 1, characterized in that the shuffle-data eviction procedure is as follows:
Step 1: a shuffle-cache executor node, i.e. a primary-backup node, detects that its remaining memory is insufficient;
Step 2: that backup node sends an eviction request to the shuffle-cache host;
Step 3: after receiving the eviction request, the shuffle-cache host consults its local memory records to find the shuffle transfer dependency ID of the earliest-cached map-reduce job and the primary-backup nodes of all reduce tasks of that job;
Step 4: the shuffle-cache system broadcasts the shuffle transfer dependency ID to the primary-backup nodes of all shuffle-cache systems in the cluster;
Step 5: after receiving the shuffle transfer dependency ID, each executor node deletes the corresponding data blocks from its own memory.
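The eviction procedure above amounts to FIFO eviction keyed by shuffle transfer dependency ID. A minimal sketch under assumed names (`ShuffleCacheHost`, `Executor` and their methods are illustrative, not from the patent):

```python
from collections import OrderedDict

class Executor:
    """Hypothetical shuffle-cache executor holding blocks keyed by dependency ID."""
    def __init__(self):
        self.memory = {}  # dependency ID -> cached shuffle blocks

    def evict(self, dep_id):
        # Step 5: delete the corresponding data blocks from local memory
        self.memory.pop(dep_id, None)

class ShuffleCacheHost:
    """Hypothetical shuffle-cache host tracking jobs in caching order."""
    def __init__(self, executors):
        self.jobs = OrderedDict()  # insertion order = caching order of jobs
        self.executors = executors

    def register_job(self, dep_id):
        self.jobs[dep_id] = True

    def handle_evict_request(self):
        # Step 3: find the shuffle transfer dependency ID of the earliest-cached job
        dep_id, _ = self.jobs.popitem(last=False)
        # Step 4: broadcast the ID to every shuffle-cache executor in the cluster
        for executor in self.executors:
            executor.evict(dep_id)
        return dep_id

# Usage: the earliest-registered job is the one evicted cluster-wide.
host = ShuffleCacheHost([Executor(), Executor()])
host.register_job("job-0-shuffle-dep")
host.register_job("job-1-shuffle-dep")
print(host.handle_evict_request())
```

Evicting whole jobs (rather than individual blocks) keeps the broadcast cheap: a single dependency ID identifies every block of that job on every node.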
3. The shuffle-data caching method based on the map-reduce computation model according to claim 1, characterized in that when a runtime error occurs in the computing framework, the shuffle-cache system provides a robust recovery strategy for the computing framework, as follows:
Step 1: following the user's execution logic, the computing framework searches from the most recent step backward for the most recently persisted data;
Step 2: during the search, the computing framework queries the shuffle-cache system through the interface for a corresponding shuffle data backup: if a backup is found, recovery starts directly from that step;
if no backup is found, the search continues toward earlier steps; once it is confirmed that no backup exists at all, recovery falls back to the computing framework's own fault-tolerance mechanism.
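The recovery search can be sketched as a back-to-front scan over the user's execution steps. The function and parameter names are assumptions for illustration; `has_backup` stands in for the interface query to the shuffle-cache system:

```python
def find_recovery_point(steps, has_backup):
    """Scan execution steps from the most recent backward (Step 1);
    return the index of the latest step with a shuffle data backup,
    or None if no backup exists, forcing framework-level fault tolerance."""
    for i in range(len(steps) - 1, -1, -1):
        if has_backup(steps[i]):   # Step 2: query the shuffle cache via the interface
            return i               # recovery starts directly from this step
    return None                    # no backup anywhere: use the framework's own mechanism

# Usage: only the shuffle stage has a cached backup, so recovery restarts there.
steps = ["load", "map", "shuffle", "reduce"]
print(find_recovery_point(steps, lambda s: s == "shuffle"))
```

Because the scan starts at the most recent step, recovery always resumes from the latest available backup, minimizing recomputation compared with replaying the whole lineage.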
CN201610712705.5A 2016-08-24 2016-08-24 A shuffle-data caching method based on the map-reduce computation model Active CN106371919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 A shuffle-data caching method based on the map-reduce computation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 A shuffle-data caching method based on the map-reduce computation model

Publications (2)

Publication Number Publication Date
CN106371919A CN106371919A (en) 2017-02-01
CN106371919B true CN106371919B (en) 2019-07-16

Family

ID=57878112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712705.5A Active CN106371919B (en) 2016-08-24 2016-08-24 A shuffle-data caching method based on the map-reduce computation model

Country Status (1)

Country Link
CN (1) CN106371919B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11061609B2 (en) * 2018-08-02 2021-07-13 MemVerge, Inc Distributed memory object method and system enabling memory-speed data access in a distributed environment
CN110690991B (en) * 2019-09-10 2021-03-19 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
CN115203133A (en) * 2021-04-14 2022-10-18 华为技术有限公司 Data processing method and device, reduction server and mapping server

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences
US9389994B2 (en) * 2013-11-26 2016-07-12 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Characterization and Optimization of Memory-Resident MapReduce on HPC Systems; Yandong Wang et al.; 2014 IEEE 28th International Parallel and Distributed Processing Symposium; 2014-05-23; pp. 799-808

Also Published As

Publication number Publication date
CN106371919A (en) 2017-02-01

Similar Documents

Publication Publication Date Title
US10372559B2 (en) Managing a redundant computerized database using a replicated database cache
US11163479B2 (en) Replicated state cluster with standby node state assessment during leadership transition
US9372908B2 (en) Merging an out of synchronization indicator and a change recording indicator in response to a failure in consistency group formation
CN105871603B (en) A kind of the real time streaming data processing fail recovery and method of data grids based on memory
US7987158B2 (en) Method, system and article of manufacture for metadata replication and restoration
CN103647669B (en) It is a kind of to ensure the conforming system and method for distributed data processing
US7779295B1 (en) Method and apparatus for creating and using persistent images of distributed shared memory segments and in-memory checkpoints
Lorch et al. The SMART way to migrate replicated stateful services
CN105159818A (en) Log recovery method in memory data management and log recovery simulation system in memory data management
US20090063807A1 (en) Data redistribution in shared nothing architecture
CN106371919B (en) A shuffle-data caching method based on the map-reduce computation model
US20150074219A1 (en) High availability networking using transactional memory
CN111475480A (en) Log processing method and system
Wang et al. BeTL: MapReduce checkpoint tactics beneath the task level
CN109783578A (en) Method for reading data, device, electronic equipment and storage medium
AU2015336250C1 (en) Recovery and fault-tolerance under computational indeterminism
EP3377970B1 (en) Multi-version removal manager
US20140149697A1 (en) Memory Pre-Allocation For Cleanup and Rollback Operations
US20200065297A1 (en) Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point
US10929238B2 (en) Management of changed-block bitmaps
WO2023274409A1 (en) Method for executing transaction in blockchain system and blockchain node
CN105988885A (en) Compensation rollback-based operation system fault self-recovery method
WO2017023244A1 (en) Fault tolerant computing
Xu et al. Cloudbb: Scalable i/o accelerator for shared cloud storage
CN109522098A (en) Transaction methods, device, system and storage medium in distributed data base

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant