CN106371919A - Shuffle data caching method based on mapping-reduction calculation model - Google Patents

Shuffle data caching method based on mapping-reduction calculation model Download PDF

Info

Publication number
CN106371919A
Authority
CN
China
Prior art keywords
shuffling
reduction
data
node
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610712705.5A
Other languages
Chinese (zh)
Other versions
CN106371919B (en)
Inventor
付周望
王丁
王一丁
戚正伟
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610712705.5A priority Critical patent/CN106371919B/en
Publication of CN106371919A publication Critical patent/CN106371919A/en
Application granted granted Critical
Publication of CN106371919B publication Critical patent/CN106371919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management

Abstract

The invention discloses a shuffle data caching method based on the map-reduce computation model. The method comprises the following steps: the map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface; after the shuffle caching host receives the task division data, it attaches a timestamp and stores the data in local memory; the shuffle caching host then uses a random algorithm to build a one-to-three mapping between each reduce task in the task division data and the nodes of the cluster, and stores this mapping in its own memory in the form of a hash table. The method improves the computing performance of distributed computing frameworks based on the map-reduce model, spares users the inefficient manual setting of checkpoints, and improves the robustness of the distributed computing framework.

Description

A shuffle data caching method based on the map-reduce computation model
Technical field
The present invention relates to the fields of distributed computer systems and distributed computing frameworks. Specifically, it provides a memory-based distributed shuffle data cache for the map-reduce computation model, thereby improving the performance and robustness of such computing frameworks.
Background technology
The map-reduce computation model and the distributed computing systems designed on it, such as Spark and Hadoop, are the mainstream big-data distributed systems today. A computation based on this model has a shuffle phase between the map stage and the reduce stage, which separates mapping from reduction. Current designs all persist the shuffle data by writing it to disk and only then transmit it. Since the performance of disk is far inferior to that of memory, this imposes a considerable performance cost on the computing system.
Meanwhile, computing frameworks of this type ensure the fault tolerance of the computation mainly through the disk (Hadoop), or require the user to manually add checkpoints (Spark). Because these fault-tolerance mechanisms overlap with the computing logic, they neither make full use of existing hardware features nor avoid degrading the performance of the computation itself, since they are interspersed throughout the computing process.
Although some memory-based distributed file systems exist, they mainly target the data blocks themselves, whose volume is usually far larger than that of the shuffle data, and they therefore require a large amount of memory as support. Against this background, the invention provides a memory-based distributed shuffle data caching method that eliminates the performance cost brought by shuffle transmission and by the disk-based fault-tolerance mechanism, improving the performance and robustness of the computing framework.
Content of the invention
The present invention targets distributed computing systems based on the map-reduce model. By buffering the shuffle transmission data in the memory of the distributed system, it eliminates the performance cost brought by shuffle transmission and by the disk-based fault-tolerance mechanism. The technical solution of the present invention is as follows:
A shuffle data caching method for the map-reduce computation model, comprising the following steps:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface. The division contains the id of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After receiving it, the host attaches a timestamp and saves it in local memory.
Step 2: After the shuffle caching host receives the job division data, it uses a random algorithm to build a one-to-three mapping between each reduce task therein and the nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the master node and the remaining two are backup nodes. The mapping between reduce tasks and nodes is saved in the host's memory in the form of a hash table.
Step 3: The computing framework schedules one of the nodes to execute a map task. After this node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, and returns immediately, indicating that the task is complete.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data (specified by the computing framework), and saves them in memory. A map task usually produces a number of data blocks equal to or smaller than the number of reduce tasks.
Step 5: The executor requests the mapping table of reduce tasks to nodes from the shuffle caching host (this step is executed only once in an entire map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent. According to the host's mapping table of reduce tasks and nodes, the executor distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task. When sending a shuffle-reduce data block, the executor attaches a master or slave backup label to the block according to the master/backup node assignment of step 2.
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label indicates a master backup, the node saves the block in memory; if it indicates a slave backup, the node writes it to hard disk.
Step 7: Repeat steps 3 to 6 until all map tasks of this job are finished, then go to step 8.
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface.
Step 9: The computing framework schedules the reduce tasks according to their distribution. It first chooses the master backup node and dispatches a reduce task to that node. If the master backup node has failed, go to step 10; otherwise go to step 11.
Step 10: The computing framework selects a slave backup node and dispatches the reduce task to it. If both slave backup nodes have failed at the same time, the task fails and an error is thrown; all steps terminate.
Step 11: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface.
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If it is, the executor returns the corresponding data directly; otherwise it reads the corresponding data from disk and returns it.
Step 13: The reduce task starts computing after receiving the data.
Step 14: Repeat steps 9 to 13 until all reduce tasks have finished executing; the map-reduce job ends.
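The host-side bookkeeping of steps 1 and 2 — timestamping the job division and randomly mapping each reduce task to one master node and two backup nodes, recorded in a hash table — can be sketched as follows. This is a minimal illustration, not the patent's implementation; all names are hypothetical.

```python
import random
import time

def register_job(job_division, cluster_nodes, host_table):
    """Timestamp a job division and map each reduce task to 3 random nodes.

    job_division: dict with 'shuffle_dep_id', 'num_map_tasks', 'num_reduce_tasks'
    cluster_nodes: list of node identifiers (at least 3)
    host_table: hash table (dict) kept in the shuffle caching host's memory
    """
    record = dict(job_division)
    record['timestamp'] = time.time()  # step 1: attach a timestamp

    # Step 2: one-to-three mapping -- the first sampled node is the master
    # backup, the remaining two are slave backup nodes.
    assignment = {}
    for task_id in range(record['num_reduce_tasks']):
        master, slave1, slave2 = random.sample(cluster_nodes, 3)
        assignment[task_id] = {'master': master, 'slaves': [slave1, slave2]}
    record['reduce_to_nodes'] = assignment
    host_table[record['shuffle_dep_id']] = record
    return record

table = {}
rec = register_job({'shuffle_dep_id': 7, 'num_map_tasks': 4, 'num_reduce_tasks': 2},
                   ['node-a', 'node-b', 'node-c', 'node-d'], table)
assert set(rec['reduce_to_nodes']) == {0, 1}
assert all(len(m['slaves']) == 2 for m in rec['reduce_to_nodes'].values())
```

Sampling without replacement guarantees the master and its two backups are distinct nodes, which is what makes the one-to-three mapping useful for fault tolerance.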
The cached-data replacement policy of the shuffle caching system:
Because the memory resources of each node are limited, and in order not to affect performance during task execution, the shuffle caching system may only occupy a fixed memory space (configurable through the configuration file). However, as tasks keep executing, the memory of the master and backup nodes caches a large amount of shuffle-reduce data. To save memory resources, the shuffle caching system provides an earliest-cached-task, first-evicted strategy, which follows the steps below.
Step 1: An execution node of the shuffle caching system detects that its remaining memory is insufficient.
Step 2: This backup node sends an eviction request to the shuffle caching system host.
Step 3: After receiving the eviction request, the shuffle caching system host looks up, according to its local memory records, the shuffle transmission dependency id corresponding to the earliest cached map-shuffle job, together with the master backup nodes of all reduce tasks of that job.
Step 4: The shuffle caching system broadcasts this shuffle transmission dependency id to the master backup nodes of all shuffle caching systems in the cluster.
Step 5: After receiving the shuffle transmission dependency id, each execution node deletes the corresponding data blocks from its own memory.
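The earliest-cached-first eviction amounts to picking, among the jobs recorded by the host, the one with the oldest timestamp, then broadcasting its shuffle dependency id so every node drops the matching blocks. A minimal sketch under assumed data shapes (the names and structures are illustrative, not from the patent):

```python
def pick_eviction_victim(host_table):
    """Step 3: return the shuffle dependency id of the earliest cached job."""
    return min(host_table, key=lambda dep_id: host_table[dep_id]['timestamp'])

def evict_on_nodes(dep_id, node_caches):
    """Steps 4-5: each node deletes the blocks of the broadcast dependency id."""
    for cache in node_caches:  # each cache maps dep_id -> in-memory blocks
        cache.pop(dep_id, None)

host_table = {
    10: {'timestamp': 100.0},  # earliest cached -> eviction victim
    11: {'timestamp': 250.0},
}
caches = [{10: [b'block'], 11: [b'block']}, {10: [b'block']}]
victim = pick_eviction_victim(host_table)
evict_on_nodes(victim, caches)
assert victim == 10
assert caches == [{11: [b'block']}, {}]
```

Keeping the timestamp on the host's record (step 1 of the main workflow) is what makes this global oldest-first choice possible without querying the nodes.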
The recovery policy with which the shuffle caching system supports the robustness of the computing framework:
A map-reduce computing framework contains a large number of map-reduce passes when executing an entire workflow. With the cooperation of the shuffle caching system, the computing framework does not need to checkpoint the computed data manually. If a failure occurs during task execution, the framework can recover directly from the data of the most recent map-reduce pass, which greatly reduces the recovery time and improves computing performance. This strategy follows the steps below.
Step 1: A runtime error occurs in the computing framework.
Step 2: The computing framework searches backwards along the user's execution logic for the most recently persisted data.
Step 3: During the search, the computing framework asks the shuffle caching system through the interface whether a corresponding shuffle data backup exists.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts according to the fault-tolerance mechanism of the computing framework.
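The recovery steps walk the user's execution logic backwards, asking the shuffle caching system at each stage whether a cached shuffle backup exists, and fall back to the framework's own fault tolerance only when none is found. A schematic version, where the stage list and query predicate are assumptions for illustration:

```python
def find_recovery_point(stages, has_shuffle_backup):
    """Search stages from the most recent backwards (steps 2-5).

    stages: execution stages in logical order, oldest first
    has_shuffle_backup: predicate querying the shuffle caching system
    Returns the stage index to resume from, or None if the framework
    must fall back to its own fault-tolerance mechanism.
    """
    for i in range(len(stages) - 1, -1, -1):  # from back to front
        if has_shuffle_backup(stages[i]):
            return i  # step 4: recover directly from this stage
    return None  # step 5: no backup anywhere

stages = ['load', 'map-reduce-1', 'map-reduce-2', 'map-reduce-3']
backed_up = {'map-reduce-2'}  # only this stage still has a cached backup
assert find_recovery_point(stages, lambda s: s in backed_up) == 2
assert find_recovery_point(stages, lambda s: False) is None
```

Because eviction removes the oldest backups first, the backward search tends to hit a surviving backup quickly, which is why recovery is faster than replaying from user checkpoints.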
Compared with the prior art, the beneficial effects of the invention are: it improves the computing performance of distributed computing frameworks based on the map-reduce model (such as Spark and Hadoop), avoids the user's inefficient manual checkpointing, and improves the robustness of the distributed computing framework.
Brief description of the drawings
Fig. 1. Architecture diagram
Fig. 2. Map task operation diagram
Fig. 3. Reduce task execution on a master node
Fig. 4. Reduce task execution on a slave node
Fig. 5. Task division information
Fig. 6. Record information kept by the shuffle caching host
Detailed description of embodiments
The embodiments of the present invention are described in detail below with reference to the drawings. This embodiment is implemented on the premise of the technical solution and algorithm of the present invention, and detailed implementation modes and concrete operating procedures are given, but the applicable platforms are not limited to the following embodiment. The concrete operating platform of this example is a small cluster composed of two ordinary servers, each running Ubuntu Server 14.04.1 LTS 64-bit and equipped with 8 GB of memory. The concrete development of the invention is based on the source code of Apache Spark 1.6 as an illustration; it applies equally to other map-reduce distributed computing frameworks such as Hadoop. The source code of Spark first needs to be modified so that the shuffle data is transmitted through the interface of this method.
The present invention deploys a caching system in the distributed computing cluster and modifies a small part of the distributed computing framework's code to realize the interface calls of this method, thereby realizing multiply-backed-up memory/disk caching of the shuffle data in a map-reduce computation. With the support of this method, the performance and robustness of existing distributed computing frameworks based on the map-reduce model can be improved. Based on the architecture design in Fig. 1, the Paxos protocol is employed in the host of the shuffle caching system to maintain the consistency and robustness of the system state itself. Meanwhile, a shuffle executor is deployed on each node of the distributed computing cluster to take charge of the concrete transmission and caching of the shuffle data, and to provide interfaces for the work progress of the distributed computing framework. In the workflow of the map stage, as shown in Fig. 2, the distributed computing framework transfers the shuffle data through the interface to the memory of the local shuffle executor; the shuffle executor then divides the data and selects backup nodes according to parameters of this map-reduce job, such as the number of tasks, and backs the data up. In the reduce stage, the distributed computing framework requests data directly from the local shuffle executor through the interface. When the master node works, this data comes from the memory of the shuffle executor, as shown in Fig. 3. If the master node fails, the task is scheduled onto a slave node and the data is read from the slave node's hard disk, as shown in Fig. 4.
Referring first to Fig. 1, the architecture diagram: the overall architecture of the invention is a typical master-slave architecture, with the host side composed of one working host and two backup hosts. They guarantee the consistency of state through the Paxos protocol, thereby avoiding a whole-system crash caused by the crash of the working host. In addition, a shuffle cache executor is deployed on every slave-node server, and the modified Spark computing framework needs to be deployed in the same cluster.
When the Spark computing framework starts working, once a job containing a shuffle transmission is submitted by a user, the shuffle caching system enters the working steps described in the claims, providing shuffle transmission acceleration and robustness support for Spark. The whole process is fully transparent to the Spark user.
Because this embodiment makes full use of memory to cache data, at the end of the map tasks the reduce tasks can read the required data directly from the local memory of the shuffle cache executor, thereby accelerating the whole distributed computation.
Meanwhile, when an error occurs at some step of a Spark computation and recovery is needed, the data cached by the shuffle caching system can be found by backward recursion and recovery can start from there, thus accelerating recovery and improving the robustness of the whole system.
A shuffle data caching method based on the map-reduce computation model, comprising the following steps:
Cooperative operation of the computing framework and the shuffle caching system:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface. The division contains the id of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After receiving it, the host attaches a timestamp and saves it in local memory, as shown in Fig. 5.
Step 2: After the shuffle caching host receives the task division data, it uses a random algorithm to build a one-to-three mapping between each reduce task therein and the nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the master node and the remaining two are backup nodes. The mapping between reduce tasks and nodes is saved in the host's memory in the form of a hash table, and the host stamps the corresponding record with a timestamp at the same time. The concrete saved information is shown in Fig. 6.
Step 3: The computing framework schedules one of the nodes to execute a map task. After this node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, and returns immediately, indicating that the task is complete.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data (specified by the computing framework), and saves them in memory. A map task usually produces a number of data blocks equal to or smaller than the number of reduce tasks.
Step 5: The executor requests from the shuffle caching host the mapping table of reduce tasks to nodes, i.e. the information represented by Fig. 6 (this step is executed only once in the whole map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent.
Step 6: According to the host's mapping table of reduce tasks and nodes, the executor distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task.
Step 7: When sending a shuffle-reduce data block, the executor attaches a master or slave backup label to the block according to the master/backup node assignment of step 2.
Step 8: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label indicates a master backup, the node saves the block in memory; if it indicates a slave backup, the node writes it to hard disk.
Step 9: Repeat steps 3 to 8 until all map tasks of this job are finished, then go to step 10.
Step 10: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface.
Step 11: The computing framework schedules the reduce tasks according to their distribution. It first chooses the master backup node and dispatches a reduce task to that node. If the master backup node has failed, go to step 12; otherwise go to step 13.
Step 12: The computing framework selects a slave backup node and dispatches the reduce task to it. If both slave backup nodes have failed at the same time, the task fails and an error is thrown; all steps terminate.
Step 13: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface.
Step 14: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If it is, the executor returns the corresponding data directly; otherwise it reads the corresponding data from disk and returns it.
Step 15: The reduce task starts computing after receiving the data.
Step 16: Repeat steps 11 to 15 until all reduce tasks have finished executing, then go to step 17.
Step 17: This map-reduce job ends.
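Steps 8 and 13-15 above describe how a receiving node stores each block in memory or on disk depending on its master/slave backup label, and how the reduce-side fetch first tries memory and falls back to disk. A small sketch under assumed structures (the class and a dict standing in for the disk are hypothetical, not the patent's implementation):

```python
class ShuffleCacheExecutor:
    """Per-node cache: master-backup blocks in memory, slave backups on 'disk'."""

    def __init__(self):
        self.memory = {}  # block_id -> bytes (master backups)
        self.disk = {}    # block_id -> bytes (slave backups; stands in for files)

    def receive(self, block_id, data, label):
        # Step 8: route by the master/slave backup label attached to the block.
        if label == 'master':
            self.memory[block_id] = data
        else:
            self.disk[block_id] = data

    def fetch(self, block_id):
        # Steps 13-14: prefer memory, fall back to reading from disk.
        if block_id in self.memory:
            return self.memory[block_id]
        return self.disk.get(block_id)

ex = ShuffleCacheExecutor()
ex.receive('r0-m1', b'fast', 'master')
ex.receive('r1-m1', b'slow', 'slave')
assert ex.fetch('r0-m1') == b'fast'
assert ex.fetch('r1-m1') == b'slow'
```

The memory-first fetch is what lets a reduce task scheduled on the master backup node avoid disk entirely, while the same code path still serves a task rescheduled onto a slave backup node.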
The cached-data replacement policy of the shuffle caching system:
Because the memory resources of each node are limited, and in order not to affect performance during task execution, the shuffle caching system may only occupy a fixed memory space (configurable through the configuration file). However, as tasks keep executing, the memory of the master and backup nodes caches a large amount of shuffle-reduce data. To save memory resources, the shuffle caching system provides an earliest-cached-task, first-evicted strategy, which follows the steps below.
Step 1: An execution node of the shuffle caching system detects that its remaining memory is insufficient.
Step 2: This backup node sends an eviction request to the shuffle caching system host.
Step 3: After receiving the eviction request, the shuffle caching system host looks up, according to its local memory records, the shuffle transmission dependency id corresponding to the earliest cached map-shuffle job, together with the master backup nodes of all reduce tasks of that job.
Step 4: The shuffle caching system broadcasts this shuffle transmission dependency id to the master backup nodes of all shuffle caching systems in the cluster.
Step 5: After receiving the shuffle transmission dependency id, each execution node deletes the corresponding data blocks from its own memory.
The recovery policy with which the shuffle caching system supports the robustness of the computing framework:
A map-reduce computing framework contains a large number of map-reduce passes when executing an entire workflow. With the cooperation of the shuffle caching system, the computing framework does not need to checkpoint the computed data manually. If a failure occurs during task execution, the framework can recover directly from the data of the most recent map-reduce pass, which greatly reduces the recovery time and improves computing performance. This strategy follows the steps below.
Step 1: A runtime error occurs in the computing framework.
Step 2: The computing framework searches backwards along the user's execution logic for the most recently persisted data.
Step 3: During the search, the computing framework asks the shuffle caching system through the interface whether a corresponding shuffle data backup exists.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts according to the fault-tolerance mechanism of the computing framework.
On the basis of this embodiment, Spark-related benchmark programs such as word count were run to verify the correctness of the invention; at the same time, compared with Spark master, the present invention shows performance improvements of varying degrees across the different benchmark programs.
The preferred embodiments of the present invention have been described in detail above. It should be appreciated that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative work. Therefore, all technical solutions that persons skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art under the concept of this invention shall fall within the protection scope defined by the claims.

Claims (3)

1. A shuffle data caching method based on the map-reduce computation model, characterized in that the method comprises the following steps:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface; after the shuffle caching host receives the task division data, it attaches a timestamp and saves the data in local memory;
Step 2: The shuffle caching host uses a random algorithm on the task division data to build a one-to-three mapping between each reduce task therein and the nodes of the cluster, and saves the mapping in the memory of the shuffle caching host in the form of a hash table;
Step 3: The computing framework schedules one of the nodes to execute a map task; after this node finishes the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, returning at the same time to indicate that the task is complete;
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data, and saves them in memory;
Step 5: The local shuffle cache executor requests the mapping table of reduce tasks and nodes from the shuffle caching host and, according to this mapping table, distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task, attaching a master backup or slave backup label to each data block according to the master/backup node assignment of step 2;
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label; if the label indicates a master backup, the node saves the block in memory; if the label indicates a slave backup, the node writes it to hard disk; if the memory space of the master backup node is insufficient at this time, the shuffle data triggers the eviction steps of the shuffle caching system; then go to step 7;
Step 7: Repeat steps 3 to 6 until all map tasks of this job are finished, then go to step 8;
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface;
Step 9: The computing framework schedules the reduce tasks according to their distribution: it first chooses the master backup node and dispatches a reduce task to that node; if the master backup node has failed, go to step 10, otherwise go to step 11;
Step 10: The computing framework selects a slave backup node and dispatches the reduce task to it; if both slave backup nodes have failed at the same time, the task fails, an error is thrown, and all steps terminate;
Step 11: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface;
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory; if it is, it returns the corresponding data to the task directly, otherwise it reads the corresponding data from disk and returns it;
Step 13: The reduce task starts computing after receiving the data;
Step 14: Repeat steps 9 to 13 until all reduce tasks have finished executing; the map-reduce job ends.
2. The shuffle data caching method based on the map-reduce computation model according to claim 1, characterized in that the shuffle data eviction step is as follows:
Step 1: A shuffle cache execution node, i.e. a primary-backup node, detects that its remaining memory is insufficient;
Step 2: This primary-backup node sends an eviction request to the shuffle cache system host;
Step 3: After the shuffle cache host receives the eviction request, it searches its local memory records for the shuffle transfer dependency ID of the earliest-cached map-shuffle job, together with the primary-backup nodes of all reduce tasks of that job;
Step 4: The shuffle cache system broadcasts this shuffle transfer dependency ID to the primary-backup nodes of all shuffle cache systems in the cluster;
Step 5: After an execution node receives the shuffle transfer dependency ID, it deletes the corresponding data blocks from its own memory.
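Taken together, these five steps implement a cluster-wide first-in-first-out eviction: the host picks the oldest cached shuffle dependency and every executor drops its blocks for it. A minimal Python sketch under assumed names (`ShuffleCacheHost`, `ShuffleCacheExecutor`, and the `dep_id` keys are hypothetical, not the patent's actual interfaces):

```python
from collections import OrderedDict

class ShuffleCacheExecutor:
    """Per-node shuffle cache; holds primary-backup data blocks in memory."""
    def __init__(self):
        self.memory = {}  # dep_id -> list of cached data blocks

    def evict(self, dep_id):
        # Step 5: delete the blocks of this dependency from local memory.
        self.memory.pop(dep_id, None)

class ShuffleCacheHost:
    """Cluster host; records shuffle dependencies in caching order."""
    def __init__(self, executors):
        self.executors = executors
        self.deps = OrderedDict()  # dep_id -> True, oldest entry first

    def register(self, dep_id):
        self.deps[dep_id] = True

    def handle_eviction_request(self):
        # Step 3: the earliest-cached map-shuffle job is evicted first.
        dep_id, _ = self.deps.popitem(last=False)
        # Step 4: broadcast its shuffle transfer dependency ID to all nodes.
        for ex in self.executors:
            ex.evict(dep_id)
        return dep_id
```

Evicting whole dependencies rather than individual blocks keeps every reduce task of a job either fully cached or fully evicted, which is what makes the broadcast in Step 4 sufficient.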
3. The shuffle data caching method based on the map-reduce computation model according to claim 1, characterized in that when a runtime error occurs in the computing framework, the shuffle cache system provides a robustness-supporting recovery policy for the framework, as follows:
Step 1: Starting from the most recent operation, the computing framework searches backwards through the user's execution logic for the most recently persisted data;
Step 2: During this search, the computing framework queries the shuffle cache system through its interface for a corresponding shuffle data backup: if a backup is found, recovery starts directly from that step;
If no backup is found, the search continues further back; once it is confirmed that no backup exists anywhere, recovery starts according to the computing framework's own fault-tolerance mechanism.
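The recovery policy amounts to a backward walk over the execution lineage that stops at the first stage whose output is either persisted by the user or still backed up in the shuffle cache. A Python sketch with hypothetical `Stage` fields and cache lookup (the patent does not name these structures):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    persisted: bool      # did the user persist this stage's output?
    shuffle_dep_id: str  # shuffle transfer dependency ID, or "" if none

def find_recovery_point(lineage, cached_dep_ids):
    """Return the most recent stage recovery can restart from, or None
    if the framework must fall back to its own fault tolerance."""
    # Step 1: walk the user's execution logic from newest to oldest.
    for stage in reversed(lineage):
        # Step 2: a persisted stage or a cached shuffle backup is a
        # valid restart point; take the first (most recent) one found.
        if stage.persisted or stage.shuffle_dep_id in cached_dep_ids:
            return stage
    # No backup found anywhere: recover via the framework's own
    # fault-tolerance mechanism (e.g. recomputation from the source).
    return None
```

Because cached shuffle backups typically sit later in the lineage than the last user-persisted stage, consulting the shuffle cache during the walk shortens the recomputation distance after a failure.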
CN201610712705.5A 2016-08-24 2016-08-24 A shuffle data caching method based on the map-reduce computation model Active CN106371919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 A shuffle data caching method based on the map-reduce computation model

Publications (2)

Publication Number Publication Date
CN106371919A true CN106371919A (en) 2017-02-01
CN106371919B CN106371919B (en) 2019-07-16

Family

ID=57878112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712705.5A Active A shuffle data caching method based on the map-reduce computation model

Country Status (1)

Country Link
CN (1) CN106371919B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences
US20150150018A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler i/o pipeline actions and planning
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANDONG WANG et al.: "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems", 2014 IEEE 28th International Parallel and Distributed Processing Symposium *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024586A1 (en) * 2018-08-02 2020-02-06 Memverge, Inc. Shuffle manager in a distributed memory object architecture
CN110690991A (en) * 2019-09-10 2020-01-14 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
CN110690991B (en) * 2019-09-10 2021-03-19 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
WO2022218218A1 (en) * 2021-04-14 2022-10-20 华为技术有限公司 Method and apparatus for processing data, reduction server, and mapping server

Also Published As

Publication number Publication date
CN106371919B (en) 2019-07-16

Similar Documents

Publication Publication Date Title
US11281534B2 (en) Distributed data object management system
CN106662983B (en) The methods, devices and systems of data reconstruction in distributed memory system
US10747745B2 (en) Transaction execution commitment without updating of data row transaction status
CN103370693B (en) restart process
CA2913036C (en) Index update pipeline
US10430298B2 (en) Versatile in-memory database recovery using logical log records
CN105871603B (en) A kind of the real time streaming data processing fail recovery and method of data grids based on memory
US9904721B1 (en) Source-side merging of distributed transactions prior to replication
WO2015100985A1 (en) Method and database engine for recording transaction log
US10049036B2 (en) Reliable distributed messaging using non-volatile system memory
CN109739935A (en) Method for reading data, device, electronic equipment and storage medium
WO2014060881A1 (en) Consistency group management
US11748215B2 (en) Log management method, server, and database system
Moniz et al. Blotter: Low latency transactions for geo-replicated storage
CN109063005B (en) Data migration method and system, storage medium and electronic device
US11003532B2 (en) Distributed data object management system operations
CN109558213A (en) The method and apparatus for managing the virtual machine snapshot of OpenStack platform
WO2022048358A1 (en) Data processing method and device, and storage medium
US20230127166A1 (en) Methods and systems for power failure resistance for a distributed storage system
CN106371919A (en) Shuffle data caching method based on mapping-reduction calculation model
CN110888761A (en) Fault-tolerant method based on active backup of key task part and stream processing platform
WO2023274409A1 (en) Method for executing transaction in blockchain system and blockchain node
CN111475480A (en) Log processing method and system
CN105988885A (en) Compensation rollback-based operation system fault self-recovery method
US11579785B2 (en) Systems and methods of providing fault-tolerant file access

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant