CN106371919A - Shuffle data caching method based on mapping-reduction calculation model - Google Patents

Shuffle data caching method based on mapping-reduction calculation model Download PDF

Info

Publication number
CN106371919A
Authority
CN
China
Prior art keywords
shuffling
reduction
data
node
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610712705.5A
Other languages
Chinese (zh)
Other versions
CN106371919B (en)
Inventor
付周望
王丁
王一丁
戚正伟
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610712705.5A priority Critical patent/CN106371919B/en
Publication of CN106371919A publication Critical patent/CN106371919A/en
Application granted granted Critical
Publication of CN106371919B publication Critical patent/CN106371919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management

Abstract

The invention discloses a shuffle data caching method based on the map-reduce computation model. The method comprises the following steps: the map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface; after the shuffle caching host receives the task division data, it attaches a timestamp and stores the data in local memory; the shuffle caching host then uses a random algorithm to build a one-to-three mapping between each reduce task in the task division data and the nodes of the cluster, and stores this mapping in its own memory in the form of a hash table. The method improves the computing performance of distributed computing frameworks based on the map-reduce model, spares users the inefficient manual setting of checkpoints, and improves the robustness of the distributed computing framework.

Description

A shuffle data caching method based on the map-reduce computation model
Technical field
The present invention relates to the fields of distributed computer systems and distributed computing frameworks. Specifically, it provides a memory-based distributed shuffle data cache for the map-reduce computation model, thereby improving the performance and robustness of such computing frameworks.
Background technology
The map-reduce computation model and the distributed computing systems designed on it, such as Spark and Hadoop, are the mainstream big-data distributed systems today. A computation based on this model has a shuffle phase between the map stage and the reduce stage, which separates mapping from reduction. Current designs all persist the shuffle data by writing it to disk and only then transmit it. Since the performance of disk is far inferior to that of memory, this imposes a considerable performance cost on the computing system.
Meanwhile, computing frameworks of this type ensure the fault tolerance of the computation mainly through the disk (Hadoop), or require the user to manually add checkpoints (Spark). Because these fault-tolerance mechanisms overlap with the computing logic, they neither make full use of existing hardware features nor avoid degrading the performance of the computation itself, since they are interspersed throughout the computing process.
Although some memory-based distributed file systems exist, they mainly target the data blocks themselves, whose volume is usually far larger than that of the shuffle data, and they therefore require a large amount of memory as support. Against this background, the invention provides a memory-based distributed shuffle data caching method that eliminates the performance cost brought by shuffle transmission and by the disk-based fault-tolerance mechanism, improving the performance and robustness of the computing framework.
Content of the invention
The present invention targets distributed computing systems based on the map-reduce model. By buffering the shuffle transmission data in the memory of the distributed system, it eliminates the performance cost brought by shuffle transmission and by the disk-based fault-tolerance mechanism. The technical solution of the present invention is as follows:
A shuffle data caching method for the map-reduce computation model, comprising the following steps:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface. The division contains the id of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After receiving it, the host attaches a timestamp and saves it in local memory.
Step 2: After the shuffle caching host receives the job division data, it uses a random algorithm to build a one-to-three mapping between each reduce task therein and the nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the master node and the remaining two are backup nodes. The mapping between reduce tasks and nodes is saved in the host's memory in the form of a hash table.
Step 3: The computing framework schedules one of the nodes to execute a map task. After this node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, and returns immediately, indicating that the task is complete.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data (specified by the computing framework), and saves them in memory. A map task usually produces a number of data blocks equal to or smaller than the number of reduce tasks.
Step 5: The executor requests the mapping table of reduce tasks to nodes from the shuffle caching host (this step is executed only once in an entire map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent. According to the host's mapping table of reduce tasks and nodes, the executor distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task. When sending a shuffle-reduce data block, the executor attaches a master or slave backup label to the block according to the master/backup node assignment of step 2.
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label indicates a master backup, the node saves the block in memory; if it indicates a slave backup, the node writes it to hard disk.
Step 7: Repeat steps 3 to 6 until all map tasks of this job are finished, then go to step 8.
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface.
Step 9: The computing framework schedules the reduce tasks according to their distribution. It first chooses the master backup node and dispatches a reduce task to that node. If the master backup node has failed, go to step 10; otherwise go to step 11.
Step 10: The computing framework selects a slave backup node and dispatches the reduce task to it. If both slave backup nodes have failed at the same time, the task fails and an error is thrown; all steps terminate.
Step 11: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface.
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If it is, the executor returns the corresponding data directly; otherwise it reads the corresponding data from disk and returns it.
Step 13: The reduce task starts computing after receiving the data.
Step 14: Repeat steps 9 to 13 until all reduce tasks have finished executing; the map-reduce job ends.
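The host-side bookkeeping of steps 1 and 2 — timestamping the job division and randomly mapping each reduce task to one master node and two backup nodes, recorded in a hash table — can be sketched as follows. This is a minimal illustration, not the patent's implementation; all names are hypothetical.

```python
import random
import time

def register_job(job_division, cluster_nodes, host_table):
    """Timestamp a job division and map each reduce task to 3 random nodes.

    job_division: dict with 'shuffle_dep_id', 'num_map_tasks', 'num_reduce_tasks'
    cluster_nodes: list of node identifiers (at least 3)
    host_table: hash table (dict) kept in the shuffle caching host's memory
    """
    record = dict(job_division)
    record['timestamp'] = time.time()  # step 1: attach a timestamp

    # Step 2: one-to-three mapping -- the first sampled node is the master
    # backup, the remaining two are slave backup nodes.
    assignment = {}
    for task_id in range(record['num_reduce_tasks']):
        master, slave1, slave2 = random.sample(cluster_nodes, 3)
        assignment[task_id] = {'master': master, 'slaves': [slave1, slave2]}
    record['reduce_to_nodes'] = assignment
    host_table[record['shuffle_dep_id']] = record
    return record

table = {}
rec = register_job({'shuffle_dep_id': 7, 'num_map_tasks': 4, 'num_reduce_tasks': 2},
                   ['node-a', 'node-b', 'node-c', 'node-d'], table)
assert set(rec['reduce_to_nodes']) == {0, 1}
assert all(len(m['slaves']) == 2 for m in rec['reduce_to_nodes'].values())
```

Sampling without replacement guarantees the master and its two backups are distinct nodes, which is what makes the one-to-three mapping useful for fault tolerance.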
The cached-data replacement policy of the shuffle caching system:
Because the memory resources of each node are limited, and in order not to affect performance during task execution, the shuffle caching system may only occupy a fixed memory space (configurable through the configuration file). However, as tasks keep executing, the memory of the master and backup nodes caches a large amount of shuffle-reduce data. To save memory resources, the shuffle caching system provides an earliest-cached-task, first-evicted strategy, which follows the steps below.
Step 1: An execution node of the shuffle caching system detects that its remaining memory is insufficient.
Step 2: This backup node sends an eviction request to the shuffle caching system host.
Step 3: After receiving the eviction request, the shuffle caching system host looks up, according to its local memory records, the shuffle transmission dependency id corresponding to the earliest cached map-shuffle job, together with the master backup nodes of all reduce tasks of that job.
Step 4: The shuffle caching system broadcasts this shuffle transmission dependency id to the master backup nodes of all shuffle caching systems in the cluster.
Step 5: After receiving the shuffle transmission dependency id, each execution node deletes the corresponding data blocks from its own memory.
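The earliest-cached-first eviction amounts to picking, among the jobs recorded by the host, the one with the oldest timestamp, then broadcasting its shuffle dependency id so every node drops the matching blocks. A minimal sketch under assumed data shapes (the names and structures are illustrative, not from the patent):

```python
def pick_eviction_victim(host_table):
    """Step 3: return the shuffle dependency id of the earliest cached job."""
    return min(host_table, key=lambda dep_id: host_table[dep_id]['timestamp'])

def evict_on_nodes(dep_id, node_caches):
    """Steps 4-5: each node deletes the blocks of the broadcast dependency id."""
    for cache in node_caches:  # each cache maps dep_id -> in-memory blocks
        cache.pop(dep_id, None)

host_table = {
    10: {'timestamp': 100.0},  # earliest cached -> eviction victim
    11: {'timestamp': 250.0},
}
caches = [{10: [b'block'], 11: [b'block']}, {10: [b'block']}]
victim = pick_eviction_victim(host_table)
evict_on_nodes(victim, caches)
assert victim == 10
assert caches == [{11: [b'block']}, {}]
```

Keeping the timestamp on the host's record (step 1 of the main workflow) is what makes this global oldest-first choice possible without querying the nodes.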
The recovery policy with which the shuffle caching system supports the robustness of the computing framework:
A map-reduce computing framework contains a large number of map-reduce passes when executing an entire workflow. With the cooperation of the shuffle caching system, the computing framework does not need to checkpoint the computed data manually. If a failure occurs during task execution, the framework can recover directly from the data of the most recent map-reduce pass, which greatly reduces the recovery time and improves computing performance. This strategy follows the steps below.
Step 1: A runtime error occurs in the computing framework.
Step 2: The computing framework searches backwards along the user's execution logic for the most recently persisted data.
Step 3: During the search, the computing framework asks the shuffle caching system through the interface whether a corresponding shuffle data backup exists.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts according to the fault-tolerance mechanism of the computing framework.
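The recovery steps walk the user's execution logic backwards, asking the shuffle caching system at each stage whether a cached shuffle backup exists, and fall back to the framework's own fault tolerance only when none is found. A schematic version, where the stage list and query predicate are assumptions for illustration:

```python
def find_recovery_point(stages, has_shuffle_backup):
    """Search stages from the most recent backwards (steps 2-5).

    stages: execution stages in logical order, oldest first
    has_shuffle_backup: predicate querying the shuffle caching system
    Returns the stage index to resume from, or None if the framework
    must fall back to its own fault-tolerance mechanism.
    """
    for i in range(len(stages) - 1, -1, -1):  # from back to front
        if has_shuffle_backup(stages[i]):
            return i  # step 4: recover directly from this stage
    return None  # step 5: no backup anywhere

stages = ['load', 'map-reduce-1', 'map-reduce-2', 'map-reduce-3']
backed_up = {'map-reduce-2'}  # only this stage still has a cached backup
assert find_recovery_point(stages, lambda s: s in backed_up) == 2
assert find_recovery_point(stages, lambda s: False) is None
```

Because eviction removes the oldest backups first, the backward search tends to hit a surviving backup quickly, which is why recovery is faster than replaying from user checkpoints.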
Compared with the prior art, the beneficial effects of the invention are: it improves the computing performance of distributed computing frameworks based on the map-reduce model (such as Spark and Hadoop), avoids the user's inefficient manual checkpointing, and improves the robustness of the distributed computing framework.
Brief description of the drawings
Fig. 1. Architecture diagram
Fig. 2. Map task operation diagram
Fig. 3. Reduce task execution on a master node
Fig. 4. Reduce task execution on a slave node
Fig. 5. Task division information
Fig. 6. Record information kept by the shuffle caching host
Detailed description of embodiments
The embodiments of the present invention are described in detail below with reference to the drawings. This embodiment is implemented on the premise of the technical solution and algorithm of the present invention, and detailed implementation modes and concrete operating procedures are given, but the applicable platforms are not limited to the following embodiment. The concrete operating platform of this example is a small cluster composed of two ordinary servers, each running Ubuntu Server 14.04.1 LTS 64-bit and equipped with 8 GB of memory. The concrete development of the invention is based on the source code of Apache Spark 1.6 as an illustration; it applies equally to other map-reduce distributed computing frameworks such as Hadoop. The source code of Spark first needs to be modified so that the shuffle data is transmitted through the interface of this method.
The present invention deploys a caching system in the distributed computing cluster and modifies a small part of the distributed computing framework's code to realize the interface calls of this method, thereby realizing multiply-backed-up memory/disk caching of the shuffle data in a map-reduce computation. With the support of this method, the performance and robustness of existing distributed computing frameworks based on the map-reduce model can be improved. Based on the architecture design in Fig. 1, the Paxos protocol is employed in the host of the shuffle caching system to maintain the consistency and robustness of the system state itself. Meanwhile, a shuffle executor is deployed on each node of the distributed computing cluster to take charge of the concrete transmission and caching of the shuffle data, and to provide interfaces for the work progress of the distributed computing framework. In the workflow of the map stage, as shown in Fig. 2, the distributed computing framework transfers the shuffle data through the interface to the memory of the local shuffle executor; the shuffle executor then divides the data and selects backup nodes according to parameters of this map-reduce job, such as the number of tasks, and backs the data up. In the reduce stage, the distributed computing framework requests data directly from the local shuffle executor through the interface. When the master node works, this data comes from the memory of the shuffle executor, as shown in Fig. 3. If the master node fails, the task is scheduled onto a slave node and the data is read from the slave node's hard disk, as shown in Fig. 4.
Referring first to Fig. 1, the architecture diagram: the overall architecture of the invention is a typical master-slave architecture, with the host side composed of one working host and two backup hosts. They guarantee the consistency of state through the Paxos protocol, thereby avoiding a whole-system crash caused by the crash of the working host. In addition, a shuffle cache executor is deployed on every slave-node server, and the modified Spark computing framework needs to be deployed in the same cluster.
When the Spark computing framework starts working, once a job containing a shuffle transmission is submitted by a user, the shuffle caching system enters the working steps described in the claims, providing shuffle transmission acceleration and robustness support for Spark. The whole process is fully transparent to the Spark user.
Because this embodiment makes full use of memory to cache data, at the end of the map tasks the reduce tasks can read the required data directly from the local memory of the shuffle cache executor, thereby accelerating the whole distributed computation.
Meanwhile, when an error occurs at some step of a Spark computation and recovery is needed, the data cached by the shuffle caching system can be found by backward recursion and recovery can start from there, thus accelerating recovery and improving the robustness of the whole system.
A shuffle data caching method based on the map-reduce computation model, comprising the following steps:
Cooperative operation of the computing framework and the shuffle caching system:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface. The division contains the id of the shuffle transmission dependency, the total number of map tasks, and the total number of reduce tasks. After receiving it, the host attaches a timestamp and saves it in local memory, as shown in Fig. 5.
Step 2: After the shuffle caching host receives the task division data, it uses a random algorithm to build a one-to-three mapping between each reduce task therein and the nodes of the cluster: each reduce task corresponds to three random nodes, one of which is the master node and the remaining two are backup nodes. The mapping between reduce tasks and nodes is saved in the host's memory in the form of a hash table, and the host stamps the corresponding record with a timestamp at the same time. The concrete saved information is shown in Fig. 6.
Step 3: The computing framework schedules one of the nodes to execute a map task. After this node finishes the computation of the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, and returns immediately, indicating that the task is complete.
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data (specified by the computing framework), and saves them in memory. A map task usually produces a number of data blocks equal to or smaller than the number of reduce tasks.
Step 5: The executor requests from the shuffle caching host the mapping table of reduce tasks to nodes, i.e. the information represented by Fig. 6 (this step is executed only once in the whole map-reduce job). The mapping table guarantees that the distribution rules of all executors are consistent.
Step 6: According to the host's mapping table of reduce tasks and nodes, the executor distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task.
Step 7: When sending a shuffle-reduce data block, the executor attaches a master or slave backup label to the block according to the master/backup node assignment of step 2.
Step 8: When a remote node receives a shuffle-reduce data block, it reads the block's label. If the label indicates a master backup, the node saves the block in memory; if it indicates a slave backup, the node writes it to hard disk.
Step 9: Repeat steps 3 to 8 until all map tasks of this job are finished, then go to step 10.
Step 10: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface.
Step 11: The computing framework schedules the reduce tasks according to their distribution. It first chooses the master backup node and dispatches a reduce task to that node. If the master backup node has failed, go to step 12; otherwise go to step 13.
Step 12: The computing framework selects a slave backup node and dispatches the reduce task to it. If both slave backup nodes have failed at the same time, the task fails and an error is thrown; all steps terminate.
Step 13: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface.
Step 14: After the local shuffle cache executor receives the request, it first checks whether the data is in memory. If it is, the executor returns the corresponding data directly; otherwise it reads the corresponding data from disk and returns it.
Step 15: The reduce task starts computing after receiving the data.
Step 16: Repeat steps 11 to 15 until all reduce tasks have finished executing, then go to step 17.
Step 17: This map-reduce job ends.
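Steps 8 and 13-15 above describe how a receiving node stores each block in memory or on disk depending on its master/slave backup label, and how the reduce-side fetch first tries memory and falls back to disk. A small sketch under assumed structures (the class and a dict standing in for the disk are hypothetical, not the patent's implementation):

```python
class ShuffleCacheExecutor:
    """Per-node cache: master-backup blocks in memory, slave backups on 'disk'."""

    def __init__(self):
        self.memory = {}  # block_id -> bytes (master backups)
        self.disk = {}    # block_id -> bytes (slave backups; stands in for files)

    def receive(self, block_id, data, label):
        # Step 8: route by the master/slave backup label attached to the block.
        if label == 'master':
            self.memory[block_id] = data
        else:
            self.disk[block_id] = data

    def fetch(self, block_id):
        # Steps 13-14: prefer memory, fall back to reading from disk.
        if block_id in self.memory:
            return self.memory[block_id]
        return self.disk.get(block_id)

ex = ShuffleCacheExecutor()
ex.receive('r0-m1', b'fast', 'master')
ex.receive('r1-m1', b'slow', 'slave')
assert ex.fetch('r0-m1') == b'fast'
assert ex.fetch('r1-m1') == b'slow'
```

The memory-first fetch is what lets a reduce task scheduled on the master backup node avoid disk entirely, while the same code path still serves a task rescheduled onto a slave backup node.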
The cached-data replacement policy of the shuffle caching system:
Because the memory resources of each node are limited, and in order not to affect performance during task execution, the shuffle caching system may only occupy a fixed memory space (configurable through the configuration file). However, as tasks keep executing, the memory of the master and backup nodes caches a large amount of shuffle-reduce data. To save memory resources, the shuffle caching system provides an earliest-cached-task, first-evicted strategy, which follows the steps below.
Step 1: An execution node of the shuffle caching system detects that its remaining memory is insufficient.
Step 2: This backup node sends an eviction request to the shuffle caching system host.
Step 3: After receiving the eviction request, the shuffle caching system host looks up, according to its local memory records, the shuffle transmission dependency id corresponding to the earliest cached map-shuffle job, together with the master backup nodes of all reduce tasks of that job.
Step 4: The shuffle caching system broadcasts this shuffle transmission dependency id to the master backup nodes of all shuffle caching systems in the cluster.
Step 5: After receiving the shuffle transmission dependency id, each execution node deletes the corresponding data blocks from its own memory.
The recovery policy with which the shuffle caching system supports the robustness of the computing framework:
A map-reduce computing framework contains a large number of map-reduce passes when executing an entire workflow. With the cooperation of the shuffle caching system, the computing framework does not need to checkpoint the computed data manually. If a failure occurs during task execution, the framework can recover directly from the data of the most recent map-reduce pass, which greatly reduces the recovery time and improves computing performance. This strategy follows the steps below.
Step 1: A runtime error occurs in the computing framework.
Step 2: The computing framework searches backwards along the user's execution logic for the most recently persisted data.
Step 3: During the search, the computing framework asks the shuffle caching system through the interface whether a corresponding shuffle data backup exists.
Step 4: If a backup is found, recovery starts directly from that step.
Step 5: If no backup is found, the search continues backwards; if no backup exists at all, recovery starts according to the fault-tolerance mechanism of the computing framework.
On the basis of this embodiment, Spark-related benchmark programs such as word count were run to verify the correctness of the invention; at the same time, compared with Spark master, the present invention shows performance improvements of varying degrees across the different benchmark programs.
The preferred embodiments of the present invention have been described in detail above. It should be appreciated that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative work. Therefore, all technical solutions that persons skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art under the concept of this invention shall fall within the protection scope defined by the claims.

Claims (3)

1. A shuffle data caching method based on the map-reduce computation model, characterized in that the method comprises the following steps:
Step 1: The map-reduce computing framework sends the task-level division of a map-reduce job to the shuffle caching host through an interface; after the shuffle caching host receives the task division data, it attaches a timestamp and saves the data in local memory;
Step 2: The shuffle caching host uses a random algorithm on the task division data to build a one-to-three mapping between each reduce task therein and the nodes of the cluster, and saves the mapping in the memory of the shuffle caching host in the form of a hash table;
Step 3: The computing framework schedules one of the nodes to execute a map task; after this node finishes the map task, it calls the interface of the caching system to send the shuffle data of this map task to the memory space of the local shuffle cache executor process, returning at the same time to indicate that the task is complete;
Step 4: When the executor process of the caching system on a node receives the shuffle data of a map task, it divides the data into multiple shuffle-reduce data blocks according to the reduce tasks, following the default partitioning of the shuffle data, and saves them in memory;
Step 5: The local shuffle cache executor requests the mapping table of reduce tasks and nodes from the shuffle caching host and, according to this mapping table, distributes the shuffle-reduce data blocks divided in step 4 to the corresponding three remote nodes of the reduce task, attaching a master backup or slave backup label to each data block according to the master/backup node assignment of step 2;
Step 6: When a remote node receives a shuffle-reduce data block, it reads the block's label; if the label indicates a master backup, the node saves the block in memory; if the label indicates a slave backup, the node writes it to hard disk; if the memory space of the master backup node is insufficient at this time, the shuffle data triggers the eviction steps of the shuffle caching system; then go to step 7;
Step 7: Repeat steps 3 to 6 until all map tasks of this job are finished, then go to step 8;
Step 8: Before scheduling, the computing framework queries the distribution of all reduce tasks from the shuffle caching system through the interface;
Step 9: The computing framework schedules the reduce tasks according to their distribution: it first chooses the master backup node and dispatches a reduce task to that node; if the master backup node has failed, go to step 10, otherwise go to step 11;
Step 10: The computing framework selects a slave backup node and dispatches the reduce task to it; if both slave backup nodes have failed at the same time, the task fails, an error is thrown, and all steps terminate;
Step 11: When the reduce task executes on a node, it obtains its data from the local shuffle cache executor through the interface;
Step 12: After the local shuffle cache executor receives the request, it first checks whether the data is in memory; if it is, it returns the corresponding data to the task directly, otherwise it reads the corresponding data from disk and returns it;
Step 13: The reduce task starts computing after receiving the data;
Step 14: Repeat steps 9 to 13 until all reduce tasks have finished executing; the map-reduce job ends.
2. The shuffle data caching method based on the map-reduce computation model according to claim 1, characterized in that the shuffle data eviction step is as follows:
Step 1: A shuffle cache execution node, i.e. a primary-backup node, detects that its remaining memory is insufficient;
Step 2: This primary-backup node sends an eviction request to the shuffle cache system host;
Step 3: After the shuffle cache host receives the eviction request, it searches its local memory records for the shuffle transfer dependency ID of the earliest-cached map-shuffle job, together with the primary-backup nodes of all reduce tasks of that job;
Step 4: The shuffle cache system broadcasts this shuffle transfer dependency ID to the primary-backup nodes of all shuffle cache systems in the cluster;
Step 5: After an execution node receives the shuffle transfer dependency ID, it deletes the corresponding data blocks from its own memory.
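Taken together, these five steps implement a cluster-wide first-in-first-out eviction: the host picks the oldest cached shuffle dependency and every executor drops its blocks for it. A minimal Python sketch under assumed names (`ShuffleCacheHost`, `ShuffleCacheExecutor`, and the `dep_id` keys are hypothetical, not the patent's actual interfaces):

```python
from collections import OrderedDict

class ShuffleCacheExecutor:
    """Per-node shuffle cache; holds primary-backup data blocks in memory."""
    def __init__(self):
        self.memory = {}  # dep_id -> list of cached data blocks

    def evict(self, dep_id):
        # Step 5: delete the blocks of this dependency from local memory.
        self.memory.pop(dep_id, None)

class ShuffleCacheHost:
    """Cluster host; records shuffle dependencies in caching order."""
    def __init__(self, executors):
        self.executors = executors
        self.deps = OrderedDict()  # dep_id -> True, oldest entry first

    def register(self, dep_id):
        self.deps[dep_id] = True

    def handle_eviction_request(self):
        # Step 3: the earliest-cached map-shuffle job is evicted first.
        dep_id, _ = self.deps.popitem(last=False)
        # Step 4: broadcast its shuffle transfer dependency ID to all nodes.
        for ex in self.executors:
            ex.evict(dep_id)
        return dep_id
```

Evicting whole dependencies rather than individual blocks keeps every reduce task of a job either fully cached or fully evicted, which is what makes the broadcast in Step 4 sufficient.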
3. The shuffle data caching method based on the map-reduce computation model according to claim 1, characterized in that when a runtime error occurs in the computing framework, the shuffle cache system provides a robustness-supporting recovery policy for the framework, as follows:
Step 1: Starting from the most recent operation, the computing framework searches backwards through the user's execution logic for the most recently persisted data;
Step 2: During this search, the computing framework queries the shuffle cache system through its interface for a corresponding shuffle data backup: if a backup is found, recovery starts directly from that step;
If no backup is found, the search continues further back; once it is confirmed that no backup exists anywhere, recovery starts according to the computing framework's own fault-tolerance mechanism.
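The recovery policy amounts to a backward walk over the execution lineage that stops at the first stage whose output is either persisted by the user or still backed up in the shuffle cache. A Python sketch with hypothetical `Stage` fields and cache lookup (the patent does not name these structures):

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    persisted: bool      # did the user persist this stage's output?
    shuffle_dep_id: str  # shuffle transfer dependency ID, or "" if none

def find_recovery_point(lineage, cached_dep_ids):
    """Return the most recent stage recovery can restart from, or None
    if the framework must fall back to its own fault tolerance."""
    # Step 1: walk the user's execution logic from newest to oldest.
    for stage in reversed(lineage):
        # Step 2: a persisted stage or a cached shuffle backup is a
        # valid restart point; take the first (most recent) one found.
        if stage.persisted or stage.shuffle_dep_id in cached_dep_ids:
            return stage
    # No backup found anywhere: recover via the framework's own
    # fault-tolerance mechanism (e.g. recomputation from the source).
    return None
```

Because cached shuffle backups typically sit later in the lineage than the last user-persisted stage, consulting the shuffle cache during the walk shortens the recomputation distance after a failure.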
CN201610712705.5A 2016-08-24 2016-08-24 A shuffle data caching method based on the map-reduce computation model Active CN106371919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 A shuffle data caching method based on the map-reduce computation model

Publications (2)

Publication Number Publication Date
CN106371919A true CN106371919A (en) 2017-02-01
CN106371919B CN106371919B (en) 2019-07-16

Family

ID=57878112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712705.5A Active A shuffle data caching method based on the map-reduce computation model

Country Status (1)

Country Link
CN (1) CN106371919B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences
US20150150018A1 (en) * 2013-11-26 2015-05-28 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler i/o pipeline actions and planning
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANDONG WANG et al.: "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems", 2014 IEEE 28th International Parallel and Distributed Processing Symposium *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020024586A1 (en) * 2018-08-02 2020-02-06 Memverge, Inc. Shuffle manager in a distributed memory object architecture
CN110690991A (en) * 2019-09-10 2020-01-14 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
CN110690991B (en) * 2019-09-10 2021-03-19 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
WO2022218218A1 (en) * 2021-04-14 2022-10-20 华为技术有限公司 Method and apparatus for processing data, reduction server, and mapping server

Also Published As

Publication number Publication date
CN106371919B (en) 2019-07-16

Similar Documents

Publication Publication Date Title
US11281534B2 (en) Distributed data object management system
CN106662983B (en) The methods, devices and systems of data reconstruction in distributed memory system
US10747745B2 (en) Transaction execution commitment without updating of data row transaction status
CN103370693B (en) restart process
CA2913036C (en) Index update pipeline
US10430298B2 (en) Versatile in-memory database recovery using logical log records
CN105871603B (en) A kind of the real time streaming data processing fail recovery and method of data grids based on memory
US9904721B1 (en) Source-side merging of distributed transactions prior to replication
WO2015100985A1 (en) Method and database engine for recording transaction log
US10049036B2 (en) Reliable distributed messaging using non-volatile system memory
CN109739935A (en) Method for reading data, device, electronic equipment and storage medium
WO2014060881A1 (en) Consistency group management
US11748215B2 (en) Log management method, server, and database system
Moniz et al. Blotter: Low latency transactions for geo-replicated storage
CN109063005B (en) Data migration method and system, storage medium and electronic device
US11003532B2 (en) Distributed data object management system operations
CN109558213A (en) The method and apparatus for managing the virtual machine snapshot of OpenStack platform
WO2022048358A1 (en) Data processing method and device, and storage medium
US20230127166A1 (en) Methods and systems for power failure resistance for a distributed storage system
CN106371919A (en) Shuffle data caching method based on mapping-reduction calculation model
CN110888761A (en) Fault-tolerant method based on active backup of key task part and stream processing platform
WO2023274409A1 (en) Method for executing transaction in blockchain system and blockchain node
CN111475480A (en) Log processing method and system
CN105988885A (en) Compensation rollback-based operation system fault self-recovery method
US11579785B2 (en) Systems and methods of providing fault-tolerant file access

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant