CN105740085B - Fault-tolerance processing method and device - Google Patents


Publication number: CN105740085B (application CN201410763653.5A)
Authority: CN (China)
Prior art keywords: node, data, data block, local, operation result
Legal status: Active
Original language: Chinese (zh)
Other versions: CN105740085A
Inventors: 刘杰, 张鹏, 党李飞, 曾永斌, 王群
Assignee (current and original): Huawei Technologies Co Ltd


Abstract

The invention discloses a fault-tolerance processing method and device, belonging to the field of computer technology. The method includes: during task execution, when an out-of-memory error message is detected, obtaining the original data of the failed task; splitting the original data to obtain at least one data block; calling a data processing node to perform arithmetic processing on the at least one data block; obtaining the operation result corresponding to the at least one data block; and merging the operation results to obtain a final operation result, which is sent to the client. Because the original data is split, the original task is split into multiple subtasks, and executing the subtasks uses less memory than executing the whole task. Calling the local slave node or other slave nodes to process the subtasks separately ensures that the failed task is handled effectively even when the local slave node's memory is insufficient, improving task-execution efficiency and fault-tolerance reliability.

Description

Fault-tolerance processing method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a fault-tolerance processing method and device.
Background art
Spark is a general-purpose distributed big-data parallel computing framework that implements distributed computing based on the MR (MapReduce) algorithm. Compared with MR, Spark's most distinctive feature is that intermediate operation results can be stored in memory. Because of this, Spark has advantages in mass data processing, iterative computation, and data mining.
Referring to the architecture diagram of the Spark distributed system shown in Fig. 1, the system mainly comprises three parts: the Driver (client), the MasterNode (master node), and the SlaveNode (slave node). In Fig. 1, the TaskScheduler (task scheduling) module of the Driver considers the remaining resources of each SlaveNode when distributing tasks — that is, whether a SlaveNode can start the Spark application normally. After the Spark application starts normally, a large number of intermediate data results are generated during task execution. Machine-learning algorithms in particular involve many iterations, and the intermediate results generated can be three to five times the size of the input data, or even more. These intermediate results are stored in memory. Because the task-distribution scheme above only accounts for the memory needed at startup, memory often becomes insufficient during task execution, causing the task to fail.
For this reason, after an existing fault-tolerance processing method detects a task-execution failure, the out-of-memory error message is fed back to the DAGScheduler (DAG scheduling) module of the client. After receiving the error message, the DAGScheduler module puts the failed task back into the dispatch list. Then, according to its own resource-scheduling policy, it reassigns a data processing node for the failed task.
In the process of implementing the present invention, the inventors found that the prior art has at least the following problems:
Because the resource-scheduling policy of the Spark distributed system follows the principle of data locality as far as possible — that is, tasks are assigned to the nodes where the related data resides — the probability that the DAGScheduler module redistributes the failed task to the original SlaveNode is very high. Since the original data of the task is usually large and the original SlaveNode has little free memory, the task fails again, reducing the fault-tolerance reliability and task-execution efficiency of the Spark distributed system.
Summary of the invention
In order to solve the problems in the prior art, embodiments of the present invention provide a fault-tolerance processing method and device. The technical solution is as follows:
In a first aspect, a fault-tolerance processing method is provided, the method comprising:
during task execution, when an out-of-memory error message is detected, obtaining the original data of the failed task;
splitting the original data to obtain at least one data block;
calling a data processing node to perform arithmetic processing on the at least one data block, the data processing node being the local slave node or another slave node other than the local slave node;
obtaining the operation result corresponding to the at least one data block;
merging the operation results to obtain a final operation result, and sending the final operation result to the client.
In a first possible implementation of the first aspect, before the splitting of the original data of the failed task, the method further comprises:
obtaining the free-memory information of the local slave node and of the other slave nodes;
calculating, according to the free-memory information, the data-assignment weight value corresponding to each slave node;
and the splitting of the original data of the failed task comprises:
splitting the original data according to the data-assignment weight values and the size of the original data, obtaining the data block corresponding to each slave node.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, before the calling of the data processing node to perform arithmetic processing on the at least one data block, the method further comprises:
assigning a subtask name to the data block corresponding to each slave node;
and the calling of the data processing node to perform arithmetic processing on the at least one data block comprises:
sending the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the obtaining of the operation result corresponding to the at least one data block comprises:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
With reference to the first aspect, in a fourth possible implementation of the first aspect, before the splitting of the original data of the failed task, the method further comprises:
obtaining the memory peak of the local slave node during execution of the failed task;
obtaining the free-memory information of the local slave node;
and the splitting of the original data of the failed task comprises:
calculating, according to the memory peak and the free-memory information, the number of blocks for the original data;
splitting the original data according to the number of blocks and the size of the original data, obtaining the at least one data block.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, before the calling of the data processing node to perform arithmetic processing on the at least one data block, the method further comprises:
assigning a subtask name to each data block;
and the calling of the data processing node to perform arithmetic processing on the at least one data block comprises:
calling the local slave node to perform arithmetic processing on each data block in turn, and saving the obtained operation results on the local disk.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the obtaining of the operation result corresponding to the at least one data block comprises:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
In a second aspect, a fault-tolerant processing device is provided, the device comprising:
an original-data obtaining module, configured to obtain, during task execution, the original data of the failed task when an out-of-memory error message is detected;
an original-data splitting module, configured to split the original data obtained by the original-data obtaining module, obtaining at least one data block;
a processing-node calling module, configured to call a data processing node to perform arithmetic processing on the at least one data block split by the original-data splitting module, the data processing node being the local slave node or another slave node other than the local slave node;
an operation-result obtaining module, configured to obtain the operation result corresponding to the at least one data block;
an operation-result merging module, configured to merge the operation results obtained by the operation-result obtaining module, obtaining a final operation result;
an operation-result sending module, configured to send the final operation result obtained by the operation-result merging module to the client.
In a first possible implementation of the second aspect, the device further comprises:
a first memory-information obtaining module, configured to obtain the free-memory information of the local slave node and of the other slave nodes;
a weight calculation module, configured to calculate, according to the free-memory information, the data-assignment weight value corresponding to each slave node;
wherein the original-data splitting module is configured to split the original data according to the data-assignment weight values and the size of the original data, obtaining the data block corresponding to each slave node.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the device further comprises:
a first task-name assignment module, configured to assign a subtask name to the data block corresponding to each slave node;
wherein the processing-node calling module is configured to send the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the operation-result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the device further comprises:
a memory-peak obtaining module, configured to obtain the memory peak of the local slave node during execution of the failed task;
a second memory-information obtaining module, configured to obtain the free-memory information of the local slave node;
wherein the original-data splitting module is configured to calculate, according to the memory peak and the free-memory information, the number of blocks for the original data, and to split the original data according to the number of blocks and the size of the original data, obtaining the at least one data block.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the device further comprises:
a second task-name assignment module, configured to assign a subtask name to each data block;
wherein the processing-node calling module is configured to call the local slave node to perform arithmetic processing on each data block in turn, saving the obtained operation results on the local disk.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the operation-result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects:
During task execution, when an out-of-memory error message is detected, the original data of the failed task is obtained and split into at least one data block; the local slave node or other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks, and executing the subtasks uses less memory than executing the whole task. Calling the local slave node or other slave nodes to process the multiple subtasks separately ensures that the failed task is handled effectively even when the local slave node's memory is insufficient, improving the task-execution efficiency and fault-tolerance reliability of the Spark distributed system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from these without creative effort.
Fig. 1 is a kind of architecture diagram of Spark distributed system provided in an embodiment of the present invention;
Fig. 2 is a kind of flow chart of fault-tolerance processing method provided in an embodiment of the present invention;
Fig. 3 is a kind of flow chart of fault-tolerance processing method provided in an embodiment of the present invention;
Fig. 4 is a kind of architecture diagram of Spark distributed system provided in an embodiment of the present invention;
Fig. 5 is a kind of architecture diagram of Spark distributed system provided in an embodiment of the present invention;
Fig. 6 is a kind of flow chart of fault-tolerance processing method provided in an embodiment of the present invention;
Fig. 7 is a kind of architecture diagram of Spark distributed system provided in an embodiment of the present invention;
Fig. 8 is a kind of architecture diagram of Spark distributed system provided in an embodiment of the present invention;
Fig. 9 is a kind of schematic diagram of internal structure of fault-tolerant processing device provided in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Before the embodiments of the present invention are further explained, the background knowledge relevant to the embodiments is first introduced.
In the architecture diagram of the Spark distributed system shown in Fig. 1, the key functional modules are described as follows:
DAGScheduler: decomposes a Spark job into one or more Stages. Each Stage determines its number of Tasks according to the number of Partitions of the RDD (Resilient Distributed Dataset), then generates the corresponding Task set and places it into the TaskScheduler.
TaskScheduler: assigns Tasks to the Executors on the SlaveNodes for execution.
Executor: the module that actually performs the computation; a cluster generally contains multiple Executors. After receiving a "Launch Task" command sent by the Driver's TaskScheduler, each Executor executes one or more Tasks.
The processing flow is usually as follows. In steps 1 and 2, the Driver actively registers application information with the MasterNode during startup. In step 3, the MasterNode sends a request to the SlaveNode to start an Executor. In step 4, the SlaveNode returns the post-startup information to the MasterNode. In steps 5 and 6, the MasterNode sends the relevant Executor information to the Driver, and the DAGScheduler and TaskScheduler divide the job submitted by the application into multiple tasks. In step 7, after task division is complete, the TaskScheduler allocates tasks to each SlaveNode according to its current number of CPU (Central Processing Unit) cores and the memory resources of the Spark cluster. In step 8, after the tasks are executed, the SlaveNode returns the operation results to the DAGScheduler.
An existing Spark distributed system generally includes three resource-allocation scheduling modes. The first is Standalone mode: Spark carries a complete set of services and can be deployed in a cluster on its own, without relying on other resource-management systems; currently only the coarse-grained mode is supported. The second is YARN mode: Spark runs on YARN, and YARN performs the resource management; currently only the coarse-grained mode is supported. YARN is the next-generation MR, developed on the basis of the first-generation MR. The third is Mesos mode: Spark runs on Mesos, and Mesos performs the resource management, supporting both coarse-grained and fine-grained modes; only the number of CPU cores of a node is considered during resource allocation. Mesos is a resource-management platform for distributed environments.
Among the three resource-allocation scheduling modes above, Standalone mode considers only the number of CPU cores of a node during resource allocation. YARN mode and Mesos mode consider both the number of CPU cores of a node and the memory required by the SlaveNode's Executor at startup, which is generally specified in a configuration file. The Spark distributed system mainly addresses efficiency in iterative computation and in the MR computing process, and its two main application classes are big-data mining and machine learning. Both involve processing massive amounts of data and many iterations, so such jobs are usually time-consuming and occupy large amounts of memory. As a result, task-execution failures caused by insufficient memory occur during task execution.
To this end, an embodiment of the present invention provides a new fault-tolerance processing method: the failed task undergoes a secondary split, and after multiple small tasks are obtained, distributed computing or local-disk storage is used to solve the task-failure problem caused by insufficient memory. This significantly improves the fault-tolerance reliability and task-execution efficiency of the Spark distributed system; for details, see the following embodiments.
Fig. 2 is a flow chart of a fault-tolerance processing method provided in an embodiment of the present invention. Referring to Fig. 2, the method flow provided by the embodiment of the present invention includes:
201: During task execution, when an out-of-memory error message is detected, obtain the original data of the failed task.
After a SlaveNode receives a task submitted by the TaskScheduler, if the task fails due to insufficient memory during execution, the Executor of the SlaveNode captures the out-of-memory error message ("out of memory"). At this point, the SlaveNode does not feed the error message back to the Driver; instead, it calls the fault-tolerance module shown in Fig. 4 to perform fault-tolerant processing.
202: Split the original data to obtain at least one data block.
When the fault-tolerance module splits the original data, it may split according to the remaining-memory information of the local SlaveNode or of other SlaveNodes; the embodiment of the present invention does not specifically limit this.
203: Call a data processing node to perform arithmetic processing on the at least one data block.
A Spark distributed system generally includes a master node and multiple slave nodes. The data processing node refers to the local slave node, or a slave node other than the local slave node. The local slave node is the node currently executing the task; the other slave nodes are nodes not currently assigned the task.
204: Obtain the operation result corresponding to the at least one data block.
After the data is split, distributed computing or local-disk storage may be used to perform operations on the at least one data block. With distributed computing, the data processing nodes include the local slave node and the other slave nodes. With local-disk storage, the data processing node includes only the local slave node.
205: Merge the operation results to obtain a final operation result, and send the final operation result to the client.
According to the method provided by the embodiment of the present invention, during task execution, when an out-of-memory error message is detected, the original data of the failed task is obtained and split into at least one data block; the local slave node or other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks, and executing the subtasks uses less memory than executing the whole task. Calling the local slave node or other slave nodes to process the multiple subtasks separately ensures that the failed task is handled effectively even when the local slave node's memory is insufficient, improving the task-execution efficiency and fault-tolerance reliability of the Spark distributed system.
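Steps 201-205 above can be sketched in a few lines. This is a minimal, single-process illustration, not the patented implementation: the helper names (`fault_tolerant_execute`, `merge`) are hypothetical, the "nodes" are simulated by plain function calls, and the additive merge is just one possible merge operation.

```python
def fault_tolerant_execute(original_data, process_fn, nodes):
    """Split the failed task's data, process each block, merge the results."""
    # Step 202: split the original data into one block per available node.
    n = max(1, len(nodes))
    size = len(original_data)
    bounds = [size * i // n for i in range(n + 1)]
    blocks = [original_data[bounds[i]:bounds[i + 1]] for i in range(n)]

    # Steps 203-204: each data-processing node handles one block
    # (simulated here by a plain in-process function call per node).
    results = [process_fn(block) for block in blocks]

    # Step 205: merge the per-block results into the final result.
    return merge(results)

def merge(results):
    # Merging is operation-specific; summation works for an additive operation.
    return sum(results)

data = list(range(10))
print(fault_tolerant_execute(data, sum, nodes=["slave1", "slave2", "slave3"]))  # → 45
```

In a real deployment, step 203 would dispatch each block to a SlaveNode's Executor rather than calling `process_fn` in-process.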
Optionally, before the original data of the failed task is split, the method further includes:
obtaining the free-memory information of the local slave node and of the other slave nodes;
calculating, according to the free-memory information, the data-assignment weight value corresponding to each slave node;
and splitting the original data of the failed task includes:
splitting the original data according to the data-assignment weight values and the size of the original data, obtaining the data block corresponding to each slave node.
Optionally, before the data processing node is called to perform arithmetic processing on the at least one data block, the method further includes:
assigning a subtask name to the data block corresponding to each slave node;
and calling the data processing node to perform arithmetic processing on the at least one data block includes:
sending the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
Optionally, obtaining the operation result corresponding to the at least one data block includes:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
Optionally, before the original data of the failed task is split, the method further includes:
obtaining the memory peak of the local slave node during execution of the failed task;
obtaining the free-memory information of the local slave node;
and splitting the original data of the failed task includes:
calculating, according to the memory peak and the free-memory information, the number of blocks for the original data;
splitting the original data according to the number of blocks and the size of the original data, obtaining the at least one data block.
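The source states that the number of blocks is calculated from the memory peak and the free-memory information but gives no formula. A natural choice — an assumption on our part, not from the patent — is the ceiling of their ratio, so that each block's working set fits in the free memory:

```python
import math

def block_count(memory_peak, free_memory):
    """Blocks needed so each block's working set fits in free memory.

    The ceiling-ratio formula is a hypothetical reading of the patent's
    'calculate the number of blocks from the memory peak and free memory'.
    """
    return max(1, math.ceil(memory_peak / free_memory))

def split(data, num_blocks):
    """Cut the data into num_blocks near-equal pieces by size."""
    size = len(data)
    bounds = [size * i // num_blocks for i in range(num_blocks + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(num_blocks)]

# A task that peaked at 6 GB on a node with 2 GB free needs 3 blocks.
n = block_count(memory_peak=6.0, free_memory=2.0)
print(n)                         # → 3
print(split(list(range(7)), n))  # → [[0, 1], [2, 3], [4, 5, 6]]
```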
Optionally, before the data processing node is called to perform arithmetic processing on the at least one data block, the method further includes:
assigning a subtask name to each data block;
and calling the data processing node to perform arithmetic processing on the at least one data block includes:
calling the local slave node to perform arithmetic processing on each data block in turn, and saving the obtained operation results on the local disk.
Optionally, obtaining the operation result corresponding to the at least one data block includes:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
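The local-disk variant above can be sketched as follows. The file layout, `pickle` serialization, and `taskN` naming scheme are illustrative assumptions; the patent only specifies that the local slave node processes the blocks one at a time, saves each result to local disk, and retrieves it by subtask name.

```python
import os
import pickle
import tempfile

def run_locally(blocks, process_fn, work_dir):
    """Process blocks sequentially, spilling each result to disk by subtask name."""
    names = []
    for i, block in enumerate(blocks, start=1):
        name = f"task{i}"                      # subtask name for this block
        result = process_fn(block)             # one block at a time limits memory use
        with open(os.path.join(work_dir, name), "wb") as f:
            pickle.dump(result, f)             # save the result on local disk
        names.append(name)
    return names

def collect(names, work_dir):
    """Retrieve each block's operation result from disk by subtask name."""
    results = []
    for name in names:
        with open(os.path.join(work_dir, name), "rb") as f:
            results.append(pickle.load(f))
    return results

with tempfile.TemporaryDirectory() as d:
    names = run_locally([[1, 2], [3, 4], [5]], sum, d)
    print(sum(collect(names, d)))  # → 15
```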
All the above optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
Fig. 3 is a flow chart of a fault-tolerance processing method provided in an embodiment of the present invention. The interacting entities are the local slave node, the other slave nodes, and the client. Taking the fault-tolerance mode based on distributed computing as an example, referring to Fig. 3, the method flow provided by the embodiment of the present invention includes:
301: During task execution, when the local slave node detects an out-of-memory error message, obtain the original data of the failed task.
Taking the local slave node as SlaveNode1 in Fig. 4 as an example: after SlaveNode1 receives a task submitted by the Driver, if the task fails due to insufficient memory during execution, the Executor of SlaveNode1 captures the out-of-memory error message ("out of memory"). At this point, SlaveNode1 does not feed the error message back to the Driver; instead, it calls fault-tolerance module A in Fig. 4 to perform fault-tolerant processing.
After receiving the error message sent by the Executor, fault-tolerance module A obtains the original data of the failed task from the BlockManager module. Of course, besides this manner of obtaining the original data, other manners may be used; the embodiment of the present invention does not specifically limit this.
302: The local slave node obtains the free-memory information of itself and of the other slave nodes.
In the embodiment of the present invention, when the local slave node SlaveNode1 obtains the free-memory information of the other slave nodes, this may be implemented as follows:
SlaveNode1 establishes a connection with the MasterNode and obtains the free-memory information of the other slave nodes from the MasterNode.
Of course, besides this manner of obtaining the free-memory information, other manners may be used; the embodiment of the present invention does not specifically limit this.
The local slave node can obtain its own free-memory information directly, which is not described again here.
303: The local slave node calculates, according to the free-memory information, the data-assignment weight value corresponding to each slave node.
In the embodiment of the present invention, the data-assignment weight values are needed when the original data of the failed task is split. Therefore, before the original data is split, the data-assignment weight value corresponding to each slave node is calculated from the obtained free-memory information.
Let Mem1, Mem2, Mem3, ..., MemN denote the free memory of each of the above slave nodes, and let ai denote the data-assignment weight value corresponding to any slave node i. The data-assignment weight value is then calculated according to formula (1):
ai = Memi / (Mem1 + Mem2 + Mem3 + ... + MemN)    (1)
304: The local slave node splits the original data according to the data-assignment weight values and the size of the original data, obtaining the data block corresponding to each slave node.
In the embodiment of the present invention, after the data-assignment weight value corresponding to each slave node is obtained in step 303, fault-tolerance module A in the local slave node can split the original data into at least one data block according to the data-assignment weight values.
For example, suppose the size of the original data is S; the free memory of SlaveNode1 is Mem1 and its data-assignment weight value is a1; the free memory of SlaveNode2 is Mem2 and its weight value is a2; and the free memory of SlaveNode3 is Mem3 and its weight value is a3. Then the size of the data block corresponding to SlaveNode1 is S*a1, the size of the data block corresponding to SlaveNode2 is S*a2, and the size of the data block corresponding to SlaveNode3 is S*a3.
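The weight calculation of formula (1) and the proportional split of this step can be sketched directly. The node names and the list-based "original data" are illustrative; rounding S*ai to whole elements (with the last node taking the remainder) is our assumption, since the patent works with abstract data sizes.

```python
def assignment_weights(free_mem):
    """Formula (1): ai = Memi / (Mem1 + Mem2 + ... + MemN)."""
    total = sum(free_mem.values())
    return {node: mem / total for node, mem in free_mem.items()}

def weighted_split(data, weights):
    """Give each node a slice of size ~ S * ai; the last node takes the remainder."""
    size = len(data)
    blocks, start = {}, 0
    nodes = list(weights)
    for node in nodes[:-1]:
        end = start + round(size * weights[node])
        blocks[node] = data[start:end]
        start = end
    blocks[nodes[-1]] = data[start:]
    return blocks

mem = {"SlaveNode1": 2.0, "SlaveNode2": 1.0, "SlaveNode3": 1.0}
w = assignment_weights(mem)        # {'SlaveNode1': 0.5, 'SlaveNode2': 0.25, 'SlaveNode3': 0.25}
parts = weighted_split(list(range(8)), w)
print({n: len(b) for n, b in parts.items()})  # → {'SlaveNode1': 4, 'SlaveNode2': 2, 'SlaveNode3': 2}
```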
305, local to distribute subtask title from the corresponding data block of node to be each from node.
After carrying out cutting to initial data, realizing original failed tasks cutting is multiple subtasks.In this hair It further include distributing subtask from the corresponding data block of node to be each to be identified to multiple subtasks in bright embodiment The step of title.Wherein, subtask title can be a character string, such as a cardinar number word string, a pure alphabetic string, a number The character string etc. that word and letter mix.
By it is each from the corresponding data block of node be respectively Data1, Data2 ..., then can be every number for DataN According to block distribution task1, task2, task3 ..., subtask title as taskN.
306. The local slave node sends the corresponding data block and subtask name to itself and to the other slave nodes.
In the embodiment of the present invention, after the local slave node finishes splitting the original data of the failed task and assigns a subtask name to each data block, it sends each data block to the corresponding slave node, and each slave node processes the data of its subtask. This avoids a second failure that could occur if the local slave node alone re-executed the failed task. Moreover, each slave node executes only one subtask of the failed task, which reduces the memory usage compared with executing the entire task and improves the fault-tolerance reliability of the Spark distributed system.
Suppose the data block corresponding to SlaveNode1 is Data1 with subtask name task1, the data block corresponding to SlaveNode2 is Data2 with subtask name task2, and the data block corresponding to SlaveNode3 is Data3 with subtask name task3. SlaveNode1 then sends data block Data2 and subtask name task2 to SlaveNode2, sends data block Data3 and subtask name task3 to SlaveNode3, and processes data block Data1 itself.
307. The local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
In the embodiment of the present invention, each of the other slave nodes in Fig. 4, namely SlaveNode2, SlaveNode3, ..., SlaveNodeN, includes a fault-tolerance module. After the fault-tolerance module of one of the other slave nodes receives the data block sent by the local slave node SlaveNode1, it submits the received data block to the Executor, and the Executor performs arithmetic processing on the data block to obtain the at least one operation result Result1, Result2, Result3, ..., ResultN.
308. The local slave node obtains the operation result corresponding to the at least one data block.
In the embodiment of the present invention, when obtaining the operation result corresponding to the at least one data block, the local slave node may proceed in the following manner:
according to the subtask name corresponding to each data block, obtain the operation result corresponding to each data block from the local slave node and the other slave nodes.
Of course, besides the above manner of obtaining operation results, other manners may also be adopted; the embodiment of the present invention does not specifically limit this. For example, Result2, the operation result for data block Data2, is obtained from SlaveNode2 according to task2; Result3, the operation result for data block Data3, is obtained from SlaveNode3 according to task3.
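The name-keyed collection of results can be sketched as follows; `fetch` is a hypothetical stand-in for the remote lookup that asks the node which ran a given subtask for its result (the dictionary below merely simulates such nodes for illustration):

```python
def collect_results(task_names, fetch):
    # fetch(name) stands in for querying the slave node that executed
    # subtask `name`, e.g. fetch("task2") returns Result2.
    return [fetch(name) for name in task_names]

# Simulated per-node results, keyed by subtask name.
results_by_task = {"task1": 10, "task2": 20, "task3": 30}
collected = collect_results(["task1", "task2", "task3"], results_by_task.get)
```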
309. The local slave node merges the operation results to obtain a final operation result and sends the final operation result to the client.
In the embodiment of the present invention, after collecting the operation result corresponding to each data block, the local slave node merges the operation results (Result1, Result2, Result3, ..., ResultN) to obtain the final operation result, and returns it to the Driver.
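The merge step can be sketched as simple concatenation. Note that the patent does not specify the merge operation, which in practice depends on the algorithm being run (an SVM or LR job might instead combine partial models), so concatenation here is only an illustrative assumption:

```python
def merge_results(partial_results):
    # Concatenate Result1..ResultN into the final result that is
    # returned to the Driver.
    merged = []
    for part in partial_results:
        merged.extend(part)
    return merged

final_result = merge_results([[1, 2], [3], [4, 5]])
```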
In the method provided by the embodiment of the present invention, when an out-of-memory error message is detected during task execution, the original data of the failed task is obtained and split into at least one data block, and the local slave node or the other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks; during the execution of a subtask, memory usage is reduced compared with executing the entire task. Since the local slave node and the other slave nodes are called to process the multiple subtasks respectively, it can be ensured that the failed task is effectively handled even when the local node has insufficient memory, which improves the task execution efficiency and fault-tolerance reliability of the Spark distributed system.
Referring to Fig. 5, steps 301 to 309 above are explained below with a detailed example.
In Fig. 5, the Spark distributed system has 6 nodes: one master node and 5 slave nodes. Each node has 40G of free memory and 16 cores. The 5 slave nodes are denoted SlaveNode1, SlaveNode2, SlaveNode3, SlaveNode4 and SlaveNode5, where SlaveNode1 is the local slave node. Each slave node includes a fault-tolerance module A. The Driver application runs the SVM (Support Vector Machine) machine learning algorithm; the data volume participating in the computation is 1,000,000 records of 5000 dimensions, with a size of 100G. The detailed fault-tolerance process is as follows:
In the first step, the Driver (client) submits a task to the SlaveNode1 node.
In the second step, after receiving the task, SlaveNode1 calls the Executor to execute it. Because memory is insufficient during the task execution, the task execution fails. After the Executor captures the "out of memory" error message, it sends the error message to fault-tolerance module A; that is, fault-tolerance module A is called to handle the above insufficient-memory condition.
Because the total amount of data participating in the computation is 100G, the data volume evenly allocated to the SlaveNode1 node is about 20G. Moreover, the SVM machine learning algorithm involves multiple iterations and matrix multiplications during execution, and the intermediate data results are often about 3-5 times the task data volume (60-100G). Insufficient memory is therefore very likely in this case, so the task cannot be executed successfully.
In the third and fourth steps, after fault-tolerance module A of SlaveNode1 receives the error message sent by the Executor, it obtains the original data involved in the task from the BlockManager module.
In the fifth and sixth steps, fault-tolerance module A of SlaveNode1 interacts with the MasterNode to obtain the free memory information of the 4 slave nodes SlaveNode2, SlaveNode3, SlaveNode4 and SlaveNode5. Meanwhile, the fault-tolerance module also obtains the free memory Mem1 of SlaveNode1.
In the seventh step, the data-distribution weight corresponding to each node is calculated according to the free memory information of the 5 SlaveNodes.
Let Mem1, Mem2, Mem3, Mem4 and Mem5 denote the free memory of the above 5 slave nodes respectively, and let ai denote the data-distribution weight corresponding to any slave node; the calculation formula of the data-distribution weight is then shown in the following formula (2):
ai = Memi / (Mem1 + Mem2 + Mem3 + Mem4 + Mem5) (2)
Assume the free memory values Mem1, Mem2, Mem3, Mem4 and Mem5 are 21G, 18G, 12G, 9G and 6G respectively. Then the data-distribution weight corresponding to SlaveNode1 is a1 = 21/66 ≈ 0.32, the weight corresponding to SlaveNode2 is a2 = 18/66 ≈ 0.27, the weight corresponding to SlaveNode3 is a3 = 12/66 ≈ 0.18, the weight corresponding to SlaveNode4 is a4 = 9/66 ≈ 0.14, and the weight corresponding to SlaveNode5 is a5 = 6/66 ≈ 0.09. The original data of the task is then divided into Data1, Data2, Data3, Data4 and Data5 according to this ratio. With the original data size on SlaveNode1 being 20G, Data1 = 20*0.32 = 6.4G, Data2 = 20*0.27 = 5.4G, Data3 = 20*0.18 = 3.6G, Data4 = 20*0.14 = 2.8G and Data5 = 20*0.09 = 1.8G.
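Formula (2) can be checked with a short Python sketch under the same assumed free-memory values (21G, 18G, 12G, 9G, 6G); the function below is illustrative and not part of the patented implementation:

```python
def data_weights(free_mems):
    # a_i = Mem_i / (Mem_1 + ... + Mem_n), per formula (2).
    total = sum(free_mems)
    return [m / total for m in free_mems]

weights = data_weights([21, 18, 12, 9, 6])  # free memory in GB
```

By construction the weights sum to 1, so the per-node blocks partition the original data exactly.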
Fault-tolerance module A of SlaveNode1 then assigns a subtask name to each of these data blocks, for example task1, task2, task3, task4 and task5.
In the eighth step, fault-tolerance module A of SlaveNode1 distributes task1, task2, task3, task4 and task5 to SlaveNode1, SlaveNode2, SlaveNode3, SlaveNode4 and SlaveNode5 respectively for computation.
After the fault-tolerance module A of any of the other slave nodes receives the subtask sent by fault-tolerance module A of SlaveNode1, it submits the subtask to the Executor, which performs the actual computation.
In the ninth step, after SlaveNode2, SlaveNode3, SlaveNode4 and SlaveNode5 finish computing the corresponding subtasks task2, task3, task4 and task5, they return the results Result2, Result3, Result4 and Result5 to SlaveNode1; SlaveNode1 has likewise performed the computation of task1, obtaining the result Result1.
In the tenth step, after the SlaveNode1 node collects the operation results Result1, Result2, Result3, Result4 and Result5 of all the subtasks (task1, task2, task3, task4, task5), it merges these operation results to obtain the final result Result of the task, and then returns it to the client.
The above ten steps are a complete example of the above method embodiment. By adopting the distributed computation manner, the problem of task failure caused by insufficient memory while the Spark distributed system is running can be solved, which improves the fault-tolerance reliability of the Spark distributed system and the execution efficiency of tasks.
Fig. 6 is a flowchart of a fault-tolerance processing method provided by an embodiment of the present invention. The interacting parties are the local slave node and the client. Referring to Fig. 6, taking the fault-tolerance processing manner of storing data on the local disk as an example, the method flow provided by the embodiment of the present invention includes:
601. During task execution, when the local slave node detects an out-of-memory error message, obtain the original data of the failed task.
Take the local slave node as SlaveNodeN in Fig. 7 (where N does not exceed the total number of slave nodes) as an example. After SlaveNodeN receives the task submitted by the Driver, if the task execution fails due to insufficient memory during execution, the Executor of SlaveNodeN captures the out-of-memory error message "out of memory". At this point, SlaveNodeN does not feed the error message back to the Driver, but calls fault-tolerance module B in Fig. 7 to perform fault-tolerance processing.
After receiving the error message sent by the Executor, fault-tolerance module B obtains the original data of the failed task from the BlockManager module. Of course, besides the above manner of obtaining the original data, other manners may also be adopted; the embodiment of the present invention does not specifically limit this.
602. Obtain the memory peak of the local slave node during the execution of the failed task, and its free memory information.
Let peakMem denote the memory peak during the execution of the failed task; the local slave node may then obtain the memory peak peakMem in the following manner:
the memory peak peakMem is obtained from the memory change information before and after the RDD transformations of the failed task during execution.
Of course, peakMem may also be obtained in manners other than the above; the embodiment of the present invention does not specifically limit this. The free memory information of the local slave node SlaveNodeN can be obtained directly, which is not described again here.
603. The local slave node calculates the number of blocks of the original data according to the memory peak and the free memory information.
In the embodiment of the present invention, the local slave node SlaveNodeN calculates the number of blocks of the original data of the failed task according to the memory peak and the free memory information, using the following formula (3):
M = (peakMem / remainMem) * ratio (3)
Here M denotes the number of blocks of the original data, remainMem denotes the free memory information of the local slave node, and ratio is a constant whose value generally lies between 1 and 2. Because the peakMem of a task during execution is a dynamically estimated peak, in order to improve the reliability of subsequent computation, the quotient of peakMem and remainMem is usually multiplied by the constant ratio (1 ≤ ratio ≤ 2), and the product of the quotient and the constant ratio is taken as the number of blocks M. The number of data blocks thereby increases and the data volume of each block is appropriately reduced, which ensures the reliability of later task execution.
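Formula (3) can be sketched in Python as below. Rounding the product up to an integer and flooring it at 1 are our assumptions, since the formula itself does not state how a fractional M is handled:

```python
import math

def block_count(peak_mem, remain_mem, ratio=1.5):
    # M = (peakMem / remainMem) * ratio, per formula (3).
    # ratio is a constant with 1 <= ratio <= 2; ceiling to an integer
    # block count is an assumption made for this sketch.
    if not 1 <= ratio <= 2:
        raise ValueError("ratio must lie between 1 and 2")
    return max(1, math.ceil(peak_mem / remain_mem * ratio))

m = block_count(peak_mem=80, remain_mem=20, ratio=1.5)  # -> 6
```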
604. The local slave node splits the original data according to the number of blocks and the size of the original data, obtaining at least one data block.
Taking the size of the original data as S, the size of each data block is S/M. For example, with S = 30G and M = 6, 6 data blocks are obtained, each of size 5G.
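The equal-size split into M blocks can be sketched as follows, using a list of items as a stand-in for the S-sized original data; this fragment is illustrative only:

```python
def split_into_blocks(data, m):
    # Cut the original data into m blocks of (roughly) size S/M.
    block_size = -(-len(data) // m)  # ceiling division
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(list(range(30)), 6)  # 6 blocks of 5 items each
```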
605. The local slave node assigns a subtask name to each data block.
After the original data is split, the original failed task is in effect split into multiple subtasks. In the embodiment of the present invention, in order to identify the multiple subtasks, the method further includes the step of assigning a subtask name to each data block. The subtask name may be a character string, for example a string of pure digits, a string of pure letters, or a string mixing digits and letters.
Suppose the data blocks are Data1, Data2, ..., DataN respectively; subtask names such as task1, task2, task3, ..., taskN may then be assigned to the data blocks.
606. The local slave node performs arithmetic processing on each data block in turn and stores the obtained operation results on the local disk.
In the embodiment of the present invention, after the original data of the failed task is split according to the above steps, fault-tolerance module B in Fig. 7 starts to perform arithmetic processing on each data block in turn, and stores each operation result on the local disk of SlaveNodeN. When performing arithmetic processing on each data block, fault-tolerance module B submits the split data block to the Executor, which performs arithmetic processing on the data block to obtain the at least one operation result Result1, Result2, Result3, ..., ResultN.
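The sequential compute-and-spill loop of step 606 can be sketched as follows. JSON files named after the subtasks are an illustrative stand-in for the on-disk storage; the compute callback stands in for the Executor, and none of this is the patented implementation itself:

```python
import json
import os
import tempfile

def run_subtasks_to_disk(blocks, compute, out_dir):
    # Process one data block at a time and spill each Result_i to the
    # local disk, keyed by its subtask name, so only a single subtask's
    # working set occupies memory at any moment.
    paths = []
    for i, block in enumerate(blocks, start=1):
        result = compute(block)  # the Executor runs the subtask
        path = os.path.join(out_dir, "task%d.json" % i)
        with open(path, "w") as f:
            json.dump(result, f)
        paths.append(path)
    return paths

def merge_from_disk(paths):
    # Collect the stored subtask results and merge them.
    merged = []
    for p in paths:
        with open(p) as f:
            merged.extend(json.load(f))
    return merged

work_dir = tempfile.mkdtemp()
result_paths = run_subtasks_to_disk(
    [[1, 2], [3, 4], [5]], lambda b: [x * x for x in b], work_dir)
final_result = merge_from_disk(result_paths)  # [1, 4, 9, 16, 25]
```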
607. The local slave node obtains the operation result corresponding to the at least one data block from the local disk.
In the embodiment of the present invention, when obtaining the operation result corresponding to the at least one data block, the local slave node may proceed in the following manner:
according to the subtask name corresponding to each data block, obtain the operation result corresponding to each data block from the local disk.
Of course, besides the above manner of obtaining operation results, other manners may also be adopted; the embodiment of the present invention does not specifically limit this. For example, Result2, the operation result for data block Data2, is obtained from the local disk according to task2; Result3, the operation result for data block Data3, is obtained from the local disk according to task3.
608. The local slave node merges the operation results to obtain a final operation result and sends the final operation result to the client.
In the embodiment of the present invention, after collecting the operation result corresponding to each data block, the local slave node merges the operation results (Result1, Result2, Result3, ..., ResultN) to obtain the final operation result, and returns it to the Driver.
Because only one subtask of the failed task is computed during each round of computation, and the operation result is stored on the local disk, the memory pressure on the local slave node can be effectively relieved, which improves the fault-tolerance reliability of the Spark distributed system.
In the method provided by the embodiment of the present invention, when an out-of-memory error message is detected during task execution, the original data of the failed task is obtained and split into at least one data block, and the local slave node or the other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks; during the execution of a subtask, memory usage is reduced compared with executing the entire task. In addition, the local slave node is called to process the multiple subtasks in turn, and the operation results are saved on the local disk, ensuring that the failed task is effectively handled even when the local node has insufficient memory, which improves the task execution efficiency and fault-tolerance reliability of the Spark distributed system.
Referring to Fig. 8, steps 601 to 608 above are explained below with a detailed example.
In Fig. 8, the Spark distributed system has 6 nodes: one master node and 5 slave nodes, each slave node having 100G of free memory and 16 cores. SlaveNodeN is the local slave node and includes a fault-tolerance module B. The Driver application runs the LR (Logistic Regression) machine learning algorithm; the data volume participating in the computation is 1,000,000 records of 10000 dimensions, with a size of 200G. The detailed fault-tolerance process is as follows:
In the first step, the Driver (client) submits a task to the SlaveNodeN node.
In the second step, after receiving the task, SlaveNodeN calls the Executor to execute it. Because memory is insufficient during the task execution, the task execution fails. After the Executor captures the "out of memory" error message, it sends the error message to fault-tolerance module B; that is, fault-tolerance module B is called to handle the above insufficient-memory condition.
Because the total data size participating in the computation is 200G, the data volume evenly allocated to the SlaveNodeN node is about 40G. Moreover, the LR algorithm involves multiple iterations and matrix multiplications during execution, and the intermediate data results are often about 2-3 times the task data volume (80-120G). Insufficient memory is therefore likely in this case, so the task cannot be executed successfully.
In the third and fourth steps, after fault-tolerance module B of SlaveNodeN receives the error message sent by the Executor, it obtains from the BlockManager module the original data involved in the task and the memory peak peakMem that occurred while the task was running.
The memory peak peakMem can be obtained from the memory changes before and after the RDD transformations of the task during execution, because the data of the Spark distributed system exists in the form of RDDs, and the size of the local RDD reflects the occupancy of data in memory.
In the fifth step, after fault-tolerance module B of SlaveNodeN obtains peakMem, it calculates the number of blocks of the original data according to its own free memory remainMem.
Taking the number of blocks as M, M = (peakMem / remainMem) * ratio, where 1 ≤ ratio ≤ 2; that is, the data Data involved in the failed task is divided into M blocks according to the value M.
Further, because the memory peak peakMem of the failed task during execution is a dynamically estimated peak, in order to improve the reliability of subsequent computation, the quotient may be multiplied by a constant ratio. The number of data blocks thereby increases and the data volume of each block is appropriately reduced, which ensures the reliability of later task execution. The usual best value of ratio is 1.5.
In the sixth step, fault-tolerance module B of SlaveNodeN divides the failed task into task1, task2, ..., taskM, computes these subtasks in turn to obtain the operation results Result1, Result2, ..., ResultM of each subtask, and stores each operation result on the local disk of SlaveNodeN.
The operation result of each subtask is put on the local disk mainly in order to relieve the insufficient-memory condition on the SlaveNodeN node.
In the seventh step, after all the subtasks are computed, fault-tolerance module B merges all the operation results stored on the local disk to obtain the final operation result, and returns the final operation result to the Driver.
The above seven steps are a complete example of the above method embodiment. By adopting the local-disk storage manner, the problem of task failure caused by insufficient memory while the Spark distributed system is running can be solved, which improves the fault-tolerance reliability of the Spark distributed system and the execution efficiency of tasks.
Fig. 9 is a schematic diagram of the internal structure of a fault-tolerance processing apparatus provided by an embodiment of the present invention. Referring to Fig. 9, the apparatus includes: an original data obtaining module 901, an original data splitting module 902, a processing node calling module 903, an operation result obtaining module 904, an operation result merging module 905 and an operation result sending module 906.
The original data obtaining module 901 is configured to obtain the original data of a failed task when an out-of-memory error message is detected during task execution. The original data splitting module 902 is connected to the original data obtaining module 901 and is configured to split the original data obtained by the original data obtaining module 901 to obtain at least one data block. The processing node calling module 903 is connected to the original data splitting module 902 and is configured to call a data processing node to perform arithmetic processing on the at least one data block split by the original data splitting module 902, the data processing node being the local slave node or the other slave nodes besides the local slave node. The operation result obtaining module 904 is connected to the processing node calling module 903 and is configured to obtain the operation result corresponding to the at least one data block. The operation result merging module 905 is connected to the operation result obtaining module 904 and is configured to merge the operation results obtained by the operation result obtaining module 904 to obtain a final operation result. The operation result sending module 906 is connected to the operation result merging module 905 and is configured to send the final operation result obtained by the operation result merging module 905 to the client.
Optionally, the apparatus further includes:
a first memory information obtaining module, configured to obtain the free memory information of the local slave node and the other slave nodes; and
a weight calculation module, configured to calculate the data-distribution weight corresponding to each slave node according to the free memory information;
the original data splitting module being configured to split the original data according to the data-distribution weights and the size of the original data, obtaining the data block corresponding to each slave node.
Optionally, the apparatus further includes:
a first task name assignment module, configured to assign a subtask name to the data block corresponding to each slave node;
the processing node calling module being configured to send the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
Optionally, the operation result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
Optionally, the apparatus further includes:
a memory peak obtaining module, configured to obtain the memory peak of the local slave node during the execution of the failed task; and
a second memory information obtaining module, configured to obtain the free memory information of the local slave node;
the original data splitting module being configured to calculate the number of blocks of the original data according to the memory peak and the free memory information, and to split the original data according to the number of blocks and the size of the original data, obtaining at least one data block.
Optionally, the apparatus further includes:
a second task name assignment module, configured to assign a subtask name to each data block;
the processing node calling module being configured to call the local slave node to perform arithmetic processing on each data block in turn, storing the obtained operation results on the local disk.
Optionally, the operation result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
In the apparatus provided by the embodiment of the present invention, when an out-of-memory error message is detected during task execution, the original data of the failed task is obtained and split into at least one data block, and the local slave node or the other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks; during the execution of a subtask, memory usage is reduced compared with executing the entire task. Since the local slave node or the other slave nodes are called to process the multiple subtasks respectively, it is ensured that the failed task is effectively handled even when the local node has insufficient memory, which improves the task execution efficiency and fault-tolerance reliability of the Spark distributed system.
An embodiment of the present invention provides fault-tolerance processing equipment. The equipment includes at least one processor (for example a CPU), at least one network interface, a memory and at least one communication bus. The communication bus is used to realize connection and communication between these components. The memory may include a high-speed RAM memory, and may further include a non-volatile memory, for example at least one magnetic disk memory.
The processor is configured to execute the program stored in the memory, so as to realize the following method:
during task execution, when an out-of-memory error message is detected, obtaining the original data of a failed task; splitting the original data to obtain at least one data block; calling a data processing node to perform arithmetic processing on the at least one data block, the data processing node being the local slave node or the other slave nodes besides the local slave node; obtaining the operation result corresponding to the at least one data block; and merging the operation results to obtain a final operation result, and sending the final operation result to the client.
Further, the processor is specifically configured to obtain the free memory information of the local slave node and the other slave nodes; calculate the data-distribution weight corresponding to each slave node according to the free memory information; and split the original data according to the data-distribution weights and the size of the original data, obtaining the data block corresponding to each slave node.
Further, the processor is specifically configured to assign a subtask name to the data block corresponding to each slave node; and send the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform arithmetic processing on the at least one data block.
Further, the processor is specifically configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
Further, the processor is specifically configured to obtain the memory peak of the local slave node during the execution of the failed task; obtain the free memory information of the local slave node; calculate the number of blocks of the original data according to the memory peak and the free memory information; and split the original data according to the number of blocks and the size of the original data, obtaining at least one data block.
Further, the processor is specifically configured to assign a subtask name to each data block; and call the local slave node to perform arithmetic processing on each data block in turn, storing the obtained operation results on the local disk.
Further, the processor is specifically configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
In the equipment provided by this embodiment, when an out-of-memory error message is detected during task execution, the original data of the failed task is obtained and split into at least one data block, and the local slave node or the other slave nodes are then called to perform arithmetic processing on the at least one data block. Because the original data is split, the original task is split into multiple subtasks; during the execution of a subtask, memory usage is reduced compared with executing the entire task. Since the local slave node or the other slave nodes are called to process the multiple subtasks respectively, it is ensured that the failed task is effectively handled even when the local node has insufficient memory, which improves the task execution efficiency and fault-tolerance reliability of the Spark distributed system.
It should be noted that when the fault-tolerance processing apparatus provided by the above embodiment performs fault-tolerance processing, the division into the above functional modules is merely used as an example; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the fault-tolerance processing apparatus provided by the above embodiment and the fault-tolerance processing method embodiment belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not described again here.
A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A fault-tolerance processing method, characterized in that the method comprises:
during task execution, when an out-of-memory error message is detected, obtaining original data of a failed task;
splitting the original data according to a data-distribution weight corresponding to each slave node and a size of the original data to obtain at least one data block, wherein the at least one data block is the data block corresponding to each slave node, each slave node comprises a local slave node and other slave nodes besides the local slave node, and the data-distribution weight is calculated according to free memory information of the local slave node and the other slave nodes;
or, splitting the original data according to a number of blocks of the original data and the size of the original data to obtain at least one data block, wherein the number of blocks is calculated according to a memory peak of the local slave node during execution of the failed task and free memory information of the local slave node;
calling a data processing node to perform arithmetic processing on the at least one data block, wherein the data processing node is the local slave node or the other slave nodes;
obtaining an operation result corresponding to the at least one data block; and
merging the operation results to obtain a final operation result, and sending the final operation result to a client.
2. the method according to claim 1, wherein it is described according to each from the corresponding data weights assigned of node The size of value and the initial data, before carrying out cutting to the initial data, the method also includes:
Obtain the local from node and it is described other from the free memory information of node;
According to the free memory information, calculate each from the corresponding data weights assigned value of node.
3. The method according to claim 2, wherein before the invoking of the data processing node to perform computation on the at least one data block, the method further comprises:
assigning a subtask name to the data block corresponding to each slave node; and
the invoking of the data processing node to perform computation on the at least one data block comprises:
sending the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform computation on the at least one data block.
4. The method according to claim 3, wherein the obtaining of the operation result corresponding to each of the at least one data block comprises:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
5. The method according to claim 1, wherein before the splitting of the original data according to the block count of the original data and the size of the original data, the method further comprises:
obtaining the memory peak value of the local slave node during execution of the failed task;
obtaining the free memory information of the local slave node; and
calculating the block count of the original data according to the memory peak value and the free memory information.
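Claim 5 does not state the block-count formula, but a natural assumption is shown below: if the failed task peaked at `memory_peak` bytes while only `free_memory` bytes remain on the local node, splitting into `ceil(memory_peak / free_memory)` blocks keeps each subtask's expected footprint within the available memory. This is an illustrative sketch, not the patent's specified computation.

```python
import math

def block_count(memory_peak, free_memory):
    """Assumed rule: enough blocks that each block's share of the
    original peak fits into the local node's free memory."""
    if free_memory <= 0:
        raise ValueError("no free memory available on the local slave node")
    return max(1, math.ceil(memory_peak / free_memory))

# A task that peaked at 6 GiB with only 2 GiB free -> 3 sequential blocks.
n = block_count(6 * 2**30, 2 * 2**30)  # n == 3
```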
6. The method according to claim 5, wherein before the invoking of the data processing node to perform computation on the at least one data block, the method further comprises:
assigning a subtask name to each data block; and
the invoking of the data processing node to perform computation on the at least one data block comprises:
invoking the local slave node to perform computation on each data block in turn, and storing the obtained operation results on a local disk.
7. The method according to claim 6, wherein the obtaining of the operation result corresponding to each of the at least one data block comprises:
obtaining, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
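Claims 6-7 describe the single-node variant: the local slave node runs the blocks one at a time, spilling each block's result to local disk under its subtask name, and the results are later fetched back by name for merging. The sketch below is my own illustration of that pattern (file layout and JSON encoding are assumptions, not from the patent).

```python
import json
import os
import tempfile

def run_locally(blocks, compute, result_dir):
    """Process each named block in turn; spill each result to disk so
    that only one block's data is held in memory at a time."""
    for subtask_name, block in blocks.items():
        path = os.path.join(result_dir, subtask_name + ".json")
        with open(path, "w") as f:
            json.dump(compute(block), f)

def collect(subtask_names, result_dir):
    """Fetch each block's operation result from disk by subtask name."""
    results = []
    for name in subtask_names:
        with open(os.path.join(result_dir, name + ".json")) as f:
            results.append(json.load(f))
    return results

with tempfile.TemporaryDirectory() as d:
    blocks = {"subtask_0": [1, 2, 3], "subtask_1": [4, 5, 6]}
    run_locally(blocks, compute=sum, result_dir=d)
    final = sum(collect(list(blocks), d))  # (1+2+3) + (4+5+6) == 21
```

Spilling to disk is what makes the sequential variant work: the node never holds more than one block's input and one block's result in memory, at the cost of extra I/O.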
8. A fault-tolerance processing apparatus, wherein the apparatus comprises:
an original data obtaining module, configured to obtain, during task execution, the original data of a failed task when an out-of-memory error message is detected;
an original data splitting module, configured to split the original data obtained by the original data obtaining module according to a data weight value assigned to each slave node and the size of the original data, to obtain at least one data block, wherein each of the at least one data block corresponds to one of the slave nodes, the slave nodes comprise a local slave node and other slave nodes besides the local slave node, and the data weight values are calculated according to free memory information of the local slave node and of the other slave nodes;
alternatively, the original data splitting module is configured to split the original data obtained by the original data obtaining module according to a block count of the original data and the size of the original data, to obtain at least one data block, wherein the block count is calculated according to a memory peak value of the local slave node during execution of the failed task and the free memory information of the local slave node;
a processing node invoking module, configured to invoke a data processing node to perform computation on the at least one data block obtained by the original data splitting module, wherein the data processing node is the local slave node or one of the other slave nodes;
an operation result obtaining module, configured to obtain an operation result corresponding to each of the at least one data block;
an operation result merging module, configured to merge the operation results obtained by the operation result obtaining module, to obtain a final operation result; and
an operation result sending module, configured to send the final operation result obtained by the operation result merging module to a client.
9. The apparatus according to claim 8, wherein, when the original data splitting module is configured to split the original data according to the data weight values and the size of the original data, the apparatus further comprises:
a first memory information obtaining module, configured to obtain the free memory information of the local slave node and of the other slave nodes; and
a weight calculating module, configured to calculate, according to the free memory information, the data weight value corresponding to each slave node.
10. The apparatus according to claim 9, wherein the apparatus further comprises:
a first task name assigning module, configured to assign a subtask name to the data block corresponding to each slave node; and
the processing node invoking module is configured to send the corresponding data block and subtask name to each slave node, so that the local slave node and the other slave nodes perform computation on the at least one data block.
11. The apparatus according to claim 10, wherein the operation result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local slave node and the other slave nodes.
12. The apparatus according to claim 8, wherein, when the original data splitting module is configured to split the original data according to the block count and the size of the original data, the apparatus further comprises:
a memory peak value obtaining module, configured to obtain the memory peak value of the local slave node during execution of the failed task; and
a second memory information obtaining module, configured to obtain the free memory information of the local slave node; and
the original data splitting module is further configured to calculate the block count of the original data according to the memory peak value and the free memory information.
13. The apparatus according to claim 12, wherein the apparatus further comprises:
a second task name assigning module, configured to assign a subtask name to each data block; and
the processing node invoking module is configured to invoke the local slave node to perform computation on each data block in turn, and to store the obtained operation results on a local disk.
14. The apparatus according to claim 13, wherein the operation result obtaining module is configured to obtain, according to the subtask name corresponding to each data block, the operation result corresponding to each data block from the local disk.
CN201410763653.5A 2014-12-11 2014-12-11 Fault-tolerance processing method and device Active CN105740085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410763653.5A CN105740085B (en) 2014-12-11 2014-12-11 Fault-tolerance processing method and device


Publications (2)

Publication Number Publication Date
CN105740085A CN105740085A (en) 2016-07-06
CN105740085B true CN105740085B (en) 2019-04-19

Family

ID=56241228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410763653.5A Active CN105740085B (en) 2014-12-11 2014-12-11 Fault-tolerance processing method and device

Country Status (1)

Country Link
CN (1) CN105740085B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649546A (en) * 2016-11-04 2017-05-10 桂林电子科技大学 Fault-tolerance based distributed service registration and query method
CN106850849A (en) * 2017-03-15 2017-06-13 联想(北京)有限公司 Data processing method, device and server
CN107295110B (en) * 2017-08-16 2020-08-04 网宿科技股份有限公司 Processing method of computing task, edge node, service server and system
WO2019140567A1 (en) * 2018-01-17 2019-07-25 新联智慧信息技术(深圳)有限公司 Big data analysis method and system
CN109614227B (en) * 2018-11-23 2020-10-27 金色熊猫有限公司 Task resource allocation method and device, electronic equipment and computer readable medium
CN111338800B (en) * 2020-02-26 2024-04-16 深圳市优网科技有限公司 Data analysis method and device
CN112286712B (en) * 2020-12-25 2021-04-02 成都数联铭品科技有限公司 Fault-tolerant repair method based on unique ID

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324533A (en) * 2012-03-22 2013-09-25 Huawei Technologies Co., Ltd. Distributed data processing method, device and system
CN104102475A (en) * 2013-04-11 2014-10-15 Tencent Technology (Shenzhen) Co., Ltd. Method, device and system for processing distributed parallel tasks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140279913A1 (en) * 2013-03-15 2014-09-18 Geoffrey Ray Wehrman File system replication


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MongoDB's Sharding function; Anonymous; cnblogs (博客园); 2010-10-21; pp. 1-3

Also Published As

Publication number Publication date
CN105740085A (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN105740085B (en) Fault-tolerance processing method and device
CN106598735B (en) A kind of distributed computing method, main controlled node and computing system
CN107038069B (en) Dynamic label matching DLMS scheduling method under Hadoop platform
CN108881495A (en) Resource allocation methods, device, computer equipment and storage medium
CN114610474B (en) Multi-strategy job scheduling method and system under heterogeneous supercomputing environment
CN110908795A (en) Cloud computing cluster mixed part job scheduling method and device, server and storage device
CN112114950A (en) Task scheduling method and device and cluster management system
CN107291536B (en) Application task flow scheduling method in cloud computing environment
CN110308984B (en) Cross-cluster computing system for processing geographically distributed data
CN110990154B (en) Big data application optimization method, device and storage medium
CN106528288A (en) Resource management method, device and system
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN109614227A (en) Task resource concocting method, device, electronic equipment and computer-readable medium
Hung et al. Task scheduling for optimizing recovery time in cloud computing
CN109284190A (en) A kind of task processing method and device
CN116684420A (en) Cluster resource scheduling method, device, cluster system and readable storage medium
Gunarathne et al. Towards a collective layer in the big data stack
Hung et al. A dynamic scheduling method for collaborated cloud with thick clients.
Lin et al. A multi-centric model of resource and capability management in cloud simulation
Sun et al. T2FA: A heuristic algorithm for deadline-constrained workflow scheduling in cloud with multicore resource
Yuan et al. Fairness-aware scheduling algorithm for multiple DAGs based on task replication
Wei et al. A novel scheduling mechanism for hybrid cloud systems
CN113254177B (en) Task submitting method based on cluster, computer program product and electronic equipment
CN107729154A (en) Resource allocation methods and device
CN116032928B (en) Data collaborative computing method, device, system, electronic device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant