CN106357813A

CN106357813A - Task rescheduling method applied in shared-file system

Info

Publication number: CN106357813A
Application number: CN201610952589.4A
Authority: CN
Inventors: 陈军; 闫鹏飞
Original assignee: Long Yu Technology (beijing) Ltd By Share Ltd
Current assignee: Long Yu Technology (beijing) Ltd By Share Ltd
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2017-01-25
Anticipated expiration: 2036-11-02
Also published as: CN106357813B

Abstract

The invention provides a task rescheduling method applied to a shared-file system. The method comprises the following steps: when a node fails, the non-failed nodes contend to preempt the failed node's tasks, and the node whose preemption succeeds takes over those tasks. Preemption is implemented with a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node whose preemption succeeds. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system. It makes good use of the characteristics of the shared-file system and thereby solves the problem of rescheduling tasks when nodes fail.

Description

A task rescheduling method applied to a shared-file system

Technical field

The present invention relates to the field of communication technology, and in particular to a task rescheduling method applied to a shared-file system.

Background art

Mass data processing is a typical application of distributed systems. Among such applications, there is a class whose data has the following two features:

(1) The data is stored in a shared-file system, and every node of the system can access it through the shared-file system client.

(2) The data shares a common attribute whose values fall within a fixed span, so the data can be stored in different files according to span.

For this kind of application, data files of different spans are usually scheduled to different nodes for processing. During processing, node failure and node recovery must be taken into account. How to reschedule the tasks of a failed node to other nodes, and how a recovered node takes its tasks back, are key to ensuring that tasks complete successfully; the method proposed by the present invention addresses this problem.

At present, tasks are usually scheduled on node failure and node recovery in one of the following ways:

Centralized: a dedicated scheduling node is set up, all other nodes are processing nodes, and the scheduling node monitors the state of the processing nodes. When a processing node fails, its tasks are reassigned to other healthy nodes; when the failed node returns to a healthy state, it takes its original tasks back. This approach is simple to implement, but the scheduling node may become a bottleneck and is a single point of failure.

Pairwise mutual backup: nodes are paired, and the two nodes in a pair act as primary and standby for each other. Suppose nodes a and b are paired: when node a fails, its tasks are processed by b, and when a recovers it takes its original tasks back; failure and recovery of node b are handled the same way. This approach is simple to implement and has no single point of failure, but if both nodes of a pair fail, no node takes over their tasks.

Cluster-based: an upgraded version of pairwise mutual backup. When a node fails, multiple healthy nodes negotiate among themselves to elect one node to take over its tasks. This approach tolerates simultaneous failure of multiple nodes, but it involves communication among multiple nodes and is more complex to implement.

Summary of the invention

To overcome the deficiencies of the prior art, the invention provides a task rescheduling method applied to a shared-file system. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system.

The technical solution adopted by the present invention to solve the technical problem is a task rescheduling method applied to a shared-file system, comprising the following steps:

When a node fails, the non-failed nodes perform a preemption operation on the failed node's tasks, and the node whose preemption succeeds takes over those tasks.

Preferably, the preemption operation is implemented as a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node whose preemption succeeds.
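A minimal Python sketch of rename-based preemption, assuming the shared-file system is mounted at a local path and provides POSIX-style atomic rename (some network file systems weaken this, so it should be checked for the file system actually used). The function name `try_preempt` and its parameters are illustrative, not taken from the patent. Every contender renames the same source file to its own target name; after the first rename the source no longer exists, so every later attempt fails.

```python
import os


def try_preempt(root: str, task: int, old_node: int, new_node: int) -> bool:
    """Try to claim `task` from `old_node` by renaming its marker file.

    Hypothetical helper. All contenders rename the same source file
    t<task>-n<old_node> to their own name t<task>-n<new_node>; only the
    first rename finds the source, so at most one node wins.
    """
    src = os.path.join(root, f"t{task}-n{old_node}")
    dst = os.path.join(root, f"t{task}-n{new_node}")
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        # Another node renamed the source first: this preemption lost.
        return False
    return True
```

With two contenders calling `try_preempt(root, 1, 2, 3)` and `try_preempt(root, 1, 2, 4)`, whichever runs first returns `True` and the other returns `False`; the file system server acts as the "referee".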

Preferably, the file is one of a set of files named "ti-nj" that are created in the shared-file system at initialization, where ti is the task number and nj is the node number.
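Assuming file names literally take the form `t<i>-n<j>` (as in t1-n1, t2-n2 in the embodiment), the hypothetical helpers below build and parse them:

```python
import re
from typing import Optional, Tuple

# Assumed marker-file name format: "t" + task number, "-", "n" + node number.
_NAME_RE = re.compile(r"t(\d+)-n(\d+)")


def make_name(task: int, node: int) -> str:
    """Build the marker-file name for `task` held by `node`, e.g. "t3-n7"."""
    return f"t{task}-n{node}"


def parse_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (task, node) for a marker-file name, or None if it does not match."""
    m = _NAME_RE.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```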

Preferably, a process runs on each node; before the preemption operation, it traverses all of the above files and checks the file name and modification time of each file to determine whether to perform a preemption operation.
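A sketch of the traversal step under the same naming assumption: the hypothetical helper `scan` lists the marker files in an assumed shared directory `root` together with their modification times. A file can be renamed by another node between listing and `stat`, so that case is skipped rather than treated as an error.

```python
import os
import re
from typing import List, Tuple

_NAME_RE = re.compile(r"t(\d+)-n(\d+)")  # assumed "t<i>-n<j>" format


def scan(root: str) -> List[Tuple[int, int, float]]:
    """Return (task, node, mtime) for every marker file in `root`, sorted by task.

    Files whose names do not match the assumed pattern are ignored.
    """
    entries = []
    with os.scandir(root) as it:
        for entry in it:
            m = _NAME_RE.fullmatch(entry.name)
            if m is None or not entry.is_file():
                continue
            try:
                mtime = entry.stat().st_mtime
            except FileNotFoundError:
                # Renamed by another node between listing and stat; skip it.
                continue
            entries.append((int(m.group(1)), int(m.group(2)), mtime))
    entries.sort()
    return entries
```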

Preferably, taking the node numbered k and a file named ti-nj as an example, the step of determining whether to perform a preemption operation is specifically:

a. If i = k and j = k, the task is being processed by node k itself: update the modification time of the file and continue processing the task;

b. If i = k and j ≠ k, the task originally belongs to node k; node k once failed, another node preempted and processed the task, and node k has now recovered to a healthy state: perform the process of preempting the task back to node k;

c. If i ≠ k and j = k, the task originally belongs to node ni, and nk preempted it successfully when ni failed: update the modification time of the file and continue processing the task;

d. If i ≠ k and j ≠ k, the task is being processed by another node: determine whether nj has failed, and if so, the other nodes perform a preemption operation.
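The four cases reduce to a small decision table. The sketch below covers only the classification, with hypothetical names; the actions are described in the following paragraphs. Cases a and c call for the same action, so j == k is tested first and both fall into the heartbeat branch.

```python
from enum import Enum


class Action(Enum):
    HEARTBEAT = "update mtime and keep processing"      # cases a and c
    RECLAIM = "preempt own task back from the holder"   # case b
    CHECK_AND_TAKE_OVER = "take over if holder failed"  # case d


def decide(k: int, i: int, j: int) -> Action:
    """Classify marker file t<i>-n<j> from the point of view of node k."""
    if j == k:
        # Case a (i == k): own task, held by k.
        # Case c (i != k): node i's task, taken over earlier by k.
        return Action.HEARTBEAT
    if i == k:
        # Case b: k's own task is held by another node j; k has recovered.
        return Action.RECLAIM
    # Case d: another node's task, held by another node j.
    return Action.CHECK_AND_TAKE_OVER
```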

Preferably, the process of preempting the task back in step b is specifically: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj contend to preempt the task: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti. If it does not, nk must communicate with nj to notify it to stop the current task, and then renames ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
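A minimal sketch of step b, assuming a failure timeout in seconds and a hypothetical `notify_stop(holder, task)` hook for the "communicate with nj" step, whose transport the patent does not specify. The sketch does not confirm that nj has actually stopped before renaming, and it assumes node clocks are roughly in step with the file server's timestamps.

```python
import os
import time
from typing import Callable


def reclaim(root: str, task: int, holder: int, k: int, timeout: float,
            notify_stop: Callable[[int, int], None]) -> bool:
    """Case b: recovered node k preempts its own `task` back from `holder`.

    Returns True if k now owns the marker file t<task>-n<k>.
    """
    src = os.path.join(root, f"t{task}-n{holder}")
    dst = os.path.join(root, f"t{task}-n{k}")
    try:
        age = time.time() - os.stat(src).st_mtime
    except FileNotFoundError:
        return False  # Already renamed by another node; rescan later.
    if age <= timeout:
        # Holder still looks alive: ask it to stop before taking the task back.
        notify_stop(holder, task)
    try:
        os.rename(src, dst)  # Only one contending node can succeed.
    except FileNotFoundError:
        return False  # Lost the race to another healthy node.
    os.utime(dst)  # Refresh mtime so other nodes see k as alive.
    return True
```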

Preferably, determining in step d whether nj has failed and, if so, having the other nodes perform the preemption operation is specifically: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj contend to preempt the task: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
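Step d can be sketched the same way. The hypothetical helper `holder_failed` compares the file's age with the failure timeout (again assuming roughly synchronized clocks), and `take_over` renames only when the holder looks dead; a lost race surfaces as `FileNotFoundError`.

```python
import os
import time
from typing import Optional


def holder_failed(path: str, timeout: float, now: Optional[float] = None) -> bool:
    """Return True if `path` was last modified more than `timeout` seconds ago."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime > timeout


def take_over(root: str, task: int, holder: int, k: int, timeout: float) -> bool:
    """Case d: node k takes over `task` from `holder` if the holder has failed."""
    src = os.path.join(root, f"t{task}-n{holder}")
    dst = os.path.join(root, f"t{task}-n{k}")
    try:
        if not holder_failed(src, timeout):
            return False  # Holder is still refreshing its file; leave the task.
        os.rename(src, dst)  # Only one contending node can succeed.
    except FileNotFoundError:
        return False  # Another node renamed the file first.
    os.utime(dst)  # Start refreshing the newly acquired task's file.
    return True
```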

Beneficial effects of the present invention: the invention proposes an implementation of cluster-based scheduling. Instead of requiring the nodes to communicate in order to elect a takeover node, it lets the nodes preempt the failed node's tasks, and the node whose preemption succeeds takes over those tasks; a node also preempts its tasks back when it recovers. Preemption needs a "referee" to decide which node succeeds, and the server of the shared-file system can play this role. Preemption is implemented with a file rename: when multiple file system clients rename the same file simultaneously, only one rename returns successfully, so only one node's preemption succeeds. Node failure can be detected from file modification times. By exploiting the exclusivity of renames performed by multiple clients of a shared-file system, the invention provides a task rescheduling method that uses rename operations to preempt tasks when nodes fail. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, distributes tasks well across the nodes of a distributed system, makes good use of the characteristics of the shared-file system, and solves the problem of task rescheduling on node failure.
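Failure detection by modification time relies on each holder refreshing its files periodically, like a heartbeat. A minimal sketch with a hypothetical helper name: `os.utime` with no times argument sets a file's access and modification times to the current time. If the file has disappeared, another node has renamed it (for example, the task's home node has reclaimed it), and the caller should stop processing that task.

```python
import os


def heartbeat(root: str, task: int, k: int) -> bool:
    """Refresh the mtime of t<task>-n<k>, the marker file node k holds.

    Returns False if the file is gone, meaning another node has renamed it.
    """
    path = os.path.join(root, f"t{task}-n{k}")
    try:
        os.utime(path)  # Set atime and mtime to the current time.
    except FileNotFoundError:
        return False
    return True
```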

Brief description of the drawings

Fig. 1 is a schematic flowchart of the present invention.

Detailed description of the embodiments

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawing.

Referring to Fig. 1, the preferred embodiment of the present invention provides a task rescheduling method applied to a shared-file system. Instead of having the nodes communicate in order to elect a takeover node, the method lets the nodes preempt the failed node's tasks, and the node whose preemption succeeds takes them over; a node also preempts its tasks back when it recovers. Preemption is implemented with a file rename: when multiple nodes rename the same file simultaneously, only one rename returns successfully, so only one node's preemption succeeds. Node failure can be detected from file modification times. The specific implementation is as follows:

Assume the number of nodes equals the number of tasks (data files), both n, with nodes numbered n1 to nn and tasks numbered t1 to tn. When all nodes are healthy, each node processes one task: node ni processes task ti. Task ti and node nj form a "task-node" pair, written ti-nj, meaning task i is processed by node j. At initialization, n files named "ti-nj" are created in the shared-file system to represent the n task-node pairs t1-n1, t2-n2, ..., tn-nn; each indicates that ni processes the task ti with the same number, i.e. ti belongs to ni.
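Initialization might look like the following sketch (hypothetical function name; `root` is an assumed directory on the shared-file system). Opening with mode `"x"` raises `FileExistsError` if a file already exists, since this step is meant to run once.

```python
import os


def init_marker_files(root: str, n: int) -> None:
    """Create t1-n1 ... tn-nn: task i initially belongs to node i."""
    os.makedirs(root, exist_ok=True)
    for i in range(1, n + 1):
        # "x" mode fails instead of silently reusing an existing file.
        with open(os.path.join(root, f"t{i}-n{i}"), "x"):
            pass
```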

A process runs on each node, traverses these n files, and checks the name and modification time of each file. Taking the node numbered k and a file named ti-nj as an example, the process makes the following determination:

1) If i = k and j = k, the task is being processed by this node, so the node only needs to update the modification time of the file and continue processing the task.

2) If i = k and j ≠ k, the task originally belongs to this node; this node once failed, another node preempted and processed the task, and this node has now recovered to a healthy state, so it needs to preempt the task back. It checks whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj contend to preempt the task: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti. If it does not, nk must communicate with nj to notify it to stop the current task, and then renames ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.

3) If i ≠ k and j = k, the task originally belongs to node ni, and nk preempted it successfully when ni failed, so nk only needs to update the modification time of the file and continue processing the task.

4) If i ≠ k and j ≠ k, the task is being processed by another node, and nk needs to determine whether nj has failed. It checks whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj contend to preempt the task: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
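Putting the four cases together, one scheduling pass on node k could look like the sketch below. It is an illustration under assumptions not stated in the patent: a literal `t<i>-n<j>` naming scheme, a local mount of the shared-file system with atomic rename, a failure timeout in seconds, roughly synchronized clocks, and a hypothetical `notify_stop` hook. A deployment would presumably run it periodically, with a period comfortably shorter than the timeout so healthy nodes refresh their files in time.

```python
import os
import re
import time
from typing import Callable, List

# Assumed marker-file name format, e.g. "t3-n7".
_NAME_RE = re.compile(r"t(\d+)-n(\d+)")


def run_once(root: str, k: int, timeout: float,
             notify_stop: Callable[[int, int], None]) -> List[int]:
    """Run one scheduling pass for node k over the marker files in `root`.

    Returns the sorted task numbers that node k holds after the pass.
    `notify_stop(holder, task)` is a hypothetical hook used in case 2.
    """
    mine = []
    for name in sorted(os.listdir(root)):
        m = _NAME_RE.fullmatch(name)
        if m is None:
            continue
        i, j = int(m.group(1)), int(m.group(2))
        src = os.path.join(root, name)
        dst = os.path.join(root, f"t{i}-n{k}")
        try:
            if j == k:
                # Cases 1 and 3: k already holds the task; refresh the heartbeat.
                os.utime(src)
                mine.append(i)
                continue
            age = time.time() - os.stat(src).st_mtime
            if i == k:
                # Case 2: k's own task is held elsewhere. A live holder is
                # asked to stop first; a dead one is simply preempted.
                if age <= timeout:
                    notify_stop(j, i)
            elif age <= timeout:
                # Case 4 with a live holder: leave the task alone.
                continue
            # Case 2, or case 4 with a dead holder: contend via rename.
            os.rename(src, dst)
            os.utime(dst)
            mine.append(i)
        except FileNotFoundError:
            # Another node renamed the file first during this pass.
            continue
    return sorted(mine)
```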

The above describes the case where the number of tasks equals the number of nodes. Usually the number of tasks far exceeds the number of nodes; in that case the tasks can be hashed according to some rule, and tasks with the same hash value are scheduled to the same node for processing.
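One way to apply the hashing rule, sketched under the assumption that each task is identified by a string key such as its data-file name: map the key to a bucket 1..n with a stable hash and treat each bucket as one "task" ti in the marker-file scheme. `zlib.crc32` is an arbitrary choice here; the patent does not name a hash function. A stable hash matters because every node must compute the same bucket, and Python's built-in `hash()` for strings is salted per process.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def bucket_of(task_key: str, n: int) -> int:
    """Map a task key to a bucket number in 1..n (hypothetical helper)."""
    return zlib.crc32(task_key.encode("utf-8")) % n + 1


def group_tasks(task_keys: Iterable[str], n: int) -> Dict[int, List[str]]:
    """Group task keys by bucket; bucket b is then tracked by a file tb-n<holder>."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for key in task_keys:
        groups[bucket_of(key, n)].append(key)
    return dict(groups)
```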

In summary, the method makes good use of the characteristics of the shared-file system and thereby solves the problem of task rescheduling on node failure.

The above are only preferred embodiments of the present invention. The description of these embodiments is intended only to help understand the method of the present invention and its core idea, and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A task rescheduling method applied to a shared-file system, characterized by comprising the following steps:

when a node fails, the non-failed nodes perform a preemption operation on the failed node's tasks, and the node whose preemption succeeds takes over those tasks.

2. The task rescheduling method applied to a shared-file system according to claim 1, characterized in that the preemption operation is implemented as a rename operation on a file: the plurality of non-failed nodes simultaneously perform a rename operation on the same file, and the node whose operation succeeds is the node whose preemption succeeds.

3. The task rescheduling method applied to a shared-file system according to claim 2, characterized in that the file is one of the files named "ti-nj" created in the shared-file system at initialization, where ti is the task number and nj is the node number.

4. The task rescheduling method applied to a shared-file system according to claim 3, characterized in that a process runs on each node, and before the preemption operation the process traverses all of said files and checks the file name and modification time of each file to determine whether to perform the preemption operation.

5. The task rescheduling method applied to a shared-file system according to claim 4, characterized in that, taking the node numbered k and a file named ti-nj as an example, the step of determining whether to perform the preemption operation is specifically:

a. if i = k and j = k, the task is being processed by node k itself: updating the modification time of the file and continuing to process the task;

b. if i = k and j ≠ k, the task originally belongs to node k; node k once failed, another node preempted and processed the task, and node k has now recovered to a healthy state: performing the process of preempting the task back to node k;

c. if i ≠ k and j = k, the task originally belongs to node ni, and nk preempted it successfully when ni failed: updating the modification time of the file and continuing to process the task;

d. if i ≠ k and j ≠ k, the task is being processed by another node: determining whether nj has failed, and if so, having the other nodes perform the preemption operation.

6. The task rescheduling method applied to a shared-file system according to claim 5, characterized in that the process of preempting the task back in step b is specifically: checking whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, and the healthy nodes that detect the failure of nj perform the preemption: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti; if it does not, nk communicates with nj to notify nj to stop the current task and then renames ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.

7. The task rescheduling method applied to a shared-file system according to claim 5, characterized in that determining in step d whether nj has failed and, if so, having the other nodes perform the preemption operation is specifically: checking whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, and the healthy nodes that detect the failure of nj perform the preemption: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.