CN106357813A - Task rescheduling method applied in shared-file system - Google Patents
- Publication number: CN106357813A
- Application number: CN201610952589.4A
- Authority: CN (China)
- Prior art keywords: node, task, file, shared, seize
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1034—Reaction to server failures by a load balancer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
Abstract
The invention provides a task rescheduling method applied to a shared file system. When a node fails, the non-failed nodes compete to preempt the failed node's tasks, and the node whose preemption succeeds takes over those tasks. Preemption is implemented with a file rename: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node whose preemption succeeds. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system. It makes good use of the characteristics of a shared file system to solve the problem of rescheduling tasks when nodes fail.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a task rescheduling method applied to a shared file system.
Background art
Massive data processing is a typical application of distributed systems. In one class of such applications, the data has the following two characteristics:
(1) The data is stored in a shared file system, and every node in the system can access it through a shared file system client.
(2) The data shares a common attribute whose values fall within a fixed range, so the data can be stored in different files according to value range.
Given these characteristics, data files covering different value ranges are usually scheduled to different nodes for processing. During processing, node failure and node recovery must be taken into account. How the tasks of a failed node are rescheduled to other nodes, and how a node takes its tasks back when it recovers, is a key factor in ensuring that tasks complete successfully. The method proposed by the present invention addresses this problem.
Task scheduling for node failure and node recovery currently takes one of the following forms:
Centralized: a dedicated scheduling node is set up and all other nodes are processing nodes. The scheduling node monitors the state of the processing nodes. When a processing node fails, its tasks are reassigned to other healthy nodes; when the failed node returns to a healthy state, it takes over its original tasks again. This approach is simple to implement, but the scheduling node may become a bottleneck and is a single point of failure.
Pairwise mutual backup: nodes are paired, and the two nodes of a pair act as primary and standby for each other. Suppose nodes a and b are paired. When node a fails, its tasks are processed by b, and node a takes its original tasks back when it recovers; the failure and recovery of node b are handled in the same way. This approach is simple to implement and has no single point of failure, but if both nodes of a pair fail, no node takes over their tasks.
Clustered: an upgraded version of pairwise mutual backup. When a node fails, the healthy nodes negotiate among themselves to select one node to take over its tasks. This approach tolerates simultaneous failure of multiple nodes, but it involves communication among multiple nodes and is more complex to implement.
Content of the invention
In order to overcome the deficiencies in the prior art, the invention provides a kind of task of being applied to shared-file system is dispatched again
Method, the method is realized simply, not to be had Single Point of Faliure and can tolerate that multiple nodes break down, and can preferably divide task simultaneously
Task each node in distributed system.
The technical solution adopted by the present invention to solve this technical problem is a task rescheduling method applied to a shared file system, comprising the following step:
When a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node whose preemption succeeds takes over the failed node's tasks.
Preferably, the preemption operation is implemented by renaming a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node whose preemption succeeds.
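As an illustration of rename-based preemption, the following Python sketch shows a single attempt. It is a minimal sketch under stated assumptions, not the patented implementation: the function name `try_preempt`, its parameters, and the use of a local temporary directory in place of a shared file system mount are all introduced here for illustration. It relies on `os.rename` raising `FileNotFoundError` when the source file has already been renamed by a competitor; whether a particular shared file system gives the same exclusive behavior should be verified for that system.

```python
import os
import tempfile


def try_preempt(shared_dir: str, task: str, old_node: str, new_node: str) -> bool:
    """Try to claim `task` from `old_node` by renaming its marker file.

    Returns True only for the caller whose rename succeeded.
    """
    src = os.path.join(shared_dir, f"{task}-{old_node}")
    dst = os.path.join(shared_dir, f"{task}-{new_node}")
    try:
        os.rename(src, dst)
    except FileNotFoundError:
        # The source no longer exists: another node renamed it first.
        return False
    return True


if __name__ == "__main__":
    # A temporary local directory stands in for the shared file system.
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "t1-n1"), "w").close()
        print(try_preempt(d, "t1", "n1", "n2"))
        print(try_preempt(d, "t1", "n1", "n3"))
```

In the demo, the first call renames t1-n1 to t1-n2 and returns True; the second call finds no t1-n1 and returns False.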
Preferably, the files are created initially in the shared file system and named "ti-nj", where ti is the task number and nj is the node number.
Preferably, a process runs on each node. Before the preemption operation, it traverses all of the above files, checks the file name and modification time of each file, and decides whether to perform a preemption operation.
Preferably, taking the node numbered k and a file named ti-nj as an example, the steps for deciding whether to perform a preemption operation are:
a. If i=k and j=k, the task is being processed by node k. Update the file's modification time and continue processing the task.
b. If i=k and j≠k, the task originally belonged to node k; node k failed at some point and the task was preempted by another node. Node k has now recovered to a healthy state and takes the task back.
c. If i≠k and j=k, the task originally belonged to node ni and was successfully preempted by nk when ni failed. Update the file's modification time and continue processing the task.
d. If i≠k and j≠k, the task is being processed by another node. Determine whether nj has failed; if it has, the other nodes perform a preemption operation.
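The four cases a to d can be summarized as a small decision function. The following Python sketch is illustrative only; the names `Action` and `classify` are hypothetical and do not appear in the patent.

```python
from enum import Enum


class Action(Enum):
    KEEP_OWN = "a"        # i = k, j = k: node k's own task, still held by k
    TAKE_BACK = "b"       # i = k, j != k: node k's task, currently held by nj
    KEEP_PREEMPTED = "c"  # i != k, j = k: ni's task, preempted earlier by k
    CHECK_OTHER = "d"     # i != k, j != k: held by another node; check for failure


def classify(i: int, j: int, k: int) -> Action:
    """Map a marker file ti-nj, as seen by node k, to one of the four cases."""
    if i == k:
        return Action.KEEP_OWN if j == k else Action.TAKE_BACK
    return Action.KEEP_PREEMPTED if j == k else Action.CHECK_OTHER
```

For example, node 2 looking at file t2-n3 gets `Action.TAKE_BACK`, while node 2 looking at file t1-n3 gets `Action.CHECK_OTHER`.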
Preferably, taking the task back in step b proceeds as follows. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj compete to preempt the task: node nk attempts to rename ti-nj to ti-nk and, if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti. If it does not, nk must communicate with nj and notify nj to stop the current task, then rename ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
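One possible shape for the take-back in step b is sketched below in Python. The function name `take_back`, the `timeout_s` parameter, and the `notify_stop` callback (standing in for whatever inter-node communication is used) are assumptions made for illustration. The sketch also assumes that node clocks and file timestamps are close enough for the age comparison to be meaningful.

```python
import os
import time
from typing import Callable, Optional


def take_back(shared_dir: str, task: str, holder: str, me: str, timeout_s: float,
              notify_stop: Callable[[str, str], None],
              now: Optional[float] = None) -> bool:
    """Recovered node `me` tries to reclaim `task` from `holder` (step b)."""
    src = os.path.join(shared_dir, f"{task}-{holder}")
    dst = os.path.join(shared_dir, f"{task}-{me}")
    now = time.time() if now is None else now
    try:
        holder_alive = now - os.stat(src).st_mtime <= timeout_s
        if holder_alive:
            # The holder is still refreshing the file: ask it to stop first.
            notify_stop(holder, task)
        os.rename(src, dst)
        # Update the modification time explicitly, as the method prescribes.
        os.utime(dst, None)
    except FileNotFoundError:
        return False  # Another node renamed the file first.
    return True
```

The caller starts processing the task only when `take_back` returns True; a False result means some other node won the rename.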
Preferably, determining in step d whether nj has failed, and having the other nodes preempt the task if it has, proceeds as follows. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj compete to preempt the task: node nk attempts to rename ti-nj to ti-nk and, if the rename succeeds, the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
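The failure test used in step d depends only on the file's modification time. A minimal Python sketch, with hypothetical helper names `heartbeat` and `holder_failed` and an assumed timeout in seconds, might look like this:

```python
import os
import time
from typing import Optional


def heartbeat(path: str) -> None:
    """Refresh the marker file's modification time to signal that its holder is alive."""
    os.utime(path, None)


def holder_failed(path: str, timeout_s: float, now: Optional[float] = None) -> bool:
    """Return True if the file has not been refreshed within the failure timeout."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime > timeout_s
```

The timeout should be noticeably longer than the interval at which healthy nodes call `heartbeat`, otherwise a slow but healthy node could be treated as failed. The sketch compares the local clock with file timestamps, so it assumes the clocks involved are reasonably synchronized.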
Beneficial effects of the present invention: a method of implementing clustered scheduling is proposed. The nodes do not need to communicate with one another to elect a takeover node. Instead, the nodes compete to preempt the failed node's tasks, the node whose preemption succeeds takes over those tasks, and a node also preempts tasks when it recovers. The preemption process needs a "referee" to decide who wins, and the server of the shared file system can play this role. Preemption is implemented with a file rename: when multiple file system clients rename the same file at the same time, only one rename returns successfully, so only one preemption succeeds. Node failure can be detected from the file modification time. The invention exploits the fact that a file rename performed by multiple clients of a shared file system is mutually exclusive, and proposes a task rescheduling method that uses rename to preempt tasks when a node fails. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and schedules tasks well across the nodes of a distributed system. It makes good use of the characteristics of a shared file system and solves the problem of rescheduling tasks when a node fails.
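The "referee" behavior can be observed with a small experiment. The Python sketch below is only a stand-in: threads play the role of competing nodes and a local directory plays the role of the shared file system, both of which are assumptions made here. The function name `race_for_task` is hypothetical.

```python
import os
import tempfile
import threading
from typing import List


def race_for_task(shared_dir: str, task: str, failed: str, contenders: List[str]) -> List[str]:
    """Let every contender try to rename `task-failed`; return the nodes that succeeded."""
    src = os.path.join(shared_dir, f"{task}-{failed}")
    winners: List[str] = []
    winners_lock = threading.Lock()
    start = threading.Barrier(len(contenders))  # release all attempts together

    def contend(node: str) -> None:
        start.wait()
        try:
            os.rename(src, os.path.join(shared_dir, f"{task}-{node}"))
        except FileNotFoundError:
            return  # the source was already renamed by another contender
        with winners_lock:
            winners.append(node)

    threads = [threading.Thread(target=contend, args=(n,)) for n in contenders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "t1-n1"), "w").close()
        print(race_for_task(d, "t1", "n1", ["n2", "n3", "n4"]))
```

Which contender wins varies from run to run, but the returned list contains exactly one node, because only one rename can find the source file still in place.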
Brief description of the drawings
Fig. 1 is a schematic flowchart of the present invention.
Detailed description of the embodiments
A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawing.
Referring to Fig. 1, the preferred embodiment provides a task rescheduling method applied to a shared file system. The method does not require the nodes to communicate with one another to elect a takeover node. Instead, the nodes compete to preempt the failed node's tasks, the node whose preemption succeeds takes over those tasks, and a node also preempts tasks when it recovers. Preemption is implemented with a file rename: when multiple nodes rename the same file at the same time, only one rename returns successfully, so only one preemption succeeds. Node failure can be detected from the file modification time. The specific implementation is as follows:
Assume that the number of nodes equals the number of tasks (data files), both being n, with nodes numbered n1 to nn and tasks numbered t1 to tn. When all nodes are healthy, each node processes one task, that is, node ni processes task ti. A task ti and a node nj form a "task-node" pair: ti-nj means that task i is processed by node j. Initially, n files named "ti-nj" are created in the shared file system to represent n "task-node" pairs, namely t1-n1, t2-n2, ..., tn-nn, indicating that ni processes the task ti with the same number and that ti belongs to ni.
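The initial state can be set up with a few lines of code. The Python sketch below is illustrative; the helper names `marker_name`, `parse_marker`, and `create_initial_markers` are hypothetical, and empty files are assumed to be sufficient because only the file name and modification time are used.

```python
import os
from typing import List, Tuple


def marker_name(task: int, node: int) -> str:
    """Build the "ti-nj" file name for task i held by node j."""
    return f"t{task}-n{node}"


def parse_marker(name: str) -> Tuple[int, int]:
    """Split a "ti-nj" file name into (i, j)."""
    task_part, node_part = name.split("-")
    return int(task_part[1:]), int(node_part[1:])


def create_initial_markers(shared_dir: str, n: int) -> List[str]:
    """Create t1-n1 ... tn-nn, so that each node ni initially holds task ti."""
    names = [marker_name(i, i) for i in range(1, n + 1)]
    for name in names:
        # Mode "x" fails if the file already exists, so a repeated
        # initialization cannot silently overwrite an existing marker.
        with open(os.path.join(shared_dir, name), "x"):
            pass
    return names
```

With n = 3 this creates t1-n1, t2-n2, and t3-n3, and `parse_marker("t2-n3")` yields `(2, 3)`, meaning task 2 is currently held by node 3.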
A process runs on each node. It traverses these n files and checks the name and modification time of each file. Taking the node numbered k and a file named ti-nj as an example, the process makes the following decisions:
1) If i=k and j=k, the task is being processed by this node, so the process only needs to update the file's modification time and continue processing the task.
2) If i=k and j≠k, the task originally belonged to this node; this node failed at some point and the task was preempted by another node. This node has now recovered to a healthy state, so the task must be taken back. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj compete to preempt the task. Node nk attempts to rename ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti. If the timeout has not been exceeded, nk must communicate with nj and notify nj to stop the current task, then rename ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.
3) If i≠k and j=k, the task originally belonged to node ni and was successfully preempted by nk when ni failed, so the process only needs to update the file's modification time and continue processing the task.
4) If i≠k and j≠k, the task is being processed by another node, and the process must determine whether nj has failed. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure of nj compete to preempt the task. Node nk attempts to rename ti-nj to ti-nk; if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.
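Putting the four cases together, one pass of the per-node process could be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the patent's implementation: `scan_once`, `timeout_s`, and the `notify_stop` callback are hypothetical names, marker files are assumed to be the only files in the directory, and actual task processing is left out.

```python
import os
import time
from typing import Callable, List, Optional


def scan_once(shared_dir: str, k: int, timeout_s: float,
              notify_stop: Callable[[int, int], None],
              now: Optional[float] = None) -> List[int]:
    """Run one pass of node k over all marker files; return the tasks k holds afterwards."""
    now = time.time() if now is None else now
    held: List[int] = []
    for name in os.listdir(shared_dir):
        task_part, node_part = name.split("-")
        i, j = int(task_part[1:]), int(node_part[1:])
        path = os.path.join(shared_dir, name)
        try:
            if j == k:
                # Cases 1 and 3: node k holds the task, so refresh the heartbeat.
                os.utime(path, None)
                held.append(i)
                continue
            expired = now - os.stat(path).st_mtime > timeout_s
            if not expired:
                if i != k:
                    continue  # Case 4, holder alive: leave the task alone.
                notify_stop(j, i)  # Case 2, holder alive: ask it to stop first.
            # Case 2, or case 4 with an expired holder: compete via rename.
            target = os.path.join(shared_dir, f"t{i}-n{k}")
            os.rename(path, target)
            os.utime(target, None)
            held.append(i)
        except FileNotFoundError:
            continue  # Another node renamed the file first.
    return sorted(held)
```

A node would call `scan_once` periodically, at an interval shorter than the failure timeout, and process the tasks it returns.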
The above covers the case where the number of tasks equals the number of nodes. Usually the number of tasks greatly exceeds the number of nodes. In that case the tasks can be hashed according to some rule, and tasks with the same hash value are scheduled to the same node for processing.
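The patent does not specify the hashing rule. As one possible choice, assumed here purely for illustration, a task identifier can be hashed with SHA-256 and reduced modulo the number of nodes. The function name `home_node` and the string task identifiers are hypothetical.

```python
import hashlib


def home_node(task_id: str, node_count: int) -> int:
    """Map a task identifier to a node number in 1..node_count."""
    digest = hashlib.sha256(task_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % node_count + 1
```

A cryptographic digest is used instead of Python's built-in `hash()` because string hashing with `hash()` is randomized per process by default, so different nodes could disagree on the mapping.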
In summary, the method of the present invention makes good use of the characteristics of a shared file system and thereby solves the problem of rescheduling tasks when a node fails.
The above is only a preferred embodiment of the present invention. It should be understood that the description of the embodiment is intended to help in understanding the method of the present invention and its core idea, and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A task rescheduling method applied to a shared file system, characterized by comprising the following step:
When a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node whose preemption succeeds takes over the failed node's tasks.
2. The task rescheduling method applied to a shared file system according to claim 1, characterized in that: the preemption operation is implemented by renaming a file, that is, the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node whose preemption succeeds.
3. The task rescheduling method applied to a shared file system according to claim 2, characterized in that: the files are created initially in the shared file system and named "ti-nj", where ti is the task number and nj is the node number.
4. The task rescheduling method applied to a shared file system according to claim 3, characterized in that: a process runs on each node and, before the preemption operation, traverses all of said files, checks the file name and modification time of each file, and decides whether to perform a preemption operation.
5. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: taking the node numbered k and a file named ti-nj as an example, the steps for deciding whether to perform a preemption operation are:
a. If i=k and j=k, the task is being processed by node k; update the file's modification time and continue processing the task;
b. If i=k and j≠k, the task originally belonged to node k, node k failed at some point and the task was preempted by another node; node k has now recovered to a healthy state and takes the task back;
c. If i≠k and j=k, the task originally belonged to node ni and was successfully preempted by nk when ni failed; update the file's modification time and continue processing the task;
d. If i≠k and j≠k, the task is being processed by another node; determine whether nj has failed and, if it has, the other nodes perform a preemption operation.
6. The task rescheduling method applied to a shared file system according to claim 5, characterized in that: taking the task back in step b specifically comprises: checking whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, the healthy nodes that detect the failure of nj compete to preempt the task, and node nk attempts to rename ti-nj to ti-nk and, if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti; if it does not, nk communicates with nj and notifies nj to stop the current task, then renames ti-nj to ti-nk and, if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.
7. The task rescheduling method applied to a shared file system according to claim 5, characterized in that: determining in step d whether nj has failed, and having the other nodes preempt the task if it has, specifically comprises: checking whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, the healthy nodes that detect the failure of nj compete to preempt the task, and node nk attempts to rename ti-nj to ti-nk and, if the rename succeeds, the preemption has succeeded, and nk updates the modification time of ti-nk and processes task ti.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610952589.4A CN106357813B (en) | 2016-11-02 | 2016-11-02 | Task rescheduling method applied to a shared file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610952589.4A CN106357813B (en) | 2016-11-02 | 2016-11-02 | Task rescheduling method applied to a shared file system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106357813A (en) | 2017-01-25 |
CN106357813B (en) | 2019-08-06 |
Family
ID=57863582
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201610952589.4A | Active | CN106357813B (en) | 2016-11-02 | 2016-11-02 | Task rescheduling method applied to a shared file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106357813B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107707620A (en) * | 2017-08-30 | 2018-02-16 | 华为技术有限公司 | Handle the method and device of I/O request |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609303A (en) * | 2012-01-18 | 2012-07-25 | 华为技术有限公司 | Slow-task dispatching method and slow-task dispatching device of Map Reduce system |
CN103414761A (en) * | 2013-07-23 | 2013-11-27 | 北京工业大学 | Mobile terminal cloud resource scheduling method based on Hadoop framework |
CN103812674A (en) * | 2012-11-07 | 2014-05-21 | 北京信威通信技术股份有限公司 | Method for main and standby server replacement |
Also Published As
Publication number | Publication date |
---|---|
CN106357813B (en) | 2019-08-06 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |