CN106357813A - Task rescheduling method applied in shared-file system - Google Patents

Info

Publication number
CN106357813A
Authority
CN
China
Prior art keywords
node
task
file
shared
seize
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610952589.4A
Other languages
Chinese (zh)
Other versions
CN106357813B (en)
Inventor
陈军
闫鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Long Yu Technology (beijing) Ltd By Share Ltd
Original Assignee
Long Yu Technology (beijing) Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Long Yu Technology (beijing) Ltd By Share Ltd
Priority to CN201610952589.4A
Publication of CN106357813A
Application granted
Publication of CN106357813B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Abstract

The invention provides a task rescheduling method applied in a shared file system. The method comprises the following steps: when a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node that preempts successfully takes over those tasks. The preemption operation is implemented with a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node that has preempted successfully. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system. It makes good use of the characteristics of a shared file system and thereby addresses the problem of rescheduling tasks when a node fails.

Description

A task rescheduling method applied to a shared file system
Technical field
The present invention relates to the field of communication technology, and in particular to a task rescheduling method applied to a shared file system.
Background
Massive data processing is a typical application of distributed systems. For one class of such applications, the data has the following two characteristics:
(1) It is stored in a shared file system, and every node in the system can access it through a shared file system client.
(2) The data shares a common attribute with a fixed value range, so the data can be stored in different files according to value range.
Given these characteristics, data files covering different value ranges are usually scheduled to different nodes for processing. During processing, node failure and node recovery must be taken into account. How to reschedule the tasks of a failed node to other nodes, and how a recovered node takes its tasks back, is a key factor in ensuring that tasks complete smoothly. The method proposed by the present invention addresses this problem.
At present, task scheduling for node failure and node recovery generally takes one of the following forms:
Centralized: a dedicated scheduling node is set up and all other nodes are processing nodes. The scheduling node monitors the state of the processing nodes. When a processing node fails, its tasks are reassigned to other healthy nodes; when the failed node returns to a healthy state, it takes its original tasks back. This approach is simple to implement, but the scheduling node can become a bottleneck and is a single point of failure.
Pairwise mutual backup: nodes are paired, and the two nodes of a pair act as primary and standby for each other. Suppose nodes a and b are paired: when a fails, b processes its tasks, and when a recovers it takes its original tasks back; the same applies when b fails and recovers. This approach is simple to implement and has no single point of failure, but if both nodes of a pair fail, no node takes over their tasks.
Cluster-based: an upgraded version of pairwise mutual backup. When a node fails, the healthy nodes negotiate among themselves to elect one node to take over its tasks. This approach tolerates simultaneous failure of multiple nodes, but it involves communication among multiple nodes and is more complex to implement.
Summary of the invention
To overcome the deficiencies of the prior art, the invention provides a task rescheduling method applied to a shared file system. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system.
The technical solution adopted by the present invention to solve the technical problem is a task rescheduling method applied to a shared file system, comprising the following steps:
When a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node that preempts successfully takes over the failed node's tasks.
Preferably, the preemption operation is implemented with a rename operation on a file: the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node that has preempted successfully.
Preferably, the files are created initially in the shared file system and named "ti-nj", where ti is the task number and nj is the node number.
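As an illustration only, the naming convention can be sketched in Python as follows. The description specifies only the "ti-nj" pattern, so the concrete spelling (for example "t3-n7"), the `Claim` type and the helper names `format_claim` and `parse_claim` are assumptions made for this sketch.

```python
import re
from typing import NamedTuple, Optional

# Assumed on-disk spelling of a "ti-nj" name, e.g. "t3-n7".
_NAME_RE = re.compile(r"t(\d+)-n(\d+)")


class Claim(NamedTuple):
    task: int  # i in "ti-nj": the task number
    node: int  # j in "ti-nj": the node currently processing the task


def format_claim(task: int, node: int) -> str:
    """Build the file name that records 'task is processed by node'."""
    return f"t{task}-n{node}"


def parse_claim(filename: str) -> Optional[Claim]:
    """Parse a file name; return None for files that are not claim files."""
    m = _NAME_RE.fullmatch(filename)
    if m is None:
        return None
    return Claim(task=int(m.group(1)), node=int(m.group(2)))
```

Using `fullmatch` means unrelated files that happen to sit in the same directory are ignored rather than misread as claims.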
Preferably, a process runs on each node; before the preemption operation it traverses all of the above files and checks the file name and modification time of each file to decide whether to perform a preemption operation.
Preferably, taking the node numbered k and a file named ti-nj as an example, the step of deciding whether to perform a preemption operation is as follows:
a. If i=k and j=k, the task is being processed by node k; update the file's modification time and continue processing the task.
b. If i=k and j≠k, the task should be processed by node k, but node k failed at some point and the task was preempted and processed by another node; node k has now returned to a healthy state and preempts the task back.
c. If i≠k and j=k, the task should be processed by node ni, but ni failed and nk preempted it successfully; update the file's modification time and continue processing the task.
d. If i≠k and j≠k, the task is being processed by another node; determine whether nj has failed and, if it has, the other nodes perform the preemption operation.
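The four cases a to d reduce to a small decision function. The sketch below is one hedged reading of them; the `Action` enum and the function name `decide` are hypothetical names introduced here, not terms from the description.

```python
from enum import Enum


class Action(Enum):
    HEARTBEAT = "update mtime and keep processing"       # cases a and c
    TAKE_BACK = "preempt own task back"                  # case b
    CHECK_HOLDER = "preempt only if holder has failed"   # case d


def decide(i: int, j: int, k: int) -> Action:
    """Decide what node k does with file ti-nj.

    i is the task number (and the number of the task's original node),
    j is the node currently holding the task, k is the scanning node.
    """
    if j == k:
        # Node k currently holds the task, whether it is k's own task
        # (case a) or one it preempted earlier (case c).
        return Action.HEARTBEAT
    if i == k:
        # The task originally belongs to k but another node holds it (case b).
        return Action.TAKE_BACK
    # Neither the owner nor the holder is k (case d).
    return Action.CHECK_HOLDER
```

Cases a and c collapse into one branch because in both the scanning node is the current holder and only needs to refresh the modification time.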
Preferably, preempting the task back in step b proceeds as follows: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti. If it does not, nk must communicate with nj and notify nj to stop the current task, then rename ti-nj to ti-nk; if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
Preferably, determining in step d whether nj has failed, and the preemption by other nodes if it has, proceeds as follows: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
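The failure check and the rename attempt of step d might look like the following minimal sketch. It assumes the shared file system is mounted as an ordinary directory and that node clocks are close enough for modification times to be comparable; the function names, the `timeout_s` parameter and the "t1-n2" spelling are illustrative assumptions.

```python
import os
import time
from typing import Optional


def holder_has_failed(path: str, timeout_s: float,
                      now: Optional[float] = None) -> bool:
    """True if the file was last modified more than timeout_s seconds ago."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime > timeout_s


def try_preempt(directory: str, task: int, holder: int, me: int,
                timeout_s: float) -> bool:
    """Node `me` tries to take `task` from `holder`; True on success."""
    src = os.path.join(directory, f"t{task}-n{holder}")
    dst = os.path.join(directory, f"t{task}-n{me}")
    try:
        if not holder_has_failed(src, timeout_s):
            return False          # holder is still refreshing the file
        os.rename(src, dst)       # only one competing rename can succeed
    except FileNotFoundError:
        return False              # another node renamed the file first
    os.utime(dst, None)           # refresh mtime: this is now our heartbeat
    return True
```

A rename normally leaves the file's own modification time unchanged, which is why the sketch refreshes it immediately after winning; until that call completes, the file still looks stale to other nodes.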
Beneficial effects of the present invention: the invention proposes an implementation of cluster-based scheduling. The method does not require the nodes to communicate in order to elect a takeover node; instead, the nodes compete to preempt the failed node's tasks, and the node that preempts successfully takes them over. A recovering node likewise preempts its tasks back. Preemption needs a "referee" to decide who wins, and the server of the shared file system can play exactly this role. The preemption operation is implemented with a file rename: when multiple file system clients attempt to rename the same file at the same time, only one call returns successfully, so only one node preempts successfully. Node failure can be detected from file modification times. The invention exploits the fact that a rename issued by multiple clients of a shared file system is mutually exclusive, and proposes a task rescheduling method that uses rename operations to preempt tasks when a node fails. The method is simple to implement, has no single point of failure, tolerates simultaneous failure of multiple nodes, and distributes tasks well across the nodes of a distributed system. It makes good use of the characteristics of a shared file system and thereby solves the problem of rescheduling tasks when a node fails.
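The "referee" property can be observed on a local POSIX file system with a small experiment, sketched below. Threads stand in for nodes and a temporary directory stands in for the shared file system; both are assumptions of the sketch, and whether a particular network file system gives the same guarantee has to be checked for that system.

```python
import os
import tempfile
import threading
from typing import List


def race(directory: str, task: int, holder: int,
         contenders: List[int]) -> List[int]:
    """Every contender tries to rename the same file; return the winners."""
    src = os.path.join(directory, f"t{task}-n{holder}")
    winners: List[int] = []
    lock = threading.Lock()
    start = threading.Barrier(len(contenders))

    def contend(me: int) -> None:
        start.wait()  # release all contenders at the same moment
        try:
            os.rename(src, os.path.join(directory, f"t{task}-n{me}"))
        except FileNotFoundError:
            return  # the source is gone: someone else renamed it first
        with lock:
            winners.append(me)

    threads = [threading.Thread(target=contend, args=(n,)) for n in contenders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "t1-n1"), "w").close()
        print(race(d, 1, 1, [2, 3, 4, 5]))  # a single-element list
```

Which contender wins varies from run to run; the point is that the list never contains more than one node, because every rename after the first finds that the source file no longer exists.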
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the present invention.
Detailed description of the embodiments
A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawing.
With reference to Fig. 1, the preferred embodiment of the present invention provides a task rescheduling method applied to a shared file system. The method does not require the nodes to communicate in order to elect a takeover node; instead, the nodes compete to preempt the failed node's tasks, the node that preempts successfully takes them over, and a recovering node likewise preempts its tasks back. The preemption operation is implemented with a file rename: when multiple nodes attempt to rename the same file at the same time, only one call returns successfully, so only one node preempts successfully. Node failure can be detected from file modification times. The implementation is as follows:
Assume the number of nodes equals the number of tasks (data files), both being n; nodes are numbered n1 to nn and tasks t1 to tn. When all nodes are healthy, each node processes one task, that is, node ni processes task ti. A task ti and a node nj form a "task-node" pair, so that ti-nj means task i is processed by node j. Initially, n files named "ti-nj" are created in the shared file system to represent the n "task-node" pairs t1-n1, t2-n2, ..., tn-nn, indicating that ni processes the task ti with the same number, that is, ti belongs to ni.
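The initial state can be set up with a few lines of Python. This is a sketch under the assumption that the shared file system is mounted at `directory` and that setup runs exactly once before any node starts scanning; `create_initial_claims` is a hypothetical helper name.

```python
import os
from typing import List


def create_initial_claims(directory: str, n: int) -> List[str]:
    """Create t1-n1 ... tn-nn, one empty file per task-node pair."""
    names = []
    for i in range(1, n + 1):
        name = f"t{i}-n{i}"
        # Mode "x" raises FileExistsError if the file is already there,
        # so running the setup twice fails loudly instead of silently.
        with open(os.path.join(directory, name), "x"):
            pass
        names.append(name)
    return names
```

The files carry no content: all state lives in the file name (who holds the task) and the modification time (when the holder was last seen alive).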
A process runs on each node; it traverses these n files and checks the name and modification time of each file. Taking the node numbered k and a file named ti-nj as an example, the following decisions are made:
1) If i=k and j=k, the task is being processed by this node, so the node only needs to update the file's modification time and continue processing the task.
2) If i=k and j≠k, the task should be processed by this node, but this node failed at some point and the task was preempted and processed by another node. This node has now returned to a healthy state and needs to preempt the task back. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti. If it does not, nk must communicate with nj and notify nj to stop the current task, then rename ti-nj to ti-nk; if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
3) If i≠k and j=k, the task should be processed by node ni, but ni failed and nk preempted it successfully, so the node only needs to update the file's modification time and continue processing the task.
4) If i≠k and j≠k, the task is being processed by another node, and it must be determined whether nj has failed. Check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout. If it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
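Putting cases 1) to 4) together, one pass of the per-node process could be sketched as below. The embodiment does not say how nk notifies nj to stop, so `notify_stop` is a hypothetical callback supplied by the caller; `scan_once`, `timeout_s` and the file-name spelling are likewise assumptions of this sketch.

```python
import os
import re
import time
from typing import Callable, List

NAME_RE = re.compile(r"t(\d+)-n(\d+)")


def scan_once(directory: str, k: int, timeout_s: float,
              notify_stop: Callable[[int, int], None]) -> List[int]:
    """One pass of node k over the claim files; returns the tasks k holds."""
    held: List[int] = []
    for name in sorted(os.listdir(directory)):
        m = NAME_RE.fullmatch(name)
        if m is None:
            continue  # not a claim file
        i, j = int(m.group(1)), int(m.group(2))
        path = os.path.join(directory, name)
        try:
            if j != k:
                age = time.time() - os.stat(path).st_mtime
                if age <= timeout_s:      # holder nj looks healthy
                    if i != k:
                        continue          # case 4, holder alive: leave it
                    notify_stop(j, i)     # case 2, holder alive: ask nj to stop
                new_path = os.path.join(directory, f"t{i}-n{k}")
                os.rename(path, new_path)  # the preemption attempt
                path = new_path
            os.utime(path, None)          # cases 1 and 3, or a fresh win
        except FileNotFoundError:
            continue                      # another node renamed the file first
        held.append(i)
    return held
```

In a deployment this function would be called periodically on every node, with a period comfortably shorter than `timeout_s` so that a healthy holder is never mistaken for a failed one.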
The above covers the case where the number of tasks equals the number of nodes. Usually the number of tasks far exceeds the number of nodes; in that case the tasks can be hashed according to some rule, and tasks with the same hash value are scheduled to the same node for processing.
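The description leaves the hashing rule open. One possible rule, shown purely as an assumption, is a CRC32 of the task identifier reduced modulo the number of nodes, so that each hash bucket corresponds to one node number and one claim file.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def home_node(task_id: str, n_nodes: int) -> int:
    """Map a task identifier to a node number in 1..n_nodes."""
    # zlib.crc32 gives the same value on every machine and in every
    # process, unlike the built-in hash(), which is salted for strings.
    return zlib.crc32(task_id.encode("utf-8")) % n_nodes + 1


def group_by_node(task_ids: Iterable[str], n_nodes: int) -> Dict[int, List[str]]:
    """Group task identifiers by the node their hash bucket maps to."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for task_id in task_ids:
        groups[home_node(task_id, n_nodes)].append(task_id)
    return dict(groups)
```

With this grouping, bucket i plays the role of task ti in the scheme above: whichever node holds the file for bucket i processes every task that hashes into it.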
In summary, the method of the invention makes good use of the characteristics of a shared file system and thereby solves the problem of rescheduling tasks when a node fails.
The above is only a preferred embodiment of the present invention. It should be understood that the description of the embodiment is intended to help in understanding the method of the invention and its core idea and is not intended to limit the scope of protection of the invention; any modification, equivalent replacement, and the like made within the idea and principle of the invention shall fall within the scope of protection of the invention.

Claims (7)

1. A task rescheduling method applied to a shared file system, characterized by comprising the following steps:
When a node fails, the non-failed nodes perform a preemption operation for the failed node's tasks, and the node that preempts successfully takes over the failed node's tasks.
2. The task rescheduling method applied to a shared file system according to claim 1, characterized in that: the preemption operation is implemented with a rename operation on a file, that is, the non-failed nodes simultaneously attempt to rename the same file, and the node whose rename succeeds is the node that has preempted successfully.
3. The task rescheduling method applied to a shared file system according to claim 2, characterized in that: the files are created initially in the shared file system and named "ti-nj", where ti is the task number and nj is the node number.
4. The task rescheduling method applied to a shared file system according to claim 3, characterized in that: a process runs on each node; before the preemption operation it traverses all of the above files and checks the file name and modification time of each file to decide whether to perform a preemption operation.
5. The task rescheduling method applied to a shared file system according to claim 4, characterized in that: taking the node numbered k and a file named ti-nj as an example, the step of deciding whether to perform a preemption operation is as follows:
a. If i=k and j=k, the task is being processed by node k; update the file's modification time and continue processing the task;
b. If i=k and j≠k, the task should be processed by node k, but node k failed at some point and the task was preempted and processed by another node; node k has now returned to a healthy state and preempts the task back;
c. If i≠k and j=k, the task should be processed by node ni, but ni failed and nk preempted it successfully; update the file's modification time and continue processing the task;
d. If i≠k and j≠k, the task is being processed by another node; determine whether nj has failed and, if it has, the other nodes perform the preemption operation.
6. The task rescheduling method applied to a shared file system according to claim 5, characterized in that: preempting the task back in step b proceeds as follows: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti; if it does not, nk must communicate with nj and notify nj to stop the current task, then rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
7. The task rescheduling method applied to a shared file system according to claim 5, characterized in that: determining in step d whether nj has failed, and the preemption by other nodes if it has, proceeds as follows: check whether the time elapsed since the modification time of file ti-nj exceeds the failure timeout; if it does, node nj has failed, and the healthy nodes that detect the failure compete to preempt: node nk attempts to rename ti-nj to ti-nk, and if the rename succeeds the preemption has succeeded, so nk updates the modification time of ti-nk and processes task ti.
CN201610952589.4A 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system Active CN106357813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610952589.4A CN106357813B (en) 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610952589.4A CN106357813B (en) 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system

Publications (2)

Publication Number Publication Date
CN106357813A true CN106357813A (en) 2017-01-25
CN106357813B CN106357813B (en) 2019-08-06

Family

ID=57863582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610952589.4A Active CN106357813B (en) 2016-11-02 2016-11-02 Task rescheduling method applied to a shared file system

Country Status (1)

Country Link
CN (1) CN106357813B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107707620A (en) * 2017-08-30 2018-02-16 华为技术有限公司 Method and device for handling I/O requests

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609303A (en) * 2012-01-18 2012-07-25 华为技术有限公司 Slow-task dispatching method and slow-task dispatching device of Map Reduce system
CN103414761A (en) * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on Hadoop framework
CN103812674A (en) * 2012-11-07 2014-05-21 北京信威通信技术股份有限公司 Method for main and standby server replacement

Also Published As

Publication number Publication date
CN106357813B (en) 2019-08-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant