CN109104304A - Distributed real-time fault handling method - Google Patents

Distributed real-time fault handling method

Info

Publication number
CN109104304A
CN109104304A CN201810819362.1A CN201810819362A CN109104304A CN 109104304 A CN109104304 A CN 109104304A CN 201810819362 A CN201810819362 A CN 201810819362A CN 109104304 A CN109104304 A CN 109104304A
Authority
CN
China
Prior art keywords
node
task
key
failure
real time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810819362.1A
Other languages
Chinese (zh)
Other versions
CN109104304B (en)
Inventor
秦佳峰
杨祎
林颖
李程启
白德盟
冯新岩
周超
刘洋
贾然
李龙龙
郑文杰
孙景文
韩明明
乔颖
王娟娟
王宏安
罗雄飞
郭超平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Institute of Software of CAS
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Institute of Software of CAS
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Institute of Software of CAS, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Priority to CN201810819362.1A
Publication of CN109104304A
Application granted
Publication of CN109104304B
Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Abstract

The present invention provides a distributed real-time fault handling method, characterized in that the method comprises: S1: establishing a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a mixed-criticality fault tree TR_i; S2: determining the scheduling method for the fault tasks according to the execution state of the faults; S3: using the task set obtained in step S1, matching each fault generated by the system with the safe operation graph for handling it according to the scheduling method of step S2, thereby eliminating the fault. The method can handle faults in real time in a distributed environment and takes into account a recovery mechanism for the case where faults may spread.

Description

A distributed real-time fault handling method
Technical field
The invention belongs to the field of real-time reactive systems and real-time technology, and in particular relates to a distributed real-time fault handling method.
Background
To monitor and control large-scale, complex distributed systems in real time, powerful sensing nodes are deployed at key nodes of the system and connected directly to the Internet. The information they collect is transmitted in real time to the corresponding server cluster for computation, and the instructions to be executed are returned to the sensing nodes or control nodes so as to achieve predetermined safety goals. Such an ultra-large-scale complex network, formed by the integration of multiple types of networks and by the cooperation of centralized and distributed operation, is a real-time reactive system referred to as a complex real-time reactive system. The smart grid is a typical representative of complex real-time reactive systems.
Complex real-time reactive systems usually bear on the safety of life and property and on social and environmental safety. They are safety-critical and have strict real-time requirements: after an event of interest occurs, the system must complete the corresponding actions within a given time limit to respond to it. A large, even massive, number of intelligent operations must be carried out on different nodes and different devices, and the execution order and timing of these operations are strictly regulated. If a response exceeds its time limit, or an operation is executed on the wrong device, at the wrong time, for too long, or in the wrong order, the consequences can be catastrophic: serious injury or death of personnel, severe damage to equipment, or harm to the environment.
The networks of a complex real-time reactive system span thousands of miles, its devices are highly varied, and its environment changes constantly. Across the whole network, information must be collected in real time, data processed rapidly, and the relevant business operations completed on time so that the operation of the whole system is monitored. Once a fault occurs, the safety-critical real-time reactive system must reduce losses and repair the fault quickly by means such as rapid inspection and diagnosis. In normal operation, the goal is to prevent faults and to complete complex intelligent business operations on time. When a fault occurs, it must be detected promptly, and emergency handling and self-repair must be carried out according to the current fault state, the state of multi-network integration and network information, the state of the distributed devices, and other states, so that the fault is eliminated within its time limit and the safety of the system is guaranteed. The core problem is the real-time scheduling of distributed fault handling tasks.
When a fault occurs in a complex real-time reactive system and cannot be handled in time because system resources are limited, it may cause new faults in other parts of the system that are related to it through business or data dependencies, so that faults keep spreading in the distributed environment. Existing complex real-time reactive systems do not consider how to guarantee the real-time performance of fault handling under such chain reactions, which reduces the success rate and safety of fault recovery.
Summary of the invention
To address the shortcomings of the prior art for complex real-time reactive systems, the present invention provides a new distributed real-time fault handling method. The method can handle faults in real time in a distributed environment and takes into account a recovery mechanism for the case where faults may spread.
The technical solution of the present invention is realized as follows:
A distributed real-time fault handling method, the method comprising:
S1: establish a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a mixed-criticality fault tree TR_i;
S2: determine the scheduling method for the fault tasks according to the execution state of the faults;
S3: using the task set obtained in step S1, match each fault generated by the system with the safe operation graph for handling it according to the scheduling method of step S2, thereby eliminating the fault.
Further, step S1 is implemented as follows:
S11: create the initial fault node τ_{i,1} of the fault tree corresponding to task τ_i;
S12: from historical fault data, derive the subsequent fault nodes triggered by fault τ_{i,1}, which form the successor nodes of τ_{i,1}, until all fault nodes τ_{i,j} have been established;
S13: form task τ_i from the set of all fault nodes τ_{i,j};
S14: use the tasks τ_i to establish the task set τ for real-time fault handling.
Further, each fault node τ_{i,j} is associated with a safe operation graph G_{i,j} and a relative deadline D_{i,j}, where G_{i,j} is the safe operation graph that must be executed to handle the fault corresponding to τ_{i,j} and contains n_{i,j} safe-operation subtasks τ^k_{i,j}; D_{i,j} is the relative deadline of G_{i,j}; and C^k_{i,j} is the execution time subtask τ^k_{i,j} needs to complete its safe operation.
Further, the set of fault nodes is τ_i(r_i, TR_i) = {τ_{i,j} | 1 ≤ j ≤ n_i}, where TR_i is a directed tree, r_i is the ready time of the initial fault node of TR_i, and τ_{i,j} denotes each node of TR_i.
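The task model above can be pictured as a small set of data structures. The following is a minimal sketch only: the class and field names (SafeOperationGraph, FaultNode, FaultTask, exec_time, and so on) are illustrative assumptions and do not come from the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class SafeOperationGraph:
    """G_{i,j}: safe-operation subtasks plus precedence edges (hypothetical layout)."""
    exec_time: dict[str, float]                                   # subtask -> C^k_{i,j}
    edges: list[tuple[str, str]] = field(default_factory=list)    # (predecessor, successor)


@dataclass
class FaultNode:
    """tau_{i,j}: one fault state of fault tree TR_i."""
    name: str
    graph: SafeOperationGraph          # G_{i,j}
    relative_deadline: float           # D_{i,j}
    children: list["FaultNode"] = field(default_factory=list)


@dataclass
class FaultTask:
    """tau_i(r_i, TR_i): a fault tree identified by its source node."""
    ready_time: float                  # r_i
    source: FaultNode                  # tau_{i,1}

    def nodes(self) -> list[FaultNode]:
        """All nodes tau_{i,j} of TR_i in depth-first order."""
        out, stack = [], [self.source]
        while stack:
            node = stack.pop()
            out.append(node)
            stack.extend(reversed(node.children))
        return out

    def terminal_nodes(self) -> list[FaultNode]:
        """Fault states that have no child node."""
        return [n for n in self.nodes() if not n.children]
```

A task set τ is then simply a list of FaultTask objects, one per fault tree.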
Further, step S2 is implemented as follows:
S21: analyze the default execution state of fault τ_i and identify the critical nodes according to the criticality of the source node of fault tree TR_i;
S22: form MCE2E task clusters around the critical nodes, where the ordinary nodes in each cluster are selected by jointly considering the urgency of the critical node and the criticality state and urgency of each ordinary node; if no critical node has appeared yet, first form the candidate set of an MCE2E task cluster from the nodes in the currently highest criticality state;
S23: establish the scheduling method for the nodes in each cluster according to the rounds of the critical node.
Further, the default execution state of the fault represented by task τ_i is the criticality of the source node of its fault tree TR_i, i.e. τ_i = τ_{i,1}. For the source node of TR_i, G_{i,1} is the safe operation graph that must be executed to handle τ_{i,1}; G_{i,1} has exactly one source task and one sink task and contains n_{i,1} safe-operation subtasks τ^k_{i,1}.
Further, in step S23 the scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, determine which of three possible stages the ordinary nodes of the cluster are in.
If they are in the criticality-state retention stage, all nodes execute in the current mixed-criticality state; at this point the accumulated execution time has not reached the upper bound for that mixed-criticality state;
If they are in the criticality-state switching stage, the ordinary nodes give up processor resources so that the critical node can execute successfully;
If they are in the criticality-state update stage, then, because of the criticality-state switch produced in the second stage, the information of the ordinary nodes' successor nodes in other clusters is updated.
Further, in the criticality-state switching stage, the specific procedure is:
According to the criticality state and urgency of the ordinary nodes, select an ordinary node with a lower criticality state and relatively ample idle time for degraded execution;
If the ordinary node selected for degraded execution undergoes a criticality-state change, select the next ordinary node from the candidate set for degraded execution.
Further, the specific steps of degraded execution of ordinary nodes are:
1) schedule the subset of tasks with the highest criticality by finding a schedulable local deadline assignment for the subtasks of each critical node on the partial-order graph;
2) based on the execution time demand and the deadline under the current criticality state, and using the local deadline assignment, analyze whether a sufficiently long span of idle processor time can be found across the agents to complete execution;
3) if the task can be scheduled successfully, it is admitted and executed in the current criticality state; otherwise, the task activates the related task of the next criticality level and the procedure returns to 2).
The beneficial effects of the present invention are:
Targeting the safety requirements of complex real-time reactive systems and the scheduling problems present in them, the present invention designs a real-time fault handling method for distributed environments that increases the success rate of safe-operation handling at fault nodes and reduces the rate at which faults trigger subsequent faults. Using schedulability conditions, the invention judges whether the existing system resources can satisfy the deadline constraints of the reasoning tasks in the system, determines the processing order of the reasoning tasks according to its scheduling policy, allocates reasonable system resources to the real-time reasoning process, and judges whether a newly arrived reasoning task can be completed safely without affecting the reasoning tasks already in the system. If it can, the system enters normal operation of the real-time reasoning process. Otherwise, with the shortest total repair time and the shortest fault propagation length as goals, the self-healing agents in the system are scheduled to find an effective fault repair solution, so that losses are avoided as far as possible even in the worst case. The method is suitable for complex real-time reactive systems: it ensures the safe operation of the multi-agent system as a whole and keeps both the rate of subsequent faults arising during fault handling and the degree of fault extension small, thereby improving the real-time performance and reliability of complex real-time reactive systems.
Brief description of the drawings
Fig. 1 is a schematic diagram of the mapping between fault trees and safe operation graphs of the present invention;
Fig. 2 is the task model diagram of distributed real-time fault handling of the present invention;
Fig. 3 is a flow chart of the method of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in detail below with reference to the drawings. The following disclosure provides specific embodiments for implementing the device and method of the invention, so that those skilled in the art can understand more clearly how to implement the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. In addition, the invention may repeat reference numerals or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention. It should be understood that although preferred embodiments are described, they are illustrations of embodiments and do not limit the scope of the invention.
The principle of the overall technical solution is as follows:
The fault tree set comprises the fault states, summarized from experience, that may occur or have already occurred. Each fault state corresponds to a time span, defined here as its deadline; if handling is not completed within the deadline, a new fault is triggered.
The safe operation graph contains the task processing sequences of all safe operations for routine maintenance and fault handling, stored as a directed graph.
As shown in Fig. 1, every fault tree or normal-state chain corresponds to at least one complete operation-sequence subtree in a safe operation graph. Once the corresponding agents have completed the whole operation sequence, the fault has been handled. If the sequence cannot be completed within the specified time, the fault cannot be eliminated, a new fault is generated, and the fault tree moves to its next stage, which requires more safe operation sequences to be completed.
As shown in Figs. 2 and 3, the method of the present application is a distributed real-time fault handling method that mainly comprises the following steps:
S1: establish a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a mixed-criticality fault tree TR_i. The mixed criticality of a task refers to the different degrees τ_{i,j} to which fault τ_i extends on fault tree TR_i.
The implementation principle and process of step S1 are as follows:
S11: create the initial fault node τ_{i,1} of the fault tree corresponding to task τ_i;
S12: from historical fault data, derive the subsequent fault nodes triggered by fault τ_{i,1}, which form the successor nodes of τ_{i,1}, until all fault nodes τ_{i,j} have been established, forming fault tree TR_i.
If TR_i has a directed edge from τ_{i,j} to τ_{i,l}, then τ_{i,j} is the parent node of τ_{i,l} and τ_{i,l} is a child node of τ_{i,j}. τ_{i,l} is triggered and becomes ready only when execution of the safe operation graph corresponding to τ_{i,j} exceeds the deadline of τ_{i,j}; at that moment τ_{i,j} terminates execution immediately.
A node without a parent node is the source node, and a node without child nodes is a terminal node. Each node other than the source node has exactly one parent node and may have multiple child nodes; TR_i has exactly one source node and may have multiple terminal nodes.
S13: form task τ_i from the set of all fault nodes τ_{i,j}. Every node τ_{i,j} on a fault tree corresponds to a safe operation graph G_{i,j}, defined as a directed acyclic graph, that must be executed to handle fault τ_{i,j}. Whether fault τ_{i,j} is eliminated depends on whether all safe operations in its safe operation graph can be completed in time (i.e. meet the deadline constraint).
TR_i has n_i nodes. Each node τ_{i,j} is defined by a directed acyclic graph G_{i,j}, the safe operation graph that must be executed to handle the fault corresponding to τ_{i,j}, which contains n_{i,j} safe-operation subtasks τ^k_{i,j}. D_{i,j} is the relative deadline of G_{i,j} (that is, the minimum time interval between transitions from one node of TR_i to another, each of which changes the criticality), and C^k_{i,j} is the execution time subtask τ^k_{i,j} needs to complete its safe operation.
The directed edges of G_{i,j} represent the execution flow of the safe operations by which τ_{i,j} handles the fault and define the precedence constraints among the safe-operation subtasks of τ_{i,j}. If G_{i,j} has a directed edge from one subtask to another, the former is a direct predecessor of the latter and the latter is a direct successor of the former. If G_{i,j} has a directed path from one subtask to another, the former is a predecessor of the latter and the latter is a successor of the former.
A subtask without predecessors is called a source task, and a subtask without successors is called a sink task. A subtask can start executing only after all of its direct predecessors have completed, and it may have multiple direct predecessors and multiple direct successors; G_{i,j} has exactly one source task and one sink task.
S14: use the tasks τ_i to establish the task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling.
If the task corresponding to the first node of τ_i completes execution within its deadline, the fault does not propagate further along the fault tree; otherwise, the task corresponding to the next node on the fault tree is triggered. If τ_i reaches its last subtask, it enters the safety-critical state; otherwise it is in a non-safety-critical state. Every task in the safety-critical state must complete execution before its deadline, whereas a task in a non-safety-critical state is allowed to miss its deadline and move on to the next subtask.
Based on their experience and preferences, engineers usually fix the mapping between the tasks and the processing resources of the system at the design stage, according to the different functions of the system, the distribution of processing resources in the system, existing system constraints, the physical proximity between sensors and actuators, and so on.
Therefore, the subtasks belonging to the different fault trees τ_i are assigned, by operation type, to distributed processors of the corresponding class for execution; the set of subtasks that can execute on a given processor is denoted Ψ(h).
S2: determine the scheduling method for the fault tasks according to the execution state of the faults. By scheduling all subtasks in the safe operation graphs corresponding to the active fault tree nodes, the method both guarantees that all MCE2E tasks are schedulable and minimizes the degree of fault extension of the task set, i.e. it minimizes either the average fault extension degree, MIN(AVG(et_i)), or the maximum fault extension degree, MIN(MAX(et_i)).
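The two objectives named here can be illustrated with a few lines of Python. This is a sketch under stated assumptions: et_i is taken to be a per-task number measuring how far the fault spread on its tree, and the function names and the idea of comparing a handful of candidate schedules are hypothetical, introduced only for illustration.

```python
def average_extension(et: dict[str, float]) -> float:
    """AVG(et_i): mean fault extension degree over the task set."""
    return sum(et.values()) / len(et)


def maximum_extension(et: dict[str, float]) -> float:
    """MAX(et_i): worst fault extension degree over the task set."""
    return max(et.values())


def pick_schedule(candidates: dict[str, dict[str, float]], objective=average_extension) -> str:
    """Return the name of the candidate schedule with the smallest objective value."""
    return min(candidates, key=lambda name: objective(candidates[name]))
```

The two objectives can disagree: a schedule with a low average may still let one fault spread far, which is why both MIN(AVG(et_i)) and MIN(MAX(et_i)) are listed as alternatives.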
Step S2 is implemented as follows:
First, analyze the default execution state of fault τ_i. The default execution state of the fault represented by task τ_i is the criticality of the source node of its fault tree TR_i, i.e. τ_i = τ_{i,1}.
For the source node of TR_i, G_{i,1} is the safe operation graph that must be executed to handle τ_{i,1}; G_{i,1} has exactly one source task and one sink task and contains n_{i,1} safe-operation subtasks τ^k_{i,1}.
Then, form MCE2E task clusters around the critical nodes: the active fault tree nodes are grouped into clusters, each consisting of one critical node and several ordinary nodes. That is:
MCE2E task clusters are formed on the agents around the critical nodes. The ordinary nodes in each cluster are selected by jointly considering the urgency of the critical node and the criticality state and urgency of each ordinary node. If no critical node has appeared yet, the candidate set of an MCE2E task cluster is first formed from the nodes in the currently highest criticality state.
S23: establish the scheduling method for the nodes in each cluster according to the rounds of the critical node. The scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, determine which of three possible stages the ordinary nodes of the cluster are in:
Criticality-state retention stage: all nodes execute in the current mixed-criticality state, and the accumulated execution time has not reached the upper bound for that mixed-criticality state.
Criticality-state switching stage: ordinary nodes give up processor resources so that the critical node can execute successfully, which may cause some ordinary nodes to switch to a higher criticality state.
For the criticality-state switching stage, the execution strategy is as follows: according to the criticality state and urgency of the ordinary nodes, select an ordinary node with a lower criticality state and relatively ample idle time for degraded execution; if the ordinary node selected for degraded execution undergoes a criticality-state change, select the next ordinary node from the candidate set for degraded execution.
Note that in the criticality-state switching stage, the criticality changes that degraded execution causes in ordinary nodes should be kept as few as possible.
Criticality-state update stage: because of the criticality-state switch produced in the second stage, the information of the successor nodes in other clusters that have precedence constraints with the affected ordinary nodes is updated.
During the criticality-state switching and update stages, once the critical node has completed execution, the interrupted ordinary nodes can resume execution. This reduces unnecessary criticality-state switches and thereby reduces the propagation length of deadline misses caused by tasks in higher criticality states.
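The selection rule for degraded execution (prefer ordinary nodes with a lower criticality state and more idle time, and move on to the next candidate if degrading a node would change its criticality state) can be sketched as follows. The names OrdinaryNode, slack, and would_switch are assumptions made for this illustration; in particular, the predicate deciding whether degradation triggers a criticality switch is left to the caller because the source does not define it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OrdinaryNode:
    name: str
    criticality: int   # larger value = higher criticality state (assumed encoding)
    slack: float       # idle time still available before the node's deadline


def degradation_order(candidates: list[OrdinaryNode]) -> list[OrdinaryNode]:
    """Lowest criticality first; among equals, the node with the most slack first."""
    return sorted(candidates, key=lambda n: (n.criticality, -n.slack))


def pick_for_degradation(
    candidates: list[OrdinaryNode],
    would_switch: Callable[[OrdinaryNode], bool],
) -> Optional[OrdinaryNode]:
    """Return the first candidate whose degradation causes no criticality switch."""
    for node in degradation_order(candidates):
        if not would_switch(node):
            return node
    return None   # every candidate would switch criticality state
```

Sorting by criticality before slack reflects the stated goal of keeping criticality changes caused by degradation as few as possible.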
The scheduling of critical subtasks within a critical node is described in detail below with a concrete example.
Let P^k_{i,1} be the set of all paths in G_{i,1} from τ^k_{i,1} to the sink task. The longest path in P^k_{i,1} is called the critical path P^{kcri}_{i,1}, and its length is C^{kcri}_{i,1}; the subtasks on P^{kcri}_{i,1} are called critical subtasks.
Because a delay of any task on the critical path delays the response time of the whole task, critical-path methods based on depth-first search and related techniques can be used to analyze the scheduling of tasks described by a graph model, identify the execution order that is critical to scheduling the task, and better analyze the execution of the task as a whole.
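A depth-first search with memoization is one straightforward way to find the critical path described above. The sketch below assumes that the path length is the sum of the execution times of the subtasks on the path, including the starting subtask, and that the graph is acyclic; the function name and argument layout are illustrative only.

```python
from functools import lru_cache


def critical_path(exec_time: dict[str, float],
                  successors: dict[str, list[str]],
                  start: str) -> tuple[float, tuple[str, ...]]:
    """Longest path (by summed execution time) from `start` to a sink of a DAG."""

    @lru_cache(maxsize=None)
    def longest(node: str) -> tuple[float, tuple[str, ...]]:
        best_len, best_path = 0.0, ()
        for nxt in successors.get(node, ()):
            length, path = longest(nxt)      # depth-first descent, cached per node
            if length > best_len:
                best_len, best_path = length, path
        return exec_time[node] + best_len, (node,) + best_path

    return longest(start)


# Example: source s, two parallel branches a and b, sink t.
exec_time = {"s": 1, "a": 2, "b": 5, "t": 1}
successors = {"s": ["a", "b"], "a": ["t"], "b": ["t"]}
length, path = critical_path(exec_time, successors, "s")
```

In this example the branch through b dominates, so the critical subtasks are s, b, and t.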
Critical subtasks have the least slack; executing them as early as possible helps obtain the best response time for their task. Each subtask τ^k_{i,1} is assigned a local deadline d^k_{i,1} such that all subtasks meet their execution demands on their respective agents, while the local deadline d^k_{i,1} of any subtask does not exceed the deadline d_{i,1} of the task τ_{i,1} it belongs to. In this way every task τ_i can be scheduled successfully in its initial criticality state.
To this end, a predecessor subtask must not use up too much of the slack on an agent; it must leave its successor subtasks enough time to complete execution. The optimization goal of the local deadline assignment method is therefore to maximize the minimum path slack of the subtasks τ^k_{i,1} on each agent [4] [5], saving as much slack as possible for the subtasks that follow, which helps satisfy the overall deadline constraint of the task they belong to (i.e. d_{i,1}).
At the same time, it must also be ensured that the whole set Ψ(h) of subtasks of the different tasks on that agent can be scheduled successfully.
The optimization objective is: max: min{ d_{i,1} - d^k_{i,1} - C^{kcri}_{i,1} | τ^k_{i,1} ∈ Ψ(h) }. The optimization problem is solved with a mixed-integer linear programming or nonlinear programming model.
The constraint is: r^k_{i,1} + C^k_{i,1} ≤ d^k_{i,1} ≤ d_{i,1} - C^{kcri}_{i,1}.
If a solution to this problem can be found, the corresponding source nodes τ_{i,1} execute successfully in their initial criticality state. Otherwise, any τ_{i,1} whose scheduling fails immediately terminates the operations on its safe operation graph and enters a higher criticality level; the faults represented by all child nodes of τ_{i,1} are triggered, and all newly triggered faults must be handled.
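To make the objective and the constraint concrete, the sketch below evaluates them for given local deadline assignments and picks the best of a few candidates by brute force. This is not the mixed-integer or nonlinear programming model the text refers to; it is a simplified stand-in for illustration. The assumptions are that r^k is the ready time of subtask k, that crit_len[k] holds C^{kcri} for subtask k, and that all function names are hypothetical.

```python
def min_path_slack(d_task, local_deadline, crit_len):
    """Objective value: min over subtasks k of d_{i,1} - d^k_{i,1} - C^{kcri}_{i,1}."""
    return min(d_task - local_deadline[k] - crit_len[k] for k in local_deadline)


def is_feasible(d_task, ready, exec_time, local_deadline, crit_len):
    """Constraint: r^k + C^k <= d^k <= d_{i,1} - C^{kcri} for every subtask k."""
    return all(
        ready[k] + exec_time[k] <= local_deadline[k] <= d_task - crit_len[k]
        for k in local_deadline
    )


def best_assignment(candidates, d_task, ready, exec_time, crit_len):
    """Among feasible candidate assignments, return the one with the largest minimum slack."""
    feasible = [c for c in candidates
                if is_feasible(d_task, ready, exec_time, c, crit_len)]
    if not feasible:
        return None   # corresponds to the scheduling-failure branch in the text
    return max(feasible, key=lambda c: min_path_slack(d_task, c, crit_len))
```

For example, with d_task = 10, ready = {"a": 0, "b": 2}, exec_time = {"a": 2, "b": 3} and crit_len = {"a": 3, "b": 0}, the assignment {"a": 4, "b": 7} has a minimum slack of 3 and beats {"a": 2, "b": 10}, whose minimum slack is 0.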
When the current node set of τ_i contains a node τ_{i,j} that is a terminal node of TR_i, the system defines the criticality of τ_i as the highest criticality and defines τ_{i,j} as a critical node; the other nodes are ordinary nodes with various criticalities. The system periodically monitors whether critical nodes have been triggered and checks whether the execution of each critical node can meet its deadline constraint. If it cannot, suitable ordinary nodes on the processors used by the critical node are selected to be interrupted and have their execution postponed, giving up processor resources for the critical node. Once the critical node meets its deadline constraint, the interrupted ordinary nodes can continue executing. This method is called degraded execution of ordinary nodes.
The specific strategy for degraded execution of ordinary nodes is as follows:
1) Schedule the subset Υ_cri of tasks with the highest criticality: for the subtasks of each critical node on the partial-order graph, find a schedulable local deadline assignment for Υ_cri in a distributed manner, so that the processor resources of the agents can successfully schedule the subtasks of all critical nodes while reserving as much idle processor time as possible for the other tasks.
2) Schedule the other tasks Υ_non-cri, whose criticality state defaults to that of τ_{i,1}: based on the execution time demand and the deadline under the current criticality state, and using the local deadline assignment, analyze whether a sufficiently long span of idle processor time can be found across the agents to complete execution.
3) If the task can be scheduled successfully, it is admitted and executed in the current criticality state; otherwise, the task activates the related task of the next criticality level and the procedure returns to 2).
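Steps 2) and 3) amount to an admission loop that tries the current criticality state first and falls through to the next level on failure. The sketch below shows only that control flow. The schedulability test is deliberately simplified to a comparison between the demand and the idle processor time available before the deadline, which is an assumption of this example and not the analysis used in the patent; the function names are hypothetical.

```python
from typing import Callable, Optional


def admit(levels: list[tuple[float, float]],
          idle_time_before: Callable[[float], float]) -> Optional[int]:
    """Try each criticality level in order, starting from the default one.

    levels: (execution time demand, deadline) per criticality level.
    idle_time_before(deadline): idle processor time available before `deadline`.
    Returns the index of the level at which the task is admitted, or None.
    """
    for level, (demand, deadline) in enumerate(levels):
        if idle_time_before(deadline) >= demand:   # step 2: enough idle time found
            return level                           # step 3: admit at this level
        # otherwise the next criticality level is activated and step 2 repeats
    return None
```

A return value of None corresponds to the case where no criticality level can be scheduled, which the caller has to treat as a failed fault handling attempt.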
Without loss of generality, suppose the system has m processors, and let Ψ(h) be the set of all safe-operation subtasks pre-assigned to processor h (these subtasks may preempt one another), i.e. Ψ = {Ψ(h) | 1 ≤ h ≤ m}. Each agent schedules the subtasks in its Ψ(h) according to the local deadlines assigned to the subtasks of each task on the partial-order graph. The steps are as follows:
Assign new local deadlines to all subtasks in Ψ, and select the subtask in Ψ(h) with the smallest local deadline for execution.
When a subtask of Ψ(h) completes, its completion is reported to each Ψ(l) that holds one of its direct successors, and the completed subtask is removed from Ψ(h); the subtask in Ψ(h) with the smallest local deadline is then selected for execution.
After Ψ(l) receives the completion information, the corresponding successor subtask is made ready, its parameters are recalculated, and new local deadlines are assigned to all subtasks in Ψ(l).
If local deadline assignment fails and τ_{i,j} is a task in Υ_non-cri, the next criticality level triggered by τ_{i,j} is activated.
The above operations are repeated until all tasks have finished executing.
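The per-processor loop above (run the ready subtask with the smallest local deadline, report its completion, release successors) can be sketched as a small simulation. Two simplifications are assumed and are not from the source: subtasks run without preemption, and local deadlines are fixed in advance instead of being reassigned after every completion. The function name and data layout are illustrative.

```python
import heapq


def simulate(subtasks: dict[str, tuple[str, float, float]],
             preds: dict[str, set[str]]) -> dict[str, float]:
    """subtasks: name -> (processor, execution time, local deadline).
    preds: name -> set of direct predecessors.
    Returns name -> finish time."""
    remaining = {n: set(preds.get(n, ())) for n in subtasks}
    succs = {n: [] for n in subtasks}
    for n, ps in remaining.items():
        for p in ps:
            succs[p].append(n)

    ready = {}                                   # processor -> heap of (local deadline, name)
    for n, (proc, _, deadline) in subtasks.items():
        ready.setdefault(proc, [])
        if not remaining[n]:
            heapq.heappush(ready[proc], (deadline, n))

    now, busy, running, finish = 0.0, set(), [], {}
    while len(finish) < len(subtasks):
        # start the smallest-local-deadline ready subtask on every idle processor
        for proc, heap in ready.items():
            if proc not in busy and heap:
                _, n = heapq.heappop(heap)
                busy.add(proc)
                heapq.heappush(running, (now + subtasks[n][1], n, proc))
        if not running:
            raise ValueError("no runnable subtask: precedence cycle?")
        now, done, proc = heapq.heappop(running)  # advance to the next completion
        busy.discard(proc)
        finish[done] = now
        # completion notice: a successor becomes ready once all predecessors are done
        for s in succs[done]:
            remaining[s].discard(done)
            if not remaining[s]:
                heapq.heappush(ready[subtasks[s][0]], (subtasks[s][2], s))
    return finish
```

Comparing each finish time with the subtask's local deadline then shows whether the assignment was met, for instance with `all(finish[n] <= subtasks[n][2] for n in subtasks)`.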
S3: using the task set obtained in step S1, match each fault generated by the system with the safe operation graph for handling it according to the scheduling method of step S2, thereby eliminating the fault.
If τ_{i,1} completes all operations on its safe operation graph before its deadline D_{i,1}, then τ_i has also executed successfully. Otherwise, once execution of τ_{i,1} exceeds D_{i,1}, τ_{i,1} immediately terminates the operations on its safe operation graph and enters the next criticality level τ_{i,2} (τ_{i,2} is a child node of τ_{i,1}), and fault handling proceeds in that criticality state: from the current time on, τ_i = τ_{i,2}, the deadline of τ_i is updated to D_{i,2}, and τ_{i,2} is called the current node.
If τ_{i,1} has multiple child nodes, the faults represented by all child nodes are triggered; they all become current nodes, each with its own deadline constraint.
By analogy, the fault tree TR_i represented by τ_i may be subject to multiple relative deadline constraints, each defined by one of its current nodes.
If all current nodes on fault tree TR_i complete the operations on their safe operation graphs before their deadlines, task τ_i is schedulable; in that case every current node on every fault tree is either not a terminal node or satisfies C_{i,SINK_NODE} ≤ D_{i,SINK_NODE}. If, for any terminal node of fault tree TR_i, execution of the operations on its safe operation graph exceeds its deadline, then scheduling of the handling of the fault represented by τ_i has failed.
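The propagation rule in step S3 can be traced with a short recursive sketch. It assumes that each node is described by its relative deadline D, the time C its safe operation graph needs once the node is activated, and its child nodes; a node that misses its deadline is terminated at that deadline and all of its children are activated at that moment. The dictionary layout and the function name are assumptions made for this example.

```python
def handle(tree: dict[str, tuple[float, float, list[str]]],
           node: str, start: float = 0.0, depth: int = 0):
    """tree: name -> (relative deadline D, completion time C, child names).

    Returns (handled, deepest level reached, time at which handling ends).
    """
    deadline, completion, children = tree[node]
    if completion <= deadline:                 # safe operation graph finished in time
        return True, depth, start + completion
    if not children:                           # a terminal node missed its deadline
        return False, depth, start + deadline
    # deadline miss: the node terminates and every child fault is triggered
    results = [handle(tree, c, start + deadline, depth + 1) for c in children]
    return (all(r[0] for r in results),
            max(r[1] for r in results),
            max(r[2] for r in results))


# The source node misses its deadline; both child faults are then handled in time.
tree = {"t1": (5, 7, ["t2", "t3"]), "t2": (4, 3, []), "t3": (6, 6, [])}
ok, level, end = handle(tree, "t1")
```

The second element of the result is one possible measure of how far the fault extended on its tree, in the spirit of the fault extension degree used as the scheduling objective.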
In addition, the scope of application of the present invention is not limited to the processes, mechanisms, manufacture, compositions of matter, means, methods, and steps of the specific embodiments described in the specification. From the disclosure of the present invention, those of ordinary skill in the art will readily understand that processes, mechanisms, manufacture, compositions of matter, means, methods, or steps, whether currently existing or developed later, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be applied according to the present invention. Therefore, the appended claims are intended to include such processes, mechanisms, manufacture, compositions of matter, means, methods, or steps within their scope of protection.

Claims (9)

1. A distributed real-time fault processing method, characterized in that the method comprises:
S1: establishing a task set τ = {τi | 1 ≤ i ≤ n} for real-time fault processing, where n is the number of tasks composing the task set τ, and each task τi corresponds to a mixed-criticality fault tree TRi;
S2: determining the scheduling method for fault tasks according to the execution state of the faults;
S3: using the task set obtained in step S1, matching the faults generated by the system with the safety operation graphs for handling them according to the scheduling method of step S2, thereby eliminating the faults.
2. The distributed real-time fault processing method according to claim 1, characterized in that step S1 is implemented as follows:
S11: creating the initial fault node τi,1 of the fault tree corresponding to task τi;
S12: deriving, from historical fault data, the subsequent fault nodes triggered by fault τi,1 to form the descendant nodes of τi,1, until all fault nodes τi,j have been established;
S13: forming task τi from the set of all established fault nodes τi,j;
S14: establishing the task set τ for real-time fault processing from the tasks τi.
3. The distributed real-time fault processing method according to claim 2, characterized in that the correspondence between a fault node τi,j and its safety operation graph is as follows: wherein Gi,j denotes the safety operation graph that must be executed to handle the fault corresponding to τi,j and comprises ni,j subtasks that carry out safety operations; Di,j is the relative deadline of Gi,j; and the execution time of a subtask is the time that subtask needs to complete its safety operation.
4. The distributed real-time fault processing method according to claim 3, characterized in that the set of fault nodes τi,j is τi(ri, TRi) = {τi,j | 1 ≤ j ≤ ni}, where TRi denotes a directed tree, ri is the ready time of the initial fault node of TRi, and τi,j denotes each node of TRi.
5. The distributed real-time fault processing method according to claim 1, characterized in that step S2 is implemented as follows:
S21: analyzing the default execution state of task τi, and identifying the critical nodes according to the criticality level of the source node of fault tree TRi;
S22: forming MCE2E task clusters according to the critical nodes, where the ordinary nodes in each cluster are selected by jointly considering the urgency of the critical node and the criticality state and urgency of each ordinary node; if no critical node exists yet, first forming the candidate set of the MCE2E task clusters from the nodes at the current highest criticality state;
S23: establishing the scheduling method for the nodes in each cluster according to the rounds of the critical node.
6. The distributed real-time fault processing method according to claim 5, characterized in that the default execution state of the fault represented by task τi is the criticality level of the source node of its fault tree TRi, that is, τi = τi,1, with the source node of TRi defined accordingly, wherein Gi,1 denotes the safety operation graph that must be executed to handle τi,1; Gi,1 has exactly one source task and one sink task, and comprises ni,j subtasks that carry out safety operations.
7. The distributed real-time fault processing method according to claim 5, characterized in that, in step S23, the scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, determining which of three possible stages the ordinary nodes in the cluster are in:
if in the criticality-state holding stage, all nodes execute under the current mixed-criticality state, and the accumulated execution time has not reached the upper limit for that mixed-criticality state;
if in the criticality-state switching stage, the ordinary nodes yield processor resources so that the critical node can execute successfully;
if in the criticality-state update stage, the information of subsequent nodes in the ordinary nodes and in other clusters is updated to reflect the criticality-state switch produced in the second stage.
8. The distributed real-time fault processing method according to claim 7, characterized in that the criticality-state switching stage is executed as follows:
according to the criticality state and urgency of the ordinary nodes, selecting ordinary nodes with a lower criticality state and relatively abundant slack time for degraded execution;
if a criticality-state transition occurs in an ordinary node under degraded execution, selecting the next ordinary node from the candidate set for degraded execution.
9. The distributed real-time fault processing method according to claim 8, characterized in that the degraded execution of an ordinary node comprises the following steps:
1) scheduling the task subset with the highest criticality, and finding a schedulable local deadline assignment scheme for the subtasks of each critical node on the partial-order graph;
2) according to the execution time demand and the deadline under the current criticality state, and in combination with the local deadline partitioning scheme, analyzing whether a sufficiently long idle processor interval can be found on the multiple agents to complete execution;
3) if the task can be scheduled successfully, admitting the task at the current criticality state and executing it; otherwise, the task activates the related tasks of the next criticality level, and the procedure returns to step 2).
CN201810819362.1A 2018-07-24 2018-07-24 Distributed real-time fault processing method Active CN109104304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810819362.1A CN109104304B (en) 2018-07-24 2018-07-24 Distributed real-time fault processing method

Publications (2)

Publication Number Publication Date
CN109104304A true CN109104304A (en) 2018-12-28
CN109104304B CN109104304B (en) 2021-06-01

Family

ID=64847231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810819362.1A Active CN109104304B (en) 2018-07-24 2018-07-24 Distributed real-time fault processing method

Country Status (1)

Country Link
CN (1) CN109104304B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784246A (en) * 2020-07-01 2020-10-16 深圳市检验检疫科学研究院 Logistics path estimation method
CN111784248A (en) * 2020-07-01 2020-10-16 深圳市检验检疫科学研究院 Logistics tracing method
CN117453379A (en) * 2023-12-25 2024-01-26 麒麟软件有限公司 Scheduling method and system for AOE network computing tasks in Linux system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105759171A (en) * 2016-03-30 2016-07-13 广西电网有限责任公司南宁供电局 Method for improving distribution network switching-out inspection efficiency based on distribution line condition evaluation
CN106372785A (en) * 2016-08-29 2017-02-01 陈赛 System fault data processing method based on characteristic index
CN106886667A (en) * 2017-04-14 2017-06-23 中国人民解放军海军航空工程学院 A kind of complication system availability analysis method based on event scheduling
US20170193143A1 (en) * 2015-12-31 2017-07-06 Palo Alto Research Center Incorporated Method for modelica-based system fault analysis at the design stage
CN108021435A (en) * 2017-12-14 2018-05-11 南京邮电大学 A kind of cloud computing task stream scheduling method with fault-tolerant ability based on deadline

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
THANYALAK CHALERMARREWONG等: "Failure Prediction of Data Centers Using Time Series and Fault Tree Analysis", 《 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS》 *
杜洁敏: "智能变电站故障诊断模型和恢复策略的研究", 《中国优秀硕士学位论文全文数据库》 *

Also Published As

Publication number Publication date
CN109104304B (en) 2021-06-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant