CN109104304A - A distributed real-time fault handling method - Google Patents
A distributed real-time fault handling method
- Publication number
- CN109104304A (CN201810819362.1A)
- Authority
- CN
- China
- Prior art keywords
- node
- task
- key
- failure
- real time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
Abstract
The present invention provides a distributed real-time fault handling method, characterized in that the method comprises: S1: establishing a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a fault tree TR_i with mixed criticality; S2: determining the scheduling method for fault tasks according to the execution state of the fault; S3: using the task set obtained in step S1, matching the faults generated by the system with the safety operation graphs for handling them according to the scheduling method of step S2, thereby eliminating the faults. The method can handle faults in real time in a distributed environment and takes into account the recovery mechanism when faults may spread.
Description
Technical field
The invention belongs to the field of real-time reactive systems and real-time technology, and in particular relates to a distributed real-time fault handling method.
Background art
To monitor and control large-scale, complex distributed systems in real time, powerful sensing nodes are deployed at key points of the system and connected directly to the Internet. The information they collect is transmitted in real time to the corresponding server cluster for computation, and the instructions to be executed are returned to the sensing or control nodes so that the predetermined safety goals are met. Such an ultra-large-scale complex network, formed by integrating multiple types of networks and combining centralized with distributed coordinated operation, is called a complex real-time reactive system. The smart grid is a typical example.
Complex real-time reactive systems usually bear on the safety of life and property, society, and the environment. They are safety-critical and have strict real-time requirements: after an event of interest occurs, the system must complete the corresponding actions to respond to it within a given time limit. A large, even massive, number of intelligent operations must be carried out on different nodes and different devices, and the execution order and timing of these operations are strictly regulated. If a response exceeds its time limit, or an operation is executed on the wrong device, at the wrong time, for too long, or in the wrong order, the consequences can be catastrophic: serious injury or death, severe equipment damage, or harm to the environment.
The network of a complex real-time reactive system spans great distances, its equipment is diverse, and its environment changes constantly. Across the whole network it must collect information in real time, process data quickly, and complete the relevant business operations in time so that the operation of the entire system is monitored. Once a fault occurs, a real-time safety-critical reactive system must reduce losses and repair the fault quickly through rapid inspection, diagnosis, and similar measures. During normal operation the goal is fault prevention, and complex intelligent business operations should be completed on time. When a fault occurs it should be detected promptly, and emergency operations and self-repair should be carried out according to the current fault state, the state of the fused networks and of network information, the state of the distributed equipment, and other conditions, so that the fault is eliminated within its time limit and the safety of the system is guaranteed. The core problem is the real-time scheduling of the distributed fault handling tasks.
When a fault occurs in a complex real-time reactive system and cannot be handled in time because system resources are limited, it may trigger new faults in other parts that are related to it through business logic or data, so that faults keep spreading across the distributed environment. Current complex real-time reactive systems do not consider how to guarantee real-time fault handling under such a chain reaction, which harms the success rate and safety of fault recovery.
Summary of the invention
To address the deficiencies of the prior art for complex real-time reactive systems, the present invention provides a new distributed real-time fault handling method. The method can handle faults in real time in a distributed environment and takes into account the recovery mechanism when faults may spread.
The technical solution of the invention is realized as follows:
A distributed real-time fault handling method, the method comprising:
S1: establishing a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a fault tree TR_i with mixed criticality;
S2: determining the scheduling method for fault tasks according to the execution state of the fault;
S3: using the task set obtained in step S1, matching the faults generated by the system with the safety operation graphs for handling them according to the scheduling method of step S2, thereby eliminating the faults.
Further, step S1 is implemented as follows:
S11: create the initial fault node τ_{i,1} of the fault tree corresponding to task τ_i;
S12: from historical fault data, derive the subsequent fault nodes caused by fault τ_{i,1} and add them as descendants of τ_{i,1}, until all fault nodes τ_{i,j} have been established;
S13: form task τ_i as the set of all fault nodes τ_{i,j};
S14: use the tasks τ_i to build the task set τ for real-time fault handling.
Further, the correspondence between a fault node τ_{i,j} and its safety operation graph is: [formula missing in source], where G_{i,j} is the safety operation graph that must be executed to handle the fault corresponding to τ_{i,j} and contains n_{i,j} subtasks τ^k_{i,j} that perform safety operations; D_{i,j} is the relative deadline of G_{i,j}; and C^k_{i,j} is the execution time subtask τ^k_{i,j} needs to complete its safety operation.
Further, the set of fault nodes τ_{i,j} is τ_i(r_i, TR_i) = {τ_{i,j} | 1 ≤ j ≤ n_i}, where TR_i is a directed tree, r_i is the ready time of the initial fault node of TR_i, and the τ_{i,j} are the nodes of TR_i.
Further, step S2 is implemented as follows:
S21: analyze the default execution state of fault τ_i and identify the critical nodes according to the criticality of the source node of fault tree TR_i;
S22: form MCE2E task clusters around the critical nodes, where the ordinary nodes in each cluster are chosen by jointly weighing the urgency of the critical node against the criticality level and urgency of each ordinary node; if no critical node exists yet, first form the candidate set for the MCE2E task clusters from the nodes at the currently highest criticality level;
S23: establish the scheduling method for the nodes in each cluster according to the rounds of the critical node.
Further, the default execution state of the fault represented by task τ_i is the criticality of the source node of its fault tree TR_i, that is, τ_i = τ_{i,1}. The source node of TR_i is: [formula missing in source], where G_{i,1} is the safety operation graph that must be executed to handle τ_{i,1}; G_{i,1} has exactly one source task and one sink task and contains n_{i,1} subtasks τ^k_{i,1} that perform safety operations.
Further, in step S23 the scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, determine which of three possible phases the ordinary nodes in the cluster are in.
If in the criticality-level retention phase, all nodes execute at the current mixed-criticality level; in this phase the accumulated execution time has not reached the upper bound of that criticality level;
If in the criticality-level switching phase, ordinary nodes give up processor resources so that the critical node can execute successfully;
If in the criticality-level update phase, the information of the ordinary nodes' successor nodes in other clusters is updated to reflect the criticality-level switches produced in the second phase.
Further, in the criticality-level switching phase the method is executed as follows:
According to the criticality level and urgency of each ordinary node, choose an ordinary node with a lower criticality level and relatively ample idle time for degraded execution;
If an ordinary node under degraded execution undergoes a criticality-level change, choose the next ordinary node from the candidate set for degraded execution.
Further, the degraded execution of ordinary nodes proceeds as follows:
1) schedule the subset of tasks with the highest criticality: for the subtasks of each critical node on the partial-order graph, find a schedulable allocation of local deadlines;
2) based on the execution time demand and deadline at the current criticality level, together with the local deadline partitioning scheme, analyze whether a sufficiently long span of idle processor time can be found on the agents to complete execution;
3) if the task can be scheduled successfully, it is admitted at the current criticality level and executed; otherwise, the task activates the related task at the next criticality level and the procedure returns to 2).
The beneficial effects of the invention are as follows:
Addressing the safety requirements of complex real-time reactive systems and the scheduling problems they face, the invention provides a real-time fault handling method for distributed environments that raises the success rate of the safety operations on fault nodes and lowers the rate at which faults trigger follow-on faults. Using schedulability conditions, the invention judges whether existing system resources can satisfy the deadline constraints of the reasoning tasks in the system, determines the processing order of reasoning tasks according to its scheduling strategy, allocates reasonable system resources to the real-time reasoning process, and judges whether a newly arrived reasoning task can be completed safely without affecting the reasoning tasks already in the system. If so, the system proceeds with the normal real-time reasoning process. Otherwise, the self-healing agents in the system are scheduled with the goals of the shortest total repair time and the shortest fault propagation length, and an effective fault repair solution is computed so that the system avoids losses as far as possible even in the worst case. The method suits complex real-time reactive systems: it guarantees the safe operation of the agents as a whole while keeping both the follow-on fault rate and the extent of fault expansion during fault handling low, thereby improving the real-time performance and reliability of complex real-time reactive systems.
Brief description of the drawings
Fig. 1 is a schematic diagram of the mapping between the fault tree and the safety operation graph of the invention;
Fig. 2 is the task model of distributed real-time fault handling of the invention;
Fig. 3 is a flowchart of the method of the invention.
Specific embodiments
Specific embodiments of the invention are described in detail below with reference to the drawings. The following disclosure provides specific embodiments for implementing the device and method of the invention so that those skilled in the art can understand more clearly how to realize it. To simplify the disclosure, the components and arrangements of specific examples are described below. The invention may also repeat reference numerals or letters in different examples; this repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments or arrangements discussed. Note that components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention. Although preferred embodiments are described, they are illustrative and do not limit the scope of the invention.
The principle of the overall technical solution is as follows:
The fault tree set comprises the fault states that, according to experience, may occur or have occurred. Each fault state corresponds to a time span, here called its deadline; if handling is not completed within the deadline, a new fault is triggered.
The safety operation graph contains the processing order of all safety tasks for routine maintenance and fault handling, stored as a directed graph.
As shown in Fig. 1, every fault tree or normal-state chain corresponds to at least one complete operation-sequence subtree in a safety operation graph. Once the corresponding intelligent agent completes the whole operation sequence, the fault is resolved. If the sequence is not completed within the specified time, the fault is not cleared and a new fault arises; the fault tree moves to its next stage, and more safety operation sequences must be completed.
As shown in Figs. 2 and 3, the method of the present application is a distributed real-time fault handling method that mainly comprises the following steps:
S1: establish a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks in τ and each task τ_i corresponds to a fault tree TR_i with mixed criticality. The mixed criticality of a task refers to the different degrees τ_{i,j} to which fault τ_i has spread on fault tree TR_i.
The principle and procedure of step S1 are as follows:
S11: create the initial fault node τ_{i,1} of the fault tree corresponding to task τ_i.
S12: from historical fault data, derive the subsequent fault nodes caused by fault τ_{i,1} and add them as descendants of τ_{i,1}, until all fault nodes τ_{i,j} have been established, forming fault tree TR_i.
If TR_i has a directed edge from τ_{i,j} to τ_{i,l}, then τ_{i,j} is the parent of τ_{i,l} and τ_{i,l} is a child of τ_{i,j}. τ_{i,l} is triggered and becomes ready only when execution of the safety operation graph of τ_{i,j} exceeds the deadline of τ_{i,j}; at that moment τ_{i,j} terminates immediately.
A node without a parent is a source node, and a node without children is a sink node. Each node other than the source has exactly one parent and may have multiple children; TR_i has exactly one source node and may have multiple sink nodes.
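The tree structure described above can be sketched as a small data type. This is only an illustration under stated assumptions: the class name `FaultNode`, its fields, and the helper `sink_names` are hypothetical and do not come from the patent; the relative deadline is stored as a plain number.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class FaultNode:
    """One node of a fault tree TR_i (hypothetical illustration type)."""
    name: str
    deadline: float  # relative deadline D_{i,j} of the node's safety operation graph
    parent: Optional["FaultNode"] = field(default=None, repr=False)
    children: List["FaultNode"] = field(default_factory=list)

    def add_child(self, child: "FaultNode") -> "FaultNode":
        # Every node other than the source has exactly one parent.
        if child.parent is not None:
            raise ValueError("a fault node has exactly one parent")
        child.parent = self
        self.children.append(child)
        return child

    def is_sink(self) -> bool:
        # A node without children is a sink node.
        return not self.children


def sink_names(root: FaultNode) -> List[str]:
    """Collect the names of all sink nodes with an iterative depth-first walk."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        if node.is_sink():
            found.append(node.name)
        else:
            stack.extend(reversed(node.children))
    return found
```

A tree built this way has one source (the node whose `parent` is `None`) and any number of sinks, matching the description of TR_i.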
S13: form task τ_i as the set of all fault nodes τ_{i,j}. Every node τ_{i,j} of a fault tree corresponds to a safety operation graph G_{i,j}, defined as a directed acyclic graph, that must be executed to handle fault τ_{i,j}. Whether fault τ_{i,j} is eliminated depends on whether all safety operations in its safety operation graph complete in time, that is, satisfy the deadline constraint.
TR_i has n_i nodes. Each node τ_{i,j} is defined by a directed acyclic graph G_{i,j}, the safety operation graph that must be executed to handle the corresponding fault, which contains n_{i,j} subtasks τ^k_{i,j} that perform safety operations. The correspondence between fault node τ_{i,j} and its safety operation graph is: [formula missing in source], where D_{i,j} is the relative deadline of G_{i,j} (that is, the minimum time interval between criticality-changing transitions between different nodes of TR_i) and C^k_{i,j} is the execution time subtask τ^k_{i,j} needs to complete its safety operation.
The directed edges of G_{i,j} represent the execution flow of the safety operations by which τ_{i,j} handles the fault, and define the precedence constraints among the safety operation subtasks of τ_{i,j}. If G_{i,j} has a directed edge from subtask τ^k_{i,j} to subtask τ^l_{i,j}, then τ^k_{i,j} is a direct predecessor of τ^l_{i,j} and τ^l_{i,j} is a direct successor of τ^k_{i,j}. If G_{i,j} has a directed path from τ^k_{i,j} to τ^l_{i,j}, then τ^k_{i,j} is a predecessor of τ^l_{i,j} and τ^l_{i,j} is a successor of τ^k_{i,j}.
A subtask without predecessors is called a source task, and a subtask without successors is called a sink task. A subtask can start executing only after all of its direct predecessors have completed, and it may have multiple direct predecessors and multiple direct successors. G_{i,j} has exactly one source task and one sink task.
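The precedence rules above can be illustrated with a short sketch. It assumes, purely for illustration, that a safety operation graph is stored as a dictionary mapping each subtask name to the list of its direct predecessors; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def sources_and_sinks(preds: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (source tasks, sink tasks) of a graph given as subtask -> direct predecessors."""
    has_successor = {p for ps in preds.values() for p in ps}
    sources = sorted(t for t, ps in preds.items() if not ps)
    sinks = sorted(t for t in preds if t not in has_successor)
    return sources, sinks


def ready_subtasks(preds: Dict[str, List[str]], done: Set[str]) -> List[str]:
    """Subtasks that may start now: not yet done, and all direct predecessors completed."""
    return sorted(
        t for t, ps in preds.items()
        if t not in done and all(p in done for p in ps)
    )
```

Checking that `sources_and_sinks` returns exactly one source and one sink is a cheap way to validate a graph against the single-source, single-sink requirement.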
S14: use the tasks τ_i to build the task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling.
If the task corresponding to the first node of τ_i completes within its deadline, the fault does not propagate further along the fault tree; otherwise, the task corresponding to the next node of the fault tree is triggered. When τ_i reaches its last subtask it enters the safety-critical state and takes one designation; otherwise it takes another [both symbols are missing in the source]. Tasks of the first kind must complete before their deadlines, whereas tasks of the second kind may miss a deadline and move on to the next subtask.
Engineers usually fix the mapping between tasks and processing resources at the design stage, based on their experience and preferences and on factors such as the different functions of the system, the effective allocation of resources, the existing constraints of the system, and the physical proximity of sensors and actuators.
Therefore, subtasks belonging to different fault trees τ_i are assigned by operation type to the distributed processors of the corresponding class for execution. The set of subtasks that execute on a given processor h is denoted Ψ(h).
S2: determine the scheduling method for fault tasks according to the execution state of the fault. Scheduling all subtasks in the safety operation graphs of the active fault tree nodes both guarantees that all MCE2E tasks are schedulable and minimizes the degree of fault expansion of the task set, that is, either the average degree, MIN(AVG(et_i)), or the maximum degree, MIN(MAX(et_i)).
Step S2 is implemented as follows:
First, analyze the default execution state of fault τ_i. The default execution state of the fault represented by task τ_i is the criticality of the source node of its fault tree TR_i, that is, τ_i = τ_{i,1}.
The source node of TR_i is: [formula missing in source], where G_{i,1} is the safety operation graph that must be executed to handle τ_{i,1}; G_{i,1} has exactly one source task and one sink task and contains n_{i,1} subtasks τ^k_{i,1} that perform safety operations.
Then, form MCE2E task clusters around the critical nodes: the active fault tree nodes are partitioned into clusters, each consisting of one critical node and several ordinary nodes. That is, MCE2E task clusters are formed on the agents according to the critical nodes, and the ordinary nodes in each cluster are chosen by jointly weighing the urgency of the critical node against the criticality level and urgency of each ordinary node. If no critical node exists yet, the candidate set for the MCE2E task clusters is first formed from the nodes at the currently highest criticality level.
S23: establish the scheduling method for the nodes in each cluster according to the rounds of the critical node. The scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, determine which of three possible phases the ordinary nodes in the cluster are in:
If in the criticality-level retention phase, all nodes execute at the current mixed-criticality level; in this phase the accumulated execution time has not reached the upper bound of that criticality level.
If in the criticality-level switching phase, ordinary nodes give up processor resources so that the critical node can execute successfully, which may cause some ordinary nodes to move to a higher criticality level.
The strategy in the switching phase is: according to the criticality level and urgency of each ordinary node, choose an ordinary node with a lower criticality level and relatively ample idle time for degraded execution; if an ordinary node under degraded execution undergoes a criticality-level change, choose the next ordinary node from the candidate set for degraded execution.
Note that in the switching phase the criticality-level changes that degraded execution causes in ordinary nodes should be kept as few as possible.
In the criticality-level update phase, the criticality-level switches produced in the second phase are used to update the information of successor nodes in other clusters that have precedence constraints with the affected ordinary nodes.
If the critical node finishes executing during the switching or update phase, the interrupted ordinary nodes can resume execution. This reduces unnecessary criticality-level switches and thereby shortens the propagation of deadline misses caused by tasks at higher criticality levels.
The scheduling of the critical subtasks of a critical node is described in detail below with a concrete example.
Let P^k_{i,1} be the set of all paths in G_{i,1} from τ^k_{i,1} to the sink task. The longest path in P^k_{i,1} is called the critical path P^{kcri}_{i,1}, and its length is C^{kcri}_{i,1}. The subtasks on P^{kcri}_{i,1} are called critical subtasks.
Because a delay of any task on the critical path delays the response time of the whole task, critical-path methods based on depth-first search and related techniques can be used to analyze the scheduling of graph-based tasks, find the execution order critical to scheduling, and better analyze the execution of the task as a whole.
Critical subtasks have the least slack time, so executing them as early as possible gives the task its best response time. Each subtask τ^k_{i,1} is assigned a local deadline d^k_{i,1} such that all subtasks meet their execution demand on their respective agents and no local deadline d^k_{i,1} exceeds the deadline d_{i,1} of the task τ_{i,1} it belongs to. All tasks τ_i can then be scheduled successfully at their initial criticality level.
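As a rough illustration of the depth-first critical-path analysis mentioned above, the following sketch computes, for every subtask, the length of the longest path from that subtask to the sink task. The representation (a dictionary of direct successors plus a dictionary of execution times) and the function name are assumptions for this example, and the path length here counts the subtask's own execution time; the patent text does not state whether C^{kcri}_{i,1} includes it.

```python
from functools import lru_cache
from typing import Dict, List


def critical_lengths(succs: Dict[str, List[str]], wcet: Dict[str, float]) -> Dict[str, float]:
    """Longest-path length from each subtask to the sink, via memoized depth-first search.

    succs maps a subtask to its direct successors; wcet maps it to its execution time.
    The graph is assumed to be acyclic, as a safety operation graph is.
    """
    @lru_cache(maxsize=None)
    def longest(task: str) -> float:
        tail = max((longest(s) for s in succs[task]), default=0.0)
        return wcet[task] + tail

    return {task: longest(task) for task in succs}
```

The value returned for the source task is the critical-path length of the whole graph, which is a lower bound on the time needed to finish all safety operations regardless of how many processors are available.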
To this end, a predecessor subtask must not consume too much of the slack time on its agent, so that enough time is left for its successor subtasks to complete. The optimization goal of the local deadline allocation method is therefore to maximize the minimum path slack of the subtasks τ^k_{i,1} on each agent [4] [5], preserving as much slack time as possible for the successors of these subtasks and thus helping to meet the constraint of the total deadline of the task they belong to (that is, d^k_{i,1}).
At the same time, the set Ψ(h) of all subtasks of the different tasks on that agent must also be schedulable.
The optimization objective is: max min { d_{i,1} − d^k_{i,1} − C^{kcri}_{i,1} | τ^k_{i,1} ∈ Ψ(h) }. The optimization problem is solved with a mixed integer linear programming or nonlinear programming model.
The constraint is: r^k_{i,1} + C^k_{i,1} ≤ d^k_{i,1} ≤ d_{i,1} − C^{kcri}_{i,1}.
If a solution to this problem can be found, all subtasks execute successfully at the initial criticality level of the source node. Otherwise, a τ_{i,1} that fails scheduling immediately terminates the operations on its safety operation graph and enters a higher criticality level; the faults represented by all child nodes of τ_{i,1} are triggered, and all newly triggered faults must be handled.
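The constraint and objective above can be checked numerically with a small sketch. It does not solve the optimization problem (the text uses mixed integer linear or nonlinear programming for that); it only evaluates a given assignment for a single task. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def deadline_window(release: float, wcet: float,
                    task_deadline: float, critical_len: float) -> Optional[Tuple[float, float]]:
    """Feasible range for a local deadline d: release + wcet <= d <= task_deadline - critical_len.

    Returns None when the range is empty, i.e. no local deadline satisfies the constraint.
    """
    low = release + wcet
    high = task_deadline - critical_len
    return (low, high) if low <= high else None


def min_path_slack(task_deadline: float,
                   assignments: List[Tuple[float, float]]) -> float:
    """Objective value for one agent: min over subtasks of task_deadline - d - critical_len.

    assignments holds (local_deadline, critical_len) pairs for the subtasks on that agent.
    """
    return min(task_deadline - d - c for d, c in assignments)
```

An allocation method would search for local deadlines inside each window that make `min_path_slack` as large as possible; an empty window for any subtask means the task cannot be scheduled at its current criticality level.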
When the set of current nodes of τ_i contains a node τ_{i,j} that is a sink node of TR_i, the system sets the criticality of τ_i to the highest level and designates τ_{i,j} a critical node; the other nodes are ordinary nodes with differing criticality. The system periodically monitors whether critical nodes have been triggered and checks whether their execution can meet their deadline constraints. If not, suitable ordinary nodes on the critical node's processors are interrupted and their execution postponed, releasing processor resources for the critical node. Once the critical node meets its deadline constraint, the interrupted ordinary nodes may continue. This procedure is called degraded execution of ordinary nodes.
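One way to read the selection rule for degraded execution (prefer ordinary nodes at a lower criticality level that have more idle time) is as a two-key ordering. The sketch below is an interpretation, not the patent's procedure: the type `OrdinaryNode`, the integer encoding of criticality (larger means more critical), and the tie-breaking order are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OrdinaryNode:
    name: str
    criticality: int  # assumed encoding: larger value = higher criticality level
    slack: float      # idle time the node can give up before missing its own deadline


def pick_for_degradation(candidates: List[OrdinaryNode]) -> Optional[OrdinaryNode]:
    """Pick the node to interrupt: lowest criticality first, then the largest slack."""
    if not candidates:
        return None
    return min(candidates, key=lambda n: (n.criticality, -n.slack))
```

If the chosen node later changes criticality level, the same function can be called again on the remaining candidate set to choose the next node.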
The specific strategy for degraded execution of ordinary nodes is as follows:
1) Schedule the subset Υ_cri of tasks with the highest criticality: for the subtasks of each critical node on the partial-order graph, find a schedulable local deadline allocation for Υ_cri in a distributed manner, so that the agents' processor resources can successfully schedule the subtasks of all critical nodes while reserving as much idle processor time as possible for other tasks.
2) Schedule the other tasks Υ_non-cri, whose criticality level defaults to that of τ_{i,1}: based on the execution time demand and deadline at the current criticality level, together with the local deadline partitioning scheme, analyze whether a sufficiently long span of idle processor time can be found on the agents to complete execution.
3) If the task can be scheduled successfully, it is admitted at the current criticality level and executed; otherwise, the task activates the related task at the next criticality level and the procedure returns to 2).
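Steps 2) and 3) form a loop: try to admit the task at its current criticality level, and escalate to the next level if it does not fit. The sketch below illustrates that loop under simplifying assumptions that are not in the patent: idle processor time is given as a list of (start, end) intervals, each criticality level has one (demand, deadline) pair, and a task fits when the idle time before its deadline covers its demand, ignoring release times and precedence.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def idle_before(idle: List[Interval], deadline: float) -> float:
    """Total idle processor time that lies before the given deadline."""
    return sum(max(0.0, min(end, deadline) - start) for start, end in idle)


def admit(levels: List[Tuple[float, float]], idle: List[Interval]) -> Optional[int]:
    """Return the index of the first criticality level at which the task fits, else None.

    levels[0] is the current level; later entries are the successively activated levels.
    Each entry is (execution demand, deadline).
    """
    for level, (demand, deadline) in enumerate(levels):
        if idle_before(idle, deadline) >= demand:
            return level  # step 3: admitted and executed at this level
        # otherwise: activate the next criticality level and repeat step 2
    return None
```

A real schedulability test would also account for the local deadlines of individual subtasks; this sketch only shows the control flow of the admit-or-escalate loop.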
Suppose the system has m processors, and let Ψ(h) be the set of all safety operation subtasks pre-assigned to processor h (these subtasks may preempt one another), so that Ψ = {Ψ(h), 1 ≤ h ≤ m}. Each agent schedules the subtasks in its Ψ(h) according to the local deadlines obtained by partitioning each task's deadline over the subtasks of its partial-order graph.
The steps are as follows:
Assign new local deadlines to all subtasks in Ψ, and in each Ψ(h) select the subtask with the smallest local deadline for execution.
When a subtask of Ψ(h) completes, send its completion information to the sets Ψ(l) that hold its direct successors and remove it from Ψ(h); then select the subtask in Ψ(h) with the smallest local deadline for execution.
When Ψ(l) receives the completion information, it makes the corresponding successor subtask ready and assigns new local deadlines to all subtasks in Ψ(l). [The subtask symbols and intermediate quantities in these steps are missing in the source.]
If partitioning local deadlines fails for a subtask and its task τ_{i,j} belongs to Υ_non-cri, the next criticality level triggered by τ_{i,j} is activated.
Repeat the above operations until all tasks have finished executing.
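The select-smallest-local-deadline and notify-successors loop can be sketched for a single processor as follows. This is a simplification and not the patent's algorithm: it runs subtasks to completion one at a time (no preemption, no timing, no cross-agent messaging, no recomputation of local deadlines) and only shows how completion releases successors. All names are hypothetical.

```python
import heapq
from typing import Dict, List


def run_order(local_deadline: Dict[str, float], preds: Dict[str, List[str]]) -> List[str]:
    """Order in which subtasks run when the ready subtask with the smallest local deadline goes first."""
    remaining = {t: set(ps) for t, ps in preds.items()}
    succs: Dict[str, List[str]] = {t: [] for t in preds}
    for task, ps in preds.items():
        for p in ps:
            succs[p].append(task)

    # Start with the subtasks that have no direct predecessors.
    ready = [(local_deadline[t], t) for t, ps in remaining.items() if not ps]
    heapq.heapify(ready)

    order: List[str] = []
    while ready:
        _, task = heapq.heappop(ready)  # smallest local deadline first
        order.append(task)
        for s in succs[task]:           # "notify" direct successors of the completion
            remaining[s].discard(task)
            if not remaining[s]:        # all direct predecessors done: successor becomes ready
                heapq.heappush(ready, (local_deadline[s], s))
    return order
```

In the distributed setting each agent would hold its own ready queue, and the notification step would be a message to the agent that owns the successor.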
S3: using the task set obtained in step S1, match the faults generated by the system with the safety operation graphs for handling them according to the scheduling method of step S2, thereby eliminating the faults.
If τ_{i,1} completes all operations on its safety operation graph before its deadline D_{i,1}, then τ_i also executes successfully. Otherwise, once the execution of τ_{i,1} exceeds D_{i,1}, τ_{i,1} immediately terminates the operations on its safety operation graph and enters the next criticality level τ_{i,2} (τ_{i,2} is a child node of τ_{i,1}), and fault handling proceeds at that criticality level; that is, from the current time on τ_i = τ_{i,2}, the deadline of τ_i is updated to D_{i,2}, and τ_{i,2} is called the current node.
If τ_{i,1} has multiple child nodes, the faults represented by all of them are triggered; all become current nodes, each with its own deadline constraint.
By extension, the fault tree TR_i represented by τ_i may carry multiple relative deadline constraints, each defined by one of the current nodes.
If all current nodes of fault tree TR_i complete the operations on their safety operation graphs before their deadlines, task τ_i is schedulable; in that case every current node of every fault tree either is not a sink node or satisfies C_{i,SINK_NODE} ≤ D_{i,SINK_NODE}. If, for any sink node of fault tree TR_i, execution of the operations on its safety operation graph exceeds its deadline, then scheduling of the fault handling represented by τ_i has failed.
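The propagation rule of step S3 can be summarized in a short recursive sketch: a node that finishes within its deadline eliminates the fault; a node that misses triggers all of its children; a sink node that misses means fault handling has failed. The dictionary representation and the assumption that each completion time is measured from the moment its node is triggered are illustrative choices, not taken from the patent.

```python
from typing import Dict, List


def fault_eliminated(children: Dict[str, List[str]],
                     deadline: Dict[str, float],
                     completion: Dict[str, float],
                     node: str) -> bool:
    """True if the fault at `node` is eventually eliminated somewhere in its subtree.

    completion[n] is the time node n needs to finish its safety operation graph,
    measured from when n is triggered; deadline[n] is its relative deadline.
    """
    if completion[node] <= deadline[node]:
        return True   # handled in time: the fault does not propagate further
    kids = children.get(node, [])
    if not kids:
        return False  # a sink node missed its deadline: scheduling has failed
    # A deadline miss triggers every child; all of them must be handled.
    return all(fault_eliminated(children, deadline, completion, k) for k in kids)
```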
Moreover, the scope of application of the invention is not limited to the processes, mechanisms, manufacture, compositions of matter, means, methods, and steps of the specific embodiments described in the specification. From this disclosure, those skilled in the art will readily understand that processes, mechanisms, manufacture, compositions of matter, means, methods, or steps that exist now or will be developed later, and that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described here, may be applied according to the invention. The appended claims are therefore intended to include such processes, mechanisms, manufacture, compositions of matter, means, methods, or steps within their scope of protection.
Claims (9)
1. A distributed real-time fault handling method, characterized in that the method comprises:
S1: establishing a task set τ = {τ_i | 1 ≤ i ≤ n} for real-time fault handling, where n is the number of tasks constituting the task set τ, and each task τ_i corresponds to a mixed-criticality fault tree TR_i;
S2: determining the scheduling method for the fault tasks according to the execution state of the fault;
S3: using the task set obtained in step S1, matching a fault generated by the system with the safety operation graph for handling that fault according to the scheduling method of step S2, thereby completing the elimination of the fault.
2. The distributed real-time fault handling method according to claim 1, characterized in that step S1 is implemented as follows:
S11: creating the initial fault node τ_{i,1} of the fault tree corresponding to task τ_i;
S12: deriving, from historical fault data, the subsequent fault nodes triggered by fault τ_{i,1} so as to form the descendant nodes of τ_{i,1}, until all fault nodes τ_{i,j} have been established;
S13: forming task τ_i from the set of all established fault nodes τ_{i,j};
S14: establishing the task set τ for real-time fault handling from the tasks τ_i.
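As an illustrative, non-limiting sketch of steps S11 to S14, the fault tree can be grown breadth-first from the initial fault. The representation of the historical fault data as a mapping from a fault to the faults it was observed to trigger, the function name `build_fault_tree`, and the sample fault names are all assumptions made for this example.

```python
from collections import deque


def build_fault_tree(initial_fault, triggers):
    """Expand the initial fault into a tree of follow-on faults.

    `triggers` maps a fault name to the faults it was observed to cause
    (a hypothetical encoding of the historical fault data).
    Returns a dict mapping each node to the list of its child nodes.
    """
    tree = {initial_fault: []}              # S11: create the initial fault node
    queue = deque([initial_fault])
    while queue:                            # S12: derive descendants until none remain
        fault = queue.popleft()
        for follower in triggers.get(fault, []):
            if follower not in tree:        # attach each fault once, keeping a tree
                tree[follower] = []
                tree[fault].append(follower)
                queue.append(follower)
    return tree                             # S13: the node set forms one task


history = {
    "pump_stall": ["overheat", "pressure_drop"],
    "overheat": ["sensor_loss"],
    "pressure_drop": ["sensor_loss"],
}
task_nodes = build_fault_tree("pump_stall", history)
task_set = [task_nodes]                     # S14: collect the tasks into the task set
```

In this sketch a fault reachable along two paths is attached only to the first parent that reaches it, which is one simple way to keep the structure a tree; the specification does not state how such cases are resolved.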
3. The distributed real-time fault handling method according to claim 2, characterized in that the correspondence between a fault node τ_{i,j} and its safety operation graph is as follows: [formula missing in the source text], where G_{i,j} denotes the safety operation graph that must be executed to handle the fault corresponding to τ_{i,j} and contains n_{i,j} subtasks that carry out safety operations, D_{i,j} is the relative deadline of G_{i,j}, and each subtask has an execution time required to complete its safety operation.
4. The distributed real-time fault handling method according to claim 3, characterized in that the set of fault nodes τ_{i,j} is τ_i(r_i, TR_i) = {τ_{i,j} | 1 ≤ j ≤ n_i}, where TR_i denotes a directed tree, r_i is the ready time of the initial fault node of TR_i, and τ_{i,j} denotes each node of TR_i.
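The task model of claims 3 and 4 can be pictured with a small data-structure sketch. It is only one possible encoding: the class names, the `parent` field used to express the directed tree, and the sample subtask names are hypothetical, and the precedence edges inside each safety operation graph are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FaultNode:
    """One node tau_{i,j} of the fault tree TR_i."""
    name: str
    exec_times: Dict[str, float]   # subtasks of G_{i,j} mapped to their execution times
    relative_deadline: float       # D_{i,j}
    parent: Optional[str] = None   # None marks the initial fault node


@dataclass
class FaultTask:
    """Task tau_i(r_i, TR_i): a ready time plus the nodes of a directed tree."""
    ready_time: float              # r_i
    nodes: List[FaultNode] = field(default_factory=list)

    def root(self) -> FaultNode:
        # The initial fault node is the only node without a parent.
        return next(n for n in self.nodes if n.parent is None)

    def children(self, name: str) -> List[FaultNode]:
        return [n for n in self.nodes if n.parent == name]


task = FaultTask(
    ready_time=0.0,
    nodes=[
        FaultNode("tau_1_1", {"isolate": 1.0, "restart": 2.0}, 5.0),
        FaultNode("tau_1_2", {"reroute": 1.5}, 4.0, parent="tau_1_1"),
    ],
)
```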
5. The distributed real-time fault handling method according to claim 1, characterized in that step S2 is implemented as follows:
S21: analyzing the default execution state of task τ_i, and identifying the critical nodes according to the criticality level of the source node of fault tree TR_i;
S22: forming MCE2E task clusters according to the critical nodes, where the ordinary nodes in each cluster are selected by a combined decision based on the urgency of the critical node and on the criticality level and urgency of each ordinary node; if no critical node has yet appeared, first forming a candidate set of MCE2E task clusters from the nodes at the currently highest criticality level;
S23: establishing the scheduling method for the nodes in each cluster according to the rounds of the critical nodes.
6. The distributed real-time fault handling method according to claim 5, characterized in that the default execution state of the fault represented by task τ_i is the criticality level of the source node of its fault tree TR_i, i.e. τ_i = τ_{i,1}, with the source node of TR_i given by [formula missing in the source text], where G_{i,1} denotes the safety operation graph that must be executed to handle τ_{i,1}; G_{i,1} has exactly one source subtask and one sink subtask and includes n_{i,j} subtasks that carry out safety operations.
7. The distributed real-time fault handling method according to claim 5, characterized in that, in step S23, the scheduling method is executed as follows: in each round, within the scheduling window of the cluster's critical node, it is determined which of three possible stages the ordinary nodes in the cluster are in:
if they are in the criticality-retention stage, all nodes execute at the current mixed-criticality level; at this point, the accumulated execution time has not reached the upper bound of that mixed-criticality level;
if they are in the criticality-switching stage, the ordinary nodes yield processor resources so that the critical node can execute successfully;
if they are in the criticality-update stage, the information on the successor nodes of the ordinary nodes that lie in other clusters is updated to reflect the criticality switch produced in the second stage.
8. The distributed real-time fault handling method according to claim 7, characterized in that, in the criticality-switching stage, the method is specifically executed as follows:
according to the criticality level and urgency of the ordinary nodes, ordinary nodes with a lower criticality level and relatively ample idle time are selected for degraded execution;
if a criticality switch occurs at an ordinary node selected for degraded execution, the next ordinary node is selected from the candidate set for degraded execution.
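The selection rule of claim 8 can be sketched, without limitation, as a sort over the candidate set. The numeric encoding of criticality (larger means more critical), the use of slack as the measure of idle time, and all names below are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class OrdinaryNode:
    name: str
    criticality: int        # larger value = more critical (assumed encoding)
    slack: float            # idle time before the node's deadline (assumed measure)
    switched: bool = False  # True once the node has changed criticality level


def degradation_order(candidates):
    """Lowest criticality first; among equals, the node with the most slack first."""
    return sorted(candidates, key=lambda n: (n.criticality, -n.slack))


def pick_for_degradation(candidates):
    """Return the best candidate that has not switched criticality, or None."""
    for node in degradation_order(candidates):
        if not node.switched:
            return node
    return None


a = OrdinaryNode("a", criticality=2, slack=5.0)
b = OrdinaryNode("b", criticality=1, slack=2.0)
c = OrdinaryNode("c", criticality=1, slack=8.0)
first = pick_for_degradation([a, b, c])   # node "c": low criticality, most slack
c.switched = True                         # "c" undergoes a criticality switch
second = pick_for_degradation([a, b, c])  # falls back to the next candidate, "b"
```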
9. The distributed real-time fault handling method according to claim 8, characterized in that the degraded execution of an ordinary node comprises the following steps:
1) scheduling the task subset with the highest criticality, and finding a schedulable local deadline assignment for the subtasks of each critical node on the partial-order graph;
2) according to the execution-time demand and the deadline at the current criticality level, and in combination with the local deadline partitioning scheme, analyzing whether a sufficiently long idle processor interval can be found on the multiple agents to complete execution;
3) if the task can be scheduled successfully, the task is admitted at the current criticality level and executed; otherwise, the task activates the related tasks of the next criticality level, and the procedure returns to step 2) and continues.
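Steps 2) and 3) can be illustrated by a simplified, non-limiting sketch that searches for an idle interval and falls back to the next level when none is found. It does not model the local deadline assignment of step 1). The busy-interval representation, the function names, the level labels, and the order in which levels are tried are assumptions for this example only.

```python
def find_idle_slot(busy, demand, deadline):
    """Find the earliest gap of length `demand` that ends by `deadline`.

    `busy` holds, per processor, a sorted list of (start, end) busy intervals.
    Returns (processor_index, start_time) or None if no gap is long enough.
    """
    for proc, intervals in enumerate(busy):
        cursor = 0.0
        for start, end in intervals:
            # Gap between the previous busy interval and this one.
            if start - cursor >= demand and cursor + demand <= deadline:
                return proc, cursor
            cursor = max(cursor, end)
        # Open-ended gap after the last busy interval.
        if cursor + demand <= deadline:
            return proc, cursor
    return None


def admit(busy, levels):
    """Try each (label, demand, deadline) in order; admit at the first that fits."""
    for label, demand, deadline in levels:
        slot = find_idle_slot(busy, demand, deadline)
        if slot is not None:
            return label, slot
    return None


busy = [[(0, 4), (5, 9)], [(0, 2), (6, 10)]]
levels = [("current", 5.0, 8.0), ("next", 3.0, 8.0)]
result = admit(busy, levels)
```

With these sample values no processor offers a gap of length 5 before time 8, so the task is not admitted at the "current" level; at the "next" level the gap from time 2 to time 6 on the second processor is long enough.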
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810819362.1A CN109104304B (en) | 2018-07-24 | 2018-07-24 | Distributed real-time fault processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109104304A (en) | 2018-12-28 |
CN109104304B (en) | 2021-06-01 |
Family
ID=64847231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810819362.1A Active CN109104304B (en) | 2018-07-24 | 2018-07-24 | Distributed real-time fault processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109104304B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105759171A (en) * | 2016-03-30 | 2016-07-13 | 广西电网有限责任公司南宁供电局 | Method for improving distribution network switching-out inspection efficiency based on distribution line condition evaluation |
CN106372785A (en) * | 2016-08-29 | 2017-02-01 | 陈赛 | System fault data processing method based on characteristic index |
CN106886667A (en) * | 2017-04-14 | 2017-06-23 | 中国人民解放军海军航空工程学院 | A kind of complication system availability analysis method based on event scheduling |
US20170193143A1 (en) * | 2015-12-31 | 2017-07-06 | Palo Alto Research Center Incorporated | Method for modelica-based system fault analysis at the design stage |
CN108021435A (en) * | 2017-12-14 | 2018-05-11 | 南京邮电大学 | A kind of cloud computing task stream scheduling method with fault-tolerant ability based on deadline |
Non-Patent Citations (2)
Title |
---|
THANYALAK CHALERMARREWONG et al.: "Failure Prediction of Data Centers Using Time Series and Fault Tree Analysis", 2012 IEEE 18th International Conference on Parallel and Distributed Systems * |
杜洁敏 (Du Jiemin): "Research on Fault Diagnosis Models and Recovery Strategies for Smart Substations", China Master's Theses Full-text Database * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784246A (en) * | 2020-07-01 | 2020-10-16 | 深圳市检验检疫科学研究院 | Logistics path estimation method |
CN111784248A (en) * | 2020-07-01 | 2020-10-16 | 深圳市检验检疫科学研究院 | Logistics tracing method |
CN117453379A (en) * | 2023-12-25 | 2024-01-26 | 麒麟软件有限公司 | Scheduling method and system for AOE network computing tasks in Linux system |
CN117453379B (en) * | 2023-12-25 | 2024-04-05 | 麒麟软件有限公司 | Scheduling method and system for AOE network computing tasks in Linux system |
Also Published As
Publication number | Publication date |
---|---|
CN109104304B (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110794800B (en) | Intelligent factory information management monitoring system | |
CN109104304A (en) | A kind of distribution real time fail processing method | |
CN106815071A (en) | Big data job scheduling system based on directed acyclic graph | |
CN110222923A (en) | Dynamically configurable big data analysis system | |
CN103399787B (en) | A kind of MapReduce operation streaming dispatching method and dispatching patcher calculating platform based on Hadoop cloud | |
CN105159769A (en) | Distributed job scheduling method suitable for heterogeneous computational capability cluster | |
CN102663543A (en) | Scheduling system used for enterprise data unification platform | |
US11833685B2 (en) | System using natural conversation for monitoring a facility | |
CN110569113A (en) | Method and system for scheduling distributed tasks and computer readable storage medium | |
CN107273589A (en) | Reconstruction strategy generation system and its generation method based on DIMA systems | |
CN110798339A (en) | Task disaster tolerance method based on distributed task scheduling framework | |
Aggarwal et al. | Incorporating Autonomic Capability as Quality Attribute for a Software System | |
Piatkowska et al. | Online Reasoning about the Root Causes of Software Rollout Failures in the Smart Grid | |
Zhou et al. | Improving the dependability of self-adaptive cyber physical system with formal compositional contract | |
CN112350862A (en) | Monitoring alarm and fault self-healing system | |
Dai et al. | Enhancing distributed automation systems with efficiency and reliability by applying autonomic service management | |
Schreiber et al. | Context-aware self adapting systems: a ground for the cooperation of data, software, and services | |
Wei et al. | Model checking for the goal-feedback-result pattern in ROS | |
Seilonen et al. | Agent technology and process automation | |
Lau et al. | An artificial immune systems (AIS)-based unified framework for general job shop scheduling | |
CN103888495A (en) | Execution method and system for combination service | |
Li et al. | Large-scale software unit testing on the grid. | |
CN107066366A (en) | The Complex event processing engine status monitoring of internet of things oriented and Disaster Recovery Method | |
Amin et al. | A time-triggered scheduling algorithm for active diagnosis in heterogeneous distributed systems | |
Yep et al. | A framework for a knowledge-based cell controller for flexible manufacturing systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||