CN103391207A

CN103391207A - Heterogeneous fault management system

Info

Publication number: CN103391207A
Application number: CN2012101399293A
Authority: CN
Inventors: 姚军; 赵磊; 袁跃峰; 张小林; 左德参
Original assignee: SHANGHAI FITSCO INTELLIGENT TRAFFIC CONTROL CO Ltd
Current assignee: SHANGHAI FITSCO INTELLIGENT TRAFFIC CONTROL CO Ltd
Priority date: 2012-05-08
Filing date: 2012-05-08
Publication date: 2013-11-13
Anticipated expiration: 2032-05-08
Also published as: CN103391207B

Abstract

The invention discloses a heterogeneous fault management system. The system fault management module is composed of two heterogeneous fault management modules, one operating by passive receiving and the other by active query. When either fault management module is triggered, the system enters the fault management mode. The first fault management module passively collects fault alarm information, and the second fault management module actively confirms that each task is working normally. Because the two modules monitor task execution through two different channels, the execution of system fault management is effectively guaranteed. The second fault management module can monitor a task in various ways according to the characteristics of the task, which gives good flexibility. The first fault management module responds quickly, and the second fault management module is stricter in judging faults. Used together, the two modules let the system respond to a fault rapidly, and a problem in either module does not cause the system fault management function to fail.

Description

Heterogeneous fault management system

Technical field

The invention belongs to the field of software algorithms and is used to implement the fault management function of rail transit safety products, for example rail transit safety computer platforms and automatic train protection.

Background technology

In industrial control systems and safety signalling systems (used in industries such as avionics, railway signalling, and nuclear power), measures are taken in the fault state to avoid or reduce, as far as possible, the harm that a fault causes to people and property. Correct execution of the fault management task is therefore extremely important for keeping the system safe and reliable. When a fault occurs, if the fault management function cannot respond in time, the likelihood of a safety incident increases greatly. The fault management function must therefore be guaranteed to work under all circumstances. Existing fault management technologies have different emphases; three of them are listed below.

1. IBM, US Patent No. 6,654,910, "Intelligent fault management", describes an intelligent fault management method for automotive electronics that can guarantee the shortest fault recovery time and high system availability. The control system consists of multiple logical control units, each with a corresponding fault monitoring method. When a fault is detected, the control unit degrades system performance and notifies the other control units.

2. NEC, US Patent No. 7,003,696, "Fault management system for switching equipment", describes a fault management system for switching equipment. When a recoverable fault occurs in a processor or circuit of the switching equipment, the relevant fault terminal is detected automatically. When a clock fault detection device detects a clock fault, it reports the fault to the central fault management system. The central fault management system sends reset signals to the processor and peripheral circuits and reports to an external display terminal.

3. Robert Bosch GmbH, CN200780036171.8, "Method and device for fault handling", describes a method for fault management in a system with multiple components. Its core is to represent the fault state of the components by means of state values, with certain dependencies between the state values of different components.

All three of the above methods also adopt a combined fault management approach. They differ, however, from the operating mode of this patent, which combines active query with passive response (one active, one passive), and the specific processing performed by each fault management module also differs considerably.

Summary of the invention:

The technical problem to be solved by the invention is to provide a heterogeneous fault management system that can perform fault management and early warning reliably.

To solve the above technical problem, the invention provides a heterogeneous fault management system in which the system fault management module is composed of two heterogeneous fault management modules. The two modules operate by passive receiving and by active query respectively, and when either fault management module is triggered, the system enters the fault management mode.

The beneficial effects of the invention are as follows. The first fault management module passively collects fault alarm information, and the second fault management module actively confirms that each task is working normally. Because the two modules monitor task execution through two different channels, the execution of system fault management is effectively guaranteed. The second fault management module can monitor a task in various ways according to the characteristics of the task, which gives good flexibility. The first fault management module responds quickly, and the second fault management module is stricter in judging faults. Used together, the two modules let the system respond to a fault rapidly, and a problem in either module does not cause the system fault management function to fail.

The first fault management module operates by passive receiving and monitors, in real time, the fault alarm information sent by tasks. When no fault alarm information is received, the system is considered to be operating normally. If fault alarm information is received, the fault reaction mechanism is started immediately.

The first fault management module works by real-time monitoring and by receiving error reports, and it remains in the listening state once it is running. All software tasks contain multiple fault detection points. When a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore, and writes the fault information to the designated message queue. On receiving the fault alarm semaphore, the first fault management module immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, performs the appropriate handling, such as warning, offline operation, or shutdown.
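The passive path described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `report_fault`, `listener`, `ACTIONS`, and the numeric fault levels are hypothetical and do not come from the patent, and a thread with a standard-library semaphore and queue stands in for whatever task and IPC primitives the target platform provides.

```python
import queue
import threading

fault_queue = queue.Queue()            # designated fault message queue
fault_alarm = threading.Semaphore(0)   # fault alarm semaphore, initially not raised
handled = []                           # record of reactions, for illustration only

# Hypothetical mapping from fault level to reaction (not specified in the patent).
ACTIONS = {1: "warning", 2: "offline", 3: "shutdown"}


def report_fault(fault_type, level):
    """Called from a fault detection point inside a monitored task."""
    fault_queue.put((fault_type, level))  # write the fault information first
    fault_alarm.release()                 # then trigger the alarm semaphore


def listener(stop_event):
    """First fault management module: stays in the listening state."""
    while not stop_event.is_set():
        # Wake up periodically only so that the thread can be stopped cleanly.
        if not fault_alarm.acquire(timeout=0.1):
            continue  # no alarm received: the system is considered normal
        fault_type, level = fault_queue.get()
        handled.append((fault_type, ACTIONS[level]))
```

A task that detects a problem would call, for example, `report_fault("sensor_timeout", 2)`; the listener then wakes up and records an "offline" reaction. Writing to the queue before releasing the semaphore ensures that the listener never wakes up to an empty queue.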

The second fault management module operates by active query and runs a task monitor periodically. If every monitored task actively reports its working state in each cycle, the system is considered to be operating normally. If any task fails to report its working state within the specified time, the second fault management module starts the fault reaction mechanism.

The second fault management module checks the normal working state of the monitored tasks through multiple global variables. If a task is working normally, it operates on its corresponding global variable correctly. The second fault management module determines the working state of each monitored task by inspecting the global variables. If all tasks handle their global variables correctly, the second fault management module considers the system to be working normally and does not trigger the fault management function. If it finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management handling.
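A minimal Python sketch of this global-variable check is given below. It assumes, as the embodiment later describes, that the monitor gives each variable a random initial value in every cycle and that each task transforms its variable by a predefined algorithm. The transform `expected` (multiply by 31, add 7, modulo 2^32) and the class and task names are hypothetical choices made for illustration.

```python
import random

MASK = 0xFFFFFFFF


def expected(seed):
    """Hypothetical predefined algorithm that each task applies to its variable."""
    return (seed * 31 + 7) & MASK


class GlobalVariableMonitor:
    """Second fault management module, reduced to the variable check itself."""

    def __init__(self, task_names, rng=None):
        self._rng = rng or random.Random()
        self.variables = {name: 0 for name in task_names}  # one variable per task
        self._seeds = {}

    def start_cycle(self):
        """Give every monitored variable a fresh random initial value."""
        for name in self.variables:
            seed = self._rng.getrandbits(32)
            self._seeds[name] = seed
            self.variables[name] = seed

    def end_cycle(self):
        """Return the tasks whose variable was not operated on correctly."""
        return [name for name, seed in self._seeds.items()
                if self.variables[name] != expected(seed)]


def healthy_task(monitor, name):
    """A task that is working normally updates its own variable once per cycle."""
    monitor.variables[name] = expected(monitor.variables[name])
```

With this particular transform, a task that never touches its variable is always detected: `expected(seed) == seed` would require `30 * seed` to be congruent to an odd number modulo 2^32, which is impossible. A non-empty list from `end_cycle()` is the condition under which the second module would start fault handling.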

Description of drawings:

Fig. 1 is a structural diagram of the system fault management function.

Fig. 2 is a workflow diagram of the first fault management module.

Fig. 3 is a workflow diagram of the second fault management module.

Embodiment:

The invention provides a heterogeneous fault management system. The algorithm can be applied in fields including, but not limited to, safety signalling systems and industrial control systems. The detailed process is as follows. The fault management function of the system relies on the cooperation of two heterogeneous fault management methods. When either method detects a fault, the system enters the fault management mode. The architecture of the system fault management function is shown in Fig. 1. Assume that the two fault management methods are implemented by the first fault management module and the second fault management module respectively.

As shown in Fig. 2, the first fault management module works by real-time monitoring and by receiving error reports, and it remains in the listening state once it is running. All software tasks contain multiple fault detection points. When a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore, and writes the fault information to the designated message queue. On receiving the fault alarm semaphore, the first fault management module immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, performs the appropriate handling, such as warning, offline operation, or shutdown.

As shown in Fig. 3, the second fault management module checks the normal working state of the monitored tasks through multiple global variables. If a task is working normally, it operates on its corresponding global variable correctly. The second fault management module determines the working state of each monitored task by inspecting the global variables. If all tasks handle their global variables correctly, the second fault management module considers the system to be working normally and does not trigger the fault management function. If it finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management handling. As soon as either of the two fault management methods detects a fault and enters fault handling, the whole system enters the fault mode and is forced into a safe state.
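The way the two methods feed one system-level fault mode can be sketched as follows. The class `SystemFaultManager`, the `Mode` enumeration, and the callback `enter_safe_state` are hypothetical names, and latching the fault mode after the first trigger is an assumption of this sketch rather than something the patent states.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    FAULT = "fault"


class SystemFaultManager:
    """Receives triggers from both fault management modules."""

    def __init__(self, enter_safe_state):
        self.mode = Mode.NORMAL
        self._enter_safe_state = enter_safe_state  # platform-specific action

    def on_fault(self, source, detail):
        """Called by either module; the first trigger puts the system in fault mode."""
        if self.mode is Mode.NORMAL:
            self.mode = Mode.FAULT
            self._enter_safe_state(source, detail)  # force the safe state once
```

Because both modules call the same `on_fault` entry point, a failure of one module still leaves the other able to drive the system into the safe state.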

Working process of the first fault management module:

1) First, determine the tasks that the first fault management module monitors, consider as thoroughly as possible the faults that each task may encounter, and number and classify the faults of each task. When the detection routine detects a fault, it sets the fault semaphore to "True" and packs information such as the fault number, fault level, and handling method into a message that it writes to the fault message queue. To ensure that serious fault information is reported accurately, check information, for example a CRC value, can be added to the fault message when it is written to the queue. The contents of a fault message are also bound by specific logical relationships; for example, a given fault level has a specific handling method. If fault handling finds that the check information of a fault message is incorrect or that its contents are logically inconsistent, the equipment is put directly into the offline state or operating personnel decide the subsequent handling.

2) The first fault management module must assign a fault level to each fault information code of the monitored tasks, and the handling measures differ for each fault level. After receiving fault information, the first fault management module can determine the location, level, and status of the fault from the fault information code. Because fault levels are preset, the module can decide the follow-up measures from the code. In the current design, fault handling is usually divided into three states: warning, offline operation, and shutdown.

3) The first fault management module remains in the listening state once it is running. As soon as it detects that the fault semaphore has been set, it fetches the fault information from the fault message queue and starts fault handling, putting the system into the fault management mode.

Working process of the second fault management module:

1) First, determine the tasks that the second fault management module monitors. These may differ from the tasks monitored by the first fault management module. It must also be specified how each task, when running normally, interacts with the second fault management module: for example, the time requirement for each task to trigger the second fault management module (one cycle or several cycles) and the form of interaction (setting a semaphore, operating on a global variable, a function call, a reply-style interaction, and so on).

2) Assume that the second fault management module checks the state of each task by monitoring global variables. In each cycle, every monitored task must operate on its own global variable according to a predefined algorithm. The initial value of the global variable can be set at random in each cycle by the second fault management module. When all monitored tasks operate on their own global variables correctly, the second fault management module considers the monitored tasks to be working normally. If any task fails to operate on its global variable within the specified time, or operates on it incorrectly, the second fault management module triggers the system to enter the fault handling state. Different tasks can produce different fault categories and levels.

3) The second fault management module can be started by a timer interrupt and runs periodically. The check interval for a task can be one or more cycles. The second fault management module can read this parameter from the corresponding configuration file.

4) If all tasks operate on their global variables within the specified time and pass the verification of the second fault management module, the second fault management module considers the system to be working normally. If a task fails to complete, on time, the action specified by the second fault management module, the second fault management module starts its fault reaction, records the corresponding fault information, and puts the system into the fault handling mode. System fault handling can be divided into three kinds: warning, offline operation, and shutdown.

5) The second fault management module can also use direct request-and-reply testing to judge the working state of a monitored task. It can send a check request by means of a semaphore and write the request information to the corresponding message queue. If the monitored task returns data according to the request information within the specified time and passes the verification of the second fault management module, the fault state is not triggered. Otherwise, the fault state is started.
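The request-and-reply variant in step 5) can be sketched in Python as follows. Several simplifications are assumed: a blocking standard-library queue replaces the separate semaphore and message queue, the reply rule `expected_reply` (XOR with a fixed constant) is hypothetical, and the function and parameter names are illustrative only.

```python
import queue
import random
import threading


def expected_reply(challenge):
    """Hypothetical reply rule agreed between the monitor and the task."""
    return challenge ^ 0xA5A5A5A5


def check_task(request_q, reply_q, timeout_s, rng=random):
    """Send one check request; return True if a correct reply arrives in time."""
    challenge = rng.getrandbits(32)
    request_q.put(challenge)
    try:
        reply = reply_q.get(timeout=timeout_s)
    except queue.Empty:
        return False  # no reply within the specified time
    return reply == expected_reply(challenge)


def responsive_task(request_q, reply_q):
    """A monitored task that answers a single check request."""
    reply_q.put(expected_reply(request_q.get()))
```

A `False` result from `check_task`, whether caused by a timeout or by a wrong reply, is the condition under which the second fault management module would start the fault state. Using a fresh random challenge for each request means a task cannot pass by replaying an earlier answer.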

The second fault management module can also check a single overall global variable, whose final value should be the combined result of the operations of all monitored tasks. A combination of several fault management channels can also be used, with each channel responsible for handling faults of a different type or level.
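One possible reading of the overall global variable is a bit mask in which each monitored task sets its own bit, so that the final value is the union of all task operations. The sketch below illustrates that reading only; the task names, bit assignments, and the class `OverallVariable` are hypothetical, and the patent does not prescribe this encoding.

```python
# Hypothetical monitored tasks, each owning one bit of the overall variable.
TASK_BITS = {"input": 0b001, "control": 0b010, "output": 0b100}


class OverallVariable:
    """Single variable whose final value combines the operations of all tasks."""

    def __init__(self):
        self.value = 0

    def mark(self, task):
        """Called once per cycle by a task that is working normally."""
        self.value |= TASK_BITS[task]

    def check_and_reset(self):
        """Return the tasks that did not contribute, then clear for the next cycle."""
        missing = [task for task, bit in TASK_BITS.items()
                   if not self.value & bit]
        self.value = 0
        return missing
```

In a multi-threaded or interrupt-driven system, the read-modify-write in `mark` would need to be made atomic, for example with a lock; that protection is omitted here for brevity.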

In the invention, the first fault management module passively collects fault alarm information, and the second fault management module actively confirms that each task is working normally. Because the two modules monitor task execution through two different channels, the execution of system fault management is effectively guaranteed. The second fault management module can monitor a task in various ways according to the characteristics of the task, which gives good flexibility. The first fault management module responds quickly, and the second fault management module is stricter in judging faults. Used together, the two modules let the system respond to a fault rapidly, and a problem in either module does not cause the system fault management function to fail.

The first fault handling approach receives and processes faults by means of a semaphore and a message queue, while the second uses IE and polling of the monitored tasks through global variables. In terms of specific technical means, each of the two fault handling approaches has its own characteristics and is the applicant's own invention. The enhanced approach of handling faults with the two in combination should be original in the rail transit field.

The invention is not limited to the embodiments discussed above. The above description of the embodiments is intended to describe and illustrate the technical solution of the invention. Obvious variations or substitutions based on the teaching of the invention should also be considered to fall within its scope of protection. The above embodiments disclose the best mode of carrying out the invention, so that a person of ordinary skill in the art can apply various embodiments and alternatives of the invention to achieve its purpose.

Claims

1. A heterogeneous fault management system, characterized in that the system fault management module is composed of two heterogeneous fault management modules, the two fault management modules operate by passive receiving and by active query respectively, and when either fault management module is triggered, the system enters the fault management mode.

2. The heterogeneous fault management system as claimed in claim 1, characterized in that the first fault management module operates by passive receiving and monitors, in real time, the fault alarm information sent by tasks; when no fault alarm information is received, the system is considered to be operating normally; and if fault alarm information is received, the fault reaction mechanism is started immediately.

3. The heterogeneous fault management system as claimed in claim 2, characterized in that the first fault management module works by real-time monitoring and by receiving error reports, and remains in the listening state once it is running; all software tasks contain multiple fault detection points, and when a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore, and writes the fault information to the designated message queue; and on receiving the fault alarm semaphore, the first fault management module immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, performs the appropriate handling, such as warning, offline operation, or shutdown.

4. The heterogeneous fault management system as claimed in claim 1, characterized in that the second fault management module operates by active query and runs a task monitor periodically; if every monitored task actively reports its working state in each cycle, the system is considered to be operating normally; and if any task fails to report its working state within the specified time, the second fault management module starts the fault reaction mechanism.

5. The heterogeneous fault management system as claimed in claim 4, characterized in that the second fault management module checks the normal working state of the monitored tasks through multiple global variables; if a task is working normally, it operates on its corresponding global variable correctly; the second fault management module determines the working state of each monitored task by inspecting the global variables; if all tasks handle their global variables correctly, the second fault management module considers the system to be working normally and does not trigger the fault management function; and if the second fault management module finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management handling.