CN103391207A - Heterogeneous fault management system - Google Patents

Info

Publication number
CN103391207A
CN103391207A CN2012101399293A CN201210139929A CN103391207A CN 103391207 A CN103391207 A CN 103391207A CN 2012101399293 A CN2012101399293 A CN 2012101399293A CN 201210139929 A CN201210139929 A CN 201210139929A CN 103391207 A CN103391207 A CN 103391207A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101399293A
Other languages
Chinese (zh)
Other versions
CN103391207B (en)
Inventor
姚军
赵磊
袁跃峰
张小林
左德参
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI FITSCO INTELLIGENT TRAFFIC CONTROL CO Ltd
Original Assignee
SHANGHAI FITSCO INTELLIGENT TRAFFIC CONTROL CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-05-08
Filing date
2012-05-08
Publication date
2013-11-13
Application filed by SHANGHAI FITSCO INTELLIGENT TRAFFIC CONTROL CO Ltd
Priority to CN201210139929.3A
Publication of CN103391207A
Application granted
Publication of CN103391207B
Legal status: Active

Abstract

The invention discloses a heterogeneous fault management system. The system fault management module is composed of two heterogeneous fault management modules, one operating by passive reception and the other by active query. When either fault management module is activated, the system enters fault management mode. The first fault management module passively collects fault alarm information, and the second fault management module actively confirms that tasks are working normally. The two modules monitor task execution through two different channels, which effectively guarantees that system fault management is carried out. The second fault management module can monitor a task in various ways depending on the characteristics of the task, which gives good flexibility. The first fault management module responds quickly, and the second is stricter in judging faults. Used in combination, the two modules let the system respond to a fault rapidly, and a failure of either module alone does not disable the system's fault management function.

Description

Heterogeneous fault management system
Technical field
The invention belongs to the field of software algorithms and implements the fault management function of rail transit safety products, for example rail transit safety computer platforms and automatic train protection systems.
Background
In industrial control systems and safety signalling systems (used in industries such as avionics, railway signalling and nuclear power), measures must be taken in the fault state to avoid or reduce, as far as possible, the harm that a fault causes to people and property. Correct execution of the fault management task is therefore extremely important for keeping the system safe and reliable. If the fault management function cannot respond in time when a fault occurs, the likelihood of a safety incident increases greatly, so the fault management function must be guaranteed to work under all circumstances. Existing fault management technologies differ in emphasis; three existing technologies are listed below.
1. IBM, US Patent No. 6,654,910, "Intelligent fault management", describes an intelligent fault management method for automotive electronics that aims to guarantee the shortest fault recovery time and high system availability. The control system consists of several logical control units, each with its own fault monitoring method. When a fault is detected, the control unit degrades system performance and notifies the other control units.
2. NEC, US Patent No. 7,003,696, "Fault management system for switching equipment", describes a fault management system for switching equipment. When a recoverable fault occurs in a processor or circuit of the switching equipment, the relevant faulty terminal is detected automatically. When a clock fault detection device detects a clock fault, it reports the fault to the central fault management system, which sends reset signals to the processor and peripheral circuits and reports to an external display terminal.
3. Robert Bosch GmbH, CN200780036171.8, "Method and apparatus for fault handling", describes a method for fault management in a system with several components. Its core is to represent the fault state of the components by means of state values, with certain dependencies between the state values of different components.
All three of these methods also use combined fault management, but they differ from the operating mode of this patent, which combines active query with passive response (one active, one passive). The specific processing of each fault management module also differs considerably.
Summary of the invention:
The technical problem to be solved by the present invention is to provide a heterogeneous fault management system that can perform fault management and early warning reliably.
To solve the above technical problem, the invention provides a heterogeneous fault management system in which the system fault management module is composed of two heterogeneous fault management modules. The two modules operate by passive reception and by active query respectively, and the system enters fault management mode as soon as either module is activated.
The beneficial effects of the invention are as follows. Fault management module one passively collects fault alarm information, while fault management module two actively confirms that tasks are working normally. The two modules monitor task execution through two different channels, which effectively guarantees that system fault management is carried out. Fault management module two can monitor a task in various ways depending on the characteristics of the task, which gives good flexibility. Fault management module one responds quickly, and fault management module two is stricter in judging faults. Used in combination, the two modules let the system respond to a fault rapidly, and a problem in either module alone does not disable the system's fault management function.
Fault management module one operates by passive reception, monitoring in real time the fault alarm information sent by the tasks. As long as no fault alarm information is received, the system is considered to be running normally. If fault alarm information is received, the fault reaction mechanism starts immediately.
Fault management module one monitors in real time and receives error reports, and stays in the listening state once it is running. Several fault detection points are placed in every software task. When a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore and writes the fault information to a designated message queue. On receiving the fault alarm semaphore, fault management module one immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, responds differently, for example with a warning, offline operation or shutdown.
Fault management module two operates by active query and runs task monitoring periodically. If every monitored task actively reports its own working state in each cycle, the system is considered to be running normally. If any task fails to report its working state within the specified time, fault management module two starts the fault reaction mechanism.
Fault management module two checks that the monitored tasks are working normally through several global variables. A task that is working normally operates on its corresponding global variable correctly, so fault management module two determines the working state of the monitored tasks by inspecting the global variables. If all tasks handle their global variables correctly, fault management module two considers the system to be working normally and does not trigger the fault management function. If it finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management processing.
Brief description of the drawings:
Fig. 1 is a structure diagram of the system fault management function.
Fig. 2 is the workflow diagram of fault management module one.
Fig. 3 is the workflow diagram of fault management module two.
Embodiment:
The invention provides a heterogeneous fault management system. The algorithm can be applied in fields including, but not limited to, safety signalling systems and industrial control systems. The details are as follows. The fault management function of the system relies on two heterogeneous fault management methods working together; when either method detects a fault, the system enters fault management mode. The architecture of the system's fault management function is shown in Fig. 1. Assume that the two methods are implemented by fault management module one and fault management module two respectively. As shown in Fig. 2, fault management module one monitors in real time and receives error reports, and stays in the listening state once it is running. Several fault detection points are placed in every software task. When a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore and writes the fault information to a designated message queue. On receiving the fault alarm semaphore, fault management module one immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, responds differently, for example with a warning, offline operation or shutdown.
As shown in Fig. 3, fault management module two checks that the monitored tasks are working normally through several global variables. A task that is working normally operates on its corresponding global variable correctly, so fault management module two determines the working state of the monitored tasks by inspecting the global variables. If all tasks handle their global variables correctly, fault management module two considers the system to be working normally and does not trigger the fault management function. If it finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management processing. When either of the two fault management methods detects a fault and enters fault handling, the whole system enters fault mode and is forced into a safe state.
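The "either channel is enough" rule described above can be illustrated with a minimal Python sketch. The names `FaultSupervisor`, `SystemMode`, `passive_check` and `active_check` are hypothetical and do not appear in the patent, and latching the fault mode until an external reset is an assumption made for illustration only.

```python
from enum import Enum
from typing import Callable


class SystemMode(Enum):
    NORMAL = "normal"
    FAULT = "fault"  # safe state; assumed to stay latched until an external reset


class FaultSupervisor:
    """Combines two independent fault channels (hypothetical illustration)."""

    def __init__(self, passive_check: Callable[[], bool],
                 active_check: Callable[[], bool]) -> None:
        # Each check returns True when its channel has detected a fault.
        self._checks = (passive_check, active_check)
        self.mode = SystemMode.NORMAL

    def step(self) -> SystemMode:
        # Evaluate both channels on every step so that neither masks the other.
        results = [check() for check in self._checks]
        if any(results):
            self.mode = SystemMode.FAULT
        return self.mode
```

In this sketch, a fault reported by one channel moves the system into fault mode even if the other channel reports nothing, which mirrors the redundancy argument made in the text.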
Working process of fault management module one:
1) First, determine the tasks that fault management module one monitors, consider as thoroughly as possible the faults that can occur in each task, and number and classify the faults of each task. When the detection routine detects a fault, it sets the fault semaphore to "True" and writes a packet containing the fault number, fault level, handling method and similar information to the fault message queue. To ensure that information about serious faults is reported accurately, check information such as a CRC value can be added to a fault message when it is written to the queue. The contents of a fault message also obey specific logical relationships; for example, each fault level has a specific handling method. If fault handling finds that the check information of a fault message is incorrect or that its contents are logically inconsistent, the equipment is put directly into the offline state or the follow-up handling is left to the operator.
2) Fault management module one must assign a fault level to each fault message code of the monitored tasks, and the handling measures differ for each fault level. After receiving a fault message, fault management module one can determine the fault location, fault level and fault status from the code of the message. Because the fault levels are preset, the module can decide on the follow-up measures from the code alone. In the current design, fault handling is usually divided into three states: warning, offline operation and shutdown.
3) Fault management module one stays in the listening state once it is running. When it detects that the fault semaphore has been set, it immediately obtains the fault information from the fault message queue, starts fault handling and puts the system into fault management mode.
Working process of fault management module two:
1) First, determine the tasks that fault management module two monitors. The tasks monitored here may differ from those monitored by fault management module one. The interaction that each task must carry out with fault management module two during normal operation is also specified, for example the time requirement for each task to trigger fault management module two (one cycle or several cycles) and the form of interaction (setting a semaphore, a function call, operating on a global variable, a request-reply exchange, and so on).
2) Assume that fault management module two checks the state of each task by monitoring global variables. In each cycle, every monitored task must operate on its own global variable according to a predefined algorithm. Fault management module two can supply the initial value of each global variable at random in every cycle. When all monitored tasks operate on their own global variables correctly, fault management module two considers the monitored tasks to be working normally. If any task fails to operate on its global variable within the specified time, or operates on it incorrectly, fault management module two makes the system enter the fault handling state. Different tasks can produce different fault categories and levels.
3) Fault management module two can be started by an interrupt and runs periodically. The inspection interval of a task can be one cycle or several cycles; fault management module two can read this parameter from the corresponding configuration file.
4) If all tasks operate on their global variables within the specified time and pass verification by fault management module two, fault management module two considers the system to be working normally. If a task fails to complete the action specified by fault management module two on time, fault management module two is activated, records the corresponding fault information and puts the system into fault handling mode. Fault handling can be divided into three kinds: warning, offline operation and shutdown.
5) Fault management module two can also judge the working state of a monitored task through a direct request-reply test. It can send an inspection request by means of a semaphore and write the request information to the corresponding message queue. If the monitored task returns data in response to the request within the specified time and the data passes verification by fault management module two, the fault state is not triggered; otherwise the fault state is started.
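The global-variable check in steps 2) to 4) can be sketched as follows. This is a hedged illustration, not the patented implementation: the transform in `expected_response` is an arbitrary stand-in for the "predefined algorithm", which the text does not specify, and `ActiveMonitor`, `TaskSlot`, `task_heartbeat` and the `intervals` mapping (standing in for the configuration file) are hypothetical names.

```python
import random
from typing import Dict, List, Optional


def expected_response(seed: int) -> int:
    """Hypothetical agreed transform; the text only says 'a predefined algorithm'."""
    return (seed * 31 + 7) & 0xFFFF


class TaskSlot:
    def __init__(self, check_every: int) -> None:
        self.check_every = check_every      # inspection interval in cycles
        self.seed = 0                       # random initial value for this round
        self.value: Optional[int] = None    # the task's "global variable"
        self.cycles_waiting = 0


class ActiveMonitor:
    def __init__(self, intervals: Dict[str, int],
                 rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        self.slots = {name: TaskSlot(n) for name, n in intervals.items()}
        for slot in self.slots.values():
            self._rearm(slot)

    def _rearm(self, slot: TaskSlot) -> None:
        # Issue a fresh random initial value and clear the previous answer.
        slot.seed = self._rng.randrange(0x10000)
        slot.value = None
        slot.cycles_waiting = 0

    def tick(self) -> List[str]:
        """Run once per cycle; returns the names of tasks judged faulty."""
        faulty = []
        for name, slot in self.slots.items():
            slot.cycles_waiting += 1
            if slot.cycles_waiting < slot.check_every:
                continue  # this task's deadline has not arrived yet
            if slot.value != expected_response(slot.seed):
                faulty.append(name)  # missing or wrong answer
            self._rearm(slot)
        return faulty


def task_heartbeat(monitor: ActiveMonitor, name: str) -> None:
    """What a healthy monitored task does once per inspection interval."""
    slot = monitor.slots[name]
    slot.value = expected_response(slot.seed)
```

In a real system `tick()` would be driven by a periodic trigger such as a timer, which is an assumption here. Re-seeding each round means a task that stalls cannot pass the check with a stale value left over from an earlier cycle.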
Fault management module two can also check a single overall global variable whose final value should be the combined result of the operations of all monitored tasks. A combination of several fault management channels can also be used, with each channel responsible for handling faults of a different type or level.
In the present invention, fault management module one passively collects fault alarm information, while fault management module two actively confirms that tasks are working normally. The two modules monitor task execution through two different channels, which effectively guarantees that system fault management is carried out. Fault management module two can monitor a task in various ways depending on the characteristics of the task, which gives good flexibility. Fault management module one responds quickly, and fault management module two is stricter in judging faults. Used in combination, the two modules let the system respond to a fault rapidly, and a problem in either module alone does not disable the system's fault management function.
Fault handling mode one receives and handles faults using a semaphore and a message queue; fault handling mode two uses interrupts (IE) and polling of the monitored tasks through global variables. In terms of specific technical means, each of the two fault handling modes has its own characteristics and is the applicant's own technology. The enhanced approach of combining the two kinds of fault handling is believed to be original in the rail transit field.
The present invention is not limited to the embodiments discussed above. The above description of specific embodiments is intended to describe and illustrate the technical solution of the invention. Obvious variations or substitutions based on the teaching of the invention should also be considered to fall within its scope of protection. The above embodiments disclose the best mode of carrying out the invention, so that a person of ordinary skill in the art can apply various embodiments and alternatives of the invention to achieve its purpose.
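One way to read the "overall global variable" is as a bitmask in which each monitored task sets its own bit, so that the final value reflects the operations of all tasks. The Python sketch below takes that reading as an assumption; the patent does not say how the tasks' operations are combined, and `AggregateCheck`, `mark_alive` and `check_and_reset` are hypothetical names.

```python
import threading
from typing import Iterable, List


class AggregateCheck:
    """One shared variable whose final value combines every task's operation."""

    def __init__(self, task_names: Iterable[str]) -> None:
        # Assign each task one bit of the shared variable.
        self._bit = {name: 1 << i for i, name in enumerate(task_names)}
        self._mask = 0
        self._lock = threading.Lock()

    def mark_alive(self, name: str) -> None:
        """Called by a monitored task once per cycle."""
        with self._lock:
            self._mask |= self._bit[name]

    def check_and_reset(self) -> List[str]:
        """Called by the monitor each cycle; returns tasks that did not report."""
        with self._lock:
            missing = [n for n, b in self._bit.items() if not self._mask & b]
            self._mask = 0
        return missing
```

An empty result means every task operated on the variable in that cycle. A non-empty result names the silent tasks, which would let the monitor choose a fault category per task.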

Claims (5)

1. A heterogeneous fault management system, characterized in that the system fault management module is composed of two heterogeneous fault management modules, the two fault management modules operate by passive reception and by active query respectively, and the system enters fault management mode when either fault management module is activated.
2. The heterogeneous fault management system as claimed in claim 1, characterized in that fault management module one operates by passive reception, monitoring in real time the fault alarm information sent by the tasks; when no fault alarm information is received, the system is considered to be running normally; if fault alarm information is received, the fault reaction mechanism starts immediately.
3. The heterogeneous fault management system as claimed in claim 2, characterized in that fault management module one monitors in real time and receives error reports, and stays in the listening state once it is running; several fault detection points are placed in every software task; when a fault occurs, the detection routine determines the fault type and level, triggers the fault alarm semaphore and writes the fault information to a designated message queue; on receiving the fault alarm semaphore, fault management module one immediately takes the fault alarm information from the fault message queue and, according to the fault type and level, responds differently, for example with a warning, offline operation or shutdown.
4. The heterogeneous fault management system as claimed in claim 1, characterized in that fault management module two operates by active query and runs task monitoring periodically; if every monitored task actively reports its own working state in each cycle, the system is considered to be running normally; if any task fails to report its working state within the specified time, fault management module two starts the fault reaction mechanism.
5. The heterogeneous fault management system as claimed in claim 4, characterized in that fault management module two checks that the monitored tasks are working normally through several global variables; a task that is working normally operates on its corresponding global variable correctly; fault management module two determines the working state of the monitored tasks by inspecting the global variables; if all tasks handle their global variables correctly, fault management module two considers the system to be working normally and does not trigger the fault management function; if fault management module two finds that any monitored task has not operated on its global variable correctly, it judges that the task has failed and starts fault management processing.
CN201210139929.3A 2012-05-08 2012-05-08 Heterogeneous fault management system Active CN103391207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210139929.3A CN103391207B (en) 2012-05-08 2012-05-08 Heterogeneous fault management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210139929.3A CN103391207B (en) 2012-05-08 2012-05-08 Heterogeneous fault management system

Publications (2)

Publication Number Publication Date
CN103391207A true CN103391207A (en) 2013-11-13
CN103391207B CN103391207B (en) 2016-11-16

Family

ID=49535371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210139929.3A Active CN103391207B (en) 2012-05-08 2012-05-08 Heterogeneous fault management system

Country Status (1)

Country Link
CN (1) CN103391207B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113043969A (en) * 2021-03-26 2021-06-29 中汽创智科技有限公司 Vehicle function safety monitoring method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1819531A (en) * 2006-03-21 2006-08-16 南京邮电大学 Tribal large-scale network fault managment based on mobile agent
CN101114945A (en) * 2007-09-04 2008-01-30 华为技术有限公司 Method for controlling alarm flux, managing equipment, managed equipment and system
CN102017537A (en) * 2008-04-30 2011-04-13 松下电工株式会社 Device management system
CN102158360A (en) * 2011-04-01 2011-08-17 华中科技大学 Network fault self-diagnosis method based on causal relationship positioning of time factors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱永娇: "一种基于异构网络的分布式故障管理模型", 《邵阳师范高等专科学校学报》, vol. 24, no. 2, 30 April 2002 (2002-04-30), pages 76 - 78 *
钟仕群,等: "一种基于贝叶斯网络的集成的故障定位模型", 《计算机技术与发展》, vol. 16, no. 12, 31 December 2006 (2006-12-31) *

Also Published As

Publication number Publication date
CN103391207B (en) 2016-11-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant