CN102722146B - Distributed system control structure with failure protection function, and failure protection method - Google Patents

Publication number
CN102722146B
Authority
CN
China
Prior art keywords
node
distributed system
communication
layer
failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201210162638
Other languages
Chinese (zh)
Other versions
CN102722146A (en)
Inventor
冯丽媛
姚绪梁
金鸿章
曹然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN 201210162638 priority Critical patent/CN102722146B/en
Publication of CN102722146A publication Critical patent/CN102722146A/en
Application granted granted Critical
Publication of CN102722146B publication Critical patent/CN102722146B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a distributed system control structure with a failure protection function, and a failure protection method. The method adds failure protection to the original connections of the distributed system: starting from the second layer, neighboring nodes in the same layer are connected to one another. During communication or management between an upper layer and a lower layer, the upper-layer node broadcasts each control command to all of the lower-layer nodes connected to it and uses the information they return to detect whether communication or management has failed. If a failure is detected, a neighboring lower-layer node takes over control to restore the failed communication or management. The method suits applications with high safety and reliability requirements, in particular systems such as fire alarm systems and mine safety systems.

Description

Distributed system control structure with failure protection, and failure protection method
Technical field
The present invention relates to the field of distributed system control, and in particular to a distributed system control structure with failure protection and to a failure protection method for distributed system control.
Background art
Distributed system control is very widely applied, so its reliability and safety are particularly important. At present, the reliability of distributed system control derives mainly from the fault tolerance inherent in its structure: even if global communication or management is interrupted, local stations can keep working, but the interrupted communication or management cannot be restored. An effective protection method is therefore needed so that communication and control in a distributed control system remain effective after a failure.
As shown in Fig. 1, in the existing distributed system control structure the reliability of communication or management depends on the structure of the distributed control system itself, and once communication or management is interrupted it cannot be restored. A method is therefore needed to solve this problem.
Summary of the invention
The object of the present invention is to provide a distributed system control structure with failure protection that offers high reliability and safety. A further object is to provide a failure protection method for distributed system control.
The object of the present invention is achieved as follows:
The distributed system control structure with failure protection of the present invention is as follows: adjacent nodes in the second layer of the distributed system are connected in sequence; among the third-layer nodes, the nodes controlled by the same node of the layer above are connected in sequence; the remaining layers are connected in the same way as the third layer.
The failure protection method for distributed system control of the present invention comprises:
Adjacent nodes in the second layer of the distributed system are connected in sequence; among the third-layer nodes, the nodes controlled by the same node of the layer above are connected in sequence; the remaining layers are connected in the same way as the third layer;
When an upper-layer node controls one of its lower-layer nodes, it sends the information to all of the lower-layer nodes it controls at the same time. The node being controlled returns an acknowledgment message after executing the control task; a node that is not being controlled must also return an acknowledgment message;
When a lower-layer node is detected to have returned no acknowledgment message, communication or management is deemed to have failed and that node is judged to be a failed node. A node adjacent to the failed node then takes over control of the failed node and forwards to it the communication or management signal sent by the upper-layer node, thereby restoring the failed communication or management. Once the upper-layer node again receives an acknowledgment message sent by the failed node, it switches back to the original communication connection.
The present invention adds failure protection to the original connections of the distributed system: starting from the second layer, adjacent nodes in the same layer are connected to one another. During communication or management between an upper layer and a lower layer, the upper-layer node broadcasts each control command to the lower-layer nodes connected to it and uses the information they return to detect whether communication or management has failed. If a failure is found, an adjacent lower-layer node takes over control to restore the failed communication or management.
By providing failure protection for the nodes of the distributed system, the present invention increases the reliability and safety of distributed system control. After distributed system control fails, communication or control can be restored quickly. The invention suits applications with high safety and reliability requirements and is particularly applicable to systems such as fire alarm systems and mine safety systems.
Brief description of the drawings
Fig. 1 is a schematic diagram of an existing distributed system structure.
Fig. 2 is a schematic diagram of the distributed system failure protection of the present invention.
Embodiment
The present invention is further described below with reference to the accompanying drawings:
With reference to Fig. 2, the distributed system control structure with failure protection is divided into three layers: the first layer has one master computer, the second layer has n slaves, and the third layer has m control units under each slave. In the second layer, slave 1 through slave n are connected in sequence. In the third layer, the m control units under each slave are connected in sequence, and control units under different slaves are not connected to one another.
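The same-layer protection links of this three-layer embodiment can be sketched in Python as follows. This is only an illustrative sketch: the function names `chain_links` and `build_protection_links` and the node labels such as `slave1/unit2` are assumptions introduced here, not part of the patent, and the original master-to-slave and slave-to-unit connections are left out.

```python
def chain_links(nodes):
    """Connect each node to the next one in the list: [a, b, c] -> [(a, b), (b, c)]."""
    return list(zip(nodes, nodes[1:]))


def build_protection_links(n, m):
    """Return the same-layer protection links for n slaves with m control units each.

    Hypothetical labels: slaves are "slave1".."slaveN"; the control units of a
    slave are "<slave>/unit1".."<slave>/unitM".
    """
    slaves = [f"slave{i}" for i in range(1, n + 1)]
    # Second layer: slave 1 through slave n are connected in sequence.
    links = chain_links(slaves)
    # Third layer: only units under the same slave are chained together,
    # so no link ever joins units that belong to different slaves.
    for slave in slaves:
        units = [f"{slave}/unit{j}" for j in range(1, m + 1)]
        links.extend(chain_links(units))
    return links


links = build_protection_links(n=3, m=2)
```

Under these assumptions the structure adds (n - 1) links in the second layer and n × (m - 1) links in the third layer.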
Detecting whether communication or management has failed. When an upper-layer node sends a control command to one of its lower-layer nodes, the command is broadcast to all of the lower-layer nodes controlled by that upper-layer node. When a lower-layer node receives the command, it executes the command and returns an acknowledgment message if the command is addressed to it; if the command is not addressed to it, it still returns an acknowledgment message. The upper-layer node judges from the returned acknowledgment messages whether communication is working: if no acknowledgment is received from a node, communication with that node is considered to have failed; otherwise it is considered to be working.
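The detection step described above can be sketched as follows. This is a minimal simulation under stated assumptions: `detect_failed_nodes`, `make_simulated_send`, and the `send(node, target, command)` callback are hypothetical names introduced here, and a missing acknowledgment is modeled simply as `send` returning `False` rather than as a real timeout.

```python
def detect_failed_nodes(children, target, command, send):
    """Broadcast a command to every child and return the children that did not acknowledge.

    send(node, target, command) is an assumed transport callback that returns
    True when the node acknowledges and False when no acknowledgment arrives.
    """
    return [node for node in children if not send(node, target, command)]


def make_simulated_send(unreachable):
    """Build a fake transport in which the nodes in `unreachable` never acknowledge."""
    executed = []  # records (node, command) pairs that were actually executed

    def send(node, target, command):
        if node in unreachable:
            return False  # no acknowledgment comes back
        if node == target:
            executed.append((node, command))  # only the addressed node executes
        return True  # every reachable node acknowledges, addressed or not

    return send, executed


send, executed = make_simulated_send(unreachable={"slave2"})
failed = detect_failed_nodes(["slave1", "slave2", "slave3"], "slave1", "open_valve", send)
```

Because every node must acknowledge every broadcast, a failed link is noticed even when the command was addressed to a different node.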
Recovering failed communication or management. If the upper-layer node finds that an acknowledgment message has not been returned, it considers communication to have failed and enables failure protection. Because the nodes of the same layer are connected in sequence, a protection channel must be selected after a failure. The selection method is as follows: if node i has failed, determine whether node i is the last node of the current layer. If it is the last node, node i-1 sends the control information to node i to recover from the failure; if it is not the last node, node i+1 sends the control information to node i. After the upper-layer node finds that the failed node has returned to normal, the original channel is reactivated and the adjacent lower-layer node no longer protects the formerly failed node.
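The channel selection rule above can be sketched as a small function. The names `select_protection_neighbor` and `active_channel` are hypothetical, nodes are numbered from 1 as in the embodiment (slave 1 to slave n), and rejecting a group with fewer than two nodes is an assumption made here, since the text does not cover that case.

```python
def select_protection_neighbor(i, count):
    """Return the index of the neighbor that relays control information to failed node i.

    Nodes in the group are numbered 1..count. The last node is protected by
    node i-1; every other node is protected by node i+1.
    """
    if count < 2:
        # Assumption: a group with a single node has no neighbor to protect it.
        raise ValueError("a protection channel needs at least two nodes")
    if not 1 <= i <= count:
        raise ValueError("node index out of range")
    return i - 1 if i == count else i + 1


def active_channel(i, count, acknowledged):
    """Choose the channel for node i: the original one, or a relay through a neighbor."""
    if acknowledged:
        # The node answers again, so the original channel is reactivated.
        return ("direct", i)
    return ("via", select_protection_neighbor(i, count))
```

For example, with five nodes in a layer, a failure of node 5 is relayed through node 4, while a failure of node 2 is relayed through node 3.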

Claims (1)

1. A failure protection method for distributed system control, characterized in that:
Adjacent nodes in the second layer of the distributed system are connected in sequence; among the third-layer nodes, the nodes controlled by the same node of the layer above are connected in sequence; the remaining layers are connected in the same way as the third layer;
When an upper-layer node controls one of its lower-layer nodes, it sends the information to all of the lower-layer nodes it controls at the same time; the node being controlled returns an acknowledgment message after executing the control task, and a node that is not being controlled must also return an acknowledgment message;
When a lower-layer node is detected to have returned no acknowledgment message, communication or management is deemed to have failed and that node is judged to be a failed node; a node adjacent to the failed node then takes over control of the failed node and forwards to it the communication or management signal sent by the upper-layer node, thereby restoring the failed communication or management; once the upper-layer node again receives an acknowledgment message sent by the failed node, it switches back to the original communication connection.
CN 201210162638 2012-05-24 2012-05-24 Distributed system control structure with failure protection function, and failure protection method Expired - Fee Related CN102722146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201210162638 CN102722146B (en) 2012-05-24 2012-05-24 Distributed system control structure with failure protection function, and failure protection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201210162638 CN102722146B (en) 2012-05-24 2012-05-24 Distributed system control structure with failure protection function, and failure protection method

Publications (2)

Publication Number Publication Date
CN102722146A CN102722146A (en) 2012-10-10
CN102722146B true CN102722146B (en) 2013-12-18

Family

ID=46947947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201210162638 Expired - Fee Related CN102722146B (en) 2012-05-24 2012-05-24 Distributed system control structure with failure protection function, and failure protection method

Country Status (1)

Country Link
CN (1) CN102722146B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1581813A (en) * 2003-08-01 2005-02-16 光桥科技(中国)有限公司 Method for conducting data transmission using logic loop network in ethernet
CN1741489A (en) * 2005-09-01 2006-03-01 西安交通大学 High usable self-healing Logic box fault detecting and tolerating method for constituting multi-machine system
CN1889496A (en) * 2006-07-19 2007-01-03 山东富臣发展有限公司 Layer control tree-shape network based on CAN bus for supporting plug and use
WO2008058933A1 (en) * 2006-11-13 2008-05-22 Siemens Aktiengesellschaft Method for establishing bidirectional data transmission paths in a wireless meshed communication network
CN101378327A (en) * 2007-08-29 2009-03-04 中国移动通信集团公司 Communication network system and method for processing communication network business

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE524863C2 (en) * 2001-04-23 2004-10-12 Transmode Systems Ab Optical coarse wavelength division multiplexing system has multiple logical optical rings that form multiplexed ring structure, such that each ring links several nodes of ring structure

Also Published As

Publication number Publication date
CN102722146A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
CN104440923B (en) A kind of emergent stop signal control system for robot and robot thereof
CN105095001B (en) Virtual machine abnormal restoring method under distributed environment
CN103955188A (en) Control system and method supporting redundancy switching function
CN104137477B (en) For disposing the technology that situation changes in interconnecting nodes
CN102427412A (en) Zero-delay disaster recovery switching method and system of active standby source based on content distribution network
CN103401696A (en) Dual-network redundant communication system in industrial equipment and communication method thereof
CN105915426A (en) Failure recovery method and device of ring network
CN104977907A (en) Direct Connect Algorithm
CN105204952A (en) Fault tolerance management method of multi-core operation system
CN108725521B (en) Hot standby redundancy management system and method for main and standby control centers of rail transit
CN103455464A (en) Relay device, connection management method, and information communication system
CN104461811A (en) Graded and hierarchical spacecraft single particle soft error protection system structure
US7836208B2 (en) Dedicated redundant links in a communicaton system
CN101163059B (en) Network node detection method and apparatus
CN108445857B (en) Design method for 1+ N redundancy mechanism of SCADA system
CN102722146B (en) Distributed system control structure with failure protection function, and failure protection method
CN103051482A (en) Method for isolating and restoring port based on FC (Fiber Channel) switchboard
KR101098041B1 (en) Subway fire-sensing system and control method thereof
CN104714439A (en) Safety relay box system
CN101753465B (en) Protection method taking Ethernet Ring protection system to control VLAN message and device thereof
CN101568135A (en) Communication method, communication equipment and communication system
JP2015087918A (en) Tunnel disaster prevention system
CN102638369B (en) Method, device and system for arbitrating main/standby switch
KR101846222B1 (en) Redundancy system and controllin method thereof
CN109361672A (en) A kind of the data back transmission method and system of safety insulating device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131218

Termination date: 20190524