WO2005015833A1 - Arrangement and method for connecting a processing node in a distributed system - Google Patents

Publication number
WO2005015833A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
processing
processing node
guardian
coupled
Prior art date
Application number
PCT/EP2004/051700
Other languages
French (fr)
Inventor
Christopher P Temple
Original Assignee
Freescale Semiconductors, Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Freescale Semiconductors, Inc
Priority to EP04766406A (patent EP1654833B1)
Priority to JP2006522356A (patent JP4579242B2)
Priority to US10/567,309 (patent US7818613B2)
Publication of WO2005015833A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/12 Arrangements for remote connection or disconnection of substations or of equipment thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/407 Bus networks with decentralised control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Small-Scale Networks (AREA)
  • Hardware Redundancy (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)

Abstract

An arrangement for connecting a processing node in a distributed system containing fail-uncontrolled processing nodes, the arrangement comprising: receiver means for receiving signals from another processing node of the system, and node guardian means coupled to the receiver means wherein the node guardian means includes switch means for receiving a plurality of unidirectional input signals, logic means coupled to the switch means for combining the plurality of received input signals according to a predetermined logic function, and control means coupled to the switch means for controlling application of the plurality of received signals to the logic means for controlling reception of signals thereat so as to reduce reception by the processing node of uncontrolled transmission from another processing node of the system.

Description

ARRANGEMENT AND METHOD FOR CONNECTING A PROCESSING NODE IN A DISTRIBUTED SYSTEM
Field of the Invention
This invention relates to distributed systems and particularly (though not exclusively) to hard real-time systems using static TDMA (Time Division Multiple Access) based medium arbitration that should remain operational even if subjected to the arbitrary failure of a single processing node.
Background of the Invention
In the field of this invention it is known that in a distributed processing system having a plurality of processing nodes, while a processing node can fail in an arbitrary way, it is necessary to ensure that a single faulty fail-uncontrolled processing node does not disrupt communication among fault-free processing nodes. A fail-uncontrolled processing node may fail in an arbitrary way by sending random messages at unspecified points in time in violation of any given network arbitration scheme. In order to achieve this objective it is known to use either (a) a fully connected network topology, or (b) a multi-drop transmission line topology, or (c) a star topology containing an intelligent central distribution unit, or (d) a multi-access topology with an 'anti-jabbering' unit in the form of a bus guardian at the outgoing network interface of each processing node.
However, these approaches have the disadvantages that: (a) a fully connected network topology involves high cost, and an unfeasible network structure; (b) a multi-drop transmission line topology requires a receiver for every multi-drop transmission channel; (c) a star topology containing an intelligent central distribution unit requires high complexity in the distribution unit, increasing susceptibility to faults; (d) in a multi-access topology with an 'anti-jabbering' unit in the form of a bus guardian at the outgoing network interface of each processing node, the bus guardian is susceptible to spatial proximity faults, and to potential functional dependency between the bus guardian and the communication unit within the processing node.
A need therefore exists for a scheme for interconnecting processing nodes in a distributed system wherein the abovementioned disadvantages may be alleviated.
Statement of Invention
In accordance with a first aspect of the present invention there is provided an arrangement for connecting a processing node in a distributed system as claimed in claim 1.
This provides the advantage of transferring the problem of fault containment from the output interface of a potentially faulty processing node to the input interface of fault-free processing nodes. By doing so, problems encountered by spatial proximity faults or functional dependencies within a faulty processing node that may jeopardize fault containment at its output interface are mitigated.
In accordance with a second aspect of the present invention there is provided a method of operating a processing node in a distributed system as claimed in claim 10.
Brief Description of the Drawings
Various methods and arrangements for interconnecting fail-uncontrolled processing nodes in a dependable distributed system incorporating the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 shows a block schematic illustration of a known bus guardian based distributed processing system according to a first embodiment;
FIG. 2 shows a block schematic illustration of a node guardian arrangement which may be used in the present invention;
FIG. 3 shows a block schematic illustration of a distributed processing system developed from the bus guardian based system of FIG. 1 and utilising the node guardian arrangement of FIG. 2;
FIG. 4 shows a block schematic illustration of an improved version of the distributed processing system of FIG. 3 for making use of dual channel capabilities;
FIG. 5 shows a block schematic illustration of a distributed processing system containing four central processing nodes, demonstrating how the node guardian approach of FIG. 4 scales to a larger system;
FIG. 6 shows a block schematic illustration of a node guardian arrangement which may be used in the present invention according to a second embodiment;
FIG. 7 shows a block schematic illustration of a distributed processing system utilising the node guardian arrangement of FIG. 6.
Description of Preferred Embodiments
Referring firstly to FIG. 1, a known bus guardian based distributed processing TTP/C™ (Time Triggered Protocol class C) network system 100 includes processing nodes 110, 120, 130, 140, 150, 160, 170, 180; each of the processing nodes has a bus guardian BG 111, 121, 131, 141, 142, 151, 152, 161, 171, 181. The processing nodes 110, 120 and 130 are coupled via a common channel 190 to the processing nodes 140 and 150, and constitute an error containment region 101. The processing nodes 160, 170 and 180 are coupled via a common channel 195 to the processing nodes 140 and 150, and constitute an error containment region 102. 'Babbling idiot' faults (when a faulty processing node continually broadcasts a message, which takes over the bus) occurring in processing nodes 110, 120 or 130 act on 140 and 150 (depicted by fault propagation path a) but not on processing nodes 160, 170 and 180, while faults occurring in processing nodes 160, 170 or 180 act on 140 and 150 (depicted by fault propagation path b) but not on processing nodes 110, 120 and 130. Faults in 140 or 150 (fault propagation paths x1 and x2), however, act on both error containment regions, i.e., processing nodes 110, 120, 130, 160, 170 and 180 including processing nodes 140 and 150.
The invention allows the problem of protecting a processing node in a distributed system against 'babbling idiot' failures to be solved by equipping each processing node with a 'node guardian'. The structure of such a node guardian is shown in FIG. 2. The node guardian 200 consists of a set of switches (FIG. 2 shows an example of three, namely: 240, 241, 242) that connect input signals received from different subnetworks through a set of bus drivers 230, 231, 232 to a receiver 271 of the communication processor 280 via a logical-OR operation 260, and a control unit 250 that interoperates with the communication processor 280 via a control unit 272 and controls the state of each respective switch 240, 241, 242. It will be appreciated that input switches 240, 241, 242 combined with a logic element 260 act as an input multiplexer under the control of the control unit 250. Connection of the communication processor 280 to the different subnetworks via the node guardian 200 can occur either as a dedicated transmitter for the subnetwork as demonstrated in the case of bus driver 210, or as a combined transmitter and receiver for the subnetwork as demonstrated in the case of bus drivers 220 and 230, or as a dedicated receiver for the subnetwork as demonstrated in the case of bus drivers 231 and 232.
It will be understood that control units 272 and 250 are only separated to demonstrate conformance to the known FlexRay™ architecture, and may otherwise be commonly provided.
The node guardian 200 protects the receiver 271 of the communication processor against subnetworks that are blocked by jabbering processing nodes by enabling and disabling the respective switches according to a TDMA schedule (which will be understood and need not be described in further detail) based on the knowledge of the assignment of the transmission slots to the respective subnetworks. By following the TDMA schedule the node guardian 200 can enable and disable the reception path between one of the respective subnetworks 230, 231 and 232 and the receiver 271 of the associated communication processor 280 prior to the actual transmission of a message on a particular subnetwork.
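The gating just described can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the patented implementation: the class name `NodeGuardian`, the `schedule` tuple (slot index mapped to the index of the subnetwork that owns the slot) and the use of booleans for signal levels are all hypothetical. It models the input switches as per-slot enables and the logic element as a logical OR over the gated inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGuardian:
    # Hypothetical TDMA schedule: position = slot index,
    # value = index of the subnetwork allowed to transmit in that slot.
    schedule: tuple

    def switch_states(self, slot):
        """Return one boolean per input switch; only the slot owner is enabled."""
        owner = self.schedule[slot % len(self.schedule)]
        n_inputs = max(self.schedule) + 1
        return [i == owner for i in range(n_inputs)]

    def receive(self, slot, inputs):
        """Gate each input by its switch, then OR the gated signals together."""
        states = self.switch_states(slot)
        return any(sig and on for sig, on in zip(inputs, states))


guardian = NodeGuardian(schedule=(0, 1, 2))
# Subnetwork 1 jabbers during slot 0, which belongs to subnetwork 0:
# its switch is open, so nothing reaches the receiver.
blocked = guardian.receive(0, [False, True, False])
# In slot 1 the same signal is legitimate and passes through.
passed = guardian.receive(1, [False, True, False])
```

In this sketch a jabbering subnetwork can only disturb the receiver during its own slot, which is the input-side protection the text attributes to the node guardian.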
In contrast to a bus guardian that implements an output protection boundary where protection occurs within the sphere of the faulty processing node, the node guardian 200 implements an input protection boundary where protection occurs within the sphere of the fault-free processing node. This eliminates the risk of spatial proximity faults that in the case of the bus guardian could cause both the bus guardian and the associated communication processor to fail in a malign way resulting in jabbering, in particular, if both share mutual and potentially faulty resources such as clock or power supply. In the case of the node guardian, the node guardian 200 and the communication processor 280 may share resources such as the clock oscillator or the power supply, without putting the protection of other node guardian protected processing nodes at risk.
FIG. 3 shows an example of a hierarchical distributed processing network system developed from the purely bus guardian based system of FIG. 1 and utilising the node guardian arrangement 200 of FIG. 2. In the system 300 of FIG. 3, processing nodes 310, 320, 330, 360, 370 and 380 implement the bus guardian approach as in the system of FIG. 1, while two processing nodes 340 and 350 each incorporate the arrangement 200 of FIG. 2 in 341 and 351 and are based on the node guardian approach. The processing nodes 310, 320 and 330 are coupled via a common channel 390 to the processing node 340 and constitute error containment region 301, and are also coupled via the channel 390 to the processing node 350. The processing nodes 360, 370 and 380 are coupled via a common channel 395 to the processing node 350 and constitute an error containment region 302, and are also coupled via the channel 395 to the processing node 340. The processing node 340 is coupled to the processing node 350 via a unidirectional path 393, and the processing node 350 is coupled to the processing node 340 via a unidirectional path 398. Path 393 enables processing node 340 to communicate with processing node 350 even if a jabbering fault in processing node 310, 320 or 330 has penetrated past the respective bus guardian 311, 312 or 313 and blocked channel 390. Path 398 serves in a corresponding way for processing node 350. In the system of FIG. 3, a fault occurring in error containment region 301 (i.e., processing nodes 310, 320, 330 or 340) is confined to region 301 (fault propagation paths a and x1) and cannot impact the processing nodes in error containment region 302. The same holds true vice versa for faults originating in error containment region 302. Hence, a clear concept of confinement is implemented at the system level.
FIG. 4 shows an improved version of the example shown in FIG. 3 that makes use of the dual channel capabilities provided in particular by FlexRay™. Additionally to the system of FIG. 3, in the system 400 of FIG. 4 the processing nodes 410, 420 and 430 are coupled to the processing nodes 440 and 450 via a common channel 491, and the processing nodes 460, 470 and 480 are coupled to the processing nodes 440 and 450 via a common channel 495; the processing node 440 is coupled to the processing node 450 via path 492, and the processing node 450 is coupled to the processing node 440 via path 497. In the system of FIG. 4, it is possible to tolerate transient and one permanent channel failure in either fault containment region. In case of a failure of channel 490, for example, the processing nodes 410, 420, 430 and 440 can still communicate via channel 491. The same holds true vice versa for a channel failure in fault containment region 402.
FIG. 5 shows how the node guardian approach (as shown, for example, in FIG. 3) scales to a large system containing four central processing nodes 540, 545, 550, 555 each having a node guardian arrangement 200 for each channel (the figure only being completed for one channel). In the system 500 of FIG. 5, as can be seen by comparison with FIG. 3, two further central processing nodes 545 and 555 have been added to the processing nodes 540 and 550, the processing nodes 545 and 555 being provided in error containment regions 501 and 502 respectively, the processing nodes 510, 520, 530 being coupled directly to central processing nodes 540, 545 and 550, the processing nodes 560, 570, 580 being coupled directly to central processing nodes 540, 545 and 555, and the processing nodes 540, 545, 550, 555 being cross-coupled via unidirectional paths. The cross-coupling enables processing nodes 540, 545, 550 and 555 to maintain communication even if, for example, the channel connecting 510, 520 and 530 with 540, 545 and 550 fails. The arrangement of the four central processing nodes 540, 545, 550, 555 each having a node guardian arrangement 200 and each being cross-coupled by a unidirectional path 591, 592, 593, 594 provides means for starting TDMA based communication among these four central processing nodes even under the assumption of a general asymmetric communication fault, also known as a Byzantine fault, as the two conditions necessary for mitigating such a fault are fulfilled. The first condition demands the provision of 3k+1 processing nodes to mitigate k Byzantine faults. For k equal to 1 the condition is met with 4 processing nodes. The second condition demands the ability to exchange k+1 rounds of communication among the fault-free processing nodes. For k equal to 1 this condition implies the ability to exchange 2 rounds of communication.
By providing each central processing node with an exclusive unidirectional broadcast path to the other central processing nodes 2 rounds of communication can be ensured among the central processing nodes even under the assumption of any single faulty central processing node.
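The two conditions quoted above reduce to simple arithmetic, which the following Python sketch works through. The function names `byzantine_requirements` and `can_mitigate` are hypothetical helpers introduced only for illustration; the formulas 3k+1 and k+1 are the ones stated in the text.

```python
def byzantine_requirements(k):
    """Return the node count and communication rounds needed for k Byzantine faults."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return {"min_nodes": 3 * k + 1, "rounds": k + 1}


def can_mitigate(n_nodes, n_rounds, k):
    """Check whether a system with n_nodes and n_rounds meets both conditions for k faults."""
    req = byzantine_requirements(k)
    return n_nodes >= req["min_nodes"] and n_rounds >= req["rounds"]


# The four cross-coupled central processing nodes with two rounds of communication:
single_fault_ok = can_mitigate(n_nodes=4, n_rounds=2, k=1)
```

With four central nodes and two guaranteed rounds, the check succeeds for k = 1 and fails for k = 2, which would need seven nodes and three rounds.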
In the general case the arrangement with four central processing nodes 540, 545, 550, 555 each having a node guardian arrangement 200 and each being crosscoupled by a unidirectional path 591, 592, 593, 594 can be used to define a system with four error containment regions whereby each of the four central processing nodes would be located in a single one of the four error containment regions . An alternative embodiment of a node guardian is shown in FIG. 6 in which the same reference numerals are used for the same elements as those for the node guardian shown in FIG. 2.
The node guardian 600 differs to that of the node guardian 200 shown in FIG. 2 by the inclusion of a logic element 602, which includes an OR gate 605 and a switch 601. The logic element 602 is placed in the transmit path of the node guardian 600 coupled between the transmitter 270 and the bus driver 210. An output from the transmitter forms one input to the OR gate 605 and the output from the switch 601 forms a second input (i.e. a signal received by the node guardian) . Additionally, or alternatively, in an equivalent way a logic element 604, which includes an OR gate 606 and a switch 603, is placed in the transmit path of the node guardian 600 coupled between the transmitter 270 and the bus driver 220. An output from the transmitter forms one input to the logic element 604 and the output from the switch 603, forms a second input.
The operation of the switches 601 and 603 is controlled via the control unit 250 such that, when a signal is received by the node guardian 600 which is to be transferred to another node (not shown), the control unit 250 operates one or both switches 601, 603 to allow the received signal to be communicated to the corresponding OR gate 605, 606, which acts as a multiplexer for transmitting the received signal via the bus drivers 210 and/or 220. The node guardian 600 embodiment shown in FIG. 6 allows a node in an error containment region, for example error containment region 301, to broadcast a message directly to a node in another error containment region, for example error containment region 302, without the need for a processing node (e.g. communication processor) coupled to the node guardian to actively relay the message on to the other error containment region.
This provides the advantage of reducing message latency and computational requirements.
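The behaviour of a logic element such as 602 or 604 can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the actual circuit: signals are modelled as single bits, an idle line is assumed to be logic 0 (so that the OR gate passes whichever input is active), and the class name `LogicElement` and its members are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogicElement:
    """Illustrative model of a switch feeding one input of a two-input OR gate."""

    # Assumed to be set by the control unit; False means the switch is open.
    switch_closed: bool = False

    def output(self, transmitter_bit: int, received_bit: int) -> int:
        """Return the bit presented to the bus driver.

        The received signal reaches the OR gate only while the switch is closed;
        an open switch is modelled as contributing logic 0 (an assumption).
        """
        relayed = received_bit if self.switch_closed else 0
        return transmitter_bit | relayed


if __name__ == "__main__":
    element = LogicElement()
    print(element.output(0, 1))   # switch open: received signal is not relayed
    element.switch_closed = True
    print(element.output(0, 1))   # switch closed: received signal is relayed
```

In this model the received signal is forwarded without involving the processing node, which corresponds to the reduced latency and computational load noted above.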
Additionally, the node guardian 600 embodiment shown in FIG. 6 avoids the need for the alternative embodiment of coupling error containment region 301 to error containment region 302 by the use of extra tracking, as shown in FIG. 3. Consequently, by allowing the node guardian 341 to protect against a jabbering fault within error containment region 301, messages can be conveyed via the node guardian 341 to node guardian 350 and vice versa. The corresponding network configuration incorporating node guardian 600 is shown in FIG. 1, in which the same reference numerals are used for the same elements as those shown in FIG. 3.
It will be appreciated that a key benefit of the node guardian approach is that the error containment boundary is defined within the reception path of the processing node. This stands in contrast to the bus guardian approach, where the error containment boundary is defined within the transmission path of the processing node. There it can be impacted by common failure modes or spatial proximity faults that may cause both a faulty behaviour of the communication processor, resulting in an unscheduled transmission, and a faulty behaviour of the bus guardian, allowing the unscheduled transmission to propagate to the network. The node guardian approach eliminates many problems encountered with the bus guardian approach concerning the avoidance of common failure modes, such as independent clock sourcing, independent power supply, testing and test interaction.
It will be understood that the method and arrangement for interconnecting fail-uncontrolled processors in a dependable distributed system described above provides the following advantages:
The invention transfers the problem of fault containment from the output interface of a potentially faulty processing node to the input interface of fault-free processing nodes. By doing so, problems encountered by spatial proximity faults or functional dependencies within a faulty processing node that may jeopardize fault containment at its output interface are mitigated.
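Containment at the input interface can be pictured as a set of switches, driven by a TDMA schedule, in front of an OR function that combines the unidirectional inputs. The sketch below is purely illustrative and rests on assumptions: signals are single bits, the schedule is a hypothetical mapping from slot number to the set of input indices permitted in that slot, and the name `guarded_receive` does not come from the description.

```python
from typing import Dict, Sequence, Set


def guarded_receive(inputs: Sequence[int], slot: int,
                    schedule: Dict[int, Set[int]]) -> int:
    """OR together only those inputs whose switch is closed in the given slot.

    inputs   -- one bit per unidirectional input signal
    slot     -- current TDMA slot number
    schedule -- assumed mapping: slot -> indices of inputs allowed in that slot
    """
    enabled = schedule.get(slot, set())  # no entry: all switches stay open
    result = 0
    for index, bit in enumerate(inputs):
        if index in enabled:
            result |= bit
    return result


if __name__ == "__main__":
    schedule = {0: {0}, 1: {1}}
    # Input 1 transmits outside its slot; in slot 0 it is not passed on.
    print(guarded_receive([0, 1], slot=0, schedule=schedule))
    # In slot 1 the same input is the scheduled sender and is passed on.
    print(guarded_receive([0, 1], slot=1, schedule=schedule))
```

In this model an unscheduled transmission from a faulty node is blocked at the receiving side, regardless of what happens at the faulty node's own output interface.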

Claims

1. An arrangement for connecting a processing node in a distributed system containing fail-uncontrolled processing nodes, the arrangement comprising: receiver means for receiving signals from another processing node of the system, and node guardian means coupled to the receiver means wherein the node guardian means includes switch means for receiving a plurality of unidirectional input signals, logic means coupled to the switch means for combining the plurality of received input signals according to a predetermined logic function, and control means coupled to the switch means for controlling application of the plurality of received signals to the logic means for controlling reception of signals thereat so as to reduce reception by the processing node of uncontrolled transmission from another processing node of the system.
2. The arrangement of claim 1 wherein the predetermined logic function comprises an OR logic function.
3. The arrangement of claim 1 or 2 wherein the control means is arranged to control the switch means according to a predetermined TDMA schedule.
4. A distributed system comprising the arrangement according to claim 1, 2, or 3.
5. The system of claim 4 further comprising at least one processing node having bus guardian means.
6. The system of claim 5 comprising: a first processing node having a node guardian according to claim 1, 2, or 3, a second processing node having a node guardian according to claim 1, 2, or 3, a first group of processing nodes having bus guardian means, and a second group of processing nodes having bus guardian means, wherein the first group is coupled to the first and second processing nodes via a first common channel, and the second group is coupled to the first and second processing nodes via a second common channel, the first group and the first processing node forming a first error containment region, and the second group and the second processing node forming a second error containment region.
7. The system of claim 6, the first group further being coupled to the first and second processing nodes via a third common channel, and the second group further being coupled to the first and second processing nodes via a fourth common channel.
8. The system of claim 5 or 6 further comprising: a third processing node having a node guardian according to claim 1, 2, or 3, and a fourth processing node having a node guardian according to claim 1, 2, or 3, wherein the third processing node is coupled to the first common channel, the fourth processing node is coupled to the second common channel, and the first, second, third and fourth processing nodes are cross-coupled, the third processing node being in the first error containment region, and the fourth processing node being in the second error containment region.
9. A system of claim 4 comprising a plurality of processing nodes each having a node guardian means according to claim 1, 2 or 3 coupled by unidirectional broadcast paths, wherein each of the plurality of processing nodes is assigned exclusively to one of the unidirectional broadcast paths for the purpose of transmission between the plurality of processing nodes.
10. The system of any one of claims 5-9 wherein the system is one of A-B: A a TTP/C system, B a FlexRay™ system.
11. A method of operating a processing node in a fail-uncontrolled distributed system, the method comprising: providing receiver means for receiving signals from another processing node of the system, and providing node guardian means coupled to the receiver means for receiving a plurality of unidirectional input signals, wherein the node guardian means includes logic means for combining the plurality of received input signals according to a predetermined logic function, and control means for controlling application of the plurality of received signals to the logic means for controlling reception of signals thereat so as to reduce reception by the processing node of uncontrolled transmission from another processing node of the system.
12. The method of claim 11 wherein the predetermined logic function comprises an OR logic function.
13. The processing node of claim 11 or 12 wherein the control means controls the switch means according to a predetermined TDMA schedule.
14. A method of operating a distributed system comprising the method of operating a processing node according to claim 11, 12, or 13.
15. The method of claim 14 further comprising providing at least one processing node having bus guardian means.
16. The method of claim 15 comprising: operating a first processing node having a node guardian means according to claim 11, 12, or 13, operating a second processing node having node guardian means according to claim 11, 12, or 13, providing a first group of processing nodes having bus guardian means, and providing a second group of processing nodes having bus guardian means, wherein the first group is coupled to the first and second processing nodes via a first common channel, and the second group is coupled to the first and second processing nodes via a second common channel, the first group and the first processing node forming a first error containment region, and the second group and the second processing node forming a second error containment region.
17. The method of claim 16, the first group further being coupled to the first and second processing nodes via a third common channel, and the second group further being coupled to the first and second processing nodes via a fourth common channel.
18. The method of claim 15 or 16 further comprising: operating a third processing node having node guardian means according to claim 11, 12, or 13, and operating a fourth processing node having node guardian means according to claim 11, 12, or 13, wherein the third processing node is coupled to the first common channel, the fourth processing node is coupled to the second common channel, and the first, second, third and fourth processing nodes are cross-coupled, the third processing node being in the first error containment region, and the fourth processing node being in the second error containment region.
19. The method of any one of claims 15-18 wherein the system is one of A-B: A a TTP system, B a FlexRay™ system.
PCT/EP2004/051700 2003-08-05 2004-08-03 Arrangement and method for connecting a processing node in a distributed system WO2005015833A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP04766406A EP1654833B1 (en) 2003-08-05 2004-08-03 Arrangement and method for connecting a processing node in a distributed system
JP2006522356A JP4579242B2 (en) 2003-08-05 2004-08-03 Apparatus and method for connecting processing nodes in a distributed system
US10/567,309 US7818613B2 (en) 2003-08-05 2004-08-03 Arrangement and method for connecting a processing node in a distribution system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0318256.5 2003-08-05
GB0318256A GB2404827A (en) 2003-08-05 2003-08-05 Fault containment at non-faulty processing nodes in TDMA networks

Publications (1)

Publication Number Publication Date
WO2005015833A1 true WO2005015833A1 (en) 2005-02-17

Family

ID=27839622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2004/051700 WO2005015833A1 (en) 2003-08-05 2004-08-03 Arrangement and method for connecting a processing node in a distributed system

Country Status (7)

Country Link
US (1) US7818613B2 (en)
EP (1) EP1654833B1 (en)
JP (1) JP4579242B2 (en)
KR (1) KR20060058699A (en)
GB (1) GB2404827A (en)
TW (1) TW200511773A (en)
WO (1) WO2005015833A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8429218B2 (en) * 2006-04-06 2013-04-23 International Business Machines Corporation Process restart on a compute node
US10140049B2 (en) 2012-02-24 2018-11-27 Missing Link Electronics, Inc. Partitioning systems operating in multiple domains
US20140039957A1 (en) * 2012-08-03 2014-02-06 International Business Machines Corporation Handling consolidated tickets

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5568615A (en) 1992-06-12 1996-10-22 The Dow Chemical Company Stealth interface for process control computers
US5636204A (en) * 1995-01-20 1997-06-03 Fujitsu Limited Transmission fault processing method and transmisssion fault processing device
US20020133756A1 (en) 2001-02-12 2002-09-19 Maple Optical Systems, Inc. System and method for providing multiple levels of fault protection in a data communication network

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6157705A (en) * 1997-12-05 2000-12-05 E*Trade Group, Inc. Voice control of a server
US6882639B1 (en) * 1998-09-21 2005-04-19 Nortel Networks Limited Telecommunications middleware
AT407582B (en) * 1999-08-13 2001-04-25 Fts Computertechnik Gmbh MESSAGE DISTRIBUTION UNIT WITH INTEGRATED GUARDIAN TO PREVENT '' BABBLING IDIOT '' ERRORS
DE19950433A1 (en) * 1999-10-19 2001-04-26 Philips Corp Intellectual Pty Network has nodes for media access checking with test signal generators for providing test signals outside allocated time slots, detectors of faults in associated and/or other node(s)
AT410490B (en) * 2000-10-10 2003-05-26 Fts Computertechnik Gmbh METHOD FOR TOLERATING "SLIGHTLY-OFF-SPECIFICATION" ERRORS IN A DISTRIBUTED ERROR-TOLERANT REAL-TIME COMPUTER SYSTEM
US6782350B1 (en) * 2001-04-27 2004-08-24 Blazent, Inc. Method and apparatus for managing resources
EP1280024B1 (en) * 2001-07-26 2009-04-01 Freescale Semiconductor, Inc. Clock synchronization in a distributed system
DE10144070A1 (en) * 2001-09-07 2003-03-27 Philips Corp Intellectual Pty Communication network and method for controlling the communication network
US7305357B2 (en) * 2002-01-24 2007-12-04 Shaw Cablesystems, G.P. Method and system for providing and controlling delivery of content on-demand over a cable television network and a data network
GB2386804A (en) * 2002-03-22 2003-09-24 Motorola Inc Communications network node access switches
DE60301752T9 (en) * 2002-04-16 2006-11-23 Robert Bosch Gmbh A method for monitoring an access sequence for a communication medium of a communication controller of a communication system
CN1784325A (en) * 2003-05-06 2006-06-07 皇家飞利浦电子股份有限公司 Timeslot sharing over different cycles in tdma bus
EP1695516A2 (en) * 2003-11-19 2006-08-30 Honeywell International, Inc. Mobius time-triggered communication
WO2005053243A2 (en) * 2003-11-19 2005-06-09 Honeywell International Inc. Priority based arbitration for tdma schedule enforcement in a multi-channel system in star configuration
ES2436609T3 (en) * 2006-05-16 2014-01-03 Saab Ab Fault tolerance data bus node in a distributed system
US20080098234A1 (en) * 2006-10-20 2008-04-24 Honeywell International Inc. Fault-containment and/or failure detection using encryption

Also Published As

Publication number Publication date
GB0318256D0 (en) 2003-09-10
US7818613B2 (en) 2010-10-19
GB2404827A (en) 2005-02-09
JP2007517417A (en) 2007-06-28
EP1654833A1 (en) 2006-05-10
EP1654833B1 (en) 2012-11-21
JP4579242B2 (en) 2010-11-10
KR20060058699A (en) 2006-05-30
TW200511773A (en) 2005-03-16
US20060274790A1 (en) 2006-12-07

Similar Documents

Publication Publication Date Title
EP2139172B1 (en) Hybrid topology ethernet architecture
JP4782823B2 (en) User terminal, master unit, communication system and operation method thereof
US6233704B1 (en) System and method for fault-tolerant transmission of data within a dual ring network
CN101662421B (en) Method and device for transmitting control message based on ethernet multi-ring network
JP2007180830A (en) Duplex monitoring control system and redundant switching method of the system
JP5395450B2 (en) Ring type switch and ring type switch control method
CN101164264A (en) Method and device for synchronising two bus systems, and arrangement consisting of two bus systems
Penney et al. Survey of computer communications loop networks: Part 1
EP1654833B1 (en) Arrangement and method for connecting a processing node in a distributed system
GB2031628A (en) Digital communication system
Montenegro et al. Network centric systems for space applications
KR20070057593A (en) Network system
Bauer et al. Byzantine fault containment in TTP/C
US5778193A (en) Multi-node data processing system
TWI289985B (en) Communication network and arrangement for use therein
JP2001308893A (en) Dynamic reconfiguration system for routing information in loop-type topology network
Abuteir et al. Mixed-criticality systems based on time-triggered ethernet with multiple ring topologies
JP2003037636A (en) Gateway device
Toillon et al. An optimized answer toward a switchless avionics communication network
JP3717286B2 (en) Network reconfiguration method
Hall et al. Jet engine control using ethernet with a brain (postprint)
KR970004892B1 (en) Apparatus for doubling a communication bus
CN115001898A (en) Network equipment redundant communication system and method
RU2177674C2 (en) Data transmission system
KR100433649B1 (en) Method for imposing the fail-silent characteristic in a distributed computer system and distribution unit in such a system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 2004766406

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2006522356

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 1020067002355

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2004766406

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1020067002355

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2006274790

Country of ref document: US

Ref document number: 10567309

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 10567309

Country of ref document: US