WO2018111272A1 - Fault-tolerant operational group on a distributed network - Google Patents

Fault-tolerant operational group on a distributed network

Info

Publication number
WO2018111272A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
health
nodes
messages
log
Prior art date
Application number
PCT/US2016/066862
Other languages
English (en)
Inventor
Samuel BEILIN
Original Assignee
The Charles Stark Draper Laboratory, Inc.
Priority date
Filing date
Publication date
Application filed by The Charles Stark Draper Laboratory, Inc. filed Critical The Charles Stark Draper Laboratory, Inc.
Priority to SE1950600A priority Critical patent/SE1950600A1/en
Priority to DE112016007522.7T priority patent/DE112016007522T5/de
Priority to PCT/US2016/066862 priority patent/WO2018111272A1/fr
Publication of WO2018111272A1 publication Critical patent/WO2018111272A1/fr

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00: Arrangements for monitoring or testing data switching networks
    • H04L43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability
    • H04L43/0817: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters, by checking availability by checking functioning
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/28: Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40: Bus networks
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/069: Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/28: Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40: Bus networks
    • H04L2012/40208: Bus networks characterized by the use of a particular bus standard
    • H04L2012/40215: Controller Area Network (CAN)

Definitions

  • a method receives, at a first node of multiple nodes, each node connected to a common network bus, a health message from a second node.
  • the health message includes a log of health messages from other nodes.
  • Each node sends health messages at a frequency known to the plurality of nodes.
  • the method further compares, at the first node, the log of messages from other nodes in the received health message to a log of health messages previously received from other nodes stored by the first node. Based on the comparison, the method determines a health status of each node.
  • receiving a health message further includes receiving multiple health messages from one or more of the other nodes of the plurality of nodes. Comparing further includes comparing each log of messages from the received multiple health messages to the log of health messages stored by the first node.
  • the common bus is at least one of a controller area network (CAN) bus and an Ethernet bus.
  • the method further includes generating, at the first node, the log of health messages from other nodes stored by the first node by recording a timestamp of each received health message from other nodes in the log during one clock cycle of the first node.
  • determining a health status of a particular node is performed by verifying timestamps of health messages from the particular node that corresponds with timestamps in the log stored by the first node.
  • the method further includes broadcasting, from the first node over the common network bus, a health message of the first node to the other nodes, the health status including a log of other received health messages.
  • each node may have the same clock frequency.
  • the method can operate as long as the clock frequency of each node is known by each other node.
  • comparing further includes determining that all health messages at the first node match timestamps of their respective nodes in the logs of health messages from the nodes. Otherwise, the method marks the nodes having unmatched timestamps as out of synchronization.
  • the method further includes forming a fault-tolerant group with other nodes based on the determined health status of each node.
  • the method further includes determining a health status of the first node by comparing an entry of the log of messages in the received health message corresponding to the first node to entries of the log of messages in other received health messages.
  • a system includes a common network bus, and a plurality of nodes, each connected to the common network bus.
  • a first node of multiple nodes is configured to receive a health message from a second node, the health message including a log of health messages from other nodes of the plurality of nodes.
  • Each node sends health messages at a frequency known to the plurality of nodes.
  • the system is further configured to compare, at the first node, the log of messages from other nodes in the received health message to a log of health messages previously received from other nodes stored by the first node.
  • the system is further configured to, based on the comparison, determine a health status of each node.
  • a non-transitory computer-readable medium is configured to store instructions.
  • the instructions when loaded and executed by a processor, cause the processor to receive, at a first node of multiple nodes each connected to a common network bus, a health message from a second node.
  • the health message includes a log of health messages from other nodes of the plurality of nodes.
  • Each node sends health messages at a frequency known to the plurality of nodes.
  • the instructions further cause the processor to compare, at the first node, the log of messages from other nodes in the received health message to a log of health messages previously received from other nodes stored by the first node.
  • the instructions further cause the processor to, based on the comparison, determine a health status of each node.
  • Fig. 1 is a diagram illustrating an example embodiment of a car having an illustrative controller area network (CAN) bus connecting multiple subsystems.
  • FIG. 2A is a block diagram illustrating an example embodiment of a CAN Bus connected with nodes.
  • FIG. 2B is a block diagram illustrating an example embodiment of a node having a computing unit and fault-tolerance layer that is operatively coupled with a CAN Bus.
  • FIG. 3 is a diagram of an example embodiment of a packet with a health message.
  • Fig. 4 is a flow diagram illustrating an example embodiment of a process employed by the present invention.
  • Fig. 5 is a diagram illustrating an example embodiment of a timeline of health messages.
  • Fig. 6 is a block diagram illustrating verification of a communication line.
  • Fig. 7 is a diagram illustrating an example embodiment of a verification table employed in an embodiment of the present invention.
  • Fig. 8 is a flow diagram illustrating an example embodiment of a process employed by the present invention in relation to the above described verification table.
  • Fig. 9 illustrates a computer network or similar digital processing environment in which embodiments of the present invention may be implemented.
  • Fig. 10 is a diagram of an example internal structure of a computer (e.g., client processor/device or server computers) in the computer system of Fig. 9.
  • Previous methods of implementing fault tolerance employ nodes that are directly connected to each other. Each node independently performs the same function, and for each operation, results are compared and voted on by the other nodes. In voting, when there is a difference in the results, a failure can be overridden by the correctly calculated answer found by a majority of the nodes, or, if there is no majority, a failure can be flagged.
  • fault-tolerant operational groups are referred to by the number of backup systems employed.
  • a simplex is an operational group with one node
  • a duplex is an operational group with two nodes.
  • Both simplex and duplex operational groups are zero-fault-tolerant.
  • a simplex does not have another node to check results against, and while a duplex can check each node against each other, in the case of a fault, the nodes cannot agree on which node is correct. However, the duplex can note the error, and other corrective actions can be taken, such as cancelling a launch or other operation.
  • a one-fault- tolerant operational group is a triplex, which has three nodes.
  • a two-fault-tolerant operational group is a quad, or quadraplex.
  • a person of ordinary skill in the art can envision higher level fault-tolerant operational groups according to this formula. In these methods, each node was connected to all other nodes directly. For example, a duplex would have two lines - one from the first node to the second, and one from the second to the first. For higher-level fault-tolerant operational groups, however, many more connections are needed. For example, in a triplex, six wires are needed. In a quad, 12 wires are needed.
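The wiring growth described above follows directly from full pairwise connection: with k nodes and a dedicated line in each direction, k(k-1) wires are needed. A minimal sketch (the function name is illustrative, not from the patent):

```python
def direct_links(k: int) -> int:
    """Unidirectional wires needed when every one of k nodes has a
    dedicated line to every other node (one line in each direction)."""
    return k * (k - 1)

# duplex: 2 wires, triplex: 6 wires, quad: 12 wires
assert direct_links(2) == 2
assert direct_links(3) == 6
assert direct_links(4) == 12
```

This quadratic growth in wiring is what motivates the bus-based approach described below.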
  • U.S. Patent No. 8,972,772 System and Method for Duplexed Replicated Computing
  • the CAN bus is a serial communication protocol, which supports distributed real-time control and multiplexing for use within road vehicles and other control applications.
  • the CAN bus can be implemented by the International Organization for Standardization (ISO) 11898, specifically ISO 11898-1:2003 and ISO 11898-1:2015, which are hereby incorporated by reference in their entirety.
  • ISO 11898-3 describes creating redundant connections between components on the CAN bus; however, it does not create fault-tolerant operational groups. In other words, if a wire of the CAN bus described by ISO 11898-3 were severed, an alternate wire pathway would allow components on the CAN bus to continue to communicate. With each component instead connected via a bus, a fault-tolerant architecture is implemented differently, as described below.
  • Fig. 1 is a diagram 100 illustrating an example embodiment of a car 102 having an illustrative CAN bus 104 connecting multiple subsystems.
  • computing units 108a-l for each vehicle system 106a-l may be used in a distributed fashion to assist each other.
  • the vehicle systems 106a-l include a traction control system 106a, an entertainment system 106b, an anti-lock brake system 106c, a pre-collision braking/collision warning system 106d, a blind-spot detection system 106e, an image processing system 106f, a power steering system 106g, an adaptive cruise control system 106h, a lane monitoring system 106i, an air bag deployment system 106j, an adaptive headlights system 106k, and a rearview camera system 106l.
  • a person of ordinary skill in the art can recognize that more or fewer systems can be employed in the car 102 and connected via the CAN bus 104; however, the systems 106a-l are shown for exemplary purposes.
  • the computing unit 108a-l for a non-emergency system can assist with processing for a critical system (e.g., anti-lock braking 106c, pre-collision braking 106d, an imaging processing system 106f for imaging the vehicle's surroundings objects, etc.).
  • the car 102 can organize the systems into fault-tolerant groups based on the required fault tolerance of each function. For example, functions that are more critical may be two-fault-tolerant, while less critical functions, such as heating or entertainment, can be zero-fault-tolerant. In time-critical situations, however, critical functions can have a simplex as overhead, such as application of the emergency brake by the driver's input.
  • the computing units 108a of each subsystem can be shared in a fault-tolerant way.
  • Image processing 106f can include stereo-vision systems, Radar, Lidar, or other vision systems, and the processing of data related to the same.
  • image processing 106f is critical to the car's autonomous functions. An error in image processing 106f can result in the vehicle 102 failing to recognize an object on the road, which can cause a collision. Therefore, the vehicle 102 could make the image processing system as two-fault- tolerant. Doing so requires a quad, which in previous systems required four image processing systems to be connected to each other directly, all programmed to do the same function.
  • the image processing system 106f can leverage the computing units 108a-e and 108g-l of the other systems 106a-e and 106g-l to verify its calculations in a distributed manner. Therefore, to emulate a quad, four of the computing units 108a-l can perform calculations, vote on the calculations, and output a response so that the car 102 can take an appropriate action. In this way, the car distributes its computing power in a fault-tolerant way.
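As a rough illustration of the voting step (not the patent's implementation), a majority over redundant results overrides a single faulty computation, while the absence of a majority can only be flagged:

```python
from collections import Counter

def vote(results):
    """Return the majority value among redundant results, or None
    when no strict majority exists (fault flagged, no override)."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# three of four computing units agree: the faulty result is overridden
assert vote([42, 42, 42, 7]) == 42
# a 2-2 split in a quad has no majority: the fault can only be flagged
assert vote([42, 42, 7, 7]) is None
```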
  • a triplex, duplex, or simplex can be implemented similarly.
  • any n-fault-tolerant operational group can be implemented for any n greater than or equal to zero, even though it is uncommon for n to be greater than three.
  • the nodes after determining health statuses of nodes, can form a fault-tolerant operational group, such as a simplex, duplex, triplex, quad, or a three-fault or higher tolerant operational group.
  • the fault-tolerant operational group can also be referred to as a redundancy group.
  • a person of ordinary skill in the art can also recognize that other bus architectures or network technologies can be implemented instead of the ISO 11898 architecture.
  • wired or wireless Ethernet is one example of a network technology that can be employed in other embodiments; however, different types of networks other than Ethernet can be used.
  • a person of ordinary skill in the art can employ Ethernet with the principles described in relation to the CAN bus 104 in this application, so Ethernet is not described separately. However, it is noted that in an Ethernet system, packet collisions have to be accounted for, which is not a factor with the CAN bus 104. In an Ethernet network, packets that collide are resent at a later time with an updated timestamp.
  • nodes can consider that packets may be delayed due to packet collision before determining that a node that has not sent an anticipated health message is experiencing a fault. While many methods can perform this, one embodiment is delaying determination of health of a particular node during periods of high network congestion.
  • Fig. 2A is a block diagram 200 illustrating an example embodiment of a CAN Bus 204 connected with Nodes A-E 208a-e.
  • the nodes 208a-e can represent, for example, computing units 108a-l of Fig. 1. However, nodes 208a-e can further represent any computing unit in a fault-tolerant operational group. In an example embodiment, the nodes 208a-e can represent four nodes of a quad of a two-fault-tolerant system, and one additional voting node that regulates. Regardless of the functions of the nodes, each has to confirm that the others are communicating the correct data with each other.
  • each node 208a-e sends out its own health message 210 on a clock that is known to all of the other nodes 208a- e. In other words, each node 208a-e knows the clock speed of the other nodes 208a-e. Each node 208a-e can have the same clock speed or different clock speeds.
  • the message 210 can be broadcast to all nodes on the CAN bus 204. In other embodiments, however, the message 210 can be multicast to specific nodes on the CAN bus 204.
  • the health message 210 may only include data about node E 208e. However, after one cycle, each health message 210 should include data about the other nodes A-D 208a-d as well. This is accomplished by, at each node, recording when respective health messages are received from each node. Then, in the next health message, the node includes a log of all other health messages it has received. In this way, each node can compare its log of (a) received health messages and (b) its own sent health messages to the log of health messages received from other nodes. If the two logs of a first node match the logs received in a health message from a second node, then the first node can verify that its connection to the second node is receiving messages correctly.
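The record-and-forward behavior described above can be sketched as follows (the class and method names are illustrative assumptions, not taken from the patent; the comparison step itself is omitted here):

```python
import itertools

class HealthLayer:
    """Record when each peer's health message arrives, and include
    that log in this node's next outgoing health message."""
    def __init__(self, name, clock):
        self.name = name
        self.clock = clock        # callable returning this node's time
        self.log = {}             # peer name -> timestamp of last message

    def receive(self, sender, sender_log):
        # Comparing sender_log against self.log (omitted in this sketch)
        # is what verifies the connection; here we only maintain the log.
        self.log[sender] = self.clock()

    def next_message(self):
        return {"sender": self.name, "log": dict(self.log)}

ticks = itertools.count()
node_a = HealthLayer("Node A", lambda: next(ticks))
node_a.receive("Node B", {})
node_a.receive("Node C", {"Node B": 0})
assert node_a.next_message()["log"] == {"Node B": 0, "Node C": 1}
```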
  • Fig. 2B is a block diagram illustrating an example embodiment of a node 256 having a computing unit 252 and fault-tolerance layer 254 that is operatively coupled with a CAN Bus 204.
  • the node 256 of Fig. 2B is an example embodiment of any of the nodes A-E 208a-e.
  • the node 256 can implement, for example, any of the systems 106a-l shown in Fig. 1, but can implement any other system as well.
  • the node 256 includes a computing unit 252 that determines, computationally, needed data to be sent to the bus 258 in response to data received from the bus 260.
  • the data received from the bus can be requests to perform operations from other nodes connected to the CAN bus 204.
  • the computing unit 252 can calculate data based on real-world input, such as a pedestrian being detected by the vehicle.
  • the data to the bus 258 is first sent to a fault-tolerance layer 254.
  • the fault-tolerance layer 254 appends a health message, described further in relation to Fig. 3, to the data packet.
  • the data with the health message 262 is then sent to the CAN bus 204.
  • the health message 210 can be added to the data 258 to become the data 262 with health message.
  • the fault-tolerance layer 254 further receives data from the CAN bus 204 having health messages from other nodes.
  • the fault-tolerance layer 254 determines the health of the other nodes as well as the node 256 itself, before sending the data 260 to the computing unit 252 for processing. In this manner, the fault-tolerance layer 254 abstracts fault-tolerant management away from the computing units 252 of the nodes.
  • the fault-tolerance layer 254 can be implemented in software by a processor, or in hardware by an FPGA or other hardware device.
  • Fig. 3 is a diagram 300 of an example embodiment of a packet 302 with a health message 306.
  • the packet 302 includes packet data 304, but further includes the health message 306.
  • the health message 306 includes two components: a timestamp 308 of the packet and a log 310 of other timestamps.
  • the log 310 may be empty or incomplete during an initialization clock cycle due to lack of received data.
  • a cyclic redundancy check (CRC) hash, check bits, or a checksum is appended to each packet 302.
  • the CRC is an error detecting code that is first calculated by a sending node, and then attached to the packet 302.
  • the length of the message is either pre-determined or encoded into the message so the receiving node knows which part of the message is the CRC or checksum.
  • the receiving node calculates the CRC based on the packet data 304 and, optionally, health message 306, and confirms that the received CRC matches the CRC appended to the packet 302. This verifies that no accidental data changes have been made to the packet 302.
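The CRC exchange above can be sketched with Python's standard CRC-32 (the fixed 4-byte trailer is an assumed framing choice; the patent only requires the length to be pre-determined or encoded into the message):

```python
import zlib

def append_crc(payload: bytes) -> bytes:
    """Sender side: compute CRC-32 over the payload and append it
    as a fixed 4-byte trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_crc(packet: bytes) -> bool:
    """Receiver side: recompute the CRC over everything except the
    trailer and compare it to the received value."""
    payload, received = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

packet = append_crc(b"health-message")
assert check_crc(packet)                       # intact packet verifies
corrupted = b"Health-message" + packet[-4:]    # one flipped payload bit
assert not check_crc(corrupted)                # accidental change detected
```

CRC-32 is guaranteed to detect any single-bit error, which is why the single flipped bit in the example is always caught.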
  • the health message 306 can also include a timestamp of the packet 308, and a log of other timestamps 310. This information can be, separate from the CRC information, also checked against timestamp logs in each corresponding node after transmission of each packet.
  • Fig. 4 is a flow diagram 400 illustrating an example embodiment of a process employed by the present invention.
  • the process receives a health message from another network element on a same network bus (402).
  • the health message can be, as described above, appended to a data packet as a timestamp and a log of timestamps of other health messages received from other nodes, or CRC information, or both.
  • the process checks whether the log of messages in the health message is consistent with the log of health messages stored by the network element (404). Based on this comparison, the process determines a health status of each network element (406).
  • Fig. 5 is a diagram 500 illustrating an example embodiment of a timeline 502 of health messages 504-(0-7).
  • the health messages 504-(0-7) can be appended to data packets or, in another embodiment, sent as independent messages.
  • a person of ordinary skill in the art can further recognize that while the timeline 502 is on the millisecond scale, any other timescale can be used.
  • each network element may send messages at different frequencies or the same frequencies as other nodes on the bus.
  • the health messages can be sent sequentially, simultaneously, or any combination thereof. This example assumes that each message is successfully received by each other node.
  • the health messages 504-(0-3) can be considered initialization health messages that fill up empty logs at the respective Nodes A-E.
  • the health messages 504-(5-7), on the other hand, are sent after the initialization phase.
  • the health messages sent after the initialization phase edit their respective logs as a rolling queue.
  • the nodes edit their respective verification table or verification matrix.
  • the health messages 504-3d and 504-3e each include a log of health messages received from other nodes.
  • the log of both health messages 504-3d and 504-3e includes the representations of the message from Node A received with a timestamp of 0ms, the message from Node B received with a timestamp of 1ms, and the message from Node C with a timestamp of 2ms.
  • the fourth and fifth messages 504-3d and 504-3e are an example of messages sent on the CAN bus in parallel.
  • the example health messages illustrated in Fig. 5 show that messages can be sent either in parallel or sequentially on a CAN bus.
  • in an Ethernet embodiment, messages sent at the same time result in a packet collision, and one or more may be resent according to the Ethernet network protocol.
  • a person of ordinary skill in the art can also recognize that in an Ethernet network, messages may not be sent in parallel, but can be sent in sequence shortly after each other. Further, a person of ordinary skill in the art can recognize that no messages are sent at 4ms. This represents the fact that there may be idle periods on the CAN bus.
  • the log at each node has data points of last health messages from each other node.
  • the log replaces indications of health messages at a node with any newly received health message.
  • Node A sends a health message 504-5 across the bus with the log including the representations of the message from Node B received with a timestamp of 1ms, the message from Node C at 2ms, the message from Node D with a timestamp of 3ms, and the message from Node E with a timestamp of 3ms.
  • the log does not include an entry for Node A because the health message 504-5 itself represents Node A.
  • the log can be more explicit, or even include multiple iterations of messages from multiple nodes.
  • Node B sends a health message 504-6 across the bus with the log including the representations of the message from Node C at 2ms, the message from Node D with a timestamp of 3ms, the message from Node E with a timestamp of 3ms, and the message from Node A with a timestamp of 5ms.
  • Node C sends a health message 504-7 across the bus with the log including the representations of the message from Node D with a timestamp of 3ms, the message from Node E with a timestamp of 3ms, the message from Node A with a timestamp of 5ms, and the message from Node B with a timestamp of 6ms.
  • the health messages shown in Fig. 5, therefore, can be analyzed by the nodes that receive them to ensure that the communication channel from the sending node to the receiving node is functioning properly.
  • communication channel can be verified by performing the CRC checks described above.
  • the communication channel can be further verified by comparing the timestamps in each health message to timestamps of health messages received at each node, on a per-node basis, which can collectively verify the entire network of nodes. Both of these checks can be performed, or in other embodiments, one check can be performed. A person of ordinary skill in the art could further envision other ways to verify message integrity from one node to another.
  • a person of ordinary skill in the art can recognize that the examples described herein illustrate, for simplicity, the health messages being sent all on the same frequency.
  • a person of ordinary skill in the art can configure the described system to operate when health messages are sent across the bus at different frequencies as well.
  • all nodes must know the frequency at which each of the other nodes sends its messages.
  • the nodes can determine accurately whether a particular node's health message should have been received or not.
  • the receiving node can compare the time since it last received a message from a sending node to that node's known frequency of sending health messages. If more time has elapsed than the sending period, the communication channel with that node may be faulty.
  • This embodiment of nodes sending messages at different, but known, frequencies can be applied to the other embodiments described herein.
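This elapsed-time check can be sketched as follows (the function name and the slack parameter are assumptions; the patent only requires that each node's sending frequency be known):

```python
def channel_suspect(last_rx_ms, now_ms, period_ms, slack_ms=0):
    """True when more time has passed since the last health message
    than the sender's known period (plus optional slack, e.g. to
    allow for Ethernet packet collisions and congestion)."""
    return (now_ms - last_rx_ms) > (period_ms + slack_ms)

assert not channel_suspect(last_rx_ms=3, now_ms=6, period_ms=4)   # on time
assert channel_suspect(last_rx_ms=3, now_ms=9, period_ms=4)       # overdue
# extra slack defers the verdict during congested periods
assert not channel_suspect(last_rx_ms=3, now_ms=9, period_ms=4, slack_ms=3)
```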
  • Fig. 6 is a block diagram 600 illustrating verification of a communication line.
  • the verification is performed at Node A 208a on its communication line from Node B 208b, previously described in relation to Fig. 2.
  • Node B 208b sends health message 504-6, as described in relation to Fig. 5, to a comparison module 602 of Node A 208a.
  • the comparison module 602 compares the health message 504-6 and its log to Node A's 208a log 608. In the comparison, the entries for Nodes C, D, E, and A are consistent.
  • the log 608 also includes an entry for Node B at 1ms, because Node B's current health message 504-6 has not yet replaced that entry. However, the comparison module 602 can take this into account and still allow verification of Node B's connection to Node A 606.
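Using the Fig. 5 values, the comparison at Node A can be sketched as follows (the function name is illustrative; the handling of the sender's own stale entry follows the discussion above):

```python
def compare_logs(received_log, local_log, sender):
    """Return peers whose timestamps disagree between the sender's
    log and the local log.  The local entry for the sender itself is
    excluded: it still holds the sender's previous timestamp rather
    than the health message currently being checked."""
    return [peer for peer, ts in received_log.items()
            if peer != sender and peer in local_log and local_log[peer] != ts]

# Node A's log 608 after message 504-5; Node B's entry is the stale 1 ms one
local = {"Node B": 1, "Node C": 2, "Node D": 3, "Node E": 3, "Node A": 5}
# Log carried by Node B's health message 504-6
received = {"Node C": 2, "Node D": 3, "Node E": 3, "Node A": 5}
assert compare_logs(received, local, "Node B") == []   # link verified
```

If the returned list were non-empty, the communication link would be marked as unverified, as described below.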
  • the system can mark the communication link as unverified.
  • the system can flag Node A 208a or Node B 208b as faulty, for example.
  • the system can also send messages to compare all verifications of other nodes. This may reveal, for example, that Node B's 208b messages to all other nodes on the network were corrupted, and the rest of the nodes can assume Node B is faulty.
  • collectively received logs can be compared at a node to determine the source of a network problem or fault in a node.
  • Fig. 7 is a diagram 700 illustrating an example embodiment of a verification table 702 employed in an embodiment of the present invention.
  • a verification table 702 is constructed based on received logs from each node at a particular node.
  • the verification table 702 can be the same at each node, assuming each node has received each health message and each corresponding log correctly.
  • compared to keeping only a separate log (e.g., the log 608 of Fig. 6), the verification table 702 is an alternative embodiment that is more robust, as it stores the timestamps of the messages received at the particular node as well as the timestamps of the logs from all of the health messages.
  • the verification table 702 represents the logs received from each node's most recent health message. Each column of the verification table 702 represents a log from the node listed in the header. Each row of the verification table 702 represents the timestamp of each particular node on the network. Therefore, the cell at column "Node A" and row "Node E" represents the timestamp of Node E in Node A's most recent health message log.
  • A person of ordinary skill in the art can further recognize that the verification table 702 can be expanded to store more logs than each node's most recent log. For example, the verification table 702 can be extended into a verification matrix that is a collection of multiple verification tables, each layer representing a previous set of health messages received. However, if one verification table 702 is used, the table can overwrite past entries as new health messages arrive.
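Built from the Fig. 5 logs, the verification table can be sketched as a column-per-sender mapping (the dict-of-dicts representation is an assumption; the patent does not prescribe a data structure):

```python
def update_table(table, sender, sender_log):
    """Store the log carried in sender's most recent health message
    as that sender's column of the verification table."""
    table[sender] = dict(sender_log)

table = {}
# Logs from health messages 504-5 and 504-6 in Fig. 5
update_table(table, "Node A", {"Node B": 1, "Node C": 2, "Node D": 3, "Node E": 3})
update_table(table, "Node B", {"Node C": 2, "Node D": 3, "Node E": 3, "Node A": 5})

# Cell at column "Node B", row "Node A": Node A's timestamp in Node B's log
assert table["Node B"]["Node A"] == 5
# A newer message from the same sender overwrites its column
update_table(table, "Node A", {"Node B": 6, "Node C": 2, "Node D": 3, "Node E": 3})
assert table["Node A"]["Node B"] == 6
```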
  • Some of the data in the verification table 702 can be compared to determine whether a fault or communication error has occurred, however, some of the data is out of date.
  • the shaded cells in the verification table represent the time that the health message was sent from that particular node (e.g., Node A sent its health message at 5ms, Node B sent its health message at 6ms, etc.).
  • the data in each row can be compared to verify the connection.
  • the table is arranged starting at Node C, with each row and column organized sequentially backwards in time based on the last health message received from each node. This makes it easier to visualize which data is new and which is out of date.
  • consider Node A: the entries for Node A in the logs of Nodes C, B, and A are the same, i.e., 5ms, but the entries in the logs of Nodes E and D are different, i.e., 0ms.
  • the process can determine that Nodes D and E are not in error, but simply out of date, by checking that the health messages from Nodes D and E were both sent at 3ms, and therefore a timestamp of 5ms could not have been included in their last messages.
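The out-of-date-versus-fault distinction can be sketched as follows (the function and label names are illustrative):

```python
def entry_status(logged_ts, expected_ts, sender_sent_at):
    """Classify a verification-table entry.  A health message sent
    at sender_sent_at cannot contain a timestamp newer than its own
    send time, so such mismatches are merely out of date."""
    if logged_ts == expected_ts:
        return "consistent"
    if sender_sent_at < expected_ts:
        return "out of date"   # the newer message had not arrived yet
    return "fault"

# Nodes D and E last sent at 3 ms, so they could not have logged
# Node A's 5 ms message: out of date rather than an error
assert entry_status(logged_ts=0, expected_ts=5, sender_sent_at=3) == "out of date"
assert entry_status(logged_ts=5, expected_ts=5, sender_sent_at=6) == "consistent"
assert entry_status(logged_ts=0, expected_ts=5, sender_sent_at=6) == "fault"
```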
  • embodiments of the invention can include an embodiment of the verification table 702 including two or more versions of timestamp logs from all nodes. In this way, out of date timestamps can be compared to timestamps on a previous clock cycle.
  • Fig. 8 is a flow diagram 800 illustrating an example embodiment of a process employed by the present invention in relation to the above described verification table. The process begins by receiving a health message from another network element on a same network bus (802).
  • the process stores the health message and the log of timestamps in the health message in a verification table (804). By doing this across multiple health messages, the verification table grows to include a history of all messages.
  • the verification table can store health messages indefinitely, for a set period of time, for a set period of clock cycles, or other configurable period of time.
  • the process determines whether, for each particular node, the timestamps for that particular node in the logs of each health message, now stored in the verification table, that are sent on or after the timestamp of the health message match each other (806). If so, the process verifies the particular node as properly functioning (808). If not, the process flags the node for action (810), such as taking the node out of service, disabling voting of the node, etc.
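Under the assumption that the table maps each sender column to its log and that each column's send time is known, the matching step of this process might look like the following sketch:

```python
def verify_node(table, node, send_times):
    """Timestamps recorded for `node` must agree across all logs whose
    health message was sent at or after node's latest timestamp;
    earlier logs are ignored as out of date."""
    entries = {col: log[node] for col, log in table.items() if node in log}
    latest = max(entries.values())
    relevant = [ts for col, ts in entries.items() if send_times[col] >= latest]
    return all(ts == latest for ts in relevant)

send_times = {"Node B": 6, "Node C": 7, "Node D": 3}
# Node D's 0 ms entry is stale (sent before 5 ms), so it is ignored
healthy = {"Node B": {"Node A": 5}, "Node C": {"Node A": 5}, "Node D": {"Node A": 0}}
assert verify_node(healthy, "Node A", send_times)       # verified as functioning

# Node C sent at 7 ms yet logged 0 ms for Node A: flag the node for action
faulty = {"Node B": {"Node A": 5}, "Node C": {"Node A": 0}, "Node D": {"Node A": 0}}
assert not verify_node(faulty, "Node A", send_times)
```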
  • A person of ordinary skill in the art can further recognize that the above method can be performed without formally assembling a verification table; instead, the multiple health messages can be stored in a memory or database and each timestamp retrieved separately for each comparison. However, assembling the verification table abstracts away such data retrieval and simplifies the comparison.
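The Fig. 8 flow (receive 802, store 804, compare 806, verify 808 or flag 810) can be sketched as follows; the class and method names are illustrative, not part of the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class HealthMessage:
    sender: str
    sent_at: int            # timestamp (ms) at which the message was sent
    log: Dict[str, int]     # node name -> last timestamp seen from that node

class VerificationTable:
    def __init__(self):
        self.messages: Dict[str, HealthMessage] = {}

    def receive(self, msg: HealthMessage) -> None:
        # (802) receive a health message and (804) store it, building a
        # history of the latest log from each sender.
        self.messages[msg.sender] = msg

    def node_ok(self, node: str, expected_ts: int) -> bool:
        # (806) every stored log sent on or after the expected timestamp
        # must agree on that node's timestamp; earlier logs are ignored.
        return all(m.log.get(node) == expected_ts
                   for m in self.messages.values()
                   if m.sent_at >= expected_ts)

    def evaluate(self, node: str, expected_ts: int,
                 flag: Callable[[str], None]) -> str:
        if self.node_ok(node, expected_ts):
            return "verified"        # (808) node is properly functioning
        flag(node)                   # (810) e.g., remove from service
        return "flagged"

# Usage: two up-to-date reporters agree on Node A at 5 ms; a log sent
# earlier (at 3 ms) is excluded from the comparison.
table = VerificationTable()
table.receive(HealthMessage("C", 5, {"A": 5}))
table.receive(HealthMessage("B", 5, {"A": 5}))
table.receive(HealthMessage("D", 3, {"A": 0}))
print(table.evaluate("A", 5, flag=lambda n: None))  # verified
```

As the last paragraph above notes, the same comparison could be run directly against messages held in a memory or database; the table object merely centralizes the retrieval.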
  • Fig. 9 illustrates a computer network or similar digital processing environment in which embodiments of the present invention may be implemented.
  • Client computer(s)/devices 50 and server computer(s) 60 provide processing, storage, and input/output devices executing application programs and the like.
  • The client computer(s)/devices 50 can also be linked through communications network 70 to other computing devices, including other client devices/processes 50 and server computer(s) 60.
  • The communications network 70 can be part of a remote access network, a global network (e.g., the Internet), a worldwide collection of computers, local area or wide area networks, and gateways that currently use respective protocols (TCP/IP, Bluetooth® (a registered trademark of Bluetooth SIG, Inc.), etc.) to communicate with one another.
  • Other electronic device/computer network architectures are suitable.
  • Fig. 10 is a diagram of an example internal structure of a computer (e.g., client processor/device 50 or server computers 60) in the computer system of Fig. 9.
  • Each computer 50, 60 contains a system bus 79, where a bus is a set of hardware lines used for data transfer among the components of a computer or processing system.
  • The system bus 79 is essentially a shared conduit that connects the different elements of a computer system (e.g., processor, disk storage, memory, input/output ports, network ports, etc.) and enables the transfer of information between those elements.
  • Attached to the system bus 79 is an I/O device interface 82 for connecting various input and output devices (e.g., keyboard, mouse, displays, printers, speakers, etc.) to the computer 50, 60.
  • A network interface 86 allows the computer to connect to various other devices attached to a network (e.g., network 70 of Fig. 9).
  • Memory 90 provides volatile storage for computer software instructions 92 and data 94 used to implement an embodiment of the present invention (e.g., comparison module, CAN bus, and verification table generation code detailed above).
  • Disk storage 95 provides non-volatile storage for computer software instructions 92 and data 94 used to implement an embodiment of the present invention.
  • A central processor unit 84 is also attached to the system bus 79 and provides for the execution of computer instructions.
  • The processor routines 92 and data 94 are a computer program product (generally referenced 92), including a non-transitory computer-readable medium (e.g., a removable storage medium such as one or more DVD-ROMs, CD-ROMs, diskettes, tapes, etc.) that provides at least a portion of the software instructions for the invention system.
  • The computer program product 92 can be installed by any suitable software installation procedure, as is well known in the art.
  • At least a portion of the software instructions may also be downloaded over a cable communication and/or wireless connection.
  • The invention programs can be a computer program propagated signal product embodied on a propagated signal on a propagation medium (e.g., a radio wave, an infrared wave, a laser wave, a sound wave, or an electrical wave propagated over a global network such as the Internet, or other network(s)).
  • Such carrier medium or signals may be employed to provide at least a portion of the software instructions for the present invention routines/program 92.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

In an embodiment of the present invention, a method receives, at a first node of a plurality of nodes, each node being connected to a common network bus, a health message from a second node. The health message includes a log of health messages from other nodes. Each node sends health messages at a frequency known to the plurality of nodes. The method further compares, at the first node, the log of messages from other nodes in the received health message to a log of previously received health messages from other nodes stored by the first node. Based on the comparison, a health status of each node is determined. Through embodiments of the present method and system, computing units can form dynamic fault-tolerant groups.
PCT/US2016/066862 2016-12-15 2016-12-15 Groupe opérationnel tolérant aux défaillances sur un réseau distribué WO2018111272A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SE1950600A SE1950600A1 (en) 2016-12-15 2016-12-15 Fault-tolerant operational group on a distributed network
DE112016007522.7T DE112016007522T5 (de) 2016-12-15 2016-12-15 Fehlertolerante Betriebsgruppe bei einem verteilten Netzwerk
PCT/US2016/066862 WO2018111272A1 (fr) 2016-12-15 2016-12-15 Groupe opérationnel tolérant aux défaillances sur un réseau distribué

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2016/066862 WO2018111272A1 (fr) 2016-12-15 2016-12-15 Groupe opérationnel tolérant aux défaillances sur un réseau distribué

Publications (1)

Publication Number Publication Date
WO2018111272A1 true WO2018111272A1 (fr) 2018-06-21

Family

ID=57796983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/066862 WO2018111272A1 (fr) 2016-12-15 2016-12-15 Groupe opérationnel tolérant aux défaillances sur un réseau distribué

Country Status (3)

Country Link
DE (1) DE112016007522T5 (fr)
SE (1) SE1950600A1 (fr)
WO (1) WO2018111272A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040167912A1 (en) * 2003-02-20 2004-08-26 International Business Machines Corporation Unified logging service for distributed applications
US20140043962A1 (en) * 2012-08-07 2014-02-13 Qualcomm Incorporated STATISTICS AND FAILURE DETECTION IN A NETWORK ON A CHIP (NoC) NETWORK
US8972772B2 (en) 2011-02-24 2015-03-03 The Charles Stark Draper Laboratory, Inc. System and method for duplexed replicated computing
EP2953295A1 (fr) * 2014-06-06 2015-12-09 Nokia Solutions and Networks Oy Synchronisation d'événements delta automatique dans de multiples environnements agent-gestionnaire

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ISO 11898-1, 2003
ISO 11898-1, 2015

Also Published As

Publication number Publication date
SE1950600A1 (en) 2019-05-21
DE112016007522T5 (de) 2019-10-10

Similar Documents

Publication Publication Date Title
US10356203B2 (en) Fault-tolerant operational group on a distributed network
US10955847B2 (en) Autonomous vehicle interface system
EP2238026B1 (fr) Systeme distribue de commande de vol
US20160034363A1 (en) Method for handling faults in a central control device, and control device
JP5313023B2 (ja) 通信ネットワーク内のノードステータス監視方法及び監視装置
US20080189352A1 (en) Complex event processing system having multiple redundant event processing engines
Kopetz A comparison of CAN and TTP
US7809863B2 (en) Monitor processor authentication key for critical data
US8606460B2 (en) Method, an electrical system, a digital control module, and an actuator control module in a vehicle
CN104285190A (zh) 故障安全发现和地址分配
WO2023077968A1 (fr) Procédé de communication embarqué, appareil et dispositif, et support de stockage
US12093006B2 (en) Method and device for controlling a driving function
Kopetz et al. Tolerating arbitrary node failures in the time-triggered architecture
WO2018111272A1 (fr) Groupe opérationnel tolérant aux défaillances sur un réseau distribué
US8321495B2 (en) Byzantine fault-tolerance in distributed computing networks
WO2009147066A1 (fr) Synchronisation d'informations d'erreur de dispositif entre des noeuds
US8843218B2 (en) Method and system for limited time fault tolerant control of actuators based on pre-computed values
Bergmiller et al. Probabilistic fault detection and handling algorithm for testing stability control systems with a drive-by-wire vehicle
JP2023546475A (ja) データ処理のためのデータ処理ネットワーク
Leu et al. Robustness analysis of the FlexRay system through fault tree analysis
Leu et al. A bayesian network reliability modeling for flexray systems
Krishnan et al. A comparison of AFDX and 1553B protocols using formal verification
Debouk et al. Architecture of by-wire systems design elements and comparative methodology
Gaujal et al. Optimal replica allocation for TTP/C based systems
AU2020201518A1 (en) Method and system for a geographical hot redundancy

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16826231

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16826231

Country of ref document: EP

Kind code of ref document: A1