GB2238143A - Voters for fault-tolerant computer systems - Google Patents

Info

Publication number
GB2238143A
Authority
GB
United Kingdom
Prior art keywords
buffer
buffers
circuit
voter
slice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB8922773A
Other versions
GB8922773D0 (en
Inventor
John Standeven
Martin John Colley
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Essex
Original Assignee
University of Essex
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Essex
Priority to GB8922773A
Publication of GB8922773D0
Publication of GB2238143A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/18Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/187Voting techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/18Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/181Eliminating the failing redundant component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A hardware voter suitable for use in a multi-channel computer system where nominally identical asynchronous messages are carried on the channels has (preferably FIFO) buffers 10 as channels, each buffer having an associated data sampler 11. A voting circuit 13 receives data from buffers 10, one slice (i.e. from each channel as many bits as the channel width) at a time and provides an output slice determined on the basis of the majority of the received slices. A circuit 15 controls the supply of the data from the buffers to the voting circuit, and also inhibits operation of a buffer in the event that its slice disagrees with those of the other channels. The control circuit may also inhibit operation of a buffer if it has no data slice to transfer to the voting circuit a predetermined time after the appearance of a data slice on one of the other buffers, ready for transfer to the voting circuit.

Description

VOTERS FOR FAULT-TOLERANT COMPUTER SYSTEMS
This invention relates to a hardware voter for use in a fault-tolerant computer system.
There is an increasing awareness of the need to make computer systems resilient so that an acceptable level of service can be maintained in the presence of one or more faults. Applications where some fault tolerance is desirable or essential include, for example, array processors, flight guidance systems and process control systems. When an application must keep running with only a minimal performance degradation despite the presence of faults in the computer system, there is implied substantial parallelism and redundancy in the computer system. Triplication of the computer system enables the masking of any single point of failure, and higher levels of protection can be achieved at proportionally greater costs by increasing the redundancy still further.
Software techniques can be used to make a system fault-tolerant without requiring substantial redundancy in the hardware. These techniques unfortunately require so significant an overhead that performance may be affected to such an extent as to be unacceptable for a real-time application where time-critical processes are being handled. As a consequence, for such applications it is generally necessary to employ parallel processing on separate processors of multiple copies of the program code and then to use a voting algorithm to determine an agreed result. Such software voting algorithms necessarily involve a substantial communications overhead between the processors operating in parallel, and this significantly reduces the performance, as compared to a non-parallel system. Moreover, should a fault arise, alternative message routing will thereafter be required to by-pass the faulty processor, and this will result in quite different path times for the parallel channels. The consequence is that messages will be arriving out of synchronism and these must be sorted before further voting can take place.
It is a principal aim of the present invention to provide a hardware voter for use in a fault-tolerant computer system, which voter overcomes the disadvantages discussed above of the known voting techniques as currently employed in parallel processing systems.
Accordingly, this invention provides a hardware voter for use in a fault-tolerant computer system, which voter comprises at least three buffers each arranged to receive a separate incoming digital signal from each of a like number of slice-wide processing channels which are permitted to operate asynchronously, a voting circuit arranged to receive at the same time one slice from each of the buffers and to provide an output slice determined on the basis of the majority of the received slices, and a control circuit adapted to control the supply of slices to the voting circuit from the buffers and to render inactive the buffer of a channel should the received slice therefrom disagree with the received slices of the other channels.
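The behaviour of the voting circuit and the disabling of a disagreeing channel can be illustrated in software. The following is only a minimal sketch, not the hardware of the invention: the function name vote_slice, the use of an integer to hold one slice, and the rule of raising an error when no strict majority exists are all assumptions made for the illustration.

```python
from collections import Counter


def vote_slice(slices, active):
    """Vote on one slice per channel (illustrative sketch only).

    slices: one hashable value per channel, e.g. an int holding the slice bits.
    active: one bool per channel; inactive channels take no part in the vote.
    Returns (majority_slice, new_active), where any active channel whose
    slice disagreed with the majority is marked inactive.
    """
    live = [i for i, is_active in enumerate(active) if is_active]
    if not live:
        raise ValueError("no active channels")
    counts = Counter(slices[i] for i in live)
    winner, votes = counts.most_common(1)[0]
    # Assumption: the sketch refuses to output a slice without a strict majority.
    if 2 * votes <= len(live):
        raise ValueError("no strict majority among the active channels")
    new_active = [is_active and slices[i] == winner
                  for i, is_active in enumerate(active)]
    return winner, new_active
```

With three channels carrying 0b1011, 0b1011 and 0b0011, the sketch outputs 0b1011 and marks the third channel inactive; later calls then vote on the two remaining channels only.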
It will be appreciated that the hardware voter of this invention may be inserted in a replicated (or parallel) processor digital computing system at any point where at least three separate channels are expected to be carrying the same messages but not necessarily at the same times, as the channels may operate asynchronously. In a multi-channel system, the output of the voter may feed a like number of further processing channels with identical majority-agreed messages, derived from the messages of the three input channels, to allow processing to be continued in a fault-tolerant manner. Alternatively, a like number of voters as channels may be provided, each producing an identical output on the basis of incoming signals from all of the channels, and each feeding a further processing stage of the associated channel. In this latter case, the computer system will be resilient also to a fault occurring in any one of the voters themselves.
Each channel of the computer system may be just one bit wide, or may be more than one bit wide.
Irrespective of the bit-width of each channel, the single bit or the set of bits (as appropriate) is referred to herein as a 'slice', and the data carried on any one channel is made up from a plurality of such slices running serially.
In the voter of this invention, it is preferred for each buffer to be a FIFO circuit and for each buffer to be adapted to provide a status output which is supplied to the control circuit, to indicate data is present in the buffer. The control circuit preferably is arranged to respond to all of the buffers indicating that data is present so as then to transfer the next data slice from each of the buffers to the voting circuit.
Most preferably the control circuit includes a timer, which timer commences the determination of a time-out period following any one of the buffer status outputs indicating that data is present; any buffer which indicates a status different from the others at the end of the predetermined time-out period may then be rendered inactive on the basis that a fault is likely to be present on that channel.
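The time-out rule can be sketched as a check made at the moment the timer expires. This is a hedged illustration only: the function name out_of_step_buffers is hypothetical, and the sketch assumes an odd number of buffers so that "different from the others" always identifies a minority; it raises an error on a tie rather than guessing.

```python
def out_of_step_buffers(data_present):
    """Return the indices of buffers whose data-present status differs
    from that of the majority when the time-out period expires.

    data_present: one bool per buffer, sampled at time-out expiry.
    """
    present = sum(data_present)
    absent = len(data_present) - present
    if present == absent:
        # Assumption: the text does not say how a tie is resolved.
        raise ValueError("tie: no majority status to compare against")
    majority_status = present > absent
    return [i for i, status in enumerate(data_present)
            if status != majority_status]
```

For example, flags of (present, present, empty) single out the third buffer, while (present, empty, empty) single out the first, since it alone reported data.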
In the hardware voter of this invention, each voter may be provided with an associated sampler circuit, to facilitate the determination of the information content of the incoming signal. As with all asynchronous communications, it is necessary to oversample the input signal, at least at twice the data rate of the communication link. Typically, the sampler circuit may sample at four times the link data rate, in order to prevent data loss due to edge jitter of the link data.
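The benefit of sampling at four times the link data rate can be shown with a small sketch. It assumes a line that idles at 0 and a packet that begins with a 1 (neither is stated here), and the name recover_bits is hypothetical. After detecting the first high sample, the sketch reads the sample nearest the centre of each bit period, so an edge that arrives one sample late does not corrupt the recovered bit.

```python
OVERSAMPLE = 4  # samples taken per link bit period


def recover_bits(samples, nbits):
    """Recover nbits link bits from a 4x oversampled line.

    samples: list of 0/1 line samples; assumed to idle at 0 and to begin
             the packet with a 1 (assumptions made for this sketch).
    """
    start = samples.index(1)          # first sample of the first bit
    centre = start + OVERSAMPLE // 2  # a sample near the middle of that bit
    return [samples[centre + k * OVERSAMPLE] for k in range(nbits)]
```

In the usage below the falling edge of the third bit arrives one sample late, yet the four bits are still recovered as 1, 1, 0, 1.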
Transputers have specifically been designed to assist in the creation of highly parallel computing systems. A transputer has flexible interconnections which make it an attractive processor on which to build a fault-tolerant system. Most conveniently, multiple transputers may be employed to accommodate the parallelism in a computer system required for an application and also to provide redundancy for fault tolerance. Voters of this invention may conveniently be used in the communications links between the transputers of the parallel channels, to ensure that a majority-voted signal is passed on to the next transputer in each channel, even should a fault have arisen in the immediately preceding processor of any one of the channels.
Specifically in the case of transputers, for communications these have high speed asynchronous serial data links. The communications are in the form of messages which consist of a series of data packets. For each data packet that is sent by a transputer, an acknowledgement must be received by that transputer before the next data packet can be output. The format of a data packet comprises a start bit, a packet type bit (which is always ONE for a data packet), eight data bits and a stop bit. An acknowledgement packet has only a start bit and a packet type bit, which is always ZERO. The link protocol allows data and acknowledgements to be interleaved in an unrestricted manner, but after only two bit times it is possible to determine whether the following bits are data.
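The packet formats just described can be sketched as bit lists. The packet layout (start bit, type bit, eight data bits, stop bit) is taken from the text; the start bit value of 1, the stop bit value of 0 and the least-significant-bit-first data order are assumptions for the illustration, as are the function names.

```python
START_BIT = 1  # assumption: value of the start bit
STOP_BIT = 0   # assumption: value of the stop bit
TYPE_DATA = 1  # from the text: always ONE for a data packet
TYPE_ACK = 0   # from the text: always ZERO for an acknowledgement


def encode_data(byte):
    """Build an 11-bit data packet; data bits sent LSB first (assumption)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [START_BIT, TYPE_DATA] + data_bits + [STOP_BIT]


def encode_ack():
    """Build a 2-bit acknowledgement packet."""
    return [START_BIT, TYPE_ACK]


def classify(bits):
    """Decide from the first two bits whether a packet is data or ack."""
    if bits[0] != START_BIT:
        raise ValueError("packet does not begin with a start bit")
    return "data" if bits[1] == TYPE_DATA else "ack"


def decode_data(bits):
    """Extract the data byte from an 11-bit data packet."""
    if len(bits) != 11 or classify(bits) != "data" or bits[10] != STOP_BIT:
        raise ValueError("not a well-formed data packet")
    return sum(bit << i for i, bit in enumerate(bits[2:10]))
```

Because the type bit is the second bit of every packet, classify needs only two bit times to decide how the remainder should be handled.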
When the voters of this invention are used in the communications links between parallel-operating transputers having communication links as described above, it is possible for the voters to operate not only on the data fields, as determined from the second bit of a packet, but also on the acknowledgements transmitted along the communications links. However, appropriate filtering may be provided to allow the unvoted transfer of acknowledgements and for the voters to operate only on actual data of a data packet.
This invention extends to a method for deriving a majority-agreed digital signal from at least three notionally-corresponding incoming asynchronous digital signals, in which method the incoming signals are supplied to a corresponding number of buffers, the signals are transferred in parallel but one slice at a time from the buffers to a voting circuit, and the voting circuit produces an output slice determined on the basis of the majority of the slices transferred thereto from the buffers, a control circuit serving to control the transfer of slices to the voting circuit and to render inactive the buffer associated with an incoming signal should any one slice of that signal not agree with the corresponding slices of the other buffers.
It will be appreciated that the buffer of any one channel may be rendered inactive by actually disabling that buffer, or otherwise by inhibiting its operation.
For example, a sampler circuit on the buffer input could be rendered inactive, or the transfer of the slices from the buffer to the voting circuit could be inhibited.
From the foregoing description, it will be appreciated that the method of this invention as defined above particularly lends itself to voting in a fault-tolerant computer system constructed from a plurality of transputers, which provides both the parallelism in the application and the redundancy for fault tolerance.
By way of example only, one specific embodiment of a voter constructed and arranged to operate in accordance with this invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of the voter; and
Figure 2 is a diagram showing a possible connection arrangement for a plurality of voters each arranged as shown in Figure 1, when incorporated in a transputer computer system.
Referring initially to Figure 1, it can be seen that the voter comprises a plurality of (in this example, three) data packet buffers 10, each of which is a conventional FIFO buffer. Three sampler circuits 11 are provided, one for each buffer 10, each sampler circuit having an input line 12 for an asynchronous serial communication link. The output of each buffer 10 is supplied to a voting circuit 13 comprising appropriate logic devices and having a single data output line 14.
A controller/timer circuit 15 controls operation of the buffers and the voting circuit, and receives buffer status information from each buffer 10. A clock controls the operations of the controller/timer 15 and the samplers 11.
In operation, the samplers 11 are clocked at an appropriate rate to sample the incoming digital signals, in order to ensure there is no data loss due to edge jitter of the link data. Each sampler also is arranged to determine whether the incoming signal is a data packet or an acknowledgement packet (as described above) in order that the packets may be processed appropriately by the buffers and voting circuit. As a consequence, each sampler 11 is connected to the associated buffer 10 by means of two lines 16 and 17, respectively for data and acknowledgement signals, and by a further clock (or read) line.
Since a transputer demands an acknowledgement for every data packet transmitted and will suspend further transmission until an acknowledgement is received, there is no need for the buffers to be able to hold more than a single byte and data can never be lost due to overflow. As soon as a buffer has accepted the first bit of a data packet, its data present signal becomes TRUE, and this signal is passed to the controller/timer circuit 15.
The controller/timer circuit operates by starting a time-out counter as soon as any one of the buffers 10 signals data present. Data will be extracted from the buffers when either all of the active buffers indicate data present or when the time-out period expires. A buffer associated with an input which has previously been prevented from voting by an error condition may at this time still be deemed to be inactive and so excluded from this control, so that voting can always take place with the minimum possible delay.
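The release rule of the controller/timer can be sketched as a function of the arrival times on each channel. This is an illustration under stated assumptions rather than the circuit itself: times are in arbitrary clock ticks, None stands for a channel on which nothing arrives, and the name release_time is hypothetical.

```python
def release_time(arrivals, active, timeout):
    """Return the tick at which data is extracted from the buffers.

    arrivals: per-channel tick at which data present goes TRUE, or None
              if the channel never signals data present.
    active:   per-channel bool; inactive channels are excluded from control.
    timeout:  length of the time-out period in ticks.
    Returns None if no active channel ever signals data present.
    """
    live = [t for t, is_active in zip(arrivals, active)
            if is_active and t is not None]
    if not live:
        return None
    deadline = min(live) + timeout  # timer starts at the first data present
    all_arrived = all(t is not None
                      for t, is_active in zip(arrivals, active) if is_active)
    if all_arrived and max(live) <= deadline:
        return max(live)            # every active buffer is ready
    return deadline                 # time-out period expired first
```

With arrivals at ticks 0, 3 and 5 and a time-out of 22 ticks, data is extracted at tick 5. If the third channel stays silent the wait lasts until tick 22, but once that channel has been made inactive the same inputs are released at tick 3, which is the minimum-delay behaviour described above.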
The voting circuit 13 produces the majority consensus, one bit at a time. If a buffer is forced to the inactive state because of an error condition or because the buffer remains empty at the expiry of the time-out period, then that channel of the voter is inhibited for some pre-set time period to prevent further erroneous voting during that period. Each time a new set of data bits is accessed from the buffers and presented to the voting circuit, a vote enable signal is generated to ensure synchronous operation of the voting circuit.
An input channel is considered to be in error and excluded from future voting for at least some pre-set time period if one of the following conditions arises:
(a) it does not agree with the majority consensus for a particular output bit value;
(b) no data was received before the expiry of the time-out period;
(c) that channel received data but the others did not by the expiry of the time-out period; or
(d) its buffer signalled empty before the end of the packet.
An error caused by any of these conditions will be latched by the voting circuit as a fault status which will remain until corrective action is taken.
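The latching of a fault status for conditions (a) to (d) can be sketched as a small record per channel. The class and member names are hypothetical, and the choice to keep the first recorded fault rather than the most recent is an assumption; the text states only that the status remains until corrective action is taken.

```python
from enum import Enum


class Fault(Enum):
    DISAGREED = "a"    # slice differed from the majority consensus
    NO_DATA = "b"      # nothing received before the time-out expired
    LONE_DATA = "c"    # this channel had data but the others did not
    EARLY_EMPTY = "d"  # buffer signalled empty before the end of the packet


class FaultLatch:
    """Per-channel latched fault status (illustrative sketch)."""

    def __init__(self, channels):
        self.status = [None] * channels

    def record(self, channel, fault):
        # Assumption: the first fault seen is the one that stays latched.
        if self.status[channel] is None:
            self.status[channel] = fault

    def active(self):
        """Channels with no latched fault may still take part in voting."""
        return [fault is None for fault in self.status]

    def clear(self, channel):
        """Corrective action: release the latch for one channel."""
        self.status[channel] = None
```

A channel whose fault has been latched stays out of the active list however many further faults are recorded against it, and returns only after clear is called.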
The length of the time-out period determines the maximum allowable skew between the co-operating processors; two data packet times could be used.
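As a worked example of this figure, a data packet as described earlier is 11 bits long (start, type, eight data bits, stop), so two packet times is 22 bit times. The sketch below converts that to sampler clock ticks and to seconds; the function names are hypothetical and the 10 Mbit/s rate used in the usage note is only an example value, not taken from this description.

```python
DATA_PACKET_BITS = 1 + 1 + 8 + 1  # start, type, eight data bits, stop


def timeout_ticks(packet_times=2, samples_per_bit=4):
    """Time-out length in sampler clock ticks."""
    return packet_times * DATA_PACKET_BITS * samples_per_bit


def timeout_seconds(link_rate_bits_per_s, packet_times=2):
    """Time-out length in seconds for a given link data rate."""
    return packet_times * DATA_PACKET_BITS / link_rate_bits_per_s
```

With the sampler running at four times the link rate, two packet times correspond to 88 sampler ticks; at an example link rate of 10 Mbit/s that is 2.2 microseconds of allowable skew.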
Referring now to Figure 2, there is shown a part of a parallel processing fault-tolerant transputer system employing a plurality of voters each as described above with reference to Figure 1, between each stage of the transputer system. As can be seen, stage M of the system employs three processors P1, P2 and P3, each of which provides an output signal to three parallel voters V1, V2 and V3, each as described above. Each voter provides an individual - but identical, so long as all of the voters are operating correctly - output signal to the transputers P4, P5 and P6 of stage (M + 1) of the transputer system and in a similar manner, each of those processors supplies an output signal to three further voters V4, V5 and V6. It will be appreciated that by interconnecting processors and voters in this way, should a fault arise in any one processor or voter, the overall system will continue to operate with no degradation in the system performance. Moreover all the channels will remain fully operational except for where a fault has arisen. Also, using three channels in parallel as illustrated, the system will still remain fully operational should one fault (but no more than one fault) arise at any one or more of the stages of the overall system. Further parallel channels may be provided as required to give tolerance to more than one fault occurring at any one stage of the system. Also, in a practical system further voters would be provided to vote on acknowledgements passed from processors P4-P6 back to processors P1-P3.
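The masking behaviour of the Figure 2 arrangement can be illustrated with a small simulation. It is a sketch under simplifying assumptions: each message is a single integer, a faulty voter is modelled as emitting a fixed wrong value, each stage (M + 1) processor merely adds one to its input, and the names majority and voter_bank are hypothetical.

```python
from collections import Counter


def majority(values):
    """Return the strict-majority value, or raise if there is none."""
    winner, votes = Counter(values).most_common(1)[0]
    if 2 * votes <= len(values):
        raise ValueError("no strict majority")
    return winner


def voter_bank(processor_outputs, voters_ok=(True, True, True), bad_value=0):
    """Each voter sees every processor output and feeds one channel.

    A healthy voter emits the majority value; a faulty voter is modelled
    (as an assumption) by emitting bad_value instead.
    """
    return [majority(processor_outputs) if ok else bad_value
            for ok in voters_ok]


# Stage M: P3 is faulty, yet V1-V3 all deliver the agreed value.
stage_m_inputs = voter_bank([42, 42, 7])

# Stage M + 1: V2 is faulty, so P5 computes on a wrong input ...
faulty_bank = voter_bank([42, 42, 42], voters_ok=(True, False, True))
next_outputs = [x + 1 for x in faulty_bank]

# ... but voters V4-V6 mask P5's wrong result.
stage_m2_inputs = voter_bank(next_outputs)
```

The simulation shows one fault per stage being masked: the faulty processor at stage M and the faulty voter feeding stage (M + 1) each leave the following bank of voters with two agreeing inputs out of three.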

Claims (18)

1. A hardware voter for use in a fault-tolerant computer system, which voter comprises at least three buffers each arranged to receive a separate incoming digital signal from each of a like number of slice-wide processing channels which are permitted to operate asynchronously, a voting circuit arranged to receive at the same time one slice (as defined herein) from each of the buffers and to provide an output slice determined on the basis of the majority of the received slices, and a control circuit adapted to control the supply of slices to the voting circuit from the buffers and to render inactive the buffer of a channel should the received slice therefrom disagree with the received slices of the other channels.
2. A hardware voter according to claim 1, wherein each buffer is a FIFO circuit, and each buffer is adapted to provide a status output which is supplied to the control circuit, to indicate data is present in the buffer.
3. A hardware voter according to claim 2, wherein the control circuit is arranged to respond to all of the buffers indicating that data is present, so as then to transfer the next data slice from each of the buffers to the voting circuit.
4. A hardware voter according to claim 2 or claim 3, wherein the control circuit includes a timer, which timer commences the determination of a time-out period following any one of the buffer status outputs indicating that data is present.
5. A hardware voter according to claim 4, wherein the control circuit renders inactive any buffer which indicates a status different from the others at the end of the predetermined time-out period.
6. A hardware voter according to any of the preceding claims, wherein each voter is provided with an associated sampler circuit, to facilitate the determination of the information content of the incoming signal.
7. A hardware voter according to claim 6, wherein the sampler circuit samples at four times the link data rate.
8. A hardware voter according to claim 1 and substantially as hereinbefore described, with reference to and as illustrated in the accompanying drawings.
9. A replicated-processor digital computing system having at least three channels along each of which there is a point at which the same message may be expected as at said points of the other channels, in which system a hardware voter according to any of claims 1 to 8 is provided, connected to said points.
10. A computing system according to claim 9, wherein the output of the voter is supplied to each channel.
11. A computing system according to claim 9, wherein there is a like number of voters as channels, each associated with a respective channel, each voter receiving an input from all of the channels but supplying an output to only its associated channel.
12. A fault-tolerant transputer system having a plurality of transputers with parallel data links therebetween, a hardware voter being provided in the data links to pass majority-voted signals from one transputer to another.
13. A fault-tolerant transputer system according to claim 12, wherein filters are provided to pass on acknowledgment packets without voting, whilst allowing voting on data packets.
14. A method for deriving a majority-agreed digital signal from at least three notionally-corresponding incoming asynchronous digital signals, in which method the incoming signals are supplied to a corresponding number of buffers, the signals are transferred in parallel but one slice (as defined herein) at a time from the buffers to a voting circuit, and the voting circuit produces an output slice determined on the basis of the majority of the slices transferred thereto from the buffers, a control circuit serving to control the transfer of slices to the voting circuit and to render inactive the buffer associated with an incoming signal should any one slice of that signal not agree with the corresponding slices of the other buffers.
15. A method according to claim 14, wherein the buffer of any one channel is rendered inactive by disabling operation of that buffer.
16. A method according to claim 14, wherein there is provided a sampler circuit on the buffer input, which sampler circuit is rendered inactive to disable operation of the associated buffer.
17. A method according to claim 14, wherein a buffer is rendered inactive by inhibiting the transfer of slices from that buffer to the voting circuit.
18. A method for deriving a majority-agreed digital signal substantially as hereinbefore described, with reference to and as illustrated in the accompanying drawings.
GB8922773A 1989-10-10 1989-10-10 Voters for fault-tolerant computer systems Withdrawn GB2238143A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB8922773A GB2238143A (en) 1989-10-10 1989-10-10 Voters for fault-tolerant computer systems

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB8922773A GB2238143A (en) 1989-10-10 1989-10-10 Voters for fault-tolerant computer systems

Publications (2)

Publication Number Publication Date
GB8922773D0 GB8922773D0 (en) 1989-11-22
GB2238143A true GB2238143A (en) 1991-05-22

Family

ID=10664319

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8922773A Withdrawn GB2238143A (en) 1989-10-10 1989-10-10 Voters for fault-tolerant computer systems

Country Status (1)

Country Link
GB (1) GB2238143A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0969373A2 (en) * 1998-06-30 2000-01-05 Sun Microsystems, Inc. I/O handling for a fault tolerant multiprocessor computer system
EP0969369A2 (en) * 1998-06-30 2000-01-05 Sun Microsystems, Inc. Control of multiple computer processes
DE102004032405A1 (en) * 2004-07-03 2006-02-09 Diehl Bgt Defence Gmbh & Co. Kg Space-enabled computer architecture
CN105700354A (en) * 2016-01-31 2016-06-22 南通大学 Intelligent sampling and detecting system with fault accommodation function
US9699009B1 (en) 2016-06-30 2017-07-04 International Business Machines Corporation Dual-mode non-return-to-zero (NRZ)/ four-level pulse amplitude modulation (PAM4) receiver with digitally enhanced NRZ sensitivity

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113655707B (en) * 2021-07-29 2023-12-12 浙江中控技术股份有限公司 Voting control method and device of safety instrument system and electronic device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3783250A (en) * 1972-02-25 1974-01-01 Nasa Adaptive voting computer system
WO1987007793A1 (en) * 1986-06-13 1987-12-17 Valtion Teknillinen Tutkimuskeskus Method for realizing a fault-tolerant electronic system and a corresponding system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0969373A2 (en) * 1998-06-30 2000-01-05 Sun Microsystems, Inc. I/O handling for a fault tolerant multiprocessor computer system
EP0969369A2 (en) * 1998-06-30 2000-01-05 Sun Microsystems, Inc. Control of multiple computer processes
EP0969373A3 (en) * 1998-06-30 2003-05-21 Sun Microsystems, Inc. I/O handling for a fault tolerant multiprocessor computer system
EP0969369A3 (en) * 1998-06-30 2004-01-07 Sun Microsystems, Inc. Control of multiple computer processes
DE102004032405A1 (en) * 2004-07-03 2006-02-09 Diehl Bgt Defence Gmbh & Co. Kg Space-enabled computer architecture
CN105700354A (en) * 2016-01-31 2016-06-22 南通大学 Intelligent sampling and detecting system with fault accommodation function
CN105700354B (en) * 2016-01-31 2018-08-07 南通大学 The intellegent sampling and detecting system of adjustable failure
US9699009B1 (en) 2016-06-30 2017-07-04 International Business Machines Corporation Dual-mode non-return-to-zero (NRZ)/ four-level pulse amplitude modulation (PAM4) receiver with digitally enhanced NRZ sensitivity

Also Published As

Publication number Publication date
GB8922773D0 (en) 1989-11-22

Similar Documents

Publication Publication Date Title
EP0123507B1 (en) Data communication system and apparatus
EP0381334B1 (en) Apparatus for management, comparison, and correction of redundant digital data
KR970005033B1 (en) Frame group transmission and reception for parallel/serial bues
US4937741A (en) Synchronization of fault-tolerant parallel processing systems
US5349654A (en) Fault tolerant data exchange unit
EP0177690A2 (en) Method for error detection and correction by majority voting
EP0204449A2 (en) Method for multiprocessor communications
EP1198105A2 (en) High speed transmission line interface
JPH0544043B2 (en)
JPH0241221B2 (en)
CA2091993A1 (en) Fault tolerant computer system
EP0310110A2 (en) (1+N) hitless channel switching system
US7496681B2 (en) System and method for extending virtual synchrony to wide area networks
GB2238143A (en) Voters for fault-tolerant computer systems
EP0623266B1 (en) An atm plane merging filter for atm switches and the method thereof
US5278843A (en) Multiple processor system and output administration method thereof
US4561088A (en) Communication system bypass architecture
DK153605B (en) DEVICE FOR MONITORING OF THE TIMING SIGNALS IN A DIGITAL PLANT
JPH08241217A (en) Information processor
EP0414385B1 (en) Communication apparatus for monitoring the amount of transmitted data
JP2010206775A (en) Parallel/serial communication method
Standeven et al. Hardware voter for fault-tolerant transputer systems
EP1988469B1 (en) Error control device
JP2551143B2 (en) ATM switch communication path failure detection system
Silva et al. Master replication and bus error detection in FTT-CAN with multiple buses

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)