EP0164414A1 - Computer processor controller - Google Patents

Computer processor controller

Info

Publication number
EP0164414A1
Authority
EP
European Patent Office
Legal status
Withdrawn
Application number
EP85900389A
Other languages
German (de)
French (fr)
Other versions
EP0164414A4 (en)
Inventor
William W. Kolb
Neil A. Katin
Richard D. Mcmurray
Current Assignee
PARALLEL COMPUTERS Inc
Original Assignee
PARALLEL COMPUTERS Inc
Application filed by PARALLEL COMPUTERS Inc
Publication of EP0164414A1
Publication of EP0164414A4
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1675 Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/1687 Temporal synchronisation or re-synchronisation of redundant processing components at event level, e.g. by interrupt or result of polling

Definitions

  • The present invention relates generally to control apparatus for a computer central processing unit, and more particularly to a processor controller for fault-tolerant computers having at least two central processors operating simultaneously.
  • Fault-tolerant computers typically utilize two or more separate central processing units. If one processing unit, or processor, fails, the remaining processor or processors are relied upon to keep the computer functioning.
  • Various techniques have been used to detect processor failures.
  • One such technique, commonly referred to as the lockstep method, utilizes multiple processors executing identical code. The processors are all clocked by a common clock source so that, provided the processors are operating properly, every bus cycle of one processor is identical to the corresponding bus cycle of the others. The data, address or control outputs of the processors are periodically compared to verify proper operation. If the comparison reveals a mismatch, steps are taken to determine which processor has failed so that it may be disabled.
  • Another technique for verifying proper processor operation is sometimes referred to as the checkpoint approach.
  • In this approach, each of the processors processes different code. Periodically, each processor transmits a signal to the other processor indicating that the transmitting processor is operating properly. Steps are taken to ensure that a defective processor will not issue such an OK signal. If the signal is not received within the expected period, a fault is assumed to have occurred in the processor. In addition, data from each processor are periodically made available to another processor (i.e., a checkpoint) so that an operating processor can assume the operations of a failed processor, at a reduced rate, without loss of data.
  • The above-described techniques for verifying proper processor operation have various shortcomings.
  • The lockstep approach cannot be implemented using commercially available hardware and operating systems without major modifications.
  • Because the lockstep approach utilizes a common clock, a failure of the clock source will result in a total failure of the system.
  • The checkpoint approach requires excess processor capacity so that system response time is maintained following a processor failure. Without such excess capacity, the number of functions carried out by the system must be reduced to maintain the same response time.
  • The checkpoint approach cannot be applied to most existing hardware and operating systems without relatively major modifications.
  • In addition, existing application programs typically must be extensively modified or rewritten to be fault tolerant.
  • The present system overcomes these shortcomings of the prior art approaches.
  • A processor controller is disclosed for a computer system having a plurality of processors which execute programmed instructions.
  • An exemplary computer system would be a fault-tolerant computer having redundant processors which execute substantially identical instructions.
  • The subject controller receives asynchronous external signals, primarily interrupt signals, and forwards them to the processors in a predetermined manner.
  • The controller includes apparatus for determining the position of each processor in executing the code associated with that processor.
  • This position is referred to herein as virtual time.
  • The position can be determined, for example, by monitoring the number of bus cycles each processor has executed. This may be accomplished by counting the address strobes generated by each processor, although other signals may be used for this purpose.
  • The number of bus cycles that a particular processor has executed thus determines the location of that processor in virtual time.
  • The processors will typically execute their respective identical programs at different rates. Thus, at any given point in actual time, the processors are likely to be at different positions in virtual time.
  • When an asynchronous signal, such as an interrupt, is received, the subject controller examines the virtual time of each processor. If the virtual times are identical, the controller notifies each processor of the signal. If the processors are at different virtual times, one embodiment of the subject controller refrains from forwarding the asynchronous signal to either processor. Instead, the processor leading in virtual time is halted and the trailing processor is permitted to execute code until it has reached the leader in virtual time. Once the virtual times are the same, the asynchronous signal is presented to both processors at the same actual time.
  • In another embodiment, when an asynchronous signal such as an interrupt is received, the signal is forwarded to the lead processor immediately.
  • The virtual time of the leading processor at that moment is stored.
  • The lagging processor is then permitted to execute code until its virtual time matches the stored virtual time. At that point, the lagging processor is notified of the asynchronous signal.
  • In this embodiment, the processors are notified of the asynchronous signal at the same virtual time, although the actual times of notification will usually differ.
  • The subject controller preferably also includes means for verifying proper processor operation.
  • For this purpose, the two processors are programmed to periodically request synchronization by the controller. When each such request is made, the requesting processor is immediately halted.
  • The lead processor will be the first to issue a request to the controller and thus will be halted first.
  • The following processor will continue to execute code until it also requests synchronization, at which time both processors are halted. Since the processors are supposed to be executing identical code, they should be halted at the same virtual time.
  • The subject controller verifies that the virtual times are the same and then releases the processors. If the virtual times are not identical, an error is reported.
  • Figure 1 is a simplified block diagram of a fault-tolerant computer utilizing the subject processor controller.
  • Figure 2 is a simplified flow chart which illustrates the manner in which external events such as interrupts are processed by the subject processor controller.
  • Figure 3 is a flow chart which illustrates the manner in which a first embodiment of the subject controller processes external events.
  • Figure 4 is a timing diagram which depicts an exemplary operation of the Figure 3 embodiment of the subject invention.
  • Figure 5 is a flow chart which illustrates the manner in which a second embodiment of the subject controller processes external events.
  • Figure 6 is a timing diagram which depicts an exemplary operation of the Figure 5 embodiment of the subject invention.
  • Figure 7 is a flow chart which illustrates the manner in which proper operation of the processors is verified by the subject controller.
  • Figure 8 is a timing diagram which depicts an exemplary verification operation of the subject controller.
  • Figure 9 is a functional block diagram of the first embodiment of the subject controller.
  • Figure 10 is a functional block diagram of the state machine of the subject controller.
  • Figure 11 is a flow chart illustrating the overall operation of the subject controller state machine.
  • Figure 12 is a flow chart showing the operation of the Counters Running Main Loop block of the subject controller state machine.
  • Figure 13 is a flow chart showing the operation of the Interrupt Handler block of the subject controller state machine.
  • Figure 14 is a flow chart illustrating the operation of the Autohalt Handler block of the subject controller state machine.
  • Figure 15 is a flow chart illustrating the operation of the Present Interrupts block of the subject controller state machine.
  • Figure 16 is a flow chart illustrating the operation of the Wait For Counters On Request block of the subject controller state machine.
  • Figure 17 is a detailed schematic drawing of a portion of the subject controller, including the virtual time counters and comparator circuits.
  • Figure 18 is a detailed schematic drawing of a portion of the subject controller, including the timeout timer and halt control circuitry.
  • Figure 19 is a detailed schematic drawing of a portion of the subject controller, including various interface circuits.
  • Figure 20 is a detailed schematic drawing of a portion of the subject controller, including interrupt control circuitry.
  • Figure 21 is a detailed schematic drawing of a portion of the subject controller, including the state machine microcode memory.
  • Figure 22 is a detailed schematic diagram of a portion of the subject controller, including bus interface circuits.
  • Figure 23 is a detailed schematic diagram of a portion of the subject controller, including the connector pin assignments.
  • The computer includes a pair of substantially identical processors: a processor A, designated by the numeral 30, and a processor B, designated by the numeral 32.
  • Each processor includes an internal microprocessor (not shown), such as the 16-bit microprocessor manufactured by Motorola, Inc. under the designation MC68000.
  • Each processor further includes a local memory (not shown) and a peripheral device in the form of a terminal controller (not shown), both connected to a local bus (not shown).
  • The computer system further includes a common memory, represented by block 34, which may be accessed by either processor 30 or 32 by way of an interprocessor bus 36.
  • Peripheral devices, such as terminal controllers and disc controllers, are also coupled to the interprocessor bus, as represented by block 38.
  • The subject processor controller, generally designated by the numeral 40, is also coupled to bus 36.
  • Processor controller 40 performs various functions to be described later in greater detail. One such function is to verify proper operation of the two processors 30 and 32. Another exemplary function is to control the flow of asynchronous external events, primarily interrupts, to the two processors.
  • Processors A and B together with processor controller 40 can be viewed as a single logical processor.
  • Processor controller 40 receives interrupts and other asynchronous signals from external sources which are intended for the two processors. These signals will be collectively referred to herein as interrupts.
  • Processors A and B are identically programmed and, under normal operating conditions, execute identical code. For various reasons, however, the processors typically do not execute the same code at the same time. For example, if the two processors request interprocessor bus 36 simultaneously, only one processor will be given access. The other processor must wait until the first processor has released the bus. As a consequence, the second processor to access the bus will fall behind the first in executing the common code.
  • In addition, the two processors have separate internal clocks which are not identical in frequency. The processor with the higher-frequency clock will tend to execute code at a higher rate.
  • Virtual time can be defined as a measure of how far a particular processor has progressed in executing its associated code. The smallest measurable change in virtual time can be expressed in terms of virtual time ticks.
  • There are several possible choices of processor events which can be used as a virtual time tick for the purpose of measuring virtual time. Depending upon the particular processor used, instructions, data references or bus cycles could be used as virtual time ticks. MOVE instructions, procedure calls and the like could also be used for this purpose.
  • Ideally, the virtual time tick is chosen such that the processors will be at the same virtual time when they are at the same point in the execution of their respective code.
  • It must also be possible to count each tick of virtual time. For example, instruction counts should not be used as virtual time ticks for processors such as the Motorola MC68010, which utilize an internal cache memory, because such processors provide no external indication of instruction cycles. However, instruction fetches or bus cycles could be used as virtual time ticks.
  • The virtual time tick should occur at least once during a predetermined maximum period of time, preferably at least once every few microseconds or even more frequently. Ticks which occur less frequently do not provide sufficient resolution for verifying proper processor operation.
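The following Python sketch shows why two identically programmed processors drift apart when virtual time is measured in bus cycles. It models two causes: a slightly slower clock and a wait for the interprocessor bus. The helper `bus_cycles_completed` and all timing figures are hypothetical illustrations, not part of the disclosed controller. An MC68000 bus cycle takes at least four clock periods, so 500 ns corresponds roughly to an 8 MHz part.

```python
def bus_cycles_completed(t_ns: int, cycle_ns: int, stall_ns: int = 0) -> int:
    """Virtual time, in bus-cycle ticks, reached by actual time t_ns.

    cycle_ns: duration of one bus cycle on this processor (set by its clock).
    stall_ns: total time spent waiting, e.g. for the interprocessor bus,
              modeled here as a single delay at the start.
    """
    if t_ns <= stall_ns:
        return 0
    return (t_ns - stall_ns) // cycle_ns


# Hypothetical figures: processor A has a slightly faster clock than B,
# and B also lost 2 microseconds waiting for the interprocessor bus.
t = 100_000  # 100 microseconds of actual time
v_a = bus_cycles_completed(t, cycle_ns=500)
v_b = bus_cycles_completed(t, cycle_ns=510, stall_ns=2_000)
print(v_a, v_b)  # prints: 200 192
```

At the same actual time, the two processors are eight ticks apart in virtual time. This is the gap the controller must close before presenting an interrupt.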
  • The subject controller monitors the position of each processor in virtual time and presents interrupts to the processors only at the same point in virtual time.
  • This aspect of the invention is illustrated in the flow chart of Figure 2.
  • At state 44, the controller monitors for asynchronous external events (interrupts) intended for the two processors. If no events are detected, the controller remains in a loop. After an interrupt is received, the two processors are notified of it at the same virtual time, as indicated by block 46. The controller then returns to state 44 and waits for further interrupts.
  • Figure 3 illustrates a first embodiment of the subject controller. At state 44, the controller waits for an interrupt. After an interrupt is received, the processor leading in virtual time is halted, as indicated by block 46.
  • The halted processor no longer advances in virtual time. Meanwhile, the lagging processor is permitted to execute code. As indicated by element 48, when the lagging processor has reached the halted processor in virtual time, both processors are notified of the interrupt, as indicated by block 50. Thus, the notification is given at the same virtual and actual time.
  • The operation of the first embodiment controller is further illustrated in the timing diagram of Figure 4.
  • The vertical axis of the graph represents virtual time and the horizontal axis represents actual time.
  • The solid line A and the broken line B represent the states of processors A and B, respectively. At virtual time V0 and actual time T0, lines A and B indicate that the processors are at the same virtual and actual times.
  • Processor A is at time T2.
  • Processor B starts to lag in virtual time.
  • The two processors advance in both actual and virtual time until actual time T14, when the controller detects an interrupt.
  • At that moment, processor A is at virtual time V10 and processor B is at virtual time V6.
  • The controller then commands the leading processor, processor A, to halt.
  • The halted processor will not proceed further in virtual time until it is un-halted.
  • Meanwhile, processor B is permitted to advance in virtual time.
  • When processor B also reaches virtual time V10, at actual time T19, the two processors are at the same virtual and actual times. At this time both processors are un-halted and notified of the interrupt.
  • The processors then service the interrupt and proceed to execute code.
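A minimal Python sketch of the first embodiment's policy follows, using the Figure 4 values. The class `Processor` and the function `present_interrupt_halt_leader` are hypothetical names. Virtual time is modeled as an integer tick count, and the loop counts lagger ticks rather than actual time. The real controller counts address strobes in hardware.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    vtime: int = 0          # virtual time: bus cycles executed so far
    halted: bool = False
    notified_at: list[int] = field(default_factory=list)  # virtual times of interrupt notifications

    def tick(self) -> None:
        """Execute one bus cycle unless the controller is holding the processor halted."""
        if not self.halted:
            self.vtime += 1


def present_interrupt_halt_leader(a: Processor, b: Processor) -> int:
    """First embodiment: halt the leader, let the lagger catch up, then notify both.

    Returns how many ticks the lagging processor executed while catching up.
    """
    leader, lagger = (a, b) if a.vtime >= b.vtime else (b, a)
    leader.halted = True  # leader stops advancing in virtual time
    catch_up = 0
    while lagger.vtime < leader.vtime:
        leader.tick()     # no effect while halted
        lagger.tick()
        catch_up += 1
    # Virtual times now match: un-halt both and present the interrupt to both
    # at the same virtual and actual time.
    for p in (a, b):
        p.halted = False
        p.notified_at.append(p.vtime)
    return catch_up


# Figure 4 values: at the interrupt, A is at V10 and B is at V6.
a, b = Processor("A", 10), Processor("B", 6)
print(present_interrupt_halt_leader(a, b), a.notified_at, b.notified_at)
```

The sketch assumes a healthy lagger. If the lagger stopped advancing, this loop would never exit. The disclosed hardware includes a timeout timer (Figure 18) that this sketch does not model.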
  • Operation of a second embodiment of the subject invention is illustrated in the flow chart of Figure 5.
  • The controller waits for an interrupt.
  • When an interrupt is received, the processor leading in virtual time is notified of it, as indicated by block 52.
  • The virtual time at which the notification is given is stored.
  • The leading processor then services the interrupt and proceeds to process code.
  • Meanwhile, the controller compares the stored virtual time with the virtual time of the lagging processor, as indicated by element 54. When the virtual times are equal, the lagging processor is notified of the interrupt, as represented by block 56.
  • The lagging processor then services the interrupt and proceeds to process code.
  • Figure 6 is a timing diagram, similar to that of Figure 4, which further illustrates the operation of the second embodiment controller.
  • Processor A is the leading processor in virtual time.
  • The controller receives an interrupt and immediately notifies processor A, which is then at virtual time V8.
  • Lagging processor B was only at virtual time V6 when the lead processor was notified of the interrupt. Processor B continues to process code until it reaches virtual time V8 at actual time T17. At this time, the controller detects that the virtual time of the lagging processor matches the stored virtual time V8, and processor B is notified of the interrupt. Processor B then services the interrupt and continues to process code. Thus, both processors are notified of the interrupt at the same virtual time, but not necessarily at the same actual time.
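The second embodiment never halts either processor. The Python sketch below shows this: the leader is notified at once, and the lagger is notified when it reaches the stored virtual time. It uses the Figure 6 values. The names are hypothetical. For simplicity, each loop step advances both processors by one tick, although real execution rates differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    vtime: int = 0                                        # bus cycles executed so far
    notified_at: list[int] = field(default_factory=list)  # virtual times of interrupt notifications


def present_interrupt_notify_leader(a: Processor, b: Processor) -> int:
    """Second embodiment: notify the leader at once, store its virtual time,
    and notify the lagger when it reaches that same virtual time.

    Returns the stored virtual time. Neither processor is halted.
    """
    leader, lagger = (a, b) if a.vtime >= b.vtime else (b, a)
    stored = leader.vtime
    leader.notified_at.append(stored)  # leader services the interrupt immediately
    while lagger.vtime < stored:
        # Both processors keep executing code in the meantime.
        leader.vtime += 1
        lagger.vtime += 1
    lagger.notified_at.append(lagger.vtime)
    return stored


# Figure 6 values: A is notified at V8 while B is still at V6.
a, b = Processor("A", 8), Processor("B", 6)
stored = present_interrupt_notify_leader(a, b)
print(stored, a.notified_at, b.notified_at)
```

Compared with the first embodiment, this design trades simultaneous actual-time delivery for never stalling the leading processor.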
  • Controller 40 derives the virtual times of the two processors by counting processor local bus cycles. A bus cycle, or virtual time tick, occurs whenever a processor moves data via its local bus. The local bus cycles of the Motorola MC68000 microprocessor are indicated by occurrences of address strobes. Address strobe signals from processor A, referred to as Clock A, are coupled to controller 40, as indicated by line 52. Similarly, address strobe signals from processor B, referred to as Clock B, are fed to controller 40, as indicated by line 52'. The controller provides processor halt signals to processors A and B, as represented by lines 54 and 54', respectively. Interrupts received by the controller on line 42 are forwarded to processors A and B at the same virtual time on lines 56 and 56', respectively.
  • Processors A and B have means for communicating with one another, for example by exchanging messages. Unless semaphore techniques are used, messages cannot be reliably exchanged unless the processors are synchronized.
  • By definition, the processors are synchronized when they are simultaneously at the same virtual and actual times. To enable an exchange of messages, and for other reasons, the subject invention can synchronize the processors when they request it. Once the processors are synchronized, the subject controller also verifies proper processor operation.
  • The processors are programmed to request synchronization at the same virtual times.
  • The processor most advanced in virtual time will therefore be the first to request synchronization.
  • Each requesting processor is halted by the subject controller. Since both processors are supposed to be executing identical code, the lagging processor will eventually also request synchronization, and it too will be halted just after issuing its request.
  • The controller then compares the virtual times of the two processors, as indicated by element 62. If both processors are operating properly, the virtual times should be equal.
  • If they are equal, the processors are released, or un-halted, as shown by block 64.
  • The processors are thereby synchronized. If the virtual times do not correspond, the processors are notified that an error has occurred and that they are not synchronized, as indicated by block 66.
  • The synchronization process is further illustrated by the timing diagram of Figure 8, which is similar to the diagrams of Figures 4 and 6.
  • Initially, the two processors are synchronized. As time passes, they go out of synchronization, with processor A leading.
  • Processor A requests that the processors be synchronized and is halted.
  • Processor B subsequently requests that the processors be synchronized and is also halted.
  • The virtual times of the two processors are then compared. Proper operation is verified since both processors are at virtual time V8. The processors are then un-halted and permitted to continue executing code. If the virtual times did not correspond, the processors would have been notified of the anomaly.
  • Each processor is suitably programmed to request verification of proper operation at the same point in virtual time. If the subject controller in fact receives the requests at the same virtual times, proper operation is confirmed. If the requests occur at different virtual times, a fault has occurred and the processors are notified of the anomaly.
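The synchronization check can be sketched in Python as follows. Each requester is recorded as halted at its current virtual time. Once both have requested, the two times are compared and both processors are released. The class name and the outcome strings are hypothetical. The sketch also assumes both processors are released on error, so that they can receive the error notification. The description does not state this.

```python
from __future__ import annotations


class SyncController:
    """Sketch of the synchronization check: each requester is halted at once;
    when both have requested, their virtual times are compared."""

    def __init__(self) -> None:
        self.halted_at: dict[str, int] = {}  # processor name -> virtual time at its request

    def request(self, name: str, vtime: int) -> str | None:
        """Record a synchronization request from one processor.

        Returns None while the other processor is still running; otherwise
        releases both and returns the outcome of the comparison.
        """
        self.halted_at[name] = vtime  # requesting processor is halted here
        if len(self.halted_at) < 2:
            return None
        equal = len(set(self.halted_at.values())) == 1
        self.halted_at.clear()  # un-halt both processors
        return "synchronized" if equal else "error: virtual times differ"


ctl = SyncController()
print(ctl.request("A", 8))  # A leads, so it requests first and waits halted
print(ctl.request("B", 8))  # B arrives at the same virtual time (Figure 8)
```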
  • This approach assumes that the two processors are identical in hardware configuration, that they execute identical code, and that the data present within them are identical.
  • The processors can, however, execute non-identical code with the controller turned off. During this time, the controller does not forward interrupts to the two independently operating processors.
  • The processor programs also include instructions for turning the subject controller back on for synchronized operation. Steps are taken to ensure that both processors turn the subject controller off at the same virtual time, and that the controller's means for tracking virtual time is reinitialized when the controller is turned back on.
  • The controller receives the Clock A and Clock B signals from the processors on lines 52 and 52', respectively.
  • These clocks are synchronized with the internal processor controller clock (not shown) by conventional synchronizer circuits 70 and 70'.
  • A pair of virtual time counters, 72 and 72', count the clocks output by the synchronizers.
  • The subject controller further includes a state machine 68, which provides various control outputs in response to microcoded instructions and selected condition inputs.
  • The major components of the state machine are shown in Figure 10.
  • The state machine includes a memory 76 loaded with approximately 256 32-bit microcode instructions.
  • Memory 76 is a Programmable Read Only Memory (PROM), although a Random Access Memory (RAM) may be used provided a non-volatile memory is included for permanently storing the microcode.
  • A program counter 78 provides an 8-bit address for memory 76.
  • The state machine further includes a branch condition selector multiplexer 80 having 24 inputs, which represent 24 different conditions of the subject processor controller.
  • For example, input 1 may be a logic 1 when virtual time counter 72 for processor A is greater than counter 72' for processor B, and a logic 0 at all other times.
  • The particular condition to be selected is determined by 5 of the 32 bits of data read from memory 76, as represented by line 82.
  • The output of multiplexer 80 is coupled to the load input of program counter 78. If the selected condition is not present, program counter 78 increments to the next memory 76 address to read the next instruction. If the condition is present, 8 of the 32 bits read from the memory are loaded into counter 78 and used as the address of the next instruction. Unless another branch condition occurs, the counter then increments from the new branch address. The remaining 19 of the 32 bits read from the memory are used as control outputs for the subject processor controller.
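The microcode sequencer described above can be modeled in a few lines of Python. The field widths come from the description: 5 condition-select bits, 8 branch-address bits and 19 control bits make up each 32-bit word. The bit positions, the helper `microword` and the class `Sequencer` are a hypothetical layout and hypothetical names. Treating select codes 24 through 31 as logic 0 is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Field widths come from the description (5 + 8 + 19 = 32 bits);
# the bit positions below are a hypothetical layout.
COND_SHIFT, COND_MASK = 27, 0x1F  # bits 31..27: branch condition select
ADDR_SHIFT, ADDR_MASK = 19, 0xFF  # bits 26..19: branch address
CTRL_MASK = (1 << 19) - 1         # bits 18..0: control outputs


def microword(select: int, branch_addr: int, controls: int) -> int:
    """Assemble one 32-bit microcode word in the hypothetical layout."""
    return (select << COND_SHIFT) | (branch_addr << ADDR_SHIFT) | (controls & CTRL_MASK)


@dataclass
class Sequencer:
    microcode: list[int]  # up to 256 words, addressed by the 8-bit program counter
    pc: int = 0

    def step(self, conditions: list[bool]) -> int:
        """Execute one microword and return its 19 control-output bits."""
        word = self.microcode[self.pc]
        select = (word >> COND_SHIFT) & COND_MASK
        # Condition multiplexer: 24 inputs; select codes 24-31 read as 0 (assumption).
        if select < len(conditions) and conditions[select]:
            self.pc = (word >> ADDR_SHIFT) & ADDR_MASK  # load branch address
        else:
            self.pc = (self.pc + 1) & 0xFF              # next sequential address
        return word & CTRL_MASK


# Word 0 branches to address 0x11 when condition input 1 (e.g. "A ahead of B") is true.
rom = [microword(1, 0x11, 0b101)] + [0] * 255
seq = Sequencer(rom)
conds = [False] * 24
conds[1] = True
controls = seq.step(conds)
print(hex(seq.pc), bin(controls))
```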
  • State machine 68 generates count enable signals on lines 84 and 84', which are coupled to logic elements 74 and 74', respectively.
  • When enabled, virtual time counters 72 and 72' count clocks from processors A and B, respectively.
  • The outputs of virtual time counters 72 and 72' are compared by a comparator, as represented by block 86. If processor A is further advanced in virtual time than processor B, a signal indicating this condition is generated on line 88 and forwarded to state machine 68. If processor A is lagging in virtual time, a signal indicating this condition is generated on line 90 and forwarded to the state machine. If processor A is neither leading nor lagging, the state machine concludes that the virtual time counters are equal.
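The Python sketch below models the gated virtual time counters and the two-line comparator. Equality is signaled only by the absence of both lines. The names are hypothetical. Python integers are unbounded, whereas the hardware counters have a fixed width that the description does not state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VirtualTimeCounter:
    """Counts synchronized address strobes (bus cycles) from one processor."""
    count: int = 0
    enabled: bool = False  # count-enable line driven by the state machine

    def strobe(self) -> None:
        if self.enabled:
            self.count += 1


def compare(a: VirtualTimeCounter, b: VirtualTimeCounter) -> tuple[bool, bool]:
    """Model comparator 86: return (a_leads, a_lags), i.e. lines 88 and 90."""
    return a.count > b.count, a.count < b.count


def counters_equal(a_leads: bool, a_lags: bool) -> bool:
    """The state machine infers equality when neither line is asserted."""
    return not a_leads and not a_lags


ctr_a = VirtualTimeCounter(enabled=True)
ctr_b = VirtualTimeCounter(enabled=True)
for _ in range(3):
    ctr_a.strobe()
ctr_b.strobe()
print(compare(ctr_a, ctr_b))  # A has seen more bus cycles, so A leads
```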
  • Halt control circuitry 88 and 88' is provided which, among other things, synchronizes the Halt A and Halt B signals with the bus cycles of the respective processors.
  • The processors can request the subject controller to synchronize them. The processors can also instruct the controller to turn virtual time counters 72 and 72' on or off. This may be done if the processors are programmed to operate independently of one another for a period of time. These functions are accomplished using a pair of control registers 94 and 94', which are associated with processors A and B, respectively. Control registers 94 and 94' are coupled in a time-multiplexed fashion to the local data buses of processors A and B, respectively, by way of interprocessor bus 36 (Figure 1). An interprocessor bus interface circuit 97 is provided for interfacing between interprocessor bus 36 and the local bus of the subject controller.
  • Interface circuit 97 includes address recognition apparatus which detects commands addressed to control registers 94 and 94' and provides enable signals to the registers on lines 102 and 102', respectively.
  • The enable signals cause command data addressed to registers 94 and 94' to be loaded into those registers.
  • The address recognition circuit also causes an acknowledgment to be sent back to the processor which originated the command.
  • The commands which may be loaded into registers 94 and 94' include virtual time counters on commands, counters off commands and synchronization request commands.
  • When such a command is loaded, an autohalt signal is forwarded to the halt control circuitry, as represented by lines 95 and 95'.
  • The autohalt signals cause the halt control circuitry to halt the particular processor which transmitted the command. The halt is achieved very quickly, since state machine 68 is not involved in generating the autohalt signals.
  • The contents of control registers 94 and 94' are then transferred to state machine 68, as represented by lines 104, 104', 106 and 106'. After the commands have been acted upon by state machine 68, the registers are cleared, as indicated by lines 110 and 110'.
  • The processors also provide lock requests, reset requests and (by implication) unreset requests, which are received by control registers 94 and 94' and transferred to state machine 68. These latter three request commands are described subsequently.
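The control register behavior can be sketched in Python. Counters on, counters off and synchronization request commands halt the issuing processor immediately, without waiting for the state machine. The other commands are simply latched. The `Command` names, the `ControlRegister` class and the return value standing in for the bus acknowledgment are all hypothetical. For simplicity, the register is cleared when the state machine reads it, rather than after the command has been acted upon.

```python
from __future__ import annotations

from enum import Enum, auto


class Command(Enum):
    COUNTERS_ON = auto()
    COUNTERS_OFF = auto()
    SYNC_REQUEST = auto()
    LOCK = auto()
    RESET = auto()
    UNRESET = auto()


# Commands that assert the autohalt line directly, without the state machine.
AUTOHALT_COMMANDS = {Command.COUNTERS_ON, Command.COUNTERS_OFF, Command.SYNC_REQUEST}


class ControlRegister:
    """Sketch of one per-processor control register (94 or 94')."""

    def __init__(self) -> None:
        self.command: Command | None = None  # latched command awaiting the state machine
        self.halted = False                  # halt-control output for this processor

    def write(self, command: Command) -> bool:
        """Latch a command written over the bus; the return value stands in
        for the acknowledgment sent back to the originating processor."""
        self.command = command
        if command in AUTOHALT_COMMANDS:
            self.halted = True  # autohalt: the processor stops immediately
        return True

    def take(self) -> Command | None:
        """State machine reads the pending command; the register is then cleared."""
        command, self.command = self.command, None
        return command


reg_a = ControlRegister()
reg_a.write(Command.SYNC_REQUEST)
print(reg_a.halted, reg_a.take(), reg_a.take())
```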
  • When an error is detected, error data identifying the type of error is generated on line 114.
  • The error status data are then loaded into an error status register 116.
  • The processors periodically request error status data from the controller by generating an appropriate address, which is detected by the address recognition circuitry of interprocessor bus interface 97.
  • An enable signal is then produced on line 120, causing the error status data to be forwarded to the requesting processor via interprocessor bus interface 97, as indicated by line 117.
  • The address recognition circuit of interface 97 also transmits an acknowledgment signal back to the processor requesting the error status data.
  • The subject controller further includes interrupt control circuitry 120.
  • When an interrupt for the processors is received on line 42, interrupt control circuitry 120 forwards an interrupt request signal to state machine 68 by way of line 122.
  • When comparator circuit 86 determines that the virtual times of the two processors match, an inhibit signal on line 124 to control circuitry 120 is removed, thereby enabling the control circuitry to notify processors A and B of the interrupt by way of lines 56 and 56', respectively.
  • Interrupt levels 0 through 7 are defined.
  • Four interrupt levels, 0 through 3, are reserved for devices to assert directly to the processors, bypassing the subject controller.
  • Interrupt levels 1 through 3 are reprogrammed to be forwarded by the controller as levels 4 through 6, respectively.
  • Level 0 is reprogrammed as level 7.
  • Interrupt level 0 is the software refresh timer, which must be received by the processors at least every two milliseconds for proper operation of the processors' internal dynamic memory.
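The level remapping stated above amounts to a small lookup table. On the MC68000, level 7 is the highest interrupt priority and cannot be masked. Presumably this is why the time-critical refresh timer is forwarded at level 7. The function name in the Python sketch below is hypothetical.

```python
# Level mapping stated in the description: device levels 1-3 are forwarded
# as 4-6, and level 0 (the refresh timer) is forwarded as level 7.
FORWARDED_LEVEL = {0: 7, 1: 4, 2: 5, 3: 6}


def forwarded_level(device_level: int) -> int:
    """Return the level at which the controller presents a device interrupt."""
    try:
        return FORWARDED_LEVEL[device_level]
    except KeyError:
        raise ValueError(f"level {device_level} is not remapped by the controller") from None


print([forwarded_level(level) for level in range(4)])
```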
  • the state machine program counter 78 is in a reset state at address 00 as represented by block 128. This is a latched state wherein virtual time counters 72 and 72' ( Figure 9) are cleared, but not running, and control registers 94 and 94' and halt control circuits 38 and 88' are cleared. The controller stays in the reset state until one or both of the processors unresets either or both of control registers 94, 94'. This unreset request is the only processor command which need not be given by both processors in order to be acted upon.
  • the program counter advances to address 04, as indicated by block 131, and the state machine waits for processors A and B to issue a request to turn the virtual time counters on. Since the virtual counters are already off, the processors should not issue a counters off request or a synchronization request. If either of such requests occurs from either of the processors, the program will report an error as represented by Error Reporting block 135. The particular type of error which has occurred can be determined from the error status data of the
  • the subject controller stops further operation thereby permitting the processors to examine the outputs of virtual time counters 72 and 72', control registers 94 and 94' and other circuits to facilitate locating the source of the error.
  • the subject controller will issue a status word to the processors indicating that the counters have been turned on.
  • the program will then advance to address 57 of a Present Interrupts block 130.
  • a comparison is made of the counts of the virtual time counters. If the counts are identical, any interrupts presently on line 42 (Figures 1 and 9) will be presented to both processors by the Present Interrupts block. Once all interrupts have been presented, virtual time counters 72 and 72' are enabled.
  • the program then makes an unconditional jump from the Present Interrupts block and enters the Counters Running Main Loop at address 17, as indicated by block 133.
  • the program will remain in the loop until either an interrupt is received or a processor issues an autohalt signal.
  • An autohalt will be issued anytime the processors make a synchronization request, a counters on request or a counters off request.
  • the program will jump to address 20.
  • the interrupt will then be processed by an Interrupt Handler, as represented by block 132.
  • the processor which is most advanced in virtual time will be halted.
  • the program proceeds to address 57 of Present Interrupts block 130, at which time interrupts are presented to the two processors. If errors are detected by Interrupt Handler 132, the program will proceed to Error Reporting block 135 to report the error to the processors. An exemplary error would indicate that the lagging processor has failed to respond to a halt request.
  • Interrupt Handler 132 is not implemented to process autohalts.
  • Autohalt Handler 134 responds to autohalt signals which are produced when the processors request that the virtual time counters be turned on, that the counters be turned off or that the processors be synchronized.
  • the autohalt signal causes the processor which issued the request to halt, independent of the state machine of the subject controller. If the processors request that they be synchronized, Autohalt Handler 134 will verify that the processors have been halted at the same virtual time. If the virtual time counters are not equal, the program proceeds to Error Reporting block 135.
  • the processors should not issue a counters on request while the counters are running. Thus, if this request is made, the program will also proceed from the Autohalt Handler to block 135 so that an error will be reported. If the processors make a counters off request, the program will proceed from the Autohalt Handler to Wait For Counters On Request block 131 and wait until a counters on request issues.
  • the counters off request allows the processors to diverge and accomplish different tasks in a protected manner.
  • the subject controller will not forward interrupts while the counters are off. This ensures that there will be no critical points while the processors are diverged.
  • a critical point occurs when an asynchronous event, such as an interrupt, that changes and/or tests certain data takes place while the normally executing code is itself changing and/or testing the same data.
  • the one interrupt which is forwarded while the counters are off is the software refresh interrupt, a level 7 interrupt which occurs typically every two milliseconds. Steps are taken in the software such that critical points do not occur during a software refresh routine.
  • the program will proceed to error reporting block 135.
  • the program will then remain in Wait For Counters On Request block 131 until the processors request that the counters be turned back on. Certain steps must be taken to configure the processors so that they will again execute similar code after having diverged.
  • one of the processors will issue the previously-noted lock request which is received by control registers 94 and 94' ( Figure 9).
  • the processor then loops, waiting for the subject controller to issue a refresh-locked-out status bit. When this status bit is active, the subject controller will refrain from forwarding refresh interrupts to the processor.
  • the synchronization requests, which are processed by Autohalt Handler block 134, are issued by the processors primarily to verify proper processor operation.
  • the requests, which should be issued by the processors at the same virtual times, cause the counters first to be turned off and the virtual time counters to be compared. The counters are then turned back on. The counters are turned on at the same virtual time so that the processors will be synchronized in both actual and virtual time. If the processors wish to exchange data, each processor can write its data to a predetermined location, issue a synchronization request, and then read the data of the other processor.
  • the synchronization request ensures that each processor will have written the data before it is read by the other processor.
  • processors must utilize a synchronization request in order to synchronize themselves if the virtual time counters are running. For example, if a polling loop were to be used while the processors are waiting for certain data to appear, there is no way to ensure that the processors will execute the loop the same number of times since the processors are not running at identical speeds. Thus, the virtual times of the processors will not be the same and the subject controller will conclude that an error has occurred.
  • the subject controller may also be reset by the processors, in which case the sequence will return to Reset block 128.
  • a reset request may be made by either or both processors. In the event only one processor makes a reset request, the request command must be forwarded to both control registers 94 and 94'.
  • Figure 12 shows a more detailed flow chart further illustrating the operation of Counters Running Main Loop block 133.
  • If an interrupt has been received at address 17, the program will jump to address 20 of Interrupt Handler 132. Otherwise, the counter will increment to address 18, at which time a determination is made as to whether processor A has issued an autohalt signal AAHLT, as indicated by block 138. If the signal has issued, the program will jump to address 63 of Autohalt Handler 134. If not, the counter will advance to address 19, at which time a determination is made as to whether processor B has issued an autohalt signal BAHLT, as indicated by block 140. If that signal has issued, the program will proceed to the Autohalt Handler; otherwise, the program will branch back to address 17, as indicated by block 142.
  • Interrupt Handler 132 is shown in the flow chart of Figure 13. As previously described, when an interrupt is received while the program is in the Counters Running Main Loop block, the program branches to address 20 of Interrupt Handler 132, as indicated by element 142. At this time, a time-out timer (not shown) is cleared. The time-out timer is used to determine whether an anticipated event occurs within a predetermined time period; the timer of the present embodiment times out after 360 microseconds. A determination is then made as to whether the output of comparator 86 on line 88 (Figure 9) indicates that virtual time counter 72 for processor A is greater than the counter for processor B. If so, the program will jump to address 3C, as represented by block 144.
  • If processor A is not leading processor B, the program will proceed to address 21.
  • a determination is then made as to whether comparator 86 indicates that processor B is the leading processor. If so, the program will proceed to address 3A, as indicated by block 148. If processor B is not leading, the virtual times of the processors happened to be equal when the interrupt was received. In this unlikely event, the program will proceed to address 22.
  • both processors are instructed to halt at the first opportunity utilizing halt controls 88 and 88'. This is signified by the terms A PGM HALT and B PGM HALT. The processors are designed to actually halt at the end of the processor bus cycle after receipt of these signals.
  • block 144 indicates that processor A is instructed to halt at the first opportunity.
  • Expression B EQ HALT of block 144 signifies that control register 88' is set such that processor B will halt when the processor B virtual time counter catches up with the counter of halted processor A.
  • block 148 indicates that processor B is instructed to halt at the next opportunity and processor A is instructed to halt at the same virtual time as that of halted processor B.
  • the program will advance to address 23 as indicated by block 152. At this time, one or both of the processors will have been instructed to halt. However, it is possible that the processors have not yet reached the end of the current bus cycle where they are designed to actually halt. At block 152, the term
  • BHLTQQQQ signifies that a determination is made as to whether processor B has received the halt instruction provided by the state machine of the subject controller. This is in contradistinction to a halt which results from an autohalt signal, which is produced when a processor has issued a command to the subject controller. Assuming that the answer is no, the program advances to address 24, where element 154 signifies that a determination is made as to whether processor B has been halted as the result of an autohalt signal. This is represented by the term BAHLT. If this event has occurred, the sequence will be controlled by Autohalt Handler 134.
  • the program will then jump to address 3E, as signified by block 156, at which time halt control circuitry 88 and 88' are cleared so that no halt originating from the state machine of the subject controller will be acted upon by the processors.
  • the program will then proceed to the Autohalt Handler. Assuming that an autohalt has not occurred, the program will proceed to address 25 as indicated by block 158.
  • a determination is then made as to whether the time-out timer has timed out. The 360 microsecond period is sufficiently long such that, unless an anomaly has occurred, processor B will have acknowledged that it received a halt command from the subject controller. Note that the halt command could have originated from any of three sources represented by blocks 144, 148 or 150.
  • the program will continue in the loop defined by blocks 152, 154 and 158 until either a time-out occurs or the halt command is received by processor B. If a time-out occurs, an error is reported to Error Reporting block 135; otherwise, the program will proceed to block 160.
  • a determination is made as to whether processor B has actually responded to the controller halt commands. If not, a determination is made at block 162 as to whether a time out has occurred. The program will remain in the loop defined by blocks 160 and 162 until a time out occurs, in which case an error will be reported, or an indication is received that processor B has actually halted.
  • Once processor B has halted, the program will proceed to address 29, as represented by block 164. At this time, a similar sequence is carried out for processor A.
  • a determination is made as to whether processor A has received the halt command which originated from the state machine of the controller.
  • a test is made as to whether an autohalt has been caused to be issued by processor A.
  • a time-out condition is tested at block 168.
  • a determination is made as to whether processor A has actually halted.
  • When the program has advanced to address 2F, both processors will actually have been halted.
  • the subject controller is implemented such that, at this point, the two virtual time counters will be close to one another, but usually not the same.
  • typically, the lagging processor will overshoot the leader by one or two counts. In that case, the leading processor, which becomes the lagging processor, will be single-stepped by the subject controller until the counts are identical, as described below.
  • halt control registers 88 and 88' are instructed to halt both processors in the event the processors proceed to run at their normal rate. This precaution is taken because, under certain circumstances, a processor may respond to a single-step instruction by running rather than single-stepping.
  • processor B is single-stepped until a determination is made at blocks 180 and 186 that the counters are identical.
  • the program will then proceed to address BF at block 192 at which time both processors will be single stepped an identical number of times as a precaution to avoid certain anomalies.
  • the program will then advance to Present Interrupts block 130. In the event processor B had overshot processor A, the program would have advanced from block 186 to blocks 188 and 190 so that processor A would be single-stepped in the same manner.
  • Autohalt Handler 134 is depicted in the flow chart of Figure 14.
  • processors A and B cause autohalt signals to be issued in the event the processors request synchronization, request that the virtual time counters be turned on or request that the virtual time counters be turned off. If an autohalt signal is produced when the program is in Counters Running Main Loop 133, the program will jump to address 63, as indicated by block 192. At this time, the 360 microsecond time-out counter will be cleared. The program will then advance to address 64. In the event an autohalt signal is produced while the program is in the Interrupt Handler, the program will also jump to address 64.
  • If processor B has not yet caused an autohalt, the program will proceed to address 70, where, as represented by element 198, a determination will be made as to whether the time-out timer has timed out. Assuming that it has not timed out, the program will jump back to block 196. The program will remain in this loop until either a time-out has occurred or processor B has caused an autohalt. If a time-out occurs, an error message is forwarded to Error Reporting block 135; otherwise, the program will branch to address 68. In the event the autohalt was caused by processor B, a similar wait sequence for a processor A autohalt will be performed according to blocks 200 and 202.
  • the processors cause themselves to be halted upon their issuance of any of the three previously-noted requests to the subject controller.
  • the two virtual time counters should be exactly the same when an autohalt signal has occurred.
  • If the processor A counter is greater than that of B, an error will be forwarded to Error Reporting block 135.
  • the program will proceed to address 69 where, as indicated by block 206, a determination is made as to whether the A counter value is less than the B value. If not, the counters are identical and the program will advance to address 6A. If A is smaller, an error will be forwarded to Error Reporting block 135.
  • the virtual time counters are running either because the processors have just requested synchronization or have just requested that the counters be turned off. Accordingly, a request that the counters be turned on is not proper at this time.
  • a determination is made as to whether processor A had requested counters on. If so, the program will jump to address 76, as indicated by block 210, and a determination will be made as to whether processor B also issued a counters on request. In either case, an error will be forwarded to Error Reporting block 135. Two different error signals are produced to help the processors locate the source of the anomaly.
  • the program will proceed to address 6B as shown by element 212. A determination will then be made as to whether processor A had requested that the counters be turned off. If this request has been made, the program will proceed to address 74, as shown by element 214. At this address, it is determined whether processor B has issued the same request. If not, an error will be forwarded to Error Reporting block 135. If both processors requested that the counters be turned off, the counters are stopped and the program will proceed to address 03, which is Wait For Counter On Request block 131.
  • the program will proceed to address 6C as indicated by block 216 and a determination will be made as to whether processor A requested a synchronization. If this request was made, the program will proceed to address 78, as shown in block 218. A determination will then be made as to whether processor B made a similar request. If not, an error signal will be forwarded to Error Reporting block 135. If a similar request was made, the program will jump to address 57 of Present Interrupts block 130. If a synchronization was not requested, an anomaly has occurred since it had already been determined that the processors had not requested that the counters be turned off. These being the only two valid processor requests executed by the Autohalt Handler, an error will be reported.
  • A flow chart further illustrating the operation of Present Interrupts block 130 is shown in Figure 15. If the Present Interrupts block is entered from either Autohalt Handler block 134 or Wait For Counters On Request block 131, the program will proceed to address 57. As indicated by blocks 222 and 224, if the virtual time counters are not equal at this point, an error is forwarded to Error Reporting block 135. If the counters are equal, the program will advance to address 5A. Any outputs from Interrupt Handler 132 also enter the Present Interrupts block at this point. As indicated by block 226, a delay timer is reset at this time. The delay timer typically provides a 12 microsecond delay before interrupts are presented to the two processors.
  • the program will then proceed to address B5 where a determination is made as to whether the 12 microsecond delay provided by the delay timer has lapsed. The program will remain in this loop until the delay is over and then proceed to address 36. As indicated by block 230, all pending interrupts are then presented to the two processors. In addition, both virtual time counters are cleared. The program then will advance to address 5E as shown by block 232 at which time the halt control circuits 88 and 88' will be cleared, this being required before the processors can be released or un-halted.
  • the virtual time counters are off, either because the computer was just powered-up and the processors have not yet requested that the counters be turned on or because the processors had previously requested that the counters be turned off. All halts are cleared from halt control circuits 88 and 88' and the control registers 94 and 94' are cleared.
  • the program then proceeds to address 04, as indicated by element 240, and it is determined whether processor A has requested that the virtual time counters be turned on. If such a request has been made, the program will proceed to address 0E, shown in block 252, at which time the 360 microsecond time-out timer is reset.
  • the program will then advance to address 0F, where a determination is made as to whether processor B has also requested that the counters be turned on. Unless the two processors are closely synchronized at this time, the lagging processor, processor B, probably will not yet have made the request; therefore, the program will proceed to address 10, as indicated by block 256. If the time-out timer has not timed out, the program will jump back to address 0F of block 254. The program will remain in this loop until either 360 microseconds have elapsed, in which case an error signal is transferred to Error Reporting block 135, or processor B has also requested that the counters be turned on. If processor B has made the request within 360 microseconds, the program will proceed to the Present Interrupts block, at which time the counters will be turned on.
  • If processor A has not yet made a turn-on request, the program will proceed to address 05, as indicated by block 242. If a request has been made by processor B, the program will then wait for a request by processor A, as represented by blocks 258, 260 and 262. If A does not make a timely request, an error will be reported. If such a request is made, the program will proceed to the Present Interrupts block. Assuming that neither processor has requested that the counters be turned on, the subject controller will then verify that no improper processor requests have been made. As indicated by blocks 244 and 246, if either processor requests that the counters be turned off, an error will be reported since the counters are already off. Similarly, if either processor requests a synchronization, an error will be reported since such a request is not permitted when the counters are off. If no errors are reported, the program will jump back to address 04 and repeat the sequence.
  • The microcode listing is for use with a particular implementation of the state machine.
  • the details of such particular implementation are shown in Figures 17 through 24 of the drawings.
  • Figures 17 through 24 of the drawings show detailed schematic diagrams of one implementation of the subject processor controller which incorporates a state machine which utilizes the submitted microcode instructions.
  • the schematic diagrams use conventional circuit symbols which can be readily understood by persons having even less than average skill in the applicable art. Accordingly, further description of the schematic diagrams will not be given, except for the following listing of descriptions of certain ones of the integrated circuits called out in the diagrams:
  • the interrupt handler is only executed when the counters are turned on.
  • Mcode sets the hard err bit (which can be strapped to NMI), freezes the counters while in this loop to maybe help with diagnostics, and removes all halts from A&B so that they are free to diagnose the problem.
  • This routine sets two bits in the bit register to indicate that cntrs were off when procs requested counters off.
  • First bit is set here; second bit is set by jumping to error routine that sets what was called the TIMEOUT bit but is now used as a modifier bit in combination with the other bits.
  • timing pad prior to presenting interrupts: mcode stays at address B5 until timer has timed out; timer is jumper-programmable (see page 7 of schematics for timing TIMPADQQ).
  • UNEXPECTED SUPERBIT - Super bit is the OR of either proc setting his unused bit in the control reg.
  • This bit is not really unused; it is used to tell the microcode to go back to waiting for "CNTR ON RQST" after the microcode has detected an error and has sent NMI. However, the processors may mistakenly set this bit; in that case the microcode sets the "UNexpectd Superbit" status bit.
  • the code then un-halts both procs if they are halted so that they can diagnose the NMI, and the mcode waits for the SUPER bit first to go low, getting rid of the unexpected Super bit, and then to go high again, telling the microcode to resume normal operation. TIMEOUT ERRORS - Timeout errors are reported by setting the timeout status bit and waiting for the super bit to tell the mcode to resume normal processing; or, if software prefers, normal operation can be resumed by resetting and then unresetting the board. On the prototype board there is a reset bit; on BETA, if this bit is not part of the design, then reset will be accomplished by activating the INIT line, probably via the status board. Which brings up the next subject....
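A minimal Python sketch of one pass through the Counters Running Main Loop (Figure 12) follows. It captures only the order in which the state machine tests for an interrupt and then for the two autohalt signals. The `Block` enum and `main_loop_step` are hypothetical labels, not part of the disclosed microcode; the enum values are simply the hexadecimal microcode addresses cited in the text.

```python
from enum import Enum


class Block(Enum):
    """Hypothetical labels for the jump targets, keyed by the microcode addresses in the text."""
    MAIN_LOOP = 0x17          # Counters Running Main Loop (block 133)
    INTERRUPT_HANDLER = 0x20  # Interrupt Handler (block 132)
    AUTOHALT_HANDLER = 0x63   # Autohalt Handler (block 134)


def main_loop_step(interrupt_pending: bool, a_autohalt: bool, b_autohalt: bool) -> Block:
    """One pass through addresses 17, 18 and 19 of the main loop."""
    if interrupt_pending:      # address 17: interrupt present on line 42?
        return Block.INTERRUPT_HANDLER
    if a_autohalt:             # address 18: AAHLT from processor A?
        return Block.AUTOHALT_HANDLER
    if b_autohalt:             # address 19: BAHLT from processor B?
        return Block.AUTOHALT_HANDLER
    return Block.MAIN_LOOP     # nothing pending: branch back to address 17
```

Because the interrupt test comes first, an interrupt arriving together with an autohalt is routed to the Interrupt Handler, which in turn hands the sequence to the Autohalt Handler if it sees AAHLT or BAHLT while waiting for the halts.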
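The halt-and-align sequence of the Interrupt Handler (Figure 13) can be modeled in a few lines. This is an idealized sketch under stated assumptions: the `Processor` class, `equalize`, `handle_interrupt`, and the `overshoot`, `common_steps` and `max_steps` parameters are all hypothetical; each virtual time is a plain integer; and the leader is assumed to stop exactly at its current count.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    vtime: int            # virtual time counter: bus cycles (address strobes) counted
    halted: bool = False

    def single_step(self) -> None:
        """Let the processor run exactly one bus cycle."""
        self.vtime += 1


def equalize(a: Processor, b: Processor, max_steps: int = 8) -> int:
    """Single-step whichever processor is behind until the counters match.

    max_steps is a hypothetical guard standing in for the controller's
    time-out and error reporting. Returns the number of single steps issued.
    """
    steps = 0
    while a.vtime != b.vtime:
        if steps >= max_steps:
            raise RuntimeError("virtual time counters could not be equalized")
        (a if a.vtime < b.vtime else b).single_step()
        steps += 1
    return steps


def handle_interrupt(a: Processor, b: Processor, overshoot: int = 0,
                     common_steps: int = 0) -> int:
    """Idealized model of the Interrupt Handler. Returns the equalizing steps.

    overshoot: counts by which the lagger may run past the leader before it
    actually halts (one or two, per the text). common_steps: precautionary
    single steps given to both processors (the source does not state how many).
    """
    if a.vtime == b.vtime:
        a.halted = b.halted = True                 # address 22: A PGM HALT, B PGM HALT
    else:
        leader, lagger = (a, b) if a.vtime > b.vtime else (b, a)
        leader.halted = True                       # PGM HALT for the leader
        lagger.vtime = leader.vtime + overshoot    # EQ HALT, possibly overshooting
        lagger.halted = True
    steps = equalize(a, b)
    for _ in range(common_steps):                  # identical steps for both processors
        a.single_step()
        b.single_step()
    return steps
```

With processor A at 1000 and B at 990, letting B overshoot by two counts leaves A two counts behind, so A is single-stepped twice and both counters end at 1002, matching the overshoot-then-single-step behavior described for addresses 2F onward.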
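The checks performed by Autohalt Handler 134 once both processors have autohalted (Figure 14) reduce to a small decision table. The sketch assumes each processor's pending request is available as one of three values; the `Request` and `Next` enums, the reason strings and `autohalt_decision` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from enum import Enum


class Request(Enum):
    COUNTERS_ON = "counters on"
    COUNTERS_OFF = "counters off"
    SYNC = "synchronization"


class Next(Enum):
    WAIT_FOR_COUNTERS_ON = "Wait For Counters On Request (block 131)"
    PRESENT_INTERRUPTS = "Present Interrupts (block 130)"
    ERROR = "Error Reporting (block 135)"


def autohalt_decision(a_vtime: int, b_vtime: int,
                      a_req: Request | None, b_req: Request | None) -> tuple[Next, str]:
    """Decide where the Autohalt Handler goes after both processors have autohalted."""
    # Addresses 68-69: both halted on their own requests, so counts must match.
    if a_vtime > b_vtime:
        return Next.ERROR, "A counter ahead of B"
    if a_vtime < b_vtime:
        return Next.ERROR, "A counter behind B"
    # Address 6A: counters on is never valid while the counters are running.
    if a_req is Request.COUNTERS_ON:
        # Address 76: a distinct error depending on B's request.
        if b_req is Request.COUNTERS_ON:
            return Next.ERROR, "both requested counters on"
        return Next.ERROR, "A requested counters on"
    # Addresses 6B and 74: counters off must be requested by both processors.
    if a_req is Request.COUNTERS_OFF:
        if b_req is Request.COUNTERS_OFF:
            return Next.WAIT_FOR_COUNTERS_ON, "counters stopped"
        return Next.ERROR, "only A requested counters off"
    # Addresses 6C and 78: synchronization must be requested by both processors.
    if a_req is Request.SYNC:
        if b_req is Request.SYNC:
            return Next.PRESENT_INTERRUPTS, "synchronized"
        return Next.ERROR, "only A requested synchronization"
    return Next.ERROR, "no valid request from A"
```

Returning a reason alongside the next block mirrors the text's point that distinct error signals help the processors locate the source of an anomaly.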
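The pairing rule used in Wait For Counters On Request block 131 (both processors must request counters on, the second within 360 microseconds of the first) can be sketched as a pure function of the request times. The function name, its arguments and the returned strings are hypothetical; only the 360 microsecond figure comes from the text, and the check for improper counters-off or synchronization requests is omitted.

```python
from __future__ import annotations

TIMEOUT_US = 360.0  # time-out period of the present embodiment


def wait_for_counters_on(a_req_us: float | None, b_req_us: float | None,
                         now_us: float, timeout_us: float = TIMEOUT_US) -> str:
    """Outcome of the Wait For Counters On Request block as of actual time now_us.

    a_req_us / b_req_us are the actual times at which each processor issued
    its counters-on request (None if it has not). The first request starts
    the time-out timer; the second must follow within timeout_us.
    """
    seen = [t for t in (a_req_us, b_req_us) if t is not None and t <= now_us]
    if not seen:
        return "waiting"                       # loop back to address 04
    first = min(seen)
    if len(seen) == 2:
        if max(seen) - first <= timeout_us:
            return "counters on"               # proceed to Present Interrupts
        return "error: time-out"               # timer expired before the second request
    if now_us - first > timeout_us:
        return "error: time-out"               # report to Error Reporting block 135
    return "waiting for other processor"
```

Treating the outcome as a function of request times makes the symmetry explicit: it does not matter whether A or B asks first, which corresponds to the two mirrored wait sequences (blocks 252 to 256 and 258 to 262).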

Abstract

Processor controller (40) for use in a fault-tolerant computer (Fig. 1) having redundant processors (30 and 32) running identical programs. Although the programs are identical, the processors execute the instructions at different speeds because, for example, the processors are not permitted to access their common bus (36) at the same actual time. The controller receives interrupt signals (42) intended for the processors and presents them to each processor at the same position in the program. The controller monitors the position of each processor as it executes the program, typically by counting address strobe signals. In one embodiment, the processor that is further ahead in the program is halted by the controller until the lagging processor catches up. At that point, the interrupts are presented to both processors simultaneously and the processors are released. In a second embodiment, the interrupt is forwarded immediately to the leading processor and that processor's position in the program is stored. The leading processor then services the interrupt and continues with the program. When the lagging processor reaches the same position in the program, it is also notified of the interrupt. The controller permits commercially available application programs to be used without special modification.

Description

Computer Processor Controller
Technical Field
The present invention relates generally to control apparatus for a computer central processing unit, and more particularly to a processor controller for fault-tolerant computers having at least two central processors operating simultaneously.
Background Art
The demand for high reliability computer systems has greatly increased in recent years because of the substantial losses, both economic and non-economic, which frequently result from computer downtime. This demand has also increased by virtue of the substantially decreased costs of hardware, which have made fault-tolerant computers economically feasible.
Fault-tolerant computers typically utilize two or more separate central processing units. In the event one processing unit or processor fails, the remaining processor or processors are relied upon to keep the computer functioning. Various techniques have been used to detect processor failures. One such technique, commonly referred to as the lockstep method, utilizes multiple processors executing identical code. The processors are all clocked by a common clock source so that every bus cycle of each processor will be identical to those of the others, provided the processors are operating properly. The data, address or control outputs of the processors are periodically compared to verify proper operation. If the outputs do not match, steps are taken to determine which of the processors has failed so that the processor may be disabled. Another technique for verifying proper processor operation is sometimes referred to as the checkpoint approach, in which each of the processors executes different code. Periodically, each processor transmits a signal to the other processor indicating that the transmitting processor is operating properly. Steps are taken to ensure that a defective processor will not issue an OK signal. If the signal is not received periodically, it is assumed that a fault has occurred in the processor. In addition, data from each of the processors are periodically made available to another processor (i.e., a checkpoint) so that an operating processor will be able to assume the operations of a failed processor, at a reduced rate, without loss of data.
The above-described techniques for verifying proper processor operation have various shortcomings. The lockstep approach cannot be implemented using commercially available hardware and operating systems without major modifications. Furthermore, the lockstep approach utilizes a common clock; therefore, a failure of the clock source will result in a total failure of the system. The checkpoint approach requires excess processor capacity so that the system response time will be maintained following a processor failure. If excess processor capacity is not available, the number of functions carried out by the system must be reduced in order to maintain the same response time. In addition, the checkpoint approach cannot be applied to most existing hardware and operating systems without relatively major modifications. Also, existing application programs typically must be extensively modified or rewritten to be fault tolerant. The present system overcomes these shortcomings of the prior-art approaches. Existing hardware and application programs can be used with relatively minor modifications. In addition, each processor has a separate clock; therefore, a failure of either clock will not result in a total system failure. Also, fewer system resources are consumed in supporting the present system than in supporting the checkpoint system. Furthermore, following a processor failure, all processes (executing programs) continue execution without increasing response time, without re-executing instructions and with essentially no delay. Finally, the present invention can be implemented at significantly lower cost than lockstep and checkpoint systems. These and other advantages of the present invention will become apparent to those skilled in the art after having read the following Best Mode for Carrying Out the Invention together with the drawings.
Disclosure of the Invention
A processor controller for a computer system having a plurality of processors which execute programmed instructions is disclosed. An exemplary computer system would be a fault-tolerant computer having redundant processors which execute substantially identical instructions.
The subject controller receives asynchronous external signals, primarily interrupt signals, and forwards the asynchronous signals to the processors in a predetermined manner. The controller includes apparatus for determining the position of each of the processors in executing the code associated with the processor. The position is referred to herein as virtual time. The position can be determined, for example, by monitoring the number of bus cycles which have been executed by each of the processors. This may be accomplished by counting the number of address strobes generated by each processor, although other signals may be used for this purpose. The number of bus cycles that a particular processor has executed is used to determine the location of the processor in virtual time. The processors will typically execute their respective identical programs at different rates. Thus, at any given point in actual time, the processors are likely to be at different positions in virtual time.
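The idea of virtual time can be illustrated with a short simulation. As a simplifying assumption, each processor's virtual time counter is modeled as incrementing once per completed bus cycle, and the bus-cycle durations differ between the processors because of contention for the common bus. `strobe_times`, `virtual_time` and the cycle durations are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def strobe_times(cycle_ns):
    """Actual time (ns) at which each bus cycle completes, given each cycle's duration."""
    return list(accumulate(cycle_ns))


def virtual_time(strobes, actual_ns):
    """Number of bus cycles counted by actual time actual_ns."""
    return bisect_right(strobes, actual_ns)


# Four bus cycles of identical code; B waits for the common bus on two of them.
a = strobe_times([100, 100, 100, 100])   # [100, 200, 300, 400]
b = strobe_times([100, 150, 100, 150])   # [100, 250, 350, 500]
print(virtual_time(a, 300), virtual_time(b, 300))  # prints "3 2"
```

At the same actual time (300 ns) the two processors are at different virtual times, which is exactly the situation the controller must resolve before presenting an interrupt.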
When an asynchronous signal, such as an interrupt, is received, the subject controller examines the virtual time of each processor. If the virtual times are identical, the controller notifies each of the processors of the signal. If the processors are at different virtual times, one embodiment of the subject controller refrains from forwarding the asynchronous signal to either processor. The leading processor in virtual time is halted and the trailing processor is permitted to execute code until it has reached the leader in virtual time. Once the virtual times are the same, the asynchronous signal is presented to both processors at the same actual time.
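Under the same hypothetical model (per-cycle durations, counter incremented at the end of each cycle), the first embodiment can be sketched as follows. `first_embodiment` is an illustrative name; it ignores the overshoot and single-stepping details and simply returns the common virtual time and the actual time at which both processors are notified.

```python
from bisect import bisect_right
from itertools import accumulate


def first_embodiment(cycles_a, cycles_b, arrival_ns):
    """Return (virtual_time, actual_ns) at which BOTH processors are notified.

    Idealized: the leader stops exactly at its current count, and the lagger
    stops exactly when it catches up (no overshoot).
    """
    ts_a, ts_b = list(accumulate(cycles_a)), list(accumulate(cycles_b))
    va, vb = bisect_right(ts_a, arrival_ns), bisect_right(ts_b, arrival_ns)
    if va == vb:
        return va, arrival_ns              # already aligned: present at once
    target = max(va, vb)                   # leader's virtual time; the leader is halted
    lagger_ts = ts_b if va > vb else ts_a
    return target, lagger_ts[target - 1]   # lagger completes its target-th bus cycle
```

For cycle durations `[100, 100, 100, 100]` (A) and `[100, 150, 100, 150]` (B) with an interrupt at 300 ns, A is at virtual time 3 and B at 2, so A is halted and both are notified at 350 ns, when B completes its third bus cycle.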
In a second embodiment of the subject invention, the asynchronous signal is forwarded to the leading processor as soon as the signal is received, and the virtual time of the leading processor at that moment is stored. The lagging processor is then permitted to execute code until its virtual time matches the stored virtual time. At that point, the lagging processor is notified of the asynchronous signal. Thus, as in the first embodiment, the processors are notified of the asynchronous signal at the same virtual time. However, the actual times of the two notifications will usually differ.
The subject controller preferably also includes means for verifying proper processor operation. In that case, the two processors are programmed to periodically request synchronization by the controller. Each requesting processor is halted immediately upon making such a request. The leading processor will be the first to issue a request to the controller and thus will be halted first. The lagging processor will continue to execute code until it also requests synchronization, at which point both processors are halted. Since the processors are supposed to be executing identical code, they should be halted at the same virtual time. The subject controller verifies that the virtual times are the same and then releases the processors. If the virtual times are not identical, an error is reported.
Brief Description of the Drawings
Figure 1 is a simplified block diagram of a fault-tolerant computer utilizing the subject processor controller.
Figure 2 is a simplified flow chart which illustrates the manner in which external events such as interrupts are processed by the subject processor controller.
Figure 3 is a flow chart which illustrates the manner in which a first embodiment of the subject controller processes external events.
Figure 4 is a timing diagram which depicts an exemplary operation of the Figure 3 embodiment of the subject invention.
Figure 5 is a flow chart which illustrates the manner in which a second embodiment of the subject controller processes external events.
Figure 6 is a timing diagram which depicts an exemplary operation of the Figure 5 embodiment of the subject invention.
Figure 7 is a flow chart which illustrates the manner in which proper operation of the processors is verified by the subject controller.
Figure 8 is a timing diagram which depicts an exemplary verification operation of the subject controller.
Figure 9 is a functional block diagram of the first embodiment of the subject controller.
Figure 10 is a functional block diagram of the state machine of the subject controller.
Figure 11 is a flow chart illustrating the overall operation of the subject controller state machine.
Figure 12 is a flow chart showing the operation of the Counters Running Main Loop block of the subject controller state machine.
Figure 13 is a flow chart showing the operation of the Interrupt Handler block of the subject controller state machine.
Figure 14 is a flow chart illustrating the operation of the Autohalt Handler block of the subject controller state machine.
Figure 15 is a flow chart illustrating the operation of the Present Interrupts block of the subject controller state machine.
Figure 16 is a flow chart illustrating the operation of the Wait For Counters On Request block of the subject controller state machine.
Figure 17 is a detailed schematic drawing of a portion of the subject controller, including the virtual time counters and comparator circuits.
Figure 18 is a detailed schematic drawing of a portion of the subject controller, including the timeout timer and halt control circuitry.
Figure 19 is a detailed schematic drawing of a portion of the subject controller, including various interface circuits.
Figure 20 is a detailed schematic drawing of a portion of the subject controller, including interrupt control circuitry.
Figure 21 is a detailed schematic drawing of a portion of the subject controller, including the state machine microcode memory.
Figure 22 is a detailed schematic diagram of a portion of the subject controller, including bus interface circuits.
Figure 23 is a detailed schematic diagram of a portion of the subject controller, including the connector pin assignments.
Best Mode for Carrying Out the Invention
Referring now to Figure 1 of the drawings, a simplified block diagram of an exemplary fault-tolerant computer utilizing the subject processor controller may be seen. The computer includes a pair of substantially identical processors: a processor A, designated by the numeral 30, and a processor B, designated by the numeral 32. Each processor includes an internal microprocessor (not shown), such as the 16-bit microprocessor manufactured by Motorola, Inc., under the designation MC 68000. Each processor further includes a local memory (not shown) and a peripheral device in the form of a terminal controller (not shown), both connected to a local bus (not shown).
The computer system further includes a common memory, represented by block 34, which may be accessed by either processor 30 or 32 by way of an interprocessor bus 36. Peripheral devices, such as terminal controllers and disc controllers, are also coupled to the interprocessor bus, as represented by block 38. The subject processor controller, generally designated by the numeral 40, is also coupled to bus 36. Processor controller 40 performs various functions to be described later in greater detail. One such function is to verify proper operation of the two processors 30 and 32. Another exemplary function is to control the flow of asynchronous external events, primarily interrupts, to the two processors.
Processors A and B together with processor controller 40 can be viewed as a single logical processor.
As indicated by line 42, processor controller 40 receives interrupts and other asynchronous signals from external sources which are intended for the two processors. These signals will be collectively referred to herein as interrupts. Processors A and B are identically programmed and, under normal operating conditions, execute identical code. For various reasons, however, the processors typically do not execute the same code at the same time. For example, if the two processors request interprocessor bus 36 simultaneously, only one processor will be given access, and the other must wait until the first processor has released the bus. As a consequence, the processor that gains access second will fall behind the first processor in executing the common code. In addition, the two processors have separate internal clocks which are not identical in frequency, so the processor with the higher frequency clock will tend to execute code at a higher rate.
In order to ensure that the two processors continue executing identical code, interrupts must be presented to the processors at the same point in their respective programs. For example, if one processor is substantially ahead of the other, presenting an interrupt signal to both simultaneously could cause the processors to respond differently. If an interrupt occurs at certain critical points in the execution of the program, the processors will take different paths and will no longer be identical as desired. What such critical points have in common is that data are being changed and/or tested in the normal execution of the code when an asynchronous event, such as an interrupt, occurs which also changes and/or tests the same data. Accordingly, one function of the subject controller is to ensure that the processors are notified of all asynchronous events at the same relative position in the execution of their respective code.
The concept of virtual time will be used to further explain the subject invention. Virtual time can be defined as the measure of how far a particular processor has progressed in executing its associated code. The smallest measurable change in virtual time can be expressed in terms of virtual time ticks. Several possible processor events can serve as a virtual time tick for the purpose of measuring virtual time. Depending upon the particular processor used, instructions, data references or bus cycles could be used as virtual time ticks. MOVE instructions, procedure calls and the like could also be used for this purpose.
The ideal virtual time tick is chosen such that the processors will be at the same virtual time when they are at the same point in the execution of their respective code. In addition, the capability must exist to count each tick of virtual time. For example, instruction counts should not be used as virtual time ticks for processors such as the Motorola model MC68010 which utilize an internal cache memory. This is because there are no external indications of instruction cycles for this type of processor. However, instruction fetches or bus cycles could be used as virtual time ticks. Finally, the virtual time tick should occur at least once during a predetermined maximum period of time. It is preferable that the ticks occur at least once every few microseconds or even more frequently. Ticks which occur at a lower frequency do not provide sufficient resolution for verifying proper processor operation.
The subject controller monitors the position of each processor in virtual time and presents interrupts to the processors only at the same point in virtual time. This aspect of the invention is illustrated in the flow chart of Figure 2. As represented by element 44, the controller monitors for asynchronous external events (interrupts) which are intended for the two processors. If no events are detected, the controller remains in a waiting loop. After receipt of an interrupt, the two processors are notified of the interrupt at the same virtual time, as indicated by block 46. The controller then returns to state 44 and waits for further interrupts.

Figure 3 illustrates a first embodiment of the subject controller. At state 44, the controller waits for an interrupt. After an interrupt is received, the processor leading in virtual time is halted, as indicated by block 46. This processor will no longer advance in virtual time. Meanwhile, the lagging processor is permitted to execute code. As indicated by element 48, when the lagging processor has reached the halted processor in virtual time, both processors are notified of the interrupt, as indicated by block 50. Thus, the notification is given at the same virtual and actual time.
The operation of the first embodiment controller is further illustrated in the timing diagram of Figure 4. The vertical axis of the graph represents virtual time and the horizontal axis represents actual time. The solid line A and broken line B represent the states of processors A and B, respectively. At virtual time V0 and actual time T0, lines A and B indicate that the processors are at the same virtual and actual times.
Processor A reaches virtual time V1 at actual time T2. However, processor B does not reach the same virtual time until actual time T8. Thus, processor B starts to lag in virtual time.
The two processors advance in both actual and virtual time until actual time T14, when an interrupt is detected by the controller. At this actual time, processor A is at virtual time V10 and processor B is at virtual time V6. The controller then commands the leading processor, processor A, to halt. Processor A will not proceed further in virtual time until it is un-halted. Meanwhile, processor B is permitted to advance in virtual time. When processor B also reaches virtual time V10, at actual time T19, the two processors are at the same virtual and actual times. At this time both processors are un-halted and notified of the interrupt. The processors then service the interrupt and proceed to execute code.
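The first-embodiment policy of Figures 3 and 4 can be illustrated with a small discrete-time simulation. This is a minimal sketch under stated assumptions, not the controller's actual logic: each processor is assumed to produce one virtual time tick every `period` units of actual time, and the names `Processor`, `period` and `first_embodiment` are hypothetical. Once an interrupt is pending, the leading processor is halted, and the interrupt is presented to both processors only when their virtual time counts are equal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Processor:
    """Hypothetical toy model: one virtual time tick (e.g. one address
    strobe) every `period` units of actual time."""
    name: str
    period: int
    vtime: int = 0                        # virtual time counter value
    halted: bool = False
    notified_vtime: Optional[int] = None
    notified_time: Optional[int] = None

    def clock(self, t: int) -> None:
        """Advance one unit of actual time; a halted processor does not tick."""
        if not self.halted and t % self.period == 0:
            self.vtime += 1


def first_embodiment(a: Processor, b: Processor, irq_time: int,
                     max_time: int = 10_000) -> int:
    """Present an interrupt arriving at actual time `irq_time` to both
    processors at the same virtual AND actual time; return that actual time."""
    pending = False
    for t in range(1, max_time):
        if t == irq_time:
            pending = True
        if pending:
            if a.vtime == b.vtime:
                # Lagging processor has caught up: un-halt both and notify.
                for p in (a, b):
                    p.halted = False
                    p.notified_vtime, p.notified_time = p.vtime, t
                return t
            # Halt whichever processor leads in virtual time.
            leader = a if a.vtime > b.vtime else b
            leader.halted = True
        a.clock(t)
        b.clock(t)
    raise RuntimeError("processors never matched in virtual time")
```

For example, with `Processor("A", period=1)`, `Processor("B", period=2)` and an interrupt at actual time 10, the function returns 19 and both processors are notified at virtual time 9. Because the lagging count advances one tick at a time, it cannot overshoot the halted leader, which mirrors counting individual bus cycles.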
Operation of a second embodiment of the subject invention is illustrated in the flow chart of Figure 5. As indicated by element 44, the controller waits for an interrupt. When an interrupt is received, the processor leading in virtual time is notified of the interrupt, as indicated by block 52. In addition, the virtual time at which the notification is given is stored. The leading processor then services the interrupt and proceeds to process code. The controller then compares the stored virtual time with the virtual time of the lagging processor, as indicated by element 54. When the virtual times are equal, the lagging processor is notified of the interrupt, as represented by block 56. The second processor then services the interrupt and proceeds to process code.

Referring now to Figure 6, a timing diagram similar to that depicted in Figure 4 is used to further illustrate the operation of the second embodiment controller. During the exemplary time period shown, processor A is the leading processor in virtual time. At actual time T12, the controller receives an interrupt.
Soon after receipt, lead processor A is notified of the interrupt. Processor A services the interrupt and continues to process code. In addition, the virtual time V8 at which the processor was notified of the interrupt is stored.
Lagging processor B was only at virtual time V6 when the leading processor was notified of the interrupt. Processor B continues to process code until it reaches virtual time V8 at actual time T17. At this time, the controller detects that the virtual time of the lagging processor matches the stored virtual time V8, and processor B is notified of the interrupt. Processor B then services the interrupt and continues to process code. Thus, both processors are notified of the interrupt at the same virtual time, but not necessarily at the same actual time.
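A comparable sketch of the second embodiment (Figures 5 and 6) uses the same hypothetical tick model. Nothing is halted here: the leader is notified immediately, its virtual time is stored, and the lagging processor is notified when its count reaches the stored value. The names `Processor`, `period` and `second_embodiment` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Processor:
    """Hypothetical toy model: one virtual time tick every `period` units of
    actual time. No processor is halted in this embodiment."""
    name: str
    period: int
    vtime: int = 0
    notified_vtime: Optional[int] = None
    notified_time: Optional[int] = None

    def clock(self, t: int) -> None:
        if t % self.period == 0:
            self.vtime += 1

    def notify(self, t: int) -> None:
        self.notified_vtime, self.notified_time = self.vtime, t


def second_embodiment(a: Processor, b: Processor, irq_time: int,
                      max_time: int = 10_000) -> Tuple[int, int]:
    """Notify the leader at once and store its virtual time; notify the
    lagger when it reaches that value. Returns both actual notify times."""
    stored_vtime: Optional[int] = None
    leader = lagger = None
    for t in range(1, max_time):
        if t == irq_time:
            leader, lagger = (a, b) if a.vtime >= b.vtime else (b, a)
            leader.notify(t)
            stored_vtime = leader.vtime    # virtual time of first notification
        if lagger is not None and lagger.vtime == stored_vtime:
            lagger.notify(t)
            return leader.notified_time, lagger.notified_time
        a.clock(t)
        b.clock(t)
    raise RuntimeError("lagging processor never reached the stored virtual time")
```

With A ticking every time unit, B every second unit, and an interrupt at actual time 10, both processors are notified at virtual time 9, but at actual times 10 and 19 respectively, which is the distinguishing property of this embodiment.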
Referring again to Figure 1, the processor controller 40 depicted is implemented in accordance with the first embodiment of the subject invention. Controller 40 derives the virtual time of the two processors by counting processor local bus cycles. A bus cycle, or virtual time tick, occurs whenever the processor moves data via the local bus. The local bus cycles of the Motorola MC 68000 microprocessor are indicated by occurrences of address strobes. Address strobe signals from processor A, referred to as Clock A, are coupled to controller 40, as indicated by line 52. Similarly, address strobe signals from processor B, referred to as Clock B, are also fed to controller 40, as indicated by line 52'. The controller provides processor halt signals to processors A and B, as represented by lines 54 and 54', respectively. The interrupts received by the controller on line 42 are forwarded to processors A and B at the same virtual time on lines 56 and 56', respectively.
It is desirable that processors A and B have a means for communicating with one another by, for example, exchanging messages. Unless semaphore techniques are used, messages can be reliably exchanged only when the processors are synchronized. By definition, the processors are synchronized when they are at the same virtual time at the same actual time. To enable an exchange of messages, and for other reasons, the subject invention is capable of synchronizing the processors when requested by the processors to do so. Once the processors are synchronized, the subject controller also verifies proper processor operation.
Referring now to the Figure 7 flow chart, the operation of the synchronization function of the subject controller will now be described. The two processors are programmed to request synchronization at the same virtual times. As represented by block 58, the processor most advanced in virtual time will be the first to request synchronization. Immediately after requesting synchronization, that processor is halted by the subject controller. Since both processors are supposed to be executing identical code, the lagging processor will eventually also request synchronization, and it too is halted just after issuing its request. The controller then compares the virtual times of the two processors, as indicated by element 62. If both processors are operating properly, the virtual times should be equal. If they are equal, the processors are released or un-halted, as shown by block 64, and are thereby synchronized. If the virtual times do not correspond, the processors are notified that there is an error and that they are not synchronized, as indicated by block 66.

The operation of the processor synchronization process is further illustrated by the timing diagram of Figure 8, which is similar to the diagrams of Figures 4 and 6. At actual time T0, the two processors are synchronized. As time passes, the processors go out of synchronization, with processor A leading. At virtual time V8 and actual time T10, processor A requests that the processors be synchronized and is halted. At actual time T14, processor B requests that the processors be synchronized and is also halted. To verify proper operation, the virtual times of the two processors are then compared. Proper operation is verified since both processors are at virtual time V8. The processors are then un-halted and permitted to continue executing code. If the virtual times did not correspond, the processors would have been notified of the anomaly.
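The request, halt and compare sequence of Figures 7 and 8 can be sketched as follows, again under a hypothetical tick model used only for illustration. Each modelled processor halts itself immediately after reaching one of its (hypothetical) `sync_points`, standing in for the halt on request; once both are halted, their virtual times are compared. The `timeout` bound and the `SyncError` exception are assumptions standing in for the controller's error report.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


class SyncError(Exception):
    """Raised where the controller would report an error to the processors."""


@dataclass
class Processor:
    """Hypothetical toy model: ticks every `period` actual-time units and
    requests synchronization on reaching any virtual time in `sync_points`."""
    name: str
    period: int
    sync_points: FrozenSet[int]
    vtime: int = 0
    halted: bool = False

    def clock(self, t: int) -> None:
        if not self.halted and t % self.period == 0:
            self.vtime += 1
            if self.vtime in self.sync_points:
                self.halted = True        # halted right after the request


def synchronize(a: Processor, b: Processor,
                timeout: int = 10_000) -> Tuple[int, int]:
    """Wait until both processors have requested synchronization (and are
    halted), verify their virtual times match, then release them.
    Returns (actual time of release, common virtual time)."""
    for t in range(1, timeout):
        a.clock(t)
        b.clock(t)
        if a.halted and b.halted:
            if a.vtime != b.vtime:
                raise SyncError(f"virtual times differ: A={a.vtime}, B={b.vtime}")
            a.halted = b.halted = False   # verified: processors synchronized
            return t, a.vtime
    raise SyncError("timed out waiting for both synchronization requests")
```

With A ticking every unit, B every second unit, and both programmed to request synchronization at virtual time 8, `synchronize` releases them at actual time 16 with a common virtual time of 8. If B's program instead requests synchronization at virtual time 10, the comparison fails and `SyncError` is raised.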
It would also be possible to verify proper processor operation without actually synchronizing the processors. In that event, each processor would be suitably programmed to request verification of proper operation at the same point in virtual time. If, in fact, the requests for verification are received by the subject controller at the same virtual times, proper operation is confirmed. If the requests occur at different virtual times, a fault has occurred and the processors are notified of the anomaly.
It has been assumed that the two processors are identical in hardware configuration, that the processors execute identical code, and that the data present within the processors are identical. However, in interactions with the external peripherals, it is often necessary for the two separate processors and the subject controller to appear as one logical processor. This is accomplished by suitably programming the processors so that they effectively shut the subject controller off. The processors are then capable of executing non-identical code with the controller off. During this time, the controller will not forward interrupts to the two independently-operating processors. The processor programs also include instructions for turning the subject controller back on for synchronized operation. Steps are taken to ensure that both processors turn the subject controller off at the same virtual time and that the means for tracking virtual time on the controller is reinitialized when the controller is turned back on.
Further details of the subject processor controller 40 may be seen in the block diagram of Figure 9. The controller receives the Clock A and Clock B signals from the processors on lines 52 and 52', respectively. The clocks are synchronized with the internal processor controller clock (not shown) utilizing conventional synchronization circuits 70 and 70'. A pair of virtual time counters, 72 and 72', count the two clocks output by the synchronizers.
The subject controller further includes a state machine 68 which provides various control outputs in response to microcoded instructions and selected condition inputs. The major components of the state machine are shown in Figure 10. The machine includes a memory 76 loaded with approximately 256 32-bit microcode instructions. Memory 76 is a Programmable Read Only Memory (PROM), although a Random Access Memory (RAM) may be used provided a non-volatile memory is included for permanently storing the microcode. A program counter 78 provides an 8-bit address for memory 76.
The state machine further includes a branch condition selector multiplexer 80 having 24 inputs which represent 24 different conditions of the subject processor controller. For example, input 1 may be a logic 1 when the virtual time counter 72 for processor A is greater than the counter 72' for processor B and a logic 0 at all other times. The particular condition to be selected is determined by 5 of the 32 bits of data read from memory 76, as represented by line 82.
The output of multiplexer 80 is coupled to the load input of program counter 78. If the selected condition is not present, program counter 78 increments to the next higher memory 76 address to read the next instruction. If the condition is present, 8 of the 32 bits of data read from the memory are loaded into counter 78 and used as the address to read the next instruction out of memory. Unless another branch condition occurs, the counter will increment from the new branch address. The remaining 19 of the 32 bits of data from the memory are used as control outputs for the subject processor controller.
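The next-address logic of Figure 10 can be sketched in a few lines. The text fixes only the field widths (5 condition-select bits, 8 branch-address bits and 19 control bits, totalling 32); the bit positions used below, the helper names, and the treatment of unused select codes 24 to 31 as false are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical field layout; only the widths (5 + 8 + 19 = 32) come from the text.
COND_SHIFT, COND_WIDTH = 27, 5     # branch-condition select, bits 31..27
ADDR_SHIFT, ADDR_WIDTH = 19, 8     # branch address, bits 26..19
CTRL_WIDTH = 19                    # control outputs, bits 18..0
NUM_CONDITIONS = 24                # inputs of multiplexer 80
MEMORY_SIZE = 256                  # addressed by 8-bit program counter 78


def mask(width: int) -> int:
    return (1 << width) - 1


def encode(cond: int, addr: int, ctrl: int) -> int:
    """Pack the three fields into one 32-bit microcode word."""
    if not (0 <= cond <= mask(COND_WIDTH) and 0 <= addr <= mask(ADDR_WIDTH)
            and 0 <= ctrl <= mask(CTRL_WIDTH)):
        raise ValueError("field out of range")
    return (cond << COND_SHIFT) | (addr << ADDR_SHIFT) | ctrl


def decode(word: int) -> Tuple[int, int, int]:
    """Split a microcode word into (condition select, branch address, control)."""
    return ((word >> COND_SHIFT) & mask(COND_WIDTH),
            (word >> ADDR_SHIFT) & mask(ADDR_WIDTH),
            word & mask(CTRL_WIDTH))


def next_pc(pc: int, word: int, conditions: List[bool]) -> int:
    """Load the branch address if the selected condition is true,
    otherwise increment the program counter."""
    cond, addr, _ctrl = decode(word)
    # Assumption: select codes 24..31 have no multiplexer input and read false.
    taken = cond < NUM_CONDITIONS and conditions[cond]
    return addr if taken else (pc + 1) % MEMORY_SIZE
```

Keeping the branch target inside the microword, as the text describes, means a conditional branch costs no extra cycle: the same word that selects the condition also supplies the address to load.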
Referring again to Figure 9, state machine 68 generates count enable signals on lines 84 and 84' which are coupled to logic elements 74 and 74', respectively. When the enable signals are present, virtual time counters 72 and 72' count clocks from processors A and B, respectively. The outputs of the virtual time counters 72 and 72' are compared by the subject controller, as represented by block 86. If processor A is further advanced in virtual time than processor B, a signal indicative of this condition is generated on line 88 and forwarded to state machine 68. If processor A is lagging in virtual time, a signal indicative of this condition is generated on line 90 and forwarded to the state machine. If processor A is neither leading nor lagging, the state machine concludes that the virtual time counters are equal. As previously noted, the subject controller generates Halt A and Halt B control signals on lines 54 and 54', respectively. Halt control circuitry 88 and 88' is provided which, among other things, synchronizes the Halt A and Halt B signals with the bus cycles of the respective processors.
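The counting and comparison path of Figure 9 reduces to a counter that increments only while its count enable is asserted, plus a comparator that drives two status lines. This is a behavioral sketch only; the class and function names are hypothetical, and the counters are modelled as unbounded integers because the text does not give their width.

```python
from typing import Tuple


class VirtualTimeCounter:
    """Behavioral sketch of a virtual time counter (72 or 72'): counts
    synchronized processor clocks only while its count enable is set."""

    def __init__(self) -> None:
        self.count = 0
        self.enabled = False   # count enable from the state machine (line 84 or 84')

    def clock(self) -> None:
        """One synchronized Clock A/B pulse (an address strobe)."""
        if self.enabled:
            self.count += 1

    def clear(self) -> None:
        self.count = 0


def compare(a: VirtualTimeCounter, b: VirtualTimeCounter) -> Tuple[bool, bool]:
    """Comparator 86: returns (A leads, A lags), i.e. the signals on lines
    88 and 90. Neither asserted means the counters are equal."""
    return a.count > b.count, a.count < b.count
```

Reporting only "A leads" and "A lags" is enough for the state machine: equality is simply the case in which neither line is asserted, as the text notes.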
As previously noted, the processors can request the subject controller to synchronize the processors. The processors can also instruct the controller to turn the virtual time counters 72 and 72' on or off. This may be done if the processors are programmed to operate independently of one another for a period of time. These functions are accomplished utilizing a pair of control registers 94 and 94' which are associated with processors A and B, respectively. Control registers 94 and 94' are coupled in a time-multiplexed fashion to the local data buses of processors A and B, respectively, by way of interprocessor bus 36 (Figure 1). An interprocessor bus interface circuit 97 is provided for interfacing between the interprocessor bus 36 and the local bus of the subject controller. Interface circuit 97 includes address recognition apparatus which detects commands addressed to control registers 94 and 94' and provides enable signals to the registers on lines 102 and 102', respectively. The enable signals cause command data addressed to registers 94 and 94' to be loaded. In addition, the address recognition circuit causes a signal to be sent back to the processor which originated the command, acknowledging receipt of the command. The commands which may be loaded into registers 94 and 94' include virtual time counter on commands, counter off commands and synchronization request commands. When the commands are loaded into the registers, an autohalt signal is forwarded to the halt control circuitry, as represented by lines 95 and 95'. The autohalt signals cause the halt control circuitry to halt the particular processor which transmitted the command. The halt is achieved very quickly since state machine 68 is not involved in generating the autohalt signals.
The commands loaded into control registers 94 and 94' are then transferred to state machine 68 as represented by lines 104, 104', 106 and 106'. After the commands have been acted upon by state machine 68, the registers are cleared as indicated by lines 110 and 110'. Although not depicted, the processors also provide lock requests, reset requests and (by implication) unreset requests, which are received by control registers 94 and 94' and transferred to state machine 68. These latter three request commands will be subsequently described.
In the event the subject controller detects an error in the operation of the two processors, error data identifying the type of error is generated on line 114. The error status data will then be loaded into an error status register 116. The processors will periodically request error status data from the controller by generating an appropriate address which will be detected by the address recognition circuitry of interprocessor bus interface 97. An enable signal will be produced on line 120, causing error status data to be forwarded to the requesting processor, as indicated by line 117, via interprocessor bus interface 97. In addition, the address recognition circuit of interface 97 will transmit an acknowledgment signal back to the processor requesting the error status data.
The subject controller further includes interrupt control circuitry 120. When an interrupt for the processors is received on line 42, control 120 forwards an interrupt request signal to state machine 68 by way of line 122. When comparator circuit 86 determines that the virtual times of the two processors are matched, an inhibit signal on line 124 to control 120 is removed, thereby enabling the control to notify processors A and B of the interrupt by way of lines 56 and 56', respectively.
In the present implementation, eight interrupt levels, 0 - 7, are defined. Four interrupt levels, 0 - 3, are reserved for devices to assert directly to the processors, by-passing the subject controller. Four levels, 4 - 7, are used by the subject controller for forwarding interrupts to the processors. If the full logical processor (processors 30, 32 and processor controller 40) is operating properly, the processors mask out or ignore interrupt levels 0 - 3 and only respond to levels 4 - 7 forwarded by the controller. Interrupt levels 1 - 3 are reprogrammed to be forwarded by the controller as levels 4 - 6, respectively. Level 0 is reprogrammed as level 7.
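The level assignments in this paragraph, together with the fallback for a failed logical processor described in the next paragraph, can be summarized in a short sketch. The mapping itself is taken from the text; the function names are hypothetical, and treating levels 4 - 7 as still enabled on a surviving processor is an assumption (the text only states that levels 1 - 3 are unmasked and that the refresh timer moves to level 7).

```python
# Remapping stated in the text for a fully working logical processor:
# device levels 1-3 are forwarded as 4-6 and level 0 as level 7.
FORWARDED_LEVEL = {0: 7, 1: 4, 2: 5, 3: 6}


def delivered_level(device_level: int, logical_processor_ok: bool) -> int:
    """Level at which a device interrupt (nominal level 0-3) reaches a processor."""
    if device_level not in FORWARDED_LEVEL:
        raise ValueError("device levels are 0-3")
    if logical_processor_ok:
        return FORWARDED_LEVEL[device_level]   # forwarded by the controller
    # Degraded mode: levels 1-3 arrive directly; the level-0 refresh timer is
    # reprogrammed to interrupt directly at level 7 (the MC68000 has no level 0).
    return 7 if device_level == 0 else device_level


def accepted(level: int, logical_processor_ok: bool) -> bool:
    """Whether a processor responds to an interrupt at `level`."""
    if logical_processor_ok:
        return 4 <= level <= 7     # levels 0-3 are masked or ignored
    # Assumption: the survivor unmasks 1-3 and keeps 4-7 enabled.
    return 1 <= level <= 7
```

In either mode every device interrupt lands on a level the processor accepts, and the refresh timer always arrives at level 7.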
Should part of the logical processor fail, the subject controller will no longer forward interrupts to the processors. In this case, the surviving processor will configure itself so that interrupt levels 1 through 3 are no longer masked or ignored. This enables the processor to receive directly the interrupts that were originally forwarded by the subject controller on levels 4 - 7. The Motorola MC68000 cannot detect an interrupt at level 0; therefore, the device that generates this level is reprogrammed to interrupt directly at level 7 if the full logical processor is not working. Interrupt level 0 is the software refresh timer, which must be received by the processors at least every two milliseconds for proper operation of the processor's internal dynamic memory.

Referring now to Figure 11, a simplified flow chart illustrating the operations performed by microcoded state machine 68 may be seen. The hexadecimal address in microcode memory 76 at which a particular instruction is stored is shown in the flow chart in small circles. If a block in the chart represents more than a single instruction address, only the first address is set forth.
At power on and in certain other states, the state machine program counter 78 is in a reset state at address 00, as represented by block 128. This is a latched state wherein virtual time counters 72 and 72' (Figure 9) are cleared, but not running, and control registers 94 and 94' and halt control circuits 88 and 88' are cleared. The controller stays in the reset state until one or both of the processors unresets either or both of control registers 94, 94'. This unreset request is the only processor command which need not be given by both processors in order to be acted upon. When an unreset command is issued, the program counter advances to address 04, as indicated by block 131, and the state machine waits for processors A and B to issue a request to turn the virtual time counters on. Since the virtual time counters are already off, the processors should not issue a counters off request or a synchronization request. If either processor issues either of these requests, the program will report an error, as represented by Error Reporting block 135. The particular type of error which has occurred can be determined from the error status data of the Error Reporting block. Any time an error is detected, the subject controller stops further operation, thereby permitting the processors to examine the outputs of virtual time counters 72 and 72', control registers 94 and 94' and other circuits to facilitate locating the source of the error.
Assuming that both processors issue a counters on request within a predetermined time period of one another, the subject controller will issue a status word to the processors indicating that the counters have been turned on. The program will then advance to address 57 of a Present Interrupts block 130. At this time, the counts of the virtual time counters are compared. If the counts are identical, any interrupts currently present on line 42 (Figures 1 and 9) will be presented to both processors by the Present Interrupts block. Once all interrupts have been presented, the virtual time counters 72 and 72' are enabled.
The program then makes an unconditional jump from the Present Interrupts block and enters the Counters Running Main Loop at address 17, as indicated by block 133. The program will remain in the loop until either an interrupt is received or an autohalt signal is produced. An autohalt signal is produced whenever a processor makes a synchronization request, a counters on request or a counters off request.
Assuming that an interrupt is eventually received on line 42, the program will jump to address 20. The interrupt will then be processed by an Interrupt Handler, as represented by block 132. As will be described later in greater detail, the processor which is most advanced in virtual time will be halted. Then, when the virtual time of the lagging processor matches that of the leading processor, the program proceeds to address 57 of the Present Interrupts block 130, at which time interrupts are presented to the two processors. If errors are detected by Interrupt Handler 132, the program will proceed to Error Reporting block 135 to report the error to the processors. An exemplary error would indicate that the lagging processor has failed to respond to a halt request. Also, if a processor request causes the controller to produce an autohalt signal while the program is in the Interrupt Handler, the halt sequence will be executed by an Autohalt Handler, as represented by block 134. Interrupt Handler 132 is not implemented to process autohalts.
Autohalt Handler 134 responds to autohalt signals, which are produced when the processors request that the virtual time counters be turned on, that the counters be turned off or that the processors be synchronized. The autohalt signals cause the processor which issued the request to halt, independently of the state machine of the subject controller. If the processors request that they be synchronized, Autohalt Handler 134 will verify that the processors have been halted at the same virtual time. If the virtual time counters are not equal, the program proceeds to Error Reporting block 135.
The processors should not issue a counters on request while the counters are running. Thus, if this request is made, the program will also proceed from the Autohalt Handler to block 135 so that an error will be reported. If the processors make a counters off request, the program will proceed from the Autohalt Handler to Wait For Counters On Request block 131 and wait until a counters on request is issued.
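The top-level flow of Figure 11, as described above, can be expressed as a transition table. This is a behavioral sketch, not the microcode: the event names are hypothetical labels for the conditions described in the text, unlisted (state, event) pairs are assumed to leave the state unchanged (modelling the waiting loops), and the state entered after a successful synchronization is an assumption because the text does not state it.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):
    RESET = auto()               # address 00, block 128
    WAIT_COUNTERS_ON = auto()    # address 04, block 131
    PRESENT_INTERRUPTS = auto()  # address 57, block 130
    MAIN_LOOP = auto()           # address 17, block 133
    INTERRUPT_HANDLER = auto()   # address 20, block 132
    AUTOHALT_HANDLER = auto()    # block 134
    ERROR = auto()               # Error Reporting block 135


S = State
# (state, event) -> next state; event names are hypothetical labels.
TRANSITIONS: Dict[Tuple[State, str], State] = {
    (S.RESET, "unreset"): S.WAIT_COUNTERS_ON,
    (S.WAIT_COUNTERS_ON, "counters_on"): S.PRESENT_INTERRUPTS,
    (S.WAIT_COUNTERS_ON, "counters_off"): S.ERROR,
    (S.WAIT_COUNTERS_ON, "sync_request"): S.ERROR,
    (S.PRESENT_INTERRUPTS, "interrupts_presented"): S.MAIN_LOOP,
    (S.MAIN_LOOP, "interrupt"): S.INTERRUPT_HANDLER,
    (S.MAIN_LOOP, "autohalt"): S.AUTOHALT_HANDLER,
    (S.INTERRUPT_HANDLER, "virtual_times_matched"): S.PRESENT_INTERRUPTS,
    (S.INTERRUPT_HANDLER, "autohalt"): S.AUTOHALT_HANDLER,
    (S.INTERRUPT_HANDLER, "halt_failed"): S.ERROR,
    (S.AUTOHALT_HANDLER, "counters_on"): S.ERROR,
    (S.AUTOHALT_HANDLER, "counters_off"): S.WAIT_COUNTERS_ON,
    (S.AUTOHALT_HANDLER, "counters_off_mismatch"): S.ERROR,
    (S.AUTOHALT_HANDLER, "sync_mismatch"): S.ERROR,
    # Assumption: the text does not say where a verified sync returns to.
    (S.AUTOHALT_HANDLER, "sync_ok"): S.MAIN_LOOP,
}


def step(state: State, event: str) -> State:
    """Apply one event; the controller stops once an error is detected."""
    if state is S.ERROR:
        return state
    return TRANSITIONS.get((state, event), state)  # otherwise keep waiting


def run(events: Iterable[str], state: State = S.RESET) -> State:
    for event in events:
        state = step(state, event)
    return state
```

A table like this makes it easy to check that every request the text calls illegal in a given state leads to Error Reporting block 135, and that the error state is terminal.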
As previously noted, it is sometimes necessary to allow the processors to execute dissimilar code. The counters off request allows the processors to diverge and accomplish different tasks in a protected manner. With the one exception described below, the subject controller will not forward interrupts while the counters are off. This ensures that there will be no critical points while the processors are diverged. A critical point occurs when data are being changed or tested in the normal execution of the code at the moment an asynchronous event, such as an interrupt, takes place which also changes or tests the same data.
The only interrupt forwarded while the counters are off is the software refresh interrupt, a level 7 interrupt which typically occurs every two milliseconds. The software is written so that critical points do not occur during a software refresh routine.
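The forwarding rule can be stated as a single predicate. This is a minimal sketch: the function name `forwards_interrupt` is hypothetical, and it deliberately ignores the refresh-locked-out refinement and the virtual time alignment performed by the Interrupt Handler.

```python
SOFTWARE_REFRESH_LEVEL = 7  # level 7 software refresh, roughly every 2 ms


def forwards_interrupt(level: int, counters_on: bool) -> bool:
    """Hypothetical rule: may the controller forward an interrupt of this level?"""
    if counters_on:
        # Counters running: interrupts are forwarded, but only after the
        # Interrupt Handler has aligned both processors in virtual time.
        return True
    # Counters off (processors diverged): only the software refresh passes.
    return level == SOFTWARE_REFRESH_LEVEL
```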
If the processors request that the counters be turned off at different virtual times, the program will proceed to Error Reporting block 135. Otherwise, the program will remain in Wait For Counters On Request block 131 until the processors request that the counters be turned back on. Certain steps must be taken to configure the processors so that they will again execute similar code after having been diverged. First, one of the processors will issue the previously-noted lock request, which is received by control registers 94 and 94' (Figure 9). The processor then loops, waiting for the subject controller to issue a refresh-locked-out status bit. When this status bit is active, the subject controller will refrain from forwarding refresh interrupts to the processors. This precaution is taken because a refresh interrupt may occur at about the same time the processors request that the counters be turned on. As previously noted, when a counters on request is received, an autohalt signal is produced by the subject controller which causes the requesting processor to halt immediately. Since the processors are diverged and executing different instructions, it is possible that the lead processor will receive a refresh interrupt before the counters are turned on and the lagging processor will receive the interrupt after the counters are on. The lead processor will have halted and will not respond to the refresh interrupt, whereas the lagging processor will respond. Thus, the processors will not be executing identical code as desired at this point. The refresh-locked-out function prevents this anomaly from occurring. Once both processors have issued a lock request, the subject controller will cause the refresh-locked-out status bit to go active. When the processors detect the active status bit, they will both issue a counters on request and the program will proceed to Present Interrupts block 130 and then to Counters Running Main Loop block 133.
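The lock handshake can be modelled as a status bit that becomes active only after both processors have issued a lock request. The class and method names below are hypothetical; the sketch does not model the control registers themselves.

```python
class RefreshLockout:
    """Hypothetical model of the refresh-locked-out handshake."""

    def __init__(self) -> None:
        self.lock_requested = {"A": False, "B": False}

    def lock_request(self, proc: str) -> None:
        # Recorded by control registers 94/94' in the described embodiment.
        self.lock_requested[proc] = True

    @property
    def refresh_locked_out(self) -> bool:
        # The status bit goes active only after BOTH processors have asked.
        return all(self.lock_requested.values())

    def forward_refresh(self) -> bool:
        # While the bit is active, refresh interrupts are withheld, so neither
        # processor can take one between its counters on request and the other's.
        return not self.refresh_locked_out
```

In use, each processor would call `lock_request` and then spin on `refresh_locked_out` before issuing its counters on request.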
The synchronization requests, which are processed by Autohalt Handler block 134, are issued by the processors primarily to verify proper processor operation. The requests, which should be issued by the processors at the same virtual times, cause the counters first to be turned off and the virtual time counters to be compared. The counters are then turned back on at the same virtual time, so that the processors are synchronized in both actual and virtual time. If the processors wish to exchange data, each processor can write its data to a predetermined location, issue a synchronization request, and then read the data of the other processor. The synchronization request ensures that each processor will have written its data before it is read by the other processor.
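The write, synchronize, read pattern resembles a two-party barrier. The following sketch is an analogy only: Python threads stand in for processors A and B, a dictionary stands in for the predetermined location, and `threading.Barrier` stands in for the synchronization request; none of these reflect the actual hardware mechanism.

```python
import threading

# Analogy only: `shared` plays the "predetermined location" and the barrier
# plays the synchronization request (both must arrive before either continues).
shared = {}
sync_request = threading.Barrier(2)
received = {}


def processor(name: str, partner: str, value: int) -> None:
    shared[name] = value              # 1. write own data
    sync_request.wait()               # 2. synchronize: both writes are now complete
    received[name] = shared[partner]  # 3. read the partner's data safely


threads = [
    threading.Thread(target=processor, args=("A", "B", 10)),
    threading.Thread(target=processor, args=("B", "A", 20)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the barrier, either thread could read before its partner had written, which is the hazard the synchronization request removes.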
It is important to note that, while the virtual time counters are running, the processors must use a synchronization request in order to synchronize themselves. For example, if a polling loop were used while the processors wait for certain data to appear, there would be no way to ensure that both processors execute the loop the same number of times, since the processors do not run at identical speeds. Their virtual times would then differ, and the subject controller would conclude that an error had occurred.
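A short calculation shows why polling breaks lockstep. The timings, the loop length and the assumption that the virtual time counter advances once per instruction are all hypothetical; the point is only that slightly different speeds yield different iteration counts.

```python
import math


def failed_polls(data_ready_us: float, poll_period_us: float) -> int:
    """Polls made at t = 0, p, 2p, ... that find no data (data appears at data_ready_us)."""
    return math.ceil(data_ready_us / poll_period_us)


# Hypothetical timings: the data appears 100 us after both processors start
# polling, but the processors run at slightly different speeds.
polls_a = failed_polls(100.0, 3.0)  # 34 iterations
polls_b = failed_polls(100.0, 3.1)  # 33 iterations

INSTRUCTIONS_PER_POLL = 5  # hypothetical loop length
skew = (polls_a - polls_b) * INSTRUCTIONS_PER_POLL  # virtual time difference
```

A skew of even a single iteration leaves the two counters unequal at the next comparison.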
Although not depicted in the Figure 11 flow chart, the subject controller may also be reset by the processors, in which case the sequence will return to Reset block 128. A reset request may be made by either or both processors. In the event only one processor makes a reset request, the request command must be forwarded to both control registers 94 and 94'.
Figure 12 shows a more detailed flow chart further illustrating the operation of Counters Running Main Loop block 133. As indicated by block 136, if an interrupt is received when the program counter is at address 17 of the Main Loop, the program will jump to address 20 of Interrupt Handler 132. Otherwise, the program counter will increment to address 18, at which time a determination is made as to whether processor A has issued an autohalt signal AAHLT, as indicated by block 138. If the signal has issued, the program will jump to address 63 of Autohalt Handler 134. If the signal has not issued, the counter will advance to address 19, at which time a determination will be made as to whether processor B has issued an autohalt signal BAHLT, as indicated by block 140. If the signal has issued, the program will proceed to the Autohalt Handler; otherwise the program will branch back to address 17 as indicated by block 142.
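One pass through addresses 17, 18 and 19 amounts to a priority check. The sketch below collapses the three microcode cycles into one function call; the function name is hypothetical, while the addresses are those given in the text.

```python
INTERRUPT_HANDLER = 0x20
AUTOHALT_HANDLER = 0x63
MAIN_LOOP = 0x17


def main_loop_next(interrupt_pending: bool, a_autohalt: bool, b_autohalt: bool) -> int:
    """Next microcode address after one pass through the Main Loop."""
    if interrupt_pending:   # address 17: interrupts are tested first
        return INTERRUPT_HANDLER
    if a_autohalt:          # address 18: AAHLT
        return AUTOHALT_HANDLER
    if b_autohalt:          # address 19: BAHLT
        return AUTOHALT_HANDLER
    return MAIN_LOOP        # branch back to address 17
```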
Additional details regarding Interrupt Handler 132 are shown in the flow chart of Figure 13. As previously described, when an interrupt is received while the program is in the Counters Running Main Loop block, the program branches to address 20 of Interrupt Handler 132, as indicated by element 142. At this time, a time-out timer (not shown) is cleared. The time-out timer is used to determine whether an anticipated event occurs within a predetermined time period; the timer of the present embodiment will time out after 360 microseconds. A determination is then made as to whether the comparator 86 output on line 88 (Figure 9) indicates that virtual time counter 72 for processor A is greater than the counter for processor B. If so, the program will jump to address 3C, as represented by block 144.
Assuming that processor A is not leading processor B, the program will proceed to address 21. As indicated by element 146, a determination is then made as to whether comparator 86 indicates that processor B is the leading processor. If this is the case, the program will proceed to address 3A, as indicated by block 148. If processor B is not leading either, the virtual times of the two processors happened to be equal when the interrupt was received. In this unlikely event, the program will proceed to address 22. As indicated by element 150 of this address, both processors are instructed to halt at the first opportunity using halt controls 88 and 88'. This is signified by the terms A PGM HALT and B PGM HALT. The processors are designed to actually halt at the end of the processor bus cycle after receipt of these signals. If processor A is leading, block 144 indicates that processor A is instructed to halt at the first opportunity. Expression B EQ HALT of block 144 signifies that control register 88' is set such that processor B will halt when the processor B virtual time counter catches up with the counter of halted processor A. If processor B is leading, block 148 indicates that processor B is instructed to halt at the next opportunity and processor A is instructed to halt at the same virtual time as that of halted processor B.
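The three-way choice of halt modes can be written as a small function. The mode strings `"PGM"` and `"EQ"` are hypothetical shorthands for the A/B PGM HALT and A/B EQ HALT settings of halt controls 88 and 88'.

```python
def assign_halts(vt_a: int, vt_b: int) -> tuple:
    """Return (halt mode for A, halt mode for B) from the virtual time counters.

    "PGM": halt at the end of the current bus cycle (A PGM HALT / B PGM HALT).
    "EQ":  halt when this processor's counter equals that of the halted leader.
    """
    if vt_a > vt_b:
        return ("PGM", "EQ")   # block 144, address 3C: A leads
    if vt_b > vt_a:
        return ("EQ", "PGM")   # block 148, address 3A: B leads
    return ("PGM", "PGM")      # block 150, address 22: counters already equal
```

These match the comments at addresses 3C ("Halt reg:= APGM BEQ"), 3A ("Halt reg := AEQ &BPGM") and 22 ("BOTH PGM HALTS") of the microcode listing.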
Regardless of the state of the virtual time counters, the program will advance to address 23 as indicated by block 152. At this time, one or both of the processors will have been instructed to halt. However, it is possible that the processors have not yet reached the end of the current bus cycle, where they are designed to actually halt. At block 152, the term BHLT signifies that a determination is made as to whether processor B has received the halt instruction provided by the state machine of the subject controller. This is distinct from a halt which results from an autohalt signal, which is produced when a processor has issued a command to the subject controller. Assuming that the answer is no, the program advances to address 24, where element 154 signifies that a determination is made as to whether processor B has been halted as the result of an autohalt signal. This is represented by the term BAHLT. If this event has occurred, the sequence will be controlled by Autohalt Handler 134: the program will jump to address 3E, as signified by block 156, at which time halt control circuitry 88 and 88' is cleared so that no halt originating from the state machine of the subject controller will be acted upon by the processors, and the program will then proceed to the Autohalt Handler. Assuming that an autohalt has not occurred, the program will proceed to address 25 as indicated by block 158. A determination is then made as to whether the time-out timer has timed out. The 360 microsecond period is sufficiently long that, unless an anomaly has occurred, processor B will have acknowledged receipt of a halt command from the subject controller. Note that the halt command could have originated from any of three sources, represented by blocks 144, 148 and 150. The program will continue in the loop defined by blocks 152, 154 and 158 until either a time out occurs or the halt command is received by processor B. If a time out occurs, an error is reported to Error Reporting block 135; otherwise, the program will proceed to block 160. At block 160, a determination is made as to whether processor B has actually responded to the controller halt commands. If not, a determination is made at block 162 as to whether a time out has occurred.
The program will remain in the loop defined by blocks 160 and 162 until a time out occurs, in which case an error will be reported, or an indication is received that processor B has actually halted.
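The two wait loops for one processor can be sketched as follows. The three callables are hypothetical stand-ins for the halt-received, autohalt and bus-cycle-complete signals; time is simulated in whole microseconds. In the microcode a single timer, cleared at address 20, covers both processors, whereas this sketch starts from zero for one processor.

```python
TIMEOUT_US = 360  # time-out timer period given in the text


def wait_for_halt(halt_received, autohalted, bus_cycle_done, step_us=1):
    """Poll one processor until it halts, autohalts or the timer expires.

    Each argument is a hypothetical callable taking the elapsed time in
    microseconds and returning a bool.
    """
    t = 0
    # Blocks 152, 154 and 158: wait for the controller's halt to be received.
    while not halt_received(t):
        if autohalted(t):
            return "autohalt"          # address 3E: clear halt controls, go to 64
        t += step_us
        if t >= TIMEOUT_US:
            return "timeout"           # error is reported
    # Blocks 160 and 162: wait for the final bus cycle to finish.
    while not bus_cycle_done(t):
        t += step_us
        if t >= TIMEOUT_US:
            return "timeout"
    return "halted"
```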
Once processor B has halted, the program will proceed to address 29 as represented by block 164. At this time, a similar sequence for processor A is carried out. At block 164, a determination is made as to whether processor A has received the halt command which originated from the state machine of the controller. At block 166, a test is made as to whether an autohalt has been caused to be issued by processor A. A time-out condition is tested at block 168. Finally, at block 170, a determination is made as to whether processor A has actually halted.
When the program has advanced to address 2F, both processors will actually have been halted. The subject controller is implemented such that, at this point, the two virtual time counters will be close to one another, but usually not the same. Typically, the lagging processor will overshoot the leader by one or two counts. In that case, the leading processor, which has now become the lagging processor, will be single-stepped by the subject controller until the counts are identical, as follows. At block 174, halt control registers 88 and 88' are instructed to halt both processors in the event the processors proceed to run at their normal rate. This precaution is taken because, under certain circumstances, a processor may respond to a single-step instruction by running rather than single-stepping. At blocks 176 and 178, a determination is again made as to whether either processor has caused an autohalt signal to be issued. Assuming that no autohalt signals have issued, the program will advance to address 32, where a determination is made as to whether the processor A virtual time counter is greater than that of processor B. If the A count is greater, the program will jump to address 37 and, as indicated by block 182, processor B will be single-stepped once. When single-stepping, it is necessary to remove the halt provided at blocks 144, 148 or 150. As represented by block 184, the program will self-loop at address 38 until the halt goes away. Once this occurs, the program will loop back to address 23 as shown in block 152. The sequence will be repeated, with processor B being single-stepped until a determination is made at blocks 180 and 186 that the counters are identical. The program will then proceed to address BF at block 192, at which time both processors will be single-stepped an identical number of times as a precaution to avoid certain anomalies. The program will then advance to Present Interrupts block 130.
In the event processor B had overshot processor A, the program would have advanced from block 186 to blocks 188 and 190 so that processor A would be single-stepped in the same manner.
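The equalization step can be sketched as a loop that single-steps whichever processor is behind. Two assumptions are made here and are not stated in the text: each single step advances a virtual time counter by exactly one count, and the number of precautionary steps applied to both processors at address BF is two (suggested only by the "2ss" comment at address 33 of the listing). The re-entry at address 23 after each step is omitted.

```python
def equalize(vt_a: int, vt_b: int, extra_steps: int = 2):
    """Single-step the processor that is behind until both counters match,
    then step both `extra_steps` more times (address BF)."""
    steps = {"A": 0, "B": 0}
    while vt_a != vt_b:
        if vt_a > vt_b:
            vt_b += 1          # address 37: single step B
            steps["B"] += 1
        else:
            vt_a += 1          # address 34: single step A
            steps["A"] += 1
    vt_a += extra_steps        # precautionary steps applied to both processors
    vt_b += extra_steps
    return vt_a, vt_b, steps
```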
Further details regarding Autohalt Handler 134 are depicted in the flow chart of Figure 14. As previously noted, processors A and B cause autohalt signals to be issued in the event the processors request synchronization, request that the virtual time counters be turned on, or request that the virtual time counters be turned off. If an autohalt signal is produced when the program is in Counters Running Main Loop 133, the program will jump to address 63 as indicated by block 192. At this time, the 360 microsecond time-out counter will be cleared. The program will then advance to address 64. In the event an autohalt signal is produced while the program is in the Interrupt Handler, the program will also jump to address 64.
As indicated by element 194, a determination is made as to whether the generated autohalt signal was caused by processor A. If not, the program will proceed to address 65 and a determination will be made, as represented by block 195, as to whether processor B caused the autohalt signal to issue. If not, an error has occurred, since one processor or the other must have caused the autohalt signal to be generated; the error is forwarded to Error Reporting block 135. Assuming that processor A caused the autohalt, the program will proceed to address 6F. As indicated by block 196, a determination is made as to whether processor B has also caused an autohalt. Although both processors are executing identical code, unless the processors are at almost the same point in virtual time, a processor B autohalt will not yet have been generated. Accordingly, the program will proceed to address 70 where, as represented by element 198, a determination will be made as to whether the time-out timer has timed out. Assuming that it has not timed out, the program will jump back to block 196. The program will remain in this loop until either a time out has occurred or processor B has caused an autohalt. If a time out occurs, an error message is forwarded to Error Reporting block 135; otherwise, the program will branch to address 68. In the event the autohalt was caused by processor B, a similar wait sequence for a processor A autohalt will be performed according to blocks 200 and 202.
As previously noted, the processors cause themselves to be halted upon their issuance of any of the three previously-noted requests to the subject controller. Thus, unlike halts initiated by the subject controller, the two virtual time counters should be exactly the same when an autohalt signal has occurred. As indicated by block 204, if the processor A counter is greater than that of B, an error will be forwarded to Error Reporting block 135. If not, the program will proceed to address 69 where, as indicated by block 206, a determination is made as to whether the A counter value is less than the B value. If not, the counters are identical and the program will advance to address 6A. If A is smaller, an error will be forwarded to Error Reporting block 135.
The virtual time counters are still running at this point, so the processors can only validly have requested synchronization or that the counters be turned off. Accordingly, a request that the counters be turned on is not proper at this time. At address 6A, a determination is made as to whether processor A had requested counters on. If so, the program will jump to address 76, as indicated by block 210, and a determination will be made as to whether processor B also issued a counters on request. In either case, an error will be forwarded to Error Reporting block 135. Two different error signals are produced to help the processors locate the source of the anomaly.
Assuming that a counters on request was not made, the program will proceed to address 6B as shown by element 212. A determination will then be made as to whether processor A had requested that the counters be turned off. If this request has been made, the program will proceed to address 74, as shown by element 214. At this address, it is determined whether processor B has issued the same request. If not, an error will be forwarded to Error Reporting block 135. If both processors requested that the counters be turned off, the counters are stopped and the program will proceed to address 03, which is Wait For Counters On Request block 131.
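Once both autohalts have arrived, the handler's decision reduces to a few checks, including the synchronization case handled at address 6C. In this sketch the request strings `"on"`, `"off"` and `"sync"` and the returned labels are hypothetical; the hardware reports several distinct error codes, which are merged into a single `"ERROR"` here.

```python
def autohalt_handler(req_a, req_b, vt_a, vt_b):
    """Decide the next block after the processors autohalt.

    req_a / req_b: "on", "off", "sync", or None if that processor did not
    autohalt before the 360 us time-out.
    """
    if req_a is None or req_b is None:
        return "ERROR"                 # blocks 195-202: missing autohalt or time-out
    if vt_a != vt_b:
        return "ERROR"                 # blocks 204 and 206: autohalts must coincide
    if req_a == req_b == "off":
        return "WAIT_FOR_COUNTERS_ON"  # counters stopped, go to address 03
    if req_a == req_b == "sync":
        return "PRESENT_INTERRUPTS"    # address 57
    return "ERROR"                     # "on" while running, or mismatched requests
```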
Assuming that a counters off request was not made, the program will proceed to address 6C, as indicated by block 216, and a determination will be made as to whether processor A requested a synchronization. If this request was made, the program will proceed to address 78, as shown in block 218. A determination will then be made as to whether processor B made a similar request. If not, an error signal will be forwarded to Error Reporting block 135. If a similar request was made, the program will jump to address 57 of Present Interrupts block 130. If a synchronization was not requested, an anomaly has occurred, since it had already been determined that the processors had not requested that the counters be turned off; because these are the only two valid processor requests handled by the Autohalt Handler, an error will be reported.
A flow chart further illustrating the operation of Present Interrupts block 130 is shown in Figure 15. If the Present Interrupts block is entered from either Autohalt Handler block 134 or Wait For Counters On Request block 131, the program will proceed to address 57. As indicated by blocks 222 and 224, if the virtual time counters are not equal at this point, an error is forwarded to Error Reporting block 135. If the counters are equal, the program will advance to address 5A. Any outputs from Interrupt Handler 132 also enter the Present Interrupts block at this point. As indicated by block 226, a delay timer is reset at this time. The delay timer typically provides a 12 microsecond delay before interrupts are presented to the two processors.
The program will then proceed to address B5, where a determination is made as to whether the 12 microsecond delay provided by the delay timer has elapsed. The program will remain in this loop until the delay is over and then proceed to address 36. As indicated by block 230, all pending interrupts are then presented to the two processors. In addition, both virtual time counters are cleared. The program will then advance to address 5E as shown by block 232, at which time halt control circuits 88 and 88' will be cleared, this being required before the processors can be released, or un-halted.
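The Present Interrupts sequence can be sketched as an ordered trace of controller actions. The function name and the action strings are hypothetical; the sketch assumes the delay, present and clear pass runs at least once and repeats while new interrupts are pending (the branch from address 5F back to 5A).

```python
DELAY_US = 12  # typical delay before presenting interrupts, per the text


def present_interrupts(vt_a, vt_b, pending_batches, from_interrupt_handler=False):
    """Return the ordered list of controller actions (a hypothetical trace)."""
    # Entry at address 57 (from the Autohalt Handler or Wait For Counters On)
    # checks the counters; entry from the Interrupt Handler goes straight to 5A.
    if not from_interrupt_handler and vt_a != vt_b:
        return ["error: virtual time counters differ"]
    actions = []
    batches = list(pending_batches) or [[]]  # the 5A-5E pass runs at least once
    for batch in batches:                    # 5F branches back to 5A if more pending
        actions.append(f"wait {DELAY_US} us")
        actions.append(f"present interrupts {batch} to A and B")
        actions.append("clear both virtual time counters")
        actions.append("clear halt controls 88 and 88'")
    # Address 60: remove all halts and clear control registers, releasing both.
    actions.append("remove all halts and clear control registers 94 and 94'")
    return actions
```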
At address 5F, a determination is made as to whether any new interrupts are pending at the controller. If so, the program will branch back to address 5A and present the new interrupts in the manner previously described. If all the interrupts have been presented, the program will proceed to address 60, at which time all halts are removed, as indicated by block 236, including autohalts initiated by the processors and halts initiated by interrupts. This action releases the two processors. Other functions are also completed, including the clearing of control registers 94 and 94'. The program then returns to Counters Running Main Loop block 133 at address 17.
Referring now to Figure 16, a flow chart further illustrating the operation of Wait For Counters On Request block 131 may be seen. At address 03 of the program, as depicted by block 238, the virtual time counters are off, either because the computer was just powered up and the processors have not yet requested that the counters be turned on, or because the processors had previously requested that the counters be turned off. All halts are cleared from halt control circuits 88 and 88', and control registers 94 and 94' are cleared. The program then proceeds to address 04, as indicated by element 240, and it is determined whether processor A has requested that the virtual time counters be turned on. If such a request has been made, the program will proceed to address 0E, shown in block 252, at which time the 360 microsecond time-out timer is reset. The program will then advance to address 0F, where a determination is made as to whether processor B has also requested that the counters be turned on. Unless the two processors are closely synchronized at this time, the lagging processor, processor B, probably will not yet have made the request; the program will therefore proceed to address 10, as indicated by block 256.
If the time-out timer has not timed out, the program will jump back to address 0F of block 254. The program will remain in this loop until either 360 microseconds have elapsed, in which case an error signal is transferred to Error Reporting block 135, or processor B has also requested that the counters be turned on. If processor B has made the request within 360 microseconds, the program will proceed to the Present Interrupts block, at which time the counters will be turned on.
If processor A has not yet made a counters on request, the program will proceed to address 05, as indicated by block 242. If a request has been made by processor B, the program will then wait for a request by processor A, as represented by blocks 258, 260 and 262. If A does not make a timely request, an error will be reported. If such a request is made, the program will proceed to the Present Interrupts block. Assuming that neither processor has requested that the counters be turned on, the subject controller will then verify that no improper processor requests have been made. As indicated by blocks 244 and 246, if either processor requests that the counters be turned off, an error will be reported since the counters are already off. Similarly, if either processor should request a synchronization, an error will be reported since such a request is not permitted when the counters are off. If no errors are reported, the program will jump back to address 04 and repeat the sequence.
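The outcome of the Wait For Counters On Request block can be summarized from the times at which the two requests arrive. All names are hypothetical, and the inputs are taken to be the final observations after the 360 microsecond window, so a missing partner request is reported as an error rather than as continued waiting.

```python
TIMEOUT_US = 360


def wait_for_counters_on(a_request_us, b_request_us, other_requests=()):
    """Outcome of Wait For Counters On Request (block 131).

    a_request_us / b_request_us: time of each processor's counters on request,
    or None if it never made one. other_requests: any "off" or "sync"
    requests observed while neither processor has asked for counters on.
    """
    if a_request_us is None and b_request_us is None:
        if any(r in ("off", "sync") for r in other_requests):
            return "ERROR"             # blocks 244/246: invalid while counters are off
        return "KEEP_WAITING"          # loop back to address 04
    if a_request_us is None or b_request_us is None:
        return "ERROR"                 # the partner never asked within the time-out
    if abs(a_request_us - b_request_us) >= TIMEOUT_US:
        return "ERROR"                 # the partner's request came too late
    return "PRESENT_INTERRUPTS"        # counters are turned on
```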
Although the state machine of the subject controller has been described in some detail, a listing of the microcode instructions stored in the state machine memory may be helpful. Thirteen (13) sheets of computer printout follow which set forth the contents of the microcode memory. These printout sheets show the hexadecimal memory addresses in the left-most two columns of the listing. The corresponding contents of the memory are set forth in the next thirty-two columns, with the symbol "X" representing "don't cares."
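As a reading aid for the listing that follows, a line can be split into its hexadecimal address and its 32 bit columns. The text does not define the individual fields; the assumption that the last eight columns hold the branch address is an inference from comparing the listing with its own comments (for example, address 04 decodes to 0E and address 33 decodes to BF, as their comments state). The function name is hypothetical.

```python
def decode_microcode_line(line: str):
    """Return (address, branch_target) for one line of the listing.

    Assumes the last eight of the 32 bit columns hold the branch address;
    branch_target is None when any of those columns is a don't-care ("x").
    """
    tokens = line.split()
    address = int(tokens[0], 16)
    word = ""
    for token in tokens[1:]:
        if len(word) >= 32:
            break                      # the rest of the line is a comment
        word += token
    if len(word) != 32 or set(word.lower()) - set("01x"):
        raise ValueError(f"cannot find 32 bit columns in: {line!r}")
    branch = word[-8:].lower()
    return address, (None if "x" in branch else int(branch, 2))
```

Lines whose bit columns were damaged in reproduction (for example, some Main Loop lines with only 31 columns) raise `ValueError` rather than decoding to a wrong target.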
The microcode listing is for use with a particular implementation of the state machine. The details of such particular implementation are shown in Figures 17 through 24 of the drawings. These figures show detailed schematic diagrams of one implementation of the subject processor controller which incorporates a state machine which utilizes the submitted microcode instructions. The schematic diagrams use conventional circuit symbols which can be readily understood by persons having even less than average skill in the applicable art. Accordingly, further description of the schematic diagrams will not be given, except for the following listing of descriptions of certain ones of the integrated circuits called out in the diagrams:
Circuit Designation | Manufacturer | Description
AM27S29 | Advanced Micro Devices | 4096-Bit PROM
AM27S27 | Advanced Micro Devices | 4096-Bit PROM
7124H | Fujitsu | 4096-Bit PROM
AS885 | Texas Instruments | 8-Bit Magnitude Comparator
AS869 | Texas Instruments | Synchronous 8-Bit Up/Down Counter
25LS2521 | Advanced Micro Devices | 8-Bit Comparator
74F157 | Fairchild Semiconductor | Quad 2-to-1 Line Data Selectors/Multiplexers, Non-inverted Data Outputs
LS138 | Texas Instruments | 3-to-8 Line Decoders/Demultiplexers
LS244 | Texas Instruments | Octal Buffers/Line Drivers/Line Receivers, Non-inverted Tri-State Outputs
74259 | Texas Instruments | 8-Bit Addressable Latch
F379 | Fairchild Semiconductor | Quad Parallel Registers with Enable
LS393 | Texas Instruments | Dual 4-Bit Binary Counters
8287 | Intel | Octal Bus Transceiver, Inverting
LS175 | Texas Instruments | Quad D-Type Flip-Flops, Complementary Outputs, Common Direct Clear
F175 | Fairchild Semiconductor | Quad D-Type Flip-Flops, Complementary Outputs, Common Direct Clear
F374 | Fairchild Semiconductor | Octal D-Type Flip-Flops, Tri-State
Thus, two embodiments of a novel processor controller for a fault-tolerant computer have been disclosed. Although the embodiments have been disclosed in some detail, it is to be understood that various changes could be made by persons skilled in the pertinent technology without departing from the spirit and scope of the subject invention. For example, rather than use separate virtual time counters, it would be possible to use a single up/down counter to monitor the relative positions of the processors in virtual time. The counter would be clocked in the up direction by one of the processors as it executes code and in the down direction by the other processor as it executes its associated code. Thus, the two processors will be at the same point in virtual time when the counter has received the same number of count-up and count-down clocks so that the count would be zero or some other predetermined value.
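The single up/down counter alternative can be sketched as follows. The class and method names are hypothetical, and the reference value defaults to zero, although the text notes that some other predetermined value could be used.

```python
class UpDownVirtualTime:
    """Sketch of the single up/down counter alternative (names hypothetical)."""

    def __init__(self, reference: int = 0) -> None:
        self.reference = reference     # value meaning "same point in virtual time"
        self.count = reference

    def clock_a(self) -> None:
        self.count += 1                # processor A clocks the counter up

    def clock_b(self) -> None:
        self.count -= 1                # processor B clocks the counter down

    def leader(self):
        """Return "A", "B", or None when both are at the same virtual time."""
        if self.count > self.reference:
            return "A"
        if self.count < self.reference:
            return "B"
        return None
```

One counter and a sign test then replace the two counters and comparator 86, at the cost of no longer recording each processor's absolute virtual time.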
Oct 6 20:32 1983 mcode.rcs Page 2
* Revision 1.2 83/09/23 14:36:54 bill (Bill Kolb)
* This version redefines bits 21 and 20 as don't cares, bit 24 as Clear
* Request registers, bit 25 as Transparent NHI. (See page 6 of schematic)
* This is the first working version with the exception of time padding
* before and after interrupt presentation. *
* Revision 1.1 83/09/23 13:42:44 bill (Bill Kolb)
* Initial revision
See Flowchart with drawings
The interrupt handler is only executed when the counters are turned on.
ADDR BL41 BL29 CA29 CA41
C X X
BRA P SS U1 B T T AA BB AUAA B BB BRANCH
O TT pr I I r CC CC APCS A CS DAAA
L BB iR T M a NN NN UHLS U LS AAB B
21 nq E n TT TT TLR T R EPE P ts R s SS SS OT O QGQ G
It p 10 10 * X M M s a
. r clr INTS
00 xxx 0 00 10 1 0 0x 00 00 1100 1x00 0 xxx 0001 FORCE COUNTERS so they will clear
01 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx 0001 hold bit pattern for hold time 74259
02 xxx 0 00 00 0 0x 00 00 1100 1x00 0 xxx 1001 unforce cntrs
*******************************
This is entry point for "WAIT FOR COUNTERS ON REQUEST ROUTINE" Wait in this routine until both procs request COUNTERS ON
RESET TIMER
03 xxx 0 00 01 0 1 0x 00 00 1100 1x00 0 xxx 1001 hold pattern 74259 if coming from 02
04 010 0 11 00 0 0 1x 00 00 1100 1x00 0 000 1110 if ACNTRQST* active goto 0E to wait for
B's request.
05 101 0 11 00 0 0 1x 00 00 1100 1x00 0 001 0010 if BCNTRQST* active goto 12 to wait for
A's request.
06 011 0 11 00 0 0 1x 00 00 1100 1x00 0 101 0011 IF cntroffrqst activ goto 53 cntr already off error.
07 110 0 11 00 0 0 1x 00 00 1100 1x00 0 101 0011 IF Bcntroffrqst activ goto 53 cntr already off error.
08 100 0 11 00 0 0 1x 00 00 1100 1x00 0 001 1100 if Asyncrqst* active then goto 1C sync rqst w/cntrs off ERR.
09 111 0 11 00 0 0 1x 00 00 1100 1x00 0 001 1100 if bsyncrqst* active goto 1C ERROR rqst sync while cntrs off.
0A 111 0 10 00 0 0 1x 00 00 1100 1x00 0 000 1100 IF TIMENMI* active then jump to 0C to let Timer keep running
0B xxx 0 00 00 0 1 1x 00 00 1100 1x00 0 xxx xxxx Reset the Timer because TIMENMI* not active in previous instruction.
0C 100 0 10 00 0 0 1x 00 00 1100 1x00 0 000 0100 IF TIMER OK loop to 04 because NMI* did not TIMEOUT
0D xxx 1 00 00 0 0 1x 00 00 1100 1x00 0 110 0001 NMI* has timed out so we had better scurry to some microcode that warns those zany procs is too late, GOTO 61
***********
0E xxx 0 00 00 0 1 1x 00 00 1100 1x00 0 xxx xxxx RESET Timer because we were timing NMI* and now we want to time proc B.
0F 101 0 11 00 0 0 1x 00 00 1100 1x00 0 101 1010 IF B rqsts cntr on then GOTO 5A to present interrupts since A already rqstd at 04
10 100 0 10 00 0 0 1x 00 00 1100 1x00 0 000 1111 IF TIME OK LOOP 0F
11 xxx 1 00 00 0 0 1x 00 00 1100 1x00 0 111 1100 TIMEOUT B GOTO 7C to report error *************
12 xxx 0 00 00 0 1 1x 00 00 1100 1x00 0 xxx xxxx RESET TIMER so we can time A.
13 010 0 11 00 0 0 1x 00 00 1100 1x00 0 101 1010 If A rqsts cntrs on then goto 5A because B already reqstd at 05, 5A is present int
14 100 0 10 00 0 0 1x 00 00 1100 1x00 0 001 0011 IF TIME OK LOOP 13
15 xxx 1 00 00 0 0 1x 00 00 1100 1x00 0 111 1010 TIMEOUT A GOTO 7A to report error
***************************************
16 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx unused
***************************************
MAIN COUNTERS RUNNING LOOP
17 000 110 00 0 0 0x 11 11 1100 1x00 010 0000 IF INTERRUPT CHANGE GOTO20 INT HANDLER
18 110 0 01 00 0 0 0x 11 11 1100 1x00 110 0011 IF A AUTOHLT* active go to 63 autohlt hndl
19 000 011 00 0 0 0x 11 11 1100 1x00 110 0011 IF B AUTOHALT goto autohalt handler
1A xxx 1 00 00 0 0 0x 11 11 1100 1x00 001 0111 GOTO 17
***************************************
1B xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx unused
*******************************************
Entry for ERROR routine to set 0x01 00x1 to indicate SYNC RQST while counters off in the status reg. This is accomplished by setting the BIT 4 here, and then jumping to 7C to set Bit 0, messy but I am out of mcode space.
1C xxx 0 00 00 1 0 0x 00 00 1100 1x00 0 xxx 1100 SET the 4 BIT
1D xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx 1100 hold bit pattern 74259
1E xxx 1 00 00 0 0 0x 00 00 1100 1x00 0 111 1100 GOTO 7C to set other bit.
*********************************************
1F xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx unused
***************************************
INTERRUPT HANDLER ***************************************
CLEAR TIMER
20 010 1 01 00 0 1 0x 11 11 1100 1X00 0 011 1100 IF A>B GOTO 3C
21 101 1 01 00 0 0 0x 11 11 1100 1x00 0 011 1010 IF A<B GOTO 3A
Wait for B to come to a Halt.
22 xxx 0 00 00 0 0 0x 11 11 1000 1x00 0 010 1xxx BOTH PGM HALTS
23 001 0 11 00 0 0 0x 11 11 1100 1x00 0 010 1000 if bhalt asert GOTO28
24 000 0 11 00 0 0 0x 11 11 1100 1x00 0 011 1110 if auto GOTO3E auto
25 100 0 10 00 0 0 0x 11 11 1100 1x00 0 010 0011 if TIMER OK GOTO 23
26 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 111 1100 GOTO TIMEOUT B
Wait for B's final cycle.
27 100 1 10 00 0 0 0x 11 11 1100 1x00 0 111 1100 if TIMEOUT GOTO 7C
28 001 0 01 00 0 0 0x 11 11 1100 1x00 0 010 0111 if BAS* active loop27
Now wait for A to Halt.
29 111 0 01 00 0 0 0x 11 11 1100 1x00 0 010 1110 if A Halt asert GOTO 2E
2A 110 0 01 00 0 0 0x 11 11 1100 1x00 0 011 1110 if AAutoHalted GOTO3E
2B 100 0 10 00 0 0 0x 11 11 1100 1x00 0 010 1001 if TIMER OK Loop29
2C xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 111 1010 TIMER UNOK GOTO 7A
Wait for A's final cycle.
2D 100 1 10 00 0 0 0x 11 11 1100 1x00 0 111 1010 if TIMER UNOK GOTO7A
2E 000 0 01 00 0 0 0x 11 11 1100 1x00 0 010 1101 if AAS* active LOOP 2D
Set both PGM Halts in case SS doesn't work.
Halt Register
2F xxx 0 00 00 0 0 0x 11 11 1000 1x00 0 010 1xxx Set Both PGM HALTS
Check for those pesky Autohalts.
30 110 0 01 00 0 0 0x 11 11 1100 1x00 0 011 1110 if AAUTOHALT GOTO 3E
31 000 0 11 00 0 0 0x 11 11 1100 1x00 0 011 1110 if BAUTOHALT GOTO 3E
32 010 1 01 00 0 0 0x 11 11 1100 1x00 0 011 0111 if A>B GOTO 37 to SSB
33 101 0 01 00 0 0 0x 11 11 1100 1x00 1 011 1111 if A not< B GOTO BF (2 single steps), else SS A
SINGLE STEP A
34 xxx 0 00 00 0 0 0x 11 11 1101 1x00 0 xxx xxxx SINGLE STEP A
35 111 0 01 00 0 0 0x 11 11 1100 1x00 0 011 0101 wait here for Halt-hi
36 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 010 0011 GO TO 23
SINGLE STEP B
37 xxx 0 00 00 0 0 0x 11 11 1100 1x01 0 xxx xxxx SINGLE STEP B
38 001 0 11 00 0 0 0x 11 11 1100 1x00 0 011 1000 wait here for Halt-hi
39 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 010 0011 GO TO 23 to wait forB
This code logically belongs at top of Interrupt Handler, see flowchart.
COME HERE at beginning of Interrupt Handler if B is the leader, eg A.LT.B=1
Update Halt Reg
3A xxx 0 00 00 0 0 0x 11 11 1000 1x00 0 100 1xxx Halt reg := AEQ & BPGM
3B xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 010 0011 GO TO 23
COME HERE if A is the leader.
3C xxx 0 00 00 0 0 0x 11 11 1000 1x00 0 011 0xxx Halt reg := APGM BEQ
3D xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 010 0011 GO TO 23
***************************************
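The correction at 32-39 (once both processors are halted, compare their counts, single-step whichever one is behind, and fall through to BF when they are equal) can be sketched in Python. This is a hypothetical model, not the board's logic: it assumes that one single step advances the stepped processor's count by exactly one, and it replaces the hardware timer with a simple step limit (`limit`, an illustrative name) standing in for the 7A/7C timeout exits.

```python
from typing import List


def single_step_correction(a: int, b: int, limit: int = 16) -> List[str]:
    """Return the single-step actions that equalise the counts a and b.

    Hypothetical model of microwords 32-39: the leader stays halted and the
    processor that is behind is single-stepped until the counts match.
    """
    actions: List[str] = []
    while a != b:
        if len(actions) >= limit:
            # Stand-in for the hardware timer expiring (cf. 7A / 7C).
            raise TimeoutError(f"counts still differ: A={a}, B={b}")
        if a > b:
            b += 1
            actions.append("SS B")  # 32: A>B, so single-step B at 37
        else:
            a += 1
            actions.append("SS A")  # 33 falls through to 34: single-step A
    actions.append("GOTO BF")  # counts equal: go to the two-step routine
    return actions


print(single_step_correction(7, 5))  # ['SS B', 'SS B', 'GOTO BF']
```

Keeping the leader halted and only ever stepping the follower means the counts converge monotonically, which is why a single timer is enough to catch a processor that has stopped responding.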
COME HERE FROM INTERRUPTS HANDLER IF AUTOHALT DETECTED.
This code clears all halts except AUTOHALT, in preparation for going to the Autohalt handler.
3E xxx 0 00 00 0 0 0x 11 11 1010 1x10 0 000 0xxx Clear HALT REG& FLOPS
3F xxx 1 00 00 0 0 0x 11 11 1110 1x10 0 110 0100 GOTO64 AUTOHALT
***************************************
HARDWARE ERROR loop, entered when the sync board thinks it is itself sick.
Mcode sets the hard err bit (which can be strapped to NMI), freezes the counters while in this loop to maybe help with diagnostics, and removes all halts from A & B so that they are free to diagnose the problem.
The microcode loops at 43 waiting for the "UNUSED" bit to be set by either proc.
Then mcode goes to 03, similar to coming out of RESET.
40 xxx 0 00 00 1 0 0x 11 11 1100 1x00 0 xxx 1101 SET HARD ERROR BIT
freeze counters & bitp
41 xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1101 HOLD SAME BIT PATTERN
************
WAIT FOR ERROR TO BE FIXED.
Entry point for waiting for Superbit to go hi & say start again at 03
Processors have no halts here so they can go diagnose the problem.
freeze cntrs to help diagnose
no HALTs for A & B so both can go diagnose prob
42 xxx 0 00 00 0 0 0x 10 10 0010 0x10 0 000 0xxx SET A & B FREE
Transparent NMI
43 110 0 10 00 0 0 1x 10 10 0110 0x10 0 100 0011 WAIT HERE FOR HELP
CLR RQST & HALT REGS
44 xxx 0 00 01 0 0 0x 11 11 1000 1x00 0 000 0xxx CLR RQST REGS & HLT REG
45 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 000 0011 GOTO 03
************************
Entry point for reporting ERROR due to counters not equal.
46 xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1100 SET BIT "CNTR NEQ" (see pg 6, Bit Reg)
47 xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1100 Hold bit pattern for 74259 hold time, pg 6.
48 xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0010 GOTO 42 to wait for help
************************
49 xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1011 SET BIT "UNEXPECTED SUPERBIT"
4A xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1011 Hold BIT PATTERN
FREE PROCS TO DIAGNOSE
4B 110 1 10 00 0 0 0x 10 10 0110 0x10 0 100 1011 Wait for SUPERBIT =0
4C xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0011 GOTO 43 to wait for Superbit to go hi again, this time meaning goto 03 to resume normal operation waiting for "TURN ON CNTRS" RQST
****************************************
Entry point for A & B procs commands disagree.
4D xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1010 SET BIT A&B commands disagree.
4E xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1010 Hold bit pattern for 74259 hold time.
4F xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0010 GOTO42 to wait for SUPERBIT
********************************************
THIS IS THE ERROR ROUTINE FOR when the procs request counters to be turned on, but counters were already on.
50 xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1111 SET "ALREADY ON" BIT
51 xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1111 Hold BIT PATTERN
52 xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0010 GOTO 42 to wait for SUPERBIT.
********************************************
COUNTERS ALREADY WERE OFF ERROR ROUTINE
This routine sets two bits in the bit register to indicate that the cntrs were already off when the procs requested counters off. The first bit is set here; the second bit is set by jumping to the error routine that sets what was called the TIMEOUT bit, which is now used as a modifier bit in combination with the other bits.
53 xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1111 SET BIT 7
54 xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1111 hold pattern for 74259
55 xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 111 1100 GOTO 7C to set the modifier bit called the TIMEOUT BIT, but really used in combo with bit 7 to mean counters already off.
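Read together, the set-bit microwords suggest how the status register is encoded: one bit names the error class, and bit 0 (the old TIMEOUT bit) acts as a modifier. The Python table below is a reconstruction from the comments in this listing (1C, 40, 46, 49, 4D, 50, 53, 61, 7A and 7C), not a documented register map. Two further points are inferences: that the low nibble of each set-bit microword is the 74259 data bit followed by its three address bits (so 1100 sets bit 4 and 1111 sets bit 7), and that bits 1 and 6 should be masked off, since they appear as x in the pattern 0x01 00x1 quoted at 1C and bit 1 appears to drive FORCECTR* at 5A-5D. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical status map reconstructed from this listing's comments.
STATUS_CODES: Dict[FrozenSet[int], str] = {
    frozenset({0}): "B timeout",                        # 7C alone (e.g. from 26, 71)
    frozenset({2}): "A & B commands disagree",          # 4D
    frozenset({2, 0}): "A timeout",                     # 7A, then 7C
    frozenset({3}): "unexpected superbit",              # 49
    frozenset({3, 0}): "NMI* timeout",                  # 61, then 7B/7C
    frozenset({4}): "counters not equal",               # 46
    frozenset({4, 0}): "SYNC RQST while counters off",  # 1C, then 7C
    frozenset({5}): "hard error",                       # 40
    frozenset({7}): "counters already on",              # 50
    frozenset({7, 0}): "counters already off",          # 53, then 7C
}

STATUS_MASK = 0b1011_1101  # ignore bits 1 and 6 (shown as x in 0x01 00x1)


def decode_nibble(nibble: int) -> Tuple[int, int]:
    """Split a set-bit nibble into (data bit, latch address), assuming D A2 A1 A0."""
    return (nibble >> 3) & 1, nibble & 0b111


def decode_status(reg: int) -> str:
    """Name the error recorded in an 8-bit status register value."""
    bits = frozenset(i for i in range(8) if (reg & STATUS_MASK) >> i & 1)
    return STATUS_CODES.get(bits, f"unrecognised pattern {reg:08b}")


print(decode_nibble(0b1100), decode_status(0b1000_0001))
# (1, 4) counters already off
```

Using bit 0 as a shared modifier is what let the author add new error codes by jumping into the existing 7C set/hold pair instead of spending two more microwords per code.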
*******************************************
56 xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx unused
***********************************************
PRESENT INTERRUPTS
***********************************************
57 010 1 01 00 0 0 0x 11 11 1100 1x00 0 101 1001 if A>B GOTO <>ERROR
58 101 0 01 00 0 0 0x 11 11 1100 1x00 0 101 1010 if Anot<B GOTO 5A
59 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 100 0000 GOTO ERROR 40
Instructions 5A-5D lower and then raise FORCECTR* while S0 & S1 = 00. FORCECTR* forces the cntrs to clock, and 00 is the clear-counter operation, so the counters are cleared. Note that every bit operation requires one cycle to clock the bit register followed by one cycle to hold the inputs, to meet the hold time of the 74259.
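The counter-clear trick just described can be modelled in a few lines of Python. This is a sketch under stated assumptions: the 74259 is reduced to an eight-bit array in which each write costs a strobe cycle plus a hold cycle; FORCECTR* is assumed to sit on latch bit 1 (inferred from the nibbles 0001 and 1001 used at 5A/B6 and 5C); and the counters are assumed to act on the low-to-high edge of FORCECTR*, clearing when S1 S0 = 00. The other counter modes are not described in this section, so the sketch simply counts up in them as a placeholder. All class and function names are hypothetical.

```python
from typing import List

FORCECTR_BIT = 1  # inferred latch address of FORCECTR*


class Latch74259:
    """Eight-bit addressable latch; each bit write takes two microcycles."""

    def __init__(self) -> None:
        self.q: List[int] = [0] * 8
        self.q[FORCECTR_BIT] = 1  # FORCECTR* is active low, so it idles high
        self.cycles = 0

    def write_bit(self, addr: int, data: int) -> None:
        self.q[addr] = data  # strobe cycle: D is copied to Q[addr]
        self.cycles += 2     # plus one cycle holding A/D stable for hold time


class SyncCounter:
    def __init__(self, value: int = 0) -> None:
        self.value = value

    def clock(self, s1: int, s0: int) -> None:
        if (s1, s0) == (0, 0):
            self.value = 0   # 00 is the clear-counter operation
        else:
            self.value += 1  # placeholder for the modes not modelled here


def drive_forcectr(latch: Latch74259, counters: List[SyncCounter],
                   level: int, s1: int = 0, s0: int = 0) -> None:
    old = latch.q[FORCECTR_BIT]
    latch.write_bit(FORCECTR_BIT, level)
    if old == 0 and level == 1:  # low-to-high edge forces one counter clock
        for c in counters:
            c.clock(s1, s0)


def clear_counters(latch: Latch74259, counters: List[SyncCounter]) -> None:
    drive_forcectr(latch, counters, 0)  # lower FORCECTR*, then hold
    drive_forcectr(latch, counters, 1)  # raise FORCECTR*, then hold


latch = Latch74259()
counters = [SyncCounter(1234), SyncCounter(1229)]
clear_counters(latch, counters)
print([c.value for c in counters], latch.cycles)  # [0, 0] 4
```

The four-cycle total matches the four microwords the listing spends on this (two set/hold pairs), which is why the author counts every bit operation as costing two microcycles.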
TIMER FOR PREPAD AT B5
5A xxx 1 00 00 0 1 0x 00 00 1100 1x00 1 011 0001 GOTO B1 to prepad
5B xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx 0001 HOLD BIT INFO
5C xxx 0 00 00 1 0 0x 00 00 1100 1x00 0 xxx 1001 Raise FORCECTR* so cntrs can clear; cntrs are synchronous and FORCECTR* going lo then hi forces them to clock, pp 1, 2, 6
5D XXX 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx 1001 HOLD BIT INFO
5E xxx 0 00 00 0 0 0x 00 00 1000 1x00 0 000 0xxx CLEAR HALTS REGISTER
5F 000 1 10 00 0 0 0x 00 00 1100 1x00 0 101 1010 LOOP to 5A if INTS have changed.
60 xxx 1 00 00 0 1 0x 00 00 1100 1x00 0 111 1111 GOTO7F
Clr Timer (not really necessary but logical)
********************************************
61 xxx 0 00 00 1 0 0x 00 00 1100 1x00 0 xxx 1011 SET 3 BIT to warn of NMI* TIMEOUT
62 xxx 1 00 00 0 0 0x 00 00 1100 1x00 0 111 1011 WOW, what dirty coding. Notice that the last 4 bits of this word match the word above, which just got strobed and needed its pattern to be held. Anyway, now we go to 7B, which falls into 7C, which sets the other bit, BIT 0; in combination the two bits mean NMI* timeout. I would never have done this so dirty, but there are not 3 consecutive microwords left anywhere, so I had to somehow do it in 2.
********************************************
AUTOHALT HANDLER
********************************************
*** 63 is the entry point into the AUTOHALT HANDLER from the COUNTERS RUNNING loop.
*** The timer is turned on to see if the other processor will make the
*** same request soon enough to be considered healthy.
63 xxx 0 00 00 0 1 0x 11 11 1100 1x00 0 xxx xxxx CLEAR TIMER
*** 64 is the entry point into the AUTOHALT HANDLER from the INTERRUPT HANDLER.
*** The TIMER is already running to see whether the procs can be brought into
*** sync in the allowed time.
64 110 0 01 00 0 0 0x 11 11 1100 1x00 0 110 1111 IF AAutohlt* active then GOTO 6F
65 000 0 11 00 0 0 0x 11 11 1100 1x00 0 110 0111 IF BAutohlt* active then GOTO 67
*** If code gets to 66, it has come to Autohalt without either of the
*** Autohalts being active, which is a hardware or mcode error, so go to
*** 40, which is the Sync bd hard error code.
66 xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0000 GOTO 40 HARD ERROR
67 110 1 01 00 0 0 0x 11 11 1100 1x00 0 111 0010 if no AAuto* activ then GOTO 72 in tight wait loop
68 010 1 01 00 0 0 0x 11 11 1100 1x00 0 100 0110 IF Cntrs notEQ GOTO46 to report error
69 101 1 01 00 0 0 0x 11 11 1100 1x00 0 100 0110 IF Cntrs NotEQ GOTO46
6A 010 0 11 00 0 0 0x 11 11 1100 1x00 0 111 0110 IF Acntrrqst* active then GOTO76
6B 011 0 11 00 0 0 0x 11 11 1100 1x00 0 111 0100 IF Acountoff* active then A wants cntroff, so goto74 to see if B wants cntroff.
6C 100 0 11 00 0 0 0x 11 11 1100 1x00 0 111 1000 GOTO78 to see if B requested sync.
6D 110 0 10 00 0 0 0x 11 11 1100 1x00 0 100 0000 IF not SUPERBIT then hardware or mcode err, SO GOTO 40 HARD ERROR
6E xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 1001 Super bit GOTO49
********************************
See Autohalt Handler Flowchart to see where this fits in.
6F 000 0 11 00 0 0 0x 11 11 1100 1x00 0 110 1000 IF Bauto halt* active then GOTO68 to do it.
70 100 0 10 00 0 0 0x 11 11 1100 1x00 0 110 1111 IF TIME OK LOOP 6F
71 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 111 1100 TIMED OUT, GOTO7C, which is the time out B error.
********************************
72 100 0 10 00 0 0 0x 11 11 1100 1x00 0 110 0111 IF TIME OK loop to 67. This is part of a tight loop with 67 to wait for AAuto* to go low; see flowchart pg MC2.
73 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 111 1010 GOTO 7A = A timeout.
*******************************
Come here from 6B to see if B has also requested counters off.
74 110 1 11 00 0 0 0x 11 11 1100 1x00 0 100 1101 IF B doesn't say CNTROFF then they disagree, so go to disagree ERROR 4D
75 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 000 0011 GOTO 03 to wait for CNTRS ON COMMAND
76 101 1 11 00 0 0 0x 11 11 1100 1x00 0 100 1101 IF BCNTRRQST* is not active then A&B disagree, so go to 4D Err.
77 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 101 0000 GOTO50, cntrs already were on so error
78 111 1 11 00 0 0 0x 11 11 1100 1x00 0 100 1101 IF not BSYNC* active then A&B disagree so then GOTO 4D, else
79 xxx 1 00 00 0 0 0x 11 11 1100 1x00 0 101 0111 It's a sync rqst, GOTO57 to present ints.
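The decision tree of the AUTOHALT HANDLER (63-79) can be summarised as a pure function. In the Python sketch below, each processor's autohalt is reduced to a hypothetical `Halt` record (the microcycle at which it autohalted, its count, and the request it made, or None if it made no recognised request), and the hardware timer becomes a fixed `window`. These names, and the treatment of the timer as a difference of two timestamps, are assumptions; the outcomes and their target addresses follow the microcode comments above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Request(Enum):
    COUNTERS_ON = "counters on"
    COUNTERS_OFF = "counters off"
    SYNC = "sync"


@dataclass
class Halt:
    time: int                   # microcycle at which the autohalt was seen
    count: int                  # virtual-time counter at the autohalt
    request: Optional[Request]  # None: no recognised request (superbit path)


def autohalt_outcome(a: Optional[Halt], b: Optional[Halt], window: int,
                     superbit: bool = False) -> str:
    if a is None and b is None:
        return "hard error (40)"          # 66: no autohalt active at all
    if a is None or (b is not None and a.time - b.time > window):
        return "A timeout (7A)"           # 67/72 loop ran out of time
    if b is None or b.time - a.time > window:
        return "B timeout (7C)"           # 6F/70 loop ran out of time
    if a.count != b.count:
        return "counters not equal (46)"  # 68/69
    if a.request is None:
        # 6D/6E: only a Superbit can explain an autohalt with no request
        return "unexpected superbit (49)" if superbit else "hard error (40)"
    if a.request != b.request:
        return "A & B commands disagree (4D)"  # 74, 76, 78
    return {
        Request.COUNTERS_ON: "counters already on (50)",    # 77
        Request.COUNTERS_OFF: "wait for counters-on (03)",  # 75
        Request.SYNC: "present interrupts (57)",            # 79
    }[a.request]


print(autohalt_outcome(Halt(0, 500, Request.SYNC),
                       Halt(3, 500, Request.SYNC), window=10))
# present interrupts (57)
```

As in the microcode, only A's request is decoded first and B's is then checked against it, so any mismatch collapses into the single "commands disagree" error.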
*******************************************
TIME OUT ERROR ROUTINE
7A xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1010 SET bit meaning A is the timed-out proc; this only means A timeout if we also set the Timeout bit.
7B xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1010 Hold bit pattern
7C xxx 0 00 00 1 0 0x 10 10 1100 1x00 0 xxx 1000 SET BIT Timeout Bit
7D xxx 0 00 00 0 0 0x 10 10 1100 1x00 0 xxx 1000 hold bit pattern
7E xxx 1 00 00 0 0 0x 10 10 1100 1x00 0 100 0010 GOTO 42 to wait for help; wait on Superbit.
*************************************
7F xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
80 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
81 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
82 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
83 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
84 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
85 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
86 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
87 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
88 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
89 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8A xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8B xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8C xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8D xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8E xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
8F xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
90 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
91 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
92 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
93 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
94 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
95 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
96 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
97 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
98 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
99 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9A xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9B xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9C xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9D xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9E xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
9F xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A0 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A1 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A2 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A3 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A4 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A5 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A6 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A7 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A8 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
A9 xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AA xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AB xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AC xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AD xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AE xxx 0 00 00 0 0 0x 00 00 1100 1x00 0 xxx xxxx Waste time and go on
AF xxx 1 00 01 0 0 0x 00 00 1110 1x10 0 001 0111 Go to 17, clear reg
B0 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
*******************************************
B1 xxx 0 00 0 0 0 0 0x 00 00 1100 1x00 xxxx xxxx waste time
B2 xxx 0 00 0 0 0 0 0x 00 00 1100 1x00 xxxx xxxx waste time
B3 XXX 0 00 0 0 0 0 0x 00 00 1100 1x00 xxxx xxxx waste time
This is the timing pad prior to presenting interrupts: mcode stays at address B5 until the timer has timed out. The timer is jumper programmable; see page 7 of the schematics for timing TIMPADQQ.
B4 xxx 0 00 0 0 0 0 0x 00 00 1100 1x00 xxxx xxxx waste time for timer to clear.
B5 011 010 0 0 0 0 0x 00 00 1100 1x00 1011 0101 Brn to self if PAD=0
Update ints; set bit
B6 xxx 0 00 1 0 1 0 0x 00 00 1100 1x00 xxxx 0001 Clr counters
B7 XXX 0 00 0 0 0 0 0x 00 00 1100 1x00 xxxx 0001 hold for 259
B8 xxx 1 00 0 0 0 0 0x 00 00 1100 1x00 0101 1011 GOTO 5B
*************************************************
B9 xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
*************************************************
This is a patch to the routine that single steps twice to make sure that the processors take the int at the same time.
BA xxx 0 00 0 0 0 0 0x 1111 1101 1x00 xxxx xxxx Single step A
BB 111 0 01 0 0 0 0 0x 11 11 1100 1x00 1011 1011 wait for AHALT to go away
BC xxx 1 00 0 0 0 0 0x 11 11 1100 1x00 1101 0000 goto D0
Patch finished
**********************************************
BD xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
BE xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
**********************************************
BF xxx 0 00 0 0 0 0 0x 1111 1101 1x00 xxxx xxxx Single step A same for whole routine.
C0 111 0 010 0 0 0 0x 11 11 1100 1x00 1100 0000 watch halt go away
C1 xxx 00 0 0 0 0 0x 11 11 1100 1x01 xxxx xxxx Single step both
C2 001 11 0 0 0 0 0x 1111 1100 1x00 1100 0010 watch bhalt go also
C3 001 0 11 0 0 0 0 0x 11 11 1100 1x00 1100 0111 if bhltqq* active goto C7
C4 000 0 11 0 0 0 0 0x 1111 1100 1x00 0011 1110 if autob goto 3E to prep for autohalt
C5 100 0 10 0 0 0 0x 11 11 1100 1x00 1100 0011 if T ok loopback C3
C6 xxx 100 0 0 0 0x 11 11 1100 1x00 0111 1100 Timeout goto 7c err
C7 100 110 0 0 0 0x 11 11 1100 1x00 0111 1100 if timeup goto 7C
C8 001 0 01 0 0 0 0x 11 11 1100 1x00 1100 0111 if bas* active loopC7
C9 111 0 01 0 0 0 0x 11 11 1100 1x00 1100 1101 if ahltq* active goto CD
CA 110 0 01 0 0 0 0 0x 11 11 1100 1x00 0011 1110 if aauto goto 3E to service rqst
CB 100 0 10 0 0 0 0 0x 11 11 1100 1x00 1100 1001 if Timok loop C9
CC xxx 1 00 0 0 0 0 0x 11 11 1100 1x00 0111 1010 goto 7A
CD 100 1 10 0 0 0 0 0x 11 11 1100 1x00 0111 1010 if timeup goto 7A
CE 000 0 01 0 0 0 0 0x 11 11 1100 1x00 1100 1101 if aas* activ loopCD
CF xxx 1 00 0 0 0 0 0x 11 11 1100 1x00 1011 1010 jump to patch at BA same for whole routine.
D0 xxx 0 00 0 0 0 0 0x 11 11 1100 1x01 xxxx xxxx single step b
D1 001 0 11 0 0 0 0 0x 11 11 1100 1x00 1101 0001 watch bhalt go also
D2 001 0 11 0 0 0 0 0x 11 11 1100 1x00 1101 0110 if bhltqq* active goto D6
D3 000 0 11 0 0 0 0 0x 11 11 1100 1x00 0011 1110 if autob goto 3E to prep for autohalt
D4 100 10 0 0 0x 11 11 1100 1x00 1101 0010 if T ok loopback D2
D5 xxx 00 0 0 0x 11 11 1100 1x00 0111 1100 Timeout goto 7c err
D6 100 10 0 0 0x 11 11 1100 1x00 0111 1100 if timeup goto 7C
D7 001 01 0 0 0x 11 11 1100 1x00 1101 0110 if bas* active loopd6
D8 111 01 0 0 0x 11 11 1100 1x00 1101 1100 if ahltq* active goto CD to wait for as*
D9 110 0 01 0 0 0 0 0x 11 11 1100 1x00 0011 1110 if aauto goto 3E to service rqst
DA 100 10 0 0x 11 11 1100 1x00 1101 1000 if Timok loop C9
DB xxx 00 0 0x 11 11 1100 1x00 0111 1010 goto 7A
DC 100 10 0 0x 11 11 1100 1x00 0111 1010 if timeup goto 7A
DD 000 01 0 0x 11 11 1100 1x00 1101 1100 if aas* activ loopDC
DE 110 01 0 0x 11 11 1100 1x00 0011 1110 if aauto goto 3e
DF 000 11 0 0x 11 11 1100 1x00 0011 1110 if bauto goto 3e
E0 100 10 0 0x 11 11 1100 1x00 0111 1010 if timout goto 7A
E1 010 01 0 0x 11 11 1100 1x00 0011 0111 if a>b goto 37, try again
E2 101 01 0 0x 11 11 1100 1x00 0011 0100 if a<b goto 34, try again
E3 xxx 00 0 0x 11 11 1100 1x00 0101 1010 finis, goto 5A present ints
There are some things that this code does that need to be documented;
I will attempt to do that ad hoc here.
UNEXPECTED SUPERBIT - Superbit is the OR of either proc setting its "unused" bit in the control reg.
This bit is not really unused: it tells the microcode to go back to waiting for "CNTR ON RQST" after the microcode has detected an error and has sent NMI. However, the processors may set this bit by mistake; in that case the microcode sets the "UNEXPECTED SUPERBIT" status bit. The code then un-halts both procs, if they are halted, so that they can diagnose the NMI, and the mcode waits for the Superbit first to go low, getting rid of the unexpected Superbit, and then to go hi again, telling the microcode to resume normal operation.
TIMEOUT ERRORS - Timeout errors are reported by setting the timeout status bit and waiting for the Superbit to tell the mcode to resume normal processing. If software prefers, normal operation can instead be resumed by resetting and then unresetting the board. On the prototype board there is a reset bit; on BETA, if this bit is not part of the design, reset will be accomplished by activating the INIT line, probably via the status board. Which brings up the next subject....
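The two Superbit waits described above (43 alone in the expected case; 4B and then 43 in the unexpected case) amount to waiting for a level, or for a low followed by a high. A minimal Python sketch, assuming the Superbit is sampled once per microcycle into a sequence of 0/1 values; the function name `resume_index` is hypothetical.

```python
from typing import Iterable


def resume_index(superbit: Iterable[int], unexpected: bool) -> int:
    """Return the sample index at which the mcode leaves for 03, or -1.

    unexpected=False models the loop at 43: wait for Superbit high.
    unexpected=True first waits at 4B for Superbit to go low (clearing the
    unexpected Superbit) and then at 43 for it to go high again.
    """
    samples = enumerate(superbit)  # one shared iterator for both waits
    if unexpected:
        for _, level in samples:
            if not level:
                break  # 4B: Superbit has dropped
        else:
            return -1  # Superbit never went low
    for i, level in samples:
        if level:
            return i  # 43: Superbit high, clear regs at 44, then GOTO 03
    return -1


print(resume_index([1, 1, 0, 0, 1], unexpected=True))   # 4
print(resume_index([1, 1, 0, 0, 1], unexpected=False))  # 0
```

The second call shows, presumably, why the unexpected case needs the extra low phase: a Superbit still high from the stray request would otherwise be read as a resume request immediately.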
RESETS - It is my opinion that all boards in the system should be reset in a manner that appears consistent to software, e.g. each board can be held in reset by software via the status board and each board can be individually unreset via the status board. Therefore the BETA Sync board should probably not have a RESET control bit, but should instead be reset via the status board.

Claims

1. A processor controller for use in a computer system having a plurality of processors which execute associated programmed instructions, said controller comprising: interrupt receiving means for receiving asynchronous signals, including interrupt signals, intended for the processors; virtual time calculating means associated with each of the processors for determining the position of the associated processor in executing the associated programmed instructions; virtual time comparator means for comparing said code positions of the associated processors; and interrupt control means for forwarding said asynchronous signals received by said interrupt receiving means to the processors in response to said virtual time comparator means.
2. The controller of Claim 1 wherein said virtual time calculating means determines said code positions by monitoring processor local bus cycles.
3. The controller of Claim 2 wherein said virtual time calculating means monitors said processor local bus cycles by counting processor address strobe signals.
4. The controller of Claim 1 wherein said interrupt control means forwards said asynchronous signals to the processors when said code positions of the processors are the same.
5. The controller of Claim 4 further including processor halt means for halting and releasing the processors in response to said interrupt receiving means.
6. The controller of Claim 5 wherein said halt means is also responsive to said virtual time calculating means.
7. The controller of Claim 6 wherein said halt means halts the lead processor which is at the most advanced of said code positions and releases the lead processor when said code positions of all processors are the same.
8. The controller of Claim 7 wherein each of the processors provides request for synchronization signals to said controller and said halt means releases all of said processors after said controller has received one of the request for synchronization signals from each of the processors.
9. The controller of Claim 8 further including error reporting means responsive to said virtual time comparator for reporting an error if the processors provide requests for synchronization signals at different code positions.
10. A method of controlling processors of a computer system having a plurality of processors which execute programmed instructions comprising the following steps: receiving asynchronous signals, including interrupt signals, intended for the processors; determining the relative position of each of the processors in its execution of the instructions; and forwarding said asynchronous signals to the processors when said code positions are the same.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US56075983A 1983-12-12 1983-12-12
US560759 1983-12-12

Publications (2)

Publication Number Publication Date
EP0164414A1 EP0164414A1 (en) 1985-12-18
EP0164414A4 EP0164414A4 (en) 1986-06-05

Family

ID=24239250

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19850900389 Withdrawn EP0164414A4 (en) 1983-12-12 1984-12-10 Computer processor controller.

Country Status (3)

Country Link
EP (1) EP0164414A4 (en)
AU (1) AU3746585A (en)
WO (1) WO1985002698A1 (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3412049A1 (en) * 1984-03-30 1985-10-17 Licentia Patent-Verwaltungs-Gmbh, 6000 Frankfurt SIGNAL-SAFE DATA PROCESSING DEVICE
AU568977B2 (en) * 1985-05-10 1988-01-14 Tandem Computers Inc. Dual processor error detection system
CA1293819C (en) * 1986-08-29 1991-12-31 Thinking Machines Corporation Very large scale computer
CA2003338A1 (en) * 1987-11-09 1990-06-09 Richard W. Cutts, Jr. Synchronization of fault-tolerant computer system having multiple processors
AU616213B2 (en) * 1987-11-09 1991-10-24 Tandem Computers Incorporated Method and apparatus for synchronizing a plurality of processors
US4908502A (en) * 1988-02-08 1990-03-13 Pitney Bowes Inc. Fault tolerant smart card
JPH0271644A (en) * 1988-09-07 1990-03-12 Toshiba Corp Master slave type control system
US4965717A (en) * 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
AU625293B2 (en) * 1988-12-09 1992-07-09 Tandem Computers Incorporated Synchronization of fault-tolerant computer system having multiple processors
US5203004A (en) * 1990-01-08 1993-04-13 Tandem Computers Incorporated Multi-board system having electronic keying and preventing power to improperly connected plug-in board with improperly configured diode connections
JPH05128080A (en) * 1991-10-14 1993-05-25 Mitsubishi Electric Corp Information processor
US5613127A (en) * 1992-08-17 1997-03-18 Honeywell Inc. Separately clocked processor synchronization improvement
JPH0773059A (en) * 1993-03-02 1995-03-17 Tandem Comput Inc Fault-tolerant computer system
WO1995006277A2 (en) * 1993-08-18 1995-03-02 Honeywell Inc. Separately clocked processor synchronization improvement
FR2742015B1 (en) * 1995-12-01 1998-01-09 Sextant Avionique METHOD FOR SECURING AN ACTION AND DEVICE FOR IMPLEMENTING IT
GB2399190B (en) * 2003-03-07 2005-11-16 * Zarlink Semiconductor Limited Parallel processing architecture
US9732483B2 (en) * 2015-09-11 2017-08-15 Westfield Retail Solutions, Inc. Vehicle barrier system
US20210107512A1 (en) * 2020-03-27 2021-04-15 Intel Corporation Computing system for mitigating execution drift

Citations (1)

Publication number Priority date Publication date Assignee Title
US4196470A (en) * 1976-12-17 1980-04-01 Telefonaktiebolaget L M Ericsson Method and arrangement for transfer of data information to two parallelly working computer means

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US3810119A (en) * 1971-05-04 1974-05-07 Us Navy Processor synchronization scheme
CH556576A (en) * 1973-03-28 1974-11-29 Hasler Ag DEVICE FOR SYNCHRONIZATION OF THREE COMPUTERS.
US3909795A (en) * 1973-08-31 1975-09-30 Gte Automatic Electric Lab Inc Program timing circuitry for central data processor of digital communications system
US3866184A (en) * 1973-08-31 1975-02-11 Gte Automatic Electric Lab Inc Timing monitor circuit for central data processor of digital communication system
US4456952A (en) * 1977-03-17 1984-06-26 Honeywell Information Systems Inc. Data processing system having redundant control processors for fault detection
US4358823A (en) * 1977-03-25 1982-11-09 Trw, Inc. Double redundant processor
IT1111606B (en) * 1978-03-03 1986-01-13 Cselt Centro Studi Lab Telecom MULTI-CONFIGURABLE MODULAR PROCESSING SYSTEM INTEGRATED WITH A PRE-PROCESSING SYSTEM
US4428044A (en) * 1979-09-20 1984-01-24 Bell Telephone Laboratories, Incorporated Peripheral unit controller
US4453215A (en) * 1981-10-01 1984-06-05 Stratus Computer, Inc. Central processing apparatus for fault-tolerant computing

Non-Patent Citations (2)

Title
COMPUTER DESIGN, vol. 21, no. 11, November 1982, pages 211,212,215,216,218,220, Winchester, US; J.H. WENSLEY: "Fault tolerant systems can prevent timing problems" *
See also references of WO8502698A1 *

Also Published As

Publication number Publication date
EP0164414A4 (en) 1986-06-05
WO1985002698A1 (en) 1985-06-20
AU3746585A (en) 1985-06-26

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19850806

AK Designated contracting states

Designated state(s): AT BE CH DE FR GB LI LU NL SE

A4 Supplementary search report drawn up and despatched

Effective date: 19860605

17Q First examination report despatched

Effective date: 19881028

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19890308

RIN1 Information on inventor provided before grant (corrected)

Inventor name: KATIN, NEIL, A.

Inventor name: KOLB, WILLIAM, W.

Inventor name: MCMURRAY, RICHARD, D.