CA1143026A - Computer system - Google Patents
Computer system
Info
- Publication number
- CA1143026A, CA000311096A
- Authority
- CA
- Canada
- Prior art keywords
- computer
- memory
- individual
- modules
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2038—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2041—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2025—Failover techniques using centralised failover control functionality
Abstract
ABSTRACT OF THE DISCLOSURE
A computer system has two or more computer modules, each including an individual computer, a coupling memory and a working memory. The modules can be coupled to a system bus comprising a control and address bus and a data bus. Access to the coupling memory is obtained either from the system bus or from the individual computer by switching techniques, and only the individual computer has access to its working memory. The system bus can be coupled to a control computer. A safeguarding memory, to which the control computer has access over the system bus, and a further memory, to which the control computer also has access, are provided. The control computer, the further memory and part of the existing modules are employed to process the user's program, and a monitoring phase is inserted at regular intervals in which all the individual computers are checked for functional capacity by test programs stored in the working memories of the modules. If no defective modules are found, the intermediate results are stored in the safeguarding memory and normal processing continues. If one or more defective modules are recognized, they are replaced by other modules not used for processing the user's program, and the individual function of each replaced module is loaded from the further memory, which stores the entire user program, into the replacement module before processing continues with the last safeguarded intermediate results stored in the safeguarding memory.
Description
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention relates to a computer system in which two or more computer modules can be coupled to a system bus, each of the modules including an individual computer, a coupling memory and a working memory, and in which the system bus comprises a control and address bus and a data bus, and more particularly to such a system in which access can be gained to a coupling memory either from the system bus or from an individual computer by transfer techniques and in which only the individual computer has access to its working memory and the system bus can be coupled to a control computer.
Description of the Prior Art
A computer system of the type briefly described above is known in the art. This prior system operates in a three-phase operation. The first phase consists of a control phase during which only the control computer is operative, carries out its program and informs the individual computers of the function which they must carry out during the following phase. The second phase consists of an autonomous phase during which the individual computers carry out their assigned functions simultaneously and independently of one another without being connected to the control computer or to its memory, and then report the execution of their function by transmitting a "STOP" signal to the control computer. The third phase consists of a data exchange phase which starts when the control computer has received a "STOP" signal from all of the individual computers or from a selection of individual computers established by the circuit, and during which, under the control of the control computer, the data exchange is carried out between the memories of the individual computers, and possibly the control computer.
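The three-phase operation described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of the prior system: the `Module` class, the `run_cycle` function and the integer "memory" are hypothetical names and simplifications, and the modules are run one after another where the real individual computers work simultaneously.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Module:
    """Hypothetical stand-in for one computer module."""
    name: str
    coupling_memory: int = 0                          # reachable from the system bus
    function: Optional[Callable[[int], int]] = None   # assigned in the control phase
    stopped: bool = False                             # models the "STOP" signal


def run_cycle(modules: List[Module],
              assignments: Dict[str, Callable[[int], int]]) -> Dict[str, int]:
    # Control phase: only the control computer acts; it tells each
    # individual computer which function to carry out next.
    for m in modules:
        m.function = assignments[m.name]
        m.stopped = False

    # Autonomous phase: each module works independently, then reports STOP.
    # (Sequential here; simultaneous in the described system.)
    for m in modules:
        m.coupling_memory = m.function(m.coupling_memory)
        m.stopped = True

    # Data exchange phase: starts only once every module has reported STOP.
    if not all(m.stopped for m in modules):
        raise RuntimeError("data exchange requested before all STOP signals")
    return {m.name: m.coupling_memory for m in modules}
```

Calling `run_cycle` repeatedly with new assignments models successive phase cycles; the returned dictionary plays the role of the exchanged results.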
For specific fields of use of data processing systems, for example process control monitoring of nuclear power stations or navigation systems for flying bodies, computer systems having a high degree of reliability are required.
The reliability of data processing systems can be increased by redundancy in construction, for example by a multiple provision of critical components such as a central unit with a working memory, in which, in the case of differing results, the result emitted by the majority of components is used, or else by redundancy in the organization, for example by means of redundant, fault-correcting codes. A fundamental requirement of the organization is being able to continue computation without a time loss, or with only a small time loss, when faults occur. It is not sufficient to isolate and replace faulty components and then to reinitiate the function being processed from the beginning. Even where this is possible at all, the resulting time loss would generally be incompatible with the requirements of real-time problems.
SUMMARY OF THE INVENTION
The object of the present invention is to provide a computer system which facilitates real-time operation in spite of breakdown of individual components.
This object is achieved by means of a computer system of the type briefly described above in that a safeguarding memory, to which the control computer has access over the system bus, and a further memory, to which the control computer also has access, are provided.
A high degree of reliability can be achieved with this computer system if it is operated in such a manner that the control computer, the further memory and a part of the existing modules are used to process the user program. A monitoring phase is interposed at regular intervals in which all
of the individual computers are checked in respect of functioning capacity by means of test programs stored in the working memories of the modules, and defective modules are determined and indicated. In the event that no defective modules are recognized, the intermediate results calculated at that time are stored in the safeguarding memory and further processing of the user program is continued in normal fashion. In the event that one or more defective modules are recognized, such modules are replaced by certain of the other modules which are not used for processing the user program, for which purpose the individual computer function of the module which is to be replaced is loaded from the further memory, which stores the entire user program, into each replacing module. Then, further processing is continued with the last-safeguarded intermediate results stored in the safeguarding memory.
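The decision taken in a monitoring phase amounts to a checkpoint-or-rollback step, which the following Python sketch tries to make concrete. All names (`monitoring_phase`, `is_intact`, the dictionaries standing in for the intermediate results and the safeguarding memory) are hypothetical, and the policy of raising an error when too few intact spares remain is an assumption; the text does not say what happens in that case.

```python
from typing import Callable, Dict, List, Tuple


def monitoring_phase(
    active: List[int],
    spares: List[int],
    is_intact: Callable[[int], bool],   # result of a module's test program
    results: Dict[str, int],            # intermediate results of this cycle
    safeguard: Dict[str, int],          # contents of the safeguarding memory
) -> Tuple[List[int], List[int], Dict[str, int]]:
    defective = [m for m in active if not is_intact(m)]

    if not defective:
        # Fault-free: safeguard the current intermediate results and carry on.
        safeguard.clear()
        safeguard.update(results)
        return active, spares, results

    # Faults found: substitute intact, previously unused modules.
    intact_spares = [s for s in spares if is_intact(s)]
    if len(intact_spares) < len(defective):
        raise RuntimeError("not enough intact spare modules")  # assumed policy
    replacement = dict(zip(defective, intact_spares))
    new_active = [replacement.get(m, m) for m in active]
    new_spares = [s for s in intact_spares if s not in replacement.values()]

    # Processing resumes from the last safeguarded results, not the current ones.
    return new_active, new_spares, dict(safeguard)
```

Reloading the individual function of the replaced module from the further memory is left out here; only the bookkeeping of modules and results is shown.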
Advantageously, the computer system processes the user program in a three-phase cycle.
Advantageously, the computer system is operated in such a manner that, after as few phase cycles as possible, a monitoring phase is additionally inserted between the autonomous phase and the next data exchange phase.
For triggering of the monitoring phases, the computer system is advantageously provided with a pulse generator which is coupled to the control computer and which triggers the monitoring phases at the period of its pulse train.
In order to exchange a defective module for an intact module, it is expedient for each module to be provided with a fixed module number and a module number which can be modified by the control computer for characterization purposes. The exchange process is then expediently carried out in that the modifiable module numbers of the defective modules are exchanged with those of intact modules, and their fixed module numbers are used for addressing purposes.
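The two-number scheme can be pictured as a small mapping from fixed module numbers to modifiable ones. The Python sketch below is an illustration under assumed names (`logical_of`, `exchange_module_numbers`, `fixed_for`); the text does not specify how the numbers are stored or encoded.

```python
from typing import Dict


def exchange_module_numbers(logical_of: Dict[int, int],
                            defective_fixed: int,
                            spare_fixed: int) -> None:
    """Swap the modifiable numbers of two modules.

    `logical_of` maps each fixed module number (used for addressing during
    the exchange) to the modifiable number set by the control computer.
    """
    logical_of[defective_fixed], logical_of[spare_fixed] = (
        logical_of[spare_fixed],
        logical_of[defective_fixed],
    )


def fixed_for(logical_of: Dict[int, int], logical: int) -> int:
    """Find which physical module currently answers to a modifiable number."""
    return next(f for f, l in logical_of.items() if l == logical)
```

After the swap, anything that refers to the old modifiable number reaches the spare module, so the rest of the user program needs no change.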
Advantageously, a computer system constructed in accordance with the present invention is provided with a time monitoring device which is coupled to the computer system and which indicates an impermissibly long autonomous phase and immediately introduces an additional monitoring phase.
The computer system can advantageously be designed in such a manner that each module possesses a parity production and checking unit which constantly monitors the module and, on the recognition of a defect, reports this defect to the control computer by means of a parity fault message and thus immediately triggers a monitoring phase.
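As a reminder of what a parity production and checking unit does, here is a minimal Python sketch. It assumes a single even-parity bit per stored word; the text does not say which parity convention or word width the modules use, and the function names are hypothetical.

```python
def parity_bit(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check a word read back from memory against its stored parity bit.

    A False result corresponds to the parity fault message sent to the
    control computer.
    """
    return parity_bit(word) == stored_parity
```

A single parity bit detects any odd number of flipped bits in a word but cannot locate or correct them, which fits its role here as a trigger for a fuller monitoring phase.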
Thus, in accordance with one broad aspect of the invention, there is provided a computer system comprising: a control computer; a system bus system, including a control and address bus and a data bus, connected to said control computer; a plurality of computer modules connected to said system bus, each including an individual computer, a coupling memory and a working memory, access to said coupling memory being had from said system bus and from said individual computer; an information safeguarding memory connected to said system bus for storing intermediate computed results; a further memory connected to said control computer for storing an entire user program, said control computer operable to monitor the performance of said individual computers and to substitute an operable individual computer along with said safeguarding and further memories in response to faulty operation of an individual computer; means operable to periodically interpose a monitoring phase in the multi-phase operation of the system; means in the individual computers, including test program means in said working memories, to check the functioning capacity of the respective computers; means for determining and signaling an intact or a defective module; means for storing the intermediate computed results in response to fault-free detection; means for causing said memory to load the program of a defective module into a substitute module in response to detection of a defective module; and means for causing the safeguarding memory to provide the intermediate results to the substitute module and continuing of the data processing originally undertaken.
In accordance with another broad aspect of the invention there is provided a method of operating a computer system which has a control computer, a bus system connected to the control computer and a plurality of modules connected to the bus system each including a working memory storing test programs, an individual computer and a coupling memory, a safeguarding memory and a further memory storing an entire user program, comprising the steps of:
operating the system through a control phase in which the control computer informs the individual computers of their functions, an autonomous phase in which the individual computers carry out their functions, and a data exchange phase in which data is exchanged between computers; operating the system at regular intervals through a monitoring phase in which the individual test programs are run and contemporaneously checking the operating capabilities of each module; storing intermediate computed results in the safeguarding memory; continuing normal processing when faults are not found; loading the individual function of a defective module from the further memory into a replacement module in response to detection of a faulty module; and continuing processing with the replacement module and the intermediate results stored in the safeguarding memory.
BRIEF DESCRIPTION OF THE DRAWING
Other objects, features and advantages of the invention, its organization, construction and mode of operation will be best understood from the following detailed description, taken in conjunction with the accompanying drawing, on which there is a single figure which is a block diagram illustration of an exemplary embodiment of a computer system which is constructed and operates in accordance with the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to the drawing, the exemplary embodiment illustrated comprises a plurality of computer modules 11, 12, 13, 15, 16 and 18 which are coupled to a system data line. Each module comprises a coupling memory KS, an individual computer ER and a working memory AS. In each module, only the individual computer has access to its working memory, whereas access can be obtained to the coupling memory selectively from the individual computer or from the system bus. For purposes of fault recognition, each module is provided with a parity production and checking unit and possesses its own output a for the parity fault message. By way of characterization, each module possesses a fixed module number and a module number which can be modified from the control computer. Furthermore, a control computer STR is provided which can be coupled to the system bus 1 and has access to a further memory GS and, via the system bus, access to a safeguarding memory SS. The further memory GS preferably consists of a high-speed large-capacity memory, for example a disc memory. All of the individual computers are preferably microprocessors. The safeguarding memory SS is preferably identical in construction to the coupling memory of a module. Also provided are a pulse generator T and a time monitoring device ZU which are both coupled to the control computer STR. The pulse train period of the pulse generator regularly triggers monitoring phases. All of the outputs a of the computer modules are likewise connected to the control computer STR.
In the following the cooperation of all the described components will be explained.
It has been assumed that the modules 11--15 are used to process the user program, whereas the modules 16--18 are redundant modules. The computer system which processes the user program comprises the modules 11--15, the control computer STR and the further memory GS and can simultaneously process as many sub-functions of the user program as computer modules 11--15 are provided.
The computer system operates in the above-described three-phase cycle.
The computer state is established following each three-phase cycle by the individual functions stored in the modules and by the exchanged results which are primarily intermediate results.
Whereas the individual functions are fixed and can be called up, for example from the further memory, the intermediate results must be safeguarded. Safeguarding is carried out, together with a check on the computer, in the additionally interposed monitoring phases.
The duration between two monitoring phases is determined by the period duration of the pulse generator T. The pulse generator transmits an interrupt request to the control computer which inserts a monitoring phase before the next data exchange phase.
The control computer starts test programs which are provided in all the modules and which carry out a function check of the modules. Here, it is necessary to use test programs which, in the case of fault-free modules, do not permanently alter the memory contents. The fault messages are stored in the coupling memory KS. The control computer now checks whether fault messages have been received from modules entrusted with the processing of a sub-function. If this is not the case, for the following data exchange phase the safeguarding memory is coupled to the system bus in order to receive the intermediate results simultaneously with the coupling memories of the modules entrusted with the sub-functions. The further processing of the user program is then continued without modification. If, however, faults occur, the defective modules are replaced by intact, previously unused modules.
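The requirement that a test program must not permanently alter memory contents in a fault-free module can be met by a save, test and restore pattern. The Python sketch below shows one such pattern for a memory modeled as a list of 8-bit words; the function name, the test patterns `0x55` and `0xAA`, and the word width are assumptions, not details taken from the text.

```python
from typing import List, MutableSequence, Sequence


def nondestructive_memory_test(memory: MutableSequence[int],
                               patterns: Sequence[int] = (0x55, 0xAA)) -> List[int]:
    """Return the addresses of cells that fail a write/read-back check.

    Each cell's original content is saved before testing and written back
    afterwards, so a fault-free memory is left unchanged.
    """
    faulty = []
    for addr in range(len(memory)):
        saved = memory[addr]
        ok = True
        for pattern in patterns:
            memory[addr] = pattern
            if memory[addr] != pattern:   # read back what was just written
                ok = False
        memory[addr] = saved              # restore the original content
        if not ok:
            faulty.append(addr)
    return faulty
```

The list of failing addresses stands in for the fault message that a module would place in its coupling memory KS.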
Replacement is carried out in the following steps: the module numbers, modifiable by the control computer, of the free and defective modules are exchanged and addressed during this procedure by way of the fixed module numbers; then, the missing individual functions are reloaded from the further memory which stores the entire user program. For the duration of the following exchange phase, the safeguarding memory is coupled to the system bus. In contrast to a fault-free situation in which the intermediate results have been written into the safeguarding memory, it now forms the source of safeguarded results. These safeguarded results are read from the safeguarding memory and transferred into the coupling memories.
This fulfills the conditions for the restarting of the system.
The starting point is the control phase which follows the last phase cycle with a fault-free monitoring phase.
In addition to initiation by the pulse generator T, monitoring phases can also be triggered by the time monitoring device ZU, which indicates an impermissibly long autonomous phase, or by a parity fault message from one of the modules which appears at the output a. In these situations, the modules are checked immediately, and not only after the conclusion of the autonomous phase.
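The three trigger sources can be summarized as a small decision function. The Python sketch below is a model only: the names, the use of a floating-point time limit and the priority order among simultaneous triggers are assumptions, since the text does not rank the sources.

```python
from enum import Enum
from typing import Optional


class Trigger(Enum):
    PULSE = "pulse generator T"
    TIMEOUT = "time monitoring device ZU"
    PARITY = "parity fault message at output a"


def monitoring_trigger(pulse_pending: bool,
                       autonomous_elapsed: float,
                       max_autonomous: float,
                       parity_fault: bool) -> Optional[Trigger]:
    """Return the reason for the next monitoring phase, or None."""
    if parity_fault:                          # assumed to take priority
        return Trigger.PARITY
    if autonomous_elapsed > max_autonomous:   # impermissibly long autonomous phase
        return Trigger.TIMEOUT
    if pulse_pending:                         # regular, periodic monitoring
        return Trigger.PULSE
    return None


def is_immediate(trigger: Trigger) -> bool:
    """Timeout and parity triggers start the check at once; the periodic
    pulse waits for the point before the next data exchange phase."""
    return trigger in (Trigger.TIMEOUT, Trigger.PARITY)
```

Separating the reason from the timing keeps the periodic case and the two fault-driven cases distinct, which mirrors the distinction drawn in the paragraph above.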
Although I have described my invention by reference to a particular illustrative embodiment thereof, many changes and modifications of the invention may become apparent to those skilled in the art without departing from the spirit and scope of the invention. I therefore intend to include within the patent warranted hereon all such changes and modifications as may reasonably and properly be included within the scope of my contribution to the art.
Advantageously, the computer system processes the user program in a three-phase cycle.
Advantageously, the computer system is operated in such a manner that after as few as possible phase cycles a monitoring phase is additionally inserted between the autonomous phase and the next data exchange phase.
For triggering of the monitoring phases, the computer system is advantageously provided with a pulse generator which is coupled to a control computer and which triggers the monitoring phases with a period of a pulse train.
In order to exchange a defective module for an intact module, it is expedient for each module to be provided with a fixed module number and a module number which can be modified by the control computer for character-ization purposes. The exchange process is then expediently carried out in that the modifiable module numbers of the defective modules are exchanged with those of intact modules, and their fixed module numbers are used for addressing purposes.
Advantageously, a computer system constructed in accordance with the present invention is provided with a time monitoring device which is coupled to the computer system and which indicates an impermissibly long auto-nomous phase and immediately i.ntroduces an addi~ional monitoring phase.
The computer system can advantageously be designed in such a manner that each module possesses a parity production and checking unit which con-stantly monitors the module and, on the recognition of a defect, reports this defect to the control computer by means of a parity fault message and thus immediately triggers a monitoring phase.
Thus, in accordance with one broad aspect of the invention, there is provided a computer system comprising: a control computer; a system bus system, including a control and address bus and a data bus, connected to said control computer; a plurality of computer modules connected to said system bus, each including an individual computer, a coupling memory and a working memory, access to said coupling memory being had from said system bus and from said individual computer; an information safeguarding memory connected to said system bus for storing intermediate computed results; a further memory connected to said control computer for storing an entire user program, said control computer operable to monitor the performance of said individual computers and to substitute an operable individual computer along with said safeguarding and further memories in response to faulty operation of an individual computer, means operable to periodically interpose a monitoring phase in the multi-phase operation of the system; means in the individual computers, including test program means in sai.d working memories, to check the functioning capacity of the respective computers; means for determining and signaling an intact or a defective module; means for storing the intermediate computed results in response to fault-free detection; means for 3~6 causing said memory to load thc program of a defective module into a substitute module in response to detection of a defective module; and means for causing the safeguarding memory to provide the intermediate results to the substitute module and continuing of the data processing originally undertaken.
In accordance with another broad aspect of the invention, there is provided a method of operating a computer system which has a control computer, a bus system connected to the control computer and a plurality of modules connected to the bus system each including a working memory storing test programs, an individual computer and a coupling memory, a safeguarding memory and a further memory storing an entire user program, comprising the steps of:
operating the system through a control phase in which the control computer informs the individual computers of their functions, an autonomous phase in which the individual computers carry out their functions, and a data exchange phase in which data is exchanged between computers; operating the system at regular intervals through a monitoring phase in which the individual test programs are run and contemporaneously checking the operating capabilities of each module, storing intermediate computed results in the safeguarding memory; continuing normal processing when faults are not found; loading the individual function of a defective module from the further memory into replacement module in response to detection of a faulty module; and continuing processing with the replacement module and the intermediate results stored in the safeguarding memory.
BRIEF DESCRIPTION OF THE DRAWING
Other objects, features and advantages of the invention, its organization, construction and mode of operation will be best understood from the following detailed description, taken in conjunction with the accompanying drawing, on which there is a single figure which is a block diagram illustration of an exemplary embodiment of a computer system which is constructed and operates in accordance with the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to the drawing, the exemplary embodiment illustrated comprises a plurality of computer modules 11, 12, 13, 15, 16 and 18 which are coupled to a system data line. Each module comprises a coupling memory KS, an individual computer ER and a working memory AS. In each module, only the individual computer has access to its working memory, whereas access can be obtained to the coupling memory selectively from the individual computer or from the system bus. For purposes of fault recognition, each module is provided with a parity production and checking unit and possesses its own output a for the parity fault message. By way of characterization, each module possesses a fixed module number and a module number which can be modified from the control computer. Furthermore, a control computer STR is provided which can be coupled to the system bus 1 and has access to a further memory GS and access, via the system bus, to a safeguarding memory SS. The further memory GS preferably consists of a high-speed large-capacity memory, for example a disc memory. All of the individual computers are preferably microprocessors. The safeguarding memory SS is preferably identical in construction to the coupling memory of a module. Also provided are a pulse generator T and a time monitoring device ZU which are both coupled to the control computer STR. The pulse train period of the pulse generator regularly triggers monitoring phases. All of the outputs a of the computer modules are likewise connected to the control computer STR.
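The access rule and the two module numbers described above can be pictured with a small data model. This is only an illustrative sketch: the class name `Module`, the field names and the string labels `"bus"` and `"individual_computer"` are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """One computer module: individual computer ER plus its two memories."""
    fixed_number: int      # fixed module number, never changes
    logical_number: int    # module number modifiable by the control computer
    coupling_memory: dict = field(default_factory=dict)  # KS: bus and ER
    working_memory: dict = field(default_factory=dict)   # AS: ER only

    def read(self, memory: str, requester: str) -> dict:
        """Return the requested memory, enforcing the access rule."""
        if memory == "coupling" and requester in ("bus", "individual_computer"):
            return self.coupling_memory
        if memory == "working" and requester == "individual_computer":
            return self.working_memory
        raise PermissionError(f"{requester} may not access {memory} memory")


m = Module(fixed_number=11, logical_number=11)
m.read("coupling", "bus")["result"] = 42   # allowed: the bus can reach KS
try:
    m.read("working", "bus")               # refused: AS is private to ER
    bus_blocked = False
except PermissionError:
    bus_blocked = True
```

Keeping the fixed number separate from the modifiable one is what later allows a spare module to take over the identity of a defective one without any rewiring.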
In the following the cooperation of all the described components will be explained.
It has been assumed that the modules 11--15 are used to process the user program, whereas the modules 16--18 are redundant modules. The computer system which processes the user program comprises the modules 11--15, the control computer STR and the further memory GS and can simultaneously process as many sub-functions of the user program as computer modules 11--15 are provided.
The computer system operates in the above-described three-phase cycle.
The computer state is established following each three-phase cycle by the individual functions stored in the modules and by the exchanged results which are primarily intermediate results.
Whereas the individual functions are fixed and can be called up, for example from the further memory, the intermediate results must be safeguarded. Safeguarding is carried out, together with a check on the computer, in the additionally interposed monitoring phases.
The duration between two monitoring phases is determined by the period duration of the pulse generator T. The pulse generator transmits an interrupt request to the control computer which inserts a monitoring phase before the next data exchange phase.
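The resulting phase order can be sketched as follows. The sketch assumes, purely for illustration, that the pulse period is expressed as a whole number of three-phase cycles (`monitor_every`); in the described system the period is a time interval set by the pulse generator T. The function name and the phase labels are likewise hypothetical.

```python
def phase_sequence(cycles: int, monitor_every: int) -> list[str]:
    """List the phases run over `cycles` three-phase cycles.

    A monitoring phase is interposed between the autonomous phase and the
    data exchange phase of every `monitor_every`-th cycle.
    """
    phases = []
    for cycle in range(1, cycles + 1):
        phases.append("control")
        phases.append("autonomous")
        if cycle % monitor_every == 0:
            phases.append("monitoring")
        phases.append("data_exchange")
    return phases


schedule = phase_sequence(cycles=3, monitor_every=2)
```

With these arguments only the second cycle contains a monitoring phase, placed directly before its data exchange phase.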
The control computer starts test programs which are provided in all the modules and which carry out a function check of the modules. Here, it is necessary to use test programs which, in the case of fault-free modules, do not permanently alter the memory contents. The fault messages are stored in the coupling memory KS. The control computer now checks whether fault messages have been received from modules entrusted with the processing of a sub-function. If this is not the case, for the following data exchange phase the safeguarding memory is coupled to the system bus in order to receive the intermediate results simultaneously with the coupling memories of the modules entrusted with the sub-functions. The further processing of the user program is then continued without modification. If, however, faults occur, the defective modules are replaced by intact, previously unused modules.
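The decision taken in the monitoring phase can be sketched in a few lines. This is a simplified model under stated assumptions: the self-test is represented by a hypothetical callable `self_test` that returns `True` for an intact module, and the safeguarding memory is filled by copying from the coupling memories, whereas the described system lets it receive the results from the bus during the following data exchange phase.

```python
def monitoring_phase(active, self_test, coupling, safeguard):
    """Check the active modules; safeguard results only if all are intact.

    active    -- module numbers entrusted with a sub-function
    self_test -- callable(module_number) -> True if the module is intact
    coupling  -- dict: module number -> intermediate results (KS contents)
    safeguard -- dict standing in for the safeguarding memory SS
    Returns the list of defective active modules.
    """
    faulty = [n for n in active if not self_test(n)]
    if not faulty:
        # Fault-free case: SS takes over the current intermediate results.
        safeguard.clear()
        safeguard.update({n: dict(coupling[n]) for n in active})
    return faulty


coupling = {11: {"sum": 5}, 12: {"sum": 8}}
safeguard = {}
first = monitoring_phase([11, 12], lambda n: True, coupling, safeguard)

coupling[12] = {"sum": 999}  # module 12 later produces a corrupted result
second = monitoring_phase([11, 12], lambda n: n != 12, coupling, safeguard)
```

After the second call the safeguarding memory still holds the results of the last fault-free check, which is the state the system restarts from.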
Replacement is carried out in the following steps: the module numbers, modifiable by the control computer, of the free and defective modules are exchanged and addressed during this procedure by way of the fixed module numbers; then, the missing individual functions are reloaded from the further memory which stores the entire user program. For the duration of the following exchange phase, the safeguarding memory is coupled to the system bus. In contrast to a fault-free situation in which the intermediate results have been written into the safeguarding memory, it now forms the source of safeguarded results. These safeguarded results are read from the safeguarding memory and transferred into the coupling memories.
This fulfills the conditions for the restarting of the system.
The starting point is the control phase which follows the last phase cycle with a fault-free monitoring phase.
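The replacement steps can be sketched as follows. The dictionary layout, the function name `replace_module` and the choice to key the further memory and the safeguarding memory by the modifiable module number are assumptions made for this example only.

```python
def replace_module(modules, defective_fixed, spare_fixed, further_memory, safeguard):
    """Swap a defective module for a spare and restore the safeguarded state.

    modules        -- dict: fixed number -> {"logical", "function", "coupling"}
    further_memory -- dict: logical number -> individual function (from GS)
    safeguard      -- dict: logical number -> safeguarded intermediate results
    """
    bad, spare = modules[defective_fixed], modules[spare_fixed]

    # Step 1: exchange the modifiable numbers, addressing by fixed numbers.
    bad["logical"], spare["logical"] = spare["logical"], bad["logical"]

    # Step 2: reload the missing individual function from the further memory.
    spare["function"] = further_memory[spare["logical"]]

    # Step 3: the safeguarding memory is now the source of the results,
    # which are transferred into the coupling memories.
    for module in modules.values():
        if module["logical"] in safeguard:
            module["coupling"] = dict(safeguard[module["logical"]])
    return spare


modules = {
    11: {"logical": 11, "function": "f11", "coupling": {"x": 99}},  # defective
    16: {"logical": 16, "function": None, "coupling": {}},          # spare
}
spare = replace_module(modules, 11, 16, {11: "f11"}, {11: {"x": 7}})
```

After the call, the spare answers to module number 11, carries the reloaded function and holds the last safeguarded result, so processing can resume from the control phase.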
In addition to initiation by the pulse generator T, monitoring phases can also be triggered by the time monitoring device ZU which indicates an impermissibly long autonomous phase or by a parity fault message from one of the modules which appears at the output a. In these situations, the modules are checked immediately, and not only after the conclusion of the autonomous phase.
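The three trigger sources can be sketched as a single decision function. Everything named here is hypothetical: the specification does not state whether even or odd parity is used (the sketch assumes one parity bit per word), nor does it give a priority order among the triggers.

```python
def parity_bit(word: int) -> int:
    """Parity of a non-negative integer word: 1 if it has an odd number of 1 bits."""
    return bin(word).count("1") % 2


def parity_fault(word: int, stored_bit: int) -> bool:
    """True if the word no longer matches the parity bit stored with it."""
    return parity_bit(word) != stored_bit


def monitoring_trigger(pulse, autonomous_elapsed, limit, parity_outputs):
    """Name the source that calls for a monitoring phase, or None.

    pulse              -- True if the pulse generator T has fired
    autonomous_elapsed -- running time of the current autonomous phase
    limit              -- longest permissible autonomous phase (watched by ZU)
    parity_outputs     -- fault signals at the outputs a of the modules
    """
    if any(parity_outputs):
        return "parity"
    if autonomous_elapsed > limit:
        return "timeout"
    if pulse:
        return "pulse"
    return None


stored = parity_bit(0b1011)                # parity generated when writing
corrupted = parity_fault(0b1010, stored)   # one bit flipped afterwards
trigger = monitoring_trigger(False, 3.0, 5.0, [False, corrupted])
```

A single flipped bit changes the parity of the word, so the fault is reported and the monitoring phase is requested without waiting for the next pulse.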
Although I have described my invention by reference to a particular illustrative embodiment thereof, many changes and modifications of the invention may become apparent to those skilled in the art without departing from the spirit and scope of the invention. I therefore intend to include within the patent warranted hereon all such changes and modifications as may reasonably and properly be included within the scope of my contribution to the art.
Claims (8)
THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A computer system comprising: a control computer; a system bus system, including a control and address bus and a data bus, connected to said control computer; a plurality of computer modules connected to said system bus, each including an individual computer, a coupling memory and a working memory, access to said coupling memory being had from said system bus and from said individual computer; an information safeguarding memory connected to said system bus for storing intermediate computed results; a further memory connected to said control computer for storing an entire user program, said control computer operable to monitor the performance of said individual computers and to substitute an operable individual computer along with said safeguarding and further memories in response to faulty operation of an individual computer, means operable to periodically interpose a monitoring phase in the multi-phase operation of the system; means in the individual computers, including test program means in said working memories, to check the functioning capacity of the respective computers; means for determining and signaling an intact or a defective module; means for storing the intermediate computed results in response to fault-free detection; means for causing said further memory to load the program of a defective module into a substitute module in response to detection of a defective module; and means for causing the safeguarding memory to provide the intermediate results to the substitute module and continuing of the data processing originally undertaken.
2. The computer system of claim 1, wherein said control computer includes means operable to control said modules in three-phase operation including a control phase informing the individual computers of their processes, an autonomous phase in which the individual processes are carried out, and a data exchange phase in which data is exchanged between computers.
3. The computer system of claim 2, comprising means for periodically interposing a monitoring phase between said autonomous phase and said data exchange phase.
4. The computer of claim 1, wherein the first-mentioned means includes a pulse train generator connected to said control computer for triggering the monitoring phases with the period of the pulse train produced by said pulse generator.
5. The computer system of claim 1, wherein each of said modules comprises: fixed address number means; and modifiable address number means.
6. The computer system of claim 1, comprising time monitoring means connected to said control computer for monitoring the time of operation of said modules and causing said control computer to effect monitoring of the functioning of said modules in response to the time of operation being greater than a predetermined time interval.
7. The computer system of claim 1, wherein the means for determining and signaling an intact or a defective module includes a parity checking circuit in each module operable to transmit a fault message to said control computer in response to defective operation of a module.
8. A method of operating a computer system which has a control computer, a bus system connected to the control computer and a plurality of modules connected to the bus system each including a working memory storing test programs, an individual computer and a coupling memory, a safeguarding memory and a further memory storing an entire user program, comprising the steps of: operating the system through a control phase in which the control computer informs the individual computers of their functions, an autonomous phase in which the individual computers carry out their functions, and a data exchange phase in which data is exchanged between computers; operating the system at regular intervals through a monitoring phase in which the individual test programs are run and contemporaneously checking the operating capabilities of each module, storing intermediate computed results in the safeguarding memory; continuing normal processing when faults are not found; loading the individual function of a defective module from the further memory into replacement module in response to detection of a faulty module; and continuing processing with the replacement module and the intermediate results stored in the safeguarding memory.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19772741379 DE2741379A1 (en) | 1977-09-14 | 1977-09-14 | COMPUTER SYSTEM |
DEP2741379.2 | 1977-09-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
CA1143026A (en) | 1983-03-15 |
Family
ID=6018946
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA000311096A Expired CA1143026A (en) | 1977-09-14 | 1978-09-12 | Computer system |
Country Status (8)
Country | Link |
---|---|
JP (1) | JPS5451439A (en) |
BE (1) | BE870484A (en) |
CA (1) | CA1143026A (en) |
DE (1) | DE2741379A1 (en) |
FR (1) | FR2403598B1 (en) |
GB (1) | GB2004673B (en) |
IT (1) | IT1098538B (en) |
NL (1) | NL7809313A (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4412281A (en) * | 1980-07-11 | 1983-10-25 | Raytheon Company | Distributed signal processing system |
GB2217487B (en) * | 1988-04-13 | 1992-09-23 | Yokogawa Electric Corp | Dual computer system |
GB2369538B (en) * | 2000-11-24 | 2004-06-30 | Ibm | Recovery following process or system failure |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB1243464A (en) * | 1969-01-17 | 1971-08-18 | Plessey Telecomm Res Ltd | Stored-programme controlled data-processing systems |
JPS5633915B1 (en) * | 1970-11-06 | 1981-08-06 | ||
JPS5627905B1 (en) * | 1970-11-06 | 1981-06-27 | ||
BE789828A (en) * | 1972-10-09 | 1973-04-09 | Bell Telephone Mfg | DATA PROCESSING OPERATING SYSTEM. |
CA1053352A (en) * | 1974-11-12 | 1979-04-24 | Scott A. Inrig | Method for providing a substitute memory module in a data processing system |
DE2546202A1 (en) * | 1975-10-15 | 1977-04-28 | Siemens Ag | COMPUTER SYSTEM OF SEVERAL INTERCONNECTED AND INTERACTING INDIVIDUAL COMPUTERS AND PROCEDURES FOR OPERATING THE COMPUTER SYSTEM |
- 1977
  - 1977-09-14 DE DE19772741379 patent/DE2741379A1/en not_active Ceased
- 1978
  - 1978-09-08 FR FR7825868A patent/FR2403598B1/en not_active Expired
  - 1978-09-12 CA CA000311096A patent/CA1143026A/en not_active Expired
  - 1978-09-12 JP JP11214678A patent/JPS5451439A/en active Granted
  - 1978-09-13 NL NL7809313A patent/NL7809313A/en not_active Application Discontinuation
  - 1978-09-13 GB GB7836732A patent/GB2004673B/en not_active Expired
  - 1978-09-13 IT IT27595/78A patent/IT1098538B/en active
  - 1978-09-14 BE BE190488A patent/BE870484A/en not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
DE2741379A1 (en) | 1979-03-15 |
JPS618988B2 (en) | 1986-03-19 |
BE870484A (en) | 1979-01-02 |
FR2403598B1 (en) | 1985-08-30 |
JPS5451439A (en) | 1979-04-23 |
IT1098538B (en) | 1985-09-07 |
GB2004673B (en) | 1982-02-03 |
NL7809313A (en) | 1979-03-16 |
IT7827595A0 (en) | 1978-09-13 |
FR2403598A1 (en) | 1979-04-13 |
GB2004673A (en) | 1979-04-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0045836B1 (en) | Data processing apparatus including a bsm validation facility | |
EP0031501A2 (en) | Diagnostic and debugging arrangement for a data processing system | |
EP0496506A2 (en) | A processing unit for a computer and a computer system incorporating such a processing unit | |
US3959638A (en) | Highly available computer system | |
CA2032067A1 (en) | Fault-tolerant computer system with online reintegration and shutdown/restart | |
JPH0950424A (en) | Dump sampling device and dump sampling method | |
JPH0526214B2 (en) | ||
JPS6375963A (en) | System recovery system | |
CA1143026A (en) | Computer system | |
US20070055480A1 (en) | System and method for self-diagnosis in a controller | |
TW200307200A (en) | Multiple fault location in a series of devices | |
JPH11120154A (en) | Device and method for access control in computer system | |
JPH0754947B2 (en) | Standby system monitoring method | |
JPH1027115A (en) | Fault information sampling circuit for computer system | |
Blakeney et al. | An application-oriented multiprocessing system, II: Design characteristics of the 9020 system | |
JPH07114521A (en) | Multimicrocomputer system | |
JPH047645A (en) | Fault tolerant computer | |
EP0342261B1 (en) | Arrangement for error recovery in a self-guarding data processing system | |
SU849219A1 (en) | Data processing system | |
JPH079636B2 (en) | Bus diagnostic device | |
JPS6113627B2 (en) | ||
JP3042034B2 (en) | Failure handling method | |
Dieterich et al. | A compatible airborne multiprocessor | |
JPH07334383A (en) | Computer with monitoring and diagnostic function | |
JPH02122335A (en) | Test method for ras circuit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MKEX | Expiry |