GB2086104A - Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems
- Publication number
- GB2086104A
- Authority
- GB
- United Kingdom
- Prior art keywords
- unit
- central processing
- diagnosis
- signal
- data bus
- Prior art date
- Legal status
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0745—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1629—Error detection by comparing the output of redundant processing systems
- G06F11/1641—Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1629—Error detection by comparing the output of redundant processing systems
- G06F11/165—Error detection by comparing the output of redundant processing systems with continued operation after detection of the error
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/26—Functional testing
- G06F11/267—Reconfiguring circuits for testing, e.g. LSSD, partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computer Hardware Design (AREA)
- Hardware Redundancy (AREA)
- Debugging And Monitoring (AREA)
- Facsimiles In General (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
Abstract
The data processing system comprises a first central processing unit CPU, an interrupt controlling unit INT, a direct-memory access unit DMA, and a memory unit MEM connected together by a data bus B1. The circuit arrangement comprises first, second, third, and fourth means PM, SM, TM, QM which provide alarm signals upon detecting malfunctions in the central processing unit, the interrupt controller, the direct memory access unit, and the memory, respectively. A diagnosis unit UDG resets and disconnects the central processing unit from the data bus and is connected to the bus when an alarm signal is generated so as to carry out diagnosis programmes for detecting the faulty member. When correct functioning has been restored, the diagnosis unit reconnects the central processing unit to the data bus.
Description
SPECIFICATION
Improvements In or Relating to Circuit Arrangements for Detecting Malfunctioning in Data Processing Systems
The present invention relates to a circuit arrangement for detecting errors made by the units forming a data processing system, for instance, controlled by a microprocessor of commercial type, and also for identifying the component whose malfunctioning has caused the error.
With some data processing systems controlled by a commercial-type microprocessor the problem exists of quickly detecting the presence of errors made by single modules in the processing system, thereby preventing their propagation. This applies to data processing systems controlled by a commercial-type microprocessor and used in telephone systems.
For instance, the case where the system referred to above is arranged to process codes indicating user charging criteria will be considered. In this case, should the processing of such criteria be affected by errors, a drawback arises in that the user to whom the criteria relate is erroneously charged, and thus it is necessary to provide suitable means for sensing the presence of such errors as well as for inhibiting the microprocessor until correct operation of the component that has made the error has been restored, thereby preventing the error from being propagated (in this example, errors are propagated up to memories storing user charging data).
To meet the above described requirements, some known solutions involve triplication of most units (e.g. the central processing unit, the memory unit, the input-output unit, etc.) in the processing system and also comprise suitable means for detecting malfunctioning on the ground of a majority evaluation of the single-unit outputs.
Such means is termed a "voter" in the specific art and is arranged to generate an alarm signal when the signal available at the output of a given unit differs from the output signals provided by the remaining pair of units.
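The majority evaluation performed by such a voter can be illustrated with a short sketch. The Python below is an illustration only and is not part of the specification; the function name `vote`, the representation of unit outputs as integers, and the handling of a three-way disagreement are all assumptions made for the example.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote the outputs of three replicated units.

    Returns (voted_value, alarm, faulty_index). faulty_index is the position
    of the single dissenting unit, or None when all units agree or when no
    majority exists.
    """
    if len(outputs) != 3:
        raise ValueError("a triplicated arrangement has exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count == 3:
        return value, False, None          # all units agree, no alarm
    if count == 2:
        faulty = next(i for i, out in enumerate(outputs) if out != value)
        return value, True, faulty         # one unit differs from the other pair
    return None, True, None                # all three differ: no majority


if __name__ == "__main__":
    print(vote([0x5A, 0x5A, 0x5A]))
    print(vote([0x5A, 0x3C, 0x5A]))
```

Unlike a two-way comparison, the majority vote both masks the error and points at the dissenting unit, which is what the triplication pays for.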
This kind of solution is especially adopted in systems installed on board satellites (which thus cannot be repaired) but has a number of drawbacks when used in repairable systems, including:
particularly high cost owing to the fact that the above specified units are triplicated and a voter is used;
the voter must be, in turn, triplicated to ensure the same level of reliability as that attained with the triplicated units;
moreover, the repairable systems must include suitable means for detecting the unit in which malfunctioning has occurred;
it is necessary to triplicate the timing signal generator and to provide synchronization means for the three generators.
According to the invention there is provided a circuit arrangement for detecting malfunctioning in a data processing system comprising a data bus which is connected to a first central processing unit by way of first signal sending members, an interruption controlling unit, a data direct transfer unit, and a memory unit, the circuit arrangement comprising: first means for generating an alarm signal upon detecting malfunctioning in the central processing unit; second means for generating an alarm signal upon detecting malfunctioning in the interrupt controlling unit; third means for generating an alarm signal upon detecting malfunctioning in the data direct transfer unit; fourth means for generating an alarm signal upon detecting malfunctioning in the memory unit; and a diagnosis unit which is arranged to reset and disconnect the central processing unit from the data bus upon receiving one of the alarm signals, and to be connected to the data bus and to carry out diagnosis programmes designed to detect the member whose malfunctioning has caused the alarm signal to be generated, and to be disconnected from the data bus and to restore the connection of the central processing unit to the data bus upon detecting correct functioning of all units in the processing system.
It is thus possible to provide a simple and economic circuit arrangement capable of endowing a processing system with a degree of reliability comparable to that obtained by the prior art and of substantially reducing the above specified drawbacks.
The circuits forming such a circuit arrangement constitute only a fraction of the circuits in the data processing system, contrary to the known solution referred to above which calls for the use of a number of circuits twice that of the circuits forming the processing system.
The invention will be further described, by way of example, with reference to the accompanying drawing, which shows the circuit of a data processing system indicated by thick lines and of a circuit arrangement constituting a preferred embodiment of the present invention by thin lines.
In the drawing, the thick lines represent the circuit of a data processing system which comprises a data bus B1 to which the following units are connected:
a central processing unit CPU1 which is connected to the data bus B1 by way of first signal sending members;
an interrupt controlling unit INT;
a data direct transfer controlling (direct memory access) unit DMA; and
a memory unit MEM.
A circuit arrangement constituting a preferred embodiment of the present invention is illustrated by thin lines and comprises means arranged to generate an alarm signal upon detecting malfunctioning in any one of the above specified units.
More particularly, the central processing unit CPU1 is associated with first means PM comprising a further central processing unit CPU2 which operates synchronously with the unit CPU1, and thus both units receive on their inputs the timing pulses CK1 generated by a first timing unit UT1. The signals available at the output of the units CPU1 and CPU2 are sent to the inputs of a comparison circuit CFR arranged to generate an alarm signal A1 upon detecting lack of identity in the signals provided on its inputs.
In this way, should one of the two units CPU1 and CPU2 erroneously process a datum, its output signals will differ from the signals generated by the other processing unit and such an event is sensed by the circuit CFR which produces the output A1.
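The behaviour of the comparison circuit CFR can be sketched in software as follows. This is a minimal illustration rather than a description of the actual circuit: the two processing units are modelled as plain Python functions, and the names `cfr`, `run_lockstep`, `healthy_cpu` and `faulty_cpu` are hypothetical.

```python
def cfr(out1, out2):
    """Comparison circuit: alarm A1 is raised when the two outputs differ."""
    return out1 != out2


def run_lockstep(cpu1, cpu2, inputs):
    """Feed both units the same input on every clock cycle.

    Returns the index of the first cycle on which A1 is raised, or None if
    the outputs stay identical throughout.
    """
    for cycle, datum in enumerate(inputs):
        if cfr(cpu1(datum), cpu2(datum)):
            return cycle
    return None


def healthy_cpu(x):
    """Stand-in for a correctly working unit (8-bit increment)."""
    return (x + 1) & 0xFF


def faulty_cpu(x):
    """Same operation, but one result bit is corrupted for a single input."""
    result = (x + 1) & 0xFF
    return result ^ 0x04 if x == 7 else result


if __name__ == "__main__":
    print(run_lockstep(healthy_cpu, healthy_cpu, range(10)))
    print(run_lockstep(healthy_cpu, faulty_cpu, range(10)))
```

Note that the comparison only reveals that the two units disagree; it cannot say which of them is at fault, which is why the alarm is handed to a separate diagnosis unit.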
The interrupt controlling unit INT is associated with second means SM for generating an alarm signal A2 upon receiving, from one of the interrupt signal generators, an operative-programme interrupt request having a priority smaller than, or equal to, that of the requests being dealt with. An embodiment of the second means SM is illustrated in the Italian Patent Application No. 24467 A/80.
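The condition under which the alarm A2 is raised can be expressed as a single comparison. The sketch below models only the condition stated in the text; the details of the referenced Italian application are not reproduced here. The function name `sm_alarm`, the use of integers for priorities, and the convention that a larger number means a higher priority are assumptions.

```python
def sm_alarm(in_service_priority, request_priority):
    """Return True (alarm A2) when an interrupting request has a priority
    lower than or equal to that of the request currently being dealt with.

    in_service_priority is None when no request is being serviced, in which
    case any request is legitimate. Larger numbers mean higher priority
    (an assumption of this sketch).
    """
    if in_service_priority is None:
        return False
    return request_priority <= in_service_priority


if __name__ == "__main__":
    print(sm_alarm(None, 3))  # nothing in service
    print(sm_alarm(5, 7))     # higher-priority request preempts legitimately
    print(sm_alarm(5, 5))     # equal priority must not interrupt
    print(sm_alarm(5, 2))     # lower priority must not interrupt
```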
The data direct transfer controlling unit DMA is associated with third means TM for generating an alarm signal A3 when the parity bit of the input data of the unit DMA differs from the parity bit of the output data from the same unit. The third means TM are also designed to generate an alarm signal A4 when the parity bit of the i-th address, computed on the ground of the (i-1)-th address available during transfer of the datum d(i-1), differs from the parity bit of the i-th output address provided along with the datum d(i). An embodiment of the third means TM is disclosed in the Italian Patent Application No. 24466 A/80.
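The two parity checks attributed to the third means TM can be sketched as below. This is an illustration under stated assumptions, not the circuit of the referenced application: the function names `parity` and `tm_check` are hypothetical, data and addresses are modelled as non-negative integers, and the next address is assumed to be the previous address plus a fixed `step` of 1.

```python
def parity(word):
    """Parity bit of a word: 1 when the number of set bits is odd, else 0."""
    return bin(word).count("1") & 1


def tm_check(data_in, data_out, prev_addr, addr_out, step=1):
    """Return the pair (A3, A4) for one transfer.

    A3: parity of the datum entering the DMA unit differs from the parity of
        the datum leaving it.
    A4: parity of the address predicted from the previous address differs
        from the parity of the address actually output.
    """
    a3 = parity(data_in) != parity(data_out)
    predicted_addr = prev_addr + step   # assumed sequential addressing
    a4 = parity(predicted_addr) != parity(addr_out)
    return a3, a4


if __name__ == "__main__":
    print(tm_check(0b1011, 0b1011, 0x10, 0x11))  # clean transfer
    print(tm_check(0b1011, 0b1010, 0x10, 0x11))  # one data bit flipped
    print(tm_check(0b1011, 0b1011, 0x10, 0x13))  # one address bit flipped
```

A single parity bit only catches errors that flip an odd number of bits; an error that flips two bits of the same word leaves the parity unchanged and passes this check.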
The memory unit MEM is associated with fourth means QM for detecting malfunctioning by checking parity both in the data and the addresses, and also for generating an alarm signal A5 upon detecting malfunctioning. The fourth means QM are not described in detail as they include circuit structures well known to a person skilled in the art.
The alarm signals A1 to A5 are sent to an OR-gate OR belonging to a diagnosis unit UDG which also comprises a data bus B2 to which the following units are connected:
a central processing unit CPU3;
a memory unit MM of limited size; and
an input-output unit I/O to which a communication channel with an operator's place (not shown) is connected.
The unit CPU3 receives on its input a timing signal CK2 available at the output of a second timing unit UT2 which is asynchronous with respect to the unit UT1. The data bus B2 of the diagnosis unit is connected to the data bus B1 of the processing system by way of second signal transfer members which are enabled by a first bistable circuit FF1. The bistable FF1 is switched to the ON state and to the OFF state by a first signal C1 and a second signal C2, respectively, generated by the operative programme of the diagnosis unit.
The first signal transfer members which connect the units CPU1 and CPU2 to the data bus B1 are enabled by a second bistable circuit FF2 which is switched to the ON state by the signal available at the output of the OR-gate OR and to the OFF state by a third signal C3 generated by the operative programme of the diagnosis unit.
When the OR-gate OR receives an alarm signal on its input, the bistable FF2 is switched, and thus its output disables the first signal sending members (by disconnecting the units CPU1 and CPU2 from the data bus), resets the units CPU1 and CPU2 which must then start again the operative programme from the beginning, and generates an interrupt request for the unit CPU3.
Upon receiving an interrupt request, the unit CPU3 starts the execution of a diagnosis programme whose first instruction provides the signal C1 to be generated which causes switching of the bistable FF1, thereby enabling the second signal transfer members (by connecting the diagnosis unit to the data bus of the processing system).
Thus, the timing pulses generated by the unit UT2 are sent to the data bus of the processing system with a given delay with respect to disconnection of the unit UT1, thereby making it possible to execute refreshing operations of the memory units without causing conflicts due to disconnection-connection operations of two asynchronous timing pulse sequences.
Once the diagnosis programme carried out by the unit CPU3 has spotted the member whose malfunctioning has generated the alarm Ai and once such member has been replaced on the basis of the information sent to the operator's place, the programme carried out by the unit CPU3 terminates with a particular instruction which causes generation of the signal C2 for switching of the bistable FF1 to the OFF state, thereby disconnecting the diagnosis unit UDG from the data bus B1 of the processing system.
This instruction is followed by a further instruction which provides the generation of a signal C3 which causes the bistable FF2 to be switched to the OFF state, thereby connecting the data bus B1 of the processing system to the pair of central processing units CPU1 and CPU2.
In this case also, no conflicts concerning timing pulses reaching the data bus of the processing system arise since the disconnection-connection operations in the two pulse sequences are performed in such a way that a short length of time elapses therebetween.
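The hand-over of the data bus B1 described in the preceding paragraphs can be traced with a small state model. The Python below is a sketch under stated assumptions and not the circuit itself: the class `BusSwitching`, the function `handle_alarm` and the dictionary keys are hypothetical names, the diagnosis programme is reduced to a placeholder callback, and timing is ignored, so that only the order of the switching steps is modelled.

```python
class BusSwitching:
    """Models the two bistables that decide which side drives data bus B1."""

    def __init__(self):
        self.ff1 = False  # ON: second transfer members enabled, UDG connected to B1
        self.ff2 = False  # ON: CPU1/CPU2 disconnected from B1 and reset

    def snapshot(self):
        return {"cpu_pair_on_b1": not self.ff2, "udg_on_b1": self.ff1}

    def or_gate(self, alarms):
        """Any alarm A1..A5 sets FF2; the return value stands for the
        interrupt request sent to CPU3."""
        if any(alarms):
            self.ff2 = True
            return True
        return False

    def c1(self):
        self.ff1 = True    # first instruction of the diagnosis programme

    def c2(self):
        self.ff1 = False   # final instruction: disconnect the diagnosis unit

    def c3(self):
        self.ff2 = False   # following instruction: reconnect CPU1/CPU2


def handle_alarm(switch, alarms, diagnose):
    """Run one alarm-to-recovery cycle; return the bus state after each step."""
    trace = [switch.snapshot()]
    if not switch.or_gate(alarms):
        return trace
    trace.append(switch.snapshot())   # CPU pair off the bus, nobody connected yet
    switch.c1()
    trace.append(switch.snapshot())   # diagnosis unit connected
    diagnose()                        # placeholder for the diagnosis programme
    switch.c2()
    trace.append(switch.snapshot())   # diagnosis unit disconnected
    switch.c3()
    trace.append(switch.snapshot())   # CPU pair reconnected
    return trace


if __name__ == "__main__":
    alarms = [False, False, True, False, False]  # A3 raised
    for step in handle_alarm(BusSwitching(), alarms, lambda: None):
        print(step)
```

In this model the CPU pair and the diagnosis unit are never connected to B1 in the same step, and each hand-over passes through a step in which neither is connected, which mirrors the gap between disconnection and connection that the text relies on to avoid conflicts between the two asynchronous timing sequences.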
The diagnosis unit UDG further comprises a unit ADG arranged to execute self-diagnosis programmes of the unit UDG during the time interval during which no alarm signals are available at the input of the OR-gate OR.
Should the unit ADG sense malfunctioning, it energizes its output, thereby generating a non-maskable interrupt request for the units CPU1 and CPU2 which then take over the diagnosis tasks by making use of the memories and the input-output unit of the processing system.
With such a circuit arrangement, it is possible to attain a degree of reliability comparable to that obtainable by the previously described prior art arrangement through the addition of a limited number of circuits to the circuits of the processing system. Thus, only the central processing unit is triplicated, whereas the remaining circuits constitute a fraction of the overall circuits of the processing system.
Claims (6)
1. A circuit arrangement for detecting malfunctioning in a data processing system comprising a data bus which is connected to a first central processing unit by way of first signal sending members, an interrupt controlling unit, a data direct transfer unit, and a memory unit, the circuit arrangement comprising: first means for generating an alarm signal upon detecting malfunctioning in the central processing unit; second means for generating an alarm signal upon detecting malfunctioning in the interrupt controlling unit; third means for generating an alarm signal upon detecting malfunctioning in the data direct transfer unit; fourth means for generating an alarm signal upon detecting malfunctioning in the memory unit; and a diagnosis unit which is arranged to reset and disconnect the central processing unit from the data bus upon receiving one of the alarm signals, and to be connected to the data bus and to carry out diagnosis programmes designed to detect the member whose malfunctioning has caused the alarm signal to be generated, and to be disconnected from the data bus and to restore the connection of the central processing unit to the data bus upon detecting correct functioning of all units in the processing system.
2. A circuit arrangement as claimed in claim 1, in which the first means comprises a second central processing unit arranged to operate synchronously with the first central processing unit of the data processing system, and a comparison circuit arranged to receive the signals at the outputs of the first and second central processing units and to generate an alarm signal upon detecting lack of identity between the signals at its inputs.
3. A circuit arrangement as claimed in claim 1 or 2, in which the diagnosis unit comprises a data bus connectable to the data bus of the processing system by way of second signal sending members and connected to a third central processing unit arranged to receive pulses provided by a second timing unit, a memory unit, and an input-output unit, the diagnosis unit further comprising: an OR gate arranged to receive the alarm signals and having an output connected to an interrupt request input of the third central processing unit; a first bistable circuit arranged to be switched to the ON and OFF states by first and second signals, respectively, generated at the beginning of the diagnosis programme and after the diagnosis programme has sensed correct operation of the member which had caused the generation of the alarm signal, respectively, the output signal of the said first bistable circuit enabling or disabling the second signal sending members; and a second bistable circuit arranged to be switched to the ON and OFF states upon energization of the output of the OR gate and by a third signal generated by the diagnosis programme in an instant following the generation of the second signal, respectively, the output signal of the second bistable circuit being supplied to the reset input of the first and second central processing units and to the enabling input of the first signal sending members.
4. A circuit arrangement as claimed in claim 3, in which the diagnosis unit comprises a self-diagnosis unit arranged to generate an alarm signal upon detecting malfunctioning in the diagnosis unit and to send the alarm signal to the interrupt request inputs of the first and second central processing units.
5. A circuit arrangement substantially as hereinbefore described with reference to and as illustrated in the accompanying drawing.
6. A data processing system including a circuit arrangement as claimed in any one of the preceding claims.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IT8024701A IT8024701A0 (en) | 1980-09-17 | 1980-09-17 | CIRCUIT ARRANGEMENT SUITABLE FOR DETECTING THE PRESENCE OF MALFUNCTIONS IN A DATA PROCESSING SYSTEM USING A COMMERCIAL TYPE MICROPROCESSOR. |
Publications (1)
Publication Number | Publication Date |
---|---|
GB2086104A (en) | 1982-05-06 |
Family
ID=11214447
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8128117A Withdrawn GB2086104A (en) | 1980-09-17 | 1981-09-17 | Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems |
Country Status (5)
Country | Link |
---|---|
BR (1) | BR8105689A (en) |
DE (1) | DE3137046A1 (en) |
FR (1) | FR2490366A1 (en) |
GB (1) | GB2086104A (en) |
IT (1) | IT8024701A0 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2235075A (en) * | 1989-06-23 | 1991-02-20 | Ansaldo Spa | Switching module for pairs of homologous processors connected to at least one communication bus |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3334765A1 (en) * | 1983-09-26 | 1985-04-11 | Siemens AG, 1000 Berlin und 8000 München | TEST DEVICE FOR DETECTING ERRORS IN DOUBLE CIRCUITS, IN PARTICULAR PROCESSORS OF A TELEPHONE SWITCHING SYSTEM |
DE3335695C1 (en) * | 1983-09-28 | 1985-04-18 | Siemens AG, 1000 Berlin und 8000 München | Testing device for monitoring the data channel during data interchange between central processors in microprocessor-controlled telephone switching systems |
GB9101227D0 (en) * | 1991-01-19 | 1991-02-27 | Lucas Ind Plc | Method of and apparatus for arbitrating between a plurality of controllers,and control system |
DE4241319A1 (en) * | 1992-12-09 | 1994-06-16 | Ant Nachrichtentech | Computer system |
DE10029642B4 (en) * | 2000-06-15 | 2005-10-20 | Daimler Chrysler Ag | Device for monitoring a vehicle data bus system |
DE10148325A1 (en) | 2001-09-29 | 2003-04-17 | Daimler Chrysler Ag | Central node of data bus system with bus monitor unit e.g. for motor vehicles and aircraft, has diagnosis unit integrated into central node |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3768074A (en) * | 1972-05-12 | 1973-10-23 | Burroughs Corp | Multiprocessing system having means for permissive coupling of different subsystems |
US3921141A (en) * | 1973-09-14 | 1975-11-18 | Gte Automatic Electric Lab Inc | Malfunction monitor control circuitry for central data processor of digital communication system |
US4023142A (en) * | 1975-04-14 | 1977-05-10 | International Business Machines Corporation | Common diagnostic bus for computer systems to enable testing concurrently with normal system operation |
US4145734A (en) * | 1975-04-22 | 1979-03-20 | Compagnie Honeywell Bull (Societe Anonyme) | Method and apparatus for implementing the test of computer functional units |
DE2612100A1 (en) * | 1976-03-22 | 1977-10-06 | Siemens Ag | DIGITAL DATA PROCESSING ARRANGEMENT, IN PARTICULAR FOR RAILWAY SAFETY TECHNOLOGY |
1980
- 1980-09-17 IT IT8024701A patent/IT8024701A0/en unknown
1981
- 1981-09-04 FR FR8116805A patent/FR2490366A1/en not_active Withdrawn
- 1981-09-04 BR BR8105689A patent/BR8105689A/en unknown
- 1981-09-17 GB GB8128117A patent/GB2086104A/en not_active Withdrawn
- 1981-09-17 DE DE19813137046 patent/DE3137046A1/en not_active Withdrawn
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2235075A (en) * | 1989-06-23 | 1991-02-20 | Ansaldo Spa | Switching module for pairs of homologous processors connected to at least one communication bus |
GB2235075B (en) * | 1989-06-23 | 1993-05-26 | Ansaldo Spa | Switching module for pairs of homologous processors connected to at least one communication bus |
Also Published As
Publication number | Publication date |
---|---|
IT8024701A0 (en) | 1980-09-17 |
FR2490366A1 (en) | 1982-03-19 |
DE3137046A1 (en) | 1982-04-01 |
BR8105689A (en) | 1982-05-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |