GB2086104A - Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems - Google Patents

Info

Publication number
GB2086104A
Authority
GB
United Kingdom
Prior art keywords
unit
central processing
diagnosis
signal
data bus
Prior art date
Legal status
Withdrawn
Application number
GB8128117A
Current Assignee
Italtel SpA
Original Assignee
Italtel SpA
Italtel Societa Italiana Telecomunicazioni SpA
Priority date
1980-09-17
Filing date
1981-09-17
Publication date
1982-05-06
Application filed by Italtel SpA, Italtel Societa Italiana Telecomunicazioni SpA
Publication of GB2086104A
Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0745 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an input/output transactions management context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1629 Error detection by comparing the output of redundant processing systems
    • G06F11/1641 Error detection by comparing the output of redundant processing systems where the comparison is not performed by the redundant processing components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1629 Error detection by comparing the output of redundant processing systems
    • G06F11/165 Error detection by comparing the output of redundant processing systems with continued operation after detection of the error
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G06F11/267 Reconfiguring circuits for testing, e.g. LSSD, partitioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)
  • Facsimiles In General (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The data processing system comprises a first central processing unit CPU, an interrupt controlling unit INT, a direct-memory access unit DMA, and a memory unit MEM connected together by a data bus B1. The circuit arrangement comprises first, second, third, and fourth means PM, SM, TM, QM which provide alarm signals upon detecting malfunctions in the central processing unit, the interrupt controller, the direct memory access unit, and the memory, respectively. A diagnosis unit UDG resets and disconnects the central processing unit from the data bus and is connected to the bus when an alarm signal is generated so as to carry out diagnosis programmes for detecting the faulty member. When correct functioning has been restored, the diagnosis unit reconnects the central processing unit to the data bus.

Description

SPECIFICATION
Improvements In or Relating to Circuit Arrangements for Detecting Malfunctioning in Data Processing Systems
The present invention relates to a circuit arrangement for detecting errors made by the units forming a data processing system (for instance, one controlled by a commercial-type microprocessor), and also for identifying the component whose malfunctioning has caused the error.
In some data processing systems controlled by a commercial-type microprocessor, the problem arises of quickly detecting errors made by individual modules of the system so as to prevent their propagation. This applies in particular to such systems when they are used in telephone systems.
Consider, for instance, the case in which the system is arranged to process codes indicating user charging criteria. Should the processing of such criteria be affected by errors, the user to whom the criteria relate is charged erroneously. It is therefore necessary to provide means for sensing such errors as well as for inhibiting the microprocessor until correct operation of the component that made the error has been restored, thereby preventing the error from propagating (in this example, errors would propagate as far as the memories storing user charging data).
To meet these requirements, some known solutions triplicate most units of the processing system (e.g. the central processing unit, the memory unit, the input-output unit, etc.) and include means for detecting malfunctioning on the basis of a majority evaluation of the outputs of the individual units.
Such means is termed a "voter" in the art and generates an alarm signal when the signal at the output of a given unit differs from the output signals of the remaining pair of units.
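The majority evaluation performed by a voter can be sketched in software as follows. This is only an illustrative model under stated assumptions: each unit's output is taken to be an integer word sampled at the same instant, and the function names `vote` and `dissenting_unit` are hypothetical; a real voter is a hardware circuit.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def vote(outputs: Sequence[int]) -> Tuple[Optional[int], bool]:
    """Majority-vote the output words of three replicated units.

    Returns (majority_value, alarm). The alarm is raised whenever the three
    outputs are not all identical; majority_value is None if no two agree.
    """
    if len(outputs) != 3:
        raise ValueError("expected the outputs of exactly three units")
    value, count = Counter(outputs).most_common(1)[0]
    majority = value if count >= 2 else None
    return majority, count < 3


def dissenting_unit(outputs: Sequence[int]) -> Optional[int]:
    """Index of the single unit whose output differs from the other two, if any."""
    majority, alarm = vote(outputs)
    if not alarm or majority is None:
        return None
    return next(i for i, out in enumerate(outputs) if out != majority)


print(vote([0x3C, 0x3C, 0x3C]))             # (60, False)
print(vote([0x3C, 0x3D, 0x3C]))             # (60, True)
print(dissenting_unit([0x3C, 0x3D, 0x3C]))  # 1
```

With three units the voter can both mask the error (the majority value is still available) and point at the dissenting unit, which is what the triplicated approach pays for in hardware.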
This kind of solution is adopted especially in systems installed on board satellites (which cannot be repaired), but it has a number of drawbacks when used in repairable systems, including: a particularly high cost, because the units specified above are triplicated and a voter is used; the voter must in turn be triplicated to ensure the same level of reliability as that attained with the triplicated units; the repairable systems must also include means for identifying the unit in which malfunctioning has occurred; and the timing signal generator must be triplicated, with synchronization means provided for the three generators.
According to the invention there is provided a circuit arrangement for detecting malfunctioning in a data processing system comprising a data bus which is connected to a first central processing unit by way of first signal sending members, an interrupt controlling unit, a data direct transfer unit, and a memory unit, the circuit arrangement comprising: first means for generating an alarm signal upon detecting malfunctioning in the central processing unit; second means for generating an alarm signal upon detecting malfunctioning in the interrupt controlling unit; third means for generating an alarm signal upon detecting malfunctioning in the data direct transfer unit; fourth means for generating an alarm signal upon detecting malfunctioning in the memory unit; and a diagnosis unit which is arranged to reset and disconnect the central processing unit from the data bus upon receiving one of the alarm signals, and to be connected to the data bus and to carry out diagnosis programmes designed to detect the member whose malfunctioning has caused the alarm signal to be generated, and to be disconnected from the data bus and to restore the connection of the central processing unit to the data bus upon detecting correct functioning of all units in the processing system.
It is thus possible to provide a simple and economical circuit arrangement that gives a processing system a degree of reliability comparable to that obtained by the prior art while substantially reducing the drawbacks specified above.
The circuits forming such a circuit arrangement constitute only a fraction of the circuits in the data processing system, whereas the known solution referred to above calls for additional circuits amounting to twice those forming the processing system.
The invention will be further described, by way of example, with reference to the accompanying drawing, which shows in thick lines the circuit of a data processing system and in thin lines a circuit arrangement constituting a preferred embodiment of the present invention.
In the drawing, the thick lines represent the circuit of a data processing system which comprises a data bus B1 to which the following units are connected: a central processing unit CPU1 which is connected to the data bus B1 by way of first signal sending members; an interrupt controlling unit INT; a data direct transfer controlling (direct memory access) unit DMA; and a memory unit MEM.
A circuit arrangement constituting a preferred embodiment of the present invention is illustrated by thin lines and comprises means arranged to generate an alarm signal upon detecting malfunctioning in any one of the above specified units.
More particularly, the central processing unit CPU1 is associated with first means PM comprising a further central processing unit CPU2 which operates synchronously with the unit CPU1; both units therefore receive on their inputs the timing pulses CK1 generated by a first timing unit UT1. The output signals of the units CPU1 and CPU2 are sent to the inputs of a comparison circuit CFR, which generates an alarm signal A1 upon detecting a mismatch between the signals on its inputs.
In this way, should one of the two units erroneously process a datum, its output signals will differ from those generated by the other unit; the circuit CFR senses this event and produces the alarm signal A1.
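The behaviour of the comparison circuit CFR can be modelled as a cycle-by-cycle equality check. The sketch below is a hypothetical software analogue, not the patent's circuit: it assumes the outputs of CPU1 and CPU2 are available as integer words sampled on the same CK1 edges, and the function name `first_alarm_cycle` is illustrative only.

```python
from typing import Iterable, Optional


def first_alarm_cycle(cpu1_out: Iterable[int], cpu2_out: Iterable[int]) -> Optional[int]:
    """Return the first bus cycle at which comparator CFR would raise A1, else None.

    Both sequences are assumed to be sampled on the same CK1 edges, since
    CPU1 and CPU2 run in lockstep from the timing unit UT1.
    """
    for cycle, (a, b) in enumerate(zip(cpu1_out, cpu2_out)):
        if a != b:  # any mismatch between the two outputs -> alarm A1
            return cycle
    return None


good = [0x10, 0x22, 0x37]
print(first_alarm_cycle(good, good))                # None
print(first_alarm_cycle(good, [0x10, 0x23, 0x37]))  # 1
```

Unlike a three-way voter, a two-way comparison only reveals that one of the two processors is wrong, not which one; identifying the faulty member is left to the diagnosis unit UDG.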
The interrupt controlling unit INT is associated with second means SM for generating an alarm signal A2 upon receiving, from one of the interrupt signal generators, an operative-programme interrupt request having a priority lower than, or equal to, that of the requests being dealt with. An embodiment of the second means SM is illustrated in Italian Patent Application No. 24467 A/80.
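The priority condition checked by the second means SM can be illustrated with a small model. The actual embodiment is described in the cited Italian application and is not reproduced here; this sketch assumes that a larger number means a higher priority and that requests being serviced nest, so only the innermost one is compared. The class `InterruptPriorityChecker` and its methods are hypothetical names.

```python
from typing import List


class InterruptPriorityChecker:
    """Hypothetical software model of the second means SM.

    Assumes a larger number means a higher priority, and that requests being
    serviced nest, so only the innermost one matters for the check.
    """

    def __init__(self) -> None:
        self._in_service: List[int] = []  # priorities being dealt with, innermost last

    def request_delivered(self, priority: int) -> bool:
        """Record a request forwarded by INT; return True if alarm A2 is raised."""
        if self._in_service and priority <= self._in_service[-1]:
            # Priority not higher than the one in service: treated as an INT malfunction.
            return True
        self._in_service.append(priority)
        return False

    def service_completed(self) -> None:
        """Called when the CPU finishes servicing the innermost request."""
        if self._in_service:
            self._in_service.pop()


sm = InterruptPriorityChecker()
print(sm.request_delivered(3))  # False: nothing in service yet
print(sm.request_delivered(5))  # False: 5 preempts 3
print(sm.request_delivered(5))  # True: equal priority must not preempt
```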
The data direct transfer controlling unit DMA is associated with third means TM, which generate an alarm signal A3 when the parity bit of the input data of the unit DMA differs from the parity bit of its output data. The third means TM also generate an alarm signal A4 when the parity bit of the $i$-th address, computed on the basis of the $(i-1)$-th address available during transfer of the datum $d_{i-1}$, differs from the parity bit of the $i$-th output address provided along with the datum $d_i$. An embodiment of the third means TM is disclosed in Italian Patent Application No. 24466 A/80.
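The two parity checks of the third means TM can be sketched as pure functions. This is an illustration under assumptions not stated in the source: the DMA is taken to advance the address by a fixed increment `step` on a `width`-bit address bus, and the function names are hypothetical. The cited Italian application describes the actual circuit.

```python
def parity(word: int) -> int:
    """Parity bit of a word: 1 if it has an odd number of set bits."""
    return bin(word).count("1") & 1


def data_alarm_a3(word_in: int, word_out: int) -> bool:
    """A3: the datum's parity changed while passing through the DMA unit."""
    return parity(word_in) != parity(word_out)


def address_alarm_a4(prev_addr: int, out_addr: int, step: int = 1, width: int = 16) -> bool:
    """A4: parity of the predicted i-th address differs from that of the output address.

    The i-th address is predicted from the (i-1)-th one; a fixed increment
    `step` and a `width`-bit address bus are assumptions of this sketch.
    """
    predicted = (prev_addr + step) & ((1 << width) - 1)
    return parity(predicted) != parity(out_addr)


print(data_alarm_a3(0x5A, 0x5B))          # True: one bit flipped in transit
print(address_alarm_a4(0x0010, 0x0011))   # False: expected next address
print(address_alarm_a4(0x0010, 0x0013))   # True: parity of 0x0013 differs from 0x0011
```

Comparing only parity bits keeps the checker small, at the cost of missing any error that flips an even number of bits (for example, an output address of 0x0012 instead of 0x0011 has the same parity and goes undetected).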
The memory unit MEM is associated with fourth means QM for detecting malfunctioning by checking the parity of both data and addresses, and for generating an alarm signal A5 upon detecting malfunctioning. The fourth means QM are not described in detail, as they comprise circuit structures well known to a person skilled in the art.
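Since the fourth means QM are left to well-known circuit structures, the following is only one conventional way such a check could work, modelled in software: a parity bit is stored with each word at write time and re-checked on read, and the address is assumed to arrive with its own parity bit. The class `ParityCheckedMemory` and its `flip_bit` fault-injection helper are hypothetical.

```python
from typing import List, Tuple


def parity(word: int) -> int:
    """Parity bit of a word: 1 if it has an odd number of set bits."""
    return bin(word).count("1") & 1


class ParityCheckedMemory:
    """Hypothetical model of MEM plus the fourth means QM.

    Each cell stores a parity bit computed at write time; the address is
    assumed to arrive with its own parity bit, which is checked on every access.
    """

    def __init__(self, size: int) -> None:
        self._cells: List[Tuple[int, int]] = [(0, 0)] * size  # (data, stored parity)

    def write(self, addr: int, addr_parity: int, data: int) -> bool:
        """Store data; return True if alarm A5 is raised (address parity error)."""
        self._cells[addr] = (data, parity(data))
        return parity(addr) != addr_parity

    def read(self, addr: int, addr_parity: int) -> Tuple[int, bool]:
        """Return (data, alarm A5) after checking address and data parity."""
        data, stored = self._cells[addr]
        alarm = parity(addr) != addr_parity or parity(data) != stored
        return data, alarm

    def flip_bit(self, addr: int, bit: int) -> None:
        """Simulate a storage fault by flipping one bit without updating parity."""
        data, stored = self._cells[addr]
        self._cells[addr] = (data ^ (1 << bit), stored)


mem = ParityCheckedMemory(8)
mem.write(3, parity(3), 0x55)
print(mem.read(3, parity(3)))  # (85, False)
mem.flip_bit(3, 0)
print(mem.read(3, parity(3)))  # (84, True)
```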
The alarm signals A1 to A5 are sent to an OR gate OR belonging to a diagnosis unit UDG, which also comprises a data bus B2 to which the following units are connected: a central processing unit CPU3; a memory unit MM of limited size; and an input-output unit I/O, to which a communication channel with an operator's place (not shown) is connected.
The unit CPU3 receives on its input a timing signal CK2 from a second timing unit UT2, which is asynchronous with respect to the unit UT1. The data bus B2 of the diagnosis unit is connected to the data bus B1 of the processing system by way of second signal transfer members, which are enabled by a first bistable circuit FF1. The bistable FF1 is switched to the ON state by a first signal C1 and to the OFF state by a second signal C2, both generated by the operative programme of the diagnosis unit.
The first signal transfer members which connect the units CPU1 and CPU2 to the data bus B1 are enabled by a second bistable circuit FF2 which is switched to the ON state by the signal available at the output of the OR-gate OR and to the OFF state by a third signal C3 generated by the operative programme of the diagnosis unit.
When the OR gate OR receives an alarm signal on its input, the bistable FF2 is switched to the ON state. Its output then disables the first signal sending members (disconnecting the units CPU1 and CPU2 from the data bus), resets the units CPU1 and CPU2, which must then restart the operative programme from the beginning, and generates an interrupt request for the unit CPU3.
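The alarm path just described (OR gate setting FF2, whose output disconnects and resets the processor pair and requests an interrupt from CPU3) can be modelled as a set/reset latch. This is a hypothetical sketch: the class `AlarmPath` is illustrative, and giving the alarm precedence when it coincides with C3 is an assumption, since the source does not say how that case is resolved.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AlarmPath:
    """Hypothetical model of the OR gate and bistable FF2 in the diagnosis unit UDG."""

    ff2_on: bool = False

    def step(self, alarms: Sequence[bool], c3: bool = False) -> bool:
        """Apply one evaluation of the inputs; return the OR gate output.

        The OR output sets FF2 and C3 resets it; giving the alarm precedence
        when both are present at once is an assumption of this sketch.
        """
        or_out = any(alarms)  # A1..A5 combined
        if or_out:
            self.ff2_on = True
        elif c3:
            self.ff2_on = False
        return or_out

    @property
    def cpu_pair_connected(self) -> bool:
        # FF2's output drives the enable of the first signal sending members.
        return not self.ff2_on

    @property
    def cpu_pair_reset(self) -> bool:
        # The same output is supplied to the reset inputs of CPU1 and CPU2.
        return self.ff2_on

    @property
    def cpu3_interrupt(self) -> bool:
        # Per the description, FF2 also generates the interrupt request for CPU3.
        return self.ff2_on


path = AlarmPath()
path.step([False, True, False, False, False])  # alarm A2 asserted
path.step([False] * 5)                          # alarm line drops; FF2 stays ON
print(path.cpu_pair_connected, path.cpu_pair_reset, path.cpu3_interrupt)  # False True True
path.step([False] * 5, c3=True)                 # diagnosis programme issues C3
print(path.cpu_pair_connected)                  # True
```

The latch matters because an alarm may be a single short pulse; once set, FF2 keeps the processor pair isolated until the diagnosis programme explicitly releases it.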
Upon receiving an interrupt request, the unit CPU3 starts executing a diagnosis programme whose first instruction causes the signal C1 to be generated, which switches the bistable FF1 to the ON state, thereby enabling the second signal transfer members (connecting the diagnosis unit to the data bus of the processing system).
Thus, the timing pulses generated by the unit UT2 reach the data bus of the processing system with a given delay after the disconnection of the unit UT1. This makes it possible to carry out refresh operations of the memory units without conflicts caused by disconnecting and connecting two asynchronous timing pulse sequences.
Once the diagnosis programme run by the unit CPU3 has identified the member whose malfunctioning generated the alarm signal (one of A1 to A5), and that member has been replaced on the basis of the information sent to the operator's place, the programme terminates with a particular instruction which generates the signal C2, switching the bistable FF1 to the OFF state and thereby disconnecting the diagnosis unit UDG from the data bus B1 of the processing system.
This instruction is followed by a further instruction which generates a signal C3, switching the bistable FF2 to the OFF state and thereby reconnecting the pair of central processing units CPU1 and CPU2 to the data bus B1 of the processing system.
In this case too, no conflicts arise between timing pulses reaching the data bus of the processing system, since the disconnection and connection of the two pulse sequences are separated by a short interval.
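The ordering of the signals C1, C2 and C3 is what guarantees that the two asynchronous clock domains never drive the data bus B1 at the same time. The sketch below models bus ownership as a function of FF1 and FF2 and replays the sequence described above; the class `BusControl`, the function `recovery_sequence` and the owner labels are hypothetical, and timing is reduced to discrete events.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusControl:
    """Hypothetical model of who drives data bus B1, derived from FF1 and FF2."""

    ff1_on: bool = False  # ON: second transfer members enabled, UDG connected to B1
    ff2_on: bool = False  # ON: CPU1/CPU2 disconnected from B1 and held in reset
    log: List[str] = field(default_factory=list)

    def owner(self) -> str:
        cpu_pair, udg = not self.ff2_on, self.ff1_on
        if cpu_pair and udg:
            return "CONFLICT"  # two asynchronous clock domains on B1 at once
        if cpu_pair:
            return "CPU1/CPU2"
        return "UDG" if udg else "none"

    def _event(self, name: str) -> None:
        self.log.append(f"{name}: {self.owner()}")

    def alarm(self) -> None:
        self.ff2_on = True
        self._event("alarm")

    def c1(self) -> None:
        self.ff1_on = True
        self._event("C1")

    def c2(self) -> None:
        self.ff1_on = False
        self._event("C2")

    def c3(self) -> None:
        self.ff2_on = False
        self._event("C3")


def recovery_sequence() -> List[str]:
    """Replay the order in the text: alarm, C1, diagnosis and repair, C2, C3."""
    bus = BusControl()
    bus.alarm()
    bus.c1()
    # Diagnosis programme runs on CPU3 here; the faulty member is replaced.
    bus.c2()
    bus.c3()
    return bus.log


for entry in recovery_sequence():
    print(entry)
```

The printed log is `alarm: none`, `C1: UDG`, `C2: none`, `C3: CPU1/CPU2`. The two "none" states correspond to the short gaps between disconnecting one clock sequence and connecting the other; issuing C3 before C2 would instead yield a CONFLICT state.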
The diagnosis unit UDG further comprises a unit ADG which executes self-diagnosis programmes of the unit UDG whenever no alarm signals are present at the input of the OR gate OR.
Should the unit ADG sense malfunctioning, it energizes its output, thereby generating a non-maskable interrupt request for the units CPU1 and CPU2, which then take over the diagnosis tasks using the memories and the input-output unit of the processing system.
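The division of diagnostic responsibility between the diagnosis unit and the processor pair can be summarised as a small decision function. This is a hypothetical sketch: `diagnosis_role` and the returned labels are illustrative, and `adg_self_test` stands in for the self-diagnosis programmes of ADG, whose content the source does not specify.

```python
from typing import Callable, Sequence


def diagnosis_role(alarms: Sequence[bool], adg_self_test: Callable[[], bool]) -> str:
    """Decide which agent carries out diagnosis in one supervision step (hypothetical).

    adg_self_test stands in for the self-diagnosis programmes of ADG and returns
    True when the diagnosis unit UDG is found to be working.
    """
    if any(alarms):
        # An alarm is pending: CPU3 in UDG diagnoses the processing system.
        return "UDG diagnoses processing system"
    # ADG only runs its self-tests while no alarm is present at the OR gate.
    if not adg_self_test():
        # Fault in UDG: non-maskable interrupt to CPU1/CPU2, which take over.
        return "NMI: CPU1/CPU2 diagnose UDG"
    return "idle"


print(diagnosis_role([False] * 5, lambda: True))   # idle
print(diagnosis_role([False] * 5, lambda: False))  # NMI: CPU1/CPU2 diagnose UDG
```

The arrangement is symmetric in spirit: the processor pair is checked by UDG, and UDG is checked by ADG with the processor pair as the fallback diagnoser.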
With such a circuit arrangement, a degree of reliability comparable to that of the prior-art arrangement described above can be attained by adding only a limited number of circuits to those of the processing system: only the central processing unit is triplicated, and the remaining added circuits constitute a fraction of the overall circuits of the processing system.

Claims (6)

1. A circuit arrangement for detecting malfunctioning in a data processing system comprising a data bus which is connected to a first central processing unit by way of first signal sending members, an interrupt controlling unit, a data direct transfer unit, and a memory unit, the circuit arrangement comprising: first means for generating an alarm signal upon detecting malfunctioning in the central processing unit; second means for generating an alarm signal upon detecting malfunctioning in the interrupt controlling unit; third means for generating an alarm signal upon detecting malfunctioning in the data direct transfer unit; fourth means for generating an alarm signal upon detecting malfunctioning in the memory unit; and a diagnosis unit which is arranged to reset and disconnect the central processing unit from the data bus upon receiving one of the alarm signals, and to be connected to the data bus and to carry out diagnosis programmes designed to detect the member whose malfunctioning has caused the alarm signal to be generated, and to be disconnected from the data bus and to restore the connection of the central processing unit to the data bus upon detecting correct functioning of all units in the processing system.
2. A circuit arrangement as claimed in claim 1, in which the first means comprises a second central processing unit arranged to operate synchronously with the first central processing unit of the data processing system, and a comparison circuit arranged to receive the signals at the outputs of the first and second central processing units and to generate an alarm signal upon detecting lack of identity between the signals at its inputs.
3. A circuit arrangement as claimed in claim 1 or 2, in which the diagnosis unit comprises a data bus connectable to the data bus of the processing system by way of second signal sending members and connected to a third central processing unit arranged to receive pulses provided by a second timing unit, a memory unit, and an input-output unit, the diagnosis unit further comprising: an OR gate arranged to receive the alarm signals and having an output connected to an interrupt request input of the third central processing unit; a first bistable circuit arranged to be switched to the ON and OFF states by first and second signals, respectively, generated at the beginning of the diagnosis programme and after the diagnosis programme has sensed correct operation of the member which had caused the generation of the alarm signal, respectively, the output signal of the said first bistable circuit enabling or disabling the second signal sending members; and a second bistable circuit arranged to be switched to the ON and OFF states upon energization of the output of the OR gate and by a third signal generated by the diagnosis programme in an instant following the generation of the second signal, respectively, the output signal of the second bistable circuit being supplied to the reset input of the first and second central processing units and to the enabling input of the first signal sending members.
4. A circuit arrangement as claimed in claim 3, in which the diagnosis unit comprises a self-diagnosis unit arranged to generate an alarm signal upon detecting malfunctioning in the diagnosis unit and to send the alarm signal to the interrupt request inputs of the first and second central processing units.
5. A circuit arrangement substantially as hereinbefore described with reference to and as illustrated in the accompanying drawing.
6. A data processing system including a circuit arrangement as claimed in any one of the preceding claims.
GB8128117A 1980-09-17 1981-09-17 Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems Withdrawn GB2086104A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
IT8024701A IT8024701A0 (en) 1980-09-17 1980-09-17 CIRCUIT ARRANGEMENT SUITABLE FOR DETECTING THE PRESENCE OF MALFUNCTIONS IN A DATA PROCESSING SYSTEM USING A COMMERCIAL TYPE MICROPROCESSOR.

Publications (1)

Publication Number Publication Date
GB2086104A true GB2086104A (en) 1982-05-06

Family

ID=11214447

Family Applications (1)

Application Number Title Priority Date Filing Date
GB8128117A Withdrawn GB2086104A (en) 1980-09-17 1981-09-17 Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems

Country Status (5)

Country Link
BR (1) BR8105689A (en)
DE (1) DE3137046A1 (en)
FR (1) FR2490366A1 (en)
GB (1) GB2086104A (en)
IT (1) IT8024701A0 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2235075A (en) * 1989-06-23 1991-02-20 Ansaldo Spa Switching module for pairs of homologous processors connected to at least one communication bus

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
DE3334765A1 (en) * 1983-09-26 1985-04-11 Siemens AG, 1000 Berlin und 8000 München TEST DEVICE FOR DETECTING ERRORS IN DOUBLE CIRCUITS, IN PARTICULAR PROCESSORS OF A TELEPHONE SWITCHING SYSTEM
DE3335695C1 (en) * 1983-09-28 1985-04-18 Siemens AG, 1000 Berlin und 8000 München Testing device for monitoring the data channel during data interchange between central processors in microprocessor-controlled telephone switching systems
GB9101227D0 (en) * 1991-01-19 1991-02-27 Lucas Ind Plc Method of and apparatus for arbitrating between a plurality of controllers,and control system
DE4241319A1 (en) * 1992-12-09 1994-06-16 Ant Nachrichtentech Computer system
DE10029642B4 (en) * 2000-06-15 2005-10-20 Daimler Chrysler Ag Device for monitoring a vehicle data bus system
DE10148325A1 (en) 2001-09-29 2003-04-17 Daimler Chrysler Ag Central node of data bus system with bus monitor unit e.g. for motor vehicles and aircraft, has diagnosis unit integrated into central node

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US3768074A (en) * 1972-05-12 1973-10-23 Burroughs Corp Multiprocessing system having means for permissive coupling of different subsystems
US3921141A (en) * 1973-09-14 1975-11-18 Gte Automatic Electric Lab Inc Malfunction monitor control circuitry for central data processor of digital communication system
US4023142A (en) * 1975-04-14 1977-05-10 International Business Machines Corporation Common diagnostic bus for computer systems to enable testing concurrently with normal system operation
US4145734A (en) * 1975-04-22 1979-03-20 Compagnie Honeywell Bull (Societe Anonyme) Method and apparatus for implementing the test of computer functional units
DE2612100A1 (en) * 1976-03-22 1977-10-06 Siemens Ag DIGITAL DATA PROCESSING ARRANGEMENT, IN PARTICULAR FOR RAILWAY SAFETY TECHNOLOGY

Cited By (2)

Publication number Priority date Publication date Assignee Title
GB2235075A (en) * 1989-06-23 1991-02-20 Ansaldo Spa Switching module for pairs of homologous processors connected to at least one communication bus
GB2235075B (en) * 1989-06-23 1993-05-26 Ansaldo Spa Switching module for pairs of homologous processors connected to at least one communication bus

Also Published As

Publication number Publication date
IT8024701A0 (en) 1980-09-17
FR2490366A1 (en) 1982-03-19
DE3137046A1 (en) 1982-04-01
BR8105689A (en) 1982-05-25

Similar Documents

Publication Publication Date Title
US5068851A (en) Apparatus and method for documenting faults in computing modules
US4366535A (en) Modular signal-processing system
US5249187A (en) Dual rail processors with error checking on I/O reads
EP0306244B1 (en) Fault tolerant computer system with fault isolation
CA1310129C (en) Interface of non-fault tolerant components to fault tolerant system
CA1306546C (en) Dual zone, fault tolerant computer system with error checking on i/o writes
US5185877A (en) Protocol for transfer of DMA data
EP0514075A2 (en) Fault tolerant processing section with dynamically reconfigurable voting
US5068780A (en) Method and apparatus for controlling initiation of bootstrap loading of an operating system in a computer system having first and second discrete computing zones
EP0650615B1 (en) A fault-tolerant computer system
US4866604A (en) Digital data processing apparatus with pipelined memory cycles
US5251227A (en) Targeted resets in a data processor including a trace memory to store transactions
US5048022A (en) Memory device with transfer of ECC signals on time division multiplexed bidirectional lines
JPH079625B2 (en) Computer with fault-tolerant capabilities
US5163138A (en) Protocol for read write transfers via switching logic by transmitting and retransmitting an address
US6532545B1 (en) Apparatus for swapping, adding or removing a processor in an operating computer system
EP0411805B1 (en) Bulk memory transfer during resync
GB2086104A (en) Circuit Arrangement for Detecting Malfunctioning in Data Processing Systems
US20040193735A1 (en) Method and circuit arrangement for synchronization of synchronously or asynchronously clocked processor units
KR100583214B1 (en) Information processing apparatus
RU1792540C (en) Multiprocessor computation system
EP0416732B1 (en) Targeted resets in a data processor
JPH06242979A (en) Dual computer device
SU1734251A1 (en) Double-channel redundant computing system
CA1316608C (en) Arrangement for error recovery in a self-guarding data processing system

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)