DE69311797D1 - FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS

FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS

Info

Publication number
DE69311797D1
Authority
DE
Germany
Prior art keywords
task
primary
external
backup
event
Prior art date
Legal status
Expired - Fee Related
Application number
DE69311797T
Other languages
German (de)
Other versions
DE69311797T2 (en)
Inventor
Barry Gleeson
Current Assignee
Unisys Corp
Original Assignee
Unisys Corp
Priority date
1992-01-22
Filing date
1993-01-22
Publication date
1997-07-31
Application filed by Unisys Corp
Publication of DE69311797D1
Application granted
Publication of DE69311797T2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/86 Event-based monitoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Retry When Errors Occur (AREA)
  • Multi Processors (AREA)

Abstract

A fault-tolerant computer system employs primary tasks and corresponding backup tasks. The system remains fault tolerant even when uncontrolled external events occur whose time of occurrence may affect task performance. For this purpose, external event data is stored for each external event that occurs during performance of a primary task. This data indicates the event type and the relationship between the occurrence of the external event and the occurrence of a predetermined primary task event, such as a memory access operation. The external event data is sent to each respective backup task along with the messages transmitted to the respective primary task. If a primary task fails, its backup task replays the failed primary task by processing these transmitted messages, using the transmitted external event data to redeliver each external signal to the backup task at the appropriate time, which ensures that the backup task properly recovers the primary task.
DE69311797T (priority date 1992-01-22, filing date 1993-01-22): FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS. Expired - Fee Related. DE69311797T2 (en)
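
The replay scheme summarized in the abstract can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not the patented implementation: the names `ExternalEventRecord`, `Task`, `run_primary`, and `replay_on_backup` are hypothetical, a simple step counter stands in for the "predetermined primary task event" (the abstract gives a memory access operation as an example), and the `"double"` signal is an invented external event whose timing changes the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalEventRecord:
    """Hypothetical log entry: the event type, and how many counted
    task events had occurred when the external event arrived."""
    event_type: str
    task_event_count: int


@dataclass
class Task:
    """Toy deterministic task. `task_events` is an assumed stand-in for
    the counted primary task event (e.g. a memory access)."""
    total: int = 0
    task_events: int = 0
    scale: int = 1

    def step(self, value: int) -> None:
        # One counted task event: accumulate using the current scale.
        self.total += value * self.scale
        self.task_events += 1

    def deliver_signal(self, event_type: str) -> None:
        # The signal alters later steps, so its arrival time matters.
        if event_type == "double":
            self.scale *= 2


def run_primary(messages, signals_at):
    """Run the primary task. `signals_at` maps a task-event count to an
    event type and simulates uncontrolled arrival times. Returns the
    final task state plus the message log and external event log that
    would be sent to the backup."""
    task = Task()
    event_log = []
    for msg in messages:
        for value in msg:
            if task.task_events in signals_at:
                event_type = signals_at[task.task_events]
                event_log.append(ExternalEventRecord(event_type, task.task_events))
                task.deliver_signal(event_type)
            task.step(value)
    return task, list(messages), event_log


def replay_on_backup(message_log, event_log):
    """Replay the logged messages on a fresh task, redelivering each
    logged signal once the same task-event count has been reached."""
    task = Task()
    pending = sorted(event_log, key=lambda r: r.task_event_count)
    i = 0
    for msg in message_log:
        for value in msg:
            while i < len(pending) and pending[i].task_event_count <= task.task_events:
                task.deliver_signal(pending[i].event_type)
                i += 1
            task.step(value)
    return task


# Usage: a signal arrives after one task event; the backup reproduces it.
primary, message_log, event_log = run_primary([[1, 2], [3]], {1: "double"})
backup = replay_on_backup(message_log, event_log)
print(primary.total, backup.total)
```

Replaying the same messages with an empty event log yields a different total, which is the divergence that logging the event position is meant to prevent.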

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/824,134 US5363503A (en) 1992-01-22 1992-01-22 Fault tolerant computer system with provision for handling external events
PCT/US1993/000618 WO1993015461A1 (en) 1992-01-22 1993-01-22 Fault tolerant computer system with provision for handling external events

Publications (2)

Publication Number Publication Date
DE69311797D1 (en) 1997-07-31
DE69311797T2 (en) 1998-02-05

Family

ID=25240678

Family Applications (1)

Application Number Title Priority Date Filing Date
DE69311797T Expired - Fee Related DE69311797T2 (en) 1992-01-22 1993-01-22 FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS

Country Status (5)

Country Link
US (1) US5363503A (en)
EP (1) EP0623230B1 (en)
JP (1) JP3209748B2 (en)
DE (1) DE69311797T2 (en)
WO (1) WO1993015461A1 (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1993009494A1 (en) * 1991-10-28 1993-05-13 Digital Equipment Corporation Fault-tolerant computer processing using a shadow virtual processor
US5398330A (en) * 1992-03-05 1995-03-14 Seiko Epson Corporation Register file backup queue
JPH05260134A (en) * 1992-03-12 1993-10-08 Fujitsu Ltd Monitor system for transmission equipment
US5715386A (en) * 1992-09-30 1998-02-03 Lucent Technologies Inc. Apparatus and methods for software rejuvenation
CA2106280C (en) * 1992-09-30 2000-01-18 Yennun Huang Apparatus and methods for fault-tolerant computing employing a daemon monitoring process and fault-tolerant library to provide varying degrees of fault tolerance
JPH07175597A (en) * 1993-12-17 1995-07-14 Fujitsu Ltd Dual device for storage medium
DE69506404T2 (en) * 1994-06-10 1999-05-27 Texas Micro Inc., Houston, Tex. MAIN STORAGE DEVICE AND RESTART LABELING METHOD FOR AN ERROR TOLERANT COMPUTER SYSTEM
CA2167634A1 (en) * 1995-01-23 1996-07-24 Michael E. Fisher Method and apparatus for maintaining network connections across a voluntary process switchover
US5621885A (en) * 1995-06-07 1997-04-15 Tandem Computers, Incorporated System and method for providing a fault tolerant computer program runtime support environment
US5699502A (en) * 1995-09-29 1997-12-16 International Business Machines Corporation System and method for managing computer system faults
US5864657A (en) * 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5802265A (en) * 1995-12-01 1998-09-01 Stratus Computer, Inc. Transparent fault tolerant computer system
US5819021A (en) * 1995-12-11 1998-10-06 Ab Initio Software Corporation Overpartitioning system and method for increasing checkpoints in component-based parallel applications
FR2743164B1 (en) 1995-12-28 1998-02-06 Cegelec METHOD FOR ORDERING A PLURALITY OF MESSAGES COMING RESPECTIVELY FROM A PLURALITY OF SOURCES, AND SYSTEM FOR IMPLEMENTING THIS PROCESS
US5978933A (en) * 1996-01-11 1999-11-02 Hewlett-Packard Company Generic fault tolerant platform
GB9601585D0 (en) * 1996-01-26 1996-03-27 Hewlett Packard Co Fault-tolerant processing method
GB9601584D0 (en) * 1996-01-26 1996-03-27 Hewlett Packard Co Fault-tolerant processing method
US5796941A (en) * 1996-09-06 1998-08-18 Catalyst Semiconductor, Inc. Method for supervising software execution in a license restricted environment
US5835698A (en) * 1996-09-20 1998-11-10 Novell, Inc. Unilaterally-controlled, time-insensitive, data-link recovery apparatus and method
US5983371A (en) * 1997-07-11 1999-11-09 Marathon Technologies Corporation Active failure detection
US6289474B1 (en) 1998-06-24 2001-09-11 Torrent Systems, Inc. Computer system and process for checkpointing operations on data in a computer system by partitioning the data
US6801938B1 (en) * 1999-06-18 2004-10-05 Torrent Systems, Inc. Segmentation and processing of continuous data streams using transactional semantics
GB2359384B (en) 2000-02-16 2004-06-16 Data Connection Ltd Automatic reconnection of partner software processes in a fault-tolerant computer system
US20020129110A1 (en) * 2001-03-07 2002-09-12 Ling-Zhong Liu Distributed event notification service
US6971043B2 (en) * 2001-04-11 2005-11-29 Stratus Technologies Bermuda Ltd Apparatus and method for accessing a mass storage device in a fault-tolerant server
US6954877B2 (en) * 2001-11-29 2005-10-11 Agami Systems, Inc. Fault tolerance using logical checkpointing in computing systems
US7478275B1 (en) * 2004-03-29 2009-01-13 Symantec Operating Corporation Method and apparatus for performing backup storage of checkpoint data within a server cluster
US7644050B2 (en) * 2004-12-02 2010-01-05 International Business Machines Corporation Method and apparatus for annotation-based behavior extensions
FR2882448B1 (en) * 2005-01-21 2007-05-04 Meiosys Soc Par Actions Simpli METHOD OF MANAGING, JOURNALIZING OR REJECTING THE PROGRESS OF AN APPLICATION PROCESS
FR2881306B1 (en) * 2005-01-21 2007-03-23 Meiosys Soc Par Actions Simpli METHOD FOR NON-INTRUSIVE JOURNALIZATION OF EXTERNAL EVENTS IN AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD
FR2881309B1 (en) * 2005-01-21 2007-03-23 Meiosys Soc Par Actions Simpli METHOD FOR OPTIMIZING THE TRANSMISSION OF JOURNALIZATION DATA IN MULTI-COMPUTER ENVIRONMENT AND SYSTEM IMPLEMENTING SAID METHOD
FR2881308B1 (en) * 2005-01-21 2007-03-23 Meiosys Soc Par Actions Simpli METHOD OF ACCELERATING THE TRANSMISSION OF JOURNALIZATION DATA IN A MULTI-COMPUTER ENVIRONMENT AND SYSTEM USING THE SAME
FR2882449A1 (en) 2005-01-21 2006-08-25 Meiosys Soc Par Actions Simpli NON-INTRUSIVE METHOD OF REJECTING INTERNAL EVENTS WITHIN AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD
FR2881307B1 (en) * 2005-01-21 2007-03-23 Meiosys Soc Par Actions Simpli NON-INTRUSIVE METHOD OF SIMULATION OR REJECTION OF EXTERNAL EVENTS FROM AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD
FR2881246B1 (en) * 2005-01-21 2007-03-23 Meiosys Soc Par Actions Simpli PERFECT PROCESS FOR MANAGING, JOURNALIZING OR REJECTING NON-DETERMINISTIC OPERATIONS IN THE CONDUCT OF AN APPLICATION PROCESS
US20060222125A1 (en) * 2005-03-31 2006-10-05 Edwards John W Jr Systems and methods for maintaining synchronicity during signal transmission
US20060222126A1 (en) * 2005-03-31 2006-10-05 Stratus Technologies Bermuda Ltd. Systems and methods for maintaining synchronicity during signal transmission
US8041985B2 (en) 2006-08-11 2011-10-18 Chicago Mercantile Exchange, Inc. Match server for a financial exchange having fault tolerant operation
US7480827B2 (en) 2006-08-11 2009-01-20 Chicago Mercantile Exchange Fault tolerance and failover using active copy-cat
US7434096B2 (en) 2006-08-11 2008-10-07 Chicago Mercantile Exchange Match server for a financial exchange having fault tolerant operation

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4347563A (en) * 1980-06-16 1982-08-31 Forney Engineering Company Industrial control system
US4485438A (en) * 1982-06-28 1984-11-27 Myrmo Erik R High transfer rate between multi-processor units
US4590554A (en) * 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US4493035A (en) * 1982-12-07 1985-01-08 Motorola, Inc. Data processor version validation
US4524415A (en) * 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4562538A (en) * 1983-05-16 1985-12-31 At&T Bell Laboratories Microprocessor having decision pointer to process restore position
DE3566314D1 (en) * 1984-04-26 1988-12-22 Bbc Brown Boveri & Cie Apparatus for saving a calculator status
SE454730B (en) * 1986-09-19 1988-05-24 Asea Ab PROCEDURE AND COMPUTER EQUIPMENT FOR SHORT-FREE REPLACEMENT OF THE ACTIVITY FROM ACTIVE DEVICES TO EMERGENCY UNITS IN A CENTRAL UNIT
DE69021712T2 (en) * 1990-02-08 1996-04-18 Ibm Restart marking mechanism for fault tolerant systems.
US5271013A (en) * 1990-05-09 1993-12-14 Unisys Corporation Fault tolerant computer system
US5032979A (en) * 1990-06-22 1991-07-16 International Business Machines Corporation Distributed security auditing subsystem for an operating system
US5175847A (en) * 1990-09-20 1992-12-29 Logicon Incorporated Computer system capable of program execution recovery

Also Published As

Publication number Publication date
JPH07503334A (en) 1995-04-06
JP3209748B2 (en) 2001-09-17
US5363503A (en) 1994-11-08
WO1993015461A1 (en) 1993-08-05
EP0623230B1 (en) 1997-06-25
DE69311797T2 (en) 1998-02-05
EP0623230A1 (en) 1994-11-09

Similar Documents

Publication Publication Date Title
DE69311797D1 (en) FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS
DE69122713D1 (en) FAULT-TOLERANT COMPUTER SYSTEM
US9176823B2 (en) Data transfer and recovery process
US8145945B2 (en) Packet mirroring between primary and secondary virtualized software images for improved system failover performance
DE69021712D1 (en) Restart marking mechanism for fault tolerant systems.
CA2339783A1 (en) Fault tolerant computer system
CA2150059A1 (en) Progressive Retry Method and Apparatus Having Reusable Software Modules for Software Failure Recovery in Multi-Process Message-Passing Applications
WO2002050678A8 (en) Method of 'split-brain' prevention in computer cluster systems
SE9504396D0 (en) Processor redundancy in a distributed system
US5343480A (en) System for detecting loss of message
JPS61234690A (en) Restart processing method for trouble of exchange
KR100298319B1 (en) Redundancy Device in Communication System
KR930010952B1 (en) Memory error correction method
WO2017014793A1 (en) Preserving volatile memory across a computer system disruption
JPH0268634A (en) Spare system for electronic computer
Appel et al. Implications of fault management and replica determinism on the real-time execution scheme of VOTRICS
KR100407689B1 (en) Time synchronization method after standby loading in ATM switch
Rabson et al. Feasibility of the fail-silent property in distributed computer control systems
JPS62224560A (en) Method of tracing virtual tracking
CHETTO Fault-tolerant scheduling in distributed critical real-time systems
JPH0588926A (en) Automatic switching circuit for monitor and control system
JPS575160A (en) Multiple computer system
JPH01311627A (en) Line backup system
JPS59100997A (en) Abnormality alarm system
JPH03158037A (en) Fault restoration system

Legal Events

Date Code Title Description
8364 No opposition during term of opposition
8339 Ceased/non-payment of the annual fee