DE69311797D1 - FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS - Google Patents
FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS
- Publication number
- DE69311797D1
- Authority
- DE
- Germany
- Prior art keywords
- task
- primary
- external
- backup
- event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2041—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/1658—Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/86—Event-based monitoring
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
- Retry When Errors Occur (AREA)
- Multi Processors (AREA)
Abstract
A fault-tolerant computer system employs primary tasks and corresponding backup tasks. The system provides fault-tolerant operation even when uncontrolled external events occur whose time of occurrence may affect task performance. To this end, for each external event that occurs while a primary task is running, external event data is stored that indicates the event type and the relationship between the occurrence of the external event and the occurrence of a predetermined primary task event, such as a memory access operation. This external event data is sent to each respective backup task along with the messages transmitted to the respective primary task. If a primary task fails, its backup task replays the failed primary task by processing these transmitted messages, using the transmitted external event data to redeliver each external event to the backup task at the appropriate point, which ensures that the backup task properly recovers the primary task.
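The replay idea in the abstract can be illustrated with a minimal Python sketch. It assumes a toy task whose state changes only through memory writes and uses the running count of memory accesses as the "predetermined primary task event". The names `EventRecord`, `Task`, `process`, `external_event` and the `"reset"` event are hypothetical illustrations, not the patent's actual implementation, and for simplicity the event handler's own memory update is assumed not to be counted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EventRecord:
    """One logged external event: its type and how many memory accesses preceded it."""
    event_type: str
    access_count: int


class Task:
    """Toy task (hypothetical) whose state changes only through counted memory writes.

    Simplifying assumption: the handler for an external event updates memory
    directly and does not bump the access counter.
    """

    def __init__(self, replay: Optional[List[EventRecord]] = None) -> None:
        self.memory: Dict[str, int] = {"total": 0}
        self.access_count = 0
        self.log: List[EventRecord] = []   # filled while running as the primary
        self.replay = list(replay or [])   # consumed while running as the backup

    def _drain(self) -> None:
        # Backup side: redeliver every logged event whose recorded position
        # (number of memory accesses) has now been reached.
        while self.replay and self.replay[0].access_count == self.access_count:
            self._handle(self.replay.pop(0).event_type)

    def _write(self, key: str, value: int) -> None:
        self.memory[key] = value
        self.access_count += 1  # the "predetermined primary task event"
        self._drain()

    def _handle(self, event_type: str) -> None:
        if event_type == "reset":
            self.memory["total"] = 0

    def external_event(self, event_type: str) -> None:
        # Primary side: the event arrives asynchronously, so record where it landed.
        self.log.append(EventRecord(event_type, self.access_count))
        self._handle(event_type)

    def process(self, message: int) -> None:
        self._drain()  # covers events logged before this message's first access
        self._write("total", self.memory["total"] + message)


# The primary handles three messages and one asynchronous "reset".
messages = [5, 7, 11]
primary = Task()
primary.process(messages[0])
primary.process(messages[1])
primary.external_event("reset")  # arrival time is outside the task's control
primary.process(messages[2])

# The primary fails; the backup replays the same messages with the event log.
backup = Task(replay=primary.log)
for m in messages:
    backup.process(m)
print(backup.memory == primary.memory)  # True
```

Replaying the same three messages without the log (`Task()` instead of `Task(replay=primary.log)`) would leave `total` at 23 rather than 11, because the reset would never be redelivered. Anchoring each event to an access count rather than to wall-clock time is what makes the redelivery point reproducible in the backup.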
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US07/824,134 (US5363503A) | 1992-01-22 | 1992-01-22 | Fault tolerant computer system with provision for handling external events |
PCT/US1993/000618 (WO1993015461A1) | 1992-01-22 | 1993-01-22 | Fault tolerant computer system with provision for handling external events |
Publications (2)
Publication Number | Publication Date |
---|---|
DE69311797D1 | 1997-07-31 |
DE69311797T2 | 1998-02-05 |
Family ID: 25240678
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
DE69311797T (DE69311797T2, Expired - Fee Related) | FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS | 1992-01-22 | 1993-01-22 |
Country Status (5)
Country | Link |
---|---|
US (1) | US5363503A |
EP (1) | EP0623230B1 |
JP (1) | JP3209748B2 |
DE (1) | DE69311797T2 |
WO (1) | WO1993015461A1 |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993009494A1 (en) * | 1991-10-28 | 1993-05-13 | Digital Equipment Corporation | Fault-tolerant computer processing using a shadow virtual processor |
US5398330A (en) * | 1992-03-05 | 1995-03-14 | Seiko Epson Corporation | Register file backup queue |
JPH05260134A (en) * | 1992-03-12 | 1993-10-08 | Fujitsu Ltd | Monitor system for transmission equipment |
US5715386A (en) * | 1992-09-30 | 1998-02-03 | Lucent Technologies Inc. | Apparatus and methods for software rejuvenation |
CA2106280C (en) * | 1992-09-30 | 2000-01-18 | Yennun Huang | Apparatus and methods for fault-tolerant computing employing a daemon monitoring process and fault-tolerant library to provide varying degrees of fault tolerance |
JPH07175597A (en) * | 1993-12-17 | 1995-07-14 | Fujitsu Ltd | Dual device for storage medium |
DE69506404T2 (en) * | 1994-06-10 | 1999-05-27 | Texas Micro Inc., Houston, Tex. | MAIN STORAGE DEVICE AND RESTART LABELING METHOD FOR AN ERROR TOLERANT COMPUTER SYSTEM |
CA2167634A1 (en) * | 1995-01-23 | 1996-07-24 | Michael E. Fisher | Method and apparatus for maintaining network connections across a voluntary process switchover |
US5621885A (en) * | 1995-06-07 | 1997-04-15 | Tandem Computers, Incorporated | System and method for providing a fault tolerant computer program runtime support environment |
US5699502A (en) * | 1995-09-29 | 1997-12-16 | International Business Machines Corporation | System and method for managing computer system faults |
US5864657A (en) * | 1995-11-29 | 1999-01-26 | Texas Micro, Inc. | Main memory system and checkpointing protocol for fault-tolerant computer system |
US5802265A (en) * | 1995-12-01 | 1998-09-01 | Stratus Computer, Inc. | Transparent fault tolerant computer system |
US5819021A (en) * | 1995-12-11 | 1998-10-06 | Ab Initio Software Corporation | Overpartitioning system and method for increasing checkpoints in component-based parallel applications |
FR2743164B1 (en) | 1995-12-28 | 1998-02-06 | Cegelec | METHOD FOR ORDERING A PLURALITY OF MESSAGES COMING RESPECTIVELY FROM A PLURALITY OF SOURCES, AND SYSTEM FOR IMPLEMENTING THIS PROCESS |
US5978933A (en) * | 1996-01-11 | 1999-11-02 | Hewlett-Packard Company | Generic fault tolerant platform |
GB9601585D0 (en) * | 1996-01-26 | 1996-03-27 | Hewlett Packard Co | Fault-tolerant processing method |
GB9601584D0 (en) * | 1996-01-26 | 1996-03-27 | Hewlett Packard Co | Fault-tolerant processing method |
US5796941A (en) * | 1996-09-06 | 1998-08-18 | Catalyst Semiconductor, Inc. | Method for supervising software execution in a license restricted environment |
US5835698A (en) * | 1996-09-20 | 1998-11-10 | Novell, Inc. | Unilaterally-controlled, time-insensitive, data-link recovery apparatus and method |
US5983371A (en) * | 1997-07-11 | 1999-11-09 | Marathon Technologies Corporation | Active failure detection |
US6289474B1 (en) | 1998-06-24 | 2001-09-11 | Torrent Systems, Inc. | Computer system and process for checkpointing operations on data in a computer system by partitioning the data |
US6801938B1 (en) * | 1999-06-18 | 2004-10-05 | Torrent Systems, Inc. | Segmentation and processing of continuous data streams using transactional semantics |
GB2359384B (en) | 2000-02-16 | 2004-06-16 | Data Connection Ltd | Automatic reconnection of partner software processes in a fault-tolerant computer system |
US20020129110A1 (en) * | 2001-03-07 | 2002-09-12 | Ling-Zhong Liu | Distributed event notification service |
US6971043B2 (en) * | 2001-04-11 | 2005-11-29 | Stratus Technologies Bermuda Ltd | Apparatus and method for accessing a mass storage device in a fault-tolerant server |
US6954877B2 (en) * | 2001-11-29 | 2005-10-11 | Agami Systems, Inc. | Fault tolerance using logical checkpointing in computing systems |
US7478275B1 (en) * | 2004-03-29 | 2009-01-13 | Symantec Operating Corporation | Method and apparatus for performing backup storage of checkpoint data within a server cluster |
US7644050B2 (en) * | 2004-12-02 | 2010-01-05 | International Business Machines Corporation | Method and apparatus for annotation-based behavior extensions |
FR2882448B1 * | 2005-01-21 | 2007-05-04 | Meiosys Soc Par Actions Simpli | METHOD OF MANAGING, JOURNALIZING OR REPLAYING THE PROGRESS OF AN APPLICATION PROCESS |
FR2881306B1 (en) * | 2005-01-21 | 2007-03-23 | Meiosys Soc Par Actions Simpli | METHOD FOR NON-INTRUSIVE JOURNALIZATION OF EXTERNAL EVENTS IN AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD |
FR2881309B1 (en) * | 2005-01-21 | 2007-03-23 | Meiosys Soc Par Actions Simpli | METHOD FOR OPTIMIZING THE TRANSMISSION OF JOURNALIZATION DATA IN MULTI-COMPUTER ENVIRONMENT AND SYSTEM IMPLEMENTING SAID METHOD |
FR2881308B1 (en) * | 2005-01-21 | 2007-03-23 | Meiosys Soc Par Actions Simpli | METHOD OF ACCELERATING THE TRANSMISSION OF JOURNALIZATION DATA IN A MULTI-COMPUTER ENVIRONMENT AND SYSTEM USING THE SAME |
FR2882449A1 | 2005-01-21 | 2006-08-25 | Meiosys Soc Par Actions Simpli | NON-INTRUSIVE METHOD OF REPLAYING INTERNAL EVENTS WITHIN AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD |
FR2881307B1 * | 2005-01-21 | 2007-03-23 | Meiosys Soc Par Actions Simpli | NON-INTRUSIVE METHOD OF SIMULATION OR REPLAY OF EXTERNAL EVENTS FROM AN APPLICATION PROCESS, AND SYSTEM IMPLEMENTING SAID METHOD |
FR2881246B1 * | 2005-01-21 | 2007-03-23 | Meiosys Soc Par Actions Simpli | IMPROVED PROCESS FOR MANAGING, JOURNALIZING OR REPLAYING NON-DETERMINISTIC OPERATIONS IN THE CONDUCT OF AN APPLICATION PROCESS |
US20060222125A1 (en) * | 2005-03-31 | 2006-10-05 | Edwards John W Jr | Systems and methods for maintaining synchronicity during signal transmission |
US20060222126A1 (en) * | 2005-03-31 | 2006-10-05 | Stratus Technologies Bermuda Ltd. | Systems and methods for maintaining synchronicity during signal transmission |
US8041985B2 (en) | 2006-08-11 | 2011-10-18 | Chicago Mercantile Exchange, Inc. | Match server for a financial exchange having fault tolerant operation |
US7480827B2 (en) | 2006-08-11 | 2009-01-20 | Chicago Mercantile Exchange | Fault tolerance and failover using active copy-cat |
US7434096B2 (en) | 2006-08-11 | 2008-10-07 | Chicago Mercantile Exchange | Match server for a financial exchange having fault tolerant operation |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4347563A (en) * | 1980-06-16 | 1982-08-31 | Forney Engineering Company | Industrial control system |
US4485438A (en) * | 1982-06-28 | 1984-11-27 | Myrmo Erik R | High transfer rate between multi-processor units |
US4590554A (en) * | 1982-11-23 | 1986-05-20 | Parallel Computers Systems, Inc. | Backup fault tolerant computer system |
US4493035A (en) * | 1982-12-07 | 1985-01-08 | Motorola, Inc. | Data processor version validation |
US4524415A (en) * | 1982-12-07 | 1985-06-18 | Motorola, Inc. | Virtual machine data processor |
US4562538A (en) * | 1983-05-16 | 1985-12-31 | At&T Bell Laboratories | Microprocessor having decision pointer to process restore position |
DE3566314D1 (en) * | 1984-04-26 | 1988-12-22 | Bbc Brown Boveri & Cie | Apparatus for saving a calculator status |
SE454730B (en) * | 1986-09-19 | 1988-05-24 | Asea Ab | PROCEDURE AND COMPUTER EQUIPMENT FOR SHORT-FREE REPLACEMENT OF THE ACTIVITY FROM ACTIVE DEVICES TO EMERGENCY UNITS IN A CENTRAL UNIT |
DE69021712T2 (en) * | 1990-02-08 | 1996-04-18 | Ibm | Restart marking mechanism for fault tolerant systems. |
US5271013A (en) * | 1990-05-09 | 1993-12-14 | Unisys Corporation | Fault tolerant computer system |
US5032979A (en) * | 1990-06-22 | 1991-07-16 | International Business Machines Corporation | Distributed security auditing subsystem for an operating system |
US5175847A (en) * | 1990-09-20 | 1992-12-29 | Logicon Incorporated | Computer system capable of program execution recovery |
1992
- 1992-01-22 US: application US07/824,134, patent US5363503A, not active (Expired - Lifetime)
1993
- 1993-01-22 EP: application EP93903657A, patent EP0623230B1, not active (Expired - Lifetime)
- 1993-01-22 DE: application DE69311797T, patent DE69311797T2, not active (Expired - Fee Related)
- 1993-01-22 WO: application PCT/US1993/000618, patent WO1993015461A1, active (IP Right Grant)
- 1993-01-22 JP: application JP51335893A, patent JP3209748B2, not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
JPH07503334A | 1995-04-06 |
JP3209748B2 (en) | 2001-09-17 |
US5363503A (en) | 1994-11-08 |
WO1993015461A1 (en) | 1993-08-05 |
EP0623230B1 (en) | 1997-06-25 |
DE69311797T2 (en) | 1998-02-05 |
EP0623230A1 (en) | 1994-11-09 |
Similar Documents
Publication | Title |
---|---|
DE69311797D1 | FAULT-TOLERANT COMPUTER SYSTEM WITH DEVICE FOR PROCESSING EXTERNAL EVENTS |
DE69122713D1 | FAULT-TOLERANT COMPUTER SYSTEM |
US9176823B2 | Data transfer and recovery process |
US8145945B2 | Packet mirroring between primary and secondary virtualized software images for improved system failover performance |
DE69021712D1 | Restart marking mechanism for fault tolerant systems. |
CA2339783A1 | Fault tolerant computer system |
CA2150059A1 | Progressive Retry Method and Apparatus Having Reusable Software Modules for Software Failure Recovery in Multi-Process Message-Passing Applications |
WO2002050678A8 | Method of 'split-brain' prevention in computer cluster systems |
SE9504396D0 | Processor redundancy in a distributed system |
US5343480A | System for detecting loss of message |
JPS61234690A | Restart processing method for trouble of exchange |
KR100298319B1 | Redundancy Device in Communication System |
KR930010952B1 | Memory error correction method |
WO2017014793A1 | Preserving volatile memory across a computer system disruption |
JPH0268634A | Spare system for electronic computer |
Appel et al. | Implications of fault management and replica determinism on the real-time execution scheme of VOTRICS |
KR100407689B1 | Time synchronization method after standby loading in ATM switch |
Rabson et al. | Feasibility of the fail-silent property in distributed computer control systems |
JPS62224560A | Method of tracing virtual tracking |
CHETTO | Fault-tolerant scheduling in distributed critical real-time systems |
JPH0588926A | Automatic switching circuit for monitor and control system |
JPS575160A | Multiple computer system |
JPH01311627A | Line backup system |
JPS59100997A | Abnormality alarm system |
JPH03158037A | Fault restoration system |
Legal Events
Code | Title |
---|---|
8364 | No opposition during term of opposition |
8339 | Ceased/non-payment of the annual fee |