DE69021712T2 - Checkpointing mechanism for fault-tolerant systems. - Google Patents

Checkpointing mechanism for fault-tolerant systems.

Info

Publication number
DE69021712T2
Authority
DE
Germany
Prior art keywords
memory
processor
active
processors
backup
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
DE69021712T
Other languages
English (en)
Other versions
DE69021712D1 (de)
Inventor
Haissam Alaiwan
Claude Basso
Jean Calvignac
Jacques Combes
Francois Kermarec
Andre Pauporte
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of DE69021712D1
Application granted
Publication of DE69021712T2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2028 Failover techniques eliminating a faulty processor or activating a spare
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2043 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share a common memory address space

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Retry When Errors Occur (AREA)
  • Multi Processors (AREA)
DE69021712T 1990-02-08 1990-02-08 Checkpointing mechanism for fault-tolerant systems. Expired - Fee Related DE69021712T2 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP90480021A EP0441087B1 (de) 1990-02-08 1990-02-08 Checkpointing mechanism for fault-tolerant systems

Publications (2)

Publication Number Publication Date
DE69021712D1 (de) 1995-09-21
DE69021712T2 (de) 1996-04-18

Family

ID=8205827

Family Applications (1)

Application Number Title Priority Date Filing Date
DE69021712T Expired - Fee Related DE69021712T2 (de) 1990-02-08 1990-02-08 Checkpointing mechanism for fault-tolerant systems.

Country Status (4)

Country Link
US (1) US5235700A (de)
EP (1) EP0441087B1 (de)
JP (1) JP2505928B2 (de)
DE (1) DE69021712T2 (de)

Families Citing this family (92)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0410041A (ja) * 1990-04-27 1992-01-14 Toshiba Corp Data saving system
US5157663A (en) * 1990-09-24 1992-10-20 Novell, Inc. Fault tolerant computer system
EP0529303A3 (en) * 1991-08-29 1993-09-22 International Business Machines Corporation Checkpoint synchronization with instruction overlap enabled
US5815651A (en) * 1991-10-17 1998-09-29 Digital Equipment Corporation Method and apparatus for CPU failure recovery in symmetric multi-processing systems
WO1993009494A1 (en) * 1991-10-28 1993-05-13 Digital Equipment Corporation Fault-tolerant computer processing using a shadow virtual processor
US5363503A (en) * 1992-01-22 1994-11-08 Unisys Corporation Fault tolerant computer system with provision for handling external events
US5394542A (en) * 1992-03-30 1995-02-28 International Business Machines Corporation Clearing data objects used to maintain state information for shared data at a local complex when at least one message path to the local complex cannot be recovered
US5485585A (en) * 1992-09-18 1996-01-16 International Business Machines Corporation Personal computer with alternate system controller and register for identifying active system controller
US6237108B1 (en) * 1992-10-09 2001-05-22 Fujitsu Limited Multiprocessor system having redundant shared memory configuration
US5574849A (en) * 1992-12-17 1996-11-12 Tandem Computers Incorporated Synchronized data transmission between elements of a processing system
US5664195A (en) * 1993-04-07 1997-09-02 Sequoia Systems, Inc. Method and apparatus for dynamic installation of a driver on a computer system
JPH0713838A (ja) * 1993-06-14 1995-01-17 Internatl Business Mach Corp <Ibm> Error recovery method and apparatus
AU7211594A (en) * 1993-07-20 1995-02-20 Vinca Corporation Method for rapid recovery from a network file server failure
JPH07262034A (ja) * 1994-03-18 1995-10-13 Fujitsu Ltd Data handover system
JPH10506483A (ja) * 1994-06-10 1998-06-23 Texas Micro Inc Main memory system and checkpointing protocol for fault-tolerant computer system
US5566297A (en) * 1994-06-16 1996-10-15 International Business Machines Corporation Non-disruptive recovery from file server failure in a highly available file system for clustered computing environments
GB2290891B (en) * 1994-06-29 1999-02-17 Mitsubishi Electric Corp Multiprocessor system
US5557735A (en) * 1994-07-21 1996-09-17 Motorola, Inc. Communication system for a network and method for configuring a controller in a communication network
US5692120A (en) * 1994-08-08 1997-11-25 International Business Machines Corporation Failure recovery apparatus and method for distributed processing shared resource control
US5835953A (en) 1994-10-13 1998-11-10 Vinca Corporation Backup system that takes a snapshot of the locations in a mass storage device that has been identified for updating prior to updating
US5649152A (en) 1994-10-13 1997-07-15 Vinca Corporation Method and system for providing a static snapshot of data stored on a mass storage system
US5530946A (en) * 1994-10-28 1996-06-25 Dell Usa, L.P. Processor failure detection and recovery circuit in a dual processor computer system and method of operation thereof
CA2167633A1 (en) * 1995-01-23 1996-07-24 Leonard R. Fishler Apparatus and method for efficient modularity in a parallel, fault tolerant, message based operating system
US5644742A (en) * 1995-02-14 1997-07-01 Hal Computer Systems, Inc. Processor structure and method for a time-out checkpoint
US6105148A (en) * 1995-06-16 2000-08-15 Lucent Technologies Inc. Persistent state checkpoint and restoration systems
WO1997000477A1 (en) * 1995-06-16 1997-01-03 Lucent Technologies Checkpoint and restoration systems for execution control
US6044475A (en) * 1995-06-16 2000-03-28 Lucent Technologies, Inc. Checkpoint and restoration systems for execution control
JP3481737B2 (ja) * 1995-08-07 2003-12-22 Fujitsu Ltd Dump collection device and dump collection method
US5867642A (en) * 1995-08-10 1999-02-02 Dell Usa, L.P. System and method to coherently and dynamically remap an at-risk memory area by simultaneously writing two memory areas
US5630047A (en) * 1995-09-12 1997-05-13 Lucent Technologies Inc. Method for software error recovery using consistent global checkpoints
US5708771A (en) * 1995-11-21 1998-01-13 Emc Corporation Fault tolerant controller system and method
US5737514A (en) * 1995-11-29 1998-04-07 Texas Micro, Inc. Remote checkpoint memory system and protocol for fault-tolerant computer system
US5864657A (en) * 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
GB2308040A (en) * 1995-12-09 1997-06-11 Northern Telecom Ltd Telecommunications system
US5978933A (en) * 1996-01-11 1999-11-02 Hewlett-Packard Company Generic fault tolerant platform
GB9601585D0 (en) * 1996-01-26 1996-03-27 Hewlett Packard Co Fault-tolerant processing method
GB9601584D0 (en) * 1996-01-26 1996-03-27 Hewlett Packard Co Fault-tolerant processing method
US5870537A (en) * 1996-03-13 1999-02-09 International Business Machines Corporation Concurrent switch to shadowed device for storage controller and device errors
JPH09251443A (ja) * 1996-03-18 1997-09-22 Hitachi Ltd Processor failure recovery processing method for information processing system
US5724501A (en) * 1996-03-29 1998-03-03 Emc Corporation Quick recovery of write cache in a fault tolerant I/O system
US5828889A (en) * 1996-05-31 1998-10-27 Sun Microsystems, Inc. Quorum mechanism in a two-node distributed computer system
US5883939A (en) * 1996-08-29 1999-03-16 Cornell Research Foundation, Inc. Distributed architecture for an intelligent networking coprocessor
US6393581B1 (en) 1996-08-29 2002-05-21 Cornell Research Foundation, Inc. Reliable time delay-constrained cluster computing
KR19980024086A (ko) * 1996-09-03 1998-07-06 Nishimuro Taizo Computer system and file management method
TW379298B (en) * 1996-09-30 2000-01-11 Toshiba Corp Memory updating history saving device and memory updating history saving method
US5933474A (en) * 1996-12-24 1999-08-03 Lucent Technologies Inc. Telecommunications call preservation in the presence of control failure and high processing load
JPH10326220A (ja) * 1997-05-27 1998-12-08 Toshiba Corp File system and file management method
US5948108A (en) * 1997-06-12 1999-09-07 Tandem Computers, Incorporated Method and system for providing fault tolerant access between clients and a server
US5995960A (en) * 1998-04-08 1999-11-30 International Business Machines Corporation Method and system for improving efficiency of programs utilizing databases by executing scenarios based on recalled processed information
US6223304B1 (en) 1998-06-18 2001-04-24 Telefonaktiebolaget Lm Ericsson (Publ) Synchronization of processors in a fault tolerant multi-processor system
DE19836347C2 (de) 1998-08-11 2001-11-15 Ericsson Telefon Ab L M Fault-tolerant computer system
US6401216B1 (en) 1998-10-29 2002-06-04 International Business Machines Corporation System of performing checkpoint/restart of a parallel program
US6338147B1 (en) * 1998-10-29 2002-01-08 International Business Machines Corporation Program products for performing checkpoint/restart of a parallel program
US6393583B1 (en) 1998-10-29 2002-05-21 International Business Machines Corporation Method of performing checkpoint/restart of a parallel program
US6460133B1 (en) * 1999-05-20 2002-10-01 International Business Machines Corporation Queue resource tracking in a multiprocessor system
US6622263B1 (en) 1999-06-30 2003-09-16 Jack Justin Stiffler Method and apparatus for achieving system-directed checkpointing without specialized hardware assistance
US6766381B1 (en) 1999-08-27 2004-07-20 International Business Machines Corporation VLSI network processor and methods
US6687849B1 (en) * 2000-06-30 2004-02-03 Cisco Technology, Inc. Method and apparatus for implementing fault-tolerant processing without duplicating working process
US7065098B2 (en) 2001-01-19 2006-06-20 Raze Technologies, Inc. Redundant telecommunication system using memory equalization apparatus and method of operation
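JP4394298B2 (ja) * 2001-02-20 2010-01-06 NEC Corp Multiprocessor system, shared memory control method therefor, and shared memory control program
JP4258986B2 (ja) * 2001-03-16 2009-04-30 Oki Electric Industry Co Ltd Control method for communication control device and control program for network management system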
US7055056B2 (en) * 2001-11-21 2006-05-30 Hewlett-Packard Development Company, L.P. System and method for ensuring the availability of a storage system
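KR100441712B1 (ko) * 2001-12-29 2004-07-27 LG Electronics Inc Scalable multiprocessing system and memory replication method thereof
JP4165747B2 (ja) * 2003-03-20 2008-10-15 Hitachi Ltd Storage system, control device, and program for control device
CN1292346C (zh) * 2003-09-12 2006-12-27 International Business Machines Corp System and method for executing jobs in a distributed computing architecture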
US7315951B2 (en) * 2003-10-27 2008-01-01 Nortel Networks Corporation High speed non-volatile electronic memory configuration
US7440553B2 (en) * 2004-02-04 2008-10-21 Samsung Electronics Co., Ltd. Apparatus and method for checkpointing a half-call model in redundant call application nodes
JP4555713B2 (ja) 2005-03-17 2010-10-06 Fujitsu Ltd Error notification method and information processing apparatus
US7478278B2 (en) * 2005-04-14 2009-01-13 International Business Machines Corporation Template based parallel checkpointing in a massively parallel computer system
JP4831599B2 (ja) * 2005-06-28 2011-12-07 Renesas Electronics Corp Processing device
US20070028144A1 (en) * 2005-07-29 2007-02-01 Stratus Technologies Bermuda Ltd. Systems and methods for checkpointing
US7761426B2 (en) 2005-12-07 2010-07-20 International Business Machines Corporation Apparatus, system, and method for continuously protecting data
US20070180312A1 (en) * 2006-02-01 2007-08-02 Avaya Technology Llc Software duplication
CN101193092A (zh) * 2006-11-29 2008-06-04 Hongfujin Precision Industry (Shenzhen) Co Ltd Network device and data synchronization transmission method thereof
US20080148095A1 (en) * 2006-12-14 2008-06-19 Motorola, Inc. Automated memory recovery in a zero copy messaging system
US8296768B2 (en) * 2007-06-30 2012-10-23 Intel Corporation Method and apparatus to enable runtime processor migration with operating system assistance
US8799213B2 (en) * 2007-07-31 2014-08-05 Oracle International Corporation Combining capture and apply in a distributed information sharing system
US7801852B2 (en) * 2007-07-31 2010-09-21 Oracle International Corporation Checkpoint-free in log mining for distributed information sharing
JP5392594B2 (ja) * 2008-03-05 2014-01-22 NEC Corp Virtual machine redundancy system, computer system, virtual machine redundancy method, and program
JP5288185B2 (ja) * 2009-01-07 2013-09-11 NEC Corp Network interface, computer system, operation method thereof, and program
US9230002B2 (en) 2009-01-30 2016-01-05 Oracle International Corporation High performant information sharing and replication for single-publisher and multiple-subscriber configuration
US20100229029A1 (en) * 2009-03-06 2010-09-09 Frazier Ii Robert Claude Independent and dynamic checkpointing system and method
US20110154133A1 (en) * 2009-12-22 2011-06-23 International Business Machines Corporation Techniques for enhancing firmware-assisted system dump in a virtualized computer system employing active memory sharing
US8799710B2 (en) * 2012-06-28 2014-08-05 International Business Machines Corporation 3-D stacked multiprocessor structures and methods to enable reliable operation of processors at speeds above specified limits
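JP2014010739A (ja) 2012-07-02 2014-01-20 Fujitsu Ltd Information processing method, information processing program, and information processing apparatus for restoring system state
KR101638437B1 (ko) 2012-09-25 2016-07-12 Electronics and Telecommunications Research Institute Operating method of fault-tolerant processing system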
US9069701B2 (en) 2012-12-11 2015-06-30 International Business Machines Corporation Virtual machine failover
US9251002B2 (en) 2013-01-15 2016-02-02 Stratus Technologies Bermuda Ltd. System and method for writing checkpointing data
US9563452B2 (en) 2013-06-28 2017-02-07 Sap Se Cloud-enabled, distributed and high-availability system with virtual machine checkpointing
ES2652262T3 (es) 2013-12-30 2018-02-01 Stratus Technologies Bermuda Ltd. Method of delaying checkpoints by inspecting network packets
JP6518672B2 (ja) 2013-12-30 2019-05-22 Stratus Technologies Bermuda Ltd. Dynamic checkpointing systems and methods
EP3090336A1 (de) 2013-12-30 2016-11-09 Paul A. Leveille Checkpointing systems and methods of using data forwarding

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57137949A (en) * 1981-02-18 1982-08-25 Nec Corp Error recovery system of logical device
JPS5860359A (ja) * 1981-10-06 1983-04-09 Toshiba Corp Composite computer system
US4590554A (en) * 1982-11-23 1986-05-20 Parallel Computers Systems, Inc. Backup fault tolerant computer system
US5155678A (en) * 1985-10-29 1992-10-13 International Business Machines Corporation Data availability in restartable data base system
SE454730B (sv) * 1986-09-19 1988-05-24 Asea Ab Method and computer equipment for bumpless switching of function from active units to standby units in a central unit
JP2514208B2 (ja) * 1987-07-15 1996-07-10 Fujitsu Ltd Hot standby memory copy system

Also Published As

Publication number Publication date
EP0441087B1 (de) 1995-08-16
DE69021712D1 (de) 1995-09-21
JPH04213736A (ja) 1992-08-04
US5235700A (en) 1993-08-10
EP0441087A1 (de) 1991-08-14
JP2505928B2 (ja) 1996-06-12

Similar Documents

Publication Publication Date Title
DE69021712D1 (de) Checkpointing mechanism for fault-tolerant systems.
DE69311797D1 (de) Fault-tolerant computer system with provision for handling external events
DE69122713D1 (de) Fault-tolerant computer system
EP0864126B1 (de) Remote checkpoint memory system and method for a fault-tolerant system
DE3650651T2 (de) Fault-tolerant data processing system
US5745672A (en) Main memory system and checkpointing protocol for a fault-tolerant computer system using a read buffer
US4965717B1 (de)
EP0363863A2 (de) Method and apparatus for restart after a fault in a digital computer system
EP0433979A3 (en) Fault-tolerant computer system with/config filesystem
EP0881569B1 (de) File system and file management method which realize distributed replication in a system with shared RAID
KR100298319B1 (ko) Redundancy apparatus in a communication system
Damani et al. Optimistic distributed simulation based on transitive dependency tracking
KR930010952B1 (ko) Memory fault handling method
Nett et al. Dynamic actions: a flexible model integrating fault-tolerant and real-time requirements of distributed systems
JPH05108388A (ja) Process recovery system
JPH0323298B2 (de)
SHEPARD et al. Fault tolerance for real-time multiprocessor operating systems
JPH0713792A (ja) Error control system in hot standby system
HINKEY et al. A fault recovery mechanism using logical bus addressing
JPS60251443A (ja) Backup device for programmable controller
JPS52149046A (en) Data automatic recovery system at computer abnormality time
BANATRE et al. The Fault Tolerant Multiprocessor (FTM) (Presentation du Fault Tolerant Multiprocesseur /FTM/)
JPH04256134A (ja) Duplex system recovery system
JPS58112120A (ja) Initial program load system
JPS5794863A (en) Collecting system of fault information

Legal Events

Date Code Title Description
8364 No opposition during term of opposition
8339 Ceased/non-payment of the annual fee