TW366447B - Fault tolerant computer system - Google Patents

Fault tolerant computer system

Info

Publication number
TW366447B
Authority
TW
Taiwan
Prior art keywords
cpus
computer
voted
memory
vote
Application number
TW087104963A
Other languages
English (en)
Inventor
Andrew J Wardrop
Original Assignee
Gen Dynamics Inf Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1997-04-02
Filing date
1998-04-02
Publication date
1999-08-11
Application filed by Gen Dynamics Inf Systems Inc

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1658 Data re-synchronization of a redundant component, or initial sync of replacement, additional or spare unit
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B9/00 Safety arrangements
    • G05B9/02 Safety arrangements electric
    • G05B9/03 Safety arrangements electric with multiple-channel loop, i.e. redundant control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/1675 Temporal synchronisation or re-synchronisation of redundant processing components
    • G06F11/1679 Temporal synchronisation or re-synchronisation of redundant processing components at clock signal level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/181 Eliminating the failing redundant component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/183 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits by voting, the voting not being performed by the redundant components
    • G06F11/184 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits by voting, the voting not being performed by the redundant components where the redundant components implement processing functionality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits
    • G06F11/187 Voting techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2038 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with a single idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2043 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant where the redundant components share a common memory address space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Automation & Control Theory (AREA)
  • Computer Hardware Design (AREA)
  • Hardware Redundancy (AREA)

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
US08/832,479 US5903717A (en) 1997-04-02 1997-04-02 Fault tolerant computer system

Publications (1)

Publication Number Publication Date
TW366447B (en) 1999-08-11

Family

ID=25261774

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
TW087104963A TW366447B (en) 1997-04-02 1998-04-02 Fault tolerant computer system

Country Status (8)

Country Link
US (1) US5903717A (zh)
EP (1) EP0972244A4 (zh)
JP (1) JP2001505338A (zh)
KR (1) KR20010005956A (zh)
CN (1) CN1259212A (zh)
IL (1) IL132075A0 (zh)
TW (1) TW366447B (zh)
WO (1) WO1998044416A1 (zh)

Families Citing this family (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6141769A (en) * 1996-05-16 2000-10-31 Resilience Corporation Triple modular redundant computer system and associated method
US7714778B2 (en) 1997-08-20 2010-05-11 Tracbeam Llc Wireless location gateway and applications therefor
US5923830A (en) * 1997-05-07 1999-07-13 General Dynamics Information Systems, Inc. Non-interrupting power control for fault tolerant computer systems
US6044487A (en) * 1997-12-16 2000-03-28 International Business Machines Corporation Majority voting scheme for hard error sites
US6412082B1 (en) * 1997-12-17 2002-06-25 Sony Corporation Method and apparatus for selecting computer programs based on an error detection mechanism
US6085350A (en) * 1998-03-04 2000-07-04 Motorola, Inc. Single event upset tolerant system and method
DE19831720A1 (de) * 1998-07-15 2000-01-20 Alcatel Sa Method for determining a uniform global view of the system state of a distributed computer network
JP3293125B2 (ja) * 1998-07-24 2002-06-17 日本電気株式会社 Initialization and diagnosis scheme in an on-chip multiprocessor system
EP1157324A4 (en) * 1998-12-18 2009-06-17 Triconex Corp PROCESS AND DEVICE FOR PROCESSING CONTROL USING A MULTIPLE REDUNDANT PROCESS CONTROL SYSTEM
US6801951B1 (en) * 1999-10-08 2004-10-05 Honeywell International Inc. System and method for fault-tolerant clock synchronization using interactive convergence
US6732300B1 (en) * 2000-02-18 2004-05-04 Lev Freydel Hybrid triple redundant computer system
US6687851B1 (en) * 2000-04-13 2004-02-03 Stratus Technologies Bermuda Ltd. Method and system for upgrading fault-tolerant systems
US6820213B1 (en) 2000-04-13 2004-11-16 Stratus Technologies Bermuda, Ltd. Fault-tolerant computer system with voter delay buffer
US20010042202A1 (en) * 2000-04-14 2001-11-15 Horvath Charles J. Dynamically extendible firewall
US6504411B2 (en) 2000-11-02 2003-01-07 Intersil Americas Inc. Redundant latch circuit and associated methods
US6563347B2 (en) 2000-11-20 2003-05-13 Intersil Americas Inc. Redundant comparator design for improved offset voltage and single event effects hardness
US6525590B2 (en) 2001-02-01 2003-02-25 Intersil Americas Inc. Spatially redundant and complementary semiconductor device-based, single event transient-resistant linear amplifier circuit architecture
EP1239369A1 (de) 2001-03-07 2002-09-11 Siemens Aktiengesellschaft Fault-tolerant computer arrangement and method for operating such an arrangement
US7065672B2 (en) * 2001-03-28 2006-06-20 Stratus Technologies Bermuda Ltd. Apparatus and methods for fault-tolerant computing using a switching fabric
US6862693B2 (en) * 2001-04-13 2005-03-01 Sun Microsystems, Inc. Providing fault-tolerance by comparing addresses and data from redundant processors running in lock-step
US6952756B1 (en) * 2001-05-08 2005-10-04 Lewiz Communications Method and apparatus for speculative loading of a memory
KR100402757B1 (ko) * 2001-08-22 2003-10-22 한국전자통신연구원 Apparatus and method for error checking of a system board
KR100449232B1 (ko) * 2001-12-04 2004-09-18 한국전기연구원 Pulse voting method for a redundant controller
JP2003316599A (ja) * 2002-02-22 2003-11-07 Seiko Epson Corp Integrated circuit
JP2004046455A (ja) * 2002-07-10 2004-02-12 Nec Corp Information processing apparatus
JP3774826B2 (ja) * 2002-07-11 2006-05-17 日本電気株式会社 Information processing apparatus
JP3982353B2 (ja) * 2002-07-12 2007-09-26 日本電気株式会社 Fault-tolerant computer apparatus, resynchronization method therefor, and resynchronization program
US7260742B2 (en) * 2003-01-28 2007-08-21 Czajkowski David R SEU and SEFI fault tolerant computer
US7467326B2 (en) * 2003-02-28 2008-12-16 Maxwell Technologies, Inc. Self-correcting computer
JP3737810B2 (ja) * 2003-05-09 2006-01-25 株式会社東芝 Computer system and faulty-computer substitution control program
US7134104B2 (en) * 2003-12-05 2006-11-07 International Business Machines Corporation Method of selectively building redundant logic structures to improve fault tolerance
JP3808874B2 (ja) * 2004-03-12 2006-08-16 東芝ソリューション株式会社 Distributed system and multiplexing control method
JP4452533B2 (ja) * 2004-03-19 2010-04-21 株式会社日立製作所 System and storage device system
US20050240806A1 (en) * 2004-03-30 2005-10-27 Hewlett-Packard Development Company, L.P. Diagnostic memory dump method in a redundant processor
US20050273653A1 (en) * 2004-05-19 2005-12-08 Honeywell International Inc. Single fault tolerance in an architecture with redundant systems
US7392426B2 (en) * 2004-06-15 2008-06-24 Honeywell International Inc. Redundant processing architecture for single fault tolerance
US7684654B2 (en) * 2004-06-30 2010-03-23 General Electric Company System and method for fault detection and recovery in a medical imaging system
TWI306241B (en) * 2004-07-12 2009-02-11 Infortrend Technology Inc A controller capable of self-monitoring, a redundant storage system having the same, and its method
US7308605B2 (en) * 2004-07-20 2007-12-11 Hewlett-Packard Development Company, L.P. Latent error detection
US7404105B2 (en) * 2004-08-16 2008-07-22 International Business Machines Corporation High availability multi-processor system
US7328371B1 (en) * 2004-10-15 2008-02-05 Advanced Micro Devices, Inc. Core redundancy in a chip multiprocessor for highly reliable systems
US7971095B2 (en) 2005-02-16 2011-06-28 Honeywell International Inc. Fault recovery for real-time, multi-tasking computer system
US20060190700A1 (en) * 2005-02-22 2006-08-24 International Business Machines Corporation Handling permanent and transient errors using a SIMD unit
US20060212677A1 (en) * 2005-03-15 2006-09-21 Intel Corporation Multicore processor having active and inactive execution cores
US20060236168A1 (en) * 2005-04-01 2006-10-19 Honeywell International Inc. System and method for dynamically optimizing performance and reliability of redundant processing systems
WO2007018651A1 (en) * 2005-08-05 2007-02-15 Honeywell International, Inc. Method for redunancy management of distributed and recoverable digital control system
WO2007094808A1 (en) * 2005-08-05 2007-08-23 Honeywell International Inc. Monitoring system and methods for a distributed and recoverable digital control system
US7725215B2 (en) * 2005-08-05 2010-05-25 Honeywell International Inc. Distributed and recoverable digital control system
DE102005037236A1 (de) * 2005-08-08 2007-02-15 Robert Bosch Gmbh Device and method for configuring a semiconductor circuit
US7502957B2 (en) * 2005-09-09 2009-03-10 International Business Machines Corporation Method and system to execute recovery in non-homogeneous multi processor environments
US7421601B2 (en) 2006-02-17 2008-09-02 International Business Machines Corporation Method and system for controlling power in a chip through a power-performance monitor and control unit
US20070220369A1 (en) * 2006-02-21 2007-09-20 International Business Machines Corporation Fault isolation and availability mechanism for multi-processor system
US7587663B2 (en) * 2006-05-22 2009-09-08 Intel Corporation Fault detection using redundant virtual machines
US7793147B2 (en) * 2006-07-18 2010-09-07 Honeywell International Inc. Methods and systems for providing reconfigurable and recoverable computing resources
US7898937B2 (en) * 2006-12-06 2011-03-01 Cisco Technology, Inc. Voting to establish a new network master device after a network failover
US8412981B2 (en) * 2006-12-29 2013-04-02 Intel Corporation Core sparing on multi-core platforms
US7797575B2 (en) * 2007-04-04 2010-09-14 International Business Machines Corporation Triple voting cell processors for single event upset protection
JP2009238068A (ja) * 2008-03-28 2009-10-15 Fujitsu Ltd Communication control device and communication control method
JP5195913B2 (ja) * 2008-07-22 2013-05-15 トヨタ自動車株式会社 Multi-core system, vehicle electronic control unit, and task switching method
US8090984B2 (en) * 2008-12-10 2012-01-03 Freescale Semiconductor, Inc. Error detection and communication of an error location in multi-processor data processing system having processors operating in Lockstep
CN101478428B (zh) * 2009-01-20 2011-03-23 北京全路通信信号研究设计院 Hardware-software cooperative Ethernet fail-safe communication system and data transmission method
US8082425B2 (en) * 2009-04-29 2011-12-20 Advanced Micro Devices, Inc. Reliable execution using compare and transfer instruction on an SMT machine
US8566633B2 (en) * 2011-02-10 2013-10-22 GM Global Technology Operations LLC Method of dynamic allocation on a statically allocated and embedded software architecture
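JP5699057B2 (ja) * 2011-08-24 2015-04-08 株式会社日立製作所 Programmable device, method for reconfiguring a programmable device, and electronic device
CN106502624B (zh) * 2011-11-30 2019-10-18 英特尔公司 Processor, apparatus, and processing system for providing vector horizontal majority voting functionality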
US8954794B2 (en) * 2012-06-05 2015-02-10 Infineon Technologies Ag Method and system for detection of latent faults in microcontrollers
US9146882B2 (en) * 2013-02-04 2015-09-29 International Business Machines Corporation Securing the contents of a memory device
US9274909B2 (en) * 2013-08-23 2016-03-01 Scaleo Chip Method and apparatus for error management of an integrated circuit system
CN104765587B (zh) 2014-01-08 2018-12-14 雅特生嵌入式计算有限公司 System and method for synchronizing processors to the same computation point
KR101764680B1 (ko) * 2014-12-29 2017-08-03 주식회사 효성 Redundant control system
US10956265B2 (en) 2015-02-03 2021-03-23 Hamilton Sundstrand Corporation Method of performing single event upset testing
JP2017021617A (ja) * 2015-07-13 2017-01-26 株式会社東芝 Multiplexed control device
DE102015218898A1 (de) * 2015-09-30 2017-03-30 Robert Bosch Gmbh Method for the redundant processing of data
CN105398472B (zh) * 2015-11-06 2017-08-11 湖南中车时代通信信号有限公司 Platform host plug-in unit
WO2017094162A1 (ja) * 2015-12-03 2017-06-08 三菱電機株式会社 Multiplex system
US10481963B1 (en) * 2016-06-29 2019-11-19 Amazon Technologies, Inc. Load-balancing for achieving transaction fault tolerance
CN106648998A (zh) * 2016-12-23 2017-05-10 北京交通大学 Safety computer system based on a CMC chip
EP3531286B1 (en) * 2018-02-26 2020-08-05 ARM Limited Circuitry
CN109933374B (zh) * 2019-01-23 2021-10-22 西安微电子技术研究所 Computer startup method
CN109828449A (zh) * 2019-01-25 2019-05-31 杭州电子科技大学 Triple modular redundancy control computation voting system and method
US11868143B2 (en) 2019-04-25 2024-01-09 Aerovironment, Inc. Methods of climb and glide operations of a high altitude long endurance aircraft
SG11202111384PA (en) * 2019-04-25 2021-11-29 Aerovironment Inc Systems and methods for distributed control computing for a high altitude long endurance aircraft
WO2020223114A2 (en) 2019-04-25 2020-11-05 Aero Vironment, Inc. Off-center parachute flight termination system (fts)
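CN111431651B (zh) * 2020-03-04 2021-12-07 上海航天控制技术研究所 Multi-computer synchronous operation and time alignment method suitable for Mars exploration
CN111651118B (zh) * 2020-04-27 2023-11-21 中国科学院微电子研究所 Memory system, control method, and control device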
WO2021101643A2 (en) * 2020-10-16 2021-05-27 Futurewei Technologies, Inc. Cpu-gpu lockstep system
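CN112558461B (zh) * 2021-02-25 2021-05-14 四川腾盾科技有限公司 Output signal voting method for a multi-redundancy unmanned aerial vehicle aircraft management computer
JP2024009696A (ja) * 2022-07-11 2024-01-23 横河電機株式会社 Control device, control system, control method, and control program
CN117215177A (zh) * 2023-11-09 2023-12-12 北京控制工程研究所 Integrated space-ground round-trip control system and control method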

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1373014A (en) * 1972-03-23 1974-11-06 Marconi Co Ltd Processor security arrangements
US4228496A (en) * 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US5280487A (en) * 1989-06-16 1994-01-18 Telefonaktiebolaget L M Ericsson Method and arrangement for detecting and localizing errors or faults in a multi-plane unit incorporated in a digital time switch
US5295258A (en) * 1989-12-22 1994-03-15 Tandem Computers Incorporated Fault-tolerant computer system with online recovery and reintegration of redundant components
DE69231452T2 (de) * 1991-01-25 2001-05-03 Hitachi Ltd Fault-tolerant computer system with processing units each having at least three computer units
US5339404A (en) * 1991-05-28 1994-08-16 International Business Machines Corporation Asynchronous TMR processing system
US5233615A (en) * 1991-06-06 1993-08-03 Honeywell Inc. Interrupt driven, separately clocked, fault tolerant processor synchronization
US5349654A (en) * 1992-02-20 1994-09-20 The Boeing Company Fault tolerant data exchange unit
US5680408A (en) * 1994-12-28 1997-10-21 Intel Corporation Method and apparatus for determining a value of a majority of operands
US5742753A (en) * 1996-06-06 1998-04-21 The Boeing Company Mesh interconnected array in a fault-tolerant computer system

Also Published As

Publication number Publication date
KR20010005956A (ko) 2001-01-15
EP0972244A4 (en) 2000-11-15
WO1998044416A1 (en) 1998-10-08
EP0972244A1 (en) 2000-01-19
CN1259212A (zh) 2000-07-05
JP2001505338A (ja) 2001-04-17
US5903717A (en) 1999-05-11
IL132075A0 (en) 2001-03-19

Similar Documents

Publication Title
TW366447B (en) Fault tolerant computer system
CN100375050C (zh) On-chip mechanism for high-reliability processors
US6854070B2 (en) Hot-upgrade/hot-add memory
US6892271B2 (en) Memory module resync
US7320086B2 (en) Error indication in a raid memory system
US6785835B2 (en) Raid memory
US6766469B2 (en) Hot-replace of memory
US7890797B2 (en) Vehicle including a processor system having fault tolerance
US7613948B2 (en) Cache coherency during resynchronization of self-correcting computer
US5423024A (en) Fault tolerant processing section with dynamically reconfigurable voting
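JP2608904B2 (ja) Multiple redundant error detection system and method of using the same
CN101930052B (zh) Online detection and fault-tolerance system and method for SRAM-based FPGA digital sequential circuits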
US6981173B2 (en) Redundant memory sequence and fault isolation
EP0349539B1 (en) Method and apparatus for digital logic synchronism monitoring
US6981095B1 (en) Hot replace power control sequence logic
US20030208654A1 (en) Computer system architecture with hot pluggable main memory boards
EP0868692B1 (en) Processor independent error checking arrangement
US6567950B1 (en) Dynamically replacing a failed chip
US5115511A (en) Arrangement for loading the parameters into active modules in a computer system
JPH06161798A (ja) Information processing apparatus
JP3063334B2 (ja) High-reliability information processing apparatus
Lee et al. Design Of A Fault-Tolerant Microprocessor
JPH06168151A (ja) Duplexed computer system
JPS62160539A (ja) Multiplexing check system for central processing units