DE60004365D1 - System and method of monitoring a distributed fault tolerant computer system - Google Patents

System and method of monitoring a distributed fault tolerant computer system

Info

Publication number
DE60004365D1
Authority
DE
Germany
Prior art keywords
computer system
tolerant computer
monitoring
bus
management subsystem
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
DE60004365T
Other languages
English (en)
Other versions
DE60004365T2 (de)
Inventor
Hossein Moiin
Peter Martin Grant Dickinson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems Inc
Application granted
Publication of DE60004365D1
Publication of DE60004365T2
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2002 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant
    • G06F11/2007 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where interconnections or communication control functionality are redundant using redundant communication media
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/88 Monitoring involving counting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)
DE60004365T 1999-06-29 2000-05-16 System and method of monitoring a distributed fault tolerant computer system Expired - Fee Related DE60004365T2 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US343146 1999-06-29
US09/343,146 US6550017B1 (en) 1999-06-29 1999-06-29 System and method of monitoring a distributed fault tolerant computer system
PCT/US2000/013457 WO2001001246A1 (en) 1999-06-29 2000-05-16 System and method of monitoring a distributed fault tolerant computer system

Publications (2)

Publication Number Publication Date
DE60004365D1 (de) 2003-09-11
DE60004365T2 (de) 2004-06-24

Family

ID=23344889

Family Applications (1)

Application Number Title Priority Date Filing Date
DE60004365T Expired - Fee Related DE60004365T2 (de) 1999-06-29 2000-05-16 System and method of monitoring a distributed fault tolerant computer system

Country Status (4)

Country Link
US (1) US6550017B1 (de)
EP (1) EP1190320B1 (de)
DE (1) DE60004365T2 (de)
WO (1) WO2001001246A1 (de)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3880760B2 (ja) * 1999-12-28 2007-02-14 富士通株式会社 Switching control device, switching control method, and computer-readable recording medium storing a switching control program
US6708283B1 (en) 2000-04-13 2004-03-16 Stratus Technologies, Bermuda Ltd. System and method for operating a system with redundant peripheral bus controllers
US6633996B1 (en) * 2000-04-13 2003-10-14 Stratus Technologies Bermuda Ltd. Fault-tolerant maintenance bus architecture
US20040208842A1 (en) * 2001-09-18 2004-10-21 Ritchie Branson W. Antimicrobial cleansing compositions and methods of use
US6751683B1 (en) * 2000-09-29 2004-06-15 International Business Machines Corporation Method, system and program products for projecting the impact of configuration changes on controllers
US7263620B2 (en) * 2001-08-07 2007-08-28 Hewlett-Packard Development Company, L.P. System and method for graceful shutdown of host processor cards in a server system
DE10144070A1 (de) * 2001-09-07 2003-03-27 Philips Corp Intellectual Pty Communication network and method for controlling the communication network
US7565424B2 (en) * 2001-10-16 2009-07-21 International Business Machines Corporation Data processing system, method, and product for reporting loss of service application
US6766482B1 (en) 2001-10-31 2004-07-20 Extreme Networks Ethernet automatic protection switching
US7610366B2 (en) * 2001-11-06 2009-10-27 Canon Kabushiki Kaisha Dynamic network device reconfiguration
US20040153841A1 (en) * 2003-01-16 2004-08-05 Silicon Graphics, Inc. Failure hierarchy in a cluster filesystem
CN1784325A (zh) * 2003-05-06 2006-06-07 皇家飞利浦电子股份有限公司 Time slot sharing over different cycles in a TDMA bus
US7506215B1 (en) * 2003-12-09 2009-03-17 Unisys Corporation Method for health monitoring with predictive health service in a multiprocessor system
US7254750B1 (en) 2004-03-30 2007-08-07 Unisys Corporation Health trend analysis method on utilization of network resources
US7448012B1 (en) 2004-04-21 2008-11-04 Qi-De Qian Methods and system for improving integrated circuit layout
JP4353005B2 (ja) * 2004-06-29 2009-10-28 株式会社日立製作所 System switching method for a cluster-configured computer system
WO2006045773A2 (de) * 2004-10-25 2006-05-04 Robert Bosch Gmbh Device and method for mode switching in a computer system with at least two execution units
US7337350B2 (en) * 2005-02-09 2008-02-26 Hitachi, Ltd. Clustered storage system with external storage systems
US8542574B2 (en) * 2005-06-29 2013-09-24 Honeywell International Inc. Apparatus and method for network error prevention
US8260492B2 (en) * 2005-08-05 2012-09-04 Honeywell International Inc. Method and system for redundancy management of distributed and recoverable digital control system
DE502007004858D1 (de) * 2006-03-23 2010-10-07 Fujitsu Technology Solutions I Method and management system for configuring an information system
US7958396B2 (en) * 2006-05-19 2011-06-07 Microsoft Corporation Watchdog processors in multicore systems
US8028195B2 (en) * 2007-12-18 2011-09-27 International Business Machines Corporation Structure for indicating status of an on-chip power supply system
US7917806B2 (en) * 2007-12-18 2011-03-29 International Business Machines Corporation System and method for indicating status of an on-chip power supply system
US8271437B2 (en) * 2008-07-28 2012-09-18 International Business Machines Corporation Managing locks across distributed computing nodes
US9740178B2 (en) 2013-03-14 2017-08-22 GM Global Technology Operations LLC Primary controller designation in fault tolerant systems
US9183174B2 (en) 2013-03-15 2015-11-10 Qualcomm Incorporated Use case based reconfiguration of co-processor cores for general purpose processors
CN106293979B (zh) * 2015-06-25 2019-11-15 伊姆西公司 Method and apparatus for detecting an unresponsive process
US9904587B1 (en) 2015-12-18 2018-02-27 Amazon Technologies, Inc. Detecting anomalous behavior in an electronic environment using hardware-based information

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4263647A (en) 1979-02-07 1981-04-21 Allen-Bradley Company Fault monitor for numerical control system
US4586179A (en) * 1983-12-09 1986-04-29 Zenith Electronics Corporation Microprocessor reset with power level detection and watchdog timer
US4887076A (en) 1987-10-16 1989-12-12 Digital Equipment Corporation Computer interconnect coupler for clusters of data processing devices
US4956842A (en) * 1988-11-16 1990-09-11 Sundstrand Corporation Diagnostic system for a watchdog timer
US4964017A (en) 1989-04-26 1990-10-16 Intelligent Instrumentation, Inc. Adaptable housing for embedding of computer-controlled products
US5397176A (en) 1992-10-02 1995-03-14 Compaq Computer Corporation Lockable computer tower unit housing
US5392192A (en) 1993-03-22 1995-02-21 Apple Computer, Inc. Methods and apparatus for snap-together computer construction
CA2097476C (en) 1993-06-01 1997-07-01 Joseph Germain Maurice Paul Girard Computer case with adjustable drive housing for interchangeable desktop/tower configuration
US5447367A (en) 1994-05-31 1995-09-05 Wei; Hui-Yao Structure of vertical computer mainframe
US5568611A (en) 1994-07-29 1996-10-22 International Business Machines Corporation Unauthorized access monitor
US5560033A (en) * 1994-08-29 1996-09-24 Lucent Technologies Inc. System for providing automatic power control for highly available n+k processors
US5547272A (en) 1995-04-24 1996-08-20 At&T Global Information Solutions Company Modular cabinet bezel
US5542757A (en) 1995-10-19 1996-08-06 Chang; Chia-Chi Front panel assembly of a diskdrive case
US5884988A (en) 1997-07-08 1999-03-23 Sun Microsystems, Inc. Toner-type computer housing for peripherals
US6230181B1 (en) * 1997-11-03 2001-05-08 3Com Corporation Management shutdown and reset of embedded systems
USD426198S (en) 1999-01-19 2000-06-06 Sun Microsystems, Inc. Computer cabinet
USD425879S (en) 1999-01-19 2000-05-30 Sun Microsystems, Inc. Computer bezel

Also Published As

Publication number Publication date
US6550017B1 (en) 2003-04-15
DE60004365T2 (de) 2004-06-24
WO2001001246A1 (en) 2001-01-04
EP1190320A1 (de) 2002-03-27
EP1190320B1 (de) 2003-08-06

Similar Documents

Publication Publication Date Title
DE60004365D1 (de) System and method of monitoring a distributed fault tolerant computer system
GB2434670A (en) Monitoring and management of distributed information systems
WO2001041529A3 (en) Method and apparatus for disabling a clock signal within a multithreaded processor
US4691126A (en) Redundant synchronous clock system
JPS57139861A (en) Multicomputer system
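JP3164360B2 (ja) Microprocessor circuit device having a watchdog circuit and method for monitoring the flow of its processor program
KR100687616B1 (ko) Apparatus and method for detecting and recovering from processor faults
JPS60258656A (ja) Microprocessor reset circuit
JP2592525B2 (ja) Abnormality detection circuit for a common bus system
JPH064301A (ja) Time-division interrupt control system
JPH064418A (ja) Abnormality handling system for a memory backup device
SU1035569A1 (ru) Device for monitoring parameters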
JPS6236270B2 (de)
JP2774595B2 (ja) Operation monitoring device for a CPU system
JP2749994B2 (ja) Numerical control device
JPS6084602A (ja) Erroneous operation prevention circuit
SU1444783A1 (ru) Device for monitoring a microprocessor
JPH05257748A (ja) Microprocessor device
JPS6343561Y2 (de)
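JPS61169036A (ja) System monitoring device
JPH01185742A (ja) Program runaway detection circuit
JPS60238947A (ja) Malfunction detection circuit for a microprocessor
JPH07230432A (ja) Computing device
JPS6252650A (ja) Memory check method
JPH02271449A (ja) Bus fault detection system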

Legal Events

Date Code Title Description
8364 No opposition during term of opposition
8339 Ceased/non-payment of the annual fee