CN102959521B - Computer system management method and management system - Google Patents

Computer system management method and management system

Info

Publication number
CN102959521B
Authority
CN
China
Prior art keywords
failure
analytical result
node apparatus
event
fault analytical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201080067454.0A
Other languages
English (en)
Chinese (zh)
Other versions
CN102959521A (zh)
Inventor
永井崇之
国井雅
增田峰义
黑田泽希
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Publication of CN102959521A
Application granted
Publication of CN102959521B
Legal status: Expired - Fee Related (current)
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)
CN201080067454.0A 2010-07-16 2010-07-28 Computer system management method and management system Expired - Fee Related CN102959521B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2010161724A JP5419819B2 (ja) 2010-07-16 2010-07-16 Computer system management method and management system
JP2010-161724 2010-07-16
PCT/JP2010/062696 WO2012008058A1 (ja) 2010-07-16 2010-07-28 Computer system management method and management system

Publications (2)

Publication Number Publication Date
CN102959521A (zh) 2013-03-06
CN102959521B (zh) 2015-11-25

Family

ID=45469079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201080067454.0A Expired - Fee Related CN102959521B (zh) 2010-07-16 2010-07-28 Computer system management method and management system

Country Status (5)

Country Link
US (1) US8429455B2
EP (1) EP2562651A4
JP (1) JP5419819B2
CN (1) CN102959521B
WO (1) WO2012008058A1

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2674851B1 (en) * 2011-02-10 2018-01-17 Fujitsu Limited Storage control device, storage device, storage system, storage control method, and program for same
JP5658417B2 (ja) * 2012-02-27 2015-01-28 Hitachi, Ltd. Monitoring system and monitoring program
US9354961B2 (en) * 2012-03-23 2016-05-31 Hitachi, Ltd. Method and system for supporting event root cause analysis
WO2014033945A1 (ja) * 2012-09-03 2014-03-06 Hitachi, Ltd. Management system for managing a computer system having a plurality of monitored devices
WO2014068705A1 (ja) * 2012-10-31 2014-05-08 Hitachi, Ltd. Monitoring system and monitoring program
US9619314B2 (en) * 2013-04-05 2017-04-11 Hitachi, Ltd. Management system and management program
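WO2014184941A1 (ja) * 2013-05-17 2014-11-20 Hitachi, Ltd. Storage device
JP6009089B2 (ja) * 2013-09-18 2016-10-19 Hitachi, Ltd. Management system for managing a computer system and management method therefor
WO2015079564A1 (ja) * 2013-11-29 2015-06-04 Hitachi, Ltd. Management system and method for supporting analysis of the root cause of an event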
US9948693B2 (en) * 2014-02-24 2018-04-17 Ca, Inc. Generic cloud service for publishing data to be consumed by RSS readers
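JP6387777B2 (ja) * 2014-06-13 2018-09-12 Fujitsu Limited Evaluation program, evaluation method, and evaluation device
CN105223884A (zh) * 2015-09-30 2016-01-06 State Grid Beijing Electric Power Company Method and device for pushing fault images
CN112005223A (zh) * 2018-09-24 2020-11-27 Hewlett-Packard Development Company, L.P. Device status assessment
CN113590016B (zh) * 2020-04-30 2025-02-14 EMC IP Holding Company LLC Method, electronic device, and computer program product for managing storage disks
CN112415885B (zh) * 2020-11-30 2022-07-05 Beijing Institute of Control Engineering General bus management method suitable for multi-machine, multi-bus redundant fault-tolerant systems
CN119697001A (zh) * 2024-11-27 2025-03-25 FiberHome Telecommunication Technologies Co., Ltd. Method, system, electronic device, and storage medium for analyzing device out-of-management faults
CN119938463B (zh) * 2024-12-11 2025-11-04 Beihang University Cloud data center load prediction method and system based on improved TimesNet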

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050137832A1 (en) * 1994-05-25 2005-06-23 System Management Arts, Inc. Apparatus and method for event correlation and problem reporting
CN101295229A (zh) * 2007-04-24 2008-10-29 Hitachi, Ltd. Management device and management method
US20090300428A1 (en) * 2008-05-27 2009-12-03 Hitachi, Ltd. Method of collecting information in system network
JP2010128661A (ja) * 2008-11-26 2010-06-10 Fujitsu Ltd Failure cause estimation method, failure cause estimation device, and program

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0730540A (ja) * 1993-07-08 1995-01-31 Hitachi Ltd Network failure monitoring device
US7107185B1 (en) 1994-05-25 2006-09-12 Emc Corporation Apparatus and method for event correlation and problem reporting
US6119074A (en) * 1998-05-20 2000-09-12 Caterpillar Inc. Method and apparatus of predicting a fault condition
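JP3996040B2 (ja) * 2002-11-06 2007-10-24 Hitachi, Ltd. Database disorder resolution processing method, device for implementing the same, and processing program therefor
JP4872262B2 (ja) * 2005-07-27 2012-02-08 NEC Corporation Management support system, management support method, and management support program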
US8112378B2 (en) * 2008-06-17 2012-02-07 Hitachi, Ltd. Methods and systems for performing root cause analysis
JP4981974B2 (ja) * 2009-03-24 2012-07-25 Hitachi, Ltd. Management system and information processing system
US8429453B2 (en) * 2009-07-16 2013-04-23 Hitachi, Ltd. Management system for outputting information denoting recovery method corresponding to root cause of failure
JP5542398B2 (ja) * 2009-09-30 2014-07-09 Hitachi, Ltd. Method, device, and system for displaying failure root cause analysis results

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A fault detection service for wide area distributed computations; Paul Stelling et al.; Cluster Computing; 1999-09-01; Vol. 2, No. 2; pp. 117-128 *

Also Published As

Publication number Publication date
EP2562651A4 (en) 2017-08-23
EP2562651A1 (en) 2013-02-27
JP5419819B2 (ja) 2014-02-19
WO2012008058A1 (ja) 2012-01-19
US20120017127A1 (en) 2012-01-19
CN102959521A (zh) 2013-03-06
JP2012022614A (ja) 2012-02-02
US8429455B2 (en) 2013-04-23

Similar Documents

Publication Publication Date Title
CN102959521B (zh) Computer system management method and management system
US10761926B2 (en) Server hardware fault analysis and recovery
JP5684946B2 (ja) Method and system for supporting analysis of the root cause of an event
US9021077B2 (en) Management computer and method for root cause analysis
CN107431643B (zh) Method and apparatus for monitoring storage cluster elements
EP2887222B1 (en) Management system and management program
JP5432867B2 (ja) Computer system management method and management system
US20120066376A1 (en) Management method of computer system and management system
JP5222876B2 (ja) System management method in a computer system, and management system
US20160378583A1 (en) Management computer and method for evaluating performance threshold value
US20120102362A1 (en) Management system and management method
EP2397947A1 (en) Computer for specifying event generation origins in a computer system including a plurality of node devices
US10275330B2 (en) Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus
JP2019009726A (ja) Fault isolation method and management server
US9021078B2 (en) Management method and management system
US10866875B2 (en) Storage apparatus, storage system, and performance evaluation method using cyclic information cycled within a group of storage apparatuses
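JP2007323193A (ja) Performance load abnormality detection system, performance load abnormality detection method, and program
JP7694341B2 (ja) Determination program, determination method, and information processing device
JP2003131905A (ja) Management server system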
Jiang Understanding storage system problems and diagnosing them through log analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2015-11-25

Termination date: 2018-07-28