JP2020027615A - Server hardware fault analysis and recovery - Google Patents
Server hardware fault analysis and recovery
- Publication number
- JP2020027615A
- Authority
- JP
- Japan
- Prior art keywords
- hardware failure
- failure event
- hardware
- report
- server device
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/008—Reliability or availability analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0769—Readable error formats, e.g. cross-platform generic formats, human understandable formats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1415—Saving, restoring, recovering or retrying at system level
- G06F11/142—Reconfiguring to eliminate the error
- G06F11/1428—Reconfiguring to eliminate the error with loss of hardware functionality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3034—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/86—Event-based monitoring
Abstract
Description
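102 … rack server
104, 204 … on-site administrator
106, 206 … network
108, 208 … customer
110, 210 … server device
112 … manufacturer
113, 213 … data center management system
11, 21 … storage error
12, 22 … power supply error
13, 23 … CPU error
14, 24 … memory error
15, 25 … input/output error
200 … data center system
202 … rack server
212 … IT engineer
300, 400, 500, 600 … method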
Claims (10)
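1. A method for automatically managing a hardware fault event occurring in a data center system, comprising:
collecting a hardware fault event analysis corresponding to the hardware fault event, wherein the hardware fault event analysis is organized as a report of a server device suffering the hardware fault event;
processing statistical data received from the report of the server device; and
performing hardware recovery based on the processed statistical data.
2. The method of claim 1, wherein collecting the hardware fault event analysis comprises storing a hardware fault event detection process in baseboard management controller (BMC) firmware of the server device, and the report includes a hardware fault event report and a device report.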
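Claims 1 and 2 describe a three-step pipeline: collect per-server reports, reduce them to statistics, then act on those statistics. The minimal Python sketch below illustrates that flow under stated assumptions: the class names, field names, and the "any uncorrectable error or more than 10 events" recovery threshold are all hypothetical, since the claims name a hardware fault event report and a device report but do not define their contents or a decision rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical report shapes: claim 2 names a "hardware fault event report"
# and a "device report" but does not specify their fields.
@dataclass
class FaultEventReport:
    category: str      # e.g. "memory", "cpu", "storage", "power", "io"
    correctable: bool  # whether the error was corrected in place


@dataclass
class DeviceReport:
    server_id: str
    model: str


@dataclass
class ServerReport:
    device: DeviceReport
    events: List[FaultEventReport] = field(default_factory=list)


def collect(reports: List[ServerReport]) -> Dict[str, List[FaultEventReport]]:
    """Step 1: gather each server's fault event analysis, keyed by server."""
    return {r.device.server_id: r.events for r in reports}


def process_statistics(
    collected: Dict[str, List[FaultEventReport]],
) -> Dict[str, Dict[str, int]]:
    """Step 2: reduce each server's events to simple counts."""
    stats = {}
    for server_id, events in collected.items():
        stats[server_id] = {
            "total": len(events),
            "uncorrectable": sum(1 for e in events if not e.correctable),
        }
    return stats


def servers_needing_recovery(
    stats: Dict[str, Dict[str, int]], max_total: int = 10
) -> List[str]:
    """Step 3: select servers for hardware recovery (threshold is an assumption)."""
    return sorted(
        sid for sid, s in stats.items()
        if s["uncorrectable"] > 0 or s["total"] > max_total
    )


reports = [
    ServerReport(DeviceReport("rack1-node3", "model-a"), [FaultEventReport("memory", True)]),
    ServerReport(DeviceReport("rack2-node1", "model-a"), [FaultEventReport("cpu", False)]),
]
print(servers_needing_recovery(process_statistics(collect(reports))))  # ['rack2-node1']
```

Keeping the three steps as separate functions mirrors the claim structure and lets each stage be replaced independently, for example swapping the count-based rule for a statistical model.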
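3. The method of claim 1, comprising:
identifying a cause of the hardware fault event and determining whether the hardware fault event is the result of a recoverable or an unrecoverable error, wherein the cause of the hardware fault event is determined by a BIOS service routine;
identifying the hardware fault event, including identifying at least one of a fault location, a fault category, a fault type, and a fault severity;
receiving, from a BMC, a notification of the identification of the hardware fault event; and
receiving a language-independent open data format that uses human-readable text to represent data objects in the report.
4. The method of claim 1, comprising performing a central tendency analysis of data in an analysis component of the report,
wherein the central tendency analysis comprises:
analyzing risks to an operating system and software services associated with the hardware fault event;
analyzing a protection direction of the server device; and
predicting a trend of the hardware fault event and an impact of the hardware fault event.
5. The method of claim 1, comprising measuring the hardware fault event, and generating a risk assessment through a predictive analysis process to produce a diagnostic certificate for the hardware fault event.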
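Claims 4 and 5 combine a central tendency analysis with trend prediction and a predictive risk assessment, but do not specify the model. The sketch below is one possible reading, assuming per-interval fault counts as input: it computes mean, median, and mode, fits an ordinary least-squares line as a stand-in predictive model, and maps the one-step-ahead forecast onto hypothetical "high" / "elevated" / "low" labels. The sample counts are made up for illustration.

```python
from statistics import mean, median, mode
from typing import Dict, List


def central_tendency(counts: List[int]) -> Dict[str, float]:
    """Mean, median and mode of per-interval fault counts."""
    return {"mean": mean(counts), "median": median(counts), "mode": mode(counts)}


def trend_slope(counts: List[int]) -> float:
    """Least-squares slope of counts against interval index (needs >= 2 points)."""
    n = len(counts)
    x_bar, y_bar = (n - 1) / 2, mean(counts)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(counts))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def predict_next(counts: List[int]) -> float:
    """Extrapolate the fitted line one interval ahead."""
    n = len(counts)
    slope = trend_slope(counts)
    intercept = mean(counts) - slope * (n - 1) / 2
    return intercept + slope * n


def risk_level(counts: List[int], threshold: float) -> str:
    """Map the forecast and trend direction onto a coarse, hypothetical risk label."""
    if predict_next(counts) >= threshold:
        return "high"
    return "elevated" if trend_slope(counts) > 0 else "low"


daily_ecc_errors = [1, 2, 2, 4, 6]  # illustrative values, not from the patent
print(central_tendency(daily_ecc_errors))
print(round(predict_next(daily_ecc_errors), 2), risk_level(daily_ecc_errors, threshold=5.0))
```

A linear fit is only the simplest choice here. A real deployment would more likely use a model suited to count data and would tune the threshold per component type.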
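6. The method of claim 1, wherein performing the hardware recovery comprises:
inspecting a recovery policy of the server device and scheduling a recovery mechanism, wherein the recovery mechanism is scheduled for either immediate repair or deferred repair based on the recovery policy; and
monitoring hardware fault events for performance defects of the server device.
7. A system for automatically managing a hardware fault event occurring in a data center system, comprising:
a plurality of rack servers, each having a server device; and
a data center management system connected to the server devices,
wherein the data center management system is configured to perform:
collecting a hardware fault event analysis corresponding to the hardware fault event, wherein the hardware fault event analysis is organized as a report of the server device suffering the hardware fault event;
processing statistical data received from the report of the server device; and
performing hardware recovery based on the evaluated statistical data.
8. The system of claim 7, wherein collecting the hardware fault event analysis comprises storing a hardware fault event detection system in baseboard management controller (BMC) firmware of the server device, and the report includes a hardware fault event report and a device report.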
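Claim 6 schedules recovery as either immediate or deferred repair according to a per-server recovery policy. The sketch below shows one way to express that split, assuming a hypothetical policy made of a severity cutoff and a maintenance-window label. The four-level `Severity` scale is also an assumption; the claims list "fault severity" without defining its levels.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    # Hypothetical ordered scale; IntEnum makes levels directly comparable.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class RecoveryPolicy:
    """Hypothetical policy: repair at once at or above `immediate_at`."""
    immediate_at: Severity
    maintenance_window: str  # free-form label, e.g. "Sat 02:00"


def schedule(policy: RecoveryPolicy, faults: List[Tuple[str, Severity]]):
    """Split faults into immediate repairs and repairs deferred to the window."""
    immediate: List[str] = []
    deferred: List[Tuple[str, str]] = []
    for server_id, severity in faults:
        if severity >= policy.immediate_at:
            immediate.append(server_id)
        else:
            deferred.append((server_id, policy.maintenance_window))
    return immediate, deferred


policy = RecoveryPolicy(immediate_at=Severity.HIGH, maintenance_window="Sat 02:00")
print(schedule(policy, [("n1", Severity.CRITICAL), ("n2", Severity.LOW), ("n3", Severity.HIGH)]))
```

The monitoring step of claim 6 would then re-check deferred servers for performance defects until their window arrives. That loop is omitted here.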
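9. The system of claim 7, wherein the data center management system is configured to identify a cause of the hardware fault event and to determine whether the hardware fault event is the result of a recoverable or an unrecoverable error.
10. The system of claim 7, wherein the data center management system is configured to perform:
identifying the hardware fault, including identifying at least one of a fault location, a fault category, a fault type, and a fault severity;
receiving, from a BMC, a notification of the identification of the hardware fault event; and
receiving a language-independent open data format that uses human-readable text to represent data objects in the report.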
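Claims 3, 9, and 10 identify a fault by location, category, type, and severity, classify it as recoverable or unrecoverable, and carry it in a "language-independent open data format that uses human-readable text". JSON matches that description, so the sketch below assumes JSON, although the claims themselves do not name a format. The category set follows the error types in the reference signs (storage, power supply, CPU, memory, input/output). The exact string spellings and the slot label are hypothetical.

```python
import json
from dataclasses import asdict, dataclass

# Categories follow reference signs 11-15 / 21-25; the spellings are assumptions.
CATEGORIES = {"storage", "power", "cpu", "memory", "io"}


@dataclass(frozen=True)
class FaultIdentification:
    location: str      # hypothetical slot label, e.g. "CPU0_DIMM_A1"
    category: str      # one of CATEGORIES
    fault_type: str    # e.g. "ECC"
    severity: str      # e.g. "warning" or "critical"
    recoverable: bool  # recoverable vs. unrecoverable error

    def __post_init__(self):
        # Reject categories outside the assumed set at construction time.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown fault category: {self.category!r}")


def to_report_json(fault: FaultIdentification) -> str:
    """Serialize to human-readable JSON for inclusion in a report."""
    return json.dumps(asdict(fault), indent=2, sort_keys=True)


def from_report_json(text: str) -> FaultIdentification:
    """Parse a report entry back into a validated record."""
    return FaultIdentification(**json.loads(text))


fault = FaultIdentification("CPU0_DIMM_A1", "memory", "ECC", "critical", False)
print(to_report_json(fault))
```

Because JSON parsers exist for most languages, the same record can be produced by BMC-side tooling and consumed by the data center management system without a shared binary schema.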
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/101,749 | 2018-08-13 | ||
US16/101,749 US10761926B2 (en) | 2018-08-13 | 2018-08-13 | Server hardware fault analysis and recovery |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2020027615A true JP2020027615A (ja) | 2020-02-20 |
JP6828096B2 JP6828096B2 (ja) | 2021-02-10 |
Family ID: 67211531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2019128482A Active JP6828096B2 (ja) | 2018-08-13 | 2019-07-10 | Server hardware fault analysis and recovery |
Country Status (5)
Country | Link |
---|---|
US (1) | US10761926B2 (ja) |
EP (1) | EP3620922A1 (ja) |
JP (1) | JP6828096B2 (ja) |
CN (1) | CN110825578A (ja) |
TW (1) | TWI680369B (ja) |
Families Citing this family (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10761743B1 (en) | 2017-07-17 | 2020-09-01 | EMC IP Holding Company LLC | Establishing data reliability groups within a geographically distributed data storage environment |
US10880040B1 (en) | 2017-10-23 | 2020-12-29 | EMC IP Holding Company LLC | Scale-out distributed erasure coding |
US10382554B1 (en) | 2018-01-04 | 2019-08-13 | Emc Corporation | Handling deletes with distributed erasure coding |
US10579297B2 (en) | 2018-04-27 | 2020-03-03 | EMC IP Holding Company LLC | Scaling-in for geographically diverse storage |
US11023130B2 (en) | 2018-06-15 | 2021-06-01 | EMC IP Holding Company LLC | Deleting data in a geographically diverse storage construct |
US10936196B2 (en) | 2018-06-15 | 2021-03-02 | EMC IP Holding Company LLC | Data convolution for geographically diverse storage |
US11436203B2 (en) | 2018-11-02 | 2022-09-06 | EMC IP Holding Company LLC | Scaling out geographically diverse storage |
CN109491826B (zh) * | 2018-11-27 | 2021-02-12 | 英业达科技有限公司 | Remote hardware diagnosis system and diagnosis method |
US10901635B2 (en) | 2018-12-04 | 2021-01-26 | EMC IP Holding Company LLC | Mapped redundant array of independent nodes for data storage with high performance using logical columns of the nodes with different widths and different positioning patterns |
US10931777B2 (en) | 2018-12-20 | 2021-02-23 | EMC IP Holding Company LLC | Network efficient geographically diverse data storage system employing degraded chunks |
US11119683B2 (en) | 2018-12-20 | 2021-09-14 | EMC IP Holding Company LLC | Logical compaction of a degraded chunk in a geographically diverse data storage system |
US10892782B2 (en) | 2018-12-21 | 2021-01-12 | EMC IP Holding Company LLC | Flexible system and method for combining erasure-coded protection sets |
US11023331B2 (en) | 2019-01-04 | 2021-06-01 | EMC IP Holding Company LLC | Fast recovery of data in a geographically distributed storage environment |
US10942827B2 (en) | 2019-01-22 | 2021-03-09 | EMC IP Holding Company LLC | Replication of data in a geographically distributed storage environment |
US10846003B2 (en) | 2019-01-29 | 2020-11-24 | EMC IP Holding Company LLC | Doubly mapped redundant array of independent nodes for data storage |
US10936239B2 (en) | 2019-01-29 | 2021-03-02 | EMC IP Holding Company LLC | Cluster contraction of a mapped redundant array of independent nodes |
US10942825B2 (en) * | 2019-01-29 | 2021-03-09 | EMC IP Holding Company LLC | Mitigating real node failure in a mapped redundant array of independent nodes |
US10866766B2 (en) | 2019-01-29 | 2020-12-15 | EMC IP Holding Company LLC | Affinity sensitive data convolution for data storage systems |
US10754722B1 (en) * | 2019-03-22 | 2020-08-25 | Aic Inc. | Method for remotely clearing abnormal status of racks applied in data center |
US11029865B2 (en) | 2019-04-03 | 2021-06-08 | EMC IP Holding Company LLC | Affinity sensitive storage of data corresponding to a mapped redundant array of independent nodes |
US10944826B2 (en) | 2019-04-03 | 2021-03-09 | EMC IP Holding Company LLC | Selective instantiation of a storage service for a mapped redundant array of independent nodes |
US11121727B2 (en) | 2019-04-30 | 2021-09-14 | EMC IP Holding Company LLC | Adaptive data storing for data storage systems employing erasure coding |
US11113146B2 (en) | 2019-04-30 | 2021-09-07 | EMC IP Holding Company LLC | Chunk segment recovery via hierarchical erasure coding in a geographically diverse data storage system |
US11119686B2 (en) | 2019-04-30 | 2021-09-14 | EMC IP Holding Company LLC | Preservation of data during scaling of a geographically diverse data storage system |
US11748004B2 (en) | 2019-05-03 | 2023-09-05 | EMC IP Holding Company LLC | Data replication using active and passive data storage modes |
JP7358800B2 (ja) * | 2019-06-27 | 2023-10-11 | 京セラドキュメントソリューションズ株式会社 | Electronic device and control program therefor |
US11209996B2 (en) | 2019-07-15 | 2021-12-28 | EMC IP Holding Company LLC | Mapped cluster stretching for increasing workload in a data storage system |
US11023145B2 (en) | 2019-07-30 | 2021-06-01 | EMC IP Holding Company LLC | Hybrid mapped clusters for data storage |
US11449399B2 (en) | 2019-07-30 | 2022-09-20 | EMC IP Holding Company LLC | Mitigating real node failure of a doubly mapped redundant array of independent nodes |
US11228322B2 (en) | 2019-09-13 | 2022-01-18 | EMC IP Holding Company LLC | Rebalancing in a geographically diverse storage system employing erasure coding |
US11449248B2 (en) | 2019-09-26 | 2022-09-20 | EMC IP Holding Company LLC | Mapped redundant array of independent data storage regions |
US11119690B2 (en) | 2019-10-31 | 2021-09-14 | EMC IP Holding Company LLC | Consolidation of protection sets in a geographically diverse data storage environment |
US11435910B2 (en) | 2019-10-31 | 2022-09-06 | EMC IP Holding Company LLC | Heterogeneous mapped redundant array of independent nodes for data storage |
US11288139B2 (en) | 2019-10-31 | 2022-03-29 | EMC IP Holding Company LLC | Two-step recovery employing erasure coding in a geographically diverse data storage system |
US11435957B2 (en) | 2019-11-27 | 2022-09-06 | EMC IP Holding Company LLC | Selective instantiation of a storage service for a doubly mapped redundant array of independent nodes |
US11144220B2 (en) | 2019-12-24 | 2021-10-12 | EMC IP Holding Company LLC | Affinity sensitive storage of data corresponding to a doubly mapped redundant array of independent nodes |
US11231860B2 (en) | 2020-01-17 | 2022-01-25 | EMC IP Holding Company LLC | Doubly mapped redundant array of independent nodes for data storage with high performance |
US11507308B2 (en) | 2020-03-30 | 2022-11-22 | EMC IP Holding Company LLC | Disk access event control for mapped nodes supported by a real cluster storage system |
CN113626275A (zh) * | 2020-05-07 | 2021-11-09 | 捷普科技(上海)有限公司 | Information creation method and analysis method |
US11288229B2 (en) | 2020-05-29 | 2022-03-29 | EMC IP Holding Company LLC | Verifiable intra-cluster migration for a chunk storage system |
CN111767181B (zh) * | 2020-06-29 | 2021-11-02 | 深圳小马洛可科技有限公司 | Large-scale cluster management system for LED display screens |
US11893644B2 (en) | 2020-10-15 | 2024-02-06 | State Farm Mutual Automobile Insurance Company | Intelligent user interface monitoring and alert |
US11836032B2 (en) | 2020-10-15 | 2023-12-05 | State Farm Mutual Automobile Insurance Company | Error monitoring and prevention in computing systems based on determined trends and routing a data stream over a second network having less latency |
US11693983B2 (en) | 2020-10-28 | 2023-07-04 | EMC IP Holding Company LLC | Data protection via commutative erasure coding in a geographically diverse data storage system |
US11847141B2 (en) | 2021-01-19 | 2023-12-19 | EMC IP Holding Company LLC | Mapped redundant array of independent nodes employing mapped reliability groups for data storage |
US11625174B2 (en) | 2021-01-20 | 2023-04-11 | EMC IP Holding Company LLC | Parity allocation for a virtual redundant array of independent disks |
CN112799917B (zh) * | 2021-02-08 | 2024-01-23 | 联想(北京)有限公司 | Data processing method, apparatus, and device |
CN113238916A (zh) * | 2021-05-14 | 2021-08-10 | 山东英信计算机技术有限公司 | Server asset management method, BMC, management back end, and terminal |
US11449234B1 (en) | 2021-05-28 | 2022-09-20 | EMC IP Holding Company LLC | Efficient data access operations via a mapping layer instance for a doubly mapped redundant array of independent nodes |
US11354191B1 (en) | 2021-05-28 | 2022-06-07 | EMC IP Holding Company LLC | Erasure coding in a large geographically diverse data storage system |
CN113392052B (zh) * | 2021-06-11 | 2023-07-18 | 深圳市同泰怡信息技术有限公司 | BIOS system and method based on a four-way server, and computer-readable storage medium |
US20210397530A1 (en) * | 2021-06-25 | 2021-12-23 | Intel Corporation | Methods and apparatus to transmit central processing unit performance information to an operating system |
US11841773B2 (en) * | 2021-09-14 | 2023-12-12 | Dell Products L.P. | Persistence of learned profiles |
CN114003416B (zh) * | 2021-09-23 | 2024-01-12 | 苏州浪潮智能科技有限公司 | Dynamic memory error handling method, system, terminal, and storage medium |
TWI815310B (zh) * | 2022-02-16 | 2023-09-11 | 玉山商業銀行股份有限公司 | Proactive database risk detection system and operating method |
US11886283B2 (en) | 2022-03-30 | 2024-01-30 | International Business Machines Corporation | Automatic node crash detection and remediation in distributed computing systems |
CN115562913B (zh) * | 2022-04-21 | 2023-11-14 | 荣耀终端有限公司 | Hardware state analysis method, apparatus, and system |
CN117675505A (zh) * | 2022-09-08 | 2024-03-08 | 华为技术有限公司 | Event processing method, apparatus, and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
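JPH11265322A (ja) * | 1998-03-18 | 1999-09-28 | Fujitsu Ltd | Online database information processing system with backup function |
JP2004259044A (ja) * | 2003-02-26 | 2004-09-16 | Hitachi Ltd | Management method and system for information processing apparatus |
JP2012178014A (ja) * | 2011-02-25 | 2012-09-13 | Hitachi Ltd | Failure prediction and countermeasure method, and client-server system |
JP2014146110A (ja) * | 2013-01-28 | 2014-08-14 | Nec Corp | Information processing apparatus, error detection function diagnosis method, and computer program |
JP2016152011A (ja) * | 2015-02-19 | 2016-08-22 | ファナック株式会社 | Failure prediction system for control device |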
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6516429B1 (en) * | 1999-11-04 | 2003-02-04 | International Business Machines Corporation | Method and apparatus for run-time deconfiguration of a processor in a symmetrical multi-processing system |
US7536370B2 (en) | 2004-06-24 | 2009-05-19 | Sun Microsystems, Inc. | Inferential diagnosing engines for grid-based computing systems |
JP4859558B2 (ja) | 2006-06-30 | 2012-01-25 | 株式会社日立製作所 | Computer system control method and computer system |
US20090259890A1 (en) * | 2008-04-14 | 2009-10-15 | Turin Networks | Method & apparatus for hardware fault management |
US8332690B1 (en) | 2008-06-27 | 2012-12-11 | Symantec Corporation | Method and apparatus for managing failures in a datacenter |
US20120221884A1 (en) | 2011-02-28 | 2012-08-30 | Carter Nicholas P | Error management across hardware and software layers |
TW201417536A (zh) | 2012-10-24 | 2014-05-01 | Hon Hai Prec Ind Co Ltd | Automatic server management method and system |
US20140122930A1 (en) | 2012-10-25 | 2014-05-01 | International Business Machines Corporation | Performing diagnostic tests in a data center |
WO2015166510A1 (en) * | 2014-04-30 | 2015-11-05 | Hewlett-Packard Development Company, L.P. | On demand remote diagnostics for hardware component failure and disk drive data recovery using embedded storage media |
US9965367B2 (en) | 2014-12-17 | 2018-05-08 | Quanta Computer Inc. | Automatic hardware recovery system |
US10599504B1 (en) * | 2015-06-22 | 2020-03-24 | Amazon Technologies, Inc. | Dynamic adjustment of refresh rate |
US10360114B2 (en) | 2016-02-24 | 2019-07-23 | Quanta Computer Inc. | Hardware recovery systems |
CN107077408A (zh) | 2016-12-05 | 2017-08-18 | 华为技术有限公司 | Fault handling method, computer system, baseboard management controller, and system |
2018
- 2018-08-13 US US16/101,749 patent/US10761926B2/en active Active
2019
- 2019-01-28 TW TW108103022A patent/TWI680369B/zh active
- 2019-02-11 CN CN201910109755.8A patent/CN110825578A/zh active Pending
- 2019-07-05 EP EP19184702.9A patent/EP3620922A1/en active Pending
- 2019-07-10 JP JP2019128482A patent/JP6828096B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
US10761926B2 (en) | 2020-09-01 |
JP6828096B2 (ja) | 2021-02-10 |
EP3620922A1 (en) | 2020-03-11 |
TWI680369B (zh) | 2019-12-21 |
TW202009705A (zh) | 2020-03-01 |
CN110825578A (zh) | 2020-02-21 |
US20200050510A1 (en) | 2020-02-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2019-07-10 | A621 | Written request for application examination | JAPANESE INTERMEDIATE CODE: A621 |
2020-08-12 | A977 | Report on retrieval | JAPANESE INTERMEDIATE CODE: A971007 |
2020-09-15 | A131 | Notification of reasons for refusal | JAPANESE INTERMEDIATE CODE: A131 |
2020-12-07 | A521 | Request for written amendment filed | JAPANESE INTERMEDIATE CODE: A523 |
(not listed) | TRDD | Decision of grant or rejection written | |
2021-01-05 | A01 | Written decision to grant a patent or to grant a registration (utility model) | JAPANESE INTERMEDIATE CODE: A01 |
2021-01-20 | A61 | First payment of annual fees (during grant procedure) | JAPANESE INTERMEDIATE CODE: A61 |
(not listed) | R150 | Certificate of patent or registration of utility model | Ref document number: 6828096; Country of ref document: JP; JAPANESE INTERMEDIATE CODE: R150 |
(not listed) | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |