JP2024521357A - Detecting datacenter mass outage with near real-time/offline data using ML models - Google Patents
- Publication number
- JP2024521357A
- Authority
- JP
- Japan
- Prior art keywords
- data
- fault
- data center
- input data
- sources
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0769—Readable error formats, e.g. cross-platform generic formats, human understandable formats
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2252—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using fault dictionaries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2257—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using expert systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
- G06F11/3062—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/301—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is a virtual computing platform, e.g. logically partitioned systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
- Test And Diagnosis Of Digital Computers (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/338,478 | 2021-06-03 | ||
| US17/338,478 US11397634B1 (en) | 2021-06-03 | 2021-06-03 | Detecting datacenter mass outage with near real-time/offline using ML models |
| PCT/US2022/031614 WO2022256330A1 (en) | 2021-06-03 | 2022-05-31 | Detecting datacenter mass outage with near real-time/offline data using ml models |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| JP2024521357A (ja) | 2024-05-31 |
| JP2024521357A5 | 2025-05-30 |
| JPWO2022256330A5 | 2025-05-30 |
Family ID: 82399474
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| JP2023574401A Pending JP2024521357A (ja) | 2021-06-03 | 2022-05-31 | Detecting datacenter mass outage with near real-time/offline data using ML models |
Country Status (5)
| Country | Link |
|---|---|
| US (3) | US11397634B1 |
| EP (1) | EP4348429B1 |
| JP (1) | JP2024521357A |
| CN (1) | CN117280327B |
| WO (1) | WO2022256330A1 |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115211168B (zh) * | 2020-08-14 | 2025-06-17 | ZTE Corporation (中兴通讯股份有限公司) | AI-based load prediction method |
| US11397634B1 (en) | 2021-06-03 | 2022-07-26 | Oracle International Corporation | Detecting datacenter mass outage with near real-time/offline using ML models |
| US11706130B2 (en) * | 2021-07-19 | 2023-07-18 | Cisco Technology, Inc. | Root-causing user experience anomalies to coordinate reactive policies in application-aware routing |
| CN119604834A (zh) | 2022-09-23 | 2025-03-11 | Oracle International Corporation (甲骨文国际公司) | Datacenter-level power management with reactive power capping |
| US12007734B2 (en) | 2022-09-23 | 2024-06-11 | Oracle International Corporation | Datacenter level power management with reactive power capping |
| US12155210B2 (en) | 2022-11-08 | 2024-11-26 | Oracle International Corporation | Techniques for orchestrated load shedding |
| CN120167057A (zh) * | 2022-11-08 | 2025-06-17 | Oracle International Corporation (甲骨文国际公司) | Techniques for orchestrated load shedding |
| US12045125B2 (en) * | 2022-11-15 | 2024-07-23 | Sap Se | Alert aggregation and health issues processing in a cloud environment |
| US20250053472A1 (en) * | 2023-08-09 | 2025-02-13 | Siemens Aktiengesellschaft | System and corresponding computer-implemented method for identifying exception in behavior of a computer system or of an application executed on the computer system |
| US20250245630A1 (en) * | 2024-01-27 | 2025-07-31 | Charter Communications Operating, Llc | Automated power outage detection, reporting and mitigation |
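| CN120386650A (zh) * | 2024-01-29 | 2025-07-29 | Hangzhou Alibaba Cloud Feitian Information Technology Co., Ltd. (杭州阿里云飞天信息技术有限公司) | Downtime fault analysis method and apparatus, electronic device, and medium |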
| US20260037397A1 (en) * | 2024-08-05 | 2026-02-05 | Dell Products L.P. | Failover and synchronization management for databases |
Family Cites Families (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6889908B2 (en) * | 2003-06-30 | 2005-05-10 | International Business Machines Corporation | Thermal analysis in a data processing system |
| WO2008052982A1 (en) * | 2006-10-30 | 2008-05-08 | Thomson Licensing | Method for indicating a service interruption source |
| US9843948B2 (en) * | 2015-03-18 | 2017-12-12 | T-Mobile Usa, Inc. | Pathway-based data interruption detection |
| US10055277B1 (en) * | 2015-09-30 | 2018-08-21 | Amdocs Development Limited | System, method, and computer program for performing health checks on a system including a plurality of heterogeneous system components |
| US10761921B2 (en) * | 2017-11-30 | 2020-09-01 | Optumsoft, Inc. | Automatic root cause analysis using ternary fault scenario representation |
| US10623273B2 (en) * | 2018-01-02 | 2020-04-14 | Cisco Technology, Inc. | Data source modeling to detect disruptive changes in data dynamics |
| US20190228296A1 (en) * | 2018-01-19 | 2019-07-25 | EMC IP Holding Company LLC | Significant events identifier for outlier root cause investigation |
| CN108469987B (zh) * | 2018-02-26 | 2020-12-29 | East China Normal University (华东师范大学) | Interrupt verification system based on an interrupt control flow graph |
| US10977154B2 (en) * | 2018-08-03 | 2021-04-13 | Dynatrace Llc | Method and system for automatic real-time causality analysis of end user impacting system anomalies using causality rules and topological understanding of the system to effectively filter relevant monitoring data |
| EP3637261A1 (en) * | 2018-10-10 | 2020-04-15 | Schneider Electric IT Corporation | Systems and methods for automatically generating a data center network mapping for automated alarm consolidation |
| US11126493B2 (en) * | 2018-11-25 | 2021-09-21 | Aloke Guha | Methods and systems for autonomous cloud application operations |
| US11347576B2 (en) * | 2019-07-23 | 2022-05-31 | Vmware, Inc. | Root cause analysis of non-deterministic performance anomalies |
| US11061393B2 (en) * | 2019-08-28 | 2021-07-13 | International Business Machines Corporation | Consolidating anomaly root causes and alarms using graphical granger models |
| US11252014B2 (en) * | 2019-09-30 | 2022-02-15 | Dynatrace Llc | Forming root cause groups of incidents in clustered distributed system through horizontal and vertical aggregation |
| US20220027257A1 (en) * | 2020-07-23 | 2022-01-27 | Vmware, Inc. | Automated Methods and Systems for Managing Problem Instances of Applications in a Distributed Computing Facility |
| US11176016B1 (en) * | 2020-09-22 | 2021-11-16 | International Business Machines Corporation | Detecting and managing anomalies in underground sensors for agricultural applications |
| US12326777B2 (en) * | 2021-04-16 | 2025-06-10 | Workspot, Inc. | Method and system for real-time identification of root cause of a fault in a globally distributed virtual desktop fabric |
| US11675648B2 (en) * | 2021-04-27 | 2023-06-13 | Microsoft Technology Licensing, Llc | Automatic triaging of diagnostics failures |
| US12204431B2 (en) * | 2021-05-07 | 2025-01-21 | Dynatrace Llc | Method and system for the on-demand generation of graph-like models out of multidimensional observation data |
| US11397634B1 (en) * | 2021-06-03 | 2022-07-26 | Oracle International Corporation | Detecting datacenter mass outage with near real-time/offline using ML models |
2021
- 2021-06-03 US US17/338,478 patent/US11397634B1/en active Active
2022
- 2022-05-31 EP EP22737684.5A patent/EP4348429B1/en active Active
- 2022-05-31 CN CN202280033725.3A patent/CN117280327B/zh active Active
- 2022-05-31 JP JP2023574401A patent/JP2024521357A/ja active Pending
- 2022-05-31 WO PCT/US2022/031614 patent/WO2022256330A1/en not_active Ceased
- 2022-06-22 US US17/846,537 patent/US11656928B2/en active Active
2023
- 2023-04-11 US US18/133,394 patent/US12045123B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| US20220391278A1 (en) | 2022-12-08 |
| US11397634B1 (en) | 2022-07-26 |
| EP4348429A1 (en) | 2024-04-10 |
| US12045123B2 (en) | 2024-07-23 |
| US20230251920A1 (en) | 2023-08-10 |
| WO2022256330A1 (en) | 2022-12-08 |
| CN117280327B (zh) | 2024-04-05 |
| US11656928B2 (en) | 2023-05-23 |
| CN117280327A (zh) | 2023-12-22 |
| EP4348429B1 (en) | 2025-05-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| EP4348429B1 (en) | Detecting datacenter mass outage with near real-time/offline data using ml models | |
| JP7818616B2 (ja) | Techniques for verifying network policies of container frameworks | |
| US11843510B2 (en) | Automatically inferring software-defined network policies from the observed workload in a computing environment | |
| US11797414B2 (en) | Method and system for failure prediction in cloud computing platforms | |
| US12242332B2 (en) | Identifying root cause anomalies in time series | |
| US12135991B2 (en) | Management plane orchestration across service cells | |
| US10185614B2 (en) | Generic alarm correlation by means of normalized alarm codes | |
| US20240061939A1 (en) | Threat change analysis system | |
| EP3239840B1 (en) | Fault information provision server and fault information provision method | |
| US11381451B2 (en) | Methods, systems, and computer readable mediums for selecting and configuring a computing system to support a replicated application | |
| US20220394107A1 (en) | Techniques for managing distributed computing components | |
| US20230342125A1 (en) | Enforcement of environmental conditions for cloud applications | |
| JP2024546424A (ja) | Edge attestation for authorization of computing nodes in a cloud infrastructure system | |
| US20180060987A1 (en) | Identification of abnormal behavior in human activity based on internet of things collected data | |
| US11563628B1 (en) | Failure detection in cloud-computing systems | |
| US20230403291A1 (en) | Framework for anomaly detection in a cloud environment | |
| JP2025523447A (ja) | Instruction monitoring for dynamic cloud workload reallocation based on ransomware attacks | |
| US11366651B2 (en) | Framework for hardware-specific analytic plugins | |
| US20250272601A1 (en) | Artificial intelligence training using accesibility data | |
| US12210400B2 (en) | Techniques for performing fault tolerance validation for a data center | |
| US20250298650A1 (en) | Watchdog daemons for self-recovering hypervisors | |
| US20250355780A1 (en) | Large scale event fault simulator | |
| US20140040447A1 (en) | Management system and program product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2025-05-21 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2025-05-21 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |