JP2013257764A

JP2013257764A - Failure analysis system, failure analysis device, server device, and failure analysis method and program

Info

Publication number: JP2013257764A
Application number: JP2012133793A
Authority: JP
Inventors: Masaki Chibana; 昌樹知花
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-06-13
Filing date: 2012-06-13
Publication date: 2013-12-26

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of failure analysis at low cost by placing the failure analysis function in a failure analysis device that supervises a plurality of server devices. SOLUTION: A plurality of server devices 10-1 and 10-2 and a failure analysis device 20 are connected to each other via the Internet. Upon detecting a hardware failure of its own, a server device collects failure information and transmits it to the failure analysis device. The failure analysis device analyzes the hardware failure that has occurred in the server device and transmits to the server device a failure analysis result including information on a suspected failure location. The server device degenerates the suspected failure location on the basis of the received failure analysis result.

Description

The present invention relates to a failure analysis system, a failure analysis device, a server device, a failure analysis method, and a program for analyzing computer failures.

An example of a failure analysis system is described in Patent Document 1.
When a failure occurs in a computer system, the failure analysis system described in Patent Document 1 automatically analyzes the content of the failure, looks up the telephone number of the failed computer system, and automatically dials that number to connect a failure log data collection line to the computer system.
The system then collects failure log data from the computer system, analyzes it, and displays the cause of the failure and a procedure for recovering from it.
Because a remote diagnosis system automatically performs the analysis of the failure content, the collection of log data from the computer system, and the analysis of the failure log data when a failure occurs, the burden on maintenance engineers is reduced and the time to repair is shortened.

Patent Document 2 describes another example of a failure analysis system.
The failure analysis system described in Patent Document 2 manages the logs that the BMC and the BIOS separately collect and hold in response to a single failure in a computer server by associating them with the same event ID.
As a result, even when the separately collected logs are retrieved from the computer server and the cause of the failure is investigated manually, the event ID makes it possible to determine easily and reliably which logs were collected as a result of the same failure.
The system can therefore shorten the time needed to identify manually the faulty location that caused the failure, and can pinpoint that location accurately in comprehensive, cross-cutting failure analysis.

Patent Document 3 describes a further example of a failure analysis system.
In the failure analysis system described in Patent Document 3, an integrated monitoring system and a monitoring server are connected by an independent LAN (Local Area Network), and an integration notification module is incorporated into the monitoring server to unify the differing monitoring information of a multi-vendor environment.
The failure analysis system described in Patent Document 3 can thereby achieve continuous operation of the integrated monitoring server.

Patent Document 1: JP-A-4-367040
Patent Document 2: JP 2011-145824 A
Patent Document 3: JP 2006-072784 A

In connection with Patent Documents 1 to 3, failure analysis systems have been proposed in which the BMCFW (Baseboard Management Controller Firmware) estimates the suspected location when a failure occurs and issues a degeneration instruction.
In such a system, the estimated suspected location is not always accurate. For example, for an I/F (interface) failure, the firmware may indicate several suspected locations, only one of which is the actual failure location.
Improving the accuracy of suspect indication in such a system therefore requires modifying the BMCFW directly. Consequently, raising the failure analysis accuracy means modifying the BMCFW of every server already shipped, which incurs a very large cost.

An object of the present invention is to provide a failure analysis system, a failure analysis device, a server device, a failure analysis method, and a program that solve the problem described above.

The present invention has been made to solve the above problem. A failure analysis system according to the invention includes a plurality of server devices and a failure analysis device that analyzes a hardware failure that has occurred in the server devices, and each server device performs a degeneration process on the failure occurrence location based on the failure analysis result from the failure analysis device.

A failure analysis device according to the invention includes: a failure information reception unit that receives, from a plurality of server devices, failure information, which is information on a hardware failure that has occurred in a server device; an analysis unit that analyzes the hardware failure that has occurred in the server device based on the failure information received by the failure information reception unit; and an analysis result transmission unit that transmits the failure analysis result produced by the analysis unit to the server device.

A server device according to the invention includes: a detection unit that detects a hardware failure of the device itself; a failure information collection unit that collects failure information, which is information on the failure detected by the detection unit; a failure information transmission unit that transmits the failure information collected by the failure information collection unit to the failure analysis device; an analysis result reception unit that receives a failure analysis result from the failure analysis device; and a degeneration unit that performs a degeneration process based on the failure analysis result received by the analysis result reception unit.

In a failure analysis method according to the invention, a failure analysis device analyzes a hardware failure that has occurred in a server device, and the server device performs a degeneration process on the failure occurrence location based on the failure analysis result from the failure analysis device.

The present invention also provides a program that causes a computer to function as: a failure information reception unit that receives, from a plurality of server devices, failure information, which is information on a hardware failure that has occurred in a server device; an analysis unit that analyzes the hardware failure that has occurred in the server device based on the failure information received by the failure information reception unit; and an analysis result transmission unit that transmits the failure analysis result produced by the analysis unit to the server device.

The present invention also provides a program that causes a computer to function as: a detection unit that detects a hardware failure of the device itself; a failure information collection unit that collects failure information, which is information on the failure detected by the detection unit; a failure information transmission unit that transmits the failure information collected by the failure information collection unit to a failure analysis device; an analysis result reception unit that receives a failure analysis result from the failure analysis device; and a degeneration unit that performs a degeneration process based on the failure analysis result received by the analysis result reception unit.

According to the present invention, the failure analysis device that supervises the server devices is modified instead of each individual server device. This makes it possible to improve the accuracy of failure analysis at lower cost.

FIG. 1 is a schematic block diagram showing the basic configuration of a failure analysis system according to an embodiment of the present invention.
FIG. 2 is a schematic block diagram showing the software configuration of the failure analysis system according to the embodiment.
FIG. 3 is a flowchart showing the operation of the failure analysis system according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.
FIG. 1 is a schematic block diagram showing the basic configuration of a failure analysis system according to an embodiment of the present invention.
As shown in FIG. 1, the failure analysis system 1 includes a plurality of server devices 10 (server devices 10-1 and 10-2) and a failure analysis device 20.

The failure analysis device 20 analyzes a hardware failure that has occurred in a server device 10 and estimates the location in the server device 10 where the failure occurred. The server device 10 performs a degeneration process on the failure occurrence location based on the failure analysis result from the failure analysis device 20.
The server devices 10 and the failure analysis device 20 are connected via the Internet.

FIG. 2 is a schematic block diagram showing the software configuration of the server device 10 and the failure analysis device 20 in the failure analysis system 1 according to an embodiment of the present invention.

As shown in FIG. 2, the server device 10 includes a detection unit 11, a failure information collection unit 12, a failure information transmission unit 13, an analysis result reception unit 14, a degeneration unit 15, and a recovery result transmission unit 16.
The failure analysis device 20 includes a failure information reception unit 21, an analysis unit 22, an analysis result transmission unit 23, and a recovery result reception unit 24.

The detection unit 11 detects a hardware failure of the server device 10.
The failure information collection unit 12 collects failure information, which is information on the failure detected by the detection unit 11.
The failure information transmission unit 13 transmits the failure information collected by the failure information collection unit 12 to the failure analysis device 20.
The analysis result reception unit 14 receives the failure analysis result from the analysis result transmission unit 23 of the failure analysis device 20.
The failure analysis result includes information on the suspected failure location, that is, the location where the failure analysis device 20 estimates that the hardware failure has occurred.
The degeneration unit 15 performs a degeneration process based on the failure analysis result received by the analysis result reception unit 14. That is, the degeneration unit 15 degenerates the suspected failure location indicated by the failure analysis result.
The recovery result transmission unit 16 transmits to the failure analysis device 20 the result of recovery from the hardware failure achieved by the degeneration process of the degeneration unit 15.

The failure information reception unit 21 receives failure information from the server device 10.
The analysis unit 22 analyzes the hardware failure that has occurred in the server device 10 based on the failure information received by the failure information reception unit 21.
The analysis result transmission unit 23 transmits the failure analysis result produced by the analysis unit 22 to the analysis result reception unit 14 of the server device 10.
The recovery result reception unit 24 determines whether the recovery result of the degeneration process performed by the degeneration unit 15 is good or bad, and processes it accordingly.
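The three kinds of information exchanged between the units of FIG. 2 can be pictured as simple records. The following is only a sketch: the class names, the field names, and the sample values are assumptions introduced for illustration, since the specification does not define any message format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureInfo:
    """Payload from failure information transmission unit 13 to reception unit 21."""
    server_id: str
    log_lines: List[str] = field(default_factory=list)


@dataclass
class AnalysisResult:
    """Payload from analysis result transmission unit 23 to reception unit 14."""
    server_id: str
    suspected_locations: List[str] = field(default_factory=list)


@dataclass
class RecoveryResult:
    """Payload from recovery result transmission unit 16 to reception unit 24."""
    server_id: str
    degenerated_locations: List[str]
    recovered: bool


# Example exchange for one failure on server device 10-1 (values are made up).
info = FailureInfo(server_id="10-1", log_lines=["memory error on bank 2"])
result = AnalysisResult(server_id=info.server_id, suspected_locations=["DIMM2"])
report = RecoveryResult(
    server_id=result.server_id,
    degenerated_locations=result.suspected_locations,
    recovered=True,
)
```

Keeping the suspected failure location in the analysis result, rather than in firmware on the server, reflects the division of roles described above: the server device only acts on what the failure analysis device reports.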

Next, the operation of the failure analysis system 1 of this embodiment is described.
FIG. 3 is a flowchart showing the operation of the failure analysis system 1 of this embodiment.
As shown in FIG. 3, when the detection unit 11 detects a hardware failure of the server device 10 (step S101), the failure information collection unit 12 collects failure information, which is information on the failure detected by the detection unit 11 (step S102).
Next, the failure information transmission unit 13 transmits the failure information collected by the failure information collection unit 12 to the failure analysis device 20 (step S103).

When the server device 10 transmits the failure information, the failure information reception unit 21 of the failure analysis device 20 receives it (step S104). The analysis unit 22 analyzes the hardware failure that has occurred in the server device 10 based on the failure information (step S105). At this time, the analysis unit 22 estimates the location where the hardware failure occurred. Next, the analysis result transmission unit 23 transmits the failure analysis result produced by the analysis unit 22 to the server device 10 (step S106). The failure analysis result includes information indicating the suspected failure location, that is, the location where the analysis unit 22 estimates that the hardware failure occurred.

Next, the analysis result reception unit 14 of the server device 10 receives the failure analysis result from the failure analysis device 20 (step S107). The degeneration unit 15 then performs a degeneration process on the suspected failure location included in the failure analysis result received by the analysis result reception unit 14 (step S108). The recovery result transmission unit 16 then transmits the recovery result of the degeneration process performed by the degeneration unit 15 to the failure analysis device 20 (step S109).

Next, the recovery result reception unit 24 receives the recovery result from the server device 10 and evaluates it (step S110). The recovery result handled by the recovery result reception unit 24 is then processed within the failure analysis device 20 to determine whether the recovery succeeded (step S111). If the recovery result indicates failure (step S111: NO), the process returns to step S105 and the analysis unit 22 performs the analysis again. If the recovery result indicates success (step S111: YES), the process ends.
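The flow of steps S101 to S111 can be sketched as a small loop. This is a simplified illustration, not the implementation of the embodiment: the dictionary layout, the function names, and the component names are hypothetical. The specification also does not say how the repeated analysis differs from the first pass or when retries stop, so excluding already-tried locations and stopping when no candidates remain are assumptions.

```python
def analyze(failure_info, excluded):
    """Analysis unit 22 (S105): pick the first candidate not yet tried.

    The candidate list inside failure_info stands in for real analysis.
    """
    for location in failure_info["candidates"]:
        if location not in excluded:
            return location
    return None


def degenerate(server, location):
    """Degeneration unit 15 (S108): isolate the location and report recovery."""
    server["degenerated"].append(location)
    # Recovery succeeds only if the truly faulty part was isolated.
    return location == server["actual_fault"]


def handle_failure(server):
    # S101-S103: detect the failure, then collect and transmit failure information.
    failure_info = {"server_id": server["id"], "candidates": server["candidates"]}
    tried = []
    while True:
        suspect = analyze(failure_info, tried)      # S104-S106
        if suspect is None:
            return tried, False                     # assumed stop condition
        recovered = degenerate(server, suspect)     # S107-S109
        tried.append(suspect)
        if recovered:                               # S110-S111: YES ends the flow
            return tried, True
        # S111: NO, so control returns to S105 and analysis runs again.


server = {
    "id": "10-1",
    "candidates": ["DIMM2", "CPU1"],
    "actual_fault": "CPU1",
    "degenerated": [],
}
tried, ok = handle_failure(server)
```

In this run the first suspect is wrong, the recovery result is reported as a failure, and the second pass of the analysis leads to a successful recovery. Whether a wrongly degenerated location is restored afterwards is not addressed by the specification; the sketch simply leaves it degenerated.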

The server device 10 and the failure analysis device 20 described above each contain a computer system. Each of the processing operations described above is stored on a computer-readable recording medium in the form of a program, and the processing is performed when the computer reads and executes this program.

Here, the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be delivered to a computer over a communication line, and the computer that receives it may execute the program.

The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the computer system.

As described above, according to this embodiment, the failure analysis device 20 analyzes hardware failures that occur in the plurality of server devices 10, and each server device 10 performs a degeneration process on the failure occurrence location based on the failure analysis result from the failure analysis device 20. Because only the failure analysis device 20 that supervises the server devices 10 needs to be modified, rather than each server device 10, the accuracy of failure analysis can be improved at lower cost.

Further, according to this embodiment, the failure analysis device 20 can aggregate and accumulate the logs recorded when failures occur together with the parts actually replaced when the failures were repaired. This allows the suspected location to be indicated accurately for similar failures.
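One way to picture this accumulation is a table that counts, for each kind of failure log, which part was actually replaced. The sketch below is an assumption about how such a store might look; the specification does not describe a data structure or a ranking rule, and the failure signatures and part names are invented for illustration.

```python
from collections import Counter, defaultdict


class ReplacementHistory:
    """Accumulates (failure signature -> actually replaced part) observations."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, signature, replaced_part):
        """Store one repaired failure: its log signature and the part replaced."""
        self._counts[signature][replaced_part] += 1

    def rank_suspects(self, signature):
        """Return parts seen for this signature, most frequently replaced first."""
        counts = self._counts.get(signature, Counter())
        return [part for part, _ in counts.most_common()]


history = ReplacementHistory()
history.record("PCIe link error", "riser card")
history.record("PCIe link error", "riser card")
history.record("PCIe link error", "CPU1")
ranking = history.rank_suspects("PCIe link error")
```

Because the history lives in the failure analysis device 20, every repair on any server device 10 can refine later suspect indications without touching the servers themselves.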

An embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to the one described, however, and various design changes and the like are possible without departing from the gist of the invention.

Description of symbols
1 Failure analysis system
10 Server device
11 Detection unit
12 Failure information collection unit
13 Failure information transmission unit
14 Analysis result reception unit
15 Degeneration unit
16 Recovery result transmission unit
20 Failure analysis device
21 Failure information reception unit
22 Analysis unit
23 Analysis result transmission unit
24 Recovery result reception unit

Claims

A failure analysis system comprising:
a plurality of server devices; and
a failure analysis device that analyzes a hardware failure that has occurred in the server devices,
wherein each server device performs a degeneration process on a failure occurrence location based on a failure analysis result from the failure analysis device.

The failure analysis system according to claim 1, wherein
the failure analysis device estimates a location where the hardware failure has occurred in the server device,
the failure analysis result includes information on a suspected failure location, which is the location estimated by the failure analysis device, and
the server device performs the degeneration process on the suspected failure location indicated by the failure analysis result.

The failure analysis system according to claim 1, wherein
the server device comprises:
a detection unit that detects a hardware failure of the device itself;
a failure information collection unit that collects failure information, which is information on the failure detected by the detection unit;
a failure information transmission unit that transmits the failure information collected by the failure information collection unit to the failure analysis device;
an analysis result reception unit that receives a failure analysis result from the failure analysis device; and
a degeneration unit that performs a degeneration process based on the failure analysis result received by the analysis result reception unit, and
the failure analysis device comprises:
a failure information reception unit that receives the failure information from the server device;
an analysis unit that analyzes the hardware failure that has occurred in the server device based on the failure information received by the failure information reception unit; and
an analysis result transmission unit that transmits the failure analysis result produced by the analysis unit to the server device.

A failure analysis device comprising:
a failure information reception unit that receives, from a plurality of server devices, failure information, which is information on a hardware failure that has occurred in a server device;
an analysis unit that analyzes the hardware failure that has occurred in the server device based on the failure information received by the failure information reception unit; and
an analysis result transmission unit that transmits a failure analysis result produced by the analysis unit to the server device.

The failure analysis device according to claim 4, wherein
the analysis unit estimates a location where the hardware failure has occurred, and
the failure analysis result includes information on a suspected failure location, which is the location estimated by the analysis unit.

A server device comprising:
a detection unit that detects a hardware failure of the device itself;
a failure information collection unit that collects failure information, which is information on the failure detected by the detection unit;
a failure information transmission unit that transmits the failure information collected by the failure information collection unit to a failure analysis device;
an analysis result reception unit that receives a failure analysis result from the failure analysis device; and
a degeneration unit that performs a degeneration process based on the failure analysis result received by the analysis result reception unit.

The server device according to claim 6, wherein
the failure analysis result includes information on a suspected failure location, which is a location where the failure analysis device estimates that the hardware failure has occurred, and
the degeneration unit degenerates the suspected failure location indicated by the failure analysis result.

A failure analysis method, wherein
a failure analysis device analyzes a hardware failure that has occurred in a server device, and
the server device performs a degeneration process on a failure occurrence location based on a failure analysis result from the failure analysis device.

A program for causing a computer to function as:
a failure information reception unit that receives, from a plurality of server devices, failure information, which is information on a hardware failure that has occurred in a server device;
an analysis unit that analyzes the hardware failure that has occurred in the server device based on the failure information received by the failure information reception unit; and
an analysis result transmission unit that transmits a failure analysis result produced by the analysis unit to the server device.

A program for causing a computer to function as:
a detection unit that detects a hardware failure of the device itself;
a failure information collection unit that collects failure information, which is information on the failure detected by the detection unit;
a failure information transmission unit that transmits the failure information collected by the failure information collection unit to a failure analysis device;
an analysis result reception unit that receives a failure analysis result from the failure analysis device; and
a degeneration unit that performs a degeneration process based on the failure analysis result received by the analysis result reception unit.