JP2013225217A - Multiprocessor system

Info

Publication number: JP2013225217A
Application number: JP2012097056A
Authority: JP
Inventors: Hiroki Konno; 廣毅今野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-04-20
Filing date: 2012-04-20
Publication date: 2013-10-31
Anticipated expiration: 2032-04-20
Also published as: JP5929465B2

Abstract

PROBLEM TO BE SOLVED: To provide a multiprocessor system that makes failure analysis more efficient. SOLUTION: When a failure occurs in a CPU, a failure handling unit determines the severity of the failure, collects failure data and analysis data according to that severity, and reports the failure to a failure processing unit. The failure processing unit checks whether a file system is available on its own processor or on another processor and, if one is available, determines whether the collected data can be stored there. If the data cannot be stored, an accessible file server is selected, a virtual file system is mounted, and the collected data is transmitted to it. After the collected data has been stored, the CPU is reset.

Description

The following embodiments relate to a multiprocessor system.

A communication device such as a router is usually built by mounting boards that carry processors and other components in a shelf. Because each such board is one unit of the configuration mounted in the shelf, it is referred to below as a board unit.

A conventional board unit has a multiprocessor configuration with distributed functions. When a failure occurs in a board unit, the failure must be analyzed and countermeasures taken. This requires acquiring data for analyzing the failure (failure data, analysis data, and so on).

Suppose that a failure whose type is serious for the board unit, such as a WDT (watchdog timeout), occurs in the main processor or a sub-processor. (The processors are divided into main and sub for convenience only; they do not differ functionally.) When failure data is collected, the failure data, analysis data, and so on are stored in an area prepared for failure data collection in the file system or non-volatile memory of the processor in which the failure occurred. When that area could not reserve sufficient capacity for the collected data in advance, only a minimal subset of the failure data was selected and collected.

When a failure of a type serious for the board unit, such as a WDT, occurred in a sub-processor that has no file system, the failure data was either discarded or collected by a processor that has a file system (the main processor or another sub-processor). In practice, embedded systems often do not implement a file system on every processor, for cost reasons.

Some prior art saves to a network server the hardware contents needed for failure reproduction, failure detection, and status confirmation.

Japanese Patent Laid-Open No. 11-65898

The conventional technology cannot cope with the following problems.
(1) Because the capacity of the file system provided for the processor is small, only the minimum necessary data is collected, so the data useful for analysis may not all be available.
(2) When the sub-processor has no file system and the failure data is discarded, failure analysis becomes difficult.
(3) When the sub-processor has no file system and the failure data is collected by a processor that has one, the amount of collected data depends on the condition of that processor's file system, which leads to the problem described in (1).

The following embodiments provide a multiprocessor system that can make failure analysis more efficient.

A multiprocessor system according to one aspect of the following embodiments is a multiprocessor system provided with a plurality of processors, in which the processor includes: a failure handling unit that collects data on a failure according to the severity of the failure that has occurred; a file server access unit that accesses an external file server; and a failure processing unit that, when a file system is mounted on its own processor or on another processor in the multiprocessor system, determines whether the data on the failure can be stored, stores the data in that file system when it can be stored, and stores the data in the file server when it cannot.

According to the following embodiments, a multiprocessor system that can make failure analysis more efficient can be provided.

Diagrams (parts 1 to 18) explaining the first configuration example of this embodiment.
Diagrams (parts 1 to 3) explaining the second configuration example of this embodiment.
Diagrams (parts 1 to 3) explaining the third configuration example of this embodiment.
Diagrams (parts 1 to 5) explaining the case where this embodiment is applied to a concrete configuration.

This embodiment describes failure data collection for the case where failure analysis by the failure handling unit finds that the failure type is at a level serious for the board unit (a level that requires a reset of the CPU core, such as a watchdog timeout (written WDT below), an instruction exception, or a memory exception) and a large amount of failure data is required.

A board unit is a board on which a CPU and memory are mounted; it is called a unit because the function of the board differs for each required function. The board unit is assumed to have a multiprocessor configuration with distributed functions.

When a failure occurs in the main processor or a sub-processor, the failure handling unit analyzes the failure. It automatically determines whether the failure type is at a level serious for the board unit (a serious failure that requires a reset of the CPU core, such as a WDT, an instruction exception, or a memory exception). At a serious level, failure data and analysis data are collected in detail; for a minor failure, collection may be skipped depending on the situation. When the failure type is judged to be serious, the failure data to collect, which differs by failure type, is selected automatically, as is the data needed for failure analysis (register information, memory information, call processing information), which also differs by failure type. The CPU is reset only after all of the failure data, analysis data, and so on have been collected. When the failure data and analysis data are collected, inter-processor communication is used to determine automatically whether the information to be collected fits in the non-volatile memory within the board unit (the non-volatile memory under its own processor and under other processors). Only when it does not fit does the sub-processor select a file system from the list of virtual file systems it can configure, connect to an external file server, and have the failure data collected and saved there.

As a result, failure data and analysis data can be collected without depending on the limits of the file system provided or of the non-volatile memory capacity.

It therefore becomes possible to retain more failure information, matched to the failure type, as failure data.

Because more information can be retained as a failure log, failure analysis becomes easier to carry out and the mean time to repair (MTTR) of the system can be shortened.

FIGS. 1 to 18 are diagrams explaining the first configuration example of this embodiment.
In the first configuration example, a failure occurs in a sub-processor other than the main processor. The sub-processor has no file system (or non-volatile memory); the main processor has one, but the remaining capacity of the main processor's file system is low.

In FIG. 1, the board unit 9 consists of a sub-processor 10 and a main processor 11. The distinction between the main processor 11 and the sub-processor 10 is one of convenience; they may be identical processors. Here a failure is assumed to occur in the sub-processor 10. When the CPU core of the CPU 12 of the sub-processor 10 detects a failure, the system failure interrupt unit 13 is notified. The system failure interrupt unit 13 issues an interrupt instruction, which is notified to the failure handling unit 14; the failure handling unit 14 is started and performs the failure analysis described later. The failure handling unit 14 is also started when a failure occurs during execution of the program, run by the CPU, that implements the functions of the board unit. The Call processing unit 22 in FIG. 1 performs call processing, such as calling subroutines of the main program, and starts the failure handling unit 14 when an abnormality occurs in that processing.

The failure analysis result of the failure handling unit 14 is notified to the failure processing unit 15. Because the sub-processor 10 has no file system, the failure processing unit 15 asks the file system remaining capacity check unit 19 of the main processor 11 for the remaining capacity of the file system 20 of the main processor 11. When the remaining capacity of the file system 20 is too small to store the failure data and related data, the failure processing unit 15 requests the virtual file system protocol processing unit 16 to access the file system of the file server 21. The virtual file system protocol processing unit 16 handles protocols such as NFS (Network File System) and VFS (Virtual File System). The access request processed by the virtual file system protocol processing unit 16 is passed to the file server 21 through the ETH (Ethernet) driver 17 and the physical interface PHY 18. The failure data and related data are thereby stored in the file system of the file server 21. After the data has been stored, the failure processing unit 15 resets the CPU 12.

The system failure interrupt unit 13 starts the failure handling unit 14. The failure handling unit 14 analyzes the failure and automatically determines its severity, that is, whether it is a serious failure that requires a CPU reset. Data is collected at a serious level and is not collected otherwise. In addition, depending on the cause of the serious failure (a WDT, an instruction exception, a memory exception, and so on), it automatically selects the failure data that is useful for that cause and the data needed for its analysis (register information, memory information, call processing information). The failure handling unit 14 then starts the failure processing unit.

FIG. 2 is a flowchart showing the operation of the system failure interrupt unit and the failure handling unit.

When the CPU detects a failure, the system failure interrupt unit, which holds the programs to be started in a vector table keyed by failure interrupt number, starts the failure handling unit in step S11 using an index value that identifies the failure type. Separately from this start-up by the system failure interrupt unit, the failure handling unit is also started when a failure occurs in Call processing: in step S12, when an abnormality is detected in Call processing, the failure handling unit is started using an index value that identifies the failure type.
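The two start-up paths of steps S11 and S12 can be sketched as follows. This is a minimal illustration in Python, not the implementation described in the publication: the interrupt numbers, the index values, and the names `FaultEvent`, `VECTOR_TABLE`, `on_system_fault_interrupt`, and `on_call_abnormality` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FaultEvent:
    source: str  # "system" (interrupt path) or "call" (Call processing path)
    index: int   # index value that identifies the failure type


received = []  # events delivered to the failure handling unit


def failure_handling_unit(event: FaultEvent) -> None:
    """Stand-in for the failure handling unit; here it only records the event."""
    received.append(event)


# Hypothetical vector table: failure interrupt number -> (program to start, index value).
VECTOR_TABLE = {
    0x01: (failure_handling_unit, 0x100),  # e.g. watchdog timeout
    0x02: (failure_handling_unit, 0x200),  # e.g. instruction exception
    0x03: (failure_handling_unit, 0x300),  # e.g. memory exception
}


def on_system_fault_interrupt(interrupt_number: int) -> None:
    """Step S11: look up the program by interrupt number and start it."""
    handler, index = VECTOR_TABLE[interrupt_number]
    handler(FaultEvent("system", index))


def on_call_abnormality(failure_type: int) -> None:
    """Step S12: Call processing starts the same unit with its own index value."""
    failure_handling_unit(FaultEvent("call", failure_type))


on_system_fault_interrupt(0x01)
on_call_abnormality(7)
```

Both paths end in the same handler; only the origin and the index value differ, which is what lets the later steps treat interrupt-driven failures as system failures and Call processing failures as minor ones.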

In step S13, the failure handling unit automatically determines the severity of the failure. In step S14, it decides, according to that severity, whether to collect failure data: data is collected when the severity is high and is not collected when it is low. A start-up of the failure handling unit from the system failure interrupt unit is treated as a serious failure, and a start-up from the Call processing unit is treated as a minor failure. For both serious and minor failures, the severity is examined to decide whether to collect data. If step S14 decides not to collect failure data, the process proceeds to step S17. If step S14 decides to collect failure data, the automatic collected failure data selection process is performed in step S15, the automatic collected analysis data selection process is performed in step S16, and the process proceeds to step S17. In step S17, the failure is reported to the failure processing unit together with whether data is to be collected, the failure data type, and the analysis data type. Steps S13, S15, and S16 are described in detail below.
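The control flow of steps S13 to S17 can be sketched as below. The sketch is hedged: `FailureNotice` and `handle_failure` are hypothetical names, the three selection steps are passed in as plain functions, and treating a severity of 0 as "do not collect" is an assumption based on the severity determination process described later, which returns 0 when the collection flag is OFF.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailureNotice:
    """What step S17 reports to the failure processing unit (hypothetical shape)."""
    collect: bool
    failure_data_kinds: List[str] = field(default_factory=list)
    analysis_data_kinds: List[str] = field(default_factory=list)


def handle_failure(
    index: int,
    determine_severity: Callable[[int], int],
    select_failure_data: Callable[[int, int], List[str]],
    select_analysis_data: Callable[[int, int], List[str]],
) -> FailureNotice:
    severity = determine_severity(index)                       # S13
    if severity == 0:                                          # S14: do not collect
        return FailureNotice(collect=False)                    # S17
    failure_kinds = select_failure_data(index, severity)       # S15
    analysis_kinds = select_analysis_data(index, severity)     # S16
    return FailureNotice(True, failure_kinds, analysis_kinds)  # S17


# Usage with stand-in selection functions.
serious = handle_failure(
    0x100,
    lambda index: 3,
    lambda index, severity: ["failure record"],
    lambda index, severity: ["registers", "memory", "call processing"],
)
minor = handle_failure(7, lambda index: 0, lambda i, s: [], lambda i, s: [])
```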

FIGS. 3 and 4 are diagrams explaining the automatic severity determination process.
FIG. 4 is a diagram explaining an example of the severity determination tables.

When a failure occurs, a failure number is notified to the failure handling unit. This failure number is associated with the address at which the program to be started is stored; this address is called the vector address. Separate severity determination tables are provided for system failures and for minor failures. The severity determination table for system failures, (1), is indexed by vector address and stores in advance a flag indicating whether failure data is to be collected and a numerical value indicating the failure severity. The severity determination table for minor failures, (2), is indexed by failure type and stores in advance the same kind of flag and severity value.

In the flowchart of FIG. 3, step S20 determines whether the failure that occurred is a system failure, such as an exception interrupt from the system failure interrupt unit, or a failure from the Call processing unit. A failure from the Call processing unit is judged to be a minor failure.

When step S20 determines that the failure is a system failure such as an exception interrupt, step S21 looks up the severity determination table for system failures ((1) in FIG. 4) using the received vector address as the index. Step S22 determines whether the failure data collection flag is ON. If the flag is ON, step S23 returns the failure severity as the return value and the process ends. If the flag is OFF, step S24 sets the failure severity to 0, returns it as the return value, and the process ends.

When step S20 determines that the failure is a minor failure, step S25 looks up the severity determination table for minor failures ((2) in FIG. 4) using the failure type as the index. Step S26 determines whether the failure data collection flag is ON. If the flag is ON, step S27 returns the severity as the return value and the process ends. If the flag is OFF, step S28 sets the severity to 0, returns it as the return value, and the process ends.
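The table lookup of steps S20 to S28 can be sketched as follows. The table contents are invented for illustration; only the shape (an index mapped to a collection flag and a severity value, with 0 returned when the flag is OFF) follows the description above. The names `SYSTEM_SEVERITY_TABLE`, `MINOR_SEVERITY_TABLE`, and `determine_severity` are hypothetical.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical table (1): vector address -> (collection flag, severity).
SYSTEM_SEVERITY_TABLE: Dict[Hashable, Tuple[bool, int]] = {
    0x0100: (True, 3),
    0x0200: (True, 2),
    0x0300: (False, 1),
}

# Hypothetical table (2): failure type -> (collection flag, severity).
MINOR_SEVERITY_TABLE: Dict[Hashable, Tuple[bool, int]] = {
    "call_timeout": (True, 1),
    "call_retry": (False, 1),
}


def determine_severity(is_system_failure: bool, key: Hashable) -> int:
    """Return the stored severity, or 0 when the collection flag is OFF."""
    # S20: choose the table according to where the failure came from.
    table = SYSTEM_SEVERITY_TABLE if is_system_failure else MINOR_SEVERITY_TABLE
    collect, severity = table[key]      # S21 / S25
    return severity if collect else 0   # S22-S24 / S26-S28
```

For example, `determine_severity(True, 0x0300)` yields 0 even though a severity of 1 is stored, because the flag for that entry is OFF.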

FIGS. 5 and 6 are diagrams explaining the automatic collected failure data selection process.
FIG. 6 is a diagram explaining an example of the collected failure data table for system failures. A two-level table structure is shown here, but a single-level structure may also be used. Table (1) is indexed by severity data and stores a vector address, which serves as the index into table (2), together with a valid bit for that address. The valid bit indicates whether the registered vector address is valid. Table (2) is indexed by the vector address obtained from table (1) and stores failure data together with its valid bit. Failure data is data that indicates the content of the failure that occurred.

In the flowchart of FIG. 5, step S30 determines whether the failure is a system failure or a minor failure. If it is a system failure, step S31 looks up the collected failure data table for system failures, (1) and (2) in FIG. 6, using the severity data and the vector address as indexes. If it is a minor failure, step S32 looks up the collected failure data table for minor failures using the severity data and the failure type as indexes. FIG. 6 shows only the table for system failures. In the table for minor failures, the counterpart of table (1) in FIG. 6 is indexed by severity data and stores the failure type, and the counterpart of table (2) is indexed by failure type and stores the failure data. Step S33 returns the information on the failure data to be collected as the return value, and the process ends.
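The two-level lookup for system failures (step S31) can be sketched as below. The table entries and the strings standing in for failure data are invented; `LEVEL1`, `LEVEL2`, and `select_failure_data` are hypothetical names. Returning `None` when an entry is missing or its valid bit is clear is an assumption, since the text does not say how invalid entries are handled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table (1): severity -> (vector address, valid bit).
LEVEL1: Dict[int, Tuple[int, bool]] = {
    3: (0x0100, True),
    2: (0x0200, True),
    1: (0x0300, False),  # registered address is marked invalid
}

# Hypothetical table (2): vector address -> (failure data, valid bit).
LEVEL2: Dict[int, Tuple[str, bool]] = {
    0x0100: ("watchdog timeout record", True),
    0x0200: ("instruction exception record", False),  # data marked invalid
    0x0300: ("memory exception record", True),
}


def select_failure_data(severity: int) -> Optional[str]:
    """Follow table (1) to a vector address, then table (2) to the failure data."""
    first = LEVEL1.get(severity)
    if first is None or not first[1]:
        return None  # assumption: nothing is selected for an invalid address
    vector_address = first[0]
    second = LEVEL2.get(vector_address)
    if second is None or not second[1]:
        return None  # assumption: nothing is selected for invalid data
    return second[0]
```

The table for minor failures would have the same shape with the failure type in place of the vector address.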

FIGS. 7 and 8 are diagrams explaining the automatic analysis data selection process.
FIG. 8 is a diagram explaining an example of the analysis data table for system failures. A two-level table structure is shown here, but a single-level structure may also be used. Table (1) in FIG. 8 is indexed by severity data and stores a vector address, which serves as the index into table (2) in FIG. 8, together with a valid bit for that address. The valid bit indicates whether the registered vector address is valid. Table (2) in FIG. 8 is indexed by the vector address obtained from table (1) and stores analysis data together with its valid bit. Analysis data is data that indicates what hardware operations were in progress when the failure occurred; it is used to analyze the failure.

In FIG. 7, step S35 determines whether the failure is a system failure or a minor failure. If it is a system failure, step S36 looks up the analysis data table for system failures using the severity data and the vector address as indexes. If it is a minor failure, step S37 looks up the analysis data table for minor failures using the severity data and the failure type as indexes. FIG. 8 shows only the table for system failures. In the table for minor failures, the counterpart of table (1) in FIG. 8 is indexed by severity data and stores the failure type, and the counterpart of table (2) is indexed by failure type and stores the analysis data. Step S38 returns the analysis data as the return value, and the process ends.

The failure processing unit calculates the size of the data to be collected from the information passed by the failure handling unit. It determines whether a file system exists under its own processor and, if so, whether the data can be collected given the remaining capacity. If no file system exists under its own processor, it determines whether a file system exists under another processor and, if so, whether the data can be collected. When a file system exists under another processor, it checks that file system's remaining capacity by inter-processor communication to determine whether enough remains to store the data. The result of the check is returned as a return value and used to decide whether the data can be collected. When a virtual file system is to be used, the failure processing unit selects a file system from the list of virtual file system connection destinations, highest priority first, and uses a system call to mount the file system on a file server across the LAN via a virtual file system protocol (NFS or VFS). It outputs the failure cause and the failure data to one of the following: the non-volatile memory under its own processor, the non-volatile memory under another processor, or the virtual file system. It waits until output of the failure cause and failure data is complete, then issues a reset signal to the CPU to reset it.

FIG. 9 is a flowchart showing the flow of processing in the failure processing unit.
When the failure processing unit is started by the failure handling unit, in step S40 it calculates the size of the data to be collected from the information passed by the failure handling unit. In step S41, it inquires whether the data can be stored in the non-volatile memory of its own processor or of another processor. The remaining capacity of the storage area of another processor's file system is queried by inter-processor communication.

If step S42 determines that the data cannot be collected, step S43 selects a file server from the list of file server connection destinations for the virtual file system, and step S44 mounts the virtual file system. Step S45 outputs the failure cause, the failure data, and the failure analysis data to the virtual file system, and the process proceeds to step S46.

If step S42 determines that the data can be collected, step S48 determines whether the collection destination is its own processor or another processor. If the data is to be stored on its own processor, step S49 outputs the failure cause, the failure data, and the failure analysis data to the non-volatile memory of its own processor. If the data is to be stored on another processor, step S50 outputs the failure cause, the failure data, and the failure analysis data to the non-volatile memory of the other processor. Whether step S48 selects its own processor or another processor depends on the result of the process explained next with FIG. 10.

In step S46, the process waits for output of the failure cause, failure data, and failure analysis data to complete, and in step S47 the CPU is reset.
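The branch structure of FIG. 9 can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation of the embodiment: the function name `run_failure_processing`, the destination strings, and the callback parameters are all hypothetical, and the writes are assumed to be synchronous so that returning from a write stands in for the wait in step S46.

```python
def run_failure_processing(destination, write_self, write_other,
                           write_virtual, reset_cpu):
    """Dispatch collected failure data, then reset (steps S42 to S50 of FIG. 9).

    destination is the (hypothetical) result of the storability check:
    "self", "other", or "none".
    """
    if destination == "self":
        write_self()        # S48 -> S49: own processor's nonvolatile memory
    elif destination == "other":
        write_other()       # S48 -> S50: other processor's nonvolatile memory
    else:
        write_virtual()     # S43 to S45: select server, mount, output
    # S46: the writes above are assumed synchronous, so output is complete here.
    reset_cpu()             # S47


# Usage: record which steps run when no processor-side storage is available.
steps = []
run_failure_processing(
    "none",
    write_self=lambda: steps.append("self"),
    write_other=lambda: steps.append("other"),
    write_virtual=lambda: steps.append("virtual"),
    reset_cpu=lambda: steps.append("reset"),
)
```

In every branch the reset comes last, which reflects the ordering the text requires: the CPU is reset only after the output has completed.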

FIG. 10 is a flowchart of the process for selecting whether the data can be stored in the nonvolatile memory of the own processor or of another processor.

In step S55, it is determined whether the own processor has a file system. If it does not, step S56 determines whether another processor has a file system. If another processor has one, the free space in that processor's file system is checked by inter-processor communication in step S57, and step S58 determines whether free space is available. If there is none, step S59 returns a result indicating that data collection is impossible, and the process ends. If there is free space, step S60 returns a result indicating that the data can be collected on the other processor, and the process ends.

If it is determined in step S56 that no other processor has a file system, step S61 returns a result indicating that data collection is impossible, and the process ends.

If it is determined in step S55 that the own processor has a file system, the size of the input file is compared with the free space in the own processor's file system in step S62, and step S63 determines whether the free space is sufficient. If it is insufficient, step S64 returns a result indicating that data collection is impossible, and the process ends. If it is sufficient, step S65 returns a result indicating that the data can be collected on the own processor, and the process ends.
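The decision tree of FIG. 10 can be condensed into a short function. The sketch below is illustrative only: the name `select_storage`, the use of `None` to mean "no file system", the return strings, and the choice of "free space >= required size" as the meaning of "sufficient" are assumptions, not details given in the description.

```python
from typing import Optional


def select_storage(required: int,
                   self_free: Optional[int],
                   other_free: Optional[int]) -> str:
    """Return "self", "other", or "none" following steps S55 to S65.

    self_free / other_free are free bytes, or None when that processor
    has no file system (an encoding chosen for this sketch).
    """
    if self_free is not None:                      # S55: own file system exists
        # S62, S63: compare the input size with the own free space.
        return "self" if self_free >= required else "none"   # S65 / S64
    if other_free is None:                         # S56: no other file system
        return "none"                              # S61
    # S57, S58: free space reported by the other processor.
    return "other" if other_free >= required else "none"     # S60 / S59
```

Note that, as the flowchart is described, an own file system with too little free space leads directly to "collection impossible"; the other processor is consulted only when the own processor has no file system at all.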

FIG. 11 is a flowchart of the process for checking the free space in another processor's file system (step S41 in FIG. 9).

In step S70, the requesting own processor sends a file system free-space check request to the requested other processor by inter-processor communication. In step S71, the other processor receives the check request. In step S72, it checks the required file size. In step S73, the other processor reports the check result to the own processor by inter-processor communication. In step S74, the own processor receives the result by inter-processor communication, and in step S75 it returns the result as a return value and ends the process.
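The request and reply exchange of FIG. 11 can be modeled with two message queues standing in for the inter-processor communication channel. This is a single-threaded sketch under stated assumptions: the function names, the use of `queue.Queue`, and the idea that the request carries the required size in bytes are all hypothetical.

```python
import queue


def serve_capacity_request(requests, replies, free_bytes):
    """Other processor side: steps S71 to S73."""
    required = requests.get_nowait()       # S71: receive the check request
    replies.put(free_bytes >= required)    # S72, S73: check and report


def check_other_processor(required, requests, replies, run_other):
    """Own processor side: steps S70, S74, and S75."""
    requests.put(required)                 # S70: send the check request
    run_other()                            # stand-in for the other processor running
    return replies.get_nowait()            # S74, S75: receive and return the result


# Usage: the other processor has 1024 bytes free, 4096 are needed.
req, rep = queue.Queue(), queue.Queue()
result = check_other_processor(
    4096, req, rep,
    lambda: serve_capacity_request(req, rep, free_bytes=1024),
)
```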

FIGS. 12 and 13 are diagrams explaining the virtual file system selection process (step S43 in FIG. 9).

FIG. 13 shows an example of the list of file servers to which the virtual file system can connect. The IP address of each destination file server is registered in the list paired with its priority. The list is stored in RAM as list-format data that the CPU program can reference. Several of the destinations in the list are prepared as fixed system defaults. This data is loaded into RAM when the system starts up.

The priority is determined as follows.
- The default system-fixed data (provided as data in the ROM program) is assigned priorities from 0 to 10 in advance (the smaller the number, the higher the priority).
- When the system starts up, the data is loaded into RAM to form list-format data. A ping is then sent to each IP address, starting from the top of the list. Servers with fewer hops are treated as closer, and the list is sorted in ascending order of the value "priority × hop count" so that entries nearer the top of the list have higher priority. The entry with the next-highest priority can be selected by searching the list data from the top.

In FIG. 12, step S80 searches the connection-destination file server list table of the virtual file server, and step S81 takes an IP address from the top of that table. In step S82, a ping is sent to the acquired IP address to check whether a reply can be obtained. If the ping succeeds in step S83, the virtual file server at that IP address is set as the connection target in step S84. If the ping fails in step S83, the process returns to step S80.
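The selection loop of FIG. 12 amounts to walking the sorted list and taking the first server that answers a ping. In the sketch below, `select_file_server` and the `is_reachable` callback are hypothetical names; the callback stands in for the ping of step S82 so that the example runs without network access. Returning `None` when no server answers is also an assumption, since the description does not say what happens when the list is exhausted.

```python
def select_file_server(server_list, is_reachable):
    """Return the first reachable IP address from a priority-ordered list.

    server_list: IP addresses, highest priority first (S80, S81).
    is_reachable: callable standing in for the ping check (S82, S83).
    """
    for ip in server_list:
        if is_reachable(ip):
            return ip        # S84: this server becomes the connection target
    return None              # assumption: no reachable server was found


# Usage with documentation-range addresses; only the second one "replies".
servers = ["192.0.2.10", "192.0.2.20", "192.0.2.30"]
chosen = select_file_server(servers, lambda ip: ip == "192.0.2.20")
```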

FIGS. 14 to 16 are diagrams explaining how the connection-destination file server list table of the virtual file server is generated.

FIG. 16 shows an example of the fixed data holding the IP addresses of the default file servers. In FIG. 16, the IP addresses of the destination file servers are listed, and those with a higher provisional priority are registered nearer the top of the list. Because this is the default list, it is created in advance when the system is designed and is stored in ROM or the like.

In the flowchart of FIG. 14, step S89 reads the fixed data holding the IP addresses of the default file server list. In step S90, one entry of the fixed data is read from the top. In step S91, a ping is sent to the IP address that was read, and step S92 determines whether a reply is received normally. If no normal reply is received, the process returns to step S89. If a normal reply is received, the hop count obtained from the ping result is acquired in step S93. In step S94, the IP address is stored in the work area with (index value of the data (provisional priority)) × (hop count) = (final priority), and the process returns to step S89. Once steps S89 to S94 have been performed for the IP addresses in all the fixed data of the default list, step S95 generates the connection-destination file server list table of the virtual file server by sorting the work area information in order of final priority.
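The list generation of FIG. 14 can be sketched as below. The names `build_server_list` and `probe_hops` are hypothetical; `probe_hops` stands in for the ping of steps S91 to S93 and is assumed to return the hop count, or `None` when no normal reply is received. Dropping unreachable entries from the generated table is how this sketch interprets the return to step S89.

```python
def build_server_list(defaults, probe_hops):
    """Build the priority-ordered server list (steps S89 to S95).

    defaults: list of (ip, provisional_priority) pairs from the fixed data.
    probe_hops: callable returning the hop count for an IP, or None on failure.
    """
    work_area = []
    for ip, provisional in defaults:              # S89, S90: read each entry
        hops = probe_hops(ip)                     # S91 to S93: ping, get hop count
        if hops is None:
            continue                              # S92: no normal reply
        work_area.append((provisional * hops, ip))  # S94: final priority
    # S95: smaller final priority first; Python's sort is stable, so ties
    # keep their order from the default list.
    work_area.sort(key=lambda entry: entry[0])
    return [ip for _, ip in work_area]


# Usage: the second server is closer, so it overtakes the first.
defaults = [("192.0.2.10", 1), ("192.0.2.20", 2), ("192.0.2.30", 3)]
hop_table = {"192.0.2.10": 6, "192.0.2.20": 2, "192.0.2.30": None}
ordered = build_server_list(defaults, hop_table.get)
```

Here the final priorities are 1 × 6 = 6 and 2 × 2 = 4, so the nearer server ends up at the top of the list even though its provisional priority is lower.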

The operation process of FIG. 15 is a process that accepts operations from the user. In FIG. 15, step S96 accepts, through a user operation, a virtual file system connection destination to be added or changed. In step S97, a ping is sent to the accepted IP address. Step S98 determines whether a ping reply was received normally. If so, the hop count is acquired from the ping result in step S99. In step S100, the IP address is stored in the work area with (index value of the data (provisional priority)) × (hop count) = (final priority), and the process proceeds to step S102. If no normal ping reply was received in step S98, the user is notified (by a display) in step S101 that the IP address cannot be added or changed, and the process proceeds to step S102. In step S102, the connection-destination file server list table of the virtual file server is generated by sorting the work area information in order of final priority, and the process ends.

FIG. 17 is a diagram showing the overall operation of the first configuration example.
First, assume that a failure occurs in the CPU (1). The failure handling unit is then notified of the system failure via the system failure interrupt unit (2). The failure handling unit performs the processing of the automatic failure severity determination unit, the automatic collected-failure-data selection unit, and the automatic collected-analysis-data selection unit, and sends a failure notification to the failure processing unit. The failure processing unit (3) calculates the size of the failure data to be collected. It then determines whether the own processor has a file system and whether another processor has a file system. If another processor (for example, the main processor) has a file system, the free space in that file system is checked by inter-processor communication. Assume here that the free space in the other processor's file system is found to be insufficient. The failure processing unit then refers to the connection-destination file server list data and selects a destination file server from the list. It mounts the virtual file system and outputs the failure cause, failure data, and failure analysis data to the virtual file system. After waiting for the data output to complete, it resets the CPU.

FIG. 18 is a sequence diagram showing the flow of a series of processes in the first configuration example.
When a failure is detected in the CPU, the failure handling unit performs failure analysis, automatic failure severity determination, automatic selection of the failure data to collect, and automatic selection of the analysis data to collect, and then sends a failure notification to the failure processing unit. The automatic selection of failure data and of analysis data branches according to the severity and performs different processing for each severity.

On receiving the failure notification, the failure processing unit inquires whether the failure data and analysis data can be stored in the nonvolatile memory of the own processor or of another processor. The own-processor/other-processor nonvolatile memory storability determination unit determines whether the own processor has a file system and whether another processor has a file system. In this case, assume that another processor has a file system. The file system free-space check unit then asks the other processor (the main processor) by inter-processor communication to check the free space in its file system. The file system free-space check unit of the main processor receives the request, checks the required file size, accesses the nonvolatile memory to obtain the free space, and reports the result to the requesting processor (the sub processor). The storability determination unit of the sub processor determines whether the free space is sufficient. In this case, assume that it is insufficient. The failure processing unit is then notified that the data cannot all be collected. The failure processing unit refers to the connection-destination file server list data, selects a file server from the list, mounts the virtual file system, and accesses the file system of the file server. It then outputs the failure cause, failure data, and analysis data to the file system of the file server and resets the CPU.

FIGS. 19 to 21 are diagrams explaining a second configuration example of the present embodiment.
The second configuration example covers the case where a failure occurs in the main processor or a sub processor, and the own processor has a file system (or nonvolatile memory) but too little free space, so the virtual file system is accessed.

In FIG. 19, components that are the same as those in FIG. 1 are given the same reference numerals, and their description is omitted.

In FIG. 19, the own processor is the main processor or a sub processor.
The system failure interrupt unit 13 receives a failure of the CPU 12 and activates the failure handling unit 14. The failure handling unit 14 analyzes the failure and, based on that information, activates the failure processing unit 15. The own processor is provided with the file system 30, but its free space is assumed to be low. A file server is therefore selected, the failure cause, failure data, and failure analysis data are stored there, and the CPU is reset.

FIG. 20 is a diagram showing the overall flow of the second configuration example.
First, assume that a failure occurs in the CPU (1). The failure handling unit is then notified of the system failure via the system failure interrupt unit (2). The failure handling unit performs the processing of the automatic failure severity determination unit, the automatic collected-failure-data selection unit, and the automatic collected-analysis-data selection unit, and sends a failure notification to the failure processing unit. The failure processing unit (3) calculates the size of the failure data to be collected. It then determines whether the own processor has a file system. If it does, the free space in the own processor's file system is checked. Assume here that the own processor's file system is found to have no free space. The failure processing unit then refers to the connection-destination file server list data and selects a destination file server from the list. It mounts the virtual file system and outputs the failure cause, failure data, and failure analysis data to the virtual file system. After waiting for the data output to complete, it resets the CPU.

FIG. 21 is a sequence diagram showing the flow of a series of processes in the second configuration example.
When a failure is detected in the CPU, the failure handling unit performs failure analysis and automatic failure severity determination, then automatic selection of the failure data and analysis data to collect (automatic selection of collected data), and sends a failure notification to the failure processing unit. The automatic selection of collected data branches according to the severity and performs different processing for each severity.

On receiving the failure notification, the failure processing unit inquires whether the failure data, analysis data, and so on can be stored in the nonvolatile memory of the own processor. The file system free-space check unit is asked to check the free space in the own processor's file system. The file system free-space check unit of the own processor receives the request, obtains the free space in the file system, and reports the result to the failure processing unit. The failure processing unit determines whether the free space is sufficient. In this case, assume that it is insufficient. The failure processing unit refers to the connection-destination file server list data, selects a file server from the list, mounts the virtual file system, and accesses the file system of the file server. It then outputs the failure cause, failure data, and analysis data to the file system of the file server and resets the CPU.

FIGS. 22 to 24 are diagrams explaining a third configuration example of the present embodiment.
In FIG. 22, components that are the same as those in FIG. 19 are given the same reference numerals, and their description is omitted.

The third configuration example covers the case where a failure occurs in the main processor or a sub processor, and neither the own processor nor any other processor has a file system (or nonvolatile memory).

The system failure interrupt unit 13 receives a failure of the CPU 12 and activates the failure handling unit 14. The failure handling unit 14 analyzes the failure and, based on that information, activates the failure processing unit 15. A file server is selected, the failure cause, failure data, and failure analysis data are stored there, and the CPU is reset.

FIG. 23 is a diagram showing the overall flow of the third configuration example.
First, assume that a failure occurs in the CPU (1). The failure handling unit is then notified of the system failure via the system failure interrupt unit (2). The failure handling unit performs the processing of the automatic failure severity determination unit, the automatic collected-failure-data selection unit, and the automatic collected-analysis-data selection unit, and sends a failure notification to the failure processing unit. The failure processing unit (3) calculates the size of the failure data to be collected. It then determines whether the own processor or another processor has a file system. Assume here that neither the own processor nor any other processor is found to have a file system. The failure processing unit then refers to the connection-destination file server list data and selects a destination file server from the list. It mounts the virtual file system and outputs the failure cause, failure data, and failure analysis data to the virtual file system. After waiting for the data output to complete, it resets the CPU.

FIG. 24 is a sequence diagram showing the flow of a series of processes in the third configuration example.
When a failure is detected in the CPU, the failure handling unit performs failure analysis and automatic failure severity determination, then automatic selection of the failure data and analysis data to collect (automatic selection of collected data), and sends a failure notification to the failure processing unit. The automatic selection of collected data branches according to the severity and performs different processing for each severity.

On receiving the failure notification, the failure processing unit determines whether the own processor or another processor has a file system. In this case, assume that neither the own processor nor any other processor has one. The failure processing unit refers to the connection-destination file server list data, selects a file server from the list, mounts the virtual file system, and accesses the file system of the file server. It then outputs the failure cause, failure data, and analysis data to the file system of the file server and resets the CPU.

FIGS. 25 to 29 are diagrams explaining a case where the present embodiment is applied to a specific configuration.
With reference to FIGS. 25 to 27, the following describes the case where a WDT occurs in the sub processor, the sub processor has no file system, and the main processor has a file system with sufficient capacity.

In FIG. 25, when a WDT occurs in the sub processor 10, the WDT driver 40 generates an interrupt indicating that the WDT has occurred. It also analyzes the failure type and automatically determines which failure data to collect. The OS (Operating System) 41 passes the interrupt from the WDT driver 40 to the data collection processing unit 42 via an interrupt handler. The data collection processing unit 42 is started by the OS 41 via the interrupt handler and performs the following processing in order.
(1) Accept the interrupt request.
(2) Perform inter-processor communication to check the free space in the file system of the main processor 11, and check whether there is enough space to store the failure data.
(3) Select a high-priority entry from the list data of file servers to which the virtual file system can connect.
(4) Using a system call, mount the file system on the file server beyond the LAN via a virtual file system protocol (such as NFS or VFS).
(5) Save the WDT cause and all failure data to the virtual file system.
(6) Reset the CPU.

The communication driver 43 is a driver for communicating with the main processor 11 (such as a PCI (Peripheral Component Interconnect) bus driver or an interrupt driver). When it receives an inquiry, the size check processing unit 44 on the main processor side checks the usable size of the file system and reports the result to the sub processor 10. The virtual file system protocol 45 is a protocol such as NFS (Network File System) or VFS (Virtual File System). The Ethernet driver (ETH driver) 46 is a driver for communication over Ethernet (registered trademark); it controls the PHY 47 and is itself controlled by the virtual file system protocol. The PHY 47 is a physical device that connects to the server via the LAN. The LAN 48 connects the file server 49 and the board unit 9. The file server 49 is a server that is virtually connected to the board unit 9.

A WDT occurs in the sub processor, and the WDT driver detects the hang-up notification (WDT interrupt). When a WDT occurs in the processor, the WDT driver generates a User interrupt that signifies the WDT occurrence. The OS passes the User interrupt from the WDT driver to the data collection processing unit via the interrupt handler. The data collection processing unit accepts the interrupt request.

The data collection processing unit communicates with the main processor through the PCI communication processing of FIGS. 26 and 27. If the main processor's file system has enough space left to store the failure data, the WDT cause and all failure data on the sub processor side are sent over PCI communication and stored in the file system under the main processor. The CPU is then reset and the sub processor is restarted.

FIG. 26 is a diagram explaining the PCI communication processing used to check the free space in the file system.

In step S110, the sub processor (where the failure is assumed to have occurred) starts PCI communication, and in step S111 it writes the communication type to a register in PCI space. Here, it writes "file size check". In step S116, it writes a PCI-space communication trigger to the register for PCI transmission in the A->B direction. The main processor polls the communication area in step S112 and recognizes the communication trigger in step S113. In step S114, it recognizes the communication request and generates a PCI interrupt. In step S123, it reads the communication request, and in step S115 it checks the free space in the file system. In step S118, it writes the communication result to a register in PCI space, and in step S119 it writes a PCI-space communication trigger to the register for PCI transmission in the B->A direction. Here, the result is data indicating that the data can be collected. The sub processor starts polling the communication area in step S117 and recognizes the communication trigger from the main processor in step S120. In step S121, it recognizes the communication request and generates a PCI interrupt. In step S122, it reads the PCI communication result and ends the process.
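The trigger-and-poll handshake of FIG. 26 can be simulated with a small shared "mailbox" object in place of the PCI-space registers. This is only a model for illustration: the class `PciMailbox`, its field names, the `"FILE_SIZE_CHECK"` type string, and the assumption that the request carries the required size in bytes are all hypothetical, and the PCI interrupts of steps S114 and S121 are folded into the poll functions.

```python
class PciMailbox:
    """Stand-in for the shared PCI-space registers (hypothetical layout)."""

    def __init__(self):
        self.comm_type = None        # communication type register
        self.request = None          # request payload (assumed: required bytes)
        self.result = None           # communication result register
        self.trigger_a_to_b = False  # sub -> main trigger
        self.trigger_b_to_a = False  # main -> sub trigger


def sub_request_size_check(mbox, required_bytes):
    mbox.comm_type = "FILE_SIZE_CHECK"   # S111: write the communication type
    mbox.request = required_bytes
    mbox.trigger_a_to_b = True           # S116: raise the A->B trigger


def main_poll(mbox, free_bytes):
    if not mbox.trigger_a_to_b:          # S112, S113: nothing pending
        return False
    mbox.trigger_a_to_b = False
    if mbox.comm_type == "FILE_SIZE_CHECK":        # S123: read the request
        mbox.result = free_bytes >= mbox.request   # S115, S118
    mbox.trigger_b_to_a = True           # S119: raise the B->A trigger
    return True


def sub_poll(mbox):
    if not mbox.trigger_b_to_a:          # S117, S120: no reply yet
        return None
    mbox.trigger_b_to_a = False
    return mbox.result                   # S122: read the result


# Usage: one full round trip.
mbox = PciMailbox()
before = sub_poll(mbox)                  # no reply has been posted yet
sub_request_size_check(mbox, 2048)
main_poll(mbox, free_bytes=65536)
answer = sub_poll(mbox)
```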

FIG. 27 is a diagram explaining the PCI communication used to write the WDT cause, failure data, and so on.
In step S130, the sub processor starts PCI communication, and in step S131 it writes the communication type to a register in PCI space. Here, it writes a type indicating that a file is to be written. In step S132, it writes a PCI-space communication trigger to the register for PCI transmission in the A->B direction. In step S133, it writes the file size and the file to the transmission register in PCI space. The main processor polls the communication area in step S134 and recognizes the communication trigger in step S135. In step S136, it recognizes the communication request and generates a PCI interrupt. In step S137, it reads the communication request, and in step S138 it reads as much file content as the file size indicates and writes it to its own file system. In step S139, it writes the communication result to a register in PCI space, and in step S140 it writes a PCI-space communication trigger to the register for PCI transmission in the B->A direction. The sub processor has started polling the communication area in step S141 and recognizes the communication trigger in step S142. In step S143, it recognizes the communication request and generates a PCI interrupt. In step S144, it reads the PCI communication result and ends the process.
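Steps S133 and S138 describe a length-prefixed transfer: the sender writes the file size followed by the file, and the receiver reads exactly that many bytes. A minimal sketch of that framing is shown below. The 4-byte little-endian size field and the function names are assumptions for illustration; the description does not specify the register layout.

```python
import struct

SIZE_FIELD = struct.Struct("<I")  # assumed: 4-byte little-endian file size


def frame_file(data: bytes) -> bytes:
    """Sender side (S133): file size followed by the file contents."""
    return SIZE_FIELD.pack(len(data)) + data


def unframe_file(buffer: bytes) -> bytes:
    """Receiver side (S138): read only as many bytes as the size field says."""
    (size,) = SIZE_FIELD.unpack_from(buffer, 0)
    start = SIZE_FIELD.size
    return buffer[start:start + size]


# Usage: trailing bytes left in the transfer area are ignored by the receiver.
dump = b"WDT cause + failure data"
received = unframe_file(frame_file(dump) + b"\x00\x00\x00")
```

Carrying the size explicitly lets the receiver ignore stale bytes left in the shared area from an earlier, longer transfer.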

Next, a case will be described in which, when a WDT occurs in the sub processor, the sub processor has no file system and the main processor has no file system with sufficient capacity.

A WDT occurs in the sub processor, and the WDT driver detects the hang-up notification (WDT T.O interrupt). When a WDT occurs in the processor, the WDT driver generates a User interrupt indicating that the WDT occurred. The OS notifies the data collection processing unit of the User interrupt from the WDT driver via an interrupt handler. The data collection processing unit accepts the interrupt request and communicates with the main processor using the PCI communication process of FIG. 28. If not enough space remains to store the failure data, a destination is selected, in descending order of priority, from the list data of destinations to which the virtual file system can be connected. Next, a system call is used to mount the file system on a file server beyond the LAN via a virtual file system protocol (NFS or VFS). If the return value of the system call is normal and the mount succeeds, all of the WDT factors and failure data are saved to that file system. The CPU is then reset, which also releases the mount of the virtual file system.
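
The priority-ordered selection of a file server can be sketched as below. The `(priority, address)` tuple format, the convention that a lower number means higher priority, and the `try_mount` callable are all hypothetical; the sketch also assumes that the next entry is tried when a mount fails, which the passage above does not state explicitly.

```python
from typing import Callable, Iterable, Optional, Tuple

# (priority, address); lower number = higher priority (assumed convention).
Server = Tuple[int, str]


def mount_first_available(
    servers: Iterable[Server],
    try_mount: Callable[[str], bool],
) -> Optional[str]:
    """Return the address of the first server, in priority order,
    whose mount attempt reports success, or None if every attempt fails."""
    for _priority, address in sorted(servers, key=lambda s: s[0]):
        # try_mount stands in for the system call that mounts the remote
        # file system; True models a normal return value.
        if try_mount(address):
            return address
    return None
```

For example, with servers `[(2, "srv-b"), (1, "srv-a")]` and a `try_mount` that fails only for `"srv-a"`, the function returns `"srv-b"`.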

FIG. 28 is a diagram for explaining PCI communication when inquiring about the remaining file capacity. FIG. 28 is similar to FIG. 26; the same steps are denoted by the same step numbers, and their description is omitted.

FIG. 28 shows the case where the remaining-capacity check of the file system in step S115 finds that the remaining capacity is insufficient, so that a communication result indicating that the data cannot be collected is written to the register.

Next, with reference to FIG. 29, a case will be described in which a WDT occurs in the main processor and the file system provided in the main processor cannot secure sufficient capacity for the collected data.

In FIG. 29, components similar to those in FIG. 25 are denoted by the same reference numerals, and their description is omitted.

The data collection processing unit 42 is started by the OS via an interrupt handler and checks the remaining capacity of its own file system to determine whether there is enough space to store the failure data. If there is not enough storage capacity, it selects a destination, in descending order of priority, from the file server list data of destinations to which the virtual file system can be connected, and uses a system call to mount the file system on a file server beyond the LAN via a virtual file system protocol (NFS or VFS). It then saves all of the WDT factors, failure data, and the like to the virtual file system and resets the CPU.

A WDT occurs in the main processor, and the WDT driver detects the hang-up notification (WDT T.O interrupt). When a WDT occurs in the processor, the WDT driver generates a User interrupt indicating that the WDT occurred. The OS notifies the data collection processing unit of the User interrupt from the WDT driver via an interrupt handler. The data collection processing unit accepts the interrupt request. It then checks the remaining capacity of its own file system to determine whether there is enough space to store the failure data. If enough space remains, it stores the failure data in the file system under its own processor and resets the CPU. If not enough space remains, it selects a high-priority destination from the file server list data of destinations to which the virtual file system can be connected.

Next, a system call is used to mount the file system on a file server beyond the LAN via a virtual file system protocol (NFS or VFS). If the return value of the system call is normal and the mount succeeds, all of the WDT factors and failure data are saved to that file system, and the CPU is reset. The CPU reset also releases the mount of the virtual file system.
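
The overall decision made on the main processor (store locally if there is room, otherwise store on the mounted file server, then reset) can be summarized in a short sketch. Everything here is a stand-in: file systems are modeled as `dict` objects, `reset_cpu` is a hypothetical callback, and the `"unsaved"` branch for a failed mount is an assumption, since the description does not say what happens when no file server can be mounted.

```python
from typing import Callable, Optional


def store_then_reset(
    data: bytes,
    local_free_bytes: int,
    local_fs: dict,
    remote_fs: Optional[dict],
    reset_cpu: Callable[[], None],
    name: str = "fault.bin",
) -> str:
    """Save failure data, then reset the CPU; report where the data went."""
    if len(data) <= local_free_bytes:
        # Enough room: store in the file system under the own processor.
        local_fs[name] = data
        where = "local"
    elif remote_fs is not None:
        # Not enough room: store on the mounted file server instead.
        remote_fs[name] = data
        where = "remote"
    else:
        # Assumed behavior: no mount succeeded, so nothing is saved.
        where = "unsaved"
    # The reset always comes after the storage attempt.
    reset_cpu()
    return where
```

Placing the reset after the storage attempt reflects the ordering in the description, where the CPU is reset only once the collected data has been saved.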

10 Sub processor
11 Main processor
12 CPU
13 System failure interrupt unit
14 Failure handling unit
15 Failure processing unit
16, 45 Virtual file system protocol processing unit
17, 46 ETH driver
18, 47 PHY
19 File system remaining capacity check unit
20, 30 File system
21, 49 File server
22 Call processing unit
40 WDT driver
41 OS
42 Data collection processing unit
43 Communication driver
44 Size check processing unit
48 LAN

Claims

A multiprocessor system having a plurality of processors,
wherein each processor comprises:
a failure handling unit that collects data related to a failure according to the severity of the failure that occurred;
a file server access unit for accessing an external file server; and
a failure processing unit that, when a file system is mounted on its own processor or on another processor in the multiprocessor system, determines whether the data related to the failure can be stored, stores the data related to the failure in the file system if it can be stored, and stores the data related to the failure in the file server if it cannot be stored.

The multiprocessor system according to claim 1, wherein a failure whose severity calls for collecting data related to the failure is a failure that requires a CPU reset.

The multiprocessor system according to claim 2, wherein the CPU is reset after the failure processing unit stores data related to the failure in the file system or the file server.

The multiprocessor system according to claim 1, wherein the failure processing unit stores the data related to the failure in the file server when no file system is mounted on its own processor or on another processor.

The multiprocessor system according to claim 1, wherein the failure processing unit uses inter-processor communication to inquire whether the other processor has capacity to store the data related to the failure.

The multiprocessor system according to claim 1, wherein the data relating to the failure includes failure data according to a failure factor, a failure type, and analysis data used for failure analysis.

The multiprocessor system according to claim 1, wherein the file system is provided in a nonvolatile memory mounted on the processor.

A failure information storage method in a multiprocessor system provided with a plurality of processors each having a file server access unit for accessing an external file server,
wherein the processor:
collects data related to a failure according to the severity of the failure that occurred; and
when a file system is mounted on its own processor or on another processor in the multiprocessor system, determines whether the data related to the failure can be stored, stores the data related to the failure in the file system if it can be stored, and stores the data related to the failure in the file server if it cannot be stored.

The failure information storage method according to claim 9, wherein the data related to the failure is stored in the file server when no file system is mounted on the own processor or on the other processor.