JP5954420B2

JP5954420B2 - Connection device and monitoring method

Info

Publication number: JP5954420B2
Application number: JP2014532624A
Authority: JP
Inventors: 一良宮澤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-08-29
Filing date: 2012-08-29
Publication date: 2016-07-20
Anticipated expiration: 2032-08-29
Also published as: WO2014033847A1; JPWO2014033847A1

Description

The present invention relates to a connection device and a monitoring method.

An information processing system such as a mainframe may be required to be fault tolerant, so that the system does not go down even if some devices or paths in the system fail.
FIG. 10 is a diagram illustrating a configuration example of an information processing system 100. As shown in FIG. 10, the information processing system 100 includes two CPUs (Central Processing Units) 200-1 and 200-2 and two memory devices (Memory Storage; hereinafter referred to as MS) 300-1 and 300-2. The information processing system 100 also includes a system controller (hereinafter referred to as SC) 400. The CPUs 200-1 and 200-2 and the MSs 300-1 and 300-2 are each connected to the SC 400 and are made redundant. With this configuration, when a failure occurs in the CPU 200-1 or 200-2 or in the MS 300-1 or 300-2, the information processing system 100 can avoid going down by isolating the failed part. Hereinafter, when the MSs 300-1 and 300-2 are not distinguished, they are simply referred to as MS 300.

As shown in FIG. 10, the information processing system 100 further includes two input/output processors (Input Output Processor; hereinafter referred to as IOP) 500-1 and 500-2 and two bridge devices (BRidge; hereinafter referred to as BR) 600-1 and 600-2. The information processing system 100 further includes four channel devices (CHannel; hereinafter referred to as CH) 700-1 to 700-4 and two input/output devices (Input Output; hereinafter referred to as IO) 800-1 and 800-2. In FIG. 10 and FIG. 11 described later, the IOPs 500-1 and 500-2 may be denoted IOP #0 and #1, respectively, and the CHs 700-1 to 700-4 may be denoted CH #0 to #3, respectively.

The IOP 500-1 controls the CHs 700-1 and 700-2 via the SC 400 and the BR 600-1, and the IOP 500-2 controls the CHs 700-3 and 700-4 via the SC 400 and the BR 600-2. The CHs 700-1 and 700-3 are connected to the IO 800-1 and control data transfer between the MS 300 and the IO 800-1; the CHs 700-2 and 700-4 are connected to the IO 800-2 and control data transfer between the MS 300 and the IO 800-2.

In the information processing system 100, when an IO access occurs, the CPU 200-1 or 200-2 passes an IO instruction to the IOP 500-1 or 500-2 and has it execute the IO access to the IO 800-1 or 800-2. To access the IO 800-1, for example, the IOPs 500-1 and 500-2 can use two paths (series): the path IOP 500-1, CH 700-1, IO 800-1, and the path IOP 500-2, CH 700-3, IO 800-1. Because the IO system is thus also built from multiple paths, the information processing system 100 can keep operating without going down even if one path fails, as long as the other path is normal.

In the information processing system 100 shown in FIG. 10, when a device or path in the system fails, it is important to detect the failed part early and switch to access over a normal path that has not failed.
As a related technique for detecting failures of devices, paths, and the like in a system, a method is known in which two devices in the system mutually monitor, via a memory, whether the partner device is alive (see, for example, Patent Document 1 and Patent Document 2).

FIG. 11 is a diagram illustrating an example of a procedure for mutual monitoring between devices. As shown in FIG. 11, the first devices (IOP #0 and #1) and the second devices (CH #0 to #3) that make up the IO paths of the information processing system 100 mutually monitor, via the MS 300, whether the partner device is alive. The MS 300 has two storage areas, area 1 and area 2, for each of CH #0 to #3.

Mutual monitoring by each device is performed by the following procedures (i) to (vi).
(i) At regular time intervals, CH #0 updates area 1 on the MS 300 to an arbitrary value (for example, a predetermined value).
(ii) At regular time intervals, CH #0 fetches the value of area 2 on the MS 300 and compares it with the previously fetched value to confirm that they differ. CH #0 regards IOP #0 as hung up when it fetches a value matching the previously fetched value three or more times in a row.

(iii) CH #1 to #3 perform the same control as CH #0.
(iv) At regular time intervals, IOP #0 compares the value of area 1 of CH #0 with the value of area 2 of CH #0 and confirms that they differ. IOP #0 regards CH #0 as hung up when the values of area 1 and area 2 of CH #0 match three or more times in a row.

(v) IOP #0 stores the value it read from area 1 of CH #0 into area 2 of CH #0.
(vi) IOP #0 checks CH #1 in the same way as CH #0. Likewise, IOP #1 performs the same checks on CH #2 and #3 as IOP #0 does.
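To make the conventional procedure concrete, the following is a minimal simulation sketch of steps (i) to (vi) for one IOP and two CHs. It assumes that each CH writes an incrementing sequence number to area 1 (the text only says "an arbitrary value"), and the class and function names are hypothetical. Note that the IOP has to read and write the MS once per CH in every period, which is the load discussed below.

```python
from dataclasses import dataclass

HANG_LIMIT = 3  # "three or more times in a row" in steps (ii) and (iv)


@dataclass
class ChannelAreas:
    """Per-CH storage on the MS 300: area 1 is written by the CH, area 2 by the IOP."""
    area1: int = 0
    area2: int = 0


class Channel:
    def __init__(self, areas: ChannelAreas):
        self.areas = areas
        self.seq = 0              # assumed: an incrementing value written to area 1
        self.last_area2 = None    # area 2 value seen on the previous fetch
        self.same_count = 0       # consecutive fetches that returned an unchanged area 2
        self.iop_hung = False

    def tick(self, alive: bool = True) -> None:
        if not alive:
            return
        self.seq += 1
        self.areas.area1 = self.seq             # step (i)
        value = self.areas.area2                # step (ii): fetch and compare
        self.same_count = self.same_count + 1 if value == self.last_area2 else 0
        if self.same_count >= HANG_LIMIT:
            self.iop_hung = True
        self.last_area2 = value


class Iop:
    def __init__(self, areas_by_ch: dict):
        self.areas_by_ch = areas_by_ch
        self.same_count = {ch: 0 for ch in areas_by_ch}
        self.hung = set()

    def tick(self, alive: bool = True) -> None:
        if not alive:
            return
        for ch, areas in self.areas_by_ch.items():   # step (vi): one CH at a time
            if areas.area1 == areas.area2:           # step (iv)
                self.same_count[ch] += 1
                if self.same_count[ch] >= HANG_LIMIT:
                    self.hung.add(ch)
            else:
                self.same_count[ch] = 0
            areas.area2 = areas.area1                # step (v)


def simulate(cycles, dead_ch=None, ch_dies_at=0, iop_dies_at=None):
    """Two CHs under one IOP; optionally stop one CH or the IOP from a given cycle on."""
    areas = {0: ChannelAreas(), 1: ChannelAreas()}
    chs = {n: Channel(a) for n, a in areas.items()}
    iop = Iop(areas)
    for t in range(cycles):
        for n, ch in chs.items():
            ch.tick(alive=not (n == dead_ch and t >= ch_dies_at))
        iop.tick(alive=iop_dies_at is None or t < iop_dies_at)
    return chs, iop
```

For example, `simulate(10, dead_ch=1, ch_dies_at=2)` leaves `iop.hung` containing CH 1 after three unchanged comparisons, while `simulate(10, iop_dies_at=2)` makes both CHs set `iop_hung`.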

By repeating steps (i) to (vi), IOP #0 and #1 and CH #0 to #3 monitor one another. In this way, IOP #0 and #1 and CH #0 to #3 periodically update the MS 300, monitor whether the partner has updated the MS 300, and thereby detect an abnormality in the partner device.

Patent Document 1: JP-A-2-206806
Patent Document 2: JP-A-9-128268

Because an IOP controls a plurality of CHs, its busy rate is very high. The IOP is therefore required to perform its various processes efficiently and in a short time.
In the example shown in FIG. 11, however, IOP #0 and #1 check the CHs one at a time in steps (iv) to (vi), referring to the MS 300 (areas 1 and 2). That is, on top of the processing load and processing time needed to control the CHs #0 to #3, IOP #0 and #1 incur the processing load and processing time of the mutual monitoring checks.

When the information processing system 100 is a large-scale system such as a mainframe, for example, the number of CHs is very large, so the processing load and processing time that IOP #0 and #1 spend on checking increase. This raises the busy rates of IOP #0 and #1 and lengthens the time they stay busy, which affects system performance.

In the example shown in FIG. 11, CH #0 to #3 also access the MS 300 via the BR 600-1 or 600-2 and the SC 400 in steps (i) to (iii). This incurs the processing load and processing time needed for CH #0 to #3 to access the MS 300 and perform their checks. Furthermore, when there are many CHs (CH #0 to #n), each of them accesses the MS 300, so the processing load on the BR 600-1 or 600-2 and the SC 400 also increases.
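The scaling argument can be made explicit by counting the MS accesses that steps (i) to (vi) imply per monitoring period. This is a small arithmetic sketch, assuming the IOP reuses the area 1 value read in step (iv) for the store in step (v); the function name is hypothetical.

```python
def ms_accesses_per_period(num_ch: int) -> dict:
    """Count MS 300 accesses in one monitoring period of the FIG. 11 scheme.

    Assumes the IOP reuses the area 1 value it read in step (iv) when it
    performs the store in step (v), so no extra fetch is needed there.
    """
    ch_side = 2 * num_ch    # per CH: store area 1 (i) + fetch area 2 (ii)
    iop_side = 3 * num_ch   # per CH: fetch areas 1 and 2 (iv) + store area 2 (v)
    return {"ch": ch_side, "iop": iop_side, "total": ch_side + iop_side}


for n in (4, 256):
    print(n, ms_accesses_per_period(n))
```

Under this count, the four CHs of FIG. 10 already cost 20 MS accesses per period (12 of them issued by the IOPs), and the total grows linearly with the number of CHs; every CH-side access also passes through a BR and the SC.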

As described above, in the example shown in FIG. 11, the mutual monitoring between IOP #0 and #1 and CH #0 to #3 increases the processing load of the information processing system 100 and lowers its performance.
In one aspect, an object of the present invention is to realize mutual monitoring between a first device and a second device by simple control that keeps the processing load on the system low.

The connection device disclosed herein is interposed between a first device and a second device and includes: a register unit having a first bit area accessed by the first device and a second bit area accessed by the second device; a write control unit that, upon detecting an access from the first device to the first bit area, controls the value of the first bit area based on the combination of the values set in the first bit area and the second bit area, and, upon detecting an access from the second device to the second bit area, controls the value of the second bit area based on that combination of values; a detection unit that monitors the accesses to the register unit by each of the first and second devices and detects, based on the monitoring result and the combination of values, that a failure has occurred in one of the first and second devices; and a notification unit that notifies the other device of the failure of the one device detected by the detection unit.

According to one embodiment, mutual monitoring between the first device and the second device can be realized by simple control that keeps the processing load on the system low.

FIG. 1 is a diagram illustrating a configuration example of an information processing system according to one embodiment.
FIG. 2 is a diagram illustrating a configuration example of a bridge device according to one embodiment.
FIG. 3 is a diagram illustrating an example of how the write control unit shown in FIG. 2 controls the state of the mutual monitoring register.
FIG. 4(a) is a time chart illustrating an example of state transitions of the mutual monitoring register when a failure has occurred in a channel device, and FIG. 4(b) is a time chart illustrating an example of state transitions of the mutual monitoring register when a failure has occurred in an input/output processor.
FIG. 5 is a flowchart illustrating an example of mutual monitoring processing between the input/output processor and the channel device, performed by the bridge device according to one embodiment.
FIG. 6 is a flowchart illustrating an example of mutual monitoring processing between the input/output processor and the channel device, performed by the input/output processor according to one embodiment.
FIG. 7 is a flowchart illustrating an example of mutual monitoring processing between the input/output processor and the channel device, performed by the channel device according to one embodiment.
FIG. 8 is a flowchart illustrating an example of processing, performed by the input/output processor according to one embodiment, for isolating a device in which a failure has been detected.
FIG. 9 is a flowchart illustrating an example of processing, performed by the channel device according to one embodiment, for isolating a device in which a failure has been detected.
FIG. 10 is a diagram illustrating a configuration example of an information processing system.
FIG. 11 is a diagram illustrating an example of a procedure for mutual monitoring between devices.

Embodiments will be described below with reference to the drawings.
[1] One Embodiment
[1-1] Description of the Information Processing System
FIG. 1 is a diagram illustrating a configuration example of an information processing system 1 according to one embodiment. As shown in FIG. 1, the information processing system 1 includes two CPUs 2-1 and 2-2, two MSs 3-1 and 3-2, and an SC 4. The information processing system 1 also includes two IOPs 5-1 and 5-2, two BRs 6-1 and 6-2, four CHs 7-1 to 7-4, and two IOs 8-1 and 8-2.

Hereinafter, when the CPUs 2-1 and 2-2 are not distinguished, they are simply referred to as CPU 2; when the MSs 3-1 and 3-2 are not distinguished, as MS 3; and when the IOPs 5-1 and 5-2 are not distinguished, as IOP 5. Likewise, when the BRs 6-1 and 6-2 are not distinguished, they are simply referred to as BR 6; when the CHs 7-1 to 7-4 are not distinguished, as CH 7; and when the IOs 8-1 and 8-2 are not distinguished, as IO 8.

In FIG. 1 and FIG. 2 described later, the IOPs 5-1 and 5-2 may be denoted IOP #0 and #1, and the BRs 6-1 and 6-2 may be denoted BR #0 and #1, respectively. Likewise, in FIG. 1 and FIG. 2, the CHs 7-1 to 7-4 may be denoted CH #0 to #3, and the IOs 8-1 and 8-2 may be denoted IO #0 and #1, respectively.

The CPUs 2-1 and 2-2 and the MSs 3-1 and 3-2 are each connected to the SC 4 and are made redundant. The IOPs 5-1 and 5-2 and the BRs 6-1 and 6-2 are also each connected to the SC 4 and are made redundant. Further, the CHs 7-1 and 7-2 are each connected to the BR 6-1, and the CHs 7-3 and 7-4 are each connected to the BR 6-2, for redundancy. The IO 8-1 is connected to both the CH 7-1 and the CH 7-3, and the IO 8-2 is connected to both the CH 7-2 and the CH 7-4.

The CPU 2 is a processing device that performs various kinds of control and computation. The CPU 2 implements various functions by executing programs stored in the MS 3 or in a ROM (Read Only Memory, not shown) or the like.
The MS (memory device) 3 is a storage device that temporarily stores various data and programs; when the CPU 2 executes a program, the data and the program are temporarily stored and expanded in the MS 3. Examples of the MS 3 include a plurality of memory modules having volatile memory such as RAM (Random Access Memory).

With the configuration described above, when a failure occurs in a CPU 2 or an MS 3, the information processing system 1 can avoid going down by isolating the failed part.
The SC (system controller) 4 is a control device that controls accesses between the CPU 2 and the MS 3 and controls communication between the CPU 2 and another CPU 2, an IOP 5, a BR 6, or the like. An example of the SC 4 is an integrated circuit such as an LSI (Large Scale Integration). Although not shown, the information processing system 1 may include a plurality of SCs 4 for redundancy.

The IOP (input/output processor, first device) 5 is a processing unit that controls the IO 8 (CH 7) on behalf of the CPU 2. That is, when executing an IO instruction, the CPU 2 passes the IO instruction to the IOP 5 and has it execute the IO access to the IO 8. Specifically, when the CPU 2 executes an IO instruction such as a data read/write, the IOP 5 sends the details of the IO instruction to the CH 7 that controls the target IO 8. In the example shown in FIG. 1, the IOP 5-1 controls the CHs 7-1 and 7-2 via the SC 4 and the BR 6-1, and the IOP 5-2 controls the CHs 7-3 and 7-4 via the SC 4 and the BR 6-2. When the IOP 5 receives an interrupt from a CH 7, it notifies the CPU 2 of the interrupt via the SC 4.

As mutual monitoring processing with the CH 7, the IOP 5 according to the present embodiment performs write accesses of monitoring information, at predetermined time intervals, to a register of the BR 6 described later (mutual monitoring register 61). When notified by the BR 6 that a failure has occurred in a monitored CH 7, the IOP 5 disconnects the failed CH 7 from the IO 8.
The mutual monitoring and disconnection processing of the IOP 5 will be described later.

The CH (channel device, second device) 7 is a device that controls the transfer of data, commands, and the like between the MS 3 and the IO 8. For example, when the CH 7 receives an IO instruction from the IOP 5, it analyzes the instruction and sends an instruction to the IO 8 it controls according to the analyzed content.
As mutual monitoring processing with the IOP 5, the CH 7 according to the present embodiment, like the IOP 5, performs write accesses of monitoring information to the register of the BR 6 (mutual monitoring register 61) at predetermined time intervals. When notified by the BR 6 that a failure has occurred in the monitored IOP 5, the CH 7 disconnects itself from the IO 8.
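The two reactions to a failure notice from the BR 6 can be sketched as follows. This is a hypothetical model for illustration only: the `Link` class and both handler names are invented here, and the CH-side update of CHMask follows the CHMask rule given for the register 61a (a CH that has detected an IOP failure excludes itself from monitoring). The actual disconnection processing is described later in the document and may differ.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical model of one CH-to-IO connection."""
    ch: str
    io: str
    connected: bool = True


def iop_on_ch_failure(links, failed_ch):
    """IOP side: on a CH failure notice from the BR, cut the failed CH off from its IO."""
    for link in links:
        if link.ch == failed_ch:
            link.connected = False


def ch_on_iop_failure(own_ch, links, register):
    """CH side: on an IOP failure notice, disconnect itself from its IO and
    set CHMask to 1 so that it is excluded from further monitoring."""
    for link in links:
        if link.ch == own_ch:
            link.connected = False
    register["CHMask"] = 1
```

In both cases the surviving device acts on the notice alone; neither side has to poll memory to learn that its partner has failed.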

The mutual monitoring and disconnection processing of the CH 7 will be described later.
The functions of the IOP 5 and the CH 7 described above are implemented by a processor such as an MPU (Micro-Processing Unit) provided in each of the IOP 5 and the CH 7.
The IO (input/output device) 8 is a device targeted by IO accesses from the IOP 5. Examples of the IO 8 include various storage devices, such as magnetic disk devices (for example, HDDs (Hard Disk Drives)) and semiconductor disk devices (for example, SSDs (Solid State Drives)), as well as other devices such as consoles.

With the above configuration, in the information processing system 1, when the CPU 2 executes an IO instruction, the details of the IO instruction are propagated to the CH 7 via the IOP 5. The CH 7 executes the data transfer between the IO 8 and the MS 3 according to the content of the IO instruction, and then notifies the CPU 2 of an IO interrupt via the IOP 5.
To access the IO 8-1, for example, the IOPs 5 can use two paths (series): the path IOP 5-1, CH 7-1, IO 8-1, and the path IOP 5-2, CH 7-3, IO 8-1. Because the IO system is thus also built from multiple paths, the information processing system 1 can keep operating without going down even if one path fails, as long as the other path is normal.
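The path redundancy of FIG. 1 can be sketched as a small lookup: each IO is reachable through two CHs that hang off different IOP/BR pairs, so a single failed component still leaves one usable path. The tables below encode the connections stated above; including the BR in each path (the text lists only IOP, CH, IO) and leaving out the shared SC are simplifications of this sketch, and the function names are hypothetical.

```python
# FIG. 1 topology: each CH is driven by one IOP through one BR, and each IO
# is reachable through two CHs on different IOP/BR pairs.
CH_ROUTE = {
    "CH7-1": ("IOP5-1", "BR6-1"),
    "CH7-2": ("IOP5-1", "BR6-1"),
    "CH7-3": ("IOP5-2", "BR6-2"),
    "CH7-4": ("IOP5-2", "BR6-2"),
}
IO_CHANNELS = {"IO8-1": ["CH7-1", "CH7-3"], "IO8-2": ["CH7-2", "CH7-4"]}


def paths_to(io):
    """All (IOP, BR, CH, IO) paths that reach the given IO device."""
    return [(*CH_ROUTE[ch], ch, io) for ch in IO_CHANNELS[io]]


def usable_path(io, failed):
    """First path whose components are all healthy, or None if every path is broken."""
    for path in paths_to(io):
        if failed.isdisjoint(path):
            return path
    return None
```

For instance, with CH 7-1 marked failed, `usable_path("IO8-1", {"CH7-1"})` falls back to the path through IOP 5-2, BR 6-2, and CH 7-3; only a second, independent failure on that path leaves IO 8-1 unreachable.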

The BR (bridge device, connection device) 6 is interposed between the IOP 5 and a plurality of CHs 7 and relays the input and output of data, commands, and the like between the IOP 5 and the CHs 7.
[1-2] Description of the Bridge Device
The configuration of the BR 6 will be described below with reference to FIG. 2.
FIG. 2 is a diagram illustrating a configuration example of the BR 6 according to one embodiment. As shown in FIG. 2, the BR 6 according to the present embodiment includes a mutual monitoring check control circuit 60 and bus control circuits 65 and 66.

The bus control circuit 65 is connected to the IOP 5 by a bus via the SC 4, and controls the bus for write accesses from the IOP 5 and for interrupt notifications to the IOP 5. The bus control circuit 66 is connected to the plurality of CHs 7 by a bus, and controls the bus for write accesses from the CHs 7 and for interrupt notifications to the CHs 7. In the following description, it is assumed that n+1 CHs 7 are connected to the BR 6. In FIG. 2, these CHs 7 may be denoted CH #0 to CH #n.

The mutual monitoring check control circuit 60 is a circuit with which the IOP 5 and the CHs 7 monitor each other for hang-ups, and includes a control circuit 60a and a mutual monitoring register 61.
The control circuit 60a is hardware that performs the control needed to realize mutual monitoring between the IOP 5 and the CHs 7 according to the present embodiment, and functions as a write control unit 62, a failure detection unit 63, and a notification unit 64.

The mutual monitoring register (holding unit) 61 has, for each CH 7, a storage area used for mutual monitoring. In the example shown in FIG. 2, the mutual monitoring register 61 has registers 61a-1 to 61a-(n+1) as storage areas (hereinafter simply referred to as register 61a when the registers 61a-1 to 61a-(n+1) are not distinguished).
As many registers 61a are provided as there are CHs 7 connected to and managed by the BR 6. The mutual monitoring register 61 may instead have fewer registers 61a than the number of CHs 7 connected to the BR 6; in that case, the number of CHs 7 that can be mutually monitored equals the number of registers 61a (n+1).

Each register 61a is write-accessed by the CH 7 corresponding to that register and by the IOP 5 that controls that CH 7. In the example shown in FIG. 1, the register 61a for CH #0 in BR #0 is write-accessed by IOP #0 and CH #0, and the register 61a for CH #1 is write-accessed by IOP #0 and CH #1. Similarly, the register 61a for CH #2 in BR #1 is write-accessed by IOP #1 and CH #2, and the register 61a for CH #3 is write-accessed by IOP #1 and CH #3.
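The register ownership described above can be tabulated in a few lines. This sketch encodes the FIG. 1 example (BR #0 serving CH #0 and #1 for IOP #0, BR #1 serving CH #2 and #3 for IOP #1); the table and function names are hypothetical, and assigning registers to CHs in CH order when registers are scarce is an assumption, since the text only fixes how many CHs can be monitored.

```python
# FIG. 1 example: each BR holds one register 61a per attached CH.
BRIDGES = {
    "BR#0": {"iop": "IOP#0", "chs": ["CH#0", "CH#1"]},
    "BR#1": {"iop": "IOP#1", "chs": ["CH#2", "CH#3"]},
}


def register_writers(bridges):
    """Map (bridge, CH) -> set of devices that write that CH's register 61a."""
    return {
        (br, ch): {cfg["iop"], ch}
        for br, cfg in bridges.items()
        for ch in cfg["chs"]
    }


def monitorable(chs, num_registers):
    """With fewer registers than CHs, only num_registers CHs can be monitored.
    Assigning registers in CH order is an assumption of this sketch."""
    return chs[:num_registers]
```

Each register thus has exactly two writers, the CH it belongs to and the IOP that controls that CH, which is what lets the BR judge both sides from a single register.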

As shown in FIG. 2, the register 61a has IOPMask, IOPAlive, CHMask, and CHAlive bits, a Threshold counter, and IOPInterrupt and CHInterrupt bits. In FIG. 2, the suffix "#0", ..., "#n" appended to each bit or counter name indicates which of CH #0 to #n the register 61a is assigned to. In the following description, the suffixes "#0", ..., "#n" are omitted and only the bit or counter name is used.
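A software model of one register 61a helps keep the fields straight. The sketch below packs the fields into a single integer word; the field widths follow the text (1-bit flags and the 2-bit Threshold counter), but the bit order is an assumption made only for this illustration, and the class and field names are hypothetical renderings of the document's bit names.

```python
from dataclasses import dataclass

# (field, width in bits). Widths follow the text; the bit order (LSB first)
# is an assumption made only for this sketch.
LAYOUT = [("iop_mask", 1), ("iop_alive", 1), ("ch_mask", 1), ("ch_alive", 1),
          ("threshold", 2), ("iop_interrupt", 1), ("ch_interrupt", 1)]


@dataclass
class Register61a:
    iop_mask: int = 0
    iop_alive: int = 0
    ch_mask: int = 0
    ch_alive: int = 0
    threshold: int = 0       # 2-bit threshold counter
    iop_interrupt: int = 0
    ch_interrupt: int = 0

    def pack(self) -> int:
        """Encode the fields into one integer word."""
        word, shift = 0, 0
        for name, width in LAYOUT:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
            word |= value << shift
            shift += width
        return word

    @classmethod
    def unpack(cls, word: int) -> "Register61a":
        """Decode an integer word back into named fields."""
        values, shift = {}, 0
        for name, width in LAYOUT:
            values[name] = (word >> shift) & ((1 << width) - 1)
            shift += width
        return cls(**values)
```

With this layout all fields fit in 8 bits per CH, so `Register61a(iop_alive=1, threshold=3, ch_interrupt=1).pack()` yields 178, and `unpack` restores the same fields.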

IOPMask and CHMask are bits (third areas) in which a MaskBit (third information) is set, indicating whether the corresponding CH 7 is subject to failure detection by the failure detection unit 63 described later (that is, whether it is subject to mutual monitoring).
When the IOP 5 recognizes, among the CHs 7 it controls, a CH 7 that is operating and is to be mutually monitored, it sets in the IOPMask of the register 61a corresponding to that CH 7 a MaskBit indicating that the mask is disabled, that is, that the CH 7 is a monitoring target (for example, "0"). Conversely, when the IOP 5 recognizes, among the CHs 7 it controls, a CH 7 that is not installed or is offline (unused), it sets in the IOPMask of the corresponding register 61a a MaskBit indicating that the mask is enabled, that is, that the CH 7 is excluded from monitoring (for example, "1"). The offline state of a CH 7 may include states in which the CH 7 has failed, is being initialized, or is undergoing diagnosis of a failure or the like.

When the CH 7 itself is to be mutually monitored with the IOP 5, it sets in the CHMask of the corresponding register 61a a MaskBit indicating that the mask is disabled, that is, that it is a monitoring target (for example, "0"). Conversely, when a failure of the IOP 5 has been detected, or when the CH 7 itself is offline (unused) as described above, the CH 7 sets in the CHMask of the corresponding register 61a a MaskBit indicating that the mask is enabled, that is, that it excludes itself from monitoring (for example, "1").

The mutual monitoring check control circuit 60 refers to the IOPMask and CHMask of a register 61a to determine whether mutual monitoring of the corresponding CH 7 is needed. For example, when the IOPMask and CHMask bits of the register 61a are both "0", the mutual monitoring check control circuit 60 determines that the mask is disabled, that is, that the corresponding CH 7 is to be mutually monitored. Conversely, when at least one of IOPMask and CHMask in the register 61a is "1", it determines that the mask is enabled, that is, that the corresponding CH 7 is not to be mutually monitored. When mutual monitoring is not needed, the mutual monitoring check control circuit 60 suppresses the mutual monitoring processing, described later, for the corresponding CH 7.
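The mask rules above reduce to three small functions: how the IOP sets IOPMask, how the CH sets CHMask, and how the BR combines them. The sketch uses the example encoding from the text ("0" = mask disabled, monitored; "1" = mask enabled, excluded); the state names and function names are hypothetical.

```python
OPERATING, NOT_INSTALLED, OFFLINE = "operating", "not_installed", "offline"


def iop_mask_for(ch_state: str) -> int:
    """IOPMask as set by the IOP: 0 (mask disabled) only for an operating CH.
    OFFLINE covers a CH that has failed, is initializing, or is under diagnosis."""
    return 0 if ch_state == OPERATING else 1


def ch_mask_for(ch_offline: bool, iop_failure_detected: bool) -> int:
    """CHMask as set by the CH: 1 (mask enabled) if it is offline or has seen an IOP failure."""
    return 1 if (ch_offline or iop_failure_detected) else 0


def monitoring_enabled(iop_mask: int, ch_mask: int) -> bool:
    """The BR monitors the CH only while both mask bits are 0."""
    return iop_mask == 0 and ch_mask == 0
```

Because the decision depends only on the current values of the two bits, re-evaluating `monitoring_enabled` after every mask write is enough to switch a CH in or out of monitoring when the masks are updated during operation.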

In this way, the IOP 5 and the CH 7 set the MaskBit to disabled only for IOPs 5 and CHs 7 that perform mutual monitoring, and set it to enabled for, for example, a CH 7 that is not operating. This allows the IOP 5 and the CH 7 to easily exclude CHs 7 that do not need mutual monitoring from monitoring. Moreover, since the need for mutual monitoring can be determined from both the IOP 5 side and the CH 7 side, unnecessary mutual monitoring can be suppressed more reliably. The resources of the information processing system 1 can therefore be used effectively.

The IOP Mask and CH Mask need only be set at least when mutual monitoring starts. Mutual monitoring may be triggered, for example, when the information processing system 1 starts up or when instructed by the CPU 2 or the like. The values of the IOP Mask and CH Mask may also be updated (reset) while mutual monitoring is in progress. In this case, the mutual monitoring check control circuit 60 detects the update of the IOP Mask and CH Mask and switches the mutual monitoring state according to the updated Mask Bits.
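The Mask Bit rule above amounts to a two-input predicate. The Python sketch below is illustrative only. The function name `monitoring_enabled` and the list-based view of the registers 61a are hypothetical and do not describe the actual circuit.

```python
def monitoring_enabled(iop_mask: int, ch_mask: int) -> bool:
    """Return True if mutual monitoring should run for one register 61a.

    Monitoring runs only when both Mask Bits are 0 (mask invalid); if
    either the IOP Mask or the CH Mask is 1, monitoring is suppressed.
    """
    for bit in (iop_mask, ch_mask):
        if bit not in (0, 1):
            raise ValueError("a Mask Bit must be 0 or 1")
    return iop_mask == 0 and ch_mask == 0


# Hypothetical example: four CHs under one IOP. The IOP has masked CH#2
# (for instance because it is not operating) and CH#3 has masked itself.
iop_masks = [0, 0, 1, 0]
ch_masks = [0, 0, 0, 1]
monitored = [ch for ch, (im, cm) in enumerate(zip(iop_masks, ch_masks))
             if monitoring_enabled(im, cm)]
print(monitored)  # [0, 1]
```

Either side can veto monitoring by writing “1”. As a result, the IOP 5 alone can exclude a CH 7 that is not operating, without any cooperation from that CH 7.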

The IOP Alive is a bit (first area) in which the IOP 5 sets, at regular intervals, an Alive Bit (first information) indicating validity (for example, “1”).
The CH Alive is a bit (second area) in which the CH 7 sets, at regular intervals, an Alive Bit (second information) indicating validity (for example, “1”).
During mutual monitoring, the IOP 5 makes write accesses to the registers 61a at regular intervals to update the IOP Alive bit of every register 61a corresponding to a monitored CH 7. Likewise, during mutual monitoring, each CH 7 makes a write access to its corresponding register 61a at regular intervals to update that register's CH Alive bit.

The fixed time (predetermined time) used by the IOP 5 and the CH 7, that is, the period at which the IOP 5 writes its Alive Bit to the IOP Alive and the period at which the CH 7 writes its Alive Bit to the CH Alive, are the same or substantially the same.
The Threshold is a set of bits (a counter) indicating how many consecutive Alive Bit write accesses have been made to the mutual monitoring register 61 by either the IOP 5 or the CH 7 alone. In the following description, the Threshold is referred to as a threshold counter. In this embodiment, the threshold counter consists of 2 bits.

The IOP Interrupt and CH Interrupt are bits (fourth area) in which a value (fourth information) indicating that a failure of the IOP 5 or the CH 7 has been detected is set. For example, when the failure detection unit 63 detects a failure of the CH 7, fourth information indicating validity (for example, “1”) is set in the IOP Interrupt so that the IOP 5 is notified. When it detects a failure of the IOP 5, fourth information indicating validity (for example, “1”) is set in the CH Interrupt so that the CH 7 is notified.
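For illustration, the fields of one register 61a described so far can be gathered into a small record. This is a sketch under stated assumptions. The class name `MonitorRegister` and its Python representation are hypothetical, and bit positions are not modelled. The Interrupt comments follow the usage in FIG. 4, where the IOP Interrupt is set when the CH 7 has failed and the CH Interrupt is set when the IOP 5 has failed.

```python
from dataclasses import dataclass, fields


@dataclass
class MonitorRegister:
    """Illustrative model of one register 61a (one per CH).

    Field names mirror the text; the Python layout is hypothetical and
    says nothing about bit positions in the real register.
    """
    iop_mask: int = 0       # IOP Mask: 1 = the IOP excludes this CH
    ch_mask: int = 0        # CH Mask: 1 = the CH excludes itself
    iop_alive: int = 0      # IOP Alive (first area)
    ch_alive: int = 0       # CH Alive (second area)
    threshold: int = 0      # 2-bit threshold counter, 0..3
    iop_interrupt: int = 0  # IOP Interrupt: 1 = interrupt the IOP (CH failed)
    ch_interrupt: int = 0   # CH Interrupt: 1 = interrupt the CH (IOP failed)

    def __post_init__(self) -> None:
        # Every field is a single bit except the 2-bit threshold counter.
        for f in fields(self):
            value = getattr(self, f.name)
            limit = 3 if f.name == "threshold" else 1
            if not 0 <= value <= limit:
                raise ValueError(f"{f.name}={value} is out of range 0..{limit}")

    @property
    def access_status(self) -> str:
        """IOP Alive / CH Alive as the two-character access status."""
        return f"{self.iop_alive}{self.ch_alive}"


reg = MonitorRegister(iop_alive=1)
print(reg.access_status)  # 10
```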

The write control unit 62 detects write accesses to the mutual monitoring register 61 by the IOP 5 or the CH 7. Specifically, the write control unit 62 monitors the address space assigned to the IOP Alive and CH Alive of each register 61a. It then detects Alive Bit write accesses to that address space made by the IOP 5 or the CH 7 via the bus control circuit 65 or 66.

When a write access is made by the IOP 5 or the CH 7, the write control unit 62 also updates the state of the IOP Alive and CH Alive according to the access status indicated by their current values. In other words, the values (states) of the IOP Alive and CH Alive are not rewritten directly by the IOP 5 and the CH 7. Instead, the write control unit 62 updates them in response to write accesses from the IOP 5 and the CH 7.

Furthermore, when a write access is made by the IOP 5 or the CH 7, the write control unit 62 controls the threshold counter according to the access status.
Here, the access status is the state of write accesses by the IOP 5 and the CH 7, as indicated by the 2-bit Alive Bit value formed by the 1-bit IOP Alive and the 1-bit CH Alive. That is, the access status is the 2-bit Alive Bit value held in the IOP Alive and CH Alive. In the following description, the IOP Alive and CH Alive may be written together as IOP Alive / CH Alive.

Depending on the value of IOP Alive / CH Alive, the access status takes one of three states: “00”, “01”, and “10”. The “00” state indicates either that neither the IOP 5 nor the CH 7 has made a write access, or that the IOP 5 and the CH 7 are making write accesses alternately. The “01” state indicates that the CH 7 made the most recent write access, and the “10” state indicates that the IOP 5 made the most recent write access.

The write control unit 62 is described in detail later.
The failure detection unit (detection unit) 63 monitors writes to the mutual monitoring register 61 by each of the IOP 5 and the CH 7. Based on the monitoring result, it detects that a failure such as a hang-up has occurred in one of the two devices.
Specifically, the failure detection unit 63 determines whether the other of the IOP 5 and the CH 7 has written its Alive Bit to the mutual monitoring register 61 a predetermined number of times in succession. More specifically, the failure detection unit 63 monitors the threshold counter of each register 61a and determines whether the counter value has reached a predetermined number of times (predetermined threshold). When the counter value reaches the predetermined threshold, the failure detection unit 63 uses the access status of that register 61a at that moment to identify the one device that has not been writing its Alive Bit, and detects that a failure has occurred in that device.

As described above, the period (fixed time) at which the IOP 5 writes its Alive Bit and the period at which the CH 7 writes its Alive Bit are the same or substantially the same. Nevertheless, when detecting a failure in one of the devices, the predetermined threshold is preferably set to 3 or more to allow for timing deviations between the Alive Bit writes of the IOP 5 and the CH 7. Even when both devices are healthy, such deviations can occasionally let one device write twice in a row before the other writes.

In this embodiment, the failure detection unit 63 detects that a failure has occurred in the one device when the 2-bit threshold counter reaches “11” as the predetermined threshold. That is, detection occurs when the other device has written its Alive Bit three times in succession while it was already the most recent writer, with no intervening write by the one device.
As a result, a failure of the one device can be detected accurately even if the timing of the Alive Bit writes by the IOP 5 and the CH 7 drifts.
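The decision of the failure detection unit 63 can be sketched as follows. The sketch assumes the access status is held as the two-character string IOP Alive / CH Alive and the threshold counter as an integer. The function name `detect_failure` is hypothetical.

```python
from typing import Optional

THRESHOLD = 0b11  # predetermined threshold "11" of the 2-bit counter (= 3)


def detect_failure(access_status: str, counter: int) -> Optional[str]:
    """Return "CH" or "IOP" for the device judged to have failed, else None.

    "10": only the IOP has been writing, so the CH has stopped.
    "01": only the CH has been writing, so the IOP has stopped.
    """
    if counter < THRESHOLD:
        return None
    if access_status == "10":
        return "CH"
    if access_status == "01":
        return "IOP"
    return None  # "00": no single device has been writing alone


print(detect_failure("10", 3))  # CH
print(detect_failure("01", 2))  # None
```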

The failure detection unit 63 then sets a value indicating validity (for example, “1”) in the register 61a in which the failure of the one device was detected. The value goes into whichever of the IOP Interrupt and CH Interrupt corresponds to the other device, that is, the monitoring partner of the failed device.
In this way, the mutual monitoring check control circuit 60 uses the threshold counter to count how many times the Alive Bit has been updated by only one of the IOP 5 and the CH 7. It thereby detects a failure such as a hang-up in the device whose Alive Bit is not being updated.

The notification unit 64 notifies the other device, via the bus control circuit 65 or 66, of the failure of the one device detected by the failure detection unit 63. Specifically, the notification unit 64 monitors the IOP Interrupt and CH Interrupt of each register 61a. When a value indicating validity is set in any of these Interrupt bits, the notification unit 64 raises an interrupt to the device corresponding to that bit (the other device). The interrupt notifies that device that a failure has occurred in its partner device (the one device).

When the failure detection unit 63 detects a failure in one of the devices, it may also send the notification unit 64 information identifying the register 61a in which it set the IOP Interrupt or CH Interrupt. In that case, the notification unit 64 need only refer to the IOP Interrupt or CH Interrupt of the notified register 61a after receiving the notification from the failure detection unit 63. It may then omit monitoring the IOP Interrupt and CH Interrupt of every register 61a.
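The two notification variants described above can be sketched as follows: polling every register 61a, or checking only the register named by the failure detection unit 63. The `InterruptBits` class and the `notify` callback are hypothetical. They stand in for the register contents and for raising an interrupt over the bus control circuit 65 or 66.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InterruptBits:
    """Only the Interrupt bits of one register 61a (hypothetical model)."""
    iop_interrupt: int = 0
    ch_interrupt: int = 0


Notify = Callable[[str, int], None]  # (device to interrupt, CH number)


def notify_register(registers: List[InterruptBits], ch_no: int,
                    notify: Notify) -> None:
    """Direct variant: check only the register named by the detector."""
    reg = registers[ch_no]
    if reg.iop_interrupt:
        notify("IOP", ch_no)  # tell the IOP that this CH has failed
    if reg.ch_interrupt:
        notify("CH", ch_no)   # tell this CH that its IOP has failed


def scan_all(registers: List[InterruptBits], notify: Notify) -> None:
    """Polling variant: check the Interrupt bits of every register 61a."""
    for ch_no in range(len(registers)):
        notify_register(registers, ch_no, notify)


sent = []
regs = [InterruptBits(), InterruptBits(iop_interrupt=1),
        InterruptBits(ch_interrupt=1)]
scan_all(regs, lambda dev, ch: sent.append((dev, ch)))
print(sent)  # [('IOP', 1), ('CH', 2)]
```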

As described above, when an IOP 5 or CH 7 makes a write access to the IOP Alive / CH Alive, the mutual monitoring check control circuit 60 judges that no failure such as a hang-up has occurred in that device. In other words, the IOP Alive / CH Alive bits can be regarded as bits by which the IOP 5 or the CH 7 tells the BR 6 that it has not suffered a failure such as a hang-up.

[1-3] Description of the Write Control Unit
The write control unit 62 is described in detail below with reference to FIGS. 3 and 4.
FIG. 3 is a diagram for explaining an example of how the write control unit 62 shown in FIG. 2 controls the state of the mutual monitoring register 61. FIG. 4(a) is a time chart showing an example of state transitions of the mutual monitoring register 61 when a failure has occurred in the CH 7. FIG. 4(b) is a time chart showing an example of state transitions of the mutual monitoring register 61 when a failure has occurred in the IOP 5.

The left column of FIG. 3 shows the state of the IOP Alive and CH Alive before the update by the write control unit 62, and the right column shows the state after the update.
When the write control unit 62 detects a write access by the IOP 5 or the CH 7 to the IOP Alive or CH Alive of a register 61a, it updates the IOP Alive / CH Alive state as shown in FIG. 3 and described below.

(I) When the write control unit 62 detects a write access to the IOP Alive by the IOP 5:
(I-1) When the value of IOP Alive / CH Alive (access status) before the update is “00” (see the first row of the left column in FIG. 3).
The write control unit 62 sets the Alive Bit in the IOP Alive, making the IOP Alive / CH Alive value “10” (see the first row of the right column in FIG. 3). The write control unit 62 leaves the threshold counter at its current value (“0”).

(I-2) When the value of IOP Alive / CH Alive (access status) is “01” (see the second row of the left column in FIG. 3).
The write control unit 62 changes the Alive Bit set in the CH Alive to invalid (“0”), making the IOP Alive / CH Alive value “00” (see the second row of the right column in FIG. 3). The write control unit 62 also resets the threshold counter from its current value “N” (N is an integer of 0 or more) to “0”.

(I-3) When the value of IOP Alive / CH Alive (access status) is “10” (see the third row of the left column in FIG. 3).
The write control unit 62 keeps the IOP Alive / CH Alive at its current value (“10”) (see the third row of the right column in FIG. 3). It also increments the threshold counter from “N” to “N+1”.

(II) When the write control unit 62 detects a write access to the CH Alive by the CH 7:
(II-1) When the value of IOP Alive / CH Alive (access status) before the update is “00” (see the fourth row of the left column in FIG. 3).
The write control unit 62 sets the Alive Bit in the CH Alive, making the IOP Alive / CH Alive value “01” (see the fourth row of the right column in FIG. 3). The write control unit 62 leaves the threshold counter at its current value (“0”).

(II-2) When the value of IOP Alive / CH Alive (access status) is “01” (see the fifth row of the left column in FIG. 3).
The write control unit 62 keeps the IOP Alive / CH Alive at its current value (“01”) (see the fifth row of the right column in FIG. 3). It also increments the threshold counter from “N” to “N+1”.

(II-3) When the value of IOP Alive / CH Alive (access status) is “10” (see the sixth row of the left column in FIG. 3).
The write control unit 62 changes the Alive Bit set in the IOP Alive to invalid (“0”), making the IOP Alive / CH Alive value “00” (see the sixth row of the right column in FIG. 3). The write control unit 62 also resets the threshold counter from its current value “N” (N is an integer of 0 or more) to “0”.

In this way, the write control unit 62 updates the IOP Alive and CH Alive.
Next, FIG. 4(a) is used to describe an example of the state transitions of the mutual monitoring register 61 when a failure has occurred in the CH 7. In the example shown in FIG. 4(a), the access status is assumed to be “00” at timing t0.

At timing t0, the IOP 5 makes an Alive Bit write access to the IOP Alive (see the first row of the left column in FIG. 3). The write control unit 62 then updates the access status to “10” (timing t1; see the first row of the right column in FIG. 3).
Because a failure has occurred in the CH 7, the CH 7 makes no write access. When the predetermined time T has elapsed from timing t0, the IOP 5 again makes an Alive Bit write access to the IOP Alive (timing t2; see the third row of the left column in FIG. 3). Since the access status is “10” at this point, the write control unit 62 keeps the access status at “10” and updates the threshold counter from “00” to “01” (timing t3; see the third row of the right column in FIG. 3).

Subsequently, when the predetermined time T has elapsed from timing t2, the IOP 5 makes another Alive Bit write access to the IOP Alive (timing t4). As at timing t3, the write control unit 62 keeps the access status at “10” and updates the threshold counter from “01” to “10” (timing t5).
Further, when the predetermined time T has elapsed from timing t4, the IOP 5 makes another Alive Bit write access to the IOP Alive (timing t6). As at timing t5, the write control unit 62 keeps the access status at “10” and updates the threshold counter from “10” to “11” (timing t7).

At timing t7, the failure detection unit 63 detects that the threshold counter has reached “11”. Based on the access status “10”, it then detects that a failure has occurred in the CH 7 (timing t8). The failure detection unit 63 then sets “1” in the IOP Interrupt (timing t9). When “1” is set in the IOP Interrupt, the notification unit 64 notifies the IOP 5 by an interrupt that a failure such as a hang-up has occurred in the CH 7 corresponding to the register 61a in which the failure was detected.
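The FIG. 4(a) sequence can be replayed with the transition rules of FIG. 3. So that it runs on its own, this sketch restates those rules compactly (hypothetical function `on_alive_write`). It assumes that the IOP writes exactly once per period T and that the hung CH never writes.

```python
from typing import List, Tuple


def on_alive_write(writer: str, status: str, counter: int) -> Tuple[str, int]:
    """FIG. 3 transition for one Alive Bit write (cases (I) and (II))."""
    own, partner = ("10", "01") if writer == "IOP" else ("01", "10")
    if status == "00":
        return own, counter
    if status == partner:
        return "00", 0
    return own, min(counter + 1, 0b11)


# FIG. 4(a): the CH has hung, so only the IOP writes (t0, t2, t4, t6).
status, counter = "00", 0
iop_interrupt = 0
trace: List[Tuple[str, str]] = []
for _ in range(4):
    status, counter = on_alive_write("IOP", status, counter)
    trace.append((status, f"{counter:02b}"))
    if counter == 0b11 and status == "10":
        iop_interrupt = 1  # t8-t9: CH failure detected, IOP Interrupt set
        break

print(trace)
# [('10', '00'), ('10', '01'), ('10', '10'), ('10', '11')]
print(iop_interrupt)  # 1
```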

Next, FIG. 4(b) is used to describe an example of the state transitions of the mutual monitoring register 61 when a failure has occurred in the IOP 5. In the example shown in FIG. 4(b), the access status is assumed to be “00” at timing t10.
At timing t10, the CH 7 makes an Alive Bit write access to the CH Alive (see the fourth row of the left column in FIG. 3). The write control unit 62 then updates the access status to “01” (timing t11; see the fourth row of the right column in FIG. 3).

Because a failure has occurred in the IOP 5, the IOP 5 makes no write access. When the predetermined time T has elapsed from timing t10, the CH 7 again makes an Alive Bit write access to the CH Alive (timing t12; see the fifth row of the left column in FIG. 3). Since the access status is “01” at this point, the write control unit 62 keeps the access status at “01” and updates the threshold counter from “00” to “01” (timing t13; see the fifth row of the right column in FIG. 3).

Subsequently, when the predetermined time T has elapsed from timing t12, the CH 7 makes another Alive Bit write access to the CH Alive (timing t14). As at timing t13, the write control unit 62 keeps the access status at “01” and updates the threshold counter from “01” to “10” (timing t15).
Further, when the predetermined time T has elapsed from timing t14, the CH 7 makes another Alive Bit write access to the CH Alive (timing t16). As at timing t15, the write control unit 62 keeps the access status at “01” and updates the threshold counter from “10” to “11” (timing t17).

At timing t17, the failure detection unit 63 detects that the threshold counter has reached “11”. Based on the access status “01”, it then detects that a failure has occurred in the IOP 5 (timing t18). The failure detection unit 63 then sets “1” in the CH Interrupt (timing t19). When “1” is set in the CH Interrupt, the notification unit 64 notifies the CH 7 by an interrupt that a failure such as a hang-up has occurred in the IOP 5.
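The FIG. 4(b) case is the mirror image of FIG. 4(a). The sketch below (hypothetical helper `run`) replays a sequence of writers through the FIG. 3 rules. It also shows two cases not drawn in the figures: both devices healthy and alternating, and one device occasionally writing twice in a row because of timing drift. The results in the comments follow directly from cases (I) and (II) above.

```python
from typing import Iterable, Optional, Tuple


def on_alive_write(writer: str, status: str, counter: int) -> Tuple[str, int]:
    """FIG. 3 transition for one Alive Bit write (cases (I) and (II))."""
    own, partner = ("10", "01") if writer == "IOP" else ("01", "10")
    if status == "00":
        return own, counter
    if status == partner:
        return "00", 0
    return own, min(counter + 1, 0b11)


def run(schedule: Iterable[str]) -> Tuple[str, int, Optional[str]]:
    """Feed writers ("IOP"/"CH") through the FIG. 3 rules, starting at "00".

    Returns the final access status, the counter, and the Interrupt bit
    that would be set when the counter reaches "11" (None otherwise).
    """
    status, counter = "00", 0
    for writer in schedule:
        status, counter = on_alive_write(writer, status, counter)
        if counter == 0b11:
            # "01": only the CH kept writing, so the IOP failed; tell the CH.
            bit = "CH Interrupt" if status == "01" else "IOP Interrupt"
            return status, counter, bit
    return status, counter, None


# FIG. 4(b): the IOP has hung; the CH writes at t10, t12, t14 and t16.
print(run(["CH"] * 4))                          # ('01', 3, 'CH Interrupt')
# Both devices healthy and alternating: the counter never leaves 0.
print(run(["IOP", "CH"] * 10))                  # ('00', 0, None)
# Timing drift: one device occasionally writes twice in a row; no alarm.
print(run(["IOP", "IOP", "CH", "CH", "IOP"]))   # ('00', 0, None)
```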

As described above, when a failure occurs in the CH 7 or the IOP 5, the state of the mutual monitoring register 61 transitions as shown in FIG. 4(a) or 4(b).
Thus, for mutual monitoring, the IOP 5 and the CH 7 need only update their own Alive Bits at regular intervals. Mutual monitoring between the IOP 5 and the CH 7 can therefore be realized by simple control that keeps the processing load on the system low.

[1-4] Description of the Disconnection Process
As described above, when the BR 6 detects a failure in the IOP 5 or the CH 7, it sends an interrupt to the device that has not failed, notifying it that a failure has occurred in its partner device.
On receiving the interrupt from the BR 6, the IOP 5 or the CH 7 disconnects the failed device as follows.

When the IOP 5 is notified that a failure has occurred in a CH 7, the IOP 5 identifies the CH 7 in which the failure was detected. The IOP 5 then sets a Mask Bit (“1”), indicating that the mask is valid, in the IOP Mask of the register 61a corresponding to the identified CH 7. This excludes that CH 7 from mutual monitoring.
The IOP 5 then disconnects the CH 7 excluded from mutual monitoring from the IO 8 managed by that CH 7 (that is, connected to it).

On the other hand, when the CH 7 is notified that a failure has occurred in the IOP 5, the CH 7 sets a Mask Bit (“1”), indicating that the mask is valid, in the CH Mask of its own register 61a. This excludes the CH 7 from mutual monitoring with the IOP 5.
The CH 7 then disconnects itself from the IO 8 that it manages (that is, connected to it).
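Below is a minimal sketch of the two interrupt handlers described above. The class and function names are hypothetical. The text leaves the CH-IO disconnection itself to known methods, so the sketch reduces it to removing entries from a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MaskBits:
    """Only the Mask Bits of one register 61a (hypothetical model)."""
    iop_mask: int = 0
    ch_mask: int = 0


@dataclass
class IoPaths:
    """Hypothetical model of which IO 8 devices each CH is connected to."""
    by_ch: Dict[int, Set[str]] = field(default_factory=dict)

    def disconnect_ch(self, ch_no: int) -> Set[str]:
        """Drop every CH-IO connection of one CH; return the IOs affected."""
        return self.by_ch.pop(ch_no, set())


def iop_handle_ch_failure(regs: List[MaskBits], paths: IoPaths,
                          failed_ch: int) -> Set[str]:
    """IOP side: exclude the failed CH from monitoring, then cut it off."""
    regs[failed_ch].iop_mask = 1  # Mask Bit "1": mask valid
    return paths.disconnect_ch(failed_ch)


def ch_handle_iop_failure(regs: List[MaskBits], paths: IoPaths,
                          own_ch: int) -> Set[str]:
    """CH side: exclude itself from monitoring with the failed IOP."""
    regs[own_ch].ch_mask = 1      # Mask Bit "1" in its own CH Mask
    return paths.disconnect_ch(own_ch)


regs = [MaskBits(), MaskBits()]
paths = IoPaths({0: {"IO#0"}, 1: {"IO#1"}})
print(iop_handle_ch_failure(regs, paths, 1))  # {'IO#1'}
```

Setting the Mask Bit before disconnecting keeps the BR 6 from continuing to count Alive Bit writes for a pair that is already being torn down.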

In this way, the IOP 5 and the CH 7 disconnect the device (path) in which the failure was detected from the system. This allows the information processing system 1 to continue operating over a normal alternate path.
The connection between a CH and an IO can be disconnected by various known methods, so a detailed description is omitted.

[1-5] Operation Example of the Information Processing System
Next, an operation example of the information processing system 1 according to this embodiment, configured as described above, will be described with reference to FIGS. 5 to 9.
FIGS. 5 to 7 are flowcharts illustrating an example of the mutual monitoring process between the IOP 5 and the CH 7, performed by the BR 6, the IOP 5, and the CH 7 respectively, according to an embodiment. FIGS. 8 and 9 are flowcharts illustrating an example of the process of disconnecting a device in which a failure has been detected, performed by the IOP 5 and the CH 7 respectively, according to the embodiment.

The BR 6 performs the process shown in FIG. 5 for each of the registers 61a-1 to 61a-(n+1). Each of the CH #0 to CH #n controlled by the BR 6 performs the process shown in FIG. 7. The description of FIGS. 5 to 7 uses the processes performed by the IOP 5-1, BR 6-1, and CH 7-1 shown in FIG. 1 as representatives.

[1-5-1] Mutual Monitoring Process
First, FIGS. 5 to 7 are used to describe an example of the mutual monitoring process between the IOP 5 and the CH 7, performed by the BR 6, the IOP 5, and the CH 7.
When mutual monitoring starts, the IOP 5 and each CH 7 set Mask Bits in the IOP Mask and CH Mask of the corresponding registers 61a (step S21 in FIG. 6 and step S31 in FIG. 7). At this time, the IOP 5 sets a Mask Bit in the IOP Mask of the register 61a corresponding to every CH 7 that it controls. Each of the IOP Mask and CH Mask is set to “0” to enable mutual monitoring or to “1” to disable it.

The IOP 5 also makes an Alive Bit (“1”) write access to the IOP Alive of the register 61a corresponding to every CH 7 that it controls (step S22 in FIG. 6). The IOP 5 repeats step S22 at regular intervals (step S23).
In addition, each CH 7 makes an Alive Bit (“1”) write access to the CH Alive of the register 61a corresponding to itself (step S32 in FIG. 7). The CH 7 repeats step S32 at regular intervals (step S33).
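The IOP and CH sides of FIGS. 6 and 7 amount to a periodic write loop. In this sketch, the register write callbacks (`write_iop_mask`, `write_iop_alive`, `write_ch_mask`, `write_ch_alive`) are hypothetical hooks standing in for bus accesses to the BR 6. The `rounds` parameter bounds the loop only so that the example terminates; the described devices keep writing for as long as monitoring continues.

```python
import time
from typing import Callable, Mapping


def iop_side(masks: Mapping[int, int],
             write_iop_mask: Callable[[int, int], None],
             write_iop_alive: Callable[[int], None],
             period_s: float, rounds: int,
             sleep: Callable[[float], None] = time.sleep) -> None:
    """IOP side of FIG. 6; masks maps each controlled CH number to 0 or 1."""
    for ch, value in masks.items():
        write_iop_mask(ch, value)   # S21: 0 = monitor, 1 = exclude
    for _ in range(rounds):
        for ch in masks:
            write_iop_alive(ch)     # S22: Alive Bit "1" for every controlled CH
        sleep(period_s)             # S23: wait the fixed period


def ch_side(ch_no: int, mask: int,
            write_ch_mask: Callable[[int, int], None],
            write_ch_alive: Callable[[int], None],
            period_s: float, rounds: int,
            sleep: Callable[[float], None] = time.sleep) -> None:
    """CH side of FIG. 7 for one CH."""
    write_ch_mask(ch_no, mask)      # S31
    for _ in range(rounds):
        write_ch_alive(ch_no)       # S32: Alive Bit "1" for this CH
        sleep(period_s)             # S33: wait the fixed period


log = []
iop_side({0: 0, 1: 0}, lambda ch, v: log.append(("IOP Mask", ch, v)),
         lambda ch: log.append(("IOP Alive", ch)), period_s=0.0, rounds=2)
ch_side(1, 0, lambda ch, v: log.append(("CH Mask", ch, v)),
        lambda ch: log.append(("CH Alive", ch)), period_s=0.0, rounds=2)
print(len(log))  # 9
```

The `sleep` parameter is injectable so that a test can replace the real delay. Keeping the two periods equal mirrors the text's requirement that both devices write at the same or substantially the same interval.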

As shown in FIG. 5, the BR 6 determines whether the Mask Bits set in the IOP Mask and CH Mask of the register 61a are both “0” (step S1). If at least one of the two Mask Bits is “1” (No route in step S1), the BR 6 judges that this IOP 5 and CH 7 are not mutual monitoring targets and waits until the Mask Bits are updated.

If the values set in the IOP Mask and CH Mask are both “0” (Yes route in step S1), the write control unit 62 determines whether the IOP 5 has made a write access to the IOP Alive (step S2). If there has been a write access to the IOP Alive (Yes route in step S2), the process proceeds to step S3, S4, or S6 depending on the IOP Alive / CH Alive value (access status).

If the access status is “00” (“00” route from the Yes route in step S2), the write control unit 62 updates IOP Alive / CH Alive to “10” (step S3), and the process returns to step S1.
If the access status is “01” (“01” route from the Yes route in step S2), the write control unit 62 updates IOP Alive / CH Alive to “00” (step S4) and resets the threshold counter to “00” (step S5). The process then returns to step S1.

If the access status is “10” (“10” route from the Yes route in step S2), the write control unit 62 keeps IOP Alive / CH Alive at “10” (step S6) and increments the threshold counter (step S7). The failure detection unit 63 then determines whether the threshold counter has reached “11”, the predetermined threshold (step S8).

If the threshold counter is “11” (Yes route in step S8), the failure detection unit 63 sets “1” in the IOP Interrupt (step S9). The notification unit 64 then notifies the IOP 5 by an interrupt that a failure such as a hang-up of the CH 7 has been detected (step S10). This ends the mutual monitoring process for the IOP 5-1, BR 6-1, and CH 7-1 in this description.

On the other hand, if the threshold counter is not "11" in step S8 (No route in step S8), the process returns to step S1.
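The IOP-side branch of the flowchart (steps S1 to S10) behaves like a small state machine over IOPAlive/CHAlive and the threshold counter. The following is a minimal Python sketch for illustration only, not the BR6 circuit: the class and method names are hypothetical, the threshold "11" is assumed to mean binary 3, and the masked case (No route in step S1, not described in this section) is assumed simply to skip monitoring.

```python
from dataclasses import dataclass

# Threshold "11" from step S8, read as the binary value 3 (assumption).
THRESHOLD = 0b11


@dataclass
class MonitorRegister:
    """Hypothetical software model of one register 61a (IOP side only)."""
    iop_alive: int = 0      # IOPAlive bit
    ch_alive: int = 0       # CHAlive bit
    counter: int = 0        # threshold counter
    iop_mask: int = 0       # IOPMask (1 = mutual monitoring disabled)
    ch_mask: int = 0        # CHMask (1 = mutual monitoring disabled)
    iop_interrupt: int = 0  # IOPInterrupt, set when a CH failure is detected

    def on_iop_write(self) -> bool:
        """Handle one write to IOPAlive by the IOP (steps S1 to S10).

        Returns True when the CH is judged to have failed, i.e. when the
        notification unit would interrupt the IOP (step S10).
        """
        if self.iop_mask or self.ch_mask:        # S1 No route: assumed to skip monitoring
            return False
        status = (self.iop_alive, self.ch_alive)  # access status IOPAlive/CHAlive
        if status == (0, 0):                      # S3: "00" -> "10"
            self.iop_alive = 1
        elif status == (0, 1):                    # S4: "01" -> "00"
            self.ch_alive = 0
            self.counter = 0                      # S5: reset the counter
        elif status == (1, 0):                    # S6: keep "10"
            self.counter += 1                     # S7
            if self.counter == THRESHOLD:         # S8
                self.iop_interrupt = 1            # S9
                return True                       # S10: notify the IOP
        return False
```

Starting from "00" with the CH silent, the first IOP write moves the status to "10" and the next three writes step the counter up to 3, so the fourth consecutive IOP write is the one that reports the CH failure.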
If there has been no write access to IOPAlive in step S2 (No route in step S2), the write control unit 62 determines whether CH7 has made a write access to CHAlive (step S11). If there has been a write access to CHAlive (Yes route in step S11), the process proceeds to step S12, S13, or S15 depending on the value of IOPAlive/CHAlive (the access status).

If the access status is "00" ("00" route from the Yes route in step S11), the write control unit 62 updates IOPAlive/CHAlive to "01" (step S12), and the process returns to step S1.
On the other hand, if the access status is "10" ("10" route from the Yes route in step S11), the write control unit 62 updates IOPAlive/CHAlive to "00" (step S13). The write control unit 62 also resets the threshold counter to "00" (step S14), and the process returns to step S1.

If the access status is "01" ("01" route from the Yes route in step S11), the write control unit 62 keeps IOPAlive/CHAlive at "01" (step S15) and increments the threshold counter (step S16). The failure detection unit 63 then determines whether the threshold counter has reached the predetermined threshold "11" (step S17).

If the threshold counter is "11" (Yes route in step S17), the failure detection unit 63 sets CHInterrupt to "1" (step S18). The notification unit 64 then notifies CH7, by an interrupt, that a failure such as a hang-up of IOP5 has been detected (step S19), and the mutual monitoring process for IOP5-1, BR6-1, and CH7-1 described here ends.

On the other hand, if the threshold counter is not "11" in step S17 (No route in step S17), the process returns to step S1.
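The CH-side branch (steps S11 to S19) mirrors the IOP-side branch with the roles of the two bits swapped. A minimal Python sketch under the same assumptions (hypothetical names, threshold "11" read as binary 3, masked registers assumed to skip monitoring):

```python
from dataclasses import dataclass

# Threshold "11" from step S17, read as the binary value 3 (assumption).
THRESHOLD = 0b11


@dataclass
class MonitorRegister:
    """Hypothetical software model of one register 61a (CH side only)."""
    iop_alive: int = 0     # IOPAlive bit
    ch_alive: int = 0      # CHAlive bit
    counter: int = 0       # threshold counter
    iop_mask: int = 0      # IOPMask (1 = mutual monitoring disabled)
    ch_mask: int = 0       # CHMask (1 = mutual monitoring disabled)
    ch_interrupt: int = 0  # CHInterrupt, set when an IOP failure is detected

    def on_ch_write(self) -> bool:
        """Handle one write to CHAlive by the CH (steps S11 to S19).

        Returns True when the IOP is judged to have failed, i.e. when the
        notification unit would interrupt the CH (step S19).
        """
        if self.iop_mask or self.ch_mask:        # masked: assumed to skip monitoring
            return False
        status = (self.iop_alive, self.ch_alive)  # access status IOPAlive/CHAlive
        if status == (0, 0):                      # S12: "00" -> "01"
            self.ch_alive = 1
        elif status == (1, 0):                    # S13: "10" -> "00"
            self.iop_alive = 0
            self.counter = 0                      # S14: reset the counter
        elif status == (0, 1):                    # S15: keep "01"
            self.counter += 1                     # S16
            if self.counter == THRESHOLD:         # S17
                self.ch_interrupt = 1             # S18
                return True                       # S19: notify the CH
        return False
```

A CH write that arrives while the status is "10" returns it to "00" and resets the counter, so as long as the IOP and CH keep writing in turn, neither side accumulates the three consecutive increments needed to reach the threshold.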
As described above, BR6, IOP5, and CH7 carry out the mutual monitoring process.

[1-5-2] Disconnection process
Next, an example of the process by which IOP5 and CH7 disconnect a device in which a failure has been detected is described with reference to FIGS. 8 and 9.

As shown in FIG. 8, when IOP5 is notified by BR6 of a failure such as a hang-up of CH7, IOP5 identifies the failed CH7 (step S41). IOP5 then sets a MaskBit of "1", which disables mutual monitoring with the failed CH7, in the IOPMask of the register 61a corresponding to the identified CH7 (step S42).

IOP5 then disconnects the connection between the failed CH7 and IO8 (step S43), and the process ends.
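The IOP-side handling of FIG. 8 can be sketched as follows. This is an illustrative Python model, not the actual IOP firmware: the classes, the per-CH link table, and the way the failed CH is identified in step S41 (scanning for a register whose IOPInterrupt is 1) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Register61a:
    """Hypothetical per-CH register; only the fields used here are modelled."""
    iop_mask: int = 0       # IOPMask: 1 disables mutual monitoring for this CH
    iop_interrupt: int = 0  # IOPInterrupt: 1 when BR6 has detected a failure of this CH


@dataclass
class Iop:
    """Hypothetical IOP-side handler for the disconnection process of FIG. 8."""
    registers: Dict[int, Register61a]                    # CH number -> register 61a
    links: Dict[int, str] = field(default_factory=dict)  # CH number -> connected IO

    def handle_ch_failure(self) -> List[int]:
        """Run steps S41 to S43 after BR6 interrupts the IOP; return the failed CHs."""
        # S41: identify the failed CH (assumed: by scanning for IOPInterrupt = 1)
        failed = [ch for ch, reg in self.registers.items() if reg.iop_interrupt]
        for ch in failed:
            self.registers[ch].iop_mask = 1  # S42: disable monitoring of this CH
            self.links.pop(ch, None)         # S43: disconnect the CH from its IO
        return failed
```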
On the other hand, as shown in FIG. 9, when CH7 is notified by BR6 of a failure such as a hang-up of IOP5, CH7 sets a MaskBit of "1" in the CHMask of the corresponding register 61a, disabling mutual monitoring between the failed IOP5 and CH7 itself (step S51).

CH7 then disconnects the connection between itself and IO8 (step S52), and the process ends.
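The CH-side handling of FIG. 9 is simpler because a CH only masks and disconnects itself. In this hypothetical Python sketch, the CH is assumed to learn of the IOP failure by checking CHInterrupt in its own register; the class names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Register61a:
    """Hypothetical register 61a; only the CH-side fields are modelled."""
    ch_mask: int = 0       # CHMask: 1 disables mutual monitoring with the IOP
    ch_interrupt: int = 0  # CHInterrupt: 1 when BR6 has detected an IOP failure


@dataclass
class Channel:
    """Hypothetical CH-side handler for the disconnection process of FIG. 9."""
    register: Register61a
    io: Optional[str] = None  # IO device this CH is currently connected to

    def handle_iop_failure(self) -> bool:
        """Run steps S51 and S52; return True if the CH disconnected itself."""
        if not self.register.ch_interrupt:  # no IOP failure reported (assumption)
            return False
        self.register.ch_mask = 1           # S51: disable mutual monitoring
        self.io = None                      # S52: disconnect this CH from its IO
        return True
```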
As described above, IOP5 and CH7 carry out the process of disconnecting a device in which a failure has been detected.
In this way, in the information processing system 1 according to the present embodiment, IOP5 and CH7 each write an AliveBit to the register 61a at predetermined time intervals. The failure detection unit 63 monitors the writes by each of IOP5 and CH7 and, based on the monitoring result, detects that a failure has occurred in one of IOP5 and CH7. The notification unit 64 then notifies the other device of the failure of the one device detected by the failure detection unit 63.

As described above, an IOP has a very high busy rate because it controls a plurality of CHs, so it is required to perform various processes efficiently and in a short time. In this respect, IOP5 and CH7 according to the present embodiment need only update their own AliveBit in BR6 at regular intervals. Mutual monitoring between IOP5 and CH7 can therefore be realized by simple control that keeps the processing load on the system low, and failures such as hang-ups between IOP5 and CH7 can be monitored mutually and efficiently.

In addition, the processing time spent on mutual monitoring in IOP5 and CH7 can be reduced, and IOP5 or CH7 can detect early that its counterpart has hung up, which prevents a long system stoppage.
Furthermore, in the information processing system 1 according to the present embodiment, the target controlled (accessed) by IOP5 and CH7 in the mutual monitoring process is the register 61a inside BR6. IOP5 and CH7 therefore need only control writes to the register 61a. This avoids the increase in processing load and processing time caused by accessing MS3 as shown in FIG. 11, and allows faster and simpler operation.

[2] Others
While preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes can be made without departing from the spirit of the present invention.
For example, the above embodiment describes the configuration of BR6 interposed between one IOP5 and a plurality of CH7s, but the configuration of BR6 shown in FIG. 2 can also be applied to a BR6 interposed between a plurality of IOP5s and a plurality of CH7s. In this case, each IOP5 need only disable its IOPMask, that is, set the MaskBit to the value indicating a monitoring target, in the registers 61a corresponding to the CH7s that it controls.

In the above embodiment, the register 61a is described as having separate IOPInterrupt and CHInterrupt bits; however, it may instead have a single Interrupt field (fourth area) consisting of, for example, 2 bits. In this case, the failure detection unit 63 may be configured to set Interrupt to, for example, "01" when it detects a failure of CH7, and to "10" when it detects a failure of IOP5. The notification unit 64 may then monitor the value of Interrupt (fourth information), or be notified of it by the failure detection unit 63, and issue an interrupt notification to IOP5 when the value is "01" and to CH7 when it is "10".
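The 2-bit Interrupt variant amounts to a small encoding table. A minimal Python sketch, using the example values "01" and "10" from the text; the enum and function names are hypothetical:

```python
from enum import IntEnum
from typing import Optional


class Interrupt(IntEnum):
    """Example values of the single 2-bit Interrupt field (fourth area)."""
    NONE = 0b00        # no failure detected
    CH_FAILED = 0b01   # failure of CH7 detected
    IOP_FAILED = 0b10  # failure of IOP5 detected


def record_failure(failed_device: str) -> Interrupt:
    """Value the failure detection unit would set for a detected failure."""
    if failed_device == "CH":
        return Interrupt.CH_FAILED
    if failed_device == "IOP":
        return Interrupt.IOP_FAILED
    raise ValueError(f"unknown device: {failed_device}")


def notification_target(value: int) -> Optional[str]:
    """Device the notification unit would interrupt for a given field value."""
    if value == Interrupt.CH_FAILED:
        return "IOP"  # tell the IOP that its CH has failed
    if value == Interrupt.IOP_FAILED:
        return "CH"   # tell the CH that its IOP has failed
    return None       # "00" (or an unused value): nothing to notify
```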

Furthermore, the information processing system 1 shown in FIG. 1 is described as having two each of CPU2, MS3, IOP5, BR6, and IO8, one SC4, and four CH7s, but the number of each device is not limited to that shown in FIG. 1.
In addition, the functions of the write control unit 62, the failure detection unit 63, and the notification unit 64 shown in FIG. 2 may be merged or divided as desired.

Beyond the object described above, achieving operational effects that are derived from the configurations shown in the above mode for carrying out the invention, and that cannot be obtained by conventional techniques, can also be regarded as another object of the present invention.
[3] Appendix
Regarding the above embodiment, the following additional notes are disclosed.
(Appendix 1)
A connection device interposed between a first device and a second device, comprising:
a holding unit to which the first device writes first information at predetermined intervals and to which the second device writes second information at predetermined intervals;
a detection unit that monitors the writes to the holding unit by each of the first and second devices and, based on the monitoring result, detects that a failure has occurred in one of the first and second devices; and
a notification unit that notifies the other device of the failure of the one device detected by the detection unit.
(Appendix 2)
The connection device according to Appendix 1, wherein the detection unit detects that a failure has occurred in the one device when the other device writes the first or second information to the holding unit a predetermined number of consecutive times.
(Appendix 3)
The connection device according to Appendix 2, wherein
the holding unit includes a first area in which the first information from the first device is set and a second area in which the second information from the second device is set, and
the connection device further comprises a write control unit that, when the first or second device makes a write access of the first or second information to the holding unit, updates the states of the first and second areas according to the access status indicated by the values set in the first and second areas.
(Appendix 4)
The connection device according to Appendix 3, further comprising a counter indicating the number of times the other device has consecutively made a write access of the first or second information to the holding unit, wherein
the write control unit controls the counter according to the access status when the first or second device makes the write access, and
the detection unit detects, when the value of the counter reaches a predetermined threshold, that a failure has occurred in the one device according to the access status.
(Appendix 5)
The connection device according to Appendix 3 or 4, wherein
the connection device is interposed between the first device and a plurality of the second devices, and
the holding unit includes, for each of the plurality of second devices, a storage area that contains the first and second areas and to which the first and second information are written at predetermined intervals.
(Appendix 6)
The connection device according to Appendix 5, wherein the write control unit:
when the first device makes a write access of the first information to the holding unit, updates the states of the first and second areas in the storage area corresponding to each of the plurality of second devices; and
when any one of the plurality of second devices makes a write access of the second information to the holding unit, updates the states of the first and second areas in the storage area corresponding to that second device.
(Appendix 7)
The connection device according to Appendix 5 or 6, wherein
each of the plurality of storage areas further includes a third area in which third information is set, the third information indicating whether the corresponding second device is subject to failure detection by the detection unit, and
when the first device makes a write access of the first information to the holding unit, the write control unit updates the states of the first and second areas in the storage area corresponding to each of the plurality of second devices according to the third information set in the third area of that storage area.
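Appendices 5 to 7 describe one storage area per second device, with first-device writes fanned out to every area and second-device writes confined to the writer's own area. The sketch below models that arrangement in Python, writing the transitions of steps S3 to S7 and S12 to S16 as a table. All names are hypothetical, the threshold "11" is read as binary 3, the "IOP"/"CH" labels stand for the first and second devices, and applying the third-area mask to second-device writes as well is an assumption the appendices do not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0b11  # "11", read as binary 3 (assumption)

# (writer, (IOPAlive, CHAlive)) -> (next status, counter action),
# transcribed from steps S3 to S7 and S12 to S16 of the flowchart.
TRANSITIONS = {
    ("IOP", (0, 0)): ((1, 0), "none"),
    ("IOP", (0, 1)): ((0, 0), "reset"),
    ("IOP", (1, 0)): ((1, 0), "increment"),
    ("CH", (0, 0)): ((0, 1), "none"),
    ("CH", (1, 0)): ((0, 0), "reset"),
    ("CH", (0, 1)): ((0, 1), "increment"),
}


@dataclass
class StorageArea:
    """One storage area (register 61a) per second device."""
    status: Tuple[int, int] = (0, 0)  # first and second areas (IOPAlive, CHAlive)
    counter: int = 0                  # threshold counter
    mask: int = 0                     # third area: 1 = not a failure-detection target

    def apply(self, writer: str) -> Optional[str]:
        """Update the area for one write; return the failed device, if detected."""
        self.status, action = TRANSITIONS[(writer, self.status)]
        if action == "reset":
            self.counter = 0
        elif action == "increment":
            self.counter += 1
            if self.counter == THRESHOLD:
                return "CH" if writer == "IOP" else "IOP"
        return None


@dataclass
class HoldingUnit:
    """Holding unit shared by one first device and several second devices."""
    areas: List[StorageArea] = field(default_factory=list)

    def first_device_write(self) -> Dict[int, str]:
        """Fan the first device's write out to every unmasked area (Appendices 6, 7)."""
        detected = {}
        for index, area in enumerate(self.areas):
            if area.mask:
                continue  # third information: this second device is not monitored
            failed = area.apply("IOP")
            if failed:
                detected[index] = failed
        return detected

    def second_device_write(self, index: int) -> Optional[str]:
        """Update only the writing second device's own area (Appendix 6)."""
        area = self.areas[index]
        return None if area.mask else area.apply("CH")
```

For example, with three storage areas where area 2 is masked and only the second device of area 0 keeps writing between the first device's writes, the fourth write by the first device reports `{1: "CH"}`, area 0 never reaches the threshold, and area 2 is left untouched.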
(Appendix 8)
The connection device according to any one of Appendices 1 to 7, wherein
the holding unit further includes a fourth area in which fourth information is set, the fourth information indicating that a failure of the first or second device has been detected,
when the detection unit detects that a failure has occurred in the one device, the detection unit sets, in the fourth area, the fourth information indicating that the failure of the one device has been detected, and
the notification unit notifies the other device of the failure of the one device based on the fourth information set in the fourth area.
(Appendix 9)
A method of monitoring a first device and a second device, the method comprising, by a connection device interposed between the first device and the second device:
monitoring writes of first information by the first device, at predetermined intervals, to a holding unit of the connection device, and monitoring writes of second information by the second device, at predetermined intervals, to the holding unit;
detecting, based on the monitoring result, that a failure has occurred in one of the first and second devices; and
notifying the other device of the detected failure of the one device.
(Appendix 10)
The monitoring method according to Appendix 9, wherein, in the detecting, it is detected that a failure has occurred in the one device when the other device writes the first or second information to the holding unit a predetermined number of consecutive times.
(Appendix 11)
The monitoring method according to Appendix 10, wherein
the holding unit includes a first area in which the first information from the first device is set and a second area in which the second information from the second device is set, and
the method further comprises updating, by the connection device, when the first or second device makes a write access of the first or second information to the holding unit, the states of the first and second areas according to the access status indicated by the values set in the first and second areas.
(Appendix 12)
The monitoring method according to Appendix 11, wherein
in the updating, when the first or second device makes the write access, a counter indicating the number of times the other device has consecutively made a write access of the first or second information to the holding unit is controlled according to the access status, and
in the detecting, when the value of the counter reaches a predetermined threshold, it is detected, according to the access status, that a failure has occurred in the one device.
(Appendix 13)
The monitoring method according to Appendix 11 or 12, wherein
the connection device is interposed between the first device and a plurality of the second devices, and
the holding unit includes, for each of the plurality of second devices, a storage area that contains the first and second areas and to which the first and second information are written at predetermined intervals.
(Appendix 14)
The monitoring method according to Appendix 13, wherein, in the updating:
when the first device makes a write access of the first information to the holding unit, the states of the first and second areas are updated in the storage area corresponding to each of the plurality of second devices; and
when any one of the plurality of second devices makes a write access of the second information to the holding unit, the states of the first and second areas are updated in the storage area corresponding to that second device.
(Appendix 15)
The monitoring method according to Appendix 13 or 14, wherein
each of the plurality of storage areas further includes a third area in which third information is set, the third information indicating whether the corresponding second device is subject to failure detection, and
in the updating, when the first device makes a write access of the first information to the holding unit, the states of the first and second areas in the storage area corresponding to each of the plurality of second devices are updated according to the third information set in the third area of that storage area.
(Appendix 16)
The monitoring method according to any one of Appendices 9 to 15, wherein
the holding unit further includes a fourth area in which fourth information is set, the fourth information indicating that a failure of the first or second device has been detected,
in the detecting, when it is detected that a failure has occurred in the one device, the fourth information indicating that the failure of the one device has been detected is set in the fourth area, and
in the notifying, the other device is notified of the failure of the one device based on the fourth information set in the fourth area.
(Appendix 17)
A connection device interposed between a first device and a second device, the connection device including a holding unit and a control circuit, wherein the control circuit:
monitors writes of first information by the first device to the holding unit at predetermined intervals, and monitors writes of second information by the second device to the holding unit at predetermined intervals;
detects, based on the monitoring result, that a failure has occurred in one of the first and second devices; and
notifies the other device of the detected failure of the one device.

1, 100 Information processing device
2, 2-1, 2-2, 200-1, 200-2 CPU
3, 3-1, 3-2, 300-1, 300-2 Memory device
4, 400 System controller
5, 5-1, 5-2 Input/output processing device (first device)
500-1, 500-2 Input/output processing device
6, 6-1, 6-2 Bridge device (connection device)
60 Mutual monitoring check control circuit
60a Control circuit
61 Mutual monitoring register (holding unit)
61a, 61a-1 to 61a-(n+1) Register (storage area)
62 Write control unit
63 Failure detection unit (detection unit)
64 Notification unit
65, 66 Bus control circuit
600-1, 600-2 Bridge device
7, 7-1 to 7-(n+1) Channel device (second device)
700-1 to 700-4 Channel device
8, 8-1, 8-2, 800-1, 800-2 Input/output device

Claims

A connection device interposed between a first device and a second device, comprising:
a register unit having a first bit area accessed by the first device and a second bit area accessed by the second device;
a write control unit that, upon detecting an access to the first bit area from the first device, controls the value of the first bit area based on the combination of the values set in the first bit area and the second bit area, respectively, and, upon detecting an access to the second bit area from the second device, controls the value of the second bit area based on the combination of the values;
a detection unit that monitors accesses to the register unit by each of the first and second devices and detects, based on the monitoring result and the combination of the values, that a failure has occurred in one of the first and second devices; and
a notification unit that notifies the other device of the failure of the one device detected by the detection unit.

The connection device according to claim 1, wherein the detection unit detects that a failure has occurred in the one device when the other device accesses the register unit a predetermined number of consecutive times.

The connection device according to claim 2, further comprising a counter indicating the number of times the other device has consecutively made a write access to the register unit, wherein
when the first or second device makes the write access, the write control unit controls the counter according to the combination of the values set in the first bit area and the second bit area, respectively, and
when the value of the counter reaches a predetermined threshold, the detection unit detects, according to the combination of the values, that a failure has occurred in the one device.

The connection device according to claim 1, wherein
the connection device is interposed between the first device and a plurality of the second devices, and
the register unit includes, for each of the plurality of second devices, a storage area that contains the first and second bit areas and is accessed by the first device and the plurality of second devices at predetermined intervals.

The connection device according to claim 4, wherein
each of the plurality of storage areas further includes a third bit area in which information indicating whether the corresponding second device is subject to failure detection by the detection unit is set, and
when the first device makes a write access to the register unit, the write control unit controls the values of the first and second bit areas of the storage area corresponding to each of the plurality of second devices according to the information set in the third bit area of that storage area.

A method of monitoring a first device and a second device, the method comprising, by a connection device interposed between the first device and the second device:
upon detecting an access from the first device to a first bit area of a register unit that has the first bit area, accessed by the first device, and a second bit area, accessed by the second device, controlling the value of the first bit area based on the combination of the values set in the first bit area and the second bit area, respectively;
upon detecting an access from the second device to the second bit area, controlling the value of the second bit area based on the combination of the values;
monitoring accesses to the register unit by each of the first and second devices;
detecting, based on the monitoring result and the combination of the values, that a failure has occurred in one of the first and second devices; and
notifying the other device of the detected failure of the one device.