JP2008310411A

JP2008310411A - Duplex device and system switching method in failure

Info

Publication number: JP2008310411A
Application number: JP2007155074A
Authority: JP
Inventors: Shuhei Yamaguchi; 修平山口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2007-06-12
Filing date: 2007-06-12
Publication date: 2008-12-25
Anticipated expiration: 2027-06-12
Also published as: JP5061739B2

Abstract

PROBLEM TO BE SOLVED: To provide a duplex device and a failure-time system switching method that allow the systems of a duplexed data processing system to be switched when a failure occurs in one of the two devices, even if one of the devices fails to detect that failure.

SOLUTION: The interrupt indicating a failure in either device of the duplexed data processing system may be received successfully (step S401: Y). In that case, a device that detects the failure in itself stops itself by own-system state change processing (step S403). A device that detects the failure in the other device moves the other device to the stopped system by other-system stop transition processing (step S405). When a device receives a stop transition completion notification from the other device, it makes itself active by state change processing (step S408).

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a duplex device in which data processing is duplicated and to a system switching method used when a failure occurs. More particularly, it relates to a duplex device that switches systems when a failure occurs, and to a corresponding failure-time system switching method.

Many data processing devices and communication devices provide two systems to improve the reliability of their processing. They form a duplexed data processing system in which one system is the active (working) system and the other is the standby (spare) system. In such a duplexed system, the current state of each system is shared across the whole system. For this purpose, each system has a storage area that indicates its own state. The state of the other system can be learned by reading this storage area. A system's own state can not only be read there but also updated as that state changes. For example, when a failure occurs in the active device, the standby device (the other system) recognizes the failure and changes itself to the active system. The failed active device changes itself to the stopped system and disconnects itself from the data processing system.

A duplexed data processing system of this kind assumes the following: when a failure occurs in the device of one system, the device of the other system also reliably recognizes it and acts on it, for example by switching itself from standby to active. There are cases, however, in which this assumption does not hold.

Consider, for example, a duplexed data processing system in which each device has its own computer. The two devices use an interrupt line to signal failures through interrupt processing. When a failure occurs inside one device, an interrupt announcing the failure is attempted, via the interrupt line, to both that device and the other device. It is possible, however, for only one of the two devices to miss the interrupt.

In that case, the device that detected the failure through the interrupt reacts accordingly, for example by switching itself from standby to active. The device that missed the interrupt (here, the active device) cannot detect the failure through interrupt processing, so it simply keeps its previous state, active. As a result, in this example both devices end up active.
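The failure mode just described can be reproduced with a toy model. This is a minimal sketch under simplifying assumptions, not part of the source. The names `Role`, `react` and `simulate` are hypothetical, and the switchover handshake is collapsed into a direct state change. Each device reacts only to an interrupt it actually receives, so losing the interrupt at the failed active device leaves two active devices.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


def react(role: Role, got_interrupt: bool, fault_is_mine: bool) -> Role:
    """Conventional rule (hypothetical simplification): a device changes
    state only if the failure interrupt actually reached it."""
    if not got_interrupt:
        return role  # nothing detected, so the previous role is kept
    if fault_is_mine and role is Role.ACTIVE:
        return Role.STOPPED  # the failed active device disconnects itself
    if not fault_is_mine and role is Role.STANDBY:
        return Role.ACTIVE  # the standby device takes over
    return role


def simulate(active_got_irq: bool, standby_got_irq: bool) -> tuple[Role, Role]:
    """The active device has failed; return the new roles of the
    (old active, old standby) devices."""
    old_active = react(Role.ACTIVE, active_got_irq, fault_is_mine=True)
    old_standby = react(Role.STANDBY, standby_got_irq, fault_is_mine=False)
    return old_active, old_standby


if __name__ == "__main__":
    # Interrupt reaches both devices: clean switchover.
    print(simulate(active_got_irq=True, standby_got_irq=True))
    # Interrupt lost at the failed active device: both devices are ACTIVE.
    print(simulate(active_got_irq=False, standby_got_irq=True))
```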

It has therefore been proposed to reset the failed active computer when a failure occurs that makes it impossible to monitor that computer (see, for example, Patent Document 1). In this proposal, once the active computer has been reset successfully, a notification to that effect is sent to the remaining computers. Another computer then takes over as the active system in place of the reset computer, according to a predetermined priority order.

Patent Document 1: JP 2006-011992 A (paragraphs 0022 and 0031, FIG. 1)

Suppose, however, that the active device is reset whenever it becomes unmonitorable in this way, and that its failure is intermittent. If the failure happens to be absent at the moment of the reset, the same device is selected as the active system again. The active device may therefore be reset over and over, and the system never settles.

It is therefore an object of the present invention to provide a duplex device and a failure-time system switching method for a duplexed data processing system. When a failure occurs in either of the two devices, they must allow the systems to be switched even if one of the devices fails to detect that failure.

According to the present invention, a duplex device is provided with the following means:

(a) Failure occurrence transmission means. Whichever of the pair of devices making up the duplexed data processing system has the failure (the own device or the other device), these means convey a failure notification from the point of occurrence over a predetermined transmission path.

(b) Failure source determination means. When the own device has successfully received a failure notification through the failure occurrence transmission means, these means determine from the content of the notification whether the failure occurred on the own device side or on the other device side.

(c) Stop system transition means. These means execute stop system transition processing that unconditionally changes the failed device, as determined by the failure source determination means, from the active or standby system of the data processing system to the stopped system, which is disconnected from the data processing system.
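Means (b) and (c) can be sketched as two small functions. This is a minimal illustration, assuming that the notification carries the identifier of the device where the fault arose. The `origin` field and the names `FailureNotice`, `failure_source` and `device_to_stop` are hypothetical and not taken from the source. The point is that the decision uses only the notice content and involves no further exchange with the peer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Side(Enum):
    OWN = "own"
    OTHER = "other"


@dataclass(frozen=True)
class FailureNotice:
    origin: int  # hypothetical field: id of the device where the fault arose


def failure_source(own_id: int, notice: FailureNotice) -> Side:
    """Means (b): decide from the notice content which side has failed."""
    return Side.OWN if notice.origin == own_id else Side.OTHER


def device_to_stop(own_id: int, peer_id: int,
                   notice: Optional[FailureNotice]) -> Optional[int]:
    """Means (c): pick the device to move to STOPPED, with no further checks.

    Returns None when the notice never arrived, because (b) and (c) only
    run in a device that received the notification.
    """
    if notice is None:
        return None
    return own_id if failure_source(own_id, notice) is Side.OWN else peer_id
```

For a fault in device 101, both `device_to_stop(101, 102, FailureNotice(101))` and `device_to_stop(102, 101, FailureNotice(101))` select 101. Either receiver alone therefore reaches the same decision.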

That is, in the present invention, a duplex device that has successfully received a failure notification from the failure occurrence transmission means executes stop system transition processing. This processing changes the device judged by the failure source determination means to have failed from active or standby to stopped unconditionally, that is, without any further judgment process. A forced change of system state is thus allowed, in some cases even from a device other than the active one, and the failed system is disconnected from the system.

The present invention also provides a duplex device with the following means:

(a) The failure occurrence transmission means described above.

(b) The failure source determination means described above.

(c) Own-system state change means. When the failure source determination means determines that the failed side is the own device, these means execute own-system state change processing that unconditionally changes the own device from the active or standby system of the data processing system to the stopped system, which is disconnected from the data processing system.

That is, in the present invention, a duplex device that has successfully received the failure notification from the failure occurrence transmission means acts as follows when the failed side is the own device. It executes own-system state change processing that changes itself from active or standby to stopped unconditionally, that is, without any further judgment process. A forced change of system state is thus allowed, in some cases even from a device other than the active one, and the failed system is disconnected from the system.

The present invention further provides a duplex device with the following means:

(a) The failure occurrence transmission means described above.

(b) The failure source determination means described above.

(c) Other-system stop transition processing means. When the failure source determination means determines that the failed side is the other device, these means execute other-system stop transition processing that unconditionally changes the other device from the active or standby system of the data processing system to the stopped system, which is disconnected from the data processing system.

That is, in the present invention, a duplex device that has successfully received the failure notification from the failure occurrence transmission means acts as follows when the failed side is the other device. It executes other-system stop transition processing that changes the other device from active or standby to stopped unconditionally, that is, without any further judgment process. A forced change of system state is thus allowed, in some cases even from a device other than the active one, and the failed system is disconnected from the system.

Furthermore, the present invention provides a failure-time system switching method comprising the following steps:

(a) A failure occurrence transmission step. Whichever of the pair of devices making up the duplexed data processing system has the failure (the own device or the other device), this step conveys a failure notification from the point of occurrence over a predetermined transmission path.

(b) A failure source determination step. When the own device has successfully received the notification conveyed in the failure occurrence transmission step, this step determines from the content of the notification whether the failure occurred on the own device side or on the other device side.

(c) An own-system state change step. When the failure source determination step determines that the failed side is the own device, this step executes own-system state change processing that unconditionally changes the own device from the active or standby system of the data processing system to the stopped system, which is disconnected from the data processing system.

(d) An other-system stop transition processing step. When the failure source determination step determines that the failed side is the other device, this step executes other-system stop transition processing that unconditionally changes the other device from the active or standby system of the data processing system to the stopped system.
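Steps (c) and (d) can be sketched as updates to a table of roles, as seen from whichever device received the notice. This is a hypothetical illustration, not the patent's implementation. The names `switch_on_failure`, `own`, `peer`, `notified` and `failed` are invented, and the outcome of step (b) is passed in directly as `failed`.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


def switch_on_failure(roles: dict[str, Role], own: str, peer: str,
                      notified: bool, failed: str) -> dict[str, Role]:
    """Steps (c)/(d) as seen from device `own` (hypothetical names).

    `notified` says whether the failure notice reached `own`; `failed` is
    the result of step (b), i.e. the device where the failure occurred.
    The input table is not modified; an updated copy is returned.
    """
    new_roles = dict(roles)
    if not notified:
        return new_roles                # this device missed the notice
    if failed == own:                   # (c) own-system state change
        new_roles[own] = Role.STOPPED
    elif failed == peer:                # (d) other-system stop transition
        new_roles[peer] = Role.STOPPED
    return new_roles


if __name__ == "__main__":
    start = {"A": Role.ACTIVE, "B": Role.STANDBY}
    # Fault in A: the result is the same whichever device got the notice.
    print(switch_on_failure(start, own="A", peer="B", notified=True, failed="A"))
    print(switch_on_failure(start, own="B", peer="A", notified=True, failed="A"))
```

In this sketch B stays STANDBY at this point. Its promotion to active is a later step, which the abstract describes as step S408.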

In other words, in the present invention, when a failure occurs, a failure notification is conveyed from the point of occurrence over a predetermined transmission path to both the own device and the other device. A device (the own device) that succeeds in receiving the notification determines from its content whether the failure occurred in itself or in the other device.

If it determines that it has failed itself, it executes own-system state change processing. This processing unconditionally changes the own device from active or standby to the stopped system, which is disconnected from the data processing system. If it determines that the other device has failed, it executes other-system stop transition processing, which unconditionally changes the other device from active or standby to the stopped system in the same way.

In this way, a forced change of system state is allowed even from a device other than the active one, and the failed system is disconnected from the system. As a result, the relatively sound device remains as the active system.

As described above, according to the present invention, the device that successfully receives the failure notification determines whether the failure occurred in itself or in the other device, and moves the failed device to the stopped system. The failed system can therefore be disconnected more quickly and reliably than in a scheme where a given device is moved to the stopped system only when both devices have received the failure notification. It also becomes possible to keep the relatively sound device as the active system.
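The reliability argument can be sketched as a truth table. The following assumes, for illustration only, that the single variable is which devices the interrupt reaches, and it ignores timing. The conventional handshake needs the notice at both devices, while the proposed scheme needs it at either one. The function names are hypothetical.

```python
from itertools import product


def isolated_conventional(failed_got_irq: bool, healthy_got_irq: bool) -> bool:
    """FIG. 2-style handshake: proceeds only if both devices got the interrupt."""
    return failed_got_irq and healthy_got_irq


def isolated_proposed(failed_got_irq: bool, healthy_got_irq: bool) -> bool:
    """Either receiver alone can move the failed device to the stopped system."""
    return failed_got_irq or healthy_got_irq


if __name__ == "__main__":
    print("failed  healthy  conventional  proposed")
    for failed, healthy in product([True, False], repeat=2):
        print(f"{failed!s:7} {healthy!s:8} "
              f"{isolated_conventional(failed, healthy)!s:13} "
              f"{isolated_proposed(failed, healthy)}")
```

Out of the four delivery patterns, the conventional scheme isolates the failed device in one and the proposed scheme in three. Neither scheme helps when the interrupt reaches no device at all.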

The present invention will now be described in detail with reference to an embodiment.

FIG. 1 shows the configuration of a duplex system according to an embodiment of the present invention. The duplex system 100 duplexes a communication system using two devices, a first device 101 and a second device 102. Here the first device 101 is the active system. The first changeover switch 105 and the second changeover switch 106, which are connected to the communication paths 103 and 104, are both set to the first device 101 side. The second device 102 is the standby system.

The first device 101 has a first input/output unit 111₁₁ on the first changeover switch 105 side and a first input/output unit 111₁₂ on the second changeover switch 106 side. The second device 102 has a second input/output unit 111₂₁ on the first changeover switch 105 side and a second input/output unit 111₂₂ on the second changeover switch 106 side.

The first device 101 has a first system state control unit 112₁ that controls the state of the system. This unit contains a CPU (Central Processing Unit) 113₁ and a memory 114₁ storing the control program executed by the CPU 113₁. The first device 101 also has a first inter-system communication unit 116₁ and a first failure monitoring unit 117₁ that monitors its own system for failures.

The second device 102 has the same configuration. It has a second system state control unit 112₂ that controls the state of the system. This unit contains a CPU 113₂ and a memory 114₂ storing the control program executed by the CPU 113₂. The second device 102 also has a second inter-system communication unit 116₂ and a second failure monitoring unit 117₂ that monitors its own system for failures.

The first inter-system communication unit 116₁ and the second inter-system communication unit 116₂ are connected by an inter-system link 121, which carries synchronization-sequence communication between them. The first failure monitoring unit 117₁ and the second failure monitoring unit 117₂ are connected by a failure monitoring link 122. When a failure occurs in either the first device 101 or the second device 102, the failure monitoring link 122 notifies the failure monitoring units 117₁ and 117₂ of both systems by hardware interrupt.

The first device 101 and the second device 102 have a device state information storage unit 123, a common storage area for data related to system switching. The device state information storage unit 123 comprises a first device region 124₁ and a second device region 124₂. The first device 101 can read and write the first device region 124₁ but can only read data from the second device region 124₂. Likewise, the second device 102 can read and write the second device region 124₂ but can only read data from the first device region 124₁. Whenever the data stored in the device state information storage unit 123 is updated, a state-update notification is issued to both the first device region 124₁ and the second device region 124₂. The device state information storage unit 123 holds, for example, whether each of the first device 101 and the second device 102 is currently active, standby or stopped.
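The access rules of the device state information storage unit 123 can be modeled as a small class. This is a sketch under stated assumptions. The class and method names are hypothetical, regions are keyed by device numbers 101 and 102, and the update notification is modeled as a callback to every subscriber. Each device may write only its own region but may read both, and every write is announced to all subscribers.

```python
from enum import Enum
from typing import Callable


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


class DeviceStateStore:
    """Hypothetical model of storage unit 123 with one region per device
    (124_1 for device 101, 124_2 for device 102)."""

    def __init__(self, initial: dict[int, Role]) -> None:
        self._regions = dict(initial)
        self._listeners: list[Callable[[int, Role], None]] = []

    def subscribe(self, callback: Callable[[int, Role], None]) -> None:
        """Register a device to be told about every state update."""
        self._listeners.append(callback)

    def read(self, region: int) -> Role:
        """Any device may read any region."""
        return self._regions[region]

    def write(self, writer: int, region: int, role: Role) -> None:
        """A device may write only its own region; others are read-only."""
        if writer != region:
            raise PermissionError(f"device {writer} may only read region {region}")
        self._regions[region] = role
        for notify in self._listeners:  # update notification to all devices
            notify(region, role)
```

A device that moves itself to the stopped system would call `store.write(101, 101, Role.STOPPED)`, and the peer learns of the change through its subscribed callback.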

The following describes how systems are switched when a failure occurs in the duplex system 100 of this embodiment. First, as a technique related to the present invention, a commonly used system switching sequence is explained.

FIG. 2 shows a commonly used switching sequence between two systems. As shown in FIG. 2, there is an active system 201 corresponding to the first device 101 of FIG. 1 and a standby system 202 corresponding to the second device 102. In FIG. 2, time runs from top to bottom.

Suppose a failure 203 occurs in the active system 201 at time t₁. At a later time t₂, the hardware raises an active-system failure notification interrupt to both the active system 201 and the standby system 202 (step S301). Assume that both the active system 201 and the standby system 202 receive this failure interrupt correctly.

Each system then recognizes the other's state information. The standby system 202, as the healthy system, queries (checks) whether the active system 201, the failed system, is capable of synchronous operation (step S302). Suppose the active system 201 replies to the standby system 202 that synchronous operation is OK (step S303). On that basis, the standby system 202 requests the active system 201 to change its state to the stopped system (step S304). On receiving this request, the active system 201 changes its own state (step S305) and becomes the stopped system 201′. It then sends a stop transition completion notification to the standby system 202 (step S306).

On receiving the stop transition completion notification, the standby system 202 changes its own state (step S307) and moves from the standby system 202 to the active system 202′. It then sends an active transition completion notification to the stopped system 201′ (step S308). When this processing is complete, the stopped system 201′ is disconnected from the system.

With the procedure shown in FIG. 2, however, the hardware's active-system failure notification interrupt must succeed at both the active system 201 and the standby system 202, as shown in step S301. Only if it succeeds at both can the standby system 202 perform the synchronous operation check (step S302) and then request the active system 201 to move to the stopped system 201′.

It is possible, however, for the hardware's active-system failure notification interrupt to succeed at only one of the active system 201 and the standby system 202. In that case the situation differs from the one shown in FIG. 2.

Suppose, in the related technique of FIG. 2, that a failure 203 occurs in the active system 201 at time t₁, but the active system 201 fails to receive the hardware's failure notification interrupt 204 at the later time t₂. The standby system 202 may then request in step S304 that the active system 201 move to the stopped system. Even so, the active system 201 cannot judge whether this request is legitimate. If the request had instead been caused by a failure (not shown) in the standby system 202, the active system would still perform its own-system state change in step S305 and become the stopped system 201′. There could then be no active system anywhere.

Conversely, suppose that a failure 203 occurs in the active system 201 at time t₁, but the standby system 202 fails to receive the hardware's failure notification interrupt 205 at the later time t₂. The standby system 202 then cannot start the synchronous operation check of step S302. As a result, it cannot perform the own-system state change of step S307 and cannot become the active system 202′.
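The related sequence of FIG. 2 and its two failure modes can be traced with a short sketch that lists which steps complete after the active device fails. It assumes, as a simplification not stated in the source, that an active device which never saw the interrupt does not act on the stop request of step S304. The function name `fig2_steps` is hypothetical.

```python
def fig2_steps(active_got_irq: bool, standby_got_irq: bool) -> list[str]:
    """Return the FIG. 2 steps that complete after the active device fails.

    Simplifying assumption: an active device that never saw the interrupt
    cannot validate the stop request of S304 and does not act on it.
    """
    steps = ["S301"]  # hardware interrupt raised towards both devices
    if not standby_got_irq:
        return steps  # standby never starts the S302 check: no switchover
    steps += ["S302", "S303", "S304"]  # sync check, OK reply, stop request
    if not active_got_irq:
        return steps  # request cannot be validated; sequence stalls here
    steps += ["S305", "S306"]  # active stops itself and reports completion
    steps += ["S307", "S308"]  # standby becomes active and reports completion
    return steps


if __name__ == "__main__":
    print(fig2_steps(active_got_irq=True, standby_got_irq=True))
    print(fig2_steps(active_got_irq=False, standby_got_irq=True))
    print(fig2_steps(active_got_irq=True, standby_got_irq=False))
```

Only the first call runs the full sequence through S308. The other two stop early, which corresponds to the two cases just described.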

This embodiment solves these problems of the related technique described with reference to FIG. 2. The duplex system 100 of this embodiment, shown in FIG. 1, omits steps S302 to S304 of FIG. 2. Instead, the device that receives the failure notification interrupt 204 or 205 (the first device 101 or the second device 102) controls the state of its own system.

FIG. 3 shows the processing performed by the first device and by the second device when a failure occurs in this embodiment. In each of the first and second devices 101 and 102 of FIG. 1, the CPU (113₁ or 113₂) implements this processing by executing the control program stored in its own memory (114₁ or 114₂). The description refers to FIGS. 1 and 2. Note that the active system 201 of FIG. 2 is initially the first device 101, and the standby system 202 is initially the second device 102.

Suppose a failure occurs at time t₁ in either the active first device 101 or the standby second device 102. Consider the first device 101 first. As described with reference to FIG. 2, the failure 203 at time t₁ causes the first device 101 to receive a failure notification interrupt 204 from the hardware at time t₂. Assume the first device 101 receives this interrupt 204 correctly (step S401: Y). The first device 101 then determines whether a failure has been detected in its own system (step S402).

Suppose that, as shown in FIG. 2, the failure is detected in the device's own system, the active system 201 (step S402: Y). In this case the first device 101 skips steps S302 to S304 of FIG. 2. It immediately performs own-system state change processing (step S305) and moves from the active system to the stopped system (step S403). It then sends a stop transition completion notification (step S306) to the second device 102, the other system (step S404).

The stop transition completion notification may, for example, consist of switching a hardware on/off switch from its current state (for example, on) to the opposite state (off in this example). In that case, the second device 102 is deemed to have received the stop transition completion notification when it detects that the state of the switch has changed.
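On the receiving side, the switch-based notification amounts to detecting a change in a sampled level. The sketch below assumes the second device polls the switch state periodically. The source does not say whether the switch is polled or interrupt-driven, and the class name `CompletionSwitchWatcher` is hypothetical. Any change from the last sampled level counts as receipt of the stop transition completion notification, whichever direction the switch moved.

```python
class CompletionSwitchWatcher:
    """Hypothetical poller for the hardware switch used as the stop
    transition completion notification (step S404)."""

    def __init__(self, initial_level: bool) -> None:
        self._last = initial_level  # level seen at the previous sample

    def poll(self, level: bool) -> bool:
        """Return True exactly when the switch changed since the last poll."""
        changed = level != self._last
        self._last = level
        return changed


if __name__ == "__main__":
    watcher = CompletionSwitchWatcher(initial_level=True)  # switch is "on"
    for sample in (True, True, False, False):
        print(sample, watcher.poll(sample))
```

Detecting the change rather than a fixed level means the same mechanism works whether the switch starts on or off.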

FIG. 4 shows the processing for this case, in which the active system itself detects its own failure. At time t₁, failure 203 occurs in the first device 101, the active system 201, and at time t₂ the hardware raises the failure notification interrupt 204. The first device 101 then performs the own-system state change process and transitions from the active system 201 to the stopped system 201′ (step S403). It then sends the stop-system transition completion notification (step S306) to the second device 102 in the standby system 202. The remaining parts of FIG. 4 are described later.

Returning to FIG. 3, consider the case in which the first device 101 receives the hardware failure notification interrupt 204 but no failure is detected in its own system (step S402: N). The failure must then be in the other system, so the first device 101 executes the other-system stop transition process to move the other system to the stopped system (step S405). Next, the first device 101 determines whether its own system is the active system (step S406). Here the first device 101 is the active system (Y) and the failure occurred in the standby system, so the first device 101 remains active and ends the process (End).

FIG. 5 shows the processing for this case, in which the active system detects a failure in the standby system. At time t₁, failure 211 occurs in the second device 102, the standby system. As a result, at time t₂ the first device 101 receives the hardware failure notification interrupt 204 (step S401: Y). At the subsequent time t₃, the first device 101 executes the other-system stop transition process on the second device 102 (step S405). The remaining parts of FIG. 5 are described later.

Returning to FIG. 3, consider the case in which a failure notification is received, no failure is detected in the device's own system, and the device's own system is not the active system (step S406: N). In this case the failure has occurred in the active system. The device therefore waits for the stop-system transition completion notification (step S407); once it is received (Y), the device performs the own-system state change process and transitions itself to the active system (step S408). It then notifies the other system that it has become the active system by sending an active-system transition completion notification (step S409). In this way, when a failure occurs in the first device 101, which was the active system, the second device 102, which was the standby system, becomes the active system.
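The interrupt-received branch of FIG. 3 (steps S402 to S409) can be sketched as a pure function that returns the resulting role and the steps passed through. This is a hypothetical Python model, not the embodiment's implementation: `Role`, `handle_received_interrupt`, and the `stop_notice_received` flag, which stands in for the wait at step S407, are assumptions introduced here.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


def handle_received_interrupt(own_role: Role, fault_in_own: bool,
                              stop_notice_received: bool) -> tuple[Role, list[str]]:
    """Hypothetical model of the step S401: Y branch of FIG. 3.

    Returns the device's resulting role and the flowchart steps it passed through.
    `stop_notice_received` stands in for the wait at step S407.
    """
    steps = ["S402"]
    if fault_in_own:
        # S402: Y -> stop this device at once (S403) and notify the other one (S404).
        return Role.STOPPED, steps + ["S403", "S404"]
    # S402: N -> the failure is in the other system, so force it to stop (S405).
    steps += ["S405", "S406"]
    if own_role is Role.ACTIVE:
        # S406: Y -> this device is already active and simply keeps running.
        return Role.ACTIVE, steps
    # S406: N -> wait for the other system's stop-system transition completion (S407).
    steps.append("S407")
    if not stop_notice_received:
        return own_role, steps  # still waiting; the role is unchanged
    # S407: Y -> become active (S408) and send the active-system notification (S409).
    return Role.ACTIVE, steps + ["S408", "S409"]
```

Modelling the wait as a flag keeps the sketch deterministic; a real device would block or time out at step S407, which the text does not specify.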

FIG. 6 shows the processing for this case, in which the standby system becomes the active system after a failure of the active system. At time t₁, failure 203 occurs in the first device 101, the active system 201. At time t₂, the second device 102, the standby system 202, receives the failure notification interrupt 205 (step S401: Y) and determines that its own system has no failure (step S402: N). The second device 102 therefore executes the other-system stop transition process on the first device 101 at time t₃ (step S405). The first device 101 then transitions from the active system 201 to the stopped system 201′ and sends the stop-system transition completion notification (step S306) at time t₄. The second device 102 receives this notification (step S407: Y), performs the own-system state change process (step S307), and transitions to the active system 202′ (step S408). At time t₅, it sends the active-system transition completion notification (step S308) to the first device 101, the other system (step S409).

The operation of a device that receives the failure notification interrupt has been described above with reference to FIG. 3. Next, consider the case in which a failure occurs but only one of the active and standby systems receives the notification. When a hardware failure notification interrupt is raised and only one of the two systems recognizes it, four cases are possible:
(a) The standby system cannot recognize a failure in the active system (only the active system detects its own failure) (FIG. 4)
(b) The active system cannot recognize a failure in the standby system (only the standby system detects its own failure) (FIG. 7)
(c) The active system cannot recognize its own failure (only the standby system detects the active system's failure) (FIG. 6)
(d) The standby system cannot recognize its own failure (only the active system detects the standby system's failure) (FIG. 5)
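The four cases can be summarized by which FIG. 3 branch each device ends up taking. The Python sketch below is a hypothetical tabulation, not part of the embodiment: `ONE_SIDED_CASES` and `branches` are illustrative names, and the branch labels are derived from the flowchart steps described in this section.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


# Case label -> (role of the failed device, role of the only device that got the interrupt, figure)
ONE_SIDED_CASES = {
    "a": (Role.ACTIVE, Role.ACTIVE, "FIG. 4"),
    "b": (Role.STANDBY, Role.STANDBY, "FIG. 7"),
    "c": (Role.ACTIVE, Role.STANDBY, "FIG. 6"),
    "d": (Role.STANDBY, Role.ACTIVE, "FIG. 5"),
}


def branches(failed: Role, receiver: Role) -> dict[Role, str]:
    """Return the FIG. 3 branch each device takes when only `receiver` gets the interrupt."""
    other = Role.STANDBY if receiver is Role.ACTIVE else Role.ACTIVE
    if failed is receiver:
        # The receiver detects its own failure and stops itself; the other device
        # learns of it only from the stop-system transition completion notification.
        return {receiver: "S401: Y, S402: Y", other: "S401: N, S410: Y"}
    # The receiver sees a failure in the other system and forces it to stop.
    return {receiver: "S401: Y, S402: N", other: "S401: N, S411: Y"}
```

In cases (a) and (b) the notification path is S404 to S410; in cases (c) and (d) it is S405 to S411. In every case, the device that missed the interrupt is still reached by a message from the other device.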

In this embodiment, unlike the example described with reference to FIG. 2, the systems transition correctly even when only one of them correctly receives the failure notification interrupt. FIG. 7, which illustrates the case in which the active system cannot recognize a failure in the standby system, is described later.

So that a device that missed the failure notification interrupt can still move to the appropriate state, FIG. 3 also defines the processing for the case in which no interrupt is received. The first such case, shown in FIG. 4, is one in which the device's own system (the second device 102, the standby system, in this example) fails to receive the failure notification interrupt (step S401: N) and the other system sends a stop-system transition completion notification at time t₄ (step S410: Y). Because the other system has already moved from the active system to the stopped system as a result of its failure 203 (see steps S403 and S404), the second device 102 performs the own-system state change process (step S307) and transitions itself to the active system 202′.

That is, in the example of FIG. 4, the second device 102 misses the hardware notification of the active-system failure at time t₂, but by receiving the stop-system transition completion notification from the first device 101 at time t₄ it can still perform the own-system state change process (step S307) and transition to the active system 202′ (step S408). At time t₅, it sends the active-system transition completion notification (step S308) to the first device 101, which is now the stopped system (step S409), and the switchover processing ends (End).

Next, consider the case in which the failure notification interrupt cannot be processed (step S401: N) and the other system requests execution of the other-system stop transition process (step S410: N, step S411: Y). In this case, the device's own system is moved to the stopped system by the stop transition process executed by the other system (step S412), and the device sends a stop-system transition completion notification (step S413).
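The interrupt-missed branch (steps S410 to S413) can be sketched in the same style. `Event`, `handle_missed_interrupt`, and passing the awaited event as an argument are assumptions. The text does not say how long a device waits or what happens if neither event arrives, so in that case the sketch simply returns the unchanged role.

```python
from __future__ import annotations

from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


class Event(Enum):
    # Stop-system transition completion notification from the other device (S410).
    STOP_DONE_FROM_OTHER = "stop_done"
    # Other-system stop transition process executed on this device (S411).
    FORCED_STOP_BY_OTHER = "forced_stop"


def handle_missed_interrupt(own_role: Role, event: Event | None) -> tuple[Role, list[str]]:
    """Hypothetical model of the step S401: N branch of FIG. 3."""
    if event is Event.STOP_DONE_FROM_OTHER:
        # S410: Y -> the other device stopped itself; take over as active (S408, S409).
        return Role.ACTIVE, ["S410", "S408", "S409"]
    if event is Event.FORCED_STOP_BY_OTHER:
        # S410: N, S411: Y -> stopped by the other device (S412); report it (S413).
        return Role.STOPPED, ["S410", "S411", "S412", "S413"]
    # Neither event: the text is silent here, so keep the current role.
    return own_role, ["S410", "S411"]
```

For example, the second device 102 in FIG. 4 follows the first path, and the second device 102 in FIG. 5 follows the second.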

This is explained with the example shown in FIG. 5. Here, the second device 102, the standby system 202, fails to process the interrupt raised at time t₂ for failure 211 (step S401: N). The first device 101, the active system 201, does receive interrupt 204 at time t₂ (step S401: Y). Because the first device 101 detects no failure in its own system (step S402: N), it executes the other-system stop transition process on the second device 102 at time t₃ (step S405).

The other-system stop transition process is therefore executed on the second device 102 at time t₃ (step S411: Y), which in effect tells the second device 102 that its own system has failed. The second device 102 is moved to the stopped system 202″ by the other-system stop transition process executed by the first device 101 (step S405). At the subsequent time t₄, it sends the stop-system transition completion notification (step S306) to the first device 101 (step S413), and the switchover processing ends (End).

Finally, the operation for the case shown in FIG. 7, in which the active system cannot recognize a failure in the standby system, is described with reference to FIG. 3. In this example, failure 211 occurs at time t₁ in the second device 102, the standby system 202, and only the second device 102 receives the hardware failure notification interrupt 205 at time t₂.

First, consider the processing on the side of the second device 102, the standby system 202. At time t₂, the second device 102 receives the hardware failure notification interrupt 205 (step S401: Y) and detects a failure in its own system (step S402: Y). It therefore skips steps S302 to S304 of FIG. 2, immediately performs the own-system state change process (step S403), and transitions from the standby system 202 to the stopped system 202″. At time t₄, it sends the stop-system transition completion notification (step S306) to the first device 101, the other system (step S404).

Next, consider the processing on the side of the first device 101, the active system 201, in FIG. 7. The first device 101 does not receive the hardware failure notification interrupt at time t₂ (step S401: N), so for the time being it is unaware of the failure in the second device 102. However, the second device 102 moves itself from the standby system 202 to the stopped system 202″ and sends a stop-system transition completion notification to the first device 101 at time t₄. At that point the first device 101, the active system 201, learns that the second device 102 has transitioned to the stopped system 202″.
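The four one-sided cases of FIGS. 4 to 7 can be checked end to end by combining both branches in a small message-passing simulation. The Python sketch below is hypothetical: devices 101 and 102 start as active and standby, the message names `forced_stop`, `stop_done`, and `active_done` stand for the other-system stop transition process and the two completion notifications, and delivery is assumed to be reliable and in order. Each run should leave the healthy device active and the failed device stopped.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


@dataclass
class Device:
    name: str
    role: Role
    inbox: list[str] = field(default_factory=list)


def deliver(dev: Device, msg: str, peer: Device) -> None:
    """Handle one inter-device message using the branches of FIG. 3."""
    if msg == "forced_stop":
        # Other-system stop transition executed on us (S411: Y): stop (S412), notify (S413).
        dev.role = Role.STOPPED
        peer.inbox.append("stop_done")
    elif msg == "stop_done":
        # Stop-system transition completion notification (S407: Y or S410: Y).
        if dev.role is Role.STANDBY:
            dev.role = Role.ACTIVE            # S408
            peer.inbox.append("active_done")  # S409
        # Assumption: an already-active device only learns that its peer has stopped.
    # "active_done" is informational only in this sketch.


def simulate(failed: str, receiver: str) -> dict[str, Role]:
    """One device fails; only `receiver` takes the failure interrupt at t2."""
    devs = {"101": Device("101", Role.ACTIVE), "102": Device("102", Role.STANDBY)}
    rx = devs[receiver]
    peer = devs["102" if receiver == "101" else "101"]

    if failed == receiver:
        # S401: Y, S402: Y -> stop self (S403) and send the completion notice (S404).
        rx.role = Role.STOPPED
        peer.inbox.append("stop_done")
    else:
        # S401: Y, S402: N -> stop the other system (S405); S406/S407 happen via the inbox.
        peer.inbox.append("forced_stop")

    # Deliver messages until neither device has anything left to process.
    while rx.inbox or peer.inbox:
        for dev, other in ((peer, rx), (rx, peer)):
            while dev.inbox:
                deliver(dev, dev.inbox.pop(0), other)
    return {name: d.role for name, d in devs.items()}
```

For example, `simulate(failed="102", receiver="101")` reproduces FIG. 5: device 102 is forced to stop, and device 101 remains active.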

As described above, this embodiment allows service to continue with the healthy system as the active system even when the failure location prevents sequence synchronization between the two systems. It likewise allows service to continue with the healthy system as the active system even when the failure location causes the two systems to disagree about the device states. Further, even when the failure location prevents the faulty system from recognizing its own failure, that system can be treated as hazardous and correctly disconnected from the data processing system. Finally, the state change notifications that accompany a forced switchover give both systems more opportunities to recognize the correct states.

In the embodiment described above, the device state information storage unit 123 is shown as a unit connected to both the first device 101 and the second device 102. However, any circuit configuration may be used, provided that the first device area 124₁ and the second device area 124₂ reflect the current status (active, standby, and so on) of devices 101 and 102 respectively, and that each area can be written and read within its permitted limits.
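One way to satisfy these access conditions is sketched below. The background of this document states that each system can refer to the other system's state and update its own. The `DeviceStateStore` class, its method names, and the choice to raise `PermissionError` when a device writes to the other device's area are assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"
    STOPPED = "stopped"


class DeviceStateStore:
    """Hypothetical model of storage unit 123 with areas 124_1 (device 101) and 124_2 (device 102).

    Assumed access rule: any device may read either area, but a device may
    write only the area that describes itself.
    """

    def __init__(self) -> None:
        self._areas = {"101": Role.ACTIVE, "102": Role.STANDBY}

    def read(self, area: str) -> Role:
        return self._areas[area]

    def write(self, writer: str, area: str, role: Role) -> None:
        if writer != area:
            raise PermissionError(f"device {writer} may not write the area of device {area}")
        self._areas[area] = role
```

For example, after step S403 the first device would call `store.write("101", "101", Role.STOPPED)`, and the second device could then observe the change with `store.read("101")`.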

FIG. 1 is a block diagram showing the configuration of the duplex device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing a commonly used switching sequence between two systems.
FIG. 3 is a flowchart showing the switching control performed by each device in this embodiment.
FIG. 4 is an explanatory diagram showing the switching sequence in this embodiment when the standby system cannot recognize a failure in the active system.
FIG. 5 is an explanatory diagram showing the switching sequence in this embodiment when the standby system cannot recognize a failure in the standby system.
FIG. 6 is an explanatory diagram showing the switching sequence in this embodiment when the active system cannot recognize a failure in the active system.
FIG. 7 is an explanatory diagram showing the switching sequence in this embodiment when the active system cannot recognize a failure in the standby system.

Explanation of symbols

101 First device
102 Second device
105 First changeover switch
106 Second changeover switch
112₁ First system state control unit
112₂ Second system state control unit
113₁, 113₂ CPU
117₁ First failure monitoring unit
117₂ Second failure monitoring unit
122 Failure monitoring link
123 Device state information storage unit
124₁ First device area
124₂ Second device area

Claims

A duplex device comprising:
failure occurrence transmission means for transmitting a failure notification, via a predetermined transmission path, from the location where a failure has occurred on either the own device or the other device of the pair of devices constituting a duplexed data processing system;
failure source determination means for determining from the notification content, when the own device succeeds in receiving the failure notification from the failure occurrence transmission means, whether the failure occurred on the own device or on the other device; and
stop-system transition means for executing a stop-system transition process that unconditionally changes the device on the failure side, as determined by the failure source determination means, from the active system or the standby system of the data processing system to a stopped system disconnected from the data processing system.

A duplex device comprising:
failure occurrence transmission means for transmitting a failure notification, via a predetermined transmission path, from the location where a failure has occurred on either the own device or the other device of the pair of devices constituting a duplexed data processing system;
failure source determination means for determining from the notification content, when the own device succeeds in receiving the failure notification from the failure occurrence transmission means, whether the failure occurred on the own device or on the other device; and
own-system state change means for executing, when the failure side determined by the failure source determination means is the own device, an own-system state change process that unconditionally changes the own device from the active system or the standby system of the data processing system to a stopped system disconnected from the data processing system.

A duplex device comprising:
failure occurrence transmission means for transmitting a failure notification, via a predetermined transmission path, from the location where a failure has occurred on either the own device or the other device of the pair of devices constituting a duplexed data processing system;
failure source determination means for determining from the notification content, when the own device succeeds in receiving the failure notification from the failure occurrence transmission means, whether the failure occurred on the own device or on the other device; and
other-system stop transition processing means for executing, when the failure side determined by the failure source determination means is the other device, an other-system stop transition process that unconditionally changes the other device from the active system or the standby system of the data processing system to a stopped system disconnected from the data processing system.

The duplex device according to claim 2, wherein the failure occurrence transmission means performs interrupt processing on the CPU (Central Processing Unit) of each device through a failure monitoring link that connects the failure monitoring units provided in the own device and the other device.

The duplex device according to claim 2, further comprising stop-system transition completion notification output means for outputting a stop-system transition completion notification that prompts the other device to transition to the active system when the own device has transitioned to the stopped system through the own-system state change means and the other device has no failure.

The duplex device according to claim 5, wherein, when the stop-system transition completion notification is received, an own-system state change process that transitions the own device to the active system is executed.

A failure-time system switching method comprising:
a failure occurrence transmission step of transmitting a failure notification, via a predetermined transmission path, from the location where a failure has occurred on either the own device or the other device of the pair of devices constituting a duplexed data processing system;
a failure source determination step of determining from the notification content, when the own device succeeds in receiving the failure notification in the notification reception step, whether the failure occurred on the own device or on the other device;
an own-system state change step of executing, when the failure source determination step determines that the failure side is the own device, an own-system state change process that unconditionally changes the own device from the active system or the standby system of the data processing system to a stopped system disconnected from the data processing system; and
an other-system stop transition processing step of executing, when the failure source determination step determines that the failure side is the other device, an other-system stop transition process that unconditionally changes the other device from the active system or the standby system of the data processing system to a stopped system disconnected from the data processing system.