JPS63288337A

JPS63288337A - Fault processing system for hot stand-by system

Info

Publication number: JPS63288337A
Application number: JP62122549A
Authority: JP
Inventors: Shinichi Kobayashi; 眞一小林
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-05-21
Filing date: 1987-05-21
Publication date: 1988-11-25

Abstract

PURPOSE: To obtain a low-cost monitoring arrangement by adding a function that lets the active host and the standby host communicate through the system control mechanism, while the active host periodically reports that it is normal to the standby host through a general-purpose interface adapter (GSA).

CONSTITUTION: Suppose, for instance, that the normal notification 14 sent from the active host 1 over the communication path 25 via the GSA 15 stops arriving. The active host 1 may have gone down, but the notification may also have stopped because of a fault in the path 25, or because of a failure in the inter-host communication processing of the active host control program 11 or the standby host control program 31. In the latter cases the active host 1 is normal, so switching to the standby host 3 is not required. Accordingly, to determine whether the active host 1 is actually down, a patrol message is transmitted to the active host 1 through the system control mechanism 23.

Description

[Detailed Description of the Invention]

[Industrial Application Field]

The present invention relates to a hot standby scheme in an information processing system, and in particular to a fault handling method for the case where the host-to-host interface mechanism connecting the active host and the standby host fails.

[Conventional technology]

Conventionally, in this kind of hot standby system, the active host and the standby host are connected by a host-to-host interface mechanism. When the standby host detects through this mechanism that the active host is down, the standby host puts the active host into a stopped state via the system control mechanism and then takes over the work that the active host had been processing.

[Problem that the invention seeks to solve]

However, normal responses from the active host stop arriving not only when the active host has actually gone down, but also when the host-to-host interface mechanism fails (a reliability problem) or when the software controlling the host-to-host interface mechanism is defective (a quality problem). In such cases too, the conventional hot standby system described above has the drawback that it stops an active host that is in the middle of business processing. Another approach is to duplicate the host-to-host interface mechanism. This improves the reliability of the host-to-host interface mechanism, but has the drawback that the cost doubles.

[Means for solving problems]

The present invention concerns a hot standby system that includes an active host and a standby host, in which the two hosts are connected by general-purpose interface adapters and are further connected by a system control mechanism so that the standby host can stop the active host. A function is added that lets the active host and the standby host communicate via the system control mechanism. The active host periodically reports to the standby host, via the general-purpose interface adapter, that it is normal. When the standby host can no longer receive the normal report from the active host, the standby host sends a health inquiry to the active host via the system control mechanism and at the same time monitors whether the active host responds. If there is a response, the active host is normal and is left to continue business processing. If there is no response, the standby host puts the active host into a stopped state via the system control mechanism and then automatically takes over the work that the active host had been processing.

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a hot standby system according to one embodiment of the present invention. The active host 1 and the standby host 3 are connected via the inter-host communication path 25 and the system control mechanism 23. The active host 1 and the standby host 3 have control programs 11 and 31, respectively, and the database 12 dedicated to the active host 1 is also connected to the standby host 3.

FIG. 2 is a flowchart showing the processing procedure of the active host control program 11 and the standby host control program 31 shown in FIG. 1.

The operation of this embodiment is described below with reference to FIGS. 1 and 2.

First, assume that the active host 1 is performing business processing using the control program 11 and the database 12. Assume also that the standby host 3, using the control program 31, is in a hot standby state, that is, a state in which it can promptly take in the database 12 and continue the business processing when the active host 1 goes down.

At this time, the internal states of the active host 1 and the standby host 3 are as follows. As shown in steps 14 and 14-1, the active host 1, under the control program 11, repeatedly sends the active-host normal notification 14 to the standby host 3 at fixed intervals over the communication path 25 that connects the general-purpose interface adapters (GSA) 15 and 35 of FIG. 1. Meanwhile, the standby host 3 checks in steps 34 and 36 whether it has received the normal notification 14 from the active host 1 over the GSA communication path. If it has, it repeats steps 34 and 36 so that it can go on detecting any later failure of the active host 1.
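The check in steps 34 and 36 can be pictured as a simple timeout on the last normal notification. The following is a minimal sketch only: the class name `NormalNotificationMonitor`, the `timeout_s` parameter, and the use of an injected clock value are assumptions made for illustration, since the description does not state how long the standby host waits before treating the notification as lost.

```python
class NormalNotificationMonitor:
    """Tracks arrival of normal notification 14 (hypothetical helper).

    timeout_s is an assumed parameter; the description only says the
    notification is sent at fixed intervals.
    """

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen = now

    def on_notification(self, now: float) -> None:
        # Called whenever a normal notification arrives over the GSA path.
        self.last_seen = now

    def is_overdue(self, now: float) -> bool:
        # True when no notification has arrived within the assumed timeout.
        return (now - self.last_seen) > self.timeout_s


monitor = NormalNotificationMonitor(timeout_s=3.0)
monitor.on_notification(now=1.0)
print(monitor.is_overdue(now=3.5))  # still within the timeout
print(monitor.is_overdue(now=4.5))  # notification considered lost
```

Passing the time in explicitly keeps the sketch deterministic; a real monitor would presumably read a monotonic clock instead.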

Now suppose that the normal notification 14 from the active host 1 over the GSA communication path 25 stops arriving.

In this case the active host 1 may have gone down, but the notification may also have stopped because of a failure of the GSA communication path 25, or because of a defect in the inter-host communication processing of the active host control program 11 or the standby host control program 31. In these latter three cases the active host 1 is executing its business processing normally, so a switchover to the standby host 3 is unnecessary.
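The ambiguity described above can be laid out as a small lookup: four possible causes produce the same symptom on the standby side, and only one of them calls for a switchover. This is an illustrative sketch; the dictionary `CAUSES` and the function names are hypothetical, and the cause labels simply restate the cases listed in the description.

```python
# Each cause of a missing normal notification, mapped to whether the
# active host 1 is still processing business work in that case.
CAUSES = {
    "active host 1 down": False,
    "GSA communication path 25 failure": True,
    "inter-host communication defect in control program 11": True,
    "inter-host communication defect in control program 31": True,
}


def switchover_needed(cause: str) -> bool:
    """A switchover is needed only when the active host is not working."""
    return not CAUSES[cause]


def causes_without_switchover() -> list:
    """Causes for which stopping the active host would be a mistake."""
    return [cause for cause, working in CAUSES.items() if working]


for cause in CAUSES:
    print(f"{cause}: switchover needed = {switchover_needed(cause)}")
```

Because the standby host cannot tell these cases apart from the missing notification alone, it needs a second, independent channel to ask the active host directly.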

The subsequent processing therefore distinguishes whether or not the active host 1 has gone down. That is, when step 36 on the standby host 3 side determines that the normal notification has stopped, step 37 sends a patrol message to the active host 1 via the system control mechanism 23.

If the active host 1 is not down, it receives the patrol message in steps 17 and 18 and sends a response acknowledging the patrol message to the standby host 3 via the system control mechanism 23. As a result, by receiving the patrol message response in step 38, the standby host 3 can determine that the active host 1 is not down. It does not perform a system switchover, and returns to step 34 to detect any later failure of the active host 1.

On the other hand, if the active host 1 is down, step 38 finds that no response to the patrol message arrives, and processing advances to step 39. The power of the active host 1 is then cut off via the system control mechanism 23, and after the active host 1 has been completely stopped, step 32 performs the system switchover, so that business can resume from the point at which processing on the active host 1 was interrupted.
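The standby-side flow of FIG. 2, as described in the text, can be sketched as one pass through steps 34/36, 37/38, 39 and 32. This is a simplified illustration under stated assumptions: `FakeSystemControl` is a hypothetical stand-in for the system control mechanism 23, its method names are invented for the example, and waiting for the patrol response is reduced to a boolean return value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSystemControl:
    """Hypothetical stand-in for system control mechanism 23."""
    active_alive: bool
    log: List[str] = field(default_factory=list)

    def send_patrol(self) -> bool:
        # Steps 37/38: send the patrol message and report whether
        # the active host answered.
        self.log.append("patrol")
        return self.active_alive

    def power_off_active(self) -> None:
        # Step 39: cut power so the active host is completely stopped.
        self.log.append("power_off")


def standby_cycle(notification_received: bool, scm: FakeSystemControl) -> str:
    """One pass of the standby host flow; returns the resulting role."""
    if notification_received:        # steps 34/36: notification arrived
        return "standby"
    if scm.send_patrol():            # steps 37/38: active host answered
        return "standby"             # no switchover, back to step 34
    scm.power_off_active()           # step 39
    scm.log.append("switchover")     # step 32: take over the business
    return "active"


scm = FakeSystemControl(active_alive=False)
print(standby_cycle(notification_received=False, scm=scm), scm.log)
```

The ordering matters in this sketch as in the description: the active host is stopped before the switchover, so the two hosts never process the same business at once.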

[Effect of the invention]

As explained above, the present invention adds a function that lets the hosts communicate with each other to a mechanism that was provided solely for stop control between hosts. When the active-host normal notification via the inter-host general-purpose interface mechanism stops arriving, the hosts communicate using this added function to confirm whether the active host is down, which has the effect of avoiding unnecessary system switchovers. In addition, because the change is limited to a protocol-level extension that enables inter-host communication on top of the functions the conventional system control mechanism already has, it has the further advantage of costing less than duplicating the inter-host general-purpose interface mechanism.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a hot standby system according to one embodiment of the present invention. FIG. 2 is a flowchart showing part of the processing flow for fault handling in the active host control program and the standby host control program shown in FIG. 1.

1: active host; 3: standby host; 11, 31: host control programs; 12: active host database; 15, 35: GSA (general-purpose interface adapter); 25: GSA communication path; 23: system control mechanism.

Claims

[Claims]

1. A fault handling method in a hot standby system that includes an active host and a standby host, in which the active host and the standby host are connected by a general-purpose interface adapter and are further connected by a system control mechanism capable of inter-host message exchange and stop control, characterized in that: the active host periodically reports normality to the standby host via the general-purpose interface adapter; when the standby host can no longer receive the normal report from the active host, the standby host sends a health inquiry to the active host via the system control mechanism and at the same time monitors whether there is a response from the active host; when there is a response, the active host continues business processing; and when there is no response, the standby host stops the active host via the system control mechanism and then automatically takes over the work that was being processed by the active host.