JPH1185714A - Execution resource control program for computer duplex system

Info

Publication number: JPH1185714A
Application number: JP9240681A
Authority: JP
Inventors: Katsuhiko Takeda; 勝彦武田; Hiroshi Furukawa; 博古川; Daisuke Shinohara; 大輔篠原; Hiroyuki Igata; 博之井形
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1997-09-05
Filing date: 1997-09-05
Publication date: 1999-03-30

Abstract

PROBLEM TO BE SOLVED: To improve the availability of a computer duplex system by allowing execution/stop control of resources, and switching of the computer that executes the resources, to be requested through communication with an execution resource control program. SOLUTION: When a working state control program 214 requests, through communication, that all the resources under execution at a computer 20 be switched to a computer 30, an execution resource control program 211 starts processing for switching the computer that executes all of those resources. When no resource remains under execution at the computer 20, it sends a switching end notice to the working state control program 214. As a result, at the time point when the working state control program 214 stops the computer through a working managing interface, execution of the resources that were under execution at the computer 20 has been completely switched to the computer 30. Since there is no resource under execution at the computer 20, the availability of the resources provided by the computer duplex system is not lowered even while an execution resource control program 311 is detecting that the computer 20 is not working.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a computer duplex system that provides high availability by means of two computers.

[0002]

[Description of the Related Art] A conventional computer duplex system is described with reference to the drawings.

[0003] FIG. 2 shows a conventional computer duplex system.

[0004] Computers 20 and 30 are the computers that make up the computer duplex system. Because both computers have the same configuration in the duplex system, only the configuration of computer 20 is described. An operating system 21 runs on computer 20, and an execution resource control program 211 runs on the operating system 21. Computer 20 is equipped with communication means 23 and 24. Communication means 23 communicates with computers connected to network 10, and communication means 24 communicates with computers connected to network 50. Computer 20 is also equipped with a SCSI controller 22 for accessing a hard disk external to the computer.

[0005] Computer 60 is the management computer for the computer duplex system. Management software 611 for the computer duplex system runs on an operating system 61, and computer 60 can manage the computer duplex system, to which it is connected over network 10 through communication means 62.

[0006] Computers 20 and 30 share a hard disk 40 through SCSI controllers 22 and 32. The execution resource control programs 211 and 311 read resources from the shared resources 41 on hard disk 40 and can control the execution and stopping of resources on both computers. In FIG. 2, resource 212 is executed on computer 20 and resource 312 is executed on computer 30.

[0007] The resources executed on each computer are provided, through communication means 23 and 33, to users connected to network 10. In FIG. 2, computer 20 executes resource 212, computer 30 executes resource 312, and each resource is provided to users.

[0008] Computers 20 and 30 are connected by network 50 through communication means 24 and 34, and can monitor each other's operating state by communicating over it. When the execution resource control program 211 detects through communication means 24 that computer 30 has stopped operating for some reason, it automatically reads resource 312, which had been executing on computer 30, from the shared hard disk 40 into its own computer and executes it in place of computer 30, thereby maintaining the availability of resource 312. If computer 20 stops operating, resource 212 is executed on computer 30 by the same procedure and its availability is maintained.
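The mutual monitoring and takeover described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify how the partner's state is judged, so the heartbeat timeout, the `PeerMonitor` class, and the function and resource names are all hypothetical.

```python
import time


class PeerMonitor:
    """Tracks the last message seen from the partner computer (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def heartbeat(self):
        # Called whenever a message arrives from the partner over network 50.
        self.last_seen = self.clock()

    def peer_is_down(self):
        # Assumption: the partner counts as not operating after a silent period.
        return self.clock() - self.last_seen > self.timeout_s


def take_over_if_peer_down(monitor, local_resources, peer_resources):
    """Start the partner's resources locally once the partner is judged down."""
    if not monitor.peer_is_down():
        return []
    taken = sorted(peer_resources - local_resources)
    local_resources.update(taken)  # stands in for reading them from shared disk 40
    return taken
```

In a scheme like this, the partner's resources are unavailable for the whole of the silent period before the timeout fires.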

[0009] In addition, an administrator can use the management software 611 to communicate with the execution resource control program on computer 20 or computer 30 and request execution/stop control of a resource, or request switching of the computer that executes a resource.

[0010]

[Problems to Be Solved by the Invention] In the conventional computer duplex system, the execution resource control program switches the computer that executes a resource only at the point when the execution resource control program on one computer detects that the partner computer is not operating. The only means of detecting that the partner computer is not operating is mutual monitoring by communication between the execution resource control programs on the two computers. The time an execution resource control program needs to detect that the partner computer is not operating is therefore the key to maintaining the availability of the computer duplex system.

[0011] When one computer shuts down normally, execution of the resources running on it must be switched to the partner computer. However, the executing computer is not switched until the execution resource control program on the partner computer detects that this computer is no longer operating.

[0012] In this case, until the computers are switched, the resources that were executing on the computer that stopped operating cannot be provided to users, so the availability of the system is reduced.

[0013] Similarly, when one computer stops operating because of a severe hardware failure, execution of the resources running on it must be switched to the partner computer, but the executing computer is not switched until the execution resource control program on the partner computer detects that this computer has stopped. Until the computers are switched, the resources that were executing on the stopped computer cannot be provided to users, so the availability of the system is reduced.

[0014] Depending on how the computer duplex system is operated, there are also cases where it is desirable to stop executing resources and reduce the load on a computer in order to raise processing performance at a specific time. In this case, the administrator must control the resources manually using the management software of the computer duplex system, which increases the administrator's burden.

[0015] An object of the present invention is to solve the above problems, improve the availability of a computer duplex system, and automate execution resource management in the computer duplex system.

[0016]

[Means for Solving the Problems] The above object is achieved by: means by which control of the execution and stopping of resources, and control of switching the computer that executes a resource, can be requested from outside and carried out through communication with the execution resource control program; means by which external programs, such as a computer operating state control program and a hardware failure monitoring program, make such requests automatically; and means by which the execution resource control program automatically performs the above control according to control execution schedule information.

[0017]

[Embodiments of the Invention] An embodiment of the present invention is described with reference to the drawings.

[0018] FIG. 1 shows a computer duplex system consisting of two computers. Computers 20 and 30 are the computers that make up the computer duplex system. Because both computers have the same configuration in the duplex system, only the configuration of computer 20 is described.

[0019] Computer 20 has the same configuration as in the conventional computer duplex system. In addition, an operating state control program 214 and a hardware failure monitoring program 213 run on the operating system 21.

[0020] As shown in FIG. 3, the operating state control program 214 communicates with computer management software 612 running on the operating system 61 of the remote computer 60. When the administrator requests through this communication that the computer be shut down, the program shuts down computer 20 through an operation management interface 215.

[0021] As shown in FIG. 4, the hardware failure monitoring program 213 reads, from hardware failure information 271 on a local hard disk 27, the hardware failure information written there by a driver 26. It thereby detects hardware failures of the computer such as abnormal CPU temperature, memory degradation, a rise in chassis temperature, and a voltage drop.

[0022] As shown in FIG. 5, the execution resource control program 211 of FIG. 1 has a function of controlling, through a management interface 216 of the computer duplex system, the execution and stopping of the resources provided by the computer duplex system and the switching of the computer that executes them. The execution resource control program 211 also has means for communicating with the operating state control program 214 and the hardware failure monitoring program 213 described above, and carries out the above control when either program requests it. Furthermore, as shown in FIG. 5, the execution resource control program 211 refers to schedule information 281 on a local hard disk 28 and automatically carries out the control set in the schedule information.

[0023] The computer switching method used when one computer of the computer duplex system is shut down normally, in response to a request from computer management software on a remote computer, is described next. Here, it is assumed that the administrator uses the computer management software 612 running on computer 60 to shut down computer 20.

[0024] FIG. 6 shows the processing flow of the operating state control program 214 in this case, and FIG. 8 shows the processing flow of the execution resource control program 211.

[0025] First, the processing flow of the operating state control program 214 is described.

[0026] When computer 20 is to be shut down, the operating state control program 214 running on computer 20 receives a computer shutdown request from the computer management software 612 on the remote computer 60. It then starts computer shutdown processing at step 6010. Before shutting down the computer through the operation management interface 215, the operating state control program 214 communicates with the execution resource control program 211 and, at step 6020, requests that the executing computer of all resources running on computer 20 be switched to computer 30. At step 6030 it receives the switching end notice from the execution resource control program 211 and confirms that no resource is executing on its own computer. At step 6040 it shuts down computer 20 through the operation management interface 215.
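As a rough illustration of steps 6010 to 6040, the sketch below orders the shutdown so that the computer is stopped only after the switch has been confirmed. The class, method names, the `"switch_complete"` notice string, and the `power_off` callback are hypothetical stand-ins, not interfaces defined in the patent.

```python
class ExecutionResourceControl:
    """Stand-in for execution resource control program 211 (hypothetical API)."""

    def __init__(self, local, partner):
        self.local = local      # resources executing on this computer
        self.partner = partner  # resources executing on the partner computer

    def switch_all_to_partner(self):
        # Move every locally executing resource to the partner computer.
        while self.local:
            self.partner.add(self.local.pop())
        return "switch_complete"


def handle_shutdown_request(control, power_off):
    # 6010: start computer shutdown processing.
    # 6020: ask the execution resource control program to switch everything.
    notice = control.switch_all_to_partner()
    # 6030: receive the end notice and confirm nothing is still running locally.
    if notice != "switch_complete" or control.local:
        raise RuntimeError("resources still executing locally")
    # 6040: only now stop the computer (operation management interface 215).
    power_off()
```

The point of the ordering is that `power_off` is never reached while a resource is still executing locally.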

[0027] Next, the processing flow of the execution resource control program 211 is described.

[0028] When the execution resource control program 211 is requested, through communication with the operating state control program 214, to switch execution of all resources running on computer 20 to computer 30, it starts all-resource executing-computer switching at step 8010. If at step 8020 a resource is executing on computer 20, it switches execution of that resource from computer 20 to computer 30 at step 8030. It continues this switching until step 8020 finds no resource executing on computer 20. When no resource at all is executing on computer 20, it sends a switching end notice to the operating state control program 214 at step 8040.
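The loop of steps 8010 to 8040 might look like the following sketch. The list-based resource model, the function name, and the `notify_requester` callback are assumptions made for illustration only.

```python
def switch_all_resources(local, partner, notify_requester):
    """Move every local resource to the partner, then notify the requester."""
    # 8010: start all-resource executing-computer switching.
    moved = []
    # 8020: is any resource still executing on this computer?
    while local:
        # 8030: switch one resource from this computer to the partner.
        resource = local.pop(0)
        partner.append(resource)
        moved.append(resource)
    # 8040: nothing is left locally, so send the switching end notice.
    notify_requester("switch_complete")
    return moved
```

The same function can serve any requester, since the end notice goes to whichever callback is passed in.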

[0029] As a result, by the time the operating state control program 214 on computer 20 shuts the computer down through the operation management interface, execution of all the resources that computer 20 had been running has been switched to computer 30. Because no resource is executing on computer 20, the availability of the resources provided by the computer duplex system does not drop even during the period in which the execution resource control program 311 on computer 30 is detecting that computer 20 is not operating.

[0030] Next, the computer switching method used when the chassis temperature of one computer in the computer duplex system exceeds a threshold is described. Here, the chassis temperature of computer 20 is assumed to have exceeded the threshold.

[0031] As shown in FIG. 4, the hardware failure monitoring program 213 reads the hardware failure information written by driver 26 from hardware failure information 271 on local hard disk 27, and thereby detects the hardware failure of a rise in the computer's chassis temperature.

[0032] Driver 26 refers to hardware failure threshold information 272 on local hard disk 27. When the chassis temperature exceeds the threshold, the driver writes a chassis temperature rise hardware failure to hardware failure information 271 and notifies the hardware failure monitoring program 213 of the failure. Hardware failure threshold information 272 can be set by the hardware failure monitoring program 213.
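A minimal sketch of the driver-side check follows, assuming the threshold information and failure information can be modelled as a dictionary and a list. The record layout, the key names, and the function name are hypothetical; the patent does not describe how items 271 and 272 are stored.

```python
def driver_check(chassis_temp, thresholds, failure_info):
    """Record a failure when the chassis temperature exceeds its threshold."""
    limit = thresholds["chassis_temperature"]  # stands in for threshold info 272
    if chassis_temp > limit:
        # Stands in for writing to hardware failure information 271.
        failure_info.append({"type": "chassis_temperature_rise", "value": chassis_temp})
        return True  # the monitoring program would be notified
    return False
```

Here "exceeds" is read as strictly greater than the threshold, so a reading equal to the threshold records nothing.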

[0033] Furthermore, hardware failure types can be set in computer switching information 273 on local hard disk 27. If a detected failure is set in computer switching information 273, the hardware failure monitoring program 213 carries out computer switching request processing. Computer switching information 273 can be set by the hardware failure monitoring program 213.

[0034] Here, it is assumed that the administrator has used the computer management software 612 on the remote computer 60 to communicate with the hardware failure monitoring program 213 on computer 20, has set the chassis temperature threshold in hardware failure threshold information 272 to 40 degrees, and has set the chassis temperature rise hardware failure as a failure type in computer switching information 273.

[0035] FIG. 7 shows the processing flow of the hardware failure monitoring program 213 when the chassis temperature exceeds 40 degrees, and FIG. 8 shows the processing flow of the execution resource control program 211.

[0036] First, the processing flow of the hardware failure monitoring program 213 is described.

[0037] Because the chassis temperature threshold set in hardware failure threshold information 272 on local hard disk 27 of computer 20 is 40 degrees, the hardware failure monitoring program 213 receives, via hardware failure information 271 on local hard disk 27, a hardware failure notification from driver 26 that the chassis temperature has exceeded 40 degrees. The hardware failure monitoring program 213 then starts hardware failure processing at step 7010. It refers to computer switching information 273 on local hard disk 27 and determines at step 7020 whether the chassis temperature rise hardware failure is set as a hardware failure type. In this case it is set in computer switching information 273, so at step 7030 the program communicates with the execution resource control program 211 and requests that the executing computer of all resources running on computer 20 be switched to computer 30. At step 7040 it receives the switching end notice from the execution resource control program 211 and confirms that no resource is executing on computer 20. If at step 7020 the chassis temperature rise hardware failure is not in the computer switching information, the program does not request a switch from the execution resource control program and ends hardware failure processing at step 7050.
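Steps 7010 to 7050 reduce to a lookup followed by a conditional switch request, as in this sketch. The set-based model of computer switching information 273, the failure type strings, and the `request_switch_all` callback are illustrative assumptions.

```python
def on_hardware_failure(failure_type, switching_info, request_switch_all):
    """Request a switch only for failure types registered in the switching info."""
    # 7010: start hardware failure processing.
    # 7020: is this failure type set in the computer switching information?
    if failure_type in switching_info:
        # 7030: ask the execution resource control program to switch all resources.
        notice = request_switch_all()
        # 7040: receive the end notice; no resource should remain on this computer.
        return notice == "switch_complete"
    # 7050: not registered, so end processing without requesting a switch.
    return False
```

Keeping the registered failure types in a separate table is what lets an administrator decide which failures trigger a switch without changing the monitoring logic.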

[0038] Next, the processing flow of the execution resource control program 211 is described.

[0039] When the execution resource control program 211 running on computer 20 is requested, through communication with the hardware failure monitoring program 213, to switch execution of all resources running on computer 20 to computer 30, it starts all-resource executing-computer switching at step 8010. If at step 8020 a resource is executing on computer 20, it switches execution of that resource from computer 20 to computer 30 at step 8030. It continues this switching until step 8020 finds no resource executing on computer 20. When no resource at all is executing on computer 20, it sends a switching end notice to the hardware failure monitoring program 213 at step 8040.

[0040] If the chassis temperature of computer 20 then continues to rise and computer 20 stops operating because of a hardware failure, execution of all the resources that had been running on computer 20 has already been switched to computer 30 by that time. Because no resource is executing on computer 20, the availability of the resources of the computer duplex system does not drop even during the period in which the execution resource control program 311 on computer 30 is detecting that computer 20 is not operating.

[0041] Finally, the method of automating execution resource management in the computer duplex system is described. Here, the automatic execution method of the execution resource control program 211 running on computer 20 is described.

[0042] As shown in FIG. 5, the execution resource control program 211 reads control information from schedule information 281 on local hard disk 28 and automatically carries out execution resource control. Schedule information 281 can be set by the execution resource control program 211. Here, to raise the performance of nightly batch processing, it is assumed that the administrator has used the computer duplex system management software 611 on the remote computer 60 to communicate with the execution resource control program 211 on computer 20. So that no resources are executing on computer 20 at 11:00 p.m., when batch processing starts, the administrator has set execution of the resources running on computer 20 to be switched to computer 30 at 10:50 p.m. It is further assumed that the current time has reached 10:50 p.m.

[0043] FIG. 9 shows the processing flow of the execution resource control program 211 in this case.

[0044] The execution resource control program 211 periodically starts schedule information reference processing at step 9010. At step 9020 it refers to schedule information 281 on local disk 28, and at step 9030 it determines whether the current time is set in the schedule information. Because the current time, 10:50 p.m., is set in schedule information 281, it refers to the control information in the schedule information. In this case the control information specifies that execution of the resources on computer 20 be switched to computer 30, so at step 9040 the program follows the control information and switches the executing computer of all resources running on computer 20 from computer 20 to computer 30. If at step 9030 the current time is not set in the schedule information, the program ends schedule information reference processing at step 9050 and starts it again after a fixed interval.
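One pass of the schedule reference processing (steps 9010 to 9050) can be sketched as below. Modelling schedule information 281 as a mapping from a time of day to a control name, and dispatching through an `actions` table, are assumptions for illustration; the patent does not define the schedule format.

```python
from datetime import time


def check_schedule(now, schedule, actions):
    """Run the scheduled control for `now`, if any; return whether one ran."""
    # 9010/9020: start schedule reference processing and read the schedule.
    # 9030: is the current time set in the schedule information?
    entry = schedule.get(now)
    if entry is None:
        # 9050: nothing scheduled now; the caller retries after a fixed interval.
        return False
    # 9040: carry out the control named in the control information.
    actions[entry]()
    return True
```

With the example from the text, the schedule would hold a single entry such as `{time(22, 50): "switch_all_to_partner"}`, and a caller would invoke `check_schedule` once per polling interval with the current time rounded to the minute.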

[0045] Through this automatic execution by the execution resource control program 211, by 11:00 p.m., when batch processing starts, execution of all the resources that had been running on computer 20 has been switched to computer 30. Because no resource is executing on computer 20, the performance of the batch processing improves.

[0046] This concludes the description of the embodiment of the present invention.

[0047]

[Effects of the Invention] Conventionally, in a computer duplex system, the computers were not switched until the execution resource control program on the other computer detected that one computer had shut down normally or had stopped operating because of a hardware failure, and the resources that had been executing on the stopped computer could not be provided in the meantime. By having an external program request that the execution resource control program carry out control, resources can be provided even during the period in which the stopped state of one computer is being detected, and the availability of the computer duplex system improves.

[0048] Furthermore, switching computers on the basis of schedule information set by the administrator automates operation of the computer duplex system and reduces the administrator's burden.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a computer duplex system in which an operating state control program and a hardware failure monitoring program cooperate with an execution resource control program.

FIG. 2 is a configuration diagram of a conventional computer duplex system.

FIG. 3 is a configuration diagram of the operating state control program and remote computer management software.

FIG. 4 is a configuration diagram of the hardware failure monitoring program and remote computer management software.

FIG. 5 is a configuration diagram of the execution resource management program when resource management is automated by schedule information.

FIG. 6 is a flowchart of the processing of the operating state control program when a computer shutdown request arrives.

FIG. 7 is a flowchart of the processing of the hardware failure monitoring program when a hardware failure occurs.

FIG. 8 is a flowchart of the processing of the execution resource control program when an all-resource executing-computer switching request arrives.

FIG. 9 is a flowchart of the schedule information reference processing.

[Explanation of symbols]

20: computer; 21: operating system; 213: hardware failure monitoring program; 214: operating state control program.

Continuation of front page: (72) Inventor: Hiroyuki Igata, Software Development Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa

Claims

[Claims]

1. An execution resource control program that controls switching of computers in a computer duplex system in which two computers connected by a network execute resources shared by the two computers and in which, when one computer stops operating, the other computer takes over execution of the resources, the execution resource control program comprising means for controlling execution and stopping of resources and means for controlling switching of the computer that executes a resource through a management interface of the computer duplex system, wherein the execution resource control program communicates with an operating state control program and a hardware failure monitoring program that run on both computers and executes control in accordance with requests from those programs.

2. A computer operating state control method, characterized in that the operating state control program has a function of shutting down the computer through an operation management interface when shutdown is requested through communication with computer management software on a remote computer.

3. A computer operating state control method, characterized in that the operating state control program according to claim 2, on receiving a shutdown request from the computer management software on the remote computer, communicates with the execution resource control program to request that execution of all resources running on its own computer be switched to the partner computer, confirms from a notification from the execution resource control program that no resource is executing on its own computer, and then shuts down the computer through the operation management interface.

4. A hardware failure monitoring method, wherein the hardware failure monitoring program according to claim 1 has a function of detecting a hardware failure by sharing a database mechanism on a local hard disk with a driver and receiving hardware failure notifications from the driver.

5. A hardware failure monitoring method, wherein, for the hardware failure monitoring program according to claim 4, threshold values for the case temperature abnormality and voltage abnormality notified by the driver can be set in the database on the local hard disk shared with the driver, and a case temperature or voltage exceeding the set values is detected as failure information.

6. A computer switching method, wherein the hardware failure monitoring program according to claim 4 detects a hardware failure such as a CPU temperature abnormality, memory degeneration, case temperature rise, or voltage drop from a notification by the driver and, based on switching information stored in the database mechanism, communicates with the execution resource control program to request that execution of all resources being executed on its own computer be switched to the partner computer, so that no resource remains in execution on the computer before it stops operating due to the hardware failure.

7. A computer switching method, wherein the switching information according to claim 6 is stored in a database on a local hard disk and can be set by computer duplex system management software on a remote computer that communicates with the hardware failure monitoring program, so that the switching timing upon a hardware failure can be set freely according to the hardware environment.

8. A computer switching method, wherein a hardware failure type such as CPU temperature abnormality, memory degeneration, case temperature rise, or voltage drop can be set in the switching information according to claim 6, and the hardware failure monitoring program communicates with the execution resource control program and issues a computer switching request when a hardware failure of a set type is notified by the driver.

9. A computer switching method, wherein a number of failure occurrences can be set in the switching information according to claim 6, and the hardware failure monitoring program communicates with the execution resource control program and issues a computer switching request when a hardware failure set in the switching information has occurred the set number of times.

10. An execution resource control method, wherein the execution resource control program according to claim 1 has a function of communicating with computer duplex system management software on a remote computer and executing execution resource control when a control request is received from the management software.

11. An execution resource control method, wherein the execution resource control program according to claim 1 periodically refers to schedule information stored on a hard disk and executes execution resource control of the computer duplex system based on the schedule information, thereby automating execution resource management of the computer duplex system.

12. The schedule execution function according to claim 11, wherein the schedule information includes time information, and execution resources are controlled at a set time.

13. The schedule execution function according to claim 11, wherein execution or stop can be set for each resource, and control is performed so that a specific resource is executed while execution of another resource is stopped.

14. The schedule execution function according to claim 11, wherein an execution computer can be set for each resource, and the distribution of resources executed by each computer is controlled.
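
The interaction recited in claims 3, 6, 8 and 9 can be illustrated with a minimal Python sketch. It is not the implementation disclosed in this publication. All class, method, and resource names are hypothetical. The driver notification is modeled as a plain method call, and the switching information as an in-memory dictionary rather than a database on a local hard disk.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionResourceControl:
    """Hypothetical stand-in for the execution resource control program."""
    local: str
    partner: str
    # Maps each resource name to the computer currently executing it.
    placement: dict = field(default_factory=dict)

    def switch_all_to_partner(self) -> bool:
        """Move every resource running locally to the partner computer.

        Returns True once no resource is executing on the local computer,
        which plays the role of the switching end notice.
        """
        for name, host in self.placement.items():
            if host == self.local:
                self.placement[name] = self.partner
        return all(host != self.local for host in self.placement.values())


@dataclass
class HardwareFailureMonitor:
    """Hypothetical stand-in for the hardware failure monitoring program."""
    control: ExecutionResourceControl
    # Switching information: failure type -> occurrences needed to switch.
    switching_info: dict
    counts: dict = field(default_factory=dict)

    def on_driver_notification(self, failure_type: str) -> bool:
        """Count a failure; request a switch when its set count is reached."""
        if failure_type not in self.switching_info:
            return False  # this failure type is not set to trigger a switch
        self.counts[failure_type] = self.counts.get(failure_type, 0) + 1
        if self.counts[failure_type] >= self.switching_info[failure_type]:
            return self.control.switch_all_to_partner()
        return False


def handle_operation_end(control: ExecutionResourceControl) -> str:
    """Operation state control: stop only after the switch is confirmed."""
    if control.switch_all_to_partner():
        return "stopped"
    return "running"


# Example: a voltage drop must be notified twice before switching.
control = ExecutionResourceControl(
    "computer20", "computer30", {"db": "computer20", "web": "computer30"}
)
monitor = HardwareFailureMonitor(control, {"voltage_drop": 2})
monitor.on_driver_notification("voltage_drop")  # first occurrence, no switch
monitor.on_driver_notification("voltage_drop")  # second occurrence, switch
```

Keeping the failure counts inside the monitor and the resource placement inside the control object mirrors the separation of programs in the claims: the monitor decides when to switch, and the control program decides what is moved and reports when nothing remains on the local computer.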