JPH11259326A - Hot standby system, automatic re-execution method for the same and storage medium therefor

Info

Publication number: JPH11259326A
Application number: JP10063614A
Authority: JP
Inventors: Ikuo Ochiai; 郁夫落合
Original assignee: NTT Communicationware Corp
Current assignee: NTT Comware Corp
Priority date: 1998-03-13
Filing date: 1998-03-13
Publication date: 1999-09-24

Abstract

PROBLEM TO BE SOLVED: To provide a hot standby system in which an automatically operated active server and a standby server can continue operation quickly even when a fault occurs at the active server. SOLUTION: The hot standby system is composed of a shared disk 3 that records an operation schedule 31 of various jobs, an active server 1 that automatically operates jobs based on the operation schedule 31, and a standby server 4 that operates as the active server 1, based on the information recorded on the shared disk 3, when a fault occurs at the active server 1. The active server 1 is provided with a status managing part 14 that records on the shared disk 3, as status information 33, the status concerning the activation conditions of job system products, whose applications are activated and stopped by different jobs. The standby server 4 is provided with a job system product reactivation part 15 that reactivates the job system products by a job, referring to the status information 33, when a fault occurs at the active server 1.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a hot standby system comprising an automatically operated active server and a spare server that operates as the active server when a failure occurs in the active server. In particular, it relates to a hot standby system capable of automatic re-execution of jobs, an automatic re-execution method in the hot standby system, and a recording medium therefor.

[0002]

2. Description of the Related Art

Automatic operation techniques have been introduced to automate the operation of computer servers that perform predetermined processing. To realize automatic operation on a computer server, a program that monitors the state inside the server and a program that runs operations according to an operation schedule are used. In addition, a system configuration such as a hot standby system is sometimes adopted to shorten the time to recovery when a failure occurs in an automatically operated computer server. Here, a "hot standby system" consists of an active server that runs and processes various jobs, a spare server, and a shared disk shared by the active server and the spare server. When a failure occurs in the active server, the spare server operates as the active server based on the information recorded on the shared disk.

[0003] FIG. 11 shows a conventional example of the configuration of an automatically operated hot standby system. As shown in FIG. 11, the active server 21 and the spare server 24 have the same configuration, and the shared disk 23 is physically connected to both. The active server 21 and the spare server 24 include processing units 22 and 25, respectively, which execute the monitoring unit 11, the failure handling unit 12, the schedule processing unit 13, and the various applications 17. The functions executed by the processing units 22 and 25 are briefly as follows. The monitoring unit 11 monitors the status inside the server and provides functions such as system monitoring and an automatic operation system. The failure handling unit 12 detects failures such as the stopping of a monitored process and, upon detecting a failure, automatically switches from the processing unit 22 of the active server 21 to the processing unit 25 of the spare server 24. The schedule processing unit 13 starts the various applications 17 and so on in accordance with the operation schedule 31 stored on the shared disk 23, and records the history of doing so on the shared disk 23 as the execution history 32. The various applications are the group of functions that the server provides to its users, and they are started as jobs. Of these, the monitoring unit 11 and the schedule processing unit 13 are the functional units required for automatic operation, and the failure handling unit 12 is the functional unit required to switch operation to the spare server 24 when a failure occurs in the active server 21.

[0004] Next, an outline of the operation of the hot standby system consisting of the active server 21, the spare server 24, and the shared disk 23 is described with reference to FIG. 12. In FIG. 12, the steps on the left (steps S11 to S16) are the operation flow of the active server 21, and the steps on the right (steps S21 to S25) are the operation flow of the spare server 24. First, when the hot standby system starts operating, the monitoring unit 11 is started on both the active server 21 and the spare server 24, and monitoring within each host begins (steps S11 and S21). Next, the failure handling unit 12 is started on both the active server 21 and the spare server 24, and mutual monitoring begins. This makes it possible to detect a failure in the active server 21 and, when a failure occurs, to switch automatically from the processing unit 22 of the active server 21 to the processing unit 25 of the spare server 24 (steps S12 and S22). Next, the schedule processing unit 13 is started on the active server 21. The schedule processing unit 13 then starts the various applications and the like as jobs in accordance with the operation schedule 31, so that automatic operation is carried out, and it records the history as the execution history 32 (step S14).

[0005] If some failure occurs in the active server 21 during automatic operation (step S15), the spare server 24 detects the abnormality in the active server 21 through the mutual monitoring performed by the failure handling units 12 in the active server 21 and the spare server 24 (step S23). At this point, if possible, the schedule processing unit 13 is stopped on the active server 21 before switching to the spare server 24, so that the failure does not escalate to a state in which even stopping is impossible (step S16). On the spare server 24, the failure handling unit 12 automatically switches processing from the processing unit 22 of the active server 21 to the processing unit 25 of the spare server 24, after which the schedule processing unit 13 is started (step S24). Then, to bring the spare server 24 into the same state as the active server 21 before the failure, the operator consults the execution history on the shared disk 23 and, for jobs being executed on servers other than the active server 21, the execution status on those servers, and decides which jobs should be executed. Based on that decision, the operator uses the schedule processing unit 13 to execute the jobs, and the applications corresponding to those jobs are started (step S25). In this way, operation is switched to the spare server 24 when a failure occurs in the active server 21.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, in the automatically operated hot standby system described above, when a failure occurs in the active server and operation is switched to the spare server, operator intervention is needed to bring the spare server into the same state as the active server before the failure. Such operator processing takes a certain amount of operating time. As a result, the hot standby system cannot continue operation quickly when switching from the active server to the spare server.

[0007] The present invention has been made in view of these circumstances. Its object is to provide a hot standby system, an automatic re-execution method in the hot standby system, and a recording medium therefor, with which, even when a failure occurs in an automatically operated active server, the spare server can be brought automatically into the same state as the active server before the failure so that operation continues quickly.

[0008]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the invention according to claim 1 is a hot standby system comprising a shared disk that records an operation schedule of various jobs, an active server on which jobs are automatically operated based on the operation schedule, and a spare server that operates as the active server, based on information recorded on the shared disk, when a failure occurs in the active server. The active server includes a status management unit that records on the shared disk, as status information, the status concerning the startup state of business products, whose applications are started and stopped by separate jobs. The spare server includes a business product restart unit that, when a failure occurs in the active server, refers to the status information and restarts the business products by a job. The invention according to claim 2 is the hot standby system according to claim 1, in which the execution history of jobs executed by the active server according to the operation schedule is recorded on the shared disk, and the spare server further includes an automatic re-execution unit that, when a failure occurs in the active server, refers to the execution history to determine a re-execution point in the operation schedule and causes jobs to be re-executed based on the operation schedule, starting from the job at that point or from its succeeding job.

[0009] The invention according to claim 3 is the hot standby system according to claim 2, in which, when the job at the re-execution point is a batch job, that is, a job that ends when its application ends, the automatic re-execution unit operates as follows. If the job was being processed on the active server, re-execution starts from that job, depending on the nature of the job. If the job was being processed on a processing server other than the active server, re-execution starts from that job or from its succeeding job, depending on the processing status of the job on that processing server and the nature of the job. The invention according to claim 4 is the hot standby system according to claim 2 or claim 3, in which the active server and the spare server each include the status management unit, the business product restart unit, and the automatic re-execution unit.

[0010] Next, the invention according to claim 5 is an automatic re-execution method in a hot standby system comprising a shared disk that records an operation schedule of various jobs, an active server that automatically operates jobs based on the operation schedule, and a spare server that operates as the active server, based on information recorded on the shared disk, when a failure occurs in the active server. In this method, the active server records on the shared disk, as status information, the status of the startup state of business products, whose applications are started and stopped by separate jobs, and the spare server, when a failure occurs in the active server, refers to the status information and restarts the business products by a job. The invention according to claim 6 is the automatic re-execution method according to claim 5, in which the active server additionally records on the shared disk the execution history of jobs executed according to the operation schedule, and the spare server, when a failure occurs in the active server, refers to the execution history to determine a re-execution point in the operation schedule and re-executes jobs based on the operation schedule, starting from the job at that point or from its succeeding job.

[0011] Next, the invention according to claim 7 is a computer-readable recording medium on which is recorded an automatic re-execution program for a hot standby system comprising a shared disk that records an operation schedule of various jobs, an active server that automatically operates jobs based on the operation schedule, and a spare server that operates as the active server, based on information recorded on the shared disk, when a failure occurs in the active server. The automatic re-execution program causes a computer to implement a status management function for the active server, which records on the shared disk, as status information, the status of the startup state of business products, whose applications are started and stopped by separate jobs, and a business product restart function for the spare server, which refers to the status information and restarts the business products by a job when a failure occurs in the active server. The invention according to claim 8 is the recording medium according to claim 7, in which the automatic re-execution program further includes an automatic re-execution function for the spare server which, when a failure occurs in the active server, refers to the execution history, recorded on the shared disk, of jobs executed according to the operation schedule, determines a re-execution point in the operation schedule, and causes jobs to be re-executed based on the operation schedule, starting from the job at that point or from its succeeding job.

[0012]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A hot standby system and an automatic re-execution method in the hot standby system according to one embodiment of the present invention are described below with reference to the drawings. When a failure occurs in an automatically operated active server, jobs must be executed appropriately in order to bring the spare server into the same state as the active server before the failure. Jobs are therefore first classified by their nature and by where they are processed. In this description, a unit of processing defined (set) in the operation schedule is called a "job", and the entity that actually carries out the processing of a job is called an "application".

[0013] Jobs subject to automatic operation can be broadly divided, by their nature, into:

1) jobs for business products, in which the application is started and stopped by separate jobs; and
2) jobs for batch work, in which the end of the application coincides with the end of the job.

Jobs for business products (hereinafter simply "business products") include a database monitor, which provides database functions, and an online transaction monitor, which provides online request processing. These are jobs (applications) characterized by being started when a start job is executed and continuing to run until a stop job is executed. Jobs for batch work (hereinafter simply "batch work") include file sorting, file merging, backup, and the collection, distribution, and management of definition information. These are started by a start job, but the job ends when the application that is its processing entity finishes processing. In other words, under normal conditions such a job ends normally once its predetermined processing is complete, without needing a stop job as a business product does.
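The two job categories can be captured in a small data model. The following is a minimal sketch in Python, not part of the patent: the names `JobKind`, `Job`, `host`, `rerunnable`, and `needs_stop_job` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class JobKind(Enum):
    BUSINESS_PRODUCT = "business_product"  # started and stopped by separate jobs
    BATCH = "batch"                        # the job ends when its application ends


@dataclass(frozen=True)
class Job:
    name: str
    kind: JobKind
    # Hypothetical field: where the job's application runs.
    host: str = "active"
    # Hypothetical field: whether running the job a second time is harmless.
    rerunnable: bool = True


def needs_stop_job(job: Job) -> bool:
    """A business product keeps running until a separate stop job is executed."""
    return job.kind is JobKind.BUSINESS_PRODUCT


# Illustrative jobs only; the names are made up.
db_monitor = Job("start_db_monitor", JobKind.BUSINESS_PRODUCT)
file_sort = Job("sort_files", JobKind.BATCH)
```

Keeping the category on the job record lets later recovery logic branch on it without inspecting the application itself.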

[0014] Batch work can be further divided, by where it is processed, into:

2a) batch work processed by the processing unit of the active server; and
2b) batch work processed by another processing server connected to the active server via a network.

In case 2a), if a failure occurs in the active server while the batch work is being processed by its processing unit, the batch job generally ends abnormally. When switching to the spare server, the spare server therefore generally has to execute the batch job again from that job. However, some jobs cause problems if they are simply re-executed, so a decision is made as to whether to re-execute from that job, and processing proceeds according to that decision. In case 2b), where the batch work is processed by another processing server connected via the network, the batch job is generally processed independently of the failure in the active server. Such a batch job therefore needs to be re-executed from that job or from its succeeding job, depending on the processing result on the processing server and on whether the job can safely be re-executed.

[0015] Accordingly, the hot standby system of the present invention is given the following functions, according to the nature of the job and where it is executed:

a) For business products, the active server stores on the shared disk, as status information, the status concerning the startup state of each business product, and when a failure occurs in the active server, the spare server refers to the status information and automatically restarts the business products by a batch.
b) For batch work that was running when the failure occurred in the active server:
b-1) if it was being processed on the active server, the batch work is re-executed on the spare server according to its nature;
b-2) if it was being processed on a processing server other than the active server, re-execution is handled according to the processing status of the batch work on that processing server and its nature. In this case, the system waits a fixed time for the batch work to end normally. If it ends normally, re-execution starts from the succeeding job. If it ends abnormally or does not end, and re-executing it causes no problem, re-execution starts from this job.

With these functions, even when a failure occurs in the automatically operated active server, the spare server can be brought automatically into the same state as the active server before the failure. For business products, the status information of a) covers both the case where they are processed on the active server and the case where they are processed on another processing server.
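The batch-work rules b-1) and b-2) amount to a small decision function. The sketch below is one possible reading in Python, not the patent's implementation. The function name, the `Outcome` values, and the `"ok"` / `"failed"` / `None` encoding of the remote result are assumptions. The text also does not say what happens when a job cannot safely be re-executed, so the `NEEDS_OPERATOR` outcome is a placeholder for that unspecified case.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RERUN_THIS_JOB = "rerun_this_job"
    RERUN_FROM_NEXT_JOB = "rerun_from_next_job"
    NEEDS_OPERATOR = "needs_operator"  # assumption: not specified in the text


def decide_batch_rerun(ran_on_active: bool, rerunnable: bool,
                       remote_result: Optional[str] = None) -> Outcome:
    """Decide where to resume batch work that was running at failure time.

    remote_result is the state seen on the other processing server after
    waiting a fixed time: "ok" (ended normally), "failed" (ended abnormally),
    or None (still not finished). It is ignored when ran_on_active is True.
    """
    if ran_on_active:
        # Case b-1: the job went down together with the active server.
        return Outcome.RERUN_THIS_JOB if rerunnable else Outcome.NEEDS_OPERATOR
    # Case b-2: the job ran on another processing server.
    if remote_result == "ok":
        return Outcome.RERUN_FROM_NEXT_JOB
    # Ended abnormally or did not end: rerun only if that causes no problem.
    return Outcome.RERUN_THIS_JOB if rerunnable else Outcome.NEEDS_OPERATOR
```

For example, `decide_batch_rerun(False, True, "ok")` resumes from the succeeding job, while the same call with `remote_result=None` resumes from the job itself.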

[0016] Next, a hot standby system having the above functions is described with reference to the drawings. FIG. 1 shows an example configuration of the hot standby system of the present invention. As shown in FIG. 1, the hot standby system consists of the active server 1, the spare server 4, and the shared disk 3. The shared disk 3 records the information that the spare server 4 needs in order to function automatically as the active server 1 when a failure occurs in the active server 1. The active server 1 provides various functions to users by running and processing various jobs. The spare server 4 is a server prepared to operate as the active server 1, based on the information recorded on the shared disk 3, when a failure occurs in the active server 1. The active server 1 and the spare server 4 include processing units 2 and 5, respectively, each made up of a CPU (central processing unit), a control program such as an OS (operating system), internal memory, and so on. The processing units 2 and 5 execute the functional units, such as the monitoring unit 11 and the failure handling unit 12, which are provided as programs.

[0017] The active server 1 includes, as functional units, the monitoring unit 11, the failure handling unit 12, the schedule processing unit 13, the status management unit 14, and the various applications 17. The spare server 4 includes the monitoring unit 11, the failure handling unit 12, the schedule processing unit 13, the business product restart unit 15, the automatic re-execution unit 16, and the various applications 17. The function of each unit is as follows. The monitoring unit 11 monitors the status inside its own server and provides system monitoring and automatic operation system functions. The system monitoring function monitors messages from the processes running on the server, where a "process" means a program that is being executed. The automatic operation system function works with system monitoring and, triggered by messages, performs actions such as restarting processes and controlling switchover. The failure handling unit 12 detects failures such as the stopping of a monitored process and, upon detecting a failure, automatically switches from the processing unit 2 of the active server 1 to the processing unit 5 of the spare server 4. Starting the failure handling unit 12 on both the active server 1 and the spare server 4 causes the two servers to monitor each other. When the failure handling unit 12 of the spare server 4 detects a failure in the active server 1, such as the stopping of a monitored process, it automatically carries out the processing needed to switch from the processing unit 2 of the active server 1 to the processing unit 5 of the spare server 4. This switchover processing includes changing the server's network address, changing the host name, and switching the logical connection of the shared disk 3 (mounting). A system equipped with such a failure handling unit 12 is generally called an HA configuration (High Availability cluster).
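The detection and switchover sequence can be illustrated with a short Python sketch. The patent says only that the two servers monitor each other; the heartbeat timeout used here is an assumed detection mechanism, and `peer_has_failed`, `take_over`, and the 10-second default are hypothetical. The three switchover steps are the ones named in the text, represented as placeholders that only log what a real system would do.

```python
from typing import Callable, List


def peer_has_failed(last_heartbeat: float, now: float, timeout: float = 10.0) -> bool:
    """Treat the peer as failed when no heartbeat arrived within `timeout` seconds."""
    return (now - last_heartbeat) > timeout


def take_over(steps: List[Callable[[], None]]) -> int:
    """Run the switchover steps in order and return how many were executed."""
    for step in steps:
        step()
    return len(steps)


# Placeholder actions: a real failure handling unit would reconfigure the host.
log: List[str] = []
switchover_steps = [
    lambda: log.append("change network address"),
    lambda: log.append("change host name"),
    lambda: log.append("mount shared disk"),
]

# 15 seconds without a heartbeat exceeds the assumed 10-second timeout.
if peer_has_failed(last_heartbeat=100.0, now=115.0):
    take_over(switchover_steps)
```

Passing the steps in as callables keeps the ordering explicit and makes the sequence easy to exercise without touching a real network or disk.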

[0018] The schedule processing unit 13 starts the various applications 17 and so on in accordance with the jobs defined in the operation schedule 31 stored on the shared disk 3, and records the execution history 32 on the shared disk 3. The schedule processing unit 13 is what is called an automatic job scheduling system. The status management unit 14 records on the shared disk 3, as status information 33, the status concerning the startup state of the business products managed on the active server 1. There are two statuses, "start" and "stop". The business product restart unit 15 refers to the status information 33 when a failure occurs in the active server 1 and carries out the processing to restart the business products on the spare server 4. The automatic re-execution unit 16 refers to the execution history 32 when a failure occurs in the active server 1, determines the re-execution point in the operation schedule 31, and causes jobs to be re-executed based on the operation schedule 31. The "re-execution point" is the job in the operation schedule 31 at the time the failure occurred. During re-execution, if the job at the re-execution point is batch work that was being processed on the active server 1, the automatic re-execution unit 16 has the spare server 4 re-execute from that batch work, depending on the nature of the job. If the batch work was being processed on a processing server other than the active server 1, it handles re-execution of the job according to the processing status of the batch work on that processing server and its nature. The various applications 17 are the group of functions that the server actually provides to its users: the programs that implement the business products and batch work described above. That is, they are the programs that are started in order to execute jobs and that perform the required processing.

【００１９】ここで、監視部１１、スケジュール処理部
１３は、前述したように自動運用のために必要となる機
能であり、障害対処部１２が現用サーバ１に障害が生じ
た際に予備サーバ４に運用を切り替えるため必要となる
機能である。そして、ステータス管理部１４、業務系製
品再起動部１５、自動再実行部１６が自動運用される現
用サーバ１に障害が生じたときに、予備サーバ４におい
て迅速に障害発生前の現用サーバ１と同じ状態で運用が
開始できるように、自動的にジョブの再実行をするため
に必要となる機能である。また、符号１１から１７に示
す各機能部は、プログラムとして提供され、そのプログ
ラムが実行されることによりその機能が実現される。
Here, the monitoring unit 11 and the schedule processing unit 13 are, as described above, the functions needed for automatic operation, and the failure handling unit 12 is the function needed to switch operation to the spare server 4 when a failure occurs in the active server 1. The status management unit 14, the business product restart unit 15, and the automatic re-execution unit 16 are the functions needed to re-execute jobs automatically so that, when a failure occurs in the automatically operated active server 1, operation can start quickly on the spare server 4 in the same state as the active server 1 before the failure. Each of the functional units denoted by reference numerals 11 to 17 is provided as a program, and its function is realized by executing that program.

【００２０】共有ディスク３は、ハードディスク、光磁
気ディスク等の不揮発性の記録手段であり、現用サーバ
１および予備サーバ４に物理的に接続されている。な
お、現用サーバ１が正常稼動しているとき、共有ディス
ク３は現用サーバ１に論理的に接続され、現用サーバ１
に異常が生じたとき、共有ディスク３は論理的な接続が
予備サーバ４に変更される。そして、この共用ディスク
３には、運転スケジュール３１、実行履歴３２、ステー
タス情報３３が記録される。なお、これらの具体例に付
いては別途詳細に説明する。なお、現用サーバ１もしく
は予備サーバ４には、必要に応じてＣＲＴ（Cathode Ra
y Tube）等の表示装置やキーボード、マウス等の入力装
置が接続される。
The shared disk 3 is a non-volatile recording means such as a hard disk or a magneto-optical disk, and is physically connected to both the active server 1 and the spare server 4. While the active server 1 is operating normally, the shared disk 3 is logically connected to the active server 1; when an abnormality occurs in the active server 1, the logical connection of the shared disk 3 is switched to the spare server 4. The shared disk 3 stores the operation schedule 31, the execution history 32, and the status information 33. Specific examples of these are described in detail later. A display device such as a CRT (Cathode Ray Tube) and input devices such as a keyboard and a mouse are connected to the active server 1 or the spare server 4 as necessary.

【００２１】次に、図１に示すホットスタンバイシステ
ムの動作を詳細に説明する。図２は、ホットスタンバイ
システムの動作を示すフローチャートである。図２にお
いて、左側に記載されたステップ（ステップＳ１１〜Ｓ
１６）は現用サーバ１の動作フローであり、右側に記載
されたステップ（ステップＳ２１〜Ｓ２７）は予備サー
バ４の動作フローである。まず、ホットスタンバイシス
テムの動作開始時に、現用サーバ１および予備サーバ４
において、監視部１１が起動され、自己サーバ内の監視
処理が開始する（ステップＳ１１、Ｓ２１）。次に、現
用サーバ１および予備サーバ４において、障害対処部１
２が起動され相互監視が開始する。これにより、現用サ
ーバ１に障害が生じたことの検出および、障害の発生に
よる現用サーバ１の処理部２から予備サーバ４の処理部
５への自動切り替えが可能となる（ステップＳ１２、２
２）。次に、現用サーバ１において、スケジュール処理
部１３が起動される。そして、スケジュール処理部１３
は運転スケジュール３１に定義されるジョブに沿って各
種アプリケーション等の起動を行うとともにその履歴を
実行履歴３２として記録していく。さらに、すでに起動
している監視部１１のシステム監視機能により実行履歴
３２から正常に業務系製品が起動・停止したメッセージ
が取得されると、監視部１１の自動オペレーション機能
によりステータス管理部１４が起動される。起動された
ステータス管理部１４は、ステータス管理の対象、すな
わち業務系製品であれば、起動状況に応じてステータス
情報３３（”起動”もしくは”停止”）を更新する（ス
テップＳ１４’）。
Next, the operation of the hot standby system shown in FIG. 1 is described in detail. FIG. 2 is a flowchart showing the operation of the hot standby system. In FIG. 2, the steps on the left (steps S11 to S16) are the operation flow of the active server 1, and the steps on the right (steps S21 to S27) are the operation flow of the spare server 4. First, when the hot standby system starts operating, the monitoring unit 11 is started on both the active server 1 and the spare server 4, and each begins monitoring its own server (steps S11 and S21). Next, the failure handling unit 12 is started on both the active server 1 and the spare server 4, and mutual monitoring begins. This makes it possible to detect a failure in the active server 1 and, when one occurs, to switch automatically from the processing unit 2 of the active server 1 to the processing unit 5 of the spare server 4 (steps S12 and S22). Next, the schedule processing unit 13 is started on the active server 1. The schedule processing unit 13 starts the various applications and the like in accordance with the jobs defined in the operation schedule 31 and records the history as the execution history 32. Further, when the system monitoring function of the already running monitoring unit 11 obtains from the execution history 32 a message indicating that a business product has started or stopped normally, the automatic operation function of the monitoring unit 11 starts the status management unit 14. Once started, the status management unit 14 updates the status information 33 ("start" or "stop") according to the startup state if the item is subject to status management, that is, if it is a business product (step S14').

【００２２】自動運転中の現用サーバ１において、何ら
かの障害が発生すると（ステップＳ１５）、現用サーバ
１および予備サーバ４内の障害対処部１２により、予備
サーバ４で現用サーバ１の異常を検出する（ステップＳ
２３）。この際、可能であれば、予備サーバ４への切り
替え前に現用サーバ１において、スケジュール処理部１
３の停止が行われる（ステップＳ１６）。これは、異常
が生じたままでの自動運用を停止し、停止不可能となる
状態まで発展するのを防止するためである。一方、予備
サーバ４では、障害対処部１２の機能により自動的に現
用サーバ２１から予備サーバ４に本体切り替え処理が行
われる。この本体切り替えとして、前述したサーバのネ
ットワークアドレス変更、ホスト名の変更、共有ディス
ク３の論理的な接続切り替え（マウント）等が行われ
る。その後、スケジュール処理部１３が起動される（ス
テップＳ２４）。なお、この時点でスケジュール処理部
１３は、現用サーバ１で異常が生じた際のジョブの状態
が不明であることからジョブに対する処理を開始しない
で業務系製品再起動部１５、自動再実行部１６からの指
令を待つ。次に、予備サーバ４では、業務系製品再起動
部１５により、ステータス情報３３を参照して、業務系
製品の再起動をスケジュール処理部１３に指示する（ス
テップＳ２６）。また、予備サーバ４では、自動再実行
部１６により、実行履歴３２を参照して運転スケジュー
ル３３における再実行ポイントを決定し、運転スケジュ
ール３３に基づいたジョブの再実行をスケジュール処理
部１３に指示する。なお、この時、再実行ポイントのジ
ョブがバッチ業務である場合において、このバッチ業務
が現用サーバ１で処理されていたか、他の処理サーバで
処理されていたかにより、ジョブの再開処理が異なる
（ステップＳ２７）。以上により、現用サーバ１に異常
が発生しても、自動的に予備サーバ４に切り替えられ、
予備サーバ４において自動運用が継続して行われるよう
になる。
When some failure occurs in the active server 1 during automatic operation (step S15), the failure handling units 12 in the active server 1 and the spare server 4 cause the spare server 4 to detect the abnormality of the active server 1 (step S23). At this time, if possible, the schedule processing unit 13 on the active server 1 is stopped before switching to the spare server 4 (step S16). This halts automatic operation while the abnormality persists and prevents the situation from developing to the point where operation can no longer be stopped. Meanwhile, on the spare server 4, the failure handling unit 12 automatically performs the main-unit switchover from the active server 1 to the spare server 4. This switchover consists of the previously mentioned change of the server's network address, change of the host name, switching (mounting) of the logical connection of the shared disk 3, and so on. The schedule processing unit 13 is then started (step S24). At this point, because the state of the jobs at the moment the abnormality occurred in the active server 1 is unknown, the schedule processing unit 13 does not begin processing jobs and instead waits for instructions from the business product restart unit 15 and the automatic re-execution unit 16. Next, on the spare server 4, the business product restart unit 15 refers to the status information 33 and instructs the schedule processing unit 13 to restart the business products (step S26). Also on the spare server 4, the automatic re-execution unit 16 refers to the execution history 32 to determine the re-execution point in the operation schedule 31 and instructs the schedule processing unit 13 to re-execute jobs based on the operation schedule 31. When the job at the re-execution point is a batch job, the job resumption process differs depending on whether that batch job was being processed by the active server 1 or by another processing server (step S27). As a result, even if an abnormality occurs in the active server 1, operation is switched automatically to the spare server 4, where automatic operation continues.
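
As an illustration only, the behaviour of the schedule processing unit 13 on the spare server after step S24 (started, but idle until it is instructed) might be sketched in Python as follows. The class `HeldScheduler`, its method names, and the sample job names are assumptions made for this sketch and do not appear in the specification.

```python
class HeldScheduler:
    """Hypothetical stand-in for the schedule processing unit 13 on the spare server."""

    def __init__(self, schedule):
        self.schedule = list(schedule)  # job names in scheduled order
        self.started = False
        self.dispatched = []            # jobs actually handed off for execution

    def start(self):
        # Step S24: the scheduler is running but dispatches nothing yet,
        # because the state of the jobs at the time of the failure is unknown.
        self.started = True

    def run_from(self, job):
        # Called once the re-execution logic has decided where to resume;
        # dispatches `job` and every job scheduled after it.
        if not self.started:
            raise RuntimeError("scheduler has not been started")
        index = self.schedule.index(job)
        self.dispatched.extend(self.schedule[index:])


scheduler = HeldScheduler(["batch A", "batch B", "batch C"])
scheduler.start()             # nothing is dispatched at this point
scheduler.run_from("batch B") # resume from the re-execution point
```

Keeping "started" separate from "dispatching" is one simple way to express that the scheduler waits for instructions rather than resuming the schedule on its own.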

【００２３】次に、本発明の特徴となるステップＳ１
４’およびステップＳ２６、Ｓ２７を具体例を交え、よ
り詳細に説明する。なお、ここで説明する具体例とし
て、運転スケジュール３１は、図６のようにジョブのス
ケジュールがなされ、符号Ａもしくは符号Ｂに示すスケ
ジュールの自動運用途中で現用サーバ１に障害が発生し
たものとする。また、運転スケジュール３１中の符号６
１、６２、６４に示す”データベースモニタ”、”オン
ライントランザクションモニタ”は業務系製品に関する
ジョブであり、その他のジョブは、バッチ業務に関する
ジョブであるものとして説明する。そして、符号６１、
６３は、ぞれぞれデータベースモニタに対する起動ジョ
ブおよび停止ジョブであり、符号６２は、オンライント
ランザクションモニタに対する起動ジョブとなる。ま
た、バッチＤ・６７、バッチＦ・６９は、他の処理サー
バＡに処理依頼されるバッチ業務に関するジョブである
ものであり、内部で処理されるバッチＣ・６６やバッチ
Ｅ・６８と並列して実行される。そのため、図１０に示
すようにホットスタンバイシステム１０と処理サーバＡ
・７とは、ネットワーク６を介して接続されているもの
とする。
Next, step S14' and steps S26 and S27, which characterize the present invention, are described in more detail with specific examples. In the example described here, the operation schedule 31 schedules jobs as shown in FIG. 6, and a failure is assumed to occur in the active server 1 during automatic operation of the schedule at the point indicated by reference symbol A or reference symbol B. The "database monitor" and "online transaction monitor" jobs indicated by reference numerals 61, 62, and 64 in the operation schedule 31 are jobs relating to business products, and the other jobs are batch jobs. Reference numerals 61 and 63 denote, respectively, the start job and the stop job for the database monitor, and reference numeral 62 denotes the start job for the online transaction monitor. Batch D 67 and batch F 69 are batch jobs whose processing is requested of another processing server A, and they are executed in parallel with batch C 66 and batch E 68, which are processed internally. For this purpose, as shown in FIG. 10, the hot standby system 10 and the processing server A 7 are assumed to be connected via the network 6.

【００２４】始めに、図２のステップＳ１４’中のステ
ータス管理部１４の動作を図３のフローチャートを用い
て説明する。ステータス管理部１４は、監視部１１のシ
ステム監視機能により実行履歴３２を参照することで正
常に業務系製品が起動・停止した旨のメッセージを取得
すると、自動オペレーション機能により起動される（ス
テップＳ３１）。ここで、正常なメッセージにより起動
されるのは、異常状態を予備サーバ４において再現する
必要がないからである。次に、起動されたステータス管
理部１４は、ステータス管理の対象、すなわち業務系製
品であるか否を判断する（ステップＳ３２）。ここで、
ステータス情報３３内には、管理対象となる業務系製品
が予め登録され、この情報を参照することによりステー
タス管理対象であるか否か判断できるものとする。な
お、このステータス情報３３は、初期状態において管理
対象となる全ての業務系製品のステータスが”停止”に
なっているものとする。ステータス管理の対象であれ
ば、起動状況のステータスをステータス情報３３として
更新する（ステップＳ３３）なお、このステータスとし
ては、”起動”もしくは”停止”のいずれかであり、メ
ッセージにより判別できる。以上のようにして、強制終
了や異常終了を除いたジョブによる起動もしくは停止の
ステータス情報が記録される。なお、図６に示す運転ス
ケジュールの符号Ａもしくは符号Ｂに示す部分で現用サ
ーバ１に障害が発生した際の、ステータス情報３３の内
容はこのステータス管理部１４により図７のように記録
される。すなわち、データベースモニタは図６の符号６
４における自動運用により”停止”状態になり、オンラ
イントランザクションモニタは符号６２における自動運
用で”起動”状態となる。以上のように、ステータス管
理部１４を現用サーバ１に設け、共有ディスク３にステ
ータス情報３３を記録することで現用サーバ１で障害が
発生しても、業務系製品に関する自動運用状況を予備サ
ーバ４において再現するための情報を残すことができる
ようになる。
First, the operation of the status management unit 14 in step S14' of FIG. 2 is described with reference to the flowchart of FIG. 3. The status management unit 14 is started by the automatic operation function when the system monitoring function of the monitoring unit 11, by referring to the execution history 32, obtains a message indicating that a business product has started or stopped normally (step S31). It is started only on normal messages because there is no need to reproduce an abnormal state on the spare server 4. Next, the status management unit 14 determines whether the item is subject to status management, that is, whether it is a business product (step S32). The business products to be managed are registered in advance in the status information 33, and the unit can determine whether an item is subject to status management by referring to this information. In its initial state, the status information 33 lists every managed business product with the status "stop". If the item is subject to status management, the unit updates its startup status in the status information 33 (step S33). The status is either "start" or "stop" and can be determined from the message. In this way, the status information records starts and stops performed by jobs, excluding forced terminations and abnormal terminations. When a failure occurs in the active server 1 at the point indicated by reference symbol A or B in the operation schedule of FIG. 6, the status management unit 14 will have recorded the status information 33 as shown in FIG. 7. That is, the database monitor is in the "stop" state as a result of the automatic operation at reference numeral 64 in FIG. 6, and the online transaction monitor is in the "start" state as a result of the automatic operation at reference numeral 62. As described above, by providing the status management unit 14 on the active server 1 and recording the status information 33 on the shared disk 3, the information needed to reproduce the automatic operation state of the business products on the spare server 4 is preserved even if a failure occurs in the active server 1.
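
The flow of FIG. 3 (steps S31 to S33) could be sketched roughly as below. This is a minimal illustration under stated assumptions: the function `update_status`, its parameters, and the use of an in-memory dictionary in place of the status information 33 on the shared disk are all hypothetical, and the message sequence is invented to follow the scenario described in the text.

```python
def update_status(status_info, product, event, normal):
    """Apply one start/stop message; return True if it was recorded.

    status_info: dict mapping each managed business product to "start" or "stop"
                 (an in-memory stand-in for the status information 33).
    """
    if not normal:
        # S31: only messages reporting a normal start or stop are acted on.
        return False
    if product not in status_info:
        # S32: not a registered business product, so not subject to status management.
        return False
    if event not in ("start", "stop"):
        raise ValueError("event must be 'start' or 'stop'")
    status_info[product] = event  # S33: record the new status
    return True


# Initial state: every managed product is "stop".
status = {"database monitor": "stop", "online transaction monitor": "stop"}
update_status(status, "database monitor", "start", normal=True)
update_status(status, "online transaction monitor", "start", normal=True)
update_status(status, "batch A", "start", normal=True)   # ignored: a batch job
update_status(status, "database monitor", "stop", normal=True)
```

After this sequence the dictionary holds "stop" for the database monitor and "start" for the online transaction monitor, which corresponds to the state the text attributes to FIG. 7.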

【００２５】次に、図２のステップＳ２６における業務
系製品再起動部１５の動作を図４のフローチャートを用
いて説明する。始めに、業務系製品再起動部１５は、ス
テータス情報３３においてステータス管理対象となる業
務系製品の番号の初期化を行う（ステップＳ４１）。次
に、Ｎ番目の業務系製品のステータスを検索する（ステ
ップＳ４２）。Ｎ番目の業務系製品のステータスが”起
動”であれば、そのＮ番目の業務系製品の起動指令をス
ケジュール処理部１３に対して行う（ステップＳ４
３）。なお、この指令を受けたスケジュール処理部１３
は、その業務系製品の起動処理を行うことになる。次
に、Ｎ番目の業務系製品のステータスが”停止”の場
合、および、ステップＳ４３の処理が終了した後に、次
の業務系製品のステータスをチェックするために変数Ｎ
をインクリメントする（ステップＳ４４）。そして、管
理対象となる全ての業務系製品のステータスチェックが
終了していれば処理を終了し、そうでなければステップ
Ｓ４２に戻る（ステップＳ４５）。以上のようにして、
業務系製品再起動部１５が動作する。具体例を示すと、
ステータス情報３３が図７に示すようになっていれば、
オンライントランザクションが起動されることになる。
これにより、図６に示す運転スケジュールにおいて、障
害が発生した部分（符号Ａもしくは符号Ｂ）までにおけ
る業務系製品の起動状況を予備サーバ４において再現す
ることができる。
Next, the operation of the business product restart unit 15 in step S26 of FIG. 2 is described with reference to the flowchart of FIG. 4. First, the business product restart unit 15 initializes the index N of the business products subject to status management in the status information 33 (step S41). Next, it looks up the status of the N-th business product (step S42). If the status of the N-th business product is "start", it issues a start command for that business product to the schedule processing unit 13 (step S43). On receiving this command, the schedule processing unit 13 performs the startup processing for that business product. If the status of the N-th business product is "stop", or once the processing of step S43 has finished, the variable N is incremented in order to check the status of the next business product (step S44). If the statuses of all managed business products have been checked, processing ends; otherwise it returns to step S42 (step S45). The business product restart unit 15 operates as described above. As a specific example, if the status information 33 is as shown in FIG. 7, the online transaction monitor is started. In this way, the startup state of the business products up to the point in the operation schedule of FIG. 6 where the failure occurred (reference symbol A or B) can be reproduced on the spare server 4.
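
The loop of FIG. 4 (steps S41 to S45) can be sketched as follows. The function name `restart_products`, the list-of-pairs layout for the status information, and the `issue_start` callback that stands in for a command to the schedule processing unit 13 are assumptions made for this example.

```python
def restart_products(status_info, issue_start):
    """Issue a start command for every product whose recorded status is "start".

    status_info: list of (product_name, status) pairs, status being "start" or "stop".
    issue_start: callable taking a product name (hypothetical scheduler command).
    Returns the number of start commands issued.
    """
    n = 0                               # S41: initialize the product index
    issued = 0
    while n < len(status_info):         # S45: stop once every product is checked
        name, status = status_info[n]   # S42: look up the N-th product's status
        if status == "start":
            issue_start(name)           # S43: ask the scheduler to start it
            issued += 1
        n += 1                          # S44: move on to the next product
    return issued


commands = []
restart_products(
    [("database monitor", "stop"), ("online transaction monitor", "start")],
    commands.append,
)
```

With the status information in the state the text attributes to FIG. 7, only the online transaction monitor receives a start command.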

【００２６】次に、図２のステップＳ２７における自動
再実行部１６の動作を図５のフローチャートを用いて説
明する。自動再実行部１６は、始めに実行履歴３２を参
照して実行状況が不明となっているジョブを抽出する
（ステップＳ５１）。この処理により、再実行ポイント
が決定される。例えば、図６の運転スケジュールにおい
て符号Ａで示すバッチＢ・６５の自動運用途中で障害が
起ったとすると、その実行履歴３２は、図８のようにな
る。よって、図８の例で実行履歴が不明となっているジ
ョブはバッチＢであり、このジョブの自動運転中に障害
が発生したことが分かる。そして、このジョブが再実行
ポイントと判断され、ステップＳ５２以降の処理が行わ
れる。なお、図８において、実行履歴の”○”は正常に
処理が行われたことを示す。別の例として、図６の運転
スケジュールにおいて符号Ｂで示すバッチＣ・６６およ
びバッチＤ・６７の自動運用途中で障害が起ったとする
と、その実行履歴３２は、図９のようになる。よって、
図９の例では、実行履歴が不明となっているジョブはバ
ッチＣおよびバッチＤであり、このジョブの自動運転中
に障害が発生したことが分かり、これらジョブが再実行
ポイントと判断される。なお、ステップＳ５１では、こ
れら２つのジョブが順番に抽出され、それぞれステップ
Ｓ５２以降で処理される。なお、バッチＤについては、
その処理が他のサーバで行われることからその処理サー
バ情報も加えられている。なお、図８、図９の実行履歴
では、内部で処理するジョブに付いてはその実行処理が
行われる処理サーバ情報が省略されるものとする。
Next, the operation of the automatic re-execution unit 16 in step S27 of FIG. 2 is described with reference to the flowchart of FIG. 5. The automatic re-execution unit 16 first refers to the execution history 32 and extracts the jobs whose execution status is unknown (step S51). This determines the re-execution point. For example, if a failure occurs during automatic operation of batch B 65, indicated by reference symbol A in the operation schedule of FIG. 6, the execution history 32 is as shown in FIG. 8. In the example of FIG. 8, the job whose execution status is unknown is batch B, which shows that the failure occurred while this job was running. This job is therefore taken as the re-execution point, and the processing from step S52 onward is performed. In FIG. 8, a "○" in the execution history indicates that processing completed normally. As another example, if a failure occurs during automatic operation of batch C 66 and batch D 67, indicated by reference symbol B in the operation schedule of FIG. 6, the execution history 32 is as shown in FIG. 9. In the example of FIG. 9, the jobs whose execution status is unknown are batch C and batch D, which shows that the failure occurred while these jobs were running, and these jobs are taken as the re-execution points. In step S51 the two jobs are extracted one at a time, and each is processed from step S52 onward. For batch D, the processing server information is also recorded, because its processing is performed by another server. In the execution histories of FIGS. 8 and 9, the processing server information is omitted for jobs processed internally.
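
Step S51 amounts to filtering the execution history for entries without a normal-completion mark. A minimal sketch, assuming a hypothetical `HistoryEntry` record (the actual layout of the execution history 32 in FIGS. 8 and 9 is not reproduced here, and the sample entries are invented to follow the scenario at reference symbol B):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryEntry:
    job: str
    finished_normally: bool        # True corresponds to the "○" mark
    server: Optional[str] = None   # None means the job is processed internally


def find_rerun_points(history: List[HistoryEntry]) -> List[HistoryEntry]:
    """S51: return, in recorded order, the jobs whose outcome is unknown."""
    return [entry for entry in history if not entry.finished_normally]


history = [
    HistoryEntry("batch B", True),
    HistoryEntry("batch C", False),              # running internally at failure
    HistoryEntry("batch D", False, server="A"),  # requested of processing server A
]
points = find_rerun_points(history)
```

Recording the server only for externally processed jobs mirrors the convention stated in the text, and lets the later steps tell internal and external batch jobs apart.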

【００２７】次に、再実行ポイントのジョブが現用サー
バ１で実行処理されるバッチ業務であるか否か、即ち内
部処理されるバッチ業務であるか否かの判断を行う（ス
テップＳ５２）。なお、実行履歴には、前述のようにジ
ョブの実行処理されるサーバの情報も含まれており、こ
の情報を参照することにより判断できる。内部処理され
るバッチ業務であれば、このバッチ業務が再実行可能か
否か判断する（ステップＳ５３）。ここで、再実行可能
か否か判断するのは、バッチ業務として起動されるアプ
リケーションによっては、再実行すると不具合が生じる
ものがあるからである。なお、再実行により不具合が生
じるジョブに関する情報は共有ディスク３に予め記録さ
れており、自動実行部１６はその情報を参照することで
そのジョブの再実行により不具合が生じるか否か判断で
きるものとする。なお、図１においては、この情報の図
示を省略している。再実行可能と判断された場合、再実
行処理部１６はスケジュール処理部１３に対し、ステッ
プＳ５１で抽出されたジョブからの再実行を指令する
（ステップＳ５４）。ここで、ステップＳ５１で抽出さ
れたジョブから再実行するのは、内部処理されるバッチ
業務に関するジョブの実行処理中に障害が発生した場
合、そのジョブは異常終了する可能性が高いからであ
る。例えば、図８のバッチＢは内部処理されるバッチ業
務であることから再実行により不具合が生じるものでな
ければこのジョブからの実行指令が行われる。また、図
９の例で、ステップＳ５１で１つ目に抽出されるバッチ
Ｃに関しては、内部処理されるバッチ業務であることか
ら、再実行により不具合が生じるものでなければこのジ
ョブからの実行が行われる。一方、再実行することによ
り不具合が生じるジョブであれは、そのジョブ以降の実
行は、中断されることになる。
Next, it is determined whether the job at the re-execution point is a batch job executed by the active server 1, that is, a batch job processed internally (step S52). As noted above, the execution history also contains information on the server on which each job is executed, and the determination can be made by referring to this information. If it is an internally processed batch job, it is determined whether the batch job can be re-executed (step S53). This check is made because some applications started as batch jobs cause problems if they are re-executed. Information on jobs that cause problems when re-executed is recorded in advance on the shared disk 3, and the automatic re-execution unit 16 can determine whether re-executing a job would cause a problem by referring to that information. This information is not shown in FIG. 1. If the job is judged re-executable, the automatic re-execution unit 16 instructs the schedule processing unit 13 to re-execute from the job extracted in step S51 (step S54). Re-execution starts from the job extracted in step S51 because, when a failure occurs while an internally processed batch job is executing, that job has very likely terminated abnormally. For example, batch B in FIG. 8 is an internally processed batch job, so unless re-execution would cause a problem, an instruction is issued to execute from this job. Likewise, in the example of FIG. 9, batch C, the first job extracted in step S51, is an internally processed batch job, so unless re-execution would cause a problem, execution resumes from this job. If, on the other hand, the job is one that causes a problem when re-executed, execution of that job and all subsequent jobs is suspended.

【００２８】一方、ステップＳ５１で抽出されたジョブ
が、他の処理サーバに処理依頼をしたバッチ業務に関す
るジョブであれば、その処理サーバに対し、そのバッチ
業務の処理結果の確認を行う（ステップＳ５５）。そし
て、そのバッチ業務に関するジョブが、依頼した処理サ
ーバにより正常終了したか否か判断する（ステップＳ５
６）。このように、処理サーバの処理結果の確認を行う
のは、バッチ業務を依頼した処理サーバでは障害の生じ
た現用サーバ１とリソース等の共有をしていない限り、
障害の影響を受けないで処理を続けあるからである。よ
って、依頼されたバッチ業務の実行処理は独立に行わ
れ、予備サーバ４への切り替え中もその処理は実行され
ることになる。この判断は、処理サーバに対して問い合
わせを行うことにより行う。そして、その処理が異常終
了した場合や、所定時間後あるいは所定回数の問い合わ
せ後においても正常終了しなければ、正常終了でないと
判断する。なお、処理サーバにおけるバッチ業務の処理
自体が終了し、結果の返却時に予備サーバ４への切り替
えが行われた場合、処理サーバにおける処理結果の返却
先が不明となる。よって、このような状態に対応するた
めにも自動再実行部１６は、結果の返却を待つのではな
く、依頼した処理サーバへ正常終了したか問い合わせる
ことが望ましい。具体例として図９の例では、ステップ
Ｓ５１で２つ目に抽出されるバッチＤに関しては、処理
サーバＡに処理依頼されるバッチ業務であることからス
テップＳ５６での判断が行われる。次に、ステップＳ５
６の判断において、正常終了と判断された場合には、ス
テップＳ５１で抽出したジョブの次のジョブ、すなわち
後続ジョブからの再実行をスケジュール処理部１３に対
して指令する（ステップＳ５７）。
On the other hand, if the job extracted in step S51 is a batch job whose processing was requested of another processing server, the processing result of that batch job is checked with that processing server (step S55). It is then determined whether the batch job was completed normally by the processing server to which it was requested (step S56). The result is checked with the processing server because, unless it shares resources or the like with the failed active server 1, the processing server that was asked to run the batch job continues processing unaffected by the failure. The requested batch job therefore runs independently, and continues to run even while operation is being switched to the spare server 4. The determination is made by querying the processing server. If the processing terminated abnormally, or if it has not completed normally after a predetermined time or a predetermined number of queries, it is judged not to have completed normally. If the batch job itself finishes on the processing server and the switchover to the spare server 4 takes place just as the result is being returned, the processing server no longer knows where to return the result. To handle this situation as well, it is desirable that the automatic re-execution unit 16 query the processing server about whether the job completed normally rather than wait for the result to be returned. As a specific example, in FIG. 9, batch D, the second job extracted in step S51, is a batch job requested of the processing server A, so the determination of step S56 is made. If step S56 determines that the job completed normally, the automatic re-execution unit 16 instructs the schedule processing unit 13 to re-execute from the job following the one extracted in step S51, that is, from the succeeding job (step S57).
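
The query-based check of steps S55 and S56, bounded by a predetermined number of queries, might look like the following sketch. The function `ended_normally`, the three reply strings, and the `query` callback are assumptions for illustration; the specification does not define how the processing server is actually queried.

```python
import time
from typing import Callable


def ended_normally(query: Callable[[], str],
                   max_queries: int = 5,
                   interval_s: float = 1.0) -> bool:
    """Poll a processing server until the batch job's outcome is known.

    query: hypothetical callable returning "done", "failed", or "running".
    Returns True only if the job is reported as completed normally within
    max_queries attempts; abnormal termination or exhaustion gives False.
    """
    for attempt in range(max_queries):
        state = query()                 # S55: ask the processing server
        if state == "done":
            return True                 # S56: completed normally
        if state == "failed":
            return False                # S56: terminated abnormally
        if attempt < max_queries - 1:
            time.sleep(interval_s)      # still running: wait and ask again
    return False                        # not finished within the allowed queries


replies = iter(["running", "running", "done"])
result = ended_normally(lambda: next(replies), max_queries=5, interval_s=0)
```

Polling from the spare server, rather than waiting for a pushed result, reflects the point made in the text that the processing server may no longer know where to return its result after the switchover.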

【００２９】一方、ステップＳ５６の判断において、正
常終了していないと判断された場合には、ステップＳ５
１で抽出したジョブが再実行可能か否か判断する（ステ
ップＳ５８）。なお、この判断は、ステップＳ５３での
判断と同様に、共有ディスク３に予め記録された再実行
により不具合が生じるジョブに関する情報を参照するこ
とで行う。そして、ステップＳ５８で再実行可能と判断
された場合には、ステップＳ５１で抽出したジョブから
の再実行をスケジュール処理部１３に対し指令する（ス
テップＳ５９）。一方、再実行することのできないジョ
ブであれは、そのジョブ以降の実行は、中断されること
になる。
On the other hand, if step S56 determines that the job did not complete normally, it is determined whether the job extracted in step S51 can be re-executed (step S58). As in step S53, this determination is made by referring to the information, recorded in advance on the shared disk 3, on jobs that cause problems when re-executed. If step S58 determines that the job can be re-executed, the schedule processing unit 13 is instructed to re-execute from the job extracted in step S51 (step S59). If the job cannot be re-executed, execution of that job and all subsequent jobs is suspended.

【００３０】以上の処理をステップＳ５１で順番に再実
行ポイントとして抽出されるジョブに対しステップＳ５
２以降の処理を行い、ステップＳ５１の抽出が終了する
と処理を終了する。以上のように、自動再実行部１６を
設けることで現用サーバ１で障害が発生する時点までの
自動運用状況を予備サーバ４において再現することがで
き、引き続き予備サーバ４による自動運転を行うことが
可能となる。なお、自動再実行部１６は、上述のように
バッチ業務関してのみ再実行判断を行っている。なぜな
らば、業務系製品に関しては、業務系製品再起動部１５
が起動判断を行い処理済みとなっているからである。
The processing from step S52 onward is performed on each job extracted in turn as a re-execution point in step S51, and processing ends when the extraction in step S51 is finished. As described above, providing the automatic re-execution unit 16 makes it possible to reproduce on the spare server 4 the automatic operation state up to the point at which the failure occurred in the active server 1, and then to continue automatic operation on the spare server 4. Note that the automatic re-execution unit 16 makes re-execution decisions only for batch jobs. This is because business products have already been handled by the business product restart unit 15, which decides whether to start them.
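
Taken together, the decisions of FIG. 5 from step S52 to step S59 reduce to a small branch per extracted job. The sketch below is illustrative only: the function `decide`, its return strings, the `not_rerunnable` set (standing in for the pre-recorded information on jobs that must not be re-executed), and the `remote_ended_normally` callback are all assumptions, not names from the specification.

```python
from typing import Callable, Optional, Set


def decide(job: str,
           server: Optional[str],
           not_rerunnable: Set[str],
           remote_ended_normally: Callable[[str, str], bool]) -> str:
    """Return the action for one re-execution point extracted in step S51."""
    if server is None:
        # S52: the batch job is processed internally.
        if job in not_rerunnable:              # S53: re-execution would cause a problem
            return "suspend"
        return "rerun from job"                # S54
    # The batch job was requested of another processing server.
    if remote_ended_normally(server, job):     # S55, S56
        return "rerun from next job"           # S57
    if job in not_rerunnable:                  # S58
        return "suspend"
    return "rerun from job"                    # S59


# Scenario at reference symbol B: batch C is internal, batch D runs on server A.
action_c = decide("batch C", None, set(), lambda s, j: False)
action_d = decide("batch D", "A", set(), lambda s, j: True)
```

Here `action_c` is "rerun from job" and `action_d` is "rerun from next job", which matches the handling of batch C and batch D described in the text when processing server A reports normal completion.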

【００３１】なお、本実施の形態においてホットスタン
バイシステムは、現用サーバ一台と予備サーバ一台とに
より構成されるものとして説明したが、これに限定され
るものではない。例えば、現用サーバＮ台に対し予備サ
ーバ一台から構成されるものであっても良い。また、本
実施の形態において、現用サーバ１はステータス管理部
１４を含み、予備サーバ４は業務系製品再起動部１５お
よび自動再実行部１６を含むものとして説明したが、こ
れに限定されるものではなく、現用サーバ１と予備サー
バ４とを同一構成としても良い。すなわち、現用サーバ
１と予備サーバ４とが、ともにステータス管理部１４、
業務系製品再起動部１５、自動再実行部１６を備えるも
のであっても良い。これにより、２つのサーバがともに
現用サーバ、予備サーバの機能を備えることになり、障
害の発生により交互に現用サーバとしての機能を発揮で
きるようになるからである。また、予備サーバ４は、現
用サーバ１が正常に動作しているとき、単に待機してい
るのみではなく、他の処理をするものであってもよい。
また、本発明のホットスタンバイシステムは、単独で動
作する場合もあり、複数のホットスタンバイシステムや
通常の処理サーバとネットワークを介して接続され、分
散自動運用が行われるものであってもよい。また、図１
におけるステータス管理部１４、業務系製品再起動部１
５、自動再実行部１６の機能を実現するためのプログラ
ムをコンピュータ読み取り可能な記録媒体に記録して、
この記録媒体に記録されたプログラムをコンピュータシ
ステムに読み込ませ、実行することにより予備サーバに
おいて自動再実行が行われるようにしてもよい。なお、
ここでいう「コンピュータシステム」とは、ＯＳや周辺
機器等のハードウェアを含むものとする。
In the present embodiment, the hot standby system has been described as consisting of one active server and one spare server, but it is not limited to this. For example, it may consist of N active servers and one spare server. Also, in the present embodiment, the active server 1 has been described as including the status management unit 14 and the spare server 4 as including the business product restart unit 15 and the automatic re-execution unit 16, but the invention is not limited to this, and the active server 1 and the spare server 4 may have the same configuration. That is, both the active server 1 and the spare server 4 may include the status management unit 14, the business product restart unit 15, and the automatic re-execution unit 16. Both servers then have the functions of an active server and of a spare server, so that each can take over as the active server in turn when a failure occurs. The spare server 4 need not simply stand by while the active server 1 is operating normally; it may perform other processing. The hot standby system of the present invention may operate on its own, or it may be connected via a network to a plurality of hot standby systems or ordinary processing servers to perform distributed automatic operation. Further, a program for realizing the functions of the status management unit 14, the business product restart unit 15, and the automatic re-execution unit 16 in FIG. 1 may be recorded on a computer-readable recording medium, and automatic re-execution on the spare server may be performed by having a computer system read and execute the program recorded on the medium. The "computer system" referred to here includes an OS and hardware such as peripheral devices.

【００３２】[0032]

【発明の効果】以上説明したように、本発明によれば、
下記の効果を得ることができる。ステータス管理部を現
用サーバに設け、共有ディスクにステータス情報を記録
するとともに、業務系製品再起動部を予備サーバに設け
ることで現用サーバで障害が発生する時点までの業務系
製品に関する自動運用状況を予備サーバにおいて自動的
に再現することができるようになる。また、自動再実行
部を設けることで現用サーバで障害が発生する時点まで
の自動運用状況を予備サーバにおいて自動的に再現する
ことができ、引き続き予備サーバによる自動運転を行う
ことが可能となる。また、現用サーバと予備サーバとに
ステータス管理部、業務系製品再起動部、自動再実行部
を設けることで、２つのサーバがともに現用サーバ、予
備サーバの機能を備えることになり、障害の発生により
交互に現用サーバとしての機能を発揮できるようにな
る。
[Effects of the Invention] As described above, the present invention provides the following effects. By providing the status management unit on the active server and recording the status information on the shared disk, and by providing the business product restart unit on the spare server, the automatic operation state of the business products up to the point at which a failure occurs in the active server can be reproduced automatically on the spare server. By providing the automatic re-execution unit, the automatic operation state up to the point at which the failure occurs in the active server can be reproduced automatically on the spare server, and automatic operation can then continue on the spare server. Further, by providing the status management unit, the business product restart unit, and the automatic re-execution unit on both the active server and the spare server, both servers have the functions of an active server and of a spare server, so that each can take over as the active server in turn when a failure occurs.

[Brief description of the drawings]

【図１】本発明のホットスタンバイシステムの構成を
示す図である。FIG. 1 is a diagram showing a configuration of a hot standby system of the present invention.

【図２】図１のホットスタンバイシステムの動作を示
すフローチャートである。FIG. 2 is a flowchart showing the operation of the hot standby system of FIG. 1.

【図３】ステータス管理部の動作を示すフローチャー
トである。FIG. 3 is a flowchart illustrating an operation of a status management unit.

【図４】業務系製品再起動部の動作を示すフローチャ
ートである。FIG. 4 is a flowchart illustrating an operation of a business-related product restart unit.

【図５】自動再実行部による再実行処理を示すフロー
チャートである。FIG. 5 is a flowchart illustrating a re-execution process performed by an automatic re-execution unit.

【図６】運転スケジュールの一例を示す図である。FIG. 6 is a diagram showing an example of an operation schedule.

【図７】ステータス情報の一例を示す図である。FIG. 7 is a diagram illustrating an example of status information.

【図８】実行履歴の一例を示す図である。FIG. 8 is a diagram illustrating an example of an execution history.

【図９】実行履歴の他の例を示す図である。FIG. 9 is a diagram showing another example of the execution history.

【図１０】分散自動運用における構成の一例を示す図
である。FIG. 10 is a diagram illustrating an example of a configuration in distributed automatic operation.

【図１１】ホットスタンバイシステムの従来例の構成
を示す図である。FIG. 11 is a diagram showing a configuration of a conventional example of a hot standby system.

【図１２】図１１のホットスタンバイシステムの動作
を示すフローチャートである。FIG. 12 is a flowchart showing the operation of the hot standby system of FIG. 11.

【符号の説明】１現用サーバ２処理部３共有ディスク４予備サ
ーバ５処理部１１監視部１２障害
対処部１３スケジュール処理部１４ステ
ータス管理部１５業務系製品再起動部１６自動
再実行部１７各種アプリケーション３１運転スケジュール３２実行
履歴３３ステータス情報
[Description of Reference Numerals]
1 Active server
2 Processing unit
3 Shared disk
4 Spare server
5 Processing unit
11 Monitoring unit
12 Failure handling unit
13 Schedule processing unit
14 Status management unit
15 Business product restart unit
16 Automatic re-execution unit
17 Various applications
31 Operation schedule
32 Execution history
33 Status information

Claims

[Claims]

1. A hot standby system comprising: a shared disk that records an operation schedule of various jobs; an active server on which jobs are automatically operated based on the operation schedule; and a spare server that, when a failure occurs in the active server, operates as the active server based on the information recorded on the shared disk, wherein the active server includes a status management unit that records on the shared disk, as status information, a status relating to the startup state of a business product, a business product being one whose application is started and stopped by separate jobs, and the spare server includes a business product restart unit that, when a failure occurs in the active server, refers to the status information and restarts the business product by means of a job.

2. The hot standby system according to claim 1, wherein the active server records on the shared disk an execution history of jobs executed according to the operation schedule, and the spare server includes an automatic re-execution unit that, when a failure occurs in the active server, refers to the execution history to determine a re-execution point in the operation schedule and re-executes jobs based on the operation schedule from the job at the re-execution point or from its succeeding job.

3. The hot standby system according to claim 2, wherein, when the job at the re-execution point is a batch job in which the application is started and terminated within the same job, the automatic re-execution unit re-executes from that job or its succeeding job according to the nature of the job if the job was being processed by the active server, and re-executes from that job or its succeeding job according to the processing status or nature of the job at the processing server if the job was being processed by a processing server other than the active server.

4. The hot standby system according to claim 2 or claim 3, wherein the active server and the spare server each include the status management unit, the business product restart unit, and the automatic re-execution unit.

5. An automatic re-execution method in a hot standby system comprising a shared disk that records an operation schedule of various jobs, an active server on which jobs are automatically operated based on the operation schedule, and a spare server that, when a failure occurs in the active server, operates as the active server based on the information recorded on the shared disk, wherein the active server records on the shared disk, as status information, a status relating to the start-up state of business products whose applications are started and stopped by separate jobs, and the spare server, when a failure occurs in the active server, refers to the status information and restarts the business products by jobs.

6. The automatic re-execution method in the hot standby system according to claim 4, wherein the active server records on the shared disk an execution history of jobs executed according to the operation schedule, and the spare server, when a failure occurs in the active server, refers to the execution history to determine a re-execution point in the operation schedule and re-executes jobs based on the operation schedule from the job at the re-execution point or from its succeeding job.

7. A computer-readable recording medium recording an automatic re-execution program for a hot standby system comprising a shared disk that records an operation schedule of various jobs, an active server on which jobs are automatically operated based on the operation schedule, and a spare server that, when a failure occurs in the active server, operates as the active server based on the information recorded on the shared disk, the automatic re-execution program causing a computer to implement: a status management function for the active server that records on the shared disk, as status information, a status relating to the start-up state of business products whose applications are started and stopped by separate jobs; and a business product restart function for the spare server that, when a failure occurs in the active server, refers to the status information and restarts the business products by jobs.

8. The recording medium recording the automatic re-execution program according to claim 7, wherein the automatic re-execution program further includes an automatic re-execution function for the spare server that, when a failure occurs in the active server, refers to the execution history of jobs executed according to the operation schedule recorded on the shared disk, determines a re-execution point in the operation schedule, and re-executes jobs based on the operation schedule from the job at the re-execution point or from its succeeding job.
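
For illustration only, the interplay of the status information, the execution history, and the re-execution point recited in the claims might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in the specification: the file names `status.json` and `history.log`, the `"start"`/`"end"` history states, and the `rerunnable` set (standing in for "the nature of the job") are all hypothetical, and a temporary directory can stand in for the shared disk.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional, Set

# Hypothetical file names on the shared disk (not taken from the patent).
STATUS_FILE = "status.json"
HISTORY_FILE = "history.log"


def record_status(shared: Path, product: str, running: bool) -> None:
    """Active server: persist the start-up status of one business product."""
    path = shared / STATUS_FILE
    status: Dict[str, str] = json.loads(path.read_text()) if path.exists() else {}
    status[product] = "started" if running else "stopped"
    path.write_text(json.dumps(status))


def record_history(shared: Path, job: str, state: str) -> None:
    """Active server: append one execution-history entry ('start' or 'end')."""
    with (shared / HISTORY_FILE).open("a") as f:
        f.write(f"{job} {state}\n")


def products_to_restart(shared: Path) -> List[str]:
    """Spare server: products that were running when the active server failed."""
    path = shared / STATUS_FILE
    status: Dict[str, str] = json.loads(path.read_text()) if path.exists() else {}
    return sorted(p for p, s in status.items() if s == "started")


def reexecution_point(shared: Path, schedule: List[str],
                      rerunnable: Set[str]) -> Optional[str]:
    """Spare server: choose the job from which the schedule is resumed."""
    path = shared / HISTORY_FILE
    last: Dict[str, str] = {}
    if path.exists():
        for line in path.read_text().splitlines():
            job, state = line.split()
            last[job] = state  # keep only the latest state per job
    for i, job in enumerate(schedule):
        state = last.get(job)
        if state == "end":
            continue  # completed before the failure
        if state == "start" and job not in rerunnable:
            # Interrupted and assumed unsafe to repeat: resume from the successor.
            return schedule[i + 1] if i + 1 < len(schedule) else None
        return job  # never started, or interrupted but safe to repeat
    return None  # the whole schedule had already completed
```

In this sketch the spare server would first restart every product returned by `products_to_restart` and then resume the operation schedule from the job returned by `reexecution_point`. Whether an interrupted job is repeated or skipped is reduced here to a single set lookup; the claims leave that decision to the nature and processing status of the job.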