JP2005031892A - Job execution system and execution control method

Info

Publication number: JP2005031892A
Application number: JP2003194986A
Authority: JP
Inventors: Takahiro Ikeda; 隆博池田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2003-07-10
Filing date: 2003-07-10
Publication date: 2005-02-03
Anticipated expiration: 2023-07-10
Also published as: JP4099115B2

Abstract

PROBLEM TO BE SOLVED: To avoid a service stop when a resource error occurs, during execution of a job, in the resource manager host executing that job.

SOLUTION: Each of resource manager hosts A-X executes jobs while managing resources such as a CPU, memory, and disk, and reports a resource error to an integrated manager host 10 when CPU efficiency deteriorates or when memory or disk runs short. The integrated manager host 10 either searches for another resource manager host that can execute the job that was running on the host reporting the error, or changes the job schedule so that another resource manager host executes that job.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、ジョブ実行システム及び実行制御方法に係り、特に、情報処理システムにおけるスケジューリングされたジョブの実行システム及び実行制御方法に関する。
【０００２】
【従来の技術】
近年の情報処理システムは、無人で業務ジョブを処理するために、時刻予約によるジョブの自動起動、先行ジョブの終了による次に予約した接続ジョブの自動起動等のジョブスケジューリングが行われている。
【０００３】
また、情報処理システムとして、１台のマネージャホストを用意し、そのマネージャホストで定義したジョブを、他の複数の実行ホストで実行することができるようにし、さらに、実行結果をマネージャホストで集中管理することができるシステムが知られている。
【０００４】
前述のようなジョブ実行システムは、特許文献１等に記載されているように、あるホストで実行していたジョブが、そのホストで実行することができなくなっとき、他のホストで再実行することができる。
【０００５】
【特許文献１】
特開平１１−３５３２８４号公報
【０００６】
【発明が解決しようとする課題】
ジョブのスケジュールを行ってジョブの実行を制御する前述した従来技術は、ジョブスケジュールの構築時に、マシン情報、実行ジョブ情報、ジョブ実行条件等を考慮して、スケジューリングを決定するが、その後のシステム運用により、業務データ容量が変更され、各ジョブの実行時間がスケジュール構築時の予想と大きく異なった場合等に、各ホストマシンのＣＰＵに割り当てられているジョブの実行が非効率的なものとなり、システムにスケジューリングされているジョブの処理時間が増大するという問題点を生じさせていた。
【０００７】
また、前述した従来技術は、システム運用中に、実行中のホスト内で、メモリやディスクの容量不足等のリソースエラー等、マシントラブルが生じた場合、エラー時に行うジョブを定義しておくことにより、トラブルを回避することができるが、正常なケースのジョブ実行が停止してしまうという問題点を有している。
【０００８】
本発明の目的は、前述した従来技術の問題点を解決し、ジョブ実行中のリソースエラーによる業務自体の停止を回避することができるようにしたジョブの実行システム及び実行制御方法を提供することにある。
【０００９】
【課題を解決するための手段】
本発明によれば前記目的は、スケジューリングされているジョブ群を実行するジョブ実行システムにおいて、複数種のリソースを備えてジョブの実行を行う複数のリソースマネージャホストと、ジョブ実行の管理を行う統合マネージャホストとを備え、前記リソースマネージャホストが、ジョブ実行中のリソースの状況を監視してリソースエラーを検知するリソースエラー検知手段と、リソースエラーを前記統合マネージャホストに報告するリソースエラー通知手段とを有し、前記統合マネージャホストが、リソースエラーを報告してきたリソースマネージャホスト以外の他のリソースマネージャホストのリソースの状況からリソースエラーの報告を行ったリソースマネージャホストで行っていたジョブを実行することが可能な移行先のリソースマネージャホストを決定する手段を有することにより達成される。
【００１０】
また、前記目的は、スケジューリングされているジョブ群の実行を制御するジョブ実行制御方法において、複数種のリソースを備えてジョブを実行するリソースマネージャホストからのリソースエラーの報告を受け、リソースエラーの報告を行ったリソースマネージャホスト以外の他のリソースマネージャホストのリソースの状況からリソースエラーの報告を行ったリソースマネージャホストで行っていたジョブを実行することが可能な移行先のリソースマネージャホストを決定することにより達成される。
【００１１】
【発明の実施の形態】
以下、本発明によるジョブ実行システム及び実行制御方法の実施形態を図面により詳細に説明する。
【００１２】
図１は本発明の一実施形態によるジョブ実行システムの構成を示すブロック図、図２は統合マネージャホストの内部構成を示すブロック図、図３はリソースマネージャホストの内部構成を示すブロック図である。図１〜図３において、１０は統合マネージャホスト、２０〜４０はリソースマネージャホスト、５０〜９０はリソース、１００はＰＰ情報データベース、１１０はジョブスケジュール実行部、１２０はジョブ実行部、１３０はリソースエラー情報受信部、１４０はリソースエラー情報解読部、１５０は各リソース使用状況判定部、１６０はリソース調査部、１７０はリソース削除依頼部、１８０はインストール指示部、１９０はインストール情報送信部、２００はジョブスケジュール定義変更部、２１０はリソースエラー検知部、２２０はリソースエラー通知部、２３０はリソース情報通知部、２４０は命令受け付け部、２５０はリソース削除部、２６０はインストール情報受信部、２７０はインストール実行部である。
【００１３】
本発明の実施形態によるジョブ実行システムは、図１に示すように、統合マネージャホスト１０と複数のリソースマネージャホスト２０〜４０とがネットワークを介して接続されて構成されている。リソースマネージャホスト２０〜４０は、管理対象となるＣＰＵ、メモリ、ディスク等の多種のリソース５０〜９０を含んで、あるいは、接続されて構成され、ジョブの実行を行う。また、統合マネージャホスト１０は、リソースマネージャホストで使用するアプリケーション（ＰＰ）情報を格納して管理するＰＰ情報データベース１００が接続されており、実行すべきジョブのスケジュール管理、ジョブを実行するリソースマネージャホストの管理を行う。
【００１４】
統合マネージャホスト１０は、図２に示すような各種の機能部を備えて構成されており、次に、これらの機能部のそれぞれについて説明する。
【００１５】
ジョブスケジュール実行部１１０は、すでに定義されているジョブスケジュールを元にそのジョブスケジュールを制御する。
【００１６】
ジョブ実行部１２０は、ジョブスケジュールにより定義されている１つ１つのジョブを実行する。
【００１７】
リソースエラー情報受信部１３０は、リソースマネージャホストから送信されてきたエラー通知を受信する。
【００１８】
リソースエラー情報解読部１４０は、リソースエラー情報受信部１３０で受け取ったエラー通知を解読する。
【００１９】
リソース使用状況判定部１５０は、ネットワークを介して接続されている全てのリソースマネージャホストのリソース群がどの程度使用されているかを調査し、リソースエラーにより停止しているジョブを行うことのできる移動先のリソースマネージャホストを決定する。
【００２０】
リソース調査部１６０は、ジョブを実行することができる移動先のリソースマネージャホストが決定したら、そのリソースマネージャホストにインストールしなくてはならないＰＰ等の情報や、そのリソースマネージャホストに存在する不要な情報を洗い出す。
【００２１】
リソース削除依頼部１７０は、リソース使用状況判定部１５０の判定で、どのリソースマネージャホストも、移動先としてジョブの実行を行うことができるだけの余裕がない場合に、各リソースマネージャホストが持つリソースの使用率をみて、解放できそうなリソースを持つリソースマネージャに不要情報の削除の依頼を行う。
【００２２】
インストール指示部１８０は、移動先のリソースマネージャホストにＰＰ情報等のジョブの実行に必要な情報のインストールを依頼する。
【００２３】
インストール情報送信部１９０は、実際にマシンにインストールする情報を、リソースマネージャホストに送信する。
【００２４】
ジョブスケジュール定義部２００は、ジョブをどのような順番で、どのリソースマネージャホストを使用するか、どの時間帯で実行するかを決定し、ジョブスケジュールテーブルの書き換えを行う。
【００２５】
リソースマネージャホストのそれぞれは、図２に示すような各種の機能部を備えて構成されており、次に、これらの機能部のそれぞれについて説明する。
【００２６】
リソースエラー検知部２１０は、自リソースマネージャホスト内でのリソースエラーを検知し、リソースエラー通知部２２０は、取得したリソースエラーの情報を統合マネージャホスト１０に対して通知する。
【００２７】
リソース情報通知部２３０は、自リソースマネージャホスト内でどんなリソースがどれだけ不足しているかの情報、及び、ジョブの実行に必要なＰＰ等の情報を統合マネージャホスト１０に対し通知する。
【００２８】
命令受け付け部２４０は、統合マネージャホスト１０から送信されてきた命令を受け付けて解読する。
【００２９】
リソース削除部２５０は、リソース削除依頼部１７０から通知された不要と判断された情報によりリソースを解放する。
【００３０】
インストール情報受信部２６０は、ジョブ実行に必要なインストール情報を統合マネージャホスト１０から受信し、インストール実行部２７０は、ジョブ実行に必要な情報をインストールする。
【００３１】
次に、前述したように構成されるジョブ実行システムにおけるジョブ実行中のリソースエラーによる業務の停止を回避する処理動作を説明するが、その処理動作を説明する前に、処理の中で必要とするリソースマネージャホストのリソース監視画面、各種データの通信プロトコル、各種データテーブルの構成について図面により説明する。
【００３２】
図４はリソースマネージャホストで自ホストのリソースを監視するために用いるリソース監視画面の例を示す図である。この表示画面は、リソースマネージャホストに登録されているそのホストのリソースの使用状況を表示するものであり、図示例は、この画面を表示しているリソースマネージャホストが、リソースＡ〜Ｄとして、ＣＰＵ、メモリ、ディスク、その他を有し、リソースＡ〜Ｃがジョブ▲１▼により使用されており、リソースＡ（ＣＰＵ）の使用率が３０％、リソースＢ（メモリ）、Ｃ（ディスク）をそれぞれ２０ＭＢ、８０ＭＢ使用していることを表示している。
【００３３】
図５はエラー情報データの通信プロトコルの例を示す図である。このプロトコルは、リソースマネージャホストのリソースエラー通知部２２０から統合マネージャホスト１０にエラー情報を通知するために使用されるもので、データの先頭にこのデータがエラー情報データであることを示すエラーデータ開始のフラグを設定し、それに続いて、エラーが発生したリソースマネージャホスト名、エラー要因となった容量不足等のリソースの種別、どのようなエラーが発生したのかを示すエラー種別、エラーが発生した日時が設定されて構成される。
【００３４】
図６はエラー情報データテーブルの構成例を示す図である。エラー情報データテーブルは、統合マネージャホストがリソースマネージャホストから送信されてきた図５に示すエラー情報を切り分けて格納管理するためのものである。どのリソースマネージャホストのどのリソースで、どのようなエラーが起きたのかを切り分けて示し、エラーが発生したリソースマネージャホスト名、エラー要因となった容量不足等のリソースの種別、どのようなエラーが発生したのかを示すエラー種別、エラーが発生した日時が管理される。図示の第１行のレコードの例の場合、リソースマネージャホストＡにおいて、Ｍｅｍｏｒｙ１で、Ｅｍｐｔｙとなるエラーが日時ＹＹＹ／ＭＭ／ＤＤｈｈ／ｍｍ／ｄｄに発生したことを示している。
【００３５】
図７はリソース情報データの通信プロトコルの例を示す図である。このリソース情報１０００は、リソースエラーを生じたリソースマネージャホストが統合マネージャホストに送信する移動したい（実行中だった）ジョブの情報やジョブ実行に必要なＰＰ情報等のリソース情報である。
【００３６】
このリソース情報１０００は、データの先頭にこのデータがリソース情報データであることを示すリソースデータ開始のフラグを設定し、それに続いて、送信元リソースマネージャホスト名、実行中であったジョブのＩＤ、送信日時が設定され、その後ろに、ジョブ実行に必要なメモリ容量、ディスク容量、推奨ＣＰＵ性能、ＰＰ数、ＰＰ名とそのバージョンを付け、最後に、このリソース情報データの後に送信するＰＰのインストールに必要なレジストリファイルのファイル数、ＰＰの緩急設定に必要な情報のファイル数を付与して構成されている。図示例では、２つのＰＰのＰＰ名１、２があり、それぞれに、バージョン、レジストリファイルのファイル数、情報のファイル数が設定されている。
【００３７】
図８はリソース情報データテーブルの構成例を示す図である。このリソース情報データテーブル１０１０は、リソースマネージャホストから送信されてきた図７に示すリソース情報を、統合マネージャホスト側で切り分けて、リソース情報データテーブルに格納したものである。
【００３８】
図９はパフォーマンス情報データテーブルの構成例を示す図である。このパフォーマンス情報データテーブル１０１１は、統合マネージャホストが、他に利用可能なリソースマネージャホストのリソースの使用状況を調べ、その結果を格納するものであり、管理対象のリソースマネージャホスト名、リソース数、日時、リソース名、使用率が格納される。図示例では、管理対象のリソースマネージャホスト名ＨｏｓｔＡが３つのリソースを持ち、日時２００３／０２／２１１０：３０：００の状態で、リソースとしてＣＰＵ、ＭＥＭＯＲＹ、ＤＩＳＫを持ち、それぞれ、５０％、３０％、６０％の使用率であることを示している。なお、使用状況を調査する方法は、公知の方法を使用することができる。
【００３９】
図１０は統合マネージャホスト１０に接続されているＰＰ情報データベース１００の構成を説明する図である。このＰＰ情報データベース１００には、統合マネージャホストが管理しているリソースマネージャホストにインストールしてあるＰＰ名とバージョンとが、各リソースマネージャホスト毎に区別して登録されている。管理対象のリソースマネージャホストに新しくＰＰがインストールされると、このデータベースに、ＰＰ名とバージョンとが逐次登録される。
【００４０】
図１１はインストール指示の通信プロトコル例を示す図である。インストール指示は、ジョブ実行に必要な情報が足りなくて、そのリソースマネージャホストがジョブ実行可能状態になかった場合に、インストール指示部１８０が、リソースマネージャホストに送信する指示である。
【００４１】
このインストール指示の通信プロトコル１１００は、データの先頭にこのデータがインストール指示のデータであることを示すインストール指示データ開始フラグを設定し、それに続いて、インストールするＰＰ名、バージョンＮｏ、このデータの後に送信するインストールに必要なインストール設定ファイルのファイル数、レジストリの設定に必要なレジストリ情報ファイルのファイル数、インストールするＰＰの環境設定に必要な情報ファイルのファイル数を付加して送信される。
【００４２】
図１２はインストール情報について説明する図である。このインストール情報１１１０は、図１１により説明したプロトコルによるインストール指示の後に送信する、実際にインストールに必要な情報ファイル群である。これらのファイル群は、インストールを行うインストーラファイルとそのインストーラに必要な情報を記載したインストーラ設定ファイル群、レジストリの設定に必要なレジストリファイル群、インストールＰＰの環境設定に必要なＰＰ環境設定ファイル群により構成される。
【００４３】
図１３はリソースの削除依頼の通信プロトコルの例を示す図である。このリソースの削除依頼は、ジョブスケジューリングを再定義して実行可能なリソースマネージャホストを決定した場合に、リソースの不要な情報の削除の削除をリソースマネージャホストに依頼するときに、統合マネージャホストからリソースマネージャホストに送信するものである。そして、このプロトコルは、その先頭に、このデータが削除依頼のデータであることを示す削除依頼開始のフラグが設定され、それに続いて、削除するＰＰ名称、バージョンＮｏを付与して送信される。
【００４４】
図１４はリソースマネージャホストでリソースエラーが生じた場合の統合マネージャホストにおけるリソースエラー回避の処理動作を説明するフローチャートであり、次に、これについて説明する。
【００４５】
（１）統合マネージャホストは、ジョブの全てが正常に実行されている場合、リソースエラー情報がリソースマネージャホストから送信されてくるのを待ち受ける待機状態にある。リソースエラー情報がリソースマネージャホストから図５により説明した通信データプロトコル９００の形で送信されてくると、このエラー情報は、統合マネージャホストのリソースエラー情報受信部１３０で受け取られる（ステップ３００）。
【００４６】
（２）リソースエラー情報受信部１３０は、受信したリソースエラー情報をリソースエラー情報解読部１４０に渡す。リソースエラー情報解読部１４０は、どのリソースマネージャホストのどのリソースで、どのようなエラーが起きたのかを切り分け、図６により説明したエラー情報データテーブル９１０に格納する（ステップ３１０）。
【００４７】
（３）統合マネージャホストは、ステップ３１０の情報の解読後、再びリソースマネージャホストから送信されてくる移動したい（実行中だった）ジョブの情報やジョブ実行に必要なＰＰ情報等のリソース情報の待ち受け状態になる。そして、統合マネージャホストは、図７により説明したようなリソース情報１０００をリソースマネージャホストから受け取ると、リソース情報を切り分けて、図８により説明したリソース情報テーブル１０１０に格納する。統合マネージャホストは、さらに、リソースマネージャホストから送信されてくるレジストリ情報ファイル、設定情報ファイルを受け取り、それらのファイルをユーザ指定のディレクトリへ一時保存する（ステップ３２０）。
【００４８】
（４）次に、統合マネージャホストは、リソース使用状況判定部１５０で、各リソースマネージャホストのリソースのパフォーマンス情報を取得（公知の機能を使用）し、図９により説明したようなパーフォーマンステーブルを作成する（ステップ３３０）。
【００４９】
（５）次に、統合マネージャホストは、他に利用することができるリソースマネージャホストを検索する。この検索は、前述したステップ３３０の処理で作成した図９に示すパフォーマンス情報データテーブル内のリソースとステップ３２０の処理で内容を格納した図８に示すリソース情報テーブルに格納されている移動したいジョブが必要とするリソースと比較することにより行われる。
【００５０】
例えば、リソースマネージャホストＡで実行されていたジョブ▲１▼をリソースマネージャホストＢに移すことが可能かを調べるものとする。ジョブ▲１▼が必要とするリソースとしてのメモリの最大容量は、図８に示しているように７１ＭＢであり、リソースマネージャホストＢのリソース使用率におけるメモリの使用率は、図９に示しているように５０％である。このため、リソースマネージャホストＢが備えているメモリの容量の残量、この場合５０％の残容量が７１ＭＢ以上あるか否かを調べる。同様に、ジョブ▲１▼が必要とするリソースとしてのディスク容量は、図８に示しているように１ＧＢであり、２つのディスクを持つリソースマネージャホストＢのリソース使用率におけるディスクの使用率は、図９に示しているように、それぞれ４０％、７０％である。このため、リソースマネージャホストＢが備えている２つのディスクの容量の残量、この場合６０％、３０％の残容量の何れか一方の残容量が１ＧＢ以上あるか否かを調べる。
【００５１】
全てのリソースについて前述したような比較を行い、この比較の結果、移動したいジョブが必要とするリソースより使用可能容量が大きいリソースを有する他のリソースマネージャホストを発見した場合、使用可能リソースマネージャホスト発見とし、発見できなかった場合、使用可能リソースマネージャホストなしと判定する（ステップ３４０）。
【００５２】
（６）ステップ３４０の判定で、他に利用できるリソースマネージャホストを発見できなかった場合、統合マネージャホストは、ジョブスケジュールを組み直して、組み直した結果、実行順序が遅いジョブの実行を行うリソースマネージャホストを実行可能なリソースマネージャホストとして決定する。なお、ジョブスケジュールの組み直しは、公知の手段を使用して行うことができる（ステップ３５０、３６０）。
【００５３】
（７）ステップ３６０の処理後、または、ステップ３４０の判定で、他に利用できるリソースマネージャホストを発見できた場合、リソース調査部１６０は、ジョブを実行する対象リソースマネージャホストに、ジョブ実行に必要なＰＰ情報等が存在するか、その対象リソースマネージャホストに不要な情報はないか等、そのリソースマネージャホストがジョブ実行可能状態であるか否かを調査する。必要なＰＰ等の調査は、図１０により説明した統合マネージャホストに接続されているＰＰ情報データベース１００に登録されている管理対象のリソースマネージャホストのＰＰ情報１０１を元に、図８に示したリソース情報の必要ＰＰ名を比較することにより行う（ステップ３７０）。
【００５４】
（８）ステップ３７０の調査での管理対象リソースマネージャホストのＰＰ情報に必要ＰＰが登録されているか等の結果により、そのリソースマネージャホストがジョブ実行可能状態か否かを判定する。すなわち、管理対象リソースマネージャホストのＰＰ情報に必要ＰＰが登録されていた場合、そのリソースマネージャホストがジョブ実行可能状態であると判定し、登録れていなかった場合、ジョブ実行可能状態にないと判定する。そして、そのリソースマネージャホストが実行可能状態になかった場合、ジョブ実行に必要な情報が足りないのか、後回しになったジョブが使用していたリソースがあり、その結果として必要なリソースが足りないのかを判定する（ステップ３８０）。
【００５５】
（９）ステップ３８０の判定で、ジョブ実行に必要な情報が足りないで、そのリソースマネージャホストがジョブ実行可能状態になかった場合、インストール指示部１８０は、図１１により説明したようなインストール指示通信プロトコル１１００を作成し、リソースマネージャホストに送信する（ステップ３９０）。
【００５６】
（１０）続いて、インストール情報送信部１９０は、ユーザ指定のディレクトリに保存してある図１２により説明したような必要ＰＰのインストーラや設定ファイル群であるインストール情報１１１０をリソースマネージャに送信する。その後、ステップ３７０からの処理に戻って、再びリソース調査からの処理を続ける（ステップ４００）。
【００５７】
（１１）ステップ３８０の判定で、後回しになったジョブが使用していたリソースがあり、その結果として必要なリソースが足りないで、そのリソースマネージャホストがジョブ実行可能状態になかった場合、すなわち、ステップ３４０の判定で、実行可能なリソースマネージャホストを発見することができずに、ジョブスケジューリングを再定義することにより、実行可能なリソースマネージャホストを決定した場合、リソース削除依頼部１７０は、そのリソースマネージャホストにリソース削除の依頼を行う。ここでは、図１３に示したようなリソースの不要な情報の削除依頼プロトコル１２００を作成して、リソースマネージャホストに送信する。削除して欲しいＰＰ情報の決定は、ステップ３５０、ステップ３６０の処理で決定された後回しになったジョブのＰＰ情報を削除対象ＰＰとする。その後、ステップ３７０からの処理に戻って、再びリソース調査からの処理を続ける（ステップ４１０）。
【００５８】
（１２）ステップ３８０の判定で、そのリソースマネージャホストがジョブ実行可能状態にあると判断された場合、ジョブスケジュール定義変更部２００は、ジョブスケジュールを組み直し、ジョブ実行部１２０は、ジョブの実行を再開する。ジョブスケジュールの組み直し、及び、ジョブ実行は、公知の手段により行うことができる（ステップ４２０、４３０）。
【００５９】
図１５は前述したステップ３５０の処理で行われるジョブスケジュールの組み直しの例について説明する図であり、次に、これについて説明する。
【００６０】
図１５（ａ）に６００として示すように、ジョブ▲１▼、▲２▼が同一のあるいは異なるリソースマネージャホストで実行され、いま、ジョブ▲３▼、▲４▼、▲５▼が異なるリソースマネージャホスト上で実行中であるとし、この状態で、ジョブ▲３▼を実行中のリソースマネージャホストでリソースエラーが生じたとする。そして、このときに、ステップ３５０のジョブスケジュールの組直しの処理に移行して、ジョブスケジュールの組み直しを行うと、実行順序が遅くてもかまわないジョブ▲５▼を実行しているリソースマネージャホストにジョブ▲３▼を移動して実行させる。すなわち、ジョブ▲５▼を実行しているリソースマネージャホストは、ジョブ▲５▼の実行に必要であったリソースが不要となり、このジョブ▲５▼に実行に必要であったリソースを使用することにより、ジョブ▲３▼の処理を実行することが可能となる。ジョブ▲５▼は、ジョブ▲３▼か、ジョブ▲４▼の実行後に実行するようにスケジュールすることになるが、ジョブ▲４▼がすでに実行中であるため、ジョブ▲４▼はジョブ▲３▼より先にその処理が終了することになる。このため、ジョブ▲５▼は、ジョブ▲４▼の実行中であるリソースマネージャホストに移動する。このような、ジョブスケジュールの組み直しにより、ジョブスケジュール変更後のスケジュールは、図１５（ｂ）に６０１として示すようなものとなる。なお、図１５（ａ）で最後に実行されればよいとされていたジョブ▲６▼は、図１５（ｂ）に示す組み直しの場合にも最後に行われるようにされる。
【００６１】
図１６はリソースマネージャホストでリソースエラーが生じた場合のリソースマネージャホストにおけるリソースエラー回避の処理動作を説明するフローチャートであり、次に、これについて説明する。ここでの処理は、自リソースマネージャホストでリソースエラーを生じた場合、及び、他のリソースマネージャホストでリソースエラーを生じた場合の両者の処理を含むものである。
【００６２】
（１）各リソースマネージャホストは、自ホスト内のリソースを監視している。いま、図１に示すリソースマネージャホストＡ２０において、リソースＡ−１でリソース不足が発生したものとする。この場合、リソースマネージャホストのリソースエラー検知部２１０（図４により説明したリソース監視画面）は、そのリソース不足エラーを検知する。このエラー検知は、公知の手段により行うことができ、各リソースに使用率のしきい値（ユーザ設定）を設け、それを越えた場合にエラーとみなすこととして行う（ステップ５００）。
【００６３】
（２）リソースエラー検知部２１０がリソースエラーを検知すると、リソースマネージャホストのリソースエラー情報通知部２２０は、統合マネージャホストにリソースエラーが起きたことを通知する。この通知は、図５により説明したエラー情報データ通信プロトコル９００を作成し、これを統合マネージャホストに送信することにより行われる（ステップ５１０）。
【００６４】
（３）次に、リソースマネージャホストは、実行することができなくなったジョブの情報や、ジョブ実行に必要なＰＰ情報等のリソース情報を、リソース情報通知部２３０から統合マネージャホストに通知する。この通知は、図７により説明したように、リソース情報通信プロトコルを作成して送信し、その後、必要ＰＰのレジストリ情報からレジストリ情報ファイルを作成し、さらに、ＰＰの設定が定義してあるファイルをコピーし、これらのファイル群を、リソース情報通知後に、続けて統合マネージャホストに転送することにより行われる（ステップ５２０）。
【００６５】
ステップ５２０の処理の後、リソースエラーを発生したリソースマネージャホストは、統合マネージャホスト側からの命令を待ち受ける状態となる。また、リソースエラーを発生しなかったリソースマネージャホストは、リソースの監視を行いながら統合マネージャホスト側からの命令を待ち受ける状態となっている。
【００６６】
（４）リソースマネージャホストの命令受け付け部２４０は、統合マネージャホストから送信された命令を受信すると、その命令が「リソース削除依頼」であるか、「インストール指示」であるか、「ジョブ実行」であるかに切り分ける（ステップ５３０）。
【００６７】
（５）ステップ５３０で受信した命令がインストール指示であった場合、インストール情報受信部２６０は、次に送られてくるインストール情報を受信し、インストール実行部２７０が、そのインストール情報をマシン内にインストールする。その後、再び、命令待ち受け状態となる（ステップ５５０、５６０）。
【００６８】
（６）ステップ５３０で受信した命令がリソース削除依頼であった場合、リソース削除部２５０は、送られてきた削除依頼データで指定してある不要な情報を削除し、その後、再び、命令待ち受け状態となる（ステップ５４０）。
【００６９】
（７）ステップ５３０で受信した命令がジョブ実行であった場合、そのリソースマネージャホストは、ジョブの実行を開始し、また、実行中に使用する各リソースの監視を開始する（ステップ５７０）。
【００７０】
前述した本発明の実施形態における各処理は、処理プログラムとして構成することができ、この処理プログラムは、ＨＤ、ＤＡＴ、ＦＤ、ＭＯ、ＤＶＤ−ＲＯＭ、ＣＤ−ＲＯＭ等の記録媒体に格納して提供することができる。
【００７１】
前述した本発明の実施形態によれば、ジョブ実行中にリソースエラーを発生させたリソースマネージャホストは、統合マネージャホストに対して、リソースエラーの発生を報告し、他のリソースマネージャホスト実行中のジョブを移すことができるので、リソースエラーによる業務停止を回避することができる。
【００７２】
前述した本発明の実施形態は、リソースマネージャホストでリソースエラーが発生した場合を例として説明したが、本発明は、リソースマネージャホストのリソース以外の他の機能に障害が発生して、そのリソースマネージャホストでジョブの実行を行うことができなくなった場合にも適用することができる。この場合、統合マネージャホストが、リソースマネージャホストとの間で通信を行うことができなくなったことを検出して対応すればよい。
【００７３】
【発明の効果】
以上説明したように本発明によれば、ジョブ実行中にそのジョブを実行しているリソースマネージャホストでリソースエラーが発生した場合にも、業務停止を回避することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態によるジョブ実行システムの構成を示すブロック図である。
【図２】統合マネージャホストの内部構成を示すブロック図である。
【図３】リソースマネージャホストの内部構成を示すブロック図である。
【図４】リソースマネージャホストで自ホストのリソースを監視するために用いるリソース監視画面の例を示す図である。
【図５】エラー情報データの通信プロトコルの例を示す図である。
【図６】エラー情報データテーブルの構成例を示す図である。
【図７】リソース情報データの通信プロトコルの例を示す図である。
【図８】リソース情報データテーブルの構成例を示す図である。
【図９】パフォーマンス情報データテーブルの構成例を示す図である。
【図１０】統合マネージャホストに接続されているＰＰ情報データベースの構成を説明する図である。
【図１１】インストール指示の通信プロトコル例を示す図である。
【図１２】インストール情報について説明する図である。
【図１３】リソースの削除依頼の通信プロトコルの例を示す図である。
【図１４】リソースマネージャホストでリソースエラーが生じた場合の統合マネージャホストにおけるリソースエラー回避の処理動作を説明するフローチャートである。
【図１５】ジョブスケジュールの組み直しの例について説明する図である。
【図１６】リソースマネージャホストでリソースエラーが生じた場合のリソースマネージャホストにおけるリソースエラー回避の処理動作を説明するフローチャートである。
【符号の説明】
１０統合マネージャホスト
２０〜４０リソースマネージャホスト
５０〜９０リソース
１００ＰＰ情報データベース
１１０ジョブスケジュール実行部
１２０ジョブ実行部
１３０リソースエラー情報受信部
１４０リソースエラー情報解読部
１５０各リソース使用状況判定部
１６０リソース調査部
１７０リソース削除依頼部
１８０インストール指示部
１９０インストール情報送信部
２００ジョブスケジュール定義変更部
２１０リソースエラー検知部
２２０リソースエラー通知部
２３０リソース情報通知部
２４０命令受け付け部
２５０リソース削除部
２６０インストール情報受信部
２７０インストール実行部
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a job execution system and an execution control method, and more particularly to an execution system and an execution control method for a scheduled job in an information processing system.
[0002]
[Prior art]
In recent information processing systems, business jobs are processed unattended by means of job scheduling, such as automatically starting a job at a reserved time and automatically starting the next reserved follow-on job when a preceding job ends.
[0003]
Also known are information processing systems in which a single manager host is provided, jobs defined on that manager host can be executed on a plurality of other execution hosts, and the execution results can be centrally managed on the manager host.
[0004]
As described in Patent Document 1 and elsewhere, a job execution system of this kind can re-execute a job on another host when the job can no longer be executed on the host where it was running.
[0005]
[Patent Document 1]
JP-A-11-353284
[0006]
[Problems to be solved by the invention]
The conventional technology described above, which schedules jobs and controls their execution, determines the schedule when the job schedule is built, taking into account machine information, execution job information, job execution conditions, and so on. However, if subsequent system operation changes the volume of business data and the execution time of each job comes to differ greatly from what was expected when the schedule was built, the execution of the jobs assigned to the CPU of each host machine becomes inefficient, and the processing time of the jobs scheduled in the system increases.
[0007]
In addition, with the conventional technology described above, when a machine trouble such as a resource error (for example, insufficient memory or disk capacity) occurs in the executing host during system operation, the trouble can be avoided by defining in advance a job to be run on error. However, there is a problem that job execution for the normal case stops.
[0008]
An object of the present invention is to solve the above-described problems of the prior art and to provide a job execution system and an execution control method that can avoid stopping business operations because of a resource error during job execution.
[0009]
[Means for Solving the Problems]
According to the present invention, the above object is achieved by a job execution system for executing a scheduled group of jobs, comprising a plurality of resource manager hosts, each having a plurality of types of resources and executing jobs, and an integrated manager host that manages job execution. Each resource manager host has resource error detection means for monitoring the status of resources during job execution and detecting a resource error, and resource error notification means for reporting the resource error to the integrated manager host. The integrated manager host has means for determining, from the resource status of the resource manager hosts other than the one that reported the resource error, a migration destination resource manager host capable of executing the job that was running on the resource manager host that reported the resource error.
[0010]
The above object is also achieved by a job execution control method for controlling execution of a scheduled group of jobs, in which a report of a resource error is received from a resource manager host that has a plurality of types of resources and executes jobs, and a migration destination resource manager host capable of executing the job that was running on the reporting resource manager host is determined from the resource status of the resource manager hosts other than the one that reported the resource error.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of a job execution system and an execution control method according to the present invention will be described below in detail with reference to the drawings.
[0012]
FIG. 1 is a block diagram showing a configuration of a job execution system according to an embodiment of the present invention, FIG. 2 is a block diagram showing an internal configuration of an integrated manager host, and FIG. 3 is a block diagram showing an internal configuration of a resource manager host. In FIGS. 1 to 3, 10 is an integrated manager host, 20 to 40 are resource manager hosts, 50 to 90 are resources, 100 is a PP information database, 110 is a job schedule execution unit, 120 is a job execution unit, 130 is a resource error information receiving unit, 140 is a resource error information decoding unit, 150 is a resource usage status determination unit, 160 is a resource investigating unit, 170 is a resource deletion request unit, 180 is an installation instruction unit, 190 is an installation information transmission unit, 200 is a job schedule definition change unit, 210 is a resource error detection unit, 220 is a resource error notification unit, 230 is a resource information notification unit, 240 is a command reception unit, 250 is a resource deletion unit, 260 is an installation information receiving unit, and 270 is an installation execution unit.
[0013]
As shown in FIG. 1, the job execution system according to the embodiment of the present invention is configured by connecting an integrated manager host 10 and a plurality of resource manager hosts 20 to 40 via a network. Each of the resource manager hosts 20 to 40 contains, or is connected to, various managed resources 50 to 90 such as CPUs, memory, and disks, and executes jobs. The integrated manager host 10 is connected to a PP information database 100 that stores and manages information on the applications (PPs) used by the resource manager hosts, and it manages the schedule of jobs to be executed and the resource manager hosts that execute them.
[0014]
The integrated manager host 10 includes various functional units as shown in FIG. 2, and each of these functional units will be described next.
[0015]
The job schedule execution unit 110 controls the job schedule based on the already defined job schedule.
[0016]
The job execution unit 120 executes each job defined by the job schedule.
[0017]
The resource error information receiving unit 130 receives an error notification transmitted from the resource manager host.
[0018]
The resource error information decoding unit 140 decodes the error notification received by the resource error information receiving unit 130.
[0019]
The resource usage status determination unit 150 investigates how heavily the resources of all resource manager hosts connected via the network are being used, and determines a migration destination resource manager host that can execute the job that has stopped because of a resource error.
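The selection performed by the resource usage status determination unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and function names and the total capacities are hypothetical, and the only rule taken from the description is that a candidate host must have at least as much free memory, and at least one disk with as much free space, as the job requires. The job requirements (71 MB of memory, 1 GB of disk) and host B's usage rates (memory 50%, disks 40% and 70%) follow the worked example in paragraph 0050.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    total_mb: float      # hypothetical: total capacity is not a column of FIG. 9
    used_percent: float  # usage rate, as held in the performance table

    @property
    def free_mb(self) -> float:
        return self.total_mb * (100.0 - self.used_percent) / 100.0


@dataclass
class Host:
    name: str
    memory: Resource
    disks: List[Resource]


def can_run(host: Host, need_memory_mb: float, need_disk_mb: float) -> bool:
    """True if free memory and at least one disk cover the job's needs."""
    return (host.memory.free_mb >= need_memory_mb
            and any(d.free_mb >= need_disk_mb for d in host.disks))


def pick_destination(hosts: List[Host], failed_host: str,
                     need_memory_mb: float, need_disk_mb: float) -> Optional[Host]:
    """Return the first host other than the failed one that can run the job."""
    for host in hosts:
        if host.name != failed_host and can_run(host, need_memory_mb, need_disk_mb):
            return host
    return None


# Made-up capacities; usage rates for HostB follow paragraph 0050.
hosts = [
    Host("HostA", Resource(256, 95), [Resource(2048, 99)]),
    Host("HostB", Resource(256, 50), [Resource(2048, 40), Resource(2048, 70)]),
]
destination = pick_destination(hosts, "HostA", need_memory_mb=71, need_disk_mb=1024)
print(destination.name if destination else "no usable host")
```

A `None` result corresponds to the "no usable resource manager host" outcome of step 340, after which the description falls back to rebuilding the job schedule.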
[0020]
Once the migration destination resource manager host that can execute the job has been determined, the resource investigating unit 160 identifies the information, such as PPs, that must be installed on that resource manager host, as well as any unnecessary information present on it.
[0021]
When the resource usage status determination unit 150 determines that no resource manager host has enough spare capacity to execute the job as a migration destination, the resource deletion request unit 170 examines the resource usage rates of each resource manager host and requests a resource manager host that has resources likely to be releasable to delete unnecessary information.
[0022]
The installation instruction unit 180 requests the migration destination resource manager host to install information such as PP information necessary for execution of a job.
[0023]
The installation information transmission unit 190 transmits information to be actually installed on the machine to the resource manager host.
[0024]
The job schedule definition change unit 200 determines in what order the jobs are executed, which resource manager host is used, and in which time slot each job runs, and rewrites the job schedule table.
[0025]
Each resource manager host includes various functional units as shown in FIG. 3, and each of these functional units will be described next.
[0026]
The resource error detection unit 210 detects a resource error in its own resource manager host, and the resource error notification unit 220 notifies the integrated manager host 10 of the acquired resource error information.
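Paragraph 0062 states that detection can use a user-set usage-rate threshold for each resource, with an error raised when the threshold is exceeded. A minimal sketch of that check follows; the function name, the dictionary layout, and the threshold values are assumptions made for illustration.

```python
from typing import Dict, List


def detect_resource_errors(usage_percent: Dict[str, float],
                           thresholds: Dict[str, float]) -> List[str]:
    """Return the names of resources whose usage rate exceeds its threshold."""
    return [name for name, used in usage_percent.items()
            if name in thresholds and used > thresholds[name]]


# Hypothetical readings and user-set thresholds (percent).
usage = {"CPU": 30.0, "MEMORY": 95.0, "DISK": 60.0}
limits = {"CPU": 90.0, "MEMORY": 90.0, "DISK": 90.0}
print(detect_resource_errors(usage, limits))
```

The comparison is strict, so a reading exactly at the threshold is not treated as an error; the description only says "exceeded", so this reading is one reasonable interpretation.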
[0027]
The resource information notification unit 230 notifies the integrated manager host 10 of which resources are short, and by how much, in its own resource manager host, together with information such as the PPs necessary for job execution.
[0028]
The command reception unit 240 receives and decodes commands transmitted from the integrated manager host 10.
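The decoding done by unit 240 amounts to classifying each incoming command as one of the three kinds named in step 530: a resource deletion request, an installation instruction, or a job execution command. The sketch below assumes, purely for illustration, that each message begins with a short text flag followed by `|`. The flag values and names are hypothetical, since the description does not give the actual encoding.

```python
from enum import Enum


class Command(Enum):
    DELETE_RESOURCE = "DEL"  # resource deletion request
    INSTALL = "INS"          # installation instruction
    RUN_JOB = "RUN"          # job execution


def classify(message: str) -> Command:
    """Map the leading flag of a message to a command kind.

    Raises ValueError if the flag is not one of the known values.
    """
    flag = message.split("|", 1)[0]
    return Command(flag)


print(classify("INS|PP1|1.0|1|2|1"))
```

Rejecting unknown flags with an exception keeps a malformed message from being silently treated as a job execution command.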
[0029]
The resource deletion unit 250 releases resources in accordance with the information, judged to be unnecessary, that is notified by the resource deletion request unit 170.
[0030]
The installation information receiving unit 260 receives installation information necessary for job execution from the integrated manager host 10, and the installation execution unit 270 installs information necessary for job execution.
[0031]
Next, the processing by which the job execution system configured as described above avoids a business stoppage caused by a resource error during job execution will be described. Before that, the resource monitoring screen of the resource manager host, the communication protocols for the various data, and the structure of the various data tables used in the processing will be described with reference to the drawings.
[0032]
FIG. 4 is a diagram showing an example of a resource monitoring screen that a resource manager host uses to monitor its own resources. This screen displays the usage status of the resources registered in the resource manager host. In the illustrated example, the resource manager host displaying this screen has a CPU, memory, a disk, and another resource as resources A to D; resources A to C are used by job (1); the usage rate of resource A (CPU) is 30%; and resources B (memory) and C (disk) are using 20 MB and 80 MB, respectively.
[0033]
FIG. 5 is a diagram illustrating an example of a communication protocol for error information data. This protocol is used by the resource error notification unit 220 of a resource manager host to notify the integrated manager host 10 of error information. An error data start flag, indicating that the data is error information data, is set at the head of the data, followed by the name of the resource manager host where the error occurred, the type of resource that caused the error (for example, one whose capacity ran short), the error type indicating what kind of error occurred, and the date and time when the error occurred.
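The field order described for the error information data can be sketched as a small record with encode and decode helpers. The description gives only the order of the fields, so the `|` separator, the flag value `ERR`, the ISO timestamp format, and all names below are assumptions made for illustration. The sample values HostA, Memory1, and Empty mirror the first record described for FIG. 6; the timestamp is invented.

```python
from dataclasses import dataclass
from datetime import datetime

ERROR_DATA_START = "ERR"  # hypothetical value of the error-data start flag


@dataclass(frozen=True)
class ErrorInfo:
    host_name: str
    resource_type: str
    error_type: str
    occurred_at: datetime

    def encode(self) -> str:
        """Serialize as flag|host|resource|error|timestamp (assumed layout)."""
        return "|".join([ERROR_DATA_START, self.host_name, self.resource_type,
                         self.error_type, self.occurred_at.isoformat()])

    @staticmethod
    def decode(message: str) -> "ErrorInfo":
        """Split a message into fields, as a receiver would before storing a row."""
        flag, host, resource, error, stamp = message.split("|")
        if flag != ERROR_DATA_START:
            raise ValueError("not an error information message")
        return ErrorInfo(host, resource, error, datetime.fromisoformat(stamp))


info = ErrorInfo("HostA", "Memory1", "Empty", datetime(2003, 2, 21, 10, 30))
message = info.encode()
print(message)
print(ErrorInfo.decode(message) == info)
```

A plain separator breaks if a field itself contains `|`; a real encoding would need escaping or length-prefixed fields.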
[0034]
FIG. 6 is a diagram showing a configuration example of the error information data table. The integrated manager host uses this table to split up, store, and manage the error information of FIG. 5 sent from a resource manager host. The table separates out which error occurred in which resource of which resource manager host, and holds the name of the resource manager host where the error occurred, the type of resource that caused the error (for example, one whose capacity ran short), the error type indicating what kind of error occurred, and the date and time when the error occurred. The first record in the illustrated example shows that an error of type Empty occurred in Memory1 on resource manager host A at date and time YYY/MM/DD hh/mm/dd.
[0035]
FIG. 7 is a diagram illustrating an example of a communication protocol for resource information data. The resource information 1000 is sent to the integrated manager host by the resource manager host in which the resource error occurred, and carries information on the job to be moved (the job that was being executed) together with resource information such as the PP information needed to execute that job.
[0036]
In the resource information 1000, a resource data start flag indicating that the data is resource information data is set at the head of the data. It is followed by the name of the sending resource manager host, the ID of the job that was being executed, and the transmission date and time; then by the memory capacity and disk capacity required for job execution, the recommended CPU performance, the number of PPs, and each PP name with its version; and finally by the number of registry files needed to install the PP and the number of information files needed for the PP's environment settings, both of which are transmitted after this resource information data. In the illustrated example there are two PPs, PP name 1 and PP name 2, and a version, a number of registry files, and a number of information files are set for each.
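As a rough illustration of the fields listed for the resource information, the sketch below models one message as a Python dataclass. All names, the units, and the sample values for host, job ID, CPU, and PPs are hypothetical. The 71 MB memory and 1 GB disk figures are the ones the description quotes from FIG. 8 in paragraph 0050. The PP count is derived from the list rather than stored, which is a design choice of this sketch and not something the description specifies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class RequiredPP:
    name: str
    version: str
    registry_file_count: int  # registry files sent after this message
    info_file_count: int      # environment-setting files sent after this message


@dataclass
class ResourceInfo:
    source_host: str
    job_id: str
    sent_at: datetime
    memory_mb: int
    disk_mb: int
    recommended_cpu: str
    pps: List[RequiredPP] = field(default_factory=list)

    @property
    def pp_count(self) -> int:
        return len(self.pps)

    @property
    def files_to_follow(self) -> int:
        """Total number of files the receiver should expect after this message."""
        return sum(p.registry_file_count + p.info_file_count for p in self.pps)


resource_info = ResourceInfo(
    source_host="HostA", job_id="job-1", sent_at=datetime(2003, 2, 21, 10, 31),
    memory_mb=71, disk_mb=1024, recommended_cpu="1 GHz",
    pps=[RequiredPP("PP1", "1.0", 2, 1), RequiredPP("PP2", "2.3", 1, 1)],
)
print(resource_info.pp_count, resource_info.files_to_follow)
```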
[0037]
FIG. 8 is a diagram illustrating a configuration example of the resource information data table. The resource information data table 1010 holds the resource information of FIG. 7 sent from a resource manager host, split into its fields on the integrated manager host side.
[0038]
FIG. 9 is a diagram showing a configuration example of the performance information data table. The integrated manager host uses this performance information data table 1011 to store the results of checking the resource usage of the other resource manager hosts that might be available. It holds the name of the managed resource manager host, the number of resources, the date and time, and each resource name with its usage rate. In the illustrated example, the managed resource manager host HostA has three resources, CPU, MEMORY, and DISK, whose usage rates as of 2003/02/21 10:30:00 are 50%, 30%, and 60%, respectively. Any known method can be used to investigate the usage status.
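One row of the performance information data table could be represented as below. This is only a sketch: the class and attribute names are hypothetical, and the sample values are the HostA figures quoted above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class PerformanceRecord:
    host_name: str
    measured_at: datetime
    usage_percent: Dict[str, float]  # resource name -> usage rate in percent

    @property
    def resource_count(self) -> int:
        return len(self.usage_percent)

    def free_percent(self, resource: str) -> float:
        """Share of the named resource that is still unused."""
        return 100.0 - self.usage_percent[resource]


record = PerformanceRecord(
    host_name="HostA",
    measured_at=datetime(2003, 2, 21, 10, 30, 0),
    usage_percent={"CPU": 50.0, "MEMORY": 30.0, "DISK": 60.0},
)
print(record.resource_count, record.free_percent("MEMORY"))
```

Because the table stores rates rather than absolute sizes, converting a free percentage into megabytes requires each resource's total capacity from some other source.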
[0039]
FIG. 10 is a diagram for explaining the configuration of the PP information database 100 connected to the integrated manager host 10. In the PP information database 100, the PP name and version installed on the resource manager host managed by the integrated manager host are registered separately for each resource manager host. When a new PP is installed on the resource manager host to be managed, the PP name and version are sequentially registered in this database.
[0040]
FIG. 11 is a diagram illustrating an example of a communication protocol for an installation instruction. The installation instruction is an instruction that the installation instruction unit 180 transmits to the resource manager host when the information necessary for job execution is insufficient and the resource manager host is not in a job executable state.
[0041]
In the installation instruction communication protocol 1100, an installation instruction data start flag indicating that the data is installation instruction data is set at the head of the data. It is followed by the name and version number of the PP to be installed and by the counts of the files that will be transmitted after this data: the number of installation setting files needed for the installation, the number of registry information files needed to set the registry, and the number of information files needed for the environment settings of the PP to be installed.
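The installation instruction can be sketched in the same spirit. The `|` separator, the flag value `INS`, and all names are assumptions; only the order of the fields comes from the description. The description does not say whether the installer file itself is included in any of the three counts, so the helper below simply sums the counts as given.

```python
from dataclasses import dataclass

INSTALL_START = "INS"  # hypothetical value of the installation-instruction start flag


@dataclass(frozen=True)
class InstallInstruction:
    pp_name: str
    version: str
    setting_file_count: int      # installation setting files
    registry_file_count: int     # registry information files
    environment_file_count: int  # PP environment-setting information files

    def encode(self) -> str:
        """Serialize the fields in the order described, separated by '|'."""
        return "|".join([INSTALL_START, self.pp_name, self.version,
                         str(self.setting_file_count),
                         str(self.registry_file_count),
                         str(self.environment_file_count)])

    @property
    def announced_file_count(self) -> int:
        """Number of follow-up files announced by this instruction."""
        return (self.setting_file_count + self.registry_file_count
                + self.environment_file_count)


instruction = InstallInstruction("PP1", "1.0", 1, 2, 1)
print(instruction.encode(), instruction.announced_file_count)
```

Announcing the counts up front lets the receiver know when the file transfer that follows is complete without needing a separate end marker.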
[0042]
FIG. 12 is a diagram for explaining the installation information. The installation information 1110 is the group of files actually needed for installation, transmitted after the installation instruction that uses the protocol described with reference to FIG. 11. This group consists of the installer file that performs the installation, the installer setting files describing the information the installer needs, the registry files needed to set the registry, and the PP environment setting files needed to configure the environment of the installed PP.
[0043]
FIG. 13 is a diagram illustrating an example of a communication protocol for a resource deletion request. The integrated manager host sends this request to a resource manager host to ask it to delete unnecessary information when an executable resource manager host has been determined by redefining the job schedule. In this protocol, a deletion request start flag indicating that the data is deletion request data is set at the head, followed by the name and version number of the PP to be deleted.
[0044]
FIG. 14 is a flowchart for explaining the processing operation for avoiding resource errors in the integrated manager host when a resource error occurs in the resource manager host. This will be described next.
[0045]
(1) When all the jobs are executed normally, the integrated manager host is in a standby state waiting for resource error information to be transmitted from the resource manager host. When the resource error information is transmitted from the resource manager host in the form of the communication data protocol 900 described with reference to FIG. 5, this error information is received by the resource error information receiving unit 130 of the integrated manager host (step 300).
[0046]
(2) The resource error information receiving unit 130 passes the received resource error information to the resource error information decoding unit 140. The resource error information decoding unit 140 determines which resource of which resource manager host has an error and stores it in the error information data table 910 described with reference to FIG. 6 (step 310).
[0047]
(3) After decoding the information in step 310, the integrated manager host waits for the resource information that the resource manager host transmits next, such as information on the job to be moved (the job that was being executed) and the PP information necessary for job execution. When the integrated manager host receives the resource information 1000 described with reference to FIG. 7 from the resource manager host, it separates the resource information into its items and stores them in the resource information table 1010. The integrated manager host further receives the registry information file and the setting information file transmitted from the resource manager host and temporarily stores these files in a user-specified directory (step 320).
[0048]
(4) Next, the resource usage status determination unit 150 of the integrated manager host acquires the resource performance information of each resource manager host (using a known function) and creates the performance information data table (step 330).
[0049]
(5) Next, the integrated manager host searches for another resource manager host that can be used. The search compares the resources in the performance information data table shown in FIG. 9, created in step 330, with the resources required by the job to be moved, which are stored in the resource information table.
[0050]
For example, suppose that it is checked whether the job (1) executed on the resource manager host A can be moved to the resource manager host B. As shown in FIG. 8, the maximum memory capacity required for job (1) is 71 MB, and the memory usage rate of the resource manager host B is 50%. It is therefore checked whether the remaining memory of the resource manager host B, in this case the unused 50%, is 71 MB or more. Similarly, the disk capacity required by job (1) is 1 GB, as shown in FIG. 8, and the disk usage rates of the resource manager host B, which has two disks, are 40% and 70%, respectively, as shown in FIG. 9. It is therefore checked whether the remaining capacity of either disk, in this case the unused 60% or 30%, is 1 GB or more.
[0051]
All resources are compared in this way. If the comparison finds another resource manager host whose usable capacity exceeds the resources required by the job to be moved, it is determined that a usable resource manager host exists; if none can be found, it is determined that there is no usable resource manager host (step 340).
[0052]
(6) If no other usable resource manager host is found in the determination in step 340, the integrated manager host reconfigures the job schedule and, as a result, determines the resource manager host that executes a job whose execution may be delayed as the executable resource manager host. The job schedule can be reconfigured using known means (steps 350 and 360).
[0053]
(7) After the processing of step 360, or when a usable resource manager host is found in the determination of step 340, the resource investigating unit 160 checks whether the target resource manager host that is to execute the job is in a job executable state, for example whether it has the PP information necessary to execute the job and whether unnecessary information remains on it. The necessary PP is checked by comparing the PP information 101 of the managed resource manager host, registered in the PP information database 100 connected to the integrated manager host, with the necessary PP names in the resource information (step 370).
[0054]
(8) Whether the resource manager host is in a job executable state is determined from the result of the investigation in step 370, that is, from whether the necessary PP is registered in the PP information of the managed resource manager host. If the necessary PP is registered, the resource manager host is determined to be in a job executable state; if it is not registered, the host is determined not to be in a job executable state. When the resource manager host is not in an executable state, it is further determined whether the information necessary for job execution is insufficient, or whether resources used by the postponed job remain and the necessary resources are insufficient as a result (step 380).
[0055]
(9) If it is determined in step 380 that the information necessary for job execution is insufficient and the resource manager host is not in a job executable state, the installation instruction unit 180 creates the installation instruction communication protocol 1100 described above and sends it to the resource manager host (step 390).
[0056]
(10) Subsequently, the installation information transmission unit 190 transmits the installation information 1110, that is, the necessary PP installer and setting file group described with reference to FIG. 12 and stored in the user-specified directory, to the resource manager host. The processing then returns to step 370 and continues again from the resource investigation (step 400).
[0057]
(11) If it is determined in step 380 that resources used by the postponed job remain, so that the necessary resources are insufficient and the resource manager host is not in a job executable state, that is, when no executable resource manager host could be found in step 340 and the executable resource manager host was determined by redefining the job schedule, the resource deletion requesting unit 170 requests that resource manager host to delete the resources. To do so, it creates the deletion request protocol 1200 for unnecessary resource information shown in FIG. 13 and transmits it to the resource manager host. The PP information of the job postponed in steps 350 and 360 is set as the PP to be deleted. The processing then returns to step 370 and continues again from the resource investigation (step 410).
[0058]
(12) If it is determined in step 380 that the resource manager host is in a job executable state, the job schedule definition changing unit 200 reconfigures the job schedule, and the job execution unit 120 resumes job execution. The reconfiguration of the job schedule and the job execution can be performed by known means (steps 420 and 430).
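The decision flow of steps 340 to 430 can be sketched as follows. This is a simplified illustration, not the patented implementation: it flattens the loop back to step 370 into a single pass. The total memory and disk capacities are assumed inputs, because the description gives only usage rates. Skipping deletion of a PP that the moved job itself needs is also an assumption. The names `Host`, `Job`, `fits`, and `plan` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Host:
    name: str
    memory_total_mb: float            # assumed known; the source gives usage rates only
    memory_usage: float               # usage rate, 0.0 to 1.0
    disks: List[Tuple[float, float]]  # (total GB, usage rate) for each disk
    installed_pp: Set[str] = field(default_factory=set)


@dataclass
class Job:
    name: str
    memory_mb: float
    disk_gb: float
    required_pp: Set[str]


def fits(host: Host, job: Job) -> bool:
    """Step 340: free memory and at least one disk's free space cover the job."""
    memory_free = host.memory_total_mb * (1.0 - host.memory_usage)
    disk_ok = any(total * (1.0 - used) >= job.disk_gb for total, used in host.disks)
    return memory_free >= job.memory_mb and disk_ok


def plan(job: Job, others: List[Host], deferrable: Host,
         deferred_pp: Set[str]) -> List[tuple]:
    """Return the ordered actions for steps 340 to 430 as (action, host, ...) tuples."""
    actions: List[tuple] = []
    usable = [h for h in others if fits(h, job)]
    if usable:
        target, rescheduled = usable[0], False
    else:
        target, rescheduled = deferrable, True           # steps 350 and 360
        actions.append(("reschedule", target.name))
    installed = set(target.installed_pp)
    if rescheduled:                                      # step 410
        for pp in sorted(installed & (deferred_pp - job.required_pp)):
            actions.append(("delete", target.name, pp))
            installed.discard(pp)
    for pp in sorted(job.required_pp - installed):       # steps 390 and 400
        actions.append(("install", target.name, pp))
    actions.append(("execute", target.name))             # steps 420 and 430
    return actions
```

With the figures from the example (71 MB of memory, 1 GB of disk, host B at 50% memory usage with disks at 40% and 70%) and assumed totals of 256 MB and 2 GB per disk, `fits` accepts host B, because 128 MB of memory and 1.2 GB on the first disk remain free.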
[0059]
FIG. 15 is a diagram for explaining an example of the job schedule reconfiguration performed in step 350 described above, which will be described next.
[0060]
As indicated by 600 in FIG. 15A, suppose that jobs (1) and (2) have been executed on the same or different resource manager hosts, that jobs (3), (4), and (5) are now being executed on different resource manager hosts, and that a resource error occurs in the resource manager host that is executing job (3). When the processing moves to the job schedule reconfiguration of step 350, job (3) is moved to the resource manager host executing job (5), whose execution may be delayed. That is, the resource manager host that was executing job (5) no longer needs the resources for job (5) and can use them to execute job (3). Job (5) is then scheduled to run after job (3) or job (4) finishes; because job (4) is already being executed, job (4) ends before job (3). Job (5) therefore moves to the resource manager host that is executing job (4). The schedule after this reconfiguration is the one indicated by 601. Note that job (6), which was to be executed last in FIG. 15A, is still executed last after the reconfiguration.
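The reassignment in this example can be written as a small sketch. The host names are hypothetical, and the function covers only the three jobs discussed above, not a general scheduler.

```python
from typing import Dict


def reconfigure(assignment: Dict[int, str], failed_job: int,
                deferrable_job: int, running_job: int) -> Dict[int, str]:
    """Move the failed job to the deferrable job's host, then queue the
    deferrable job on the host of the job that will finish first."""
    new_assignment = dict(assignment)
    new_assignment[failed_job] = assignment[deferrable_job]
    new_assignment[deferrable_job] = assignment[running_job]
    return new_assignment


# Jobs (3), (4), (5) on three hypothetical hosts; the host of job (3) fails.
before = {3: "host-X", 4: "host-Y", 5: "host-Z"}
after = reconfigure(before, failed_job=3, deferrable_job=5, running_job=4)
```

Here job (3) ends up on the host that was running job (5), and job (5) is queued behind job (4) on its host.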
[0061]
FIG. 16 is a flowchart for explaining a processing operation for avoiding a resource error in the resource manager host when a resource error occurs in the resource manager host, which will be described next. The processing here includes both processing when a resource error occurs in its own resource manager host and when a resource error occurs in another resource manager host.
[0062]
(1) Each resource manager host monitors the resources in its own host. Assume that a resource shortage has occurred in the resource A-1 of the resource manager host A 20. In this case, the resource error detection unit 210 of the resource manager host (the resource monitoring screen described with reference to FIG. 4) detects the resource shortage error. This error detection can be performed by known means: a user-set threshold for the usage rate is provided for each resource, and usage exceeding the threshold is regarded as an error (step 500).
[0063]
(2) When the resource error detection unit 210 detects a resource error, the resource error information notification unit 220 of the resource manager host notifies the integrated manager host that a resource error has occurred. This notification is performed by creating the error information data communication protocol 900 described with reference to FIG. 5 and transmitting it to the integrated manager host (step 510).
[0064]
(3) Next, the resource information notification unit 230 of the resource manager host notifies the integrated manager host of resource information, such as information on the job that can no longer be executed and the PP information necessary for job execution. For this notification, the resource manager host creates the resource information communication protocol described with reference to FIG. 7 and creates a registry information file from the registry information of the necessary PP. These files are copied and transferred to the integrated manager host following the resource information notification (step 520).
[0065]
After the processing of step 520, the resource manager host in which the resource error occurred waits for an instruction from the integrated manager host. A resource manager host in which no resource error has occurred likewise waits for an instruction from the integrated manager host while monitoring its resources.
[0066]
(4) When receiving a command transmitted from the integrated manager host, the command receiving unit 240 of the resource manager host determines whether the command is a “resource deletion request”, an “installation instruction”, or “job execution”, and the processing branches accordingly (step 530).
[0067]
(5) If the command received in step 530 is an installation instruction, the installation information receiving unit 260 receives the installation information sent next, and the installation executing unit 270 installs it on the machine. The resource manager host then returns to the command waiting state (steps 550 and 560).
[0068]
(6) If the command received in step 530 is a resource deletion request, the resource deletion unit 250 deletes the unnecessary information specified in the received deletion request data, and the resource manager host then returns to the command waiting state (step 540).
[0069]
(7) If the command received in step 530 is job execution, the resource manager host starts job execution and starts monitoring each resource used during execution (step 570).
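The resource manager host side can be sketched in two parts: the threshold check of step 500 and the command branching of steps 530 to 570. The dictionary-based command format, the state fields, and the function names are assumptions for illustration. The actual installation and deletion work is reduced to updating a set of PP names.

```python
from typing import Dict, List


def detect_resource_errors(usage: Dict[str, float],
                           thresholds: Dict[str, float]) -> List[str]:
    """Step 500: report every resource whose usage rate exceeds its user-set threshold."""
    return sorted(name for name, rate in usage.items()
                  if name in thresholds and rate > thresholds[name])


def handle_command(command: dict, state: dict) -> str:
    """Steps 530 to 570: branch on the command type and return the next host state."""
    kind = command["type"]
    if kind == "resource_deletion_request":      # step 540
        state["installed_pp"].discard(command["pp_name"])
        return "waiting"
    if kind == "installation_instruction":       # steps 550 and 560
        state["installed_pp"].add(command["pp_name"])
        return "waiting"
    if kind == "job_execution":                  # step 570
        state["running_job"] = command["job"]
        return "executing"
    raise ValueError(f"unknown command type: {kind}")
```

For example, usage rates of 95% CPU, 60% memory, and 92% on one disk against thresholds of 90%, 80%, and 90% would report the CPU and the disk as errors.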
[0070]
Each process in the above-described embodiment of the present invention can be configured as a processing program, and this processing program can be provided stored on a recording medium such as an HD, DAT, FD, MO, DVD-ROM, or CD-ROM.
[0071]
According to the above-described embodiment of the present invention, the resource manager host in which a resource error occurred during job execution reports the occurrence of the resource error to the integrated manager host, and the job that was being executed is executed by another resource manager host. A business stop due to a resource error can therefore be avoided.
[0072]
In the above-described embodiment of the present invention, the case where a resource error has occurred in the resource manager host has been described as an example. However, the present invention can also be applied when a failure occurs in a function of the resource manager host other than its resources and the job can no longer be executed on that resource manager host. In this case, the integrated manager host may detect that communication with the resource manager host is no longer possible and respond accordingly.
[0073]
[Effect of the invention]
As described above, according to the present invention, even when a resource error occurs in a resource manager host that is executing a job during job execution, it is possible to avoid a business stop.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a job execution system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an internal configuration of an integrated manager host.
FIG. 3 is a block diagram showing an internal configuration of a resource manager host.
FIG. 4 is a diagram showing an example of a resource monitoring screen used for monitoring resources of the host by the resource manager host.
FIG. 5 is a diagram illustrating an example of a communication protocol for error information data.
FIG. 6 is a diagram illustrating a configuration example of an error information data table.
FIG. 7 is a diagram illustrating an example of a communication protocol for resource information data.
FIG. 8 is a diagram illustrating a configuration example of a resource information data table.
FIG. 9 is a diagram illustrating a configuration example of a performance information data table.
FIG. 10 is a diagram illustrating a configuration of a PP information database connected to an integrated manager host.
FIG. 11 is a diagram illustrating an example of a communication protocol for an installation instruction.
FIG. 12 is a diagram illustrating installation information.
FIG. 13 is a diagram illustrating an example of a communication protocol for a resource deletion request.
FIG. 14 is a flowchart for explaining a resource error avoidance processing operation in the integrated manager host when a resource error occurs in the resource manager host.
FIG. 15 is a diagram illustrating an example of job schedule reorganization.
FIG. 16 is a flowchart for explaining a resource error avoidance processing operation in the resource manager host when a resource error occurs in the resource manager host.
[Explanation of symbols]
10 Integrated manager host
20-40 Resource manager hosts
50-90 Resources
100 PP information database
110 Job schedule execution unit
120 Job execution unit
130 Resource error information receiving unit
140 Resource error information decoding unit
150 Resource usage status determination unit
160 Resource investigating unit
170 Resource deletion requesting unit
180 Installation instruction unit
190 Installation information transmission unit
200 Job schedule definition changing unit
210 Resource error detection unit
220 Resource error information notification unit
230 Resource information notification unit
240 Command receiving unit
250 Resource deletion unit
260 Installation information receiving unit
270 Installation executing unit

Claims

1. A job execution system for executing a scheduled job group, comprising a plurality of resource manager hosts that execute jobs using a plurality of types of resources, and an integrated manager host that manages job execution, wherein each resource manager host includes resource error detection means for detecting a resource error by monitoring the status of resources during job execution, and resource error notification means for reporting the resource error to the integrated manager host, and wherein the integrated manager host includes means for determining, from the resource status of the resource manager hosts other than the resource manager host that reported the resource error, a migration destination resource manager host that can execute the job that was being executed on the resource manager host that reported the resource error.

2. A job execution control method for controlling execution of a scheduled job group, comprising: receiving a resource error report from a resource manager host that executes a job using a plurality of types of resources; and determining, from the resource status of resource manager hosts other than the resource manager host that reported the resource error, a migration destination resource manager host that can execute the job that was being executed on the resource manager host that reported the resource error.

3. The job execution control method according to claim 2, wherein an application that is lacking on the migration destination resource manager host is transmitted to the migration destination resource manager host and installed.

4. The job execution control method according to claim 2, wherein when a migration destination resource manager host cannot be determined from the status of the resources, the job schedule is changed to determine a migration destination resource manager host.

5. The job execution control method according to claim 2, wherein when there is unnecessary information in the resources of the migration destination resource manager host, the information is deleted.

6. A job execution control program for controlling execution of a scheduled job group, comprising: a processing step of receiving a resource error report from a resource manager host that executes a job using a plurality of types of resources; and a processing step of determining, based on the resource status of resource manager hosts other than the resource manager host that reported the resource error, a migration destination resource manager host capable of executing the job that was being executed on the resource manager host that reported the resource error, wherein execution of the job is controlled by executing each of the processing steps.