JP6542172B2

JP6542172B2 - Job execution control device and program

Info

Publication number: JP6542172B2
Application number: JP2016183485A
Authority: JP
Inventors: 陸　振宏; 振宏陸; 誠一郎田中; 敬岡山; 信秀杉本; 洋平那須
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2016-09-20
Filing date: 2016-09-20
Publication date: 2019-07-10
Anticipated expiration: 2036-09-20
Also published as: JP2018049395A

Description

Embodiments of the present invention relate to a job execution control device and a program.

In computer processing, techniques for controlling the execution of jobs have conventionally existed.
A job is a unit of definition in which input data, processing, and output data are defined in advance. Definitions may also be made in advance so that a plurality of jobs are executed in succession, or so that different jobs are executed depending on conditions. A series of processes consisting of a plurality of jobs is sometimes called, for example, a "job net". Conversely, each of the processing units that make up a single job is sometimes called, for example, a "job step". An individual job or job step may execute normally and finish, or may terminate abnormally because of some error.

In the prior art, there are techniques for recovering from a failure without human intervention, for example by referring to job configuration information and attempting to re-execute the job when a job failure occurs. However, the prior art does not confirm whether the job has really failed or whether the information indicating the job's state is merely inaccessible, so a job that is in fact still running may be judged to have failed.

The prior art also includes techniques that use a distributed program execution environment to execute a plurality of jobs in parallel on a plurality of nodes. However, the prior art cannot automatically re-execute a job on another execution node, so computer resources may not be used effectively.

Japanese Unexamined Patent Application Publication No. 2015-064723
Japanese Unexamined Patent Application Publication No. 2014-142741
Japanese Unexamined Patent Application Publication No. 2013-257903
Japanese Unexamined Patent Application Publication No. 2003-085021
Japanese Unexamined Patent Application Publication No. H06-103078

The problem to be solved by the present invention is to provide a job execution control device and a program that enable a job to be re-executed automatically, including on another node.

A job execution control device according to an embodiment has an execution manager unit, a plurality of execution node units, and an execution state sharing unit. The execution manager unit sends out execution requests and re-execution requests for jobs. Each of the plurality of execution node units executes the requested job based on the execution request or the re-execution request from the execution manager unit. The execution state sharing unit holds the execution state of the job in the execution node units. Each execution node unit writes the execution state of the job it is executing to the execution state sharing unit. The execution manager unit sends out the re-execution request for a job when it detects that execution of that job in an execution node unit has failed. When an execution node unit receives the re-execution request for the job from the execution manager unit, it refers to the execution state written in the execution state sharing unit to ascertain the execution state of the job, confirms whether the job is still being executed in another execution node unit, and confirms whether processing of the job has completed. If the job is no longer being executed in the other execution node unit and processing of the job has not completed, the execution node unit re-executes the job.

FIG. 1 is a block diagram showing a schematic functional configuration of the job execution control device 1 according to the first embodiment.
FIG. 2 is a block diagram showing a more detailed functional configuration of the execution manager unit 2 of the first embodiment.
FIG. 3 is a block diagram showing a more detailed functional configuration of the message broker unit 3 of the first embodiment.
FIG. 4 is a block diagram showing a more detailed functional configuration of the execution node unit 5 of the first embodiment.
FIG. 5 is a sequence chart showing an example of an operation procedure in the job execution control device 1 of the first embodiment.
FIG. 6 is a schematic diagram showing a configuration example of data that defines a job flow in the first embodiment.
FIG. 7 is a schematic diagram showing a configuration example of the job execution state information held by the execution state sharing unit 6 of the first embodiment.
FIG. 8 is a schematic diagram for explaining a method of referring to and confirming the execution state of a job in the first embodiment.
FIG. 9 is a schematic diagram showing text data that specifies definitions relating to job re-execution in the first embodiment.
FIG. 10 is a schematic diagram showing a method of controlling the number of re-executions in the job execution control device 1 of the first embodiment.
FIG. 11 is a schematic diagram showing a configuration example of the re-execution count information used when the re-execution count control unit 22 of the first embodiment controls the number of re-executions.
FIG. 12 is a schematic diagram showing a configuration example of job flow definition text data in a modification of the first embodiment.

Hereinafter, a job execution control device and a program according to embodiments will be described with reference to the drawings.

(First Embodiment)
FIG. 1 is a block diagram showing a schematic functional configuration of the job execution control device 1 according to the present embodiment. As illustrated, the job execution control device 1 includes an execution manager unit 2, a message broker unit 3, a plurality of execution node units 5, and an execution state sharing unit 6.
The job execution control device 1 may be realized as a device accommodated in a single housing. Alternatively, devices made up of a plurality of housings may be interconnected by a communication network or the like so that the whole is realized as a single system. As a specific example, a plurality of server computers or the like may be interconnected by a communication network, and the system they form as a whole may serve as the job execution control device 1. The functions of the units constituting the job execution control device 1 are as follows.

The execution manager unit 2 has a function of controlling the execution order of jobs based on job flow definition data. The execution manager unit 2 also sends out job execution request messages. In addition, the execution manager unit 2 detects whether execution of a job has succeeded or failed and determines whether the job should be re-executed. If the job should be re-executed, the execution manager unit 2 sends out a message requesting re-execution of the job. That is, when the execution manager unit 2 detects that execution of a job in an execution node unit 5 has failed, it may send out a re-execution request for that job.

The execution manager unit 2 also counts the number of times a job has been re-executed and refers to a previously specified upper limit on the number of re-executions. When the number of re-executions exceeds the upper limit, the execution manager unit 2 sends no further re-execution requests and outputs an alert indicating that execution of the job has failed.
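The re-execution limit described above can be sketched as follows. This is a minimal illustration only: the class name `ReexecutionCounter`, the method `on_failure`, and the in-memory `alerts` list are hypothetical and do not appear in the specification, which does not state how the count or the alert is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReexecutionCounter:
    """Hypothetical per-job re-execution counter keyed by execution ID."""

    max_reexecutions: int
    counts: Dict[str, int] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)

    def on_failure(self, execution_id: str) -> bool:
        """Record one failure; return True if a re-execution request may be sent."""
        count = self.counts.get(execution_id, 0) + 1
        self.counts[execution_id] = count
        if count > self.max_reexecutions:
            # Upper limit exceeded: send no further request and raise an alert.
            self.alerts.append(
                f"job {execution_id}: re-execution limit "
                f"{self.max_reexecutions} exceeded"
            )
            return False
        return True
```

With `max_reexecutions=2`, the first two failures of a job allow a re-execution request and the third produces an alert instead.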

The message broker unit 3 has a function of mediating communication messages between the execution manager unit 2 and the execution node units 5. Messages sent from the execution manager unit 2 toward the execution node units 5 (execution-request messages) are stored and managed in an execution request queue. Messages sent from the execution node units 5 toward the execution manager unit 2 (execution-result messages) are stored and managed in an execution result queue.
In other words, the message broker unit 3 at least temporarily stores, in a queue, the execution requests and re-execution requests sent out by the execution manager unit 2, and passes the execution requests and re-execution requests taken out of the queue to one of the execution node units 5.

Each execution node unit 5 has a function of executing the requested job based on an execution request or re-execution request message from the execution manager unit 2.
Each execution node unit 5 runs on a different node. That is, each execution node unit 5 runs in a physically or logically independent environment. The system is therefore configured so that, even if a failure occurs in one execution node unit 5, the failure does not propagate to the other execution node units 5. The execution node units 5 may all be accommodated in the same housing, or some or all of them may be accommodated in different housings.
The execution node unit 5 also records information on the execution state of a job in the execution state sharing unit 6. That is, the execution node unit 5 writes the execution state of the job it is executing to the execution state sharing unit 6. The execution node unit 5 also refers to the job execution state information recorded in the execution state sharing unit 6.

The execution node unit 5 also performs control relating to re-execution of a job. That is, when the execution node unit 5 receives a job re-execution request from the execution manager unit 2, it refers to the execution state written in the execution state sharing unit 6 to ascertain the execution state of the job, confirms whether the job is still being executed in another execution node unit 5, and confirms whether processing of the job has completed. If the job is no longer being executed in the other execution node unit 5 and processing of the job has not completed, the execution node unit 5 re-executes the job.
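The decision made on receiving a re-execution request can be summarized in a small sketch. The names `Decision` and `decide_reexecution` are hypothetical. The text states only when the job is re-executed, so the two "skip" outcomes below are assumptions about what the node does otherwise.

```python
from enum import Enum


class Decision(Enum):
    SKIP_STILL_RUNNING = "still running on another node"
    SKIP_ALREADY_COMPLETE = "processing already complete"
    REEXECUTE = "re-execute"


def decide_reexecution(running_elsewhere: bool, completed: bool) -> Decision:
    """Re-execute only if the job is neither running elsewhere nor complete."""
    if running_elsewhere:
        return Decision.SKIP_STILL_RUNNING
    if completed:
        return Decision.SKIP_ALREADY_COMPLETE
    return Decision.REEXECUTE
```

Checking "still running" first reflects the stated problem with the prior art, in which a job that is in fact still running could be judged to have failed.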

When confirming whether processing of a job has completed, the execution node unit 5 may refer to information in the execution state sharing unit 6 indicating up to which step of the job has been completed, and also refer to the definition of the steps that make up the job, thereby confirming whether processing of all the steps of the job has completed.
When confirming whether a job is still being executed in another execution node unit 5, the execution node unit 5 may refer to the execution state of the job recorded in the execution state sharing unit 6 a plurality of times (at least twice) at a predetermined time interval, and determine whether the job is still being executed in another execution node unit 5 by judging whether the execution state changed between those references.
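The two checks described above might look like the following sketch. It assumes, for illustration only, that a job is a linear list of step names and that the shared execution state is the name of the last completed step. The function names, the `read_state` callback, and the injectable `sleep` parameter are all hypothetical.

```python
import time
from typing import Callable, Hashable, Optional, Sequence


def all_steps_completed(defined_steps: Sequence[str],
                        last_completed_step: Optional[str]) -> bool:
    """Compare the recorded progress against the job's step definition."""
    if not defined_steps:
        return True
    return last_completed_step == defined_steps[-1]


def is_still_running(read_state: Callable[[], Hashable],
                     interval_s: float,
                     samples: int = 2,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Read the shared state several times; a change implies another node is advancing."""
    if samples < 2:
        raise ValueError("at least two samples are needed to detect a change")
    previous = read_state()
    for _ in range(samples - 1):
        sleep(interval_s)
        current = read_state()
        if current != previous:
            return True
        previous = current
    return False
```

The interval has to be longer than the time a healthy node needs to advance the recorded state. Otherwise a slow but live job would be mistaken for a stopped one.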

Further, when the execution node unit 5 receives a job re-execution request from the execution manager unit 2, it refers to the execution state written in the execution state sharing unit 6 to ascertain the execution state of the job. If the job is no longer being executed in another execution node unit 5 and processing of the job has not yet completed, the execution node unit 5 may refer to information in the execution state sharing unit 6 indicating up to which step of the job has been completed, and also refer to the definition of the steps that make up the job, thereby identifying the unprocessed steps among the steps that make up the job, and may then perform control to re-execute the job using the identified unprocessed step as the restart point.
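Determining the restart point can be sketched as below, again assuming a linear list of step names and a shared state that records the last completed step. `find_restart_point` is a hypothetical name, and the specification does not prescribe this representation.

```python
from typing import Optional, Sequence


def find_restart_point(defined_steps: Sequence[str],
                       last_completed_step: Optional[str]) -> Optional[str]:
    """Return the first unprocessed step, or None if every step is done.

    Raises ValueError if the recorded step is not part of the definition.
    """
    if last_completed_step is None:
        return defined_steps[0] if defined_steps else None
    next_index = defined_steps.index(last_completed_step) + 1
    if next_index < len(defined_steps):
        return defined_steps[next_index]
    return None
```

For a job defined as `["extract", "transform", "load"]` with `"extract"` recorded as complete, the restart point is `"transform"`, so the completed step is not repeated on the new node.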

The execution state sharing unit 6 holds information on the execution state of jobs in the execution node units 5. This allows the execution state of a job to be shared among the plurality of execution node units 5. The execution state sharing unit 6 holds the job execution state information as data managed by a database management system or the like, and holds it in association with the execution ID of the job.
The execution state sharing unit 6 writes job execution state information in response to a recording request from an execution node unit 5. The execution state sharing unit 6 also provides, in response to a reference request from an execution node unit 5, the execution state information of the job having the requested execution ID.
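A store of this kind, keyed by execution ID, could be sketched with Python's standard `sqlite3` module. The table name, column names, and the class `ExecutionStateStore` are hypothetical. The specification says only that the data is managed by a database management system or the like.

```python
import sqlite3
from typing import Optional


class ExecutionStateStore:
    """Hypothetical shared store: execution ID -> last completed step."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS execution_state ("
            "execution_id TEXT PRIMARY KEY, last_completed_step TEXT)"
        )

    def record(self, execution_id: str, last_completed_step: str) -> None:
        """Handle a recording request from an execution node."""
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO execution_state VALUES (?, ?)",
                (execution_id, last_completed_step),
            )

    def lookup(self, execution_id: str) -> Optional[str]:
        """Handle a reference request; None means nothing has been recorded."""
        row = self._db.execute(
            "SELECT last_completed_step FROM execution_state "
            "WHERE execution_id = ?",
            (execution_id,),
        ).fetchone()
        return row[0] if row else None
```

The default in-memory database is visible only to the process that created it, so it serves for illustration only. Nodes running in separate environments would need a database that all of them can reach.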

FIG. 2 is a block diagram showing a more detailed functional configuration of the execution manager unit 2. As illustrated, the execution manager unit 2 includes a job net execution control unit 21, a re-execution count control unit 22, and an alert notification unit 23.

The job net execution control unit 21 controls execution of a job net, in which the execution order of jobs is defined.

The re-execution count control unit 22 records the number of times a job has been re-executed and makes that count available for reference.

The alert notification unit 23 outputs an alert when the number of re-executions of a job exceeds a predetermined maximum value. Outputting this alert notifies the user that the number of re-executions has exceeded the predetermined value.

FIG. 3 is a block diagram showing a more detailed functional configuration of the message broker unit 3. As illustrated, the message broker unit 3 includes an execution request queue management unit 31, an execution result queue management unit 32, and a transaction management unit 33.

The execution request queue management unit 31 manages a queue for holding job execution request messages (hereinafter sometimes simply called "job execution requests") passed from the execution manager unit 2, as well as other execution-request messages. This queue is a first-in, first-out memory and resides in the execution request queue management unit 31. Job execution requests stored in the queue are read out in order and passed to an execution node unit 5.

The execution result queue management unit 32 manages a queue for holding job execution result messages (hereinafter sometimes simply called "job execution results") returned from the execution node units 5, as well as other execution-result messages. This queue is a first-in, first-out memory and resides in the execution result queue management unit 32. Job execution results stored in the queue are read out in order and passed to the execution manager unit 2.
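The two first-in, first-out queues can be illustrated with a minimal in-process sketch built on `collections.deque`. The class `MessageBroker`, its method names, and the dictionary message format are hypothetical. A real deployment would more likely use a message broker product, and transaction management is not modeled here.

```python
from collections import deque
from typing import Deque, Optional


class MessageBroker:
    """Hypothetical broker holding one request queue and one result queue."""

    def __init__(self) -> None:
        self._requests: Deque[dict] = deque()  # manager -> execution nodes
        self._results: Deque[dict] = deque()   # execution nodes -> manager

    def put_request(self, message: dict) -> None:
        self._requests.append(message)

    def take_request(self) -> Optional[dict]:
        """Oldest pending request, or None if the queue is empty."""
        return self._requests.popleft() if self._requests else None

    def put_result(self, message: dict) -> None:
        self._results.append(message)

    def take_result(self) -> Optional[dict]:
        """Oldest pending result, or None if the queue is empty."""
        return self._results.popleft() if self._results else None
```

Because any idle node may call `take_request`, a re-execution request placed in the queue can be picked up by a node other than the one that failed.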

The transaction management unit 33 has a function of managing transactions in the process of passing job execution request and job execution result messages.

FIG. 4 is a block diagram showing a more detailed functional configuration of the execution node unit 5. As illustrated, each execution node unit 5 includes a job flow definition unit 51, a job execution control unit 52, a job re-execution control unit 53, an execution state recording unit 54, an execution state reference unit 55, and an execution state confirmation unit 56.

The job flow definition unit 51 has a function of holding or referring to the definition of a job's processing flow.
The job execution control unit 52 executes jobs and controls their execution status.
The job re-execution control unit 53 performs control relating to re-execution of a job. This control includes confirming whether the job is still being executed in another execution node unit 5, confirming whether the job has completed, and determining the restart point of the job.

The execution state recording unit 54 has a function of recording the execution state of a job in the execution state sharing unit 6. Here, the execution state of a job is information indicating up to which part (step) of the job flow has been executed.
The execution state reference unit 55 refers to the job execution state information recorded in the execution state sharing unit 6.

The execution state confirmation unit 56 determines whether a particular job is being executed, based on the job execution state information referred to by the execution state reference unit 55. For example, the execution state confirmation unit 56 acquires, via the execution state reference unit 55, the execution states of other jobs executed before and after the particular job, and determines from that execution state information whether the particular job is being executed.
Details of the processing performed by the execution state confirmation unit 56 will be described later with reference to another drawing.

FIG. 5 is a sequence chart showing an example of an operation procedure in the job execution control device 1. The figure shows the operation of each unit in the job execution control device 1 and the interactions (exchanges of data and the like) among those units. The operation procedure is described below along this sequence chart.
First, in step S1, the execution manager unit 2 sends a job execution request to the message broker unit 3. The job execution request contains execution ID information. In response, the message broker unit 3 writes the received job execution request to the execution request queue, where it is held temporarily.
In step S1, it is the execution request queue management unit 31 of the message broker unit 3 that writes the job execution request message to the execution request queue.

Next, in step S1.1, one execution node unit 5 (execution node unit A) receives the job execution request from the message broker unit 3. That is, the message broker unit 3 takes the job execution request out of the execution request queue and transmits it to that execution node unit 5 (execution node unit A).
In step S1.1, it is the execution request queue management unit 31 of the message broker unit 3 that transmits the job execution request to the execution node unit 5.

Next, in step S2, the execution node unit 5 (execution node unit A) that received the job execution request in step S1.1 executes the job. The job executed here is the one identified by the execution ID contained in the job execution request.
In step S2, it is the job execution control unit 52 of the execution node unit 5 that controls execution of the job.

Then, in step S3, the execution node unit 5 (execution node unit A) writes the execution state of the job to the execution state sharing unit 6. This execution state information includes the execution ID. As a result, the execution state information of the job with that execution ID is recorded in the execution state sharing unit 6 in a shareable form.
In step S3, it is the execution state recording unit 54 of the execution node unit 5 that writes the execution state of the job to the execution state sharing unit 6.

Next, in step S4, in the flow shown in this sequence chart, the execution node unit 5 (execution node unit A) fails to execute the job. That is, the job terminates abnormally because of some error in processing on the computer.

Next, in step S5, the execution node unit 5 (execution node unit A), in response to the job execution failure in step S4, returns a rollback message to the message broker unit 3. This rollback message includes the execution ID of the failed job. The message broker unit 3 writes the rollback message to the execution result queue.
In step S5, it is the execution result queue management unit 32 of the message broker unit 3 that writes the rollback message to the execution result queue.

Next, in step S5.1, the execution manager unit 2 receives the rollback message from the message broker unit 3. That is, the message broker unit 3 takes the rollback message out of the execution result queue and transmits it to the execution manager unit 2. On receiving the rollback message, the execution manager unit 2 records the number of executions of the job identified by the execution ID. That is, the execution manager unit 2 increments the execution count of the job with that execution ID by one.
At this point, the execution manager unit 2 determines whether the number of re-executions of the job with that execution ID has exceeded a predetermined specified value. If the number of re-executions has not yet exceeded the specified value, the process proceeds to step S6. If the number of re-executions has exceeded the specified value, the process proceeds to step S8.
In step S5.1, it is the re-execution count control unit 22 of the execution manager unit 2 that records the number of re-executions and controls the branching of processing based on whether that number exceeds the specified value. It is the execution result queue management unit 32 of the message broker unit 3 that transmits the rollback message to the execution manager unit 2.

Next, when the process has proceeded to step S6, the execution manager unit 2 sends a job re-execution message to the message broker unit 3 in that step. The re-execution message contains execution ID information. In response, the message broker unit 3 writes the received re-execution message to the execution request queue, where it is held temporarily.
In step S6, it is the execution request queue management unit 31 of the message broker unit 3 that writes the re-execution message to the execution request queue.

Next, in step S6.1, one execution node unit 5 (execution node unit B) receives the re-execution message from the message broker unit 3. That is, the message broker unit 3 takes the re-execution message out of the execution request queue and transmits it to that execution node unit 5 (execution node unit B). Here, execution node unit B is a node different from execution node unit A.
In step S6.1, it is the execution request queue management unit 31 of the message broker unit 3 that transmits the re-execution message to the execution node unit 5.

Next, in step S6.1.1, the execution node unit 5 (execution node unit B) that received the re-execution message refers to the execution state of the job identified by the specified execution ID. Specifically, the execution node unit 5 (execution node unit B) refers to the execution state of the job recorded in the execution state sharing unit 6.
In step S6.1.1, it is the execution state reference unit 55 of the execution node unit 5 that refers to the execution state of the job recorded in the execution state sharing unit 6.

Next, in step S6.1.2, the execution node unit 5 (execution node unit B) confirms the job execution state based on the result of referring to the execution state sharing unit 6 in step S6.1.1. The execution node unit 5 (execution node unit B) then determines whether the job identified by the specified execution ID is being executed. If the job is not being executed, the process proceeds to step S6.1.3. If the job is being executed, the process proceeds to step S6.1.5.
In step S6.1.2, it is the execution state confirmation unit 56 of the execution node unit 5 that confirms the execution state of the job.

Next, when the process proceeds to step S6.1.3, the execution node unit 5 (execution node unit B) determines a restart point in that step. Specifically, the execution node unit 5 (execution node unit B) determines the restart point based on the job execution state referred to above.
In step S6.1.3, the job re-execution control unit 53 of the execution node unit 5 determines the restart point.

Next, in step S6.1.3.1, the execution node unit 5 (execution node unit B) re-executes the job from the restart point determined in step S6.1.3.
In step S6.1.3.1, the job re-execution control unit 53 of the execution node unit 5 controls this re-execution.

Next, in step S6.1.4, after the re-executed job ends, the job execution result is returned to the message broker unit 3. This job execution result includes the execution ID of the job.

Next, in step S6.2, the execution manager unit 2 receives the job execution result from the message broker unit 3. That is, the message broker unit 3 takes the job execution result out of the execution result queue and transmits it to the execution manager unit 2.

Next, in step S7, the execution manager unit 2 receives the job execution result and performs execution-completion processing for the job specified by the execution ID. That is, the jobnet execution control unit 21 of the execution manager unit 2 recognizes that execution of the job has completed and updates the status information that it manages internally for job control.

This completes the processing for the branch to step S6.1.3 (the processing performed when the check of the job execution state shows that the job was not being executed).

Next, when the process proceeds to step S6.1.5, the execution node unit 5 (execution node unit B) decides in that step not to re-execute the job. This processing is performed by the job re-execution control unit 53 of the execution node unit 5.
This completes the processing for the branch to step S6.1.5 (the processing performed when the check of the job execution state shows that the job was being executed).

Next, when the process proceeds to step S8, the execution manager unit 2 performs execution-failure processing for the job specified by the execution ID, on the basis that execution of the job failed even though re-execution was repeated beyond the designated value. That is, the jobnet execution control unit 21 of the execution manager unit 2 recognizes that execution of the job has finally failed and updates the status information that it manages internally for job control.
This completes the processing for the branch to step S8 (the processing performed when the number of re-executions exceeds the designated value).

This completes the processing of the entire sequence diagram.

FIG. 6 is a schematic diagram showing a configuration example of data that defines a job flow. This job flow definition data is managed by the jobnet execution control unit 21 of the execution manager unit 2. However, the job flow definition data is held in a form that the other units in the job execution control device 1 can also refer to. The job flow definition data is created in advance by a user (for example, a system operation administrator) and input to the job execution control device 1.
The left side of the figure is a chart that visually represents the processing order of the steps constituting a job. The processing order of the steps is expressed as a partial order. The right side of the figure is the text data of the job flow definition corresponding to the chart. This text data is held in the job execution control device 1 in a computer-processable format such as XML or JSON. XML is an abbreviation of Extensible Markup Language, and JSON is an abbreviation of JavaScript Object Notation. JavaScript is a registered trademark.

In the illustrated example of the job flow definition, there are five steps (job steps), "Step1" through "Step5". The order relationship between the steps is as follows. Step2 follows Step1. Step3 follows Step1. No order is imposed between Step2 and Step3. Step4 follows both Step2 and Step3. Step5 follows Step4.

In the illustrated text data, the entity job has an attribute id. This attribute id is the "execution ID" already described. The entity job defines a job flow by containing an entity sequence or an entity parallel beneath it. An entity sequence can contain entity step and entity parallel beneath it. An entity parallel can contain entity step and entity sequence beneath it.

The entity sequence indicates that a plurality of elements are to be processed sequentially. That is, its subordinate elements (entity step and entity parallel) are defined to have a preceding-following relationship in the order in which they are written.
The entity parallel indicates that a plurality of elements are to be processed in parallel. That is, for its subordinate elements (entity step and entity sequence), no preceding-following relationship is defined between them, and they may be executed in parallel.
The entity step has an attribute id. This id is information that identifies the step. Each entity step corresponds to a processing step in the job. In the definition of a job flow, a step is the smallest unit of processing.

In the illustrated text data, one element job (id = job1) has one element sequence (sequential processing). That sequence has an element step (id = step1), an element parallel, an element step (id = step4), and an element step (id = step5). That is, this description indicates that step1, the element parallel, step4, and step5 are to be executed in this order. The element parallel has an element step (id = step2) and an element step (id = step3). That is, this description indicates that step2 and step3 can be processed in parallel. As described above, this text data representation is consistent with the processing order chart shown on the left side of FIG. 6.
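As an illustration only, the following Python sketch shows how the order relationships described above could be derived from such a definition. The XML text is reconstructed from the description rather than copied from FIG. 6, and the function names `collect` and `step_dependencies` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, reconstructed from the description of FIG. 6.
JOB_FLOW_XML = """
<job id="job1">
  <sequence>
    <step id="step1"/>
    <parallel>
      <step id="step2"/>
      <step id="step3"/>
    </parallel>
    <step id="step4"/>
    <step id="step5"/>
  </sequence>
</job>
"""

def collect(node, preds, deps):
    """Walk one flow element.

    preds: step ids that must finish before this element may start.
    deps:  dict filled with step id -> set of direct predecessor step ids.
    Returns the step ids that finish last inside this element.
    """
    if node.tag == "step":
        deps[node.get("id")] = set(preds)
        return {node.get("id")}
    if node.tag == "sequence":
        # Children run one after another: each child waits for the previous one.
        for child in node:
            preds = collect(child, preds, deps)
        return preds
    if node.tag == "parallel":
        # Children share the same predecessors and have no order among themselves.
        last = set()
        for child in node:
            last |= collect(child, preds, deps)
        return last or preds
    raise ValueError(f"unexpected element: {node.tag}")

def step_dependencies(xml_text):
    """Return (job id, {step id: set of direct predecessors})."""
    job = ET.fromstring(xml_text.strip())
    deps = {}
    for child in job:
        collect(child, set(), deps)
    return job.get("id"), deps

job_id, deps = step_dependencies(JOB_FLOW_XML)
```

With this input, step4 ends up depending on both step2 and step3, which matches the partial order shown in the chart.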

FIG. 7 is a schematic diagram showing a configuration example of the job execution state information held by the execution state sharing unit 6. As illustrated, the job execution state information is held in the execution state sharing unit 6 as tabular data. The job execution state data is managed using, for example, a database management system (DBMS) and is stored in persistent storage means such as a magnetic hard disk device or semiconductor memory. The job execution state data includes the items execution ID, step ID (StepID), input, and output. The execution ID is information that identifies a job. The step ID is information that identifies a step within the job. The input is information that specifies the input data to the step. The output is information that specifies the output data from the step.

In the illustrated example, there are three rows of data whose execution ID is 100 and whose step IDs are Step1, Step2, and Step3. There is no data whose execution ID is 100 and whose step ID is anything else. This indicates that, for the job with execution ID 100, Step1, Step2, and Step3 have already been completed, while the other steps (for example, Step4) have not. In this way, the execution state sharing unit 6 holds data representing the execution state of a job, including its progress.
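A minimal sketch of such a table is shown below, assuming an SQLite database stands in for the DBMS. The table name, column names, and the input and output values are hypothetical placeholders, since the text above does not give them.

```python
import sqlite3

def open_state_store():
    """Create an in-memory table with the four items described for FIG. 7."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE execution_state ("
        "execution_id INTEGER, step_id TEXT, input TEXT, output TEXT, "
        "PRIMARY KEY (execution_id, step_id))"
    )
    return conn

def record_step(conn, execution_id, step_id, input_ref, output_ref):
    """Record that one step of a job has finished, with its input and output."""
    conn.execute(
        "INSERT INTO execution_state VALUES (?, ?, ?, ?)",
        (execution_id, step_id, input_ref, output_ref),
    )
    conn.commit()

def completed_steps(conn, execution_id):
    """Return the set of step IDs recorded for the given execution ID."""
    rows = conn.execute(
        "SELECT step_id FROM execution_state WHERE execution_id = ?",
        (execution_id,),
    )
    return {row[0] for row in rows}

conn = open_state_store()
for step in ("Step1", "Step2", "Step3"):
    # The input/output references below are placeholders, not values from FIG. 7.
    record_step(conn, 100, step, f"in-{step}", f"out-{step}")
done = completed_steps(conn, 100)
```

Because a row exists only for a finished step, the absence of a Step4 row is what signals that Step4 has not completed.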

FIG. 8 is a schematic diagram for explaining how the execution state of a job is referred to and confirmed.
For example, even when an execution node unit 5 temporarily cannot be accessed because of a network failure or the like, the job may still be running on that execution node unit 5. Therefore, when another execution node unit 5 takes over the job and re-executes it, the execution state on the execution node unit 5 where the job was originally running is confirmed first.

Part (a) of the figure shows how the execution state is shared between execution node units. In the example shown here, one execution node unit 5 (referred to as execution node unit A) executes a job and records its execution state in the execution state sharing unit 6. When execution node unit A becomes temporarily unable to provide service because of some failure, or when it can be judged to be unable to provide service based on information obtained over the network, re-execution of the job is attempted on another execution node unit 5 (referred to as execution node unit B). At that time, execution node unit B refers to the execution state information recorded in the execution state sharing unit 6, using the execution ID of the job as a key, and confirms the execution state of the job.

Part (b) of the figure shows how execution node unit B confirms the execution state in the above situation. Specifically, execution node unit B acquires the execution state information of the target job from the database held by the execution state sharing unit 6 at each of a plurality of times separated by a predetermined interval (for example, two times t1 and t2). The length of the interval (the difference between t2 and t1) is determined as appropriate based on, for example, the time required to execute one job step. Execution node unit B then compares the execution state at time t1 with the execution state at time t2. As a result of this comparison, the following three patterns can be detected.

In pattern 1, the execution state referred to at time t1 differs from the execution state referred to at time t2. This means that execution node unit A updated the record of the execution state between time t1 and time t2. In other words, execution node unit A is still executing the job. In this case, execution node unit B waits until execution node unit A completes the job. If execution node unit A is unable to transmit the job execution result, execution node unit B transmits the job execution result to the message broker unit 3 instead.

In pattern 2, the execution state referred to at time t1 is equal to the execution state referred to at time t2, and the end of the job flow has already been reached. Whether the end of the job flow has been reached can be determined by comparing the job flow definition data with the execution state data held by the execution state sharing unit 6. In pattern 2, execution of the job has already completed, but the job execution result has not yet been transmitted from execution node unit A. In this case, therefore, execution node unit B only performs the processing of transmitting the job execution result to the message broker unit 3.

In pattern 3, the execution state referred to at time t1 is equal to the execution state referred to at time t2, and the end of the job flow has not yet been reached. Pattern 3 means that the job has completed up to some point partway through and that the remainder of the job is not being executed. Execution node unit B therefore determines an appropriate job restart point based on the execution state information and re-executes the job.
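The three patterns can be sketched as follows. This is an illustrative Python outline, not the embodiment's implementation: the execution state is modeled as the set of completed step IDs, and the names `Outcome`, `classify`, `check_job`, and the `read_state` callback are all hypothetical.

```python
import time
from enum import Enum

class Outcome(Enum):
    STILL_RUNNING = "pattern 1: wait for execution node unit A"
    FINISHED_UNREPORTED = "pattern 2: only send the job execution result"
    STALLED = "pattern 3: resume from a restart point"

def classify(state_t1, state_t2, all_steps):
    """Compare two snapshots of completed step IDs against the job flow."""
    if set(state_t1) != set(state_t2):
        # The record changed between t1 and t2, so someone is still executing.
        return Outcome.STILL_RUNNING
    if set(all_steps) <= set(state_t2):
        # Nothing changed and every step of the flow is already recorded.
        return Outcome.FINISHED_UNREPORTED
    # Nothing changed and some steps are missing.
    return Outcome.STALLED

def check_job(read_state, all_steps, interval_s, sleep=time.sleep):
    """Read the shared state twice, interval_s seconds apart, and classify."""
    state_t1 = read_state()
    sleep(interval_s)
    state_t2 = read_state()
    return classify(state_t1, state_t2, all_steps)
```

Passing `sleep` as a parameter is only a convenience so that the interval can be skipped when the function is exercised in a test.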

The method of determining the restart point in pattern 3 is described below.
Given the job flow definition illustrated in FIG. 6 and the execution state data illustrated in FIG. 7 (the record of each step's input and output), the execution state data contains no input/output record for Step4. It can therefore be determined that processing failed at Step4, and the point at which to resume is the beginning of Step4. Accordingly, the job re-execution control unit 53 of execution node unit B sets Step4 as the restart point.
In other words, the execution node unit 5 identifies the unexecuted part of a job based on the execution state data of the job as executed by another execution node unit 5, and sets that unexecuted processing as the restart point. The execution node unit 5 then re-executes the job from the determined restart point.
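One possible way to compute the restart point is sketched below in Python. The rule used here (a step is a restart point when it has no record yet and all of its direct predecessors do) is an assumption that generalizes the Step4 example; the dependency table is written out by hand from the order relationships of FIG. 6, and the function name `restart_points` is hypothetical.

```python
# Direct predecessors of each step, written out from the order
# relationships described for FIG. 6 (lowercase IDs are used throughout).
DEPS = {
    "step1": set(),
    "step2": {"step1"},
    "step3": {"step1"},
    "step4": {"step2", "step3"},
    "step5": {"step4"},
}

def restart_points(deps, completed):
    """Return steps that are not yet recorded but whose predecessors all are.

    Assumption: a step with no record in the shared execution state must be
    re-run from its beginning, as in the Step4 example.
    """
    completed = set(completed)
    return sorted(
        step
        for step, preds in deps.items()
        if step not in completed and preds <= completed
    )

# State corresponding to FIG. 7: step1 to step3 are recorded.
resume_from = restart_points(DEPS, {"step1", "step2", "step3"})
```

If only step1 and step2 had been recorded, the same function would return step3, since step4 cannot start until both branches of the parallel part are complete.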

In the example above, a boundary between steps is used as the restart point, but processing may instead be resumed from the unprocessed part in the middle of a step. In that case, information indicating how far execution has progressed within the step is included in the execution state data.

FIG. 9 is a schematic diagram showing text data that defines job re-execution settings. With the illustrated data, when a job is started, it is possible to specify for each job whether re-execution is permitted and, when it is permitted, the upper limit on the number of re-executions (retry limit, maximum number of re-executions).

In the figure, the element job has an attribute id (id = job1). The element job has an element jobnet beneath it. A jobnet is a processing unit that includes a plurality of jobs. The element jobnet has an element sequence beneath it. The element sequence causes its subordinate elements to be processed one after another in series. The element sequence has three elements invoke beneath it. These three invoke elements have jobA, jobB, and jobC, respectively, as the attribute jobId (job identification information). Each invoke element can specify whether re-execution is permitted by means of the attribute retriable. When the value of retriable is true, re-execution is permitted; when the value is false, re-execution is not permitted. When re-execution is permitted, the upper limit on the number of re-executions can be specified by the attribute retryLimit.

In the illustrated example, the job whose jobId is "jobA" can be re-executed, and its re-execution upper limit is set to 5. The job whose jobId is "jobB" can be re-executed, and its re-execution upper limit is set to 3. The job whose jobId is "jobC" is set so that it cannot be re-executed.
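For illustration, the settings described above could be read as in the following Python sketch. The XML text is reconstructed from the description rather than copied from FIG. 9, and the function name `retry_settings` and the convention of reporting a limit of 0 for a non-retriable job are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, reconstructed from the description of FIG. 9.
RETRY_XML = """
<job id="job1">
  <jobnet>
    <sequence>
      <invoke jobId="jobA" retriable="true" retryLimit="5"/>
      <invoke jobId="jobB" retriable="true" retryLimit="3"/>
      <invoke jobId="jobC" retriable="false"/>
    </sequence>
  </jobnet>
</job>
"""

def retry_settings(xml_text):
    """Return {jobId: (retriable, retry limit)} for every invoke element."""
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for invoke in root.iter("invoke"):
        retriable = invoke.get("retriable", "false") == "true"
        # Assumption: a job that cannot be re-executed is given a limit of 0.
        limit = int(invoke.get("retryLimit", "0")) if retriable else 0
        settings[invoke.get("jobId")] = (retriable, limit)
    return settings

settings = retry_settings(RETRY_XML)
```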

The job execution control device 1 controls job re-execution in accordance with the text data shown in FIG. 9 and described above.

FIG. 10 is a schematic diagram showing how the number of re-executions is controlled in the job execution control device 1. In the figure, a plurality of execution node units 5 can take over and re-execute, one after another, the job whose execution ID is "jobA". When the first execution node unit 5 fails to execute the job, that execution node unit 5 sends a rollback message, via the message broker unit 3, to the re-execution count control unit 22 in the execution manager unit 2. The re-execution count control unit 22 counts how many times the job "jobA" has been re-executed, determines whether the re-execution upper limit has been exceeded, and sends out a re-execution message based on that determination. The re-execution message is delivered to another execution node unit 5 via the message broker unit 3. The execution node unit 5 that receives the re-execution message re-executes the job "jobA". The same applies to the second and subsequent execution failures, and the rollback and re-execution process can be repeated until the re-execution upper limit is exceeded.
If, for some reason such as a failure, the number of re-executions exceeds the designated re-execution upper limit, the re-execution count control unit 22 in the execution manager unit 2 performs no further re-execution and ends the job as an execution failure. At this time, the re-execution count control unit 22 notifies the alert notification unit 23 of information indicating the execution failure together with the execution ID of the job. The alert notification unit 23 outputs alert information reporting the execution failure of the job, together with the execution ID. From this alert information, the user (a system administrator, system operator, or the like) learns that execution of the job has failed.

FIG. 11 is a schematic diagram showing a configuration example of the re-execution count information that the re-execution count control unit 22 uses to control the number of re-executions. As illustrated, the re-execution count information can be realized as tabular data and has the data items execution ID and re-execution count (RetryCount). That is, a re-execution count can be held in association with an execution ID. This re-execution count information is managed by, for example, a database management system (DBMS) accessible from the execution manager unit 2.

In the illustrated data example, the re-execution count at that point is 4 for the job whose execution ID is "100", 2 for the job whose execution ID is "111", and 5 for the job whose execution ID is "120". The re-execution count control unit 22 counts the number of re-executions by updating this re-execution count information as appropriate.
As described above, if the number of re-executions is at or below the designated threshold (the re-execution upper limit), the re-execution count control unit 22 sends a re-execution request so that the job is executed again. If the number of re-executions exceeds the designated threshold (the re-execution upper limit), recovery through further re-execution is unlikely, so the job is not re-executed and the alert notification unit 23 sends an alert notification to the user.
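The threshold check can be sketched as follows. This Python outline is illustrative only: the class name `RetryCounter`, its attributes, and in particular the order of operations (the count is incremented on each failure and then compared with the upper limit) are assumptions, since the text above does not fix the exact moment at which the count is updated.

```python
class RetryCounter:
    """Hypothetical stand-in for the re-execution count control."""

    def __init__(self, retry_limits):
        self.retry_limits = dict(retry_limits)  # execution ID -> upper limit
        self.retry_counts = {}                  # execution ID -> RetryCount
        self.alerts = []                        # execution IDs reported as failed

    def on_failure(self, execution_id):
        """Handle one execution failure.

        Returns True when a re-execution request should be sent, and False
        when the upper limit is exceeded and an alert is raised instead.
        """
        count = self.retry_counts.get(execution_id, 0) + 1
        self.retry_counts[execution_id] = count
        # Assumption: a job without a configured limit is not retriable.
        if count <= self.retry_limits.get(execution_id, 0):
            return True
        self.alerts.append(execution_id)
        return False

counter = RetryCounter({"100": 2})
decisions = [counter.on_failure("100") for _ in range(3)]
```

With an upper limit of 2, the first two failures lead to re-execution requests and the third leads to an alert.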

In the embodiment described above, the function of the re-execution count control unit is provided in the execution manager unit 2, but the following modification may be adopted instead.
In this modification, the execution node unit 5, not the execution manager unit 2, has the function of controlling the number of re-executions (the function of the re-execution control unit).
Also in this modification, the job flow definition of each job can specify whether the job may be re-executed and, if so, the upper limit on the number of re-executions.

FIG. 12 is a schematic diagram showing a configuration example of the job flow definition text data in this modification. Like the text data shown in FIG. 2, this job flow definition data defines the processing order of the steps from step1 to step5. The distinguishing feature of this modification is that, as written in the first line of the data, the element job has the attribute retriable and the attribute retryLimit. In the illustrated example, "retriable = true" (indicating that re-execution is permitted) and "retryLimit = 5" (indicating that the re-execution upper limit is 5) are written.

In this modification, the count of the number of re-executions of each job (data equivalent to that shown in FIG. 11) is stored in a database accessible from each execution node unit 5 and is shared among the nodes. As one example, the count of the number of re-executions of each job may be held in the execution state sharing unit 6. The execution node unit 5 then performs the same re-execution control as already described, based on these re-execution counts and the re-execution upper limit.

According to at least one embodiment described above, when the execution manager unit 2 detects that execution of a job on an execution node unit 5 has failed, it sends out a re-execution request for that job. When an execution node unit 5 receives a job re-execution request from the execution manager unit 2, it grasps the execution state of the job by referring to the execution state written in the execution state sharing unit 6, confirms whether the job is still being executed on another execution node unit 5, and confirms whether processing of the job has completed. If the job is no longer being executed on the other execution node unit 5 and processing of the job has not completed, the execution node unit 5 re-executes the job. In this way, a job can be re-executed across different nodes.

When confirming whether processing of a job has completed, the execution node unit 5 refers to the information in the execution state sharing unit 6 indicating which steps of the job have completed, and also refers to the definition information of the steps constituting the job, thereby confirming whether all the steps of the job have been processed. This makes it possible to determine whether processing has completed for a job that was being executed on a node that may have failed.

When confirming whether a job is still being executed on another execution node unit 5, the execution node unit 5 refers to the execution state of the job in the execution state sharing unit 6 at least twice, separated by a predetermined time interval, and determines whether the job is still being executed on the other execution node unit 5 by judging whether the execution state changed between those references. This makes it possible to confirm whether a job that appears to have failed, for example because of a network failure, is in fact still being executed.

The message broker unit 3 temporarily stores the execution requests and re-execution requests sent out from the execution manager unit 2 in a queue, and passes each execution request or re-execution request taken out of the queue to one of the execution node units 5. This makes it possible to distribute execution requests and the like among a plurality of execution node units 5.
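The queueing behavior can be pictured with the following small Python sketch. It is a simplified, single-process illustration: the names `RequestQueue` and `distribute` are hypothetical, and handing requests to nodes in round-robin order is an assumption, since the description only states that each request is passed to one of the execution node units 5.

```python
from collections import deque

class RequestQueue:
    """First-in, first-out store for execution and re-execution requests."""

    def __init__(self):
        self._queue = deque()

    def put(self, request):
        self._queue.append(request)

    def take(self):
        """Return the oldest request, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None

def distribute(queue, node_names):
    """Hand queued requests to nodes in turn (round robin is an assumption)."""
    assignments = []
    index = 0
    while True:
        request = queue.take()
        if request is None:
            return assignments
        assignments.append((node_names[index % len(node_names)], request))
        index += 1

queue = RequestQueue()
queue.put(("execute", 100))
queue.put(("re-execute", 100))
queue.put(("execute", 111))
assignments = distribute(queue, ["A", "B"])
```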

When an execution node unit 5 receives a job re-execution request from the execution manager unit 2, it grasps the execution state of the job by referring to the execution state written in the execution state sharing unit 6. If the job is no longer being executed on another execution node unit 5 and processing of the job has not yet completed, the execution node unit 5 refers to the information in the execution state sharing unit 6 indicating which steps of the job have completed, and also refers to the definition information of the steps constituting the job, thereby identifying the unprocessed steps among the steps constituting the job. It then re-executes the job using the identified unprocessed step as the restart point. This enables re-execution from an appropriate point and suppresses wasteful consumption of computing resources.

The execution manager unit 2 counts the number of times a job has been re-executed and refers to a pre-specified upper limit on the number of re-executions. If the number of re-executions would exceed the upper limit, the execution manager unit 2 sends no further re-execution requests and outputs an alert indicating that execution of the job has failed. This keeps the number of re-executions under control; in other words, it prevents a job from being re-executed endlessly.
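A minimal sketch of this retry cap follows. It assumes a per-job counter held in memory and an alert recorded as a string; the class `RetryLimiter` and its members are hypothetical and only illustrate the counting logic.

```python
from typing import Dict, List


class RetryLimiter:
    """Counts re-executions per job and stops once the upper limit is reached."""

    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self._counts: Dict[str, int] = {}
        self.alerts: List[str] = []

    def request_reexecution(self, job_id: str) -> bool:
        """Return True if a re-execution request may be sent for job_id."""
        count = self._counts.get(job_id, 0) + 1
        if count > self.max_retries:
            # Limit exceeded: send nothing further and raise an alert instead.
            self.alerts.append(f"job {job_id} failed after {self.max_retries} re-executions")
            return False
        self._counts[job_id] = count
        return True
```

With `max_retries=2`, the first two calls for a job return `True` and the third returns `False` while recording one alert.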

Note that all or part of the functions of the job execution control device 1 in the embodiment described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. Here, "computer system" includes an OS and hardware such as peripheral devices. "Computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, and USB memories, and to storage devices such as hard disks built into a computer system. "Computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as the volatile memory inside the computer system acting as server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. They can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

Description of symbols: 1 ... job execution control device, 2 ... execution manager unit, 3 ... message broker unit, 5 ... execution node unit, 6 ... execution state sharing unit, 21 ... job net execution control unit, 22 ... re-execution count control unit, 23 ... alert notification unit, 31 ... execution request queue management unit, 32 ... execution result queue management unit, 33 ... transaction management unit, 51 ... job flow definition unit, 52 ... job execution control unit, 53 ... job re-execution control unit, 54 ... execution state recording unit, 55 ... execution state reference unit, 56 ... execution state confirmation unit

Claims

A job execution control device comprising:
an execution manager unit that sends execution requests and re-execution requests for jobs;
a plurality of execution node units that execute a requested job based on the execution request or the re-execution request from the execution manager unit; and
an execution state sharing unit that holds the execution state of the job in the execution node units,
wherein the execution node unit writes the execution state of the job being executed to the execution state sharing unit, and
wherein the execution node unit, upon detecting that execution of the job in an execution node unit has ended abnormally or that the execution node unit has become temporarily inaccessible due to a network failure, ascertains the execution state of the job by referring to the execution state written in the execution state sharing unit, confirms whether the job is still being executed by another execution node unit and whether processing of the job has been completed, and re-executes the job if the job is no longer being executed by another execution node unit and processing of the job has not been completed.

The job execution control device according to claim 1, wherein, when confirming whether processing of the job has been completed, the execution node unit refers to information in the execution state sharing unit indicating up to which step of the job has been completed, and refers to the definition information of the steps constituting the job, thereby confirming whether all steps of the job have been processed.

The job execution control device according to claim 1, wherein, when confirming whether the job is still being executed by another execution node unit, the execution node unit refers to the execution state of the job in the execution state sharing unit at least twice, at a predetermined time interval, and determines whether the job is still being executed by the other execution node unit by judging whether the execution state has changed between those reads.

The job execution control device according to any one of claims 1 to 3, further comprising a message broker unit that temporarily stores in a queue the execution requests and re-execution requests sent from the execution manager unit, and passes each execution request and re-execution request taken out of the queue to one of the execution node units.

The job execution control device according to any one of claims 1 to 4, wherein, when the execution node unit receives the re-execution request for the job from the execution manager unit, the execution node unit ascertains the execution state of the job by referring to the execution state written in the execution state sharing unit, and, if the job is no longer being executed by another execution node unit and processing of the job has not yet been completed, refers to information in the execution state sharing unit indicating up to which step of the job has been completed, refers to the definition information of the steps constituting the job, thereby identifies the unprocessed steps among the steps constituting the job, and re-executes the job using the identified unprocessed step as the restart point.

The job execution control device according to any one of claims 1 to 5, wherein the execution manager unit counts the number of times the job has been re-executed, refers to a pre-specified upper limit on the number of re-executions of the job, and, if the number of re-executions exceeds the upper limit, sends no further re-execution requests and outputs an alert indicating that execution of the job has failed.

A program for causing a computer to function as a job execution control device comprising:
an execution manager unit that sends execution requests and re-execution requests for jobs;
a plurality of execution node units that execute a requested job based on the execution request or the re-execution request from the execution manager unit; and
an execution state sharing unit that holds the execution state of the job in the execution node units,
wherein the execution node unit writes the execution state of the job being executed to the execution state sharing unit, and
wherein the execution node unit, upon detecting that execution of the job in an execution node unit has ended abnormally or that the execution node unit has become temporarily inaccessible due to a network failure, ascertains the execution state of the job by referring to the execution state written in the execution state sharing unit, confirms whether the job is still being executed by another execution node unit and whether processing of the job has been completed, and re-executes the job if the job is no longer being executed by another execution node unit and processing of the job has not been completed.