JP5397076B2

JP5397076B2 - Job execution apparatus, job execution method, and job execution program

Info

Publication number: JP5397076B2
Application number: JP2009183603A
Authority: JP
Inventors: 雅彦高木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-08-06
Filing date: 2009-08-06
Publication date: 2014-01-22
Anticipated expiration: 2029-08-06
Also published as: JP2011039595A

Description

The present invention relates to a job execution apparatus, a job execution method, and a job execution program.

In general, a batch processing system reads data sequentially from an input file according to a job, processes the data, and writes the results to an output file. If a failure occurs during processing in such a system, the output data is returned to its pre-processing state and the processing is executed again to recover the data. Patent Document 1 below discloses a technique for resuming processing after a failure without returning the output data to its initial, pre-processing state.

Patent Document 1: JP-A-6-12267

The technique described in Patent Document 1 stores the number of data items output during batch processing. When a failure occurs, the input data is read again from the beginning, as many input items as the stored output count are skipped, and processing resumes from the next input item to update the output file.
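
The skip-and-resume idea attributed to Patent Document 1 can be sketched as follows. This is only an illustration under stated assumptions: each input item yields exactly one output item, and the function name `resume_by_skip_count` and its parameters are hypothetical, not taken from either patent document.

```python
from typing import Callable, Iterable, List


def resume_by_skip_count(
    inputs: Iterable[str],
    outputs: List[str],
    process: Callable[[str], str],
) -> None:
    """Resume a batch run by skipping as many inputs as outputs already exist.

    Assumption: one output item per input item, so the stored output count
    equals the number of inputs that were fully processed before the failure.
    """
    already_done = len(outputs)  # stored count of output items
    for index, item in enumerate(inputs):
        if index < already_done:
            continue  # this input was processed before the failure
        outputs.append(process(item))
```

A single counter is enough here because there is only one job. The next paragraph explains why it stops being enough once several jobs run in parallel.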

However, some batch processing systems employ a batch processing acceleration method (PREST) that can execute multiple jobs in parallel. In such a system, once a preceding job has output part of its data, a subsequent job is executed in parallel using that output. Consequently, during batch processing, the processing state and the number of processed items differ for each job running in parallel. When a failure occurs in such a system, storing only the overall number of processed items does not reveal the per-job processing state or count, so all data whose processing was not finalized at the time of the failure must be returned to the initial state. Batch processing for that data then has to start again from the first job, so data recovery after the failure is resolved takes time.

The present invention has been made to solve the problem described above. Its object is to provide a job execution apparatus, a job execution method, and a job execution program that can shorten the data recovery time after failure recovery even when multiple jobs are executed in parallel.

The job execution apparatus of the present invention is a job execution apparatus capable of executing in parallel a first job and a second job that constitute batch processing. It comprises: a first job execution unit that executes the first job; a data storage unit that stores first data output as a result of the first job execution unit executing the first job; a second job execution unit that executes the second job, which processes the first data stored in the data storage unit; a data information storage unit that stores data information including data specifying information for identifying the first data being processed by the second job; and a dump unit that, when a failure occurs, dumps the first data stored in the data storage unit and the data information stored in the data information storage unit.

The job execution method of the present invention is a method executed in an apparatus capable of executing in parallel a first job and a second job that constitute batch processing. It includes: a first job execution step of executing the first job; a data storage step of storing first data output as a result of executing the first job in the first job execution step; a second job execution step of executing the second job, which processes the first data stored in the data storage step; a data information storage step of storing data information including data specifying information for identifying the first data being processed by the second job; and a dump step of dumping, when a failure occurs, the first data stored in the data storage step and the data information stored in the data information storage step.

The job execution program of the present invention causes a computer to execute each step included in the above job execution method.

According to the present invention, the data recovery time after failure recovery can be shortened.

FIG. 1 is a block diagram showing the functional configuration of the job execution apparatus in the embodiment. FIG. 2 is a diagram schematically showing the flow of data in the job execution apparatus. FIG. 3 is a flowchart explaining the operation when the agent unit transmits processing data to the subsequent job execution unit. FIG. 4 is a flowchart explaining the operation when the agent unit receives processing result data from the preceding job execution unit. FIG. 5 is a flowchart explaining the operation when a failure occurs. FIG. 6 is a flowchart explaining the operation when the agent unit receives a commit notification from the subsequent job execution unit.

Preferred embodiments of the job execution apparatus, job execution method, and job execution program according to the present invention are described below with reference to the accompanying drawings.

First, the configuration of the job execution apparatus in the embodiment is described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the functional configuration of the job execution apparatus. FIG. 2 is a diagram schematically showing the flow of data in the job execution apparatus. The job execution apparatus employs the batch processing acceleration method (PREST), which can execute multiple jobs in parallel.

As shown in FIG. 1, the job execution apparatus 1 has a plurality of agent units 10, a plurality of job execution units 20, an agent management unit 30, a memory 40, and a dump information file 50.

A job execution unit 20 is provided for each job and executes the job assigned to it. For convenience of explanation, this embodiment describes the case of two jobs. The job executed first is called the preceding job (first job), and the job that performs processing using the data output by the preceding job is called the subsequent job (second job). The job execution unit 20 that executes the preceding job is called the preceding job execution unit 20 (first job execution unit), and the job execution unit 20 that executes the subsequent job is called the subsequent job execution unit 20 (second job execution unit). Because there are two jobs, there is a single agent unit 10.

The agent unit 10 is provided between job execution units 20 and controls, among other things, the handover of data from the preceding job to the subsequent job. The agent management unit 30 manages all agent units 10 and, when a failure occurs, transmits to each agent unit 10 a failure occurrence notification indicating that a failure has occurred. The agent unit 10 is described in detail later.

The memory 40 has a processing data storage area 41 (data storage unit) and a processing data information storage area 42 (data information storage unit). As shown in FIG. 2, the processing data storage area 41 stores the processing data (first data) output from the preceding job execution unit 20. The processing data is stored so that it can be identified chunk by chunk. A chunk (data group) is a collection of multiple data items and is the unit of a transaction. In this embodiment, three data items form one chunk. Each data item is assigned a number identifying it (data specifying information), incremented in input order, and each chunk is assigned a number identifying it (data group specifying information), likewise incremented in input order. Processing data is registered in the processing data storage area 41 when it is output from the preceding job execution unit 20 and is deleted when a commit is received from the subsequent job execution unit 20.
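
The numbering scheme for the processing data storage area can be illustrated with a short sketch. The class and field names are hypothetical, and the choice to start both data numbers and chunk numbers at 1 is an assumption; the text only states that numbers are incremented in input order and that three data items form a chunk.

```python
from dataclasses import dataclass
from typing import Dict, List

CHUNK_SIZE = 3  # the embodiment groups three data items into one chunk


@dataclass
class ProcessingData:
    data_number: int    # data specifying information
    chunk_number: int   # data group specifying information
    payload: str
    is_chunk_tail: bool


class ProcessingDataStore:
    """Hypothetical stand-in for the processing data storage area 41."""

    def __init__(self) -> None:
        self._items: Dict[int, ProcessingData] = {}
        self._next_data_number = 1  # assumption: numbering starts at 1

    def register(self, payload: str) -> ProcessingData:
        """Register one item output by the preceding job."""
        number = self._next_data_number
        self._next_data_number += 1
        chunk = (number - 1) // CHUNK_SIZE + 1
        item = ProcessingData(number, chunk, payload, number % CHUNK_SIZE == 0)
        self._items[number] = item
        return item

    def chunk(self, chunk_number: int) -> List[ProcessingData]:
        """Return the stored items that belong to one chunk."""
        return [d for d in self._items.values() if d.chunk_number == chunk_number]
```

With this layout, data items 1 to 3 fall in chunk 1, items 4 to 6 in chunk 2, and so on, so a whole transaction can be located from a chunk number alone.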

The processing data information storage area 42 stores a processing data management table 421 and a subsequent-processing data number 422. The processing data management table 421 has, for example, the data items chunk number, chunk head data number, and chunk tail data number. The chunk number item stores the number of the chunk to which the processing data output from the preceding job execution unit 20 belongs. The chunk head data number item stores the number (data specifying information) assigned to the first processing data item of the chunk. The chunk tail data number item stores the number (data specifying information) assigned to the last processing data item of the chunk.

In the processing data management table 421, registration and deletion are performed in units of processing data management records (data information), each consisting of the data items above. A processing data management record is registered when the first processing data item of a chunk is output from the preceding job execution unit 20. At registration, the chunk tail data number of the record holds "-", indicating that no data exists yet. Later, when the last processing data item of the chunk is output from the preceding job execution unit 20, the number of that item is stored in the chunk tail data number of the record. In other words, a chunk tail data number of "-" indicates that some of the processing data belonging to that chunk is still being processed by the preceding job execution unit 20. A processing data management record is deleted when the last processing data item of the chunk has been processed by the subsequent job execution unit 20 and committed.
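
The life cycle of a processing data management record (registered on the chunk head, completed on the chunk tail, deleted on commit) might be modeled as in the sketch below. All names are hypothetical, and `None` is used here as a stand-in for the "-" placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagementRecord:
    chunk_number: int
    head_data_number: int
    tail_data_number: Optional[int] = None  # None plays the role of "-"


class ManagementTable:
    """Hypothetical stand-in for the processing data management table 421."""

    def __init__(self) -> None:
        self.records: List[ManagementRecord] = []

    def on_preceding_output(
        self, chunk_number: int, data_number: int, is_head: bool, is_tail: bool
    ) -> None:
        if is_head:
            # Register a record when the first item of a chunk is output.
            self.records.append(ManagementRecord(chunk_number, data_number))
        if is_tail:
            # Fill in the tail number of the record that is still open.
            for record in self.records:
                if record.tail_data_number is None:
                    record.tail_data_number = data_number
                    break

    def open_chunks(self) -> List[int]:
        """Chunks the preceding job has not finished outputting."""
        return [r.chunk_number for r in self.records if r.tail_data_number is None]

    def on_commit(self, committed_tail: int) -> None:
        """Delete records whose tail number is at or below the committed tail."""
        self.records = [
            r
            for r in self.records
            if r.tail_data_number is None or r.tail_data_number > committed_tail
        ]
```

Records with an open tail are never removed by a commit in this sketch, since a chunk that the preceding job has not finished cannot yet have been committed by the subsequent job.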

The subsequent-processing data number 422 stores the number (data information) assigned to the processing data item that, among the processing data stored in the processing data storage area 41, is currently being processed by the subsequent job execution unit 20.

The agent unit 10 is now described in detail with reference to FIG. 1. As shown in FIG. 1, the agent unit 10 has a processing result data receiving unit 110, an end code confirmation unit 111, a processing data update unit 112, a data request receiving unit 120, a processing target data management unit 121, a processing data acquisition unit 123, a data transmission unit 124, a commit notification receiving unit 130, a failure occurrence notification unit 140, a failure occurrence notification receiving unit 141, a dump unit 142, and a data recovery unit 143.

The processing result data receiving unit 110 receives processing result data from the preceding job execution unit 20. Processing result data is output from a job execution unit 20 for each data item input to that job execution unit 20. It contains the processing data output as a result of executing the job and the end code of the job. The processing data carries information indicating whether it is the tail data of a chunk. The end code carries information indicating whether the processing ended normally. If a failure occurs during job execution, the processing result data contains only the end code and no processing data.

The end code confirmation unit 111 determines, based on the end code contained in the processing result data received from the preceding job execution unit 20, whether the processing ended normally.

When the end code confirmation unit 111 determines that the processing ended normally, the processing data update unit 112 stores the processing data received from the preceding job execution unit 20 in the processing data storage area 41.

The processing data update unit 112 determines whether the processing data contained in the processing result data is the tail data of a chunk. If it is, the processing data update unit 112 searches for the data information management record whose chunk tail data number holds "-", and stores the number of the processing data received from the preceding job execution unit 20 in the chunk tail data number of the record it finds.

The data request receiving unit 120 receives, from the subsequent job execution unit 20, a data request asking for processing data to be transmitted. On receiving a data request, the data request receiving unit 120 notifies the processing target data management unit 121 of it.

The processing target data management unit 121 identifies the data that the subsequent job execution unit 20 processed most recently and determines the data to transmit to the subsequent job execution unit 20. Specifically, it refers to the subsequent-processing data number 422 stored in the processing data information storage area 42 to identify the number of the data most recently processed by the subsequent job execution unit 20. It then notifies the processing data acquisition unit 123 of that number plus 1 as the transmission target data information.

The processing data acquisition unit 123 determines whether data corresponding to the transmission target data information received from the processing target data management unit 121 is stored in the processing data storage area 41. If it is stored, the processing data acquisition unit 123 acquires it. If it is not stored, the processing data acquisition unit 123 waits for a predetermined time and then checks for the data again, repeating this as needed. If the data still cannot be acquired after the check has been repeated a predetermined number of times, the processing data acquisition unit 123 issues an error notification to that effect.

The data transmission unit 124 transmits the data acquired by the processing data acquisition unit 123 to the subsequent job execution unit 20. It also transmits any error notification issued by the processing data acquisition unit 123 to the subsequent job execution unit 20.

The commit notification receiving unit 130 receives commit notifications from the subsequent job execution unit 20. A commit notification is issued each time a job execution unit 20 finalizes processing for a chunk. The commit notification contains the number of the tail data in the chunk.

When a commit notification is received from the subsequent job execution unit 20, the processing target data management unit 121 refers to the chunk tail data numbers in the processing data management table 421 and deletes every processing data management record whose chunk tail data number is less than or equal to the tail data number contained in the commit notification.

When a commit notification is received from the subsequent job execution unit 20, the processing data update unit 112 refers to the numbers of the processing data stored in the processing data storage area 41 and deletes, chunk by chunk, the processing data whose number is less than or equal to the tail data number contained in the commit notification.

When the end code confirmation unit 111 determines that the processing ended abnormally, the failure occurrence notification unit 140 transmits a failure occurrence notification, indicating that a failure has occurred, to the agent management unit 30.

On receiving a failure occurrence notification, the failure occurrence notification receiving unit 141 stops the processing functions of the processing data update unit 112 and the processing target data management unit 121, and sends the dump unit 142 an instruction to execute dump processing.

The dump unit 142 dumps to the dump information file 50 the processing data stored in the processing data storage area 41, the processing data management records stored in the processing data management table 421, and the processing data number stored in the subsequent-processing data number 422.

The data recovery unit 143 stores the data held in the dump information file 50 back into the processing data storage area 41, the processing data management table 421, and the subsequent-processing data number 422, respectively, thereby restoring the data in the memory 40 to its state immediately before the failure.

Next, the operation of the job execution apparatus 1 in this embodiment is described with reference to the drawings. The operation when the agent unit transmits processing data to the subsequent job execution unit is described with reference to FIG. 3.

First, when the data request receiving unit 120 receives a data request from the subsequent job execution unit 20 (step S101), it notifies the processing target data management unit 121 that a data request has been received (step S102).

Next, the processing target data management unit 121 refers to the subsequent-processing data number 422 stored in the processing data information storage area 42 and identifies the number of the data most recently processed by the subsequent job execution unit 20 (step S103).

Next, the processing target data management unit 121 transmits the identified number plus 1 to the processing data acquisition unit 123 as the transmission target data information (step S104).

Next, the processing data acquisition unit 123 determines whether processing data corresponding to the transmission target data information received from the processing target data management unit 121 is stored in the processing data storage area 41 (step S105). If the determination is YES (step S105; YES), the processing data acquisition unit 123 acquires the corresponding processing data from the processing data storage area 41 (step S106).

Next, the data transmission unit 124 transmits the processing data corresponding to the transmission target data information to the subsequent job execution unit 20 (step S107).

Next, the processing target data management unit 121 updates the subsequent-processing data number 422 by storing in it the number of the processing data transmitted to the subsequent job execution unit 20 (step S108).

If, on the other hand, it is determined in step S105 that the processing data corresponding to the transmission target data information is not stored in the processing data storage area 41 (step S105; NO), the processing data acquisition unit 123 increments the determination count (step S109) and determines whether the count has exceeded a predetermined number (step S110). If the determination is YES (step S110; YES), the processing data acquisition unit 123 issues an error notification, which is transmitted to the subsequent job execution unit 20 (step S111).

If, on the other hand, the determination count is at or below the predetermined number in step S110 (step S110; NO), the processing data acquisition unit 123 waits for a predetermined time and then returns to step S105.
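
Steps S103 to S111 amount to a bounded polling loop, which can be sketched as follows. The function name, the dictionary-based store and state, the exception type, and the default retry settings are all hypothetical; the text does not specify concrete values for the predetermined wait time or retry count.

```python
import time
from typing import Callable, Dict


class DataNotAvailableError(Exception):
    """Stands in for the error notification of step S111."""


def send_next_to_subsequent_job(
    store: Dict[int, str],
    state: Dict[str, int],
    send: Callable[[str], None],
    max_checks: int = 3,
    wait_seconds: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send the next data item to the subsequent job; return its number."""
    # S103-S104: the target is the most recently processed number plus 1.
    target = state["subsequent_processing_data_number"] + 1
    checks = 0
    while target not in store:            # S105: is the data stored yet?
        checks += 1                       # S109: count the failed check
        if checks > max_checks:           # S110: limit exceeded?
            raise DataNotAvailableError(f"data {target} not available")  # S111
        sleep(wait_seconds)               # wait, then go back to S105
    data = store[target]                  # S106: acquire the data
    send(data)                            # S107: transmit it
    state["subsequent_processing_data_number"] = target  # S108: record it
    return target
```

Updating the stored number only after the send succeeds mirrors the order of steps S107 and S108, so a dump taken between requests always names the item the subsequent job last received.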

The operation when the agent unit receives processing result data from the preceding job execution unit is described with reference to FIG. 4.

First, when processing result data is received from the preceding job execution unit 20 (step S201), the end code confirmation unit 111 determines, based on the end code contained in the processing result data, whether the processing ended normally (step S202). If the determination is NO (step S202; NO), the failure occurrence notification unit 140 transmits a failure occurrence notification to the agent management unit 30 (step S203). The agent management unit 30 then transmits the failure occurrence notification to all agent units 10, and each agent unit 10 performs its failure handling.

If, on the other hand, it is determined in step S202 that the processing ended normally (step S202; YES), the processing data update unit 112 stores the processing data contained in the processing result data in the processing data storage area 41 (step S204).

Next, the processing data update unit 112 determines whether the processing data contained in the processing result data is the tail data of a chunk (step S205). If the determination is NO (step S205; NO), the processing ends.

If, on the other hand, it is determined in step S205 that the processing data is the tail data of a chunk (step S205; YES), the processing data update unit 112 searches for the data information management record whose chunk tail data number holds "-" (step S206). It then updates the record it finds by storing, in the chunk tail data number, the number of the processing data contained in the processing result data received in step S201 (step S207).
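
The receive path of steps S201 to S207 can be sketched in a few lines. The `ProcessingResult` type, the dictionary layout of the records, and the convention that an end code of 0 means normal termination are assumptions made for illustration; the text only says that the end code indicates whether processing ended normally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProcessingResult:
    end_code: int                      # assumption: 0 means normal end
    data_number: Optional[int] = None  # absent when the job failed
    payload: Optional[str] = None
    is_chunk_tail: bool = False


def handle_processing_result(
    result: ProcessingResult,
    store: Dict[int, str],
    records: List[dict],
    notify_failure: Callable[[], None],
) -> bool:
    """Process one result from the preceding job; return False on failure."""
    if result.end_code != 0:                        # S202: abnormal end
        notify_failure()                            # S203: tell the manager
        return False
    store[result.data_number] = result.payload      # S204: store the data
    if result.is_chunk_tail:                        # S205: tail of a chunk?
        for record in records:                      # S206: find the open record
            if record["tail"] is None:
                record["tail"] = result.data_number  # S207: fill in the tail
                break
    return True
```

On the failure branch nothing is written to the store, which matches the statement that a failed result carries only the end code.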

The operation when a failure occurs is described with reference to FIG. 5.

First, when the failure occurrence notification receiving unit 141 receives the failure occurrence notification transmitted from the agent management unit 30 (step S301), it stops the processing functions of the processing target data management unit 121 and the processing data update unit 112 (step S302).

Next, the failure occurrence notification receiving unit 141 sends the dump unit 142 an instruction to execute dump processing (step S303).

Next, the dump unit 142 dumps to the dump information file 50 the processing data stored in the processing data storage area 41, the processing data management records stored in the processing data management table 421, and the processing data number stored in the subsequent-processing data number 422 (step S304).

Next, when the failure has been resolved (step S305; YES), the data recovery unit 143 stores the data dumped to the dump information file 50 back into the memory 40 (step S306). This restores the data in the memory 40 to its state immediately before the failure. Whether the failure has been resolved can be determined, for example, by receiving a failure resolution notification transmitted from the agent management unit 30 in response to an input operation by an administrator.
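
The dump and restore of steps S304 and S306 can be illustrated with a simple file round trip. JSON is used here purely as an assumed serialization format, and the function names and the dictionary layout of the memory are hypothetical; the text does not describe the format of the dump information file 50.

```python
import json
from pathlib import Path


def dump_state(memory: dict, dump_path: Path) -> None:
    """S304: write the three pieces of agent state to the dump file."""
    snapshot = {
        # JSON object keys must be strings, so data numbers are converted.
        "processing_data": {str(k): v for k, v in memory["processing_data"].items()},
        "management_records": memory["management_records"],
        "subsequent_processing_data_number": memory["subsequent_processing_data_number"],
    }
    dump_path.write_text(json.dumps(snapshot), encoding="utf-8")


def restore_state(dump_path: Path) -> dict:
    """S306: rebuild the in-memory state from the dump file."""
    snapshot = json.loads(dump_path.read_text(encoding="utf-8"))
    return {
        "processing_data": {int(k): v for k, v in snapshot["processing_data"].items()},
        "management_records": snapshot["management_records"],
        "subsequent_processing_data_number": snapshot["subsequent_processing_data_number"],
    }
```

Because the dump holds the uncommitted data together with the per-job position, a restart can continue from the restored position instead of replaying the first job from its initial input.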

The operation when the agent unit receives a commit notification from the subsequent job execution unit is described with reference to FIG. 6.

First, when the commit notification receiving unit 130 receives a commit notification from the subsequent job execution unit 20 (step S401), it notifies the processing target data management unit 121 of the tail data number of the chunk contained in the commit notification (step S402).

Subsequently, the processing target data management unit 121 refers to the chunk tail data numbers in the processing data management table 421 and deletes every processing data management record whose chunk tail data number is less than or equal to the number of the last data item in the chunk included in the commit notification (step S403).

Subsequently, the processing target data management unit 121 notifies the processing data update unit 112 of the number of the last data item in the chunk included in the commit notification (step S404).

Subsequently, the processing data update unit 112 refers to the numbers of the processing data stored in the processing data storage area 41 and deletes, in units of chunks, the processing data whose numbers are less than or equal to the number of the last data item in the chunk included in the commit notification (step S405). Deleting committed data in this way improves the storage efficiency of the memory.
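The deletions of steps S403 and S405 can be sketched as follows. This is only an illustration under stated assumptions: data numbers are taken to be ascending integers, and the function name `apply_commit` and the fields `chunk_tail_number` and `number` are hypothetical names that do not appear in the embodiment.

```python
def apply_commit(commit_tail_number, management_records, processing_data):
    """Drop everything covered by a commit notification.

    commit_tail_number -- number of the last data item in the committed chunk
    management_records -- list of dicts with a 'chunk_tail_number' key
    processing_data    -- list of dicts with a 'number' key
    Returns the records and data that remain uncommitted.
    """
    # Cf. step S403: delete management records whose chunk tail number
    # is less than or equal to the committed tail number.
    remaining_records = [
        r for r in management_records
        if r["chunk_tail_number"] > commit_tail_number
    ]
    # Cf. step S405: delete processing data whose number is less than or
    # equal to the committed tail number. Because the threshold is a chunk
    # tail, whole chunks are removed at once.
    remaining_data = [
        d for d in processing_data
        if d["number"] > commit_tail_number
    ]
    return remaining_records, remaining_data
```

For example, with chunks ending at data numbers 2, 4, and 6, a commit notification carrying the number 4 leaves only the record for the chunk ending at 6 and the data items numbered 5 and 6.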

As described above, according to the job execution apparatus 1 of the embodiment, when a failure occurs, the processing data that is uncommitted at the time of the failure can be saved, together with its processing state, for each job executed in parallel. Therefore, when the failure has been resolved, restoring the saved data returns the processing data of each job to its state at the time of the failure. In other words, resuming the execution of each job after the data has been restored allows processing to continue from the state at the time of the failure. Consequently, even when a plurality of jobs are executed in parallel, the time required for data recovery after the failure is resolved can be shortened.

Here, a program that realizes the functions of the units described above is installed in the job execution apparatus 1. Executing this program realizes the functions of those units.

Note that the embodiment described above is merely an example and does not exclude various modifications or applications of techniques that are not explicitly described in it. That is, the present invention can be modified and implemented in various forms without departing from its spirit.

DESCRIPTION OF SYMBOLS: 1 ... job execution apparatus, 10 ... agent unit, 20 ... job execution unit, 30 ... agent management unit, 40 ... memory, 41 ... processing data storage area, 42 ... processing data information storage area, 50 ... dump information file, 110 ... processing result data receiving unit, 111 ... end code checking unit, 112 ... processing data update unit, 120 ... data request receiving unit, 121 ... processing target data management unit, 123 ... processing data acquisition unit, 124 ... data transmission unit, 130 ... commit notification receiving unit, 140 ... failure occurrence notification unit, 141 ... failure occurrence notification receiving unit, 142 ... dump unit, 143 ... data recovery unit, 421 ... processing data management table, 422 ... subsequent-processing data number.

Claims

A job execution apparatus capable of executing, in parallel, a first job and a second job that constitute a batch process, the job execution apparatus comprising:
a first job execution unit that executes the first job;
a data storage unit that stores first data output as a result of execution of the first job by the first job execution unit;
a second job execution unit that executes the second job, which processes the first data stored in the data storage unit;
a data information storage unit that stores data information including data specifying information for specifying the first data being processed in the second job; and
a dump unit that, when a failure occurs, dumps the first data stored in the data storage unit and the data information stored in the data information storage unit.

The job execution apparatus according to claim 1, further comprising a data deleting unit that, when second data output as a result of execution of the second job by the second job execution unit is committed, deletes, from among the first data stored in the data storage unit, the first data corresponding to the committed second data.

The job execution apparatus according to claim 2, wherein
the data information further includes data group specifying information for specifying a data group composed of a plurality of items of the first data stored in the data storage unit,
the second job execution unit commits the second data for each data group, and
when the second data is committed, the data deleting unit deletes, from among the first data stored in the data storage unit, the first data included in the data group corresponding to the committed second data.

The job execution apparatus according to claim 1, further comprising a data recovery unit that stores the first data and the data information dumped by the dump unit in the storage area in which they were held at the time of dumping.

A job execution method executed in an apparatus capable of executing, in parallel, a first job and a second job that constitute a batch process, the job execution method comprising:
a first job execution step of executing the first job;
a data storage step of storing first data output as a result of execution of the first job in the first job execution step;
a second job execution step of executing the second job, which processes the first data stored in the data storage step;
a data information storage step of storing data information including data specifying information for specifying the first data being processed in the second job; and
a dump step of dumping, when a failure occurs, the first data stored in the data storage step and the data information stored in the data information storage step.

A job execution program for causing a computer to execute each step according to claim 5.