JP2011233086A

JP2011233086A - Management computer, job scheduling method and job scheduling program

Info

Publication number: JP2011233086A
Application number: JP2010105301A
Authority: JP
Inventors: Hirokazu Matsumoto; 洋和松本; Satoshi Watanabe; 聡渡辺; Shinji Hamada; 真二浜田; Noriaki Takahashi; 則明高橋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2010-04-30
Filing date: 2010-04-30
Publication date: 2011-11-17
Anticipated expiration: 2030-04-30
Also published as: JP5417626B2

Abstract

PROBLEM TO BE SOLVED: To schedule a job so that a previously scheduled job end time is met even when an unexpected system stop occurs in a computer during execution of the job.
SOLUTION: A management computer is connected to one or more execution computers that process a plurality of jobs, and assigns the plurality of jobs to the one or more execution computers. Each job includes a plurality of services, and each service includes a program module to be processed by an execution computer. The management computer acquires a prescribed coefficient and the end time of each job, assigns each service to the execution computers so that the service can be executed the number of times indicated by the acquired coefficient, and, when a failure occurs in the execution computer to which a service has been assigned while that service is being processed, causes the execution computer to process the assigned service again.

Description

The present invention relates to a management computer, a job scheduling method, and a job scheduling program, and more particularly to a management computer, a job scheduling method, and a job scheduling program that schedule the execution of jobs.

Conventionally, batch jobs have been scheduled and executed on pre-configured computers, such as a mainframe or a small number of computers. The allocation of the services included in a batch job and the estimation of the job end time could therefore be designed in advance.

With recent increases in computer and network speed, processing based on server virtualization or parallel distribution is increasing. Processing in such an environment is executed by a cluster system that uses more physical or virtual computers than before. Because the configuration of the computers available for processing in such a cluster system changes with factors such as the number of computers and their resource usage, manually setting the service allocation when scheduling batch jobs in a cluster system is more difficult than before.

To address this, a technique has been disclosed in which a job is executed in a cluster system and, when a computer fails or its performance degrades, a replacement computer is allocated from a pool of computers prepared in advance so that processing can continue (see, for example, Patent Document 1). Patent Document 1 proposes classifying the replacement computers into a standby pool, a bare-metal pool, and a shared pool in order to recover from failures and respond to load changes quickly.

A technique has also been disclosed that derives job execution times from past job execution history information and provides a proposed job schedule (see, for example, Patent Document 2). Patent Document 2 proposes a method of analyzing job scheduling by simulating the execution of each job with parameter information such as an increase or decrease in job multiplicity.

Patent Document 1: JP 2005-346204 A
Patent Document 2: JP H08-286958 A

However, with the technique disclosed in Patent Document 1, when a failure occurs while a job is running, the service interrupted by the failure is assigned to a replacement computer and processing continues from there. The job may therefore overrun the job end time that was scheduled before execution.

The technique disclosed in Patent Document 2 generates an efficient job schedule by executing jobs in parallel so that batch jobs complete in a short time, but it does not consider a computer failure during job execution. If a failure occurs, the job may therefore overrun the job end time that was scheduled before execution.

An object of the present invention is to provide means for scheduling a job so that a job end time scheduled in advance is met even if an unexpected system stop, caused by a failure or the like, occurs in a computer during job execution.

According to a representative aspect of the present invention, a management computer is connected to one or more execution computers that process a plurality of jobs, and assigns the plurality of jobs to the one or more execution computers. Each job includes a plurality of services, and each service includes a program module processed by an execution computer. The management computer acquires a predetermined coefficient and the end time of each job, and assigns each service to the execution computers so that the service is executed the number of times indicated by the acquired coefficient. When a failure occurs in the execution computer to which a service is assigned while that service is being processed, the management computer causes the execution computer to process the assigned service again.

According to an embodiment of the present invention, the job end time can be estimated, while computer resources are used effectively, even when an unexpected system stop occurs in a computer during job execution.

The drawings of the first embodiment of the present invention are, in order:
A block diagram showing the configuration of the cluster system.
A block diagram showing the hardware configuration of the schedule management computer.
A block diagram showing the hardware configuration of the execution computer.
An explanatory diagram showing the configuration of the schedule management table.
An explanatory diagram showing the configuration of the statistical information management table.
An explanatory diagram showing an example of the relationship between a job and its services.
An explanatory diagram showing the procedure of the job scheduling process performed by the job scheduling unit.
An explanatory diagram showing the procedure of the service distribution process 407 performed by the job scheduling unit.
An explanatory diagram showing a specific example of the services in the job scheduling process.
An explanatory diagram showing a specific example of the statistical information management table.
An explanatory diagram showing a specific example of the input values in case 1.
An explanatory diagram showing the job scheduling result in case 1.
An explanatory diagram showing the schedule management table in case 1.
An explanatory diagram showing a specific example of the input values in case 2.
An explanatory diagram showing the job scheduling result in case 2.
An explanatory diagram showing the schedule management table in case 2.
An explanatory diagram showing the job submission screen displayed on the display device.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the cluster system according to the first embodiment of the present invention.

The cluster system of the present invention includes a schedule management computer 11, a plurality of execution computers 10-1 to 10-n (where n is an arbitrary positive integer), and a disk device that stores processing data 141. The execution computers 10-1 to 10-n are collectively referred to as the execution computer 10.

When a job is scheduled in the cluster system of the present invention, an input value 101 is entered into the schedule management computer 11. The input value 101 includes an input data amount 102, a job end time 103, and a failure safety coefficient 104.

The input data amount 102 is the amount of input data that the job processes, and is a value that causes the job execution time to increase or decrease. When the job processing time varies with a factor other than the amount of input data, the input value 101 may instead hold a value that quantifies that factor. When the value differs for each service included in the job, the input value 101 may hold a separate input data amount 102 for each service. A service is an execution unit included in a job; services are described in detail later.

The job end time 103 is the time by which the job and all the services included in the job must finish. The job and its services are scheduled so that they do not overrun the job end time 103.

The failure safety coefficient 104 is an integer of 0 or more. It indicates the number of job re-executions that can be accommodated while still meeting the job end time 103 when failures occur in the execution computer 10 during job execution. That is, when the failure safety coefficient 104 is n, the job is scheduled so that the job end time 103 is not exceeded even if failures occur n times in the execution computer 10 during job execution.
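As an illustration only, the three fields of the input value 101 could be modeled as a small validated record. The class name `InputValue` and its field names are hypothetical and do not appear in the specification; the only rule taken from the text is that the failure safety coefficient 104 is an integer of 0 or more.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class InputValue:
    """Hypothetical container for input value 101 (names are assumptions)."""

    input_data_amount: float         # input data amount 102
    job_end_time: datetime           # job end time 103
    failure_safety_coefficient: int  # failure safety coefficient 104

    def __post_init__(self) -> None:
        n = self.failure_safety_coefficient
        # The text defines the coefficient as an integer of 0 or more.
        if isinstance(n, bool) or not isinstance(n, int):
            raise TypeError("failure safety coefficient must be an integer")
        if n < 0:
            raise ValueError("failure safety coefficient must be 0 or more")
        # Assumption: a negative amount of input data is meaningless.
        if self.input_data_amount < 0:
            raise ValueError("input data amount must not be negative")
```

Rejecting invalid coefficients at construction time keeps a later scheduling step from having to re-check them.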

The value of the failure safety coefficient 104 may be specified by the user or by the system according to, for example, the importance of the job or the operating rate of the execution computer 10 that executes the services included in the job. Alternatively, it may be held as a fixed value in the system.

For example, the user specifies a high failure safety coefficient 104 for a job that must be executed without fail. A job given a high value can then be re-executed as many times as the value of the failure safety coefficient 104, even if failures occur during its execution.

One way to determine the failure safety coefficient 104 from the importance of the job is as follows. A low failure safety coefficient 104 is specified when the services included in the job only reference data or only update data of low importance, so that interrupting the job does not affect the operation of subsequent jobs. Conversely, a high failure safety coefficient 104 is specified for a service that updates data of high importance, or for a service whose interruption would affect the operation of subsequent jobs.

One way to determine the failure safety coefficient 104 from the operating rate of the execution computer 10 is to specify a high failure safety coefficient 104 for a cluster system that includes execution computers 10 judged likely to stop, based on information such as the number of past system stops, the years in service of the execution computers, or the hardware configuration.

The schedule management computer 11 includes a job reception unit 111, a job scheduling unit 112, an execution instruction unit 113, and a statistical information management unit 114.

The job reception unit 111 is an interface through which a user or the like enters the input value 101 into the schedule management computer 11. The job scheduling unit 112 schedules a job based on the entered input value 101 and the data in the statistical information management table 132, and stores the scheduling result in the schedule management table 131.

The execution instruction unit 113 instructs the execution computers 10-1 to 10-n to deploy and execute services based on the data stored in the schedule management table 131.

Each of the execution computers 10-1 to 10-n includes a service deployment unit 121, a service execution unit 122, and a statistical information transfer unit 123.

The service deployment unit 121 deploys a service on the computer in accordance with an instruction received from the execution instruction unit 113. Deployment is the process of loading the application used by the service and configuring its settings so that the execution computer 10 is ready to execute the service.

The service execution unit 122 executes a service in accordance with an instruction received from the execution instruction unit 113, referring to and updating the processing data 141 as it does so. The statistical information transfer unit 123 transfers, to the statistical information management unit 114, the time that the service deployment unit 121 and the service execution unit 122 took to deploy and execute the service.

The statistical information management unit 114 stores the time taken by the service deployment unit 121 and the time taken by the service execution unit 122, as transferred from the statistical information transfer unit 123, in the statistical information management table 132.

The schedule management computer 11 includes means for detecting a failure that occurs in any of the running execution computers 10-1 to 10-n. The failure may be detected when statistical information has not been transferred to the statistical information management unit 114 by the scheduled end time of a service deployment or a service execution. Alternatively, it may be detected when communication with the execution instruction unit 113 is lost. A method that monitors the execution computers directly or indirectly through some other communication may also be used.
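The first detection method above, in which missing statistical information at the scheduled end time is treated as a failure, can be sketched as follows. The function name `failure_suspected` and its parameters are assumptions made for illustration; the specification does not define such an interface.

```python
from datetime import datetime
from typing import Optional


def failure_suspected(
    scheduled_end: datetime,
    stats_received_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when a failure should be suspected for one scheduled step.

    scheduled_end is the scheduled end time of a deployment or execution,
    stats_received_at is when its statistics arrived (None if they have not),
    and now is the current time.
    """
    if stats_received_at is not None:
        # Statistics were transferred, so the step is known to have finished.
        return False
    # No statistics yet: suspect a failure once the scheduled end has passed.
    return now > scheduled_end
```

A real monitor would probably combine this check with the communication-based methods that the text also allows.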

FIG. 2 is a block diagram showing the hardware configuration of the schedule management computer 11 according to the first embodiment of the present invention.

The schedule management computer 11 includes a CPU 21, a display device 22, a keyboard 23, a mouse 24, a network interface card (NIC) 25, a hard disk 26, and a memory 27. The CPU 21, display device 22, keyboard 23, mouse 24, NIC 25, hard disk 26, and memory 27 are connected by a bus 28.

The schedule management computer 11 is connected to a network via the NIC 25 and communicates with the execution computer 10 and other schedule management computers 11. The network may be of any type, such as a LAN or a WAN.

The CPU 21 executes programs stored in the memory 27. The memory 27 stores the programs executed by the CPU 21 and the data needed to execute them. Specifically, the memory 27 stores programs and data such as the operating system 30, the scheduling control program 91, the job reception unit 111, the job scheduling unit 112, the execution instruction unit 113, the statistical information management unit 114, the schedule management table 131, and the statistical information management table 132.

The scheduling control program 91 is a program executed on the operating system 30. The job reception unit 111, job scheduling unit 112, execution instruction unit 113, and statistical information management unit 114 are programs called by the scheduling control program 91, and they execute the processing described with reference to FIG. 1.

As described with reference to FIG. 1, the schedule management table 131 stores information on job schedules, and the statistical information management table 132 stores information on service deployment times and execution times.

The display device 22 displays various information such as the results of business processing, that is, job execution results. The keyboard 23 and the mouse 24 are devices that the user uses to enter the input value 101.

The NIC 25 is an interface for connecting the schedule management computer 11 to the network. The hard disk 26 stores the processing data held in the memory 27, the programs loaded into the memory 27, and the like.

The schedule management computer 11 may be implemented by a program executed on a virtual computer.

FIG. 3 is a block diagram showing the hardware configuration of the execution computers 10-1 to 10-n according to the first embodiment of the present invention.

The execution computers 10-1 to 10-n have the same hardware configuration as the schedule management computer 11 described with reference to FIG. 2. The CPU 31, display device 32, keyboard 33, mouse 34, network interface card (NIC) 35, hard disk 36, and memory 37 of the execution computer 10 are the same kinds of devices as the CPU 21, display device 22, keyboard 23, mouse 24, NIC 25, hard disk 26, and memory 27 of the schedule management computer 11, respectively.

However, the memory 37 of the execution computer 10 stores the operating system 30, an execution control program 92, the service deployment unit 121, the service execution unit 122, the statistical information transfer unit 123, and services 99.

The execution control program 92 is a program executed on the operating system 30. The service deployment unit 121, the service execution unit 122, and the statistical information transfer unit 123 are programs called by the execution control program 92, and they execute the processing described with reference to FIG. 1. The services 99 are programs, a plurality of which the service deployment unit 121 stores in the memory 37, and they are executed in accordance with instructions from the service execution unit 122.

The execution computers 10-1 to 10-n may be implemented by programs executed on virtual computers.

FIG. 4 is an explanatory diagram showing the configuration of the schedule management table 131 according to the first embodiment of the present invention.

The schedule management table 131 holds the start time and the end time of each service deployment process and each service execution process executed on each execution computer 10. The service deployment processes and service execution processes are scheduled by the job scheduling unit 112.

The schedule management table 131 includes an execution computer name 131-1, a start time 131-2, an end time 131-3, a processing content 131-4, and a target process 131-5. The execution computer name 131-1 stores a name that uniquely identifies the execution computer 10 on which the service 99 is executed.

The start time 131-2 indicates the time at which the service 99 starts on the execution computer 10 indicated by the execution computer name 131-1. The end time 131-3 indicates the time at which the service 99 ends on that execution computer 10.

The processing content 131-4 indicates what processing is performed for the service 99 and holds one of two values, "deployment" or "processing execution". The target process 131-5 stores a name that uniquely identifies the service 99 to be processed.

The schedule management table 131 shown in FIG. 4 is an example in which the services 99 on all the execution computers 10 are managed in a single table. The services 99 may instead be managed in separate tables, one for each execution computer 10.
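The five columns of the schedule management table 131, and the alternative of keeping one table per execution computer, might be represented as in the sketch below. The names `ScheduleRow` and `split_by_computer`, as well as the sample computer names and times, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class ScheduleRow:
    """One row of schedule management table 131 (field names are assumptions)."""

    computer_name: str       # execution computer name 131-1
    start_time: datetime     # start time 131-2
    end_time: datetime       # end time 131-3
    processing_content: str  # processing content 131-4
    target_process: str      # target process 131-5


def split_by_computer(table: List[ScheduleRow]) -> Dict[str, List[ScheduleRow]]:
    """Turn the single shared table into one start-time-ordered table per computer."""
    per_computer: Dict[str, List[ScheduleRow]] = defaultdict(list)
    for row in table:
        per_computer[row.computer_name].append(row)
    for rows in per_computer.values():
        rows.sort(key=lambda r: r.start_time)
    return dict(per_computer)


# Hypothetical sample rows; the values are not taken from the specification.
example_table = [
    ScheduleRow("computer-1", datetime(2010, 4, 30, 1, 10),
                datetime(2010, 4, 30, 1, 40), "processing execution", "service A"),
    ScheduleRow("computer-2", datetime(2010, 4, 30, 1, 0),
                datetime(2010, 4, 30, 1, 5), "deployment", "service B"),
    ScheduleRow("computer-1", datetime(2010, 4, 30, 1, 0),
                datetime(2010, 4, 30, 1, 10), "deployment", "service A"),
]
```

Calling `split_by_computer(example_table)` yields two tables, with the rows for `computer-1` ordered so that the deployment precedes the execution.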

FIG. 5 is an explanatory diagram showing the configuration of the statistical information management table 132 according to the first embodiment of the present invention.

The statistical information management table 132 manages history information on the deployment time and execution time of each service 99. It includes a service name 132-1, a deployment time 132-2, and an execution time 132-3. After a service 99 has been executed, the statistical information transfer unit 123 of each execution computer 10 sends the time required for the deployment process and the time required for the execution process to the statistical information management unit 114, which then updates the statistical information management table 132.

The service name 132-1 is a name that uniquely identifies the service 99 to be processed. The deployment time 132-2 indicates the time required to deploy the service 99 indicated by the service name 132-1. The execution time 132-3 indicates the time required to execute the service 99 indicated by the service name 132-1.

As described with reference to FIG. 1, the statistical information management table 132 holds history information on the time the service deployment unit 121 took to deploy the service 99 (deployment time 132-2) and the time the service execution unit 122 took to execute the service 99 (execution time 132-3). As described above, the values are stored by the statistical information management unit 114, which calculates the values to be stored in the statistical information management table 132 based on the execution results of the job 98 transferred from the statistical information transfer unit 123.

The deployment time 132-2 and the execution time 132-3 increase and decrease in proportion to the input data amount 102. The statistical information management table 132 therefore stores relative times: the time that each service 99 in an executed job 98 took for deployment and for execution is converted to the time it would have taken had the input data amount 102 of the job 98 been "100", and the converted times are stored as the deployment time 132-2 and the execution time 132-3.

For example, if the input data amount 102 of an executed job 98 is 400 (bytes) and the deployment of service A included in the job 98 took 40 (minutes), "10" is stored as the deployment time 132-2 (40 × 100 / 400 = 10).
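The conversion to a relative time can be written as a short function. The function names below are hypothetical. `to_relative_time` follows the proportional rule stated above, while `estimate_time` is an assumed inverse that a scheduler might use to scale a stored value back to a new input data amount; this part of the specification does not describe that inverse step.

```python
REFERENCE_AMOUNT = 100  # the reference input data amount "100" used by table 132


def to_relative_time(measured_time: float, input_data_amount: float) -> float:
    """Convert a measured time to the time for the reference input data amount."""
    if input_data_amount <= 0:
        raise ValueError("input data amount must be positive")
    return measured_time * REFERENCE_AMOUNT / input_data_amount


def estimate_time(relative_time: float, input_data_amount: float) -> float:
    """Assumed inverse: scale a stored relative time to a given input data amount."""
    return relative_time * input_data_amount / REFERENCE_AMOUNT


# The example from the text: 40 minutes for an input data amount of 400.
print(to_relative_time(40, 400))  # 10.0
```

Under the same proportionality assumption, `estimate_time(10, 800)` gives 80.0 minutes for a job with twice the input data of the example.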

When the processing time of a service varies with a factor other than the input data amount 102, the relative time may be calculated from a value that quantifies that factor and stored in the statistical information management table 132. The time stored in the statistical information management table 132 may also be the maximum value in the past history information.

Because no history information exists when a job is executed for the first time, values are stored in the deployment time 132-2 and the execution time 132-3 in advance by manually starting the deployment and execution of each service, without using the job scheduling of the present invention.
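The option of keeping the maximum value seen in the past history can be sketched as a simple update rule. The dictionary layout and the function name `record_max` are assumptions for illustration; the specification only states that the maximum past value may be stored.

```python
from typing import Dict, Tuple

# Hypothetical layout: service name -> (relative deployment time, relative execution time)
StatisticsTable = Dict[str, Tuple[float, float]]


def record_max(table: StatisticsTable, service_name: str,
               deploy_time: float, execution_time: float) -> None:
    """Store the new relative times only where they exceed the stored maxima."""
    old_deploy, old_execution = table.get(service_name, (0.0, 0.0))
    table[service_name] = (
        max(old_deploy, deploy_time),
        max(old_execution, execution_time),
    )


statistics: StatisticsTable = {}
record_max(statistics, "service A", 10.0, 30.0)  # first, manually started run
record_max(statistics, "service A", 8.0, 35.0)   # later run: only execution was slower
print(statistics["service A"])  # (10.0, 35.0)
```

Keeping the maximum rather than, say, an average makes later estimates conservative, which suits a scheduler whose goal is to meet a fixed end time.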

FIG. 6 is an explanatory diagram showing an example of the relationship between the job 98 and the services 99 according to the first embodiment of the present invention.

サービス９９は、ジョブ９８を構成する実行単位であり、ジョブ９８は、一つ以上のサービス９９を含む。図中の矢印は、サービス９９の依存関係であり、後続のサービス９９を実行する前に、先行するサービス９９を実行する必要があることを示す。例えば、サービス９９Ａは、サービス９９Ｂ、サービス９９Ｃ、及び、サービス９９Ｅよりも先行して実行され、サービス９９Ｂ、サービス９９Ｃ、及び、サービス９９Ｅは、サービス９９Ａが実行された後に、実行される。 The service 99 is an execution unit constituting the job 98, and the job 98 includes one or more services 99. The arrows in the figure are the dependencies of the service 99 and indicate that the preceding service 99 needs to be executed before the subsequent service 99 is executed. For example, the service 99A is executed prior to the service 99B, the service 99C, and the service 99E, and the service 99B, the service 99C, and the service 99E are executed after the service 99A is executed.

また、各サービス９９の実行前には、そのサービス９９がデプロイされている必要がある。すなわち、サービス９９Ｂが実行される前には、サービス９９Ｂがデプロイされている必要がある。 In addition, before each service 99 is executed, the service 99 needs to be deployed. That is, the service 99B needs to be deployed before the service 99B is executed.

ジョブ９８の依存関係に対応するため、サービス９９は、サービス９９の実行順序に対応した情報を各々保持する。サービス９９が保持する実行順序は、例えば、図６に示すように所定の命名ルールによって生成されたサービス９９を一意に示す識別子によって、示されてもよい。識別子の命名ルールによってサービス９９の実行順序を示す場合、サービス名の末尾に英文字（Ａ−Ｚ）の識別子を付与する。 In order to handle the dependency relationships within the job 98, each service 99 holds information corresponding to its execution order. The execution order held by a service 99 may be indicated, for example, by an identifier that uniquely identifies the service 99 and is generated by a predetermined naming rule, as shown in FIG. 6. When the execution order of the services 99 is indicated by the identifier naming rule, an identifier consisting of an English letter (A-Z) is appended to the end of the service name.

例えば識別子は、英文字の先頭文字Ａから付与され、あるサービス９９に先行するサービス９９がある場合、識別子は、先行するサービス９９に付与した英文字よりも後となる英文字が付与される。ジョブ９８に含まれるサービス９９のうち、最後に実行されるサービス９９は、サービス９９中で一番後となる英文字を識別子として付与される。図６ではサービス９９Ａ〜９９Ｆの６つのサービス９９が、一つのジョブ９８に含まれる例を示す。 For example, identifiers are assigned starting from the first letter A, and when a service 99 is preceded by another service 99, it is given a letter that comes later than the letter given to the preceding service 99. Of the services 99 included in the job 98, the service 99 executed last is given the latest letter among the services 99 as its identifier. FIG. 6 shows an example in which six services 99, the services 99A to 99F, are included in one job 98.
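The naming rule can be checked mechanically. The sketch below is illustrative only: the function `follows_naming_rule` and the dictionary `FIG6_PREDECESSORS` are hypothetical names, and the dependencies are the ones this description gives for FIG. 6 (A before B, C and E; B and C before D; D and E before F).

```python
def follows_naming_rule(predecessors):
    """Return True if every service letter sorts after the letters of its predecessors."""
    return all(
        pred < service
        for service, preds in predecessors.items()
        for pred in preds
    )


# Each key is a service identifier; each value lists the services that must run first.
FIG6_PREDECESSORS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "E": ["A"],
    "D": ["B", "C"],
    "F": ["D", "E"],
}

print(follows_naming_rule(FIG6_PREDECESSORS))  # True
```

Note that the rule only constrains dependent pairs: service E carries a later letter than service D even though neither depends on the other.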

図６に示すジョブ９８を実行計算機１０−１〜１０−ｎ上で実行する場合、まずサービス９９Ａを実行する。そして、サービス９９Ａ実行終了後、サービス９９Ｂ、サービス９９Ｃ、及び、サービス９９Ｅを実行する。サービス９９Ｂ、サービス９９Ｃ、及び、サービス９９Ｅの実行順序は問わない。 When the job 98 shown in FIG. 6 is executed on the execution computers 10-1 to 10-n, the service 99A is first executed. After the execution of the service 99A, the service 99B, the service 99C, and the service 99E are executed. The execution order of the service 99B, the service 99C, and the service 99E does not matter.

またサービス９９は、複数の実行計算機１０上で並列に実行されてもよい。例えば、サービス９９Ｂ及びサービス９９Ｃの処理内容に関連がなく、別々の実行計算機１０によって並列に実行されてもよい場合、ジョブスケジューリング１１２は、サービス９９Ｂ及びサービス９９Ｃを別々の実行計算機１０にスケジューリングする。 The services 99 may also be executed in parallel on a plurality of execution computers 10. For example, when the processing contents of the service 99B and the service 99C are unrelated and may be executed in parallel by different execution computers 10, the job scheduling unit 112 schedules the service 99B and the service 99C on separate execution computers 10.

サービス９９Ｂ、サービス９９Ｃの実行終了後、サービス９９Ｄを実行する。サービス９９Ｄ、サービス９９Ｅの実行終了後、サービス９９Ｆを実行する。サービス９９Ｆの実行終了をもってジョブ９８が終了する。 After the execution of the service 99B and the service 99C, the service 99D is executed. After the execution of the service 99D and the service 99E, the service 99F is executed. The job 98 ends when the execution of the service 99F ends.
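The execution order described for FIG. 6 is a topological ordering of the dependency graph. As a rough sketch, the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) can group the services into waves whose members may run in any order or in parallel. The names `PREDECESSORS` and `execution_waves` are hypothetical and are not part of the described system.

```python
from graphlib import TopologicalSorter

# Predecessors of each service, as described for FIG. 6.
PREDECESSORS = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "E": {"A"},
    "D": {"B", "C"},
    "F": {"D", "E"},
}


def execution_waves(predecessors):
    """Group services into waves; every service in a wave has all predecessors finished."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # services whose predecessors are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as finished
    return waves


print(execution_waves(PREDECESSORS))
# [['A'], ['B', 'C', 'E'], ['D'], ['F']]
```

The waves only express the earliest point at which each service may start; the scheduler described here may still place services of one wave sequentially on a single execution computer.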

図７は、本発明の第１の実施形態のジョブスケジューリング部１１２によるジョブスケジューリング処理４０１の手順を示す説明図である。 FIG. 7 is an explanatory diagram illustrating a procedure of job scheduling processing 401 by the job scheduling unit 112 according to the first embodiment of this invention.

ＣＰＵ２１は、ジョブ受付部１１１が実行するべきジョブ９８を受け付けた場合、ジョブスケジューリング部１１２によって、図７に示す処理を行う。 When the job accepting unit 111 accepts a job 98 to be executed, the CPU 21 causes the job scheduling unit 112 to perform the process shown in FIG. 7.

ＣＰＵ２１は、各実行計算機１０のメモリ２７の空き容量及びＣＰＵ２１の使用状況などに基づいて、ジョブスケジューリング可能な実行計算機１０が存在するか否かを判定する（ステップ４０２）。ジョブスケジューリング可能な実行計算機１０が１台もないと判定された場合、ＣＰＵ２１は実行不可と判定し、ジョブスケジューリング処理４０１を終了する。 The CPU 21 determines whether or not there is an execution computer 10 capable of job scheduling based on the free capacity of the memory 27 of each execution computer 10 and the usage status of the CPU 21 (step 402). If it is determined that there is no execution computer 10 that can perform job scheduling, the CPU 21 determines that execution is not possible and ends the job scheduling process 401.

ＣＰＵ２１は、ステップ４０２においてジョブスケジューリング可能と判定された実行計算機１０に、サービスのデプロイ処理及び処理実行の組を割り当て、各実行計算機１０に対応するスケジュール管理テーブル１３１を更新する（ステップ４０５）。デプロイ処理及び実行処理のために要する時間は、統計情報管理テーブル１３２に格納された値に入力データ量１０２の相対値を掛け合わせることによって算出される。 The CPU 21 assigns a set of service deployment processing and processing execution to the execution computer 10 determined to be capable of job scheduling in step 402, and updates the schedule management table 131 corresponding to each execution computer 10 (step 405). The time required for the deployment process and the execution process is calculated by multiplying the value stored in the statistical information management table 132 by the relative value of the input data amount 102.

ＣＰＵ２１は、障害安全係数１０４の値に１を加算した値分、ステップ４０５の処理を、繰り返し実行する（ステップ４０４）。そして、ステップ４０４の処理によって、デプロイ処理及び実行処理の組を、同一実行計算機１０に複数回割り当てる。障害安全係数１０４の値に１を加算する理由は、最初の１回目の処理回数を加算するためである。 The CPU 21 repeats the process of step 405 as many times as the value obtained by adding 1 to the failure safety coefficient 104 (step 404). Through the process of step 404, the set of the deployment process and the execution process is assigned to the same execution computer 10 a plurality of times. The value 1 is added to the failure safety coefficient 104 in order to count the initial execution.

またＣＰＵ２１は、ジョブ９８を構成するサービス９９の実行順序毎に、ステップ４０４における繰り返し処理を、繰り返し実行する（ステップ４０３）。ステップ４０３の処理によって、各サービス９９のデプロイ処理及び実行処理の複数の組を、同一実行計算機１０に割り当てる。 Further, the CPU 21 repeatedly executes the repetition process in step 404 for each execution order of the services 99 constituting the job 98 (step 403). Through the processing in step 403, a plurality of sets of deployment processing and execution processing for each service 99 are allocated to the same execution computer 10.

ＣＰＵ２１は、サービス９９を割り当てた各実行計算機１０において、ジョブ９８の終了予定時刻がジョブ終了時間１０３を超過するか否かを判定する（ステップ４０６）。超過していない場合は、実行可能と判定し、ジョブスケジューリング処理４０１を終了する。 The CPU 21 determines whether or not the scheduled end time of the job 98 exceeds the job end time 103 in each execution computer 10 to which the service 99 is assigned (step 406). If not exceeded, it is determined that the job can be executed, and the job scheduling process 401 ends.

その後、ＣＰＵ２１は、実行指示部１１３によって、スケジューリングされたサービス９９を各実行計算機１０に送信する。そして、実行計算機１０におけるサービスデプロイ部１２１及びサービス実行部１２２は、スケジュール管理計算機１１から送信されたサービス９９を実行する。 Thereafter, the CPU 21 transmits the scheduled service 99 to each execution computer 10 by the execution instruction unit 113. Then, the service deployment unit 121 and the service execution unit 122 in the execution computer 10 execute the service 99 transmitted from the schedule management computer 11.

ステップ４０６において超過すると判定された場合、ＣＰＵ２１は、後述するサービス振り分け処理を実施する（ステップ４０７）。ステップ４０７は、図８にて詳細に説明する。 If it is determined in step 406 that the job end time 103 is exceeded, the CPU 21 performs the service distribution process described later (step 407). Step 407 is described in detail with reference to FIG. 8.

その後ＣＰＵ２１は、ステップ４０７の処理において振り分け可能であるか否かを判定し（ステップ４０８）、振り分け可能であると判定された場合、ステップ４０６の処理に戻る。振り分け不可であると判定された場合、実行不可と判定し、ジョブスケジューリング処理４０１を終了する。 Thereafter, the CPU 21 determines whether or not sorting is possible in the process of step 407 (step 408). If it is determined that sorting is possible, the process returns to step 406. If it is determined that distribution is not possible, it is determined that execution is impossible, and the job scheduling process 401 ends.

なお、ステップ４０６においてＣＰＵ２１は、ジョブ９８の終了予定時刻がジョブ終了時間１０３以下になった場合に、実行可能と判定するため、不要な実行計算機１０にサービス９９を振り分けることがない。このため本実施形態のスケジュール管理計算機１１は、実行計算機１０のリソースを最低限に抑えつつ、サービス９９を振り分けることができる。 In step 406, the CPU 21 determines that the job 98 can be executed when the scheduled end time of the job 98 is equal to or less than the job end time 103. Therefore, the service 99 is not allocated to the unnecessary execution computer 10. Therefore, the schedule management computer 11 of this embodiment can distribute the services 99 while minimizing the resources of the execution computer 10.
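Steps 403 to 406 can be sketched as a pair of nested loops followed by a deadline check. This is a simplified illustration for a single execution computer, not the actual implementation: `Slot`, `initial_schedule`, and `meets_deadline` are hypothetical names, and the relative times in `stats` are placeholder values rather than figures taken from the statistical information management table.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    computer: str
    service: str
    kind: str     # "deploy" or "execute"
    start: float  # minutes after the job start
    end: float


def initial_schedule(order, stats, input_amount, safety_factor, computer):
    """Assign (safety_factor + 1) deploy/execute pairs per service to one computer."""
    slots, clock = [], 0.0
    scale = input_amount / 100  # relative value of the input data amount
    for service in order:                        # step 403: each service in execution order
        deploy, execute = stats[service]
        for _ in range(safety_factor + 1):       # step 404: initial run plus spare runs
            for kind, relative in (("deploy", deploy), ("execute", execute)):
                duration = relative * scale      # step 405: stored time scaled by input amount
                slots.append(Slot(computer, service, kind, clock, clock + duration))
                clock += duration
    return slots


def meets_deadline(slots, job_end_minutes):
    """Step 406: the scheduled end must not exceed the job end time."""
    return max(slot.end for slot in slots) <= job_end_minutes


# Placeholder (deploy, execute) relative times in minutes; hypothetical values.
stats = {"A": (5, 10), "B": (5, 5), "C": (5, 10), "D": (5, 15)}
slots = initial_schedule("ABCD", stats, input_amount=100, safety_factor=1, computer="10-1")
print(slots[-1].end)               # 120.0
print(meets_deadline(slots, 130))  # True
print(meets_deadline(slots, 75))   # False
```

When the check fails, the description continues with the service distribution process of step 407 instead of rejecting the job outright.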

図８は、本発明の第１の実施形態のジョブスケジューリング部１１２によるサービス振り分け処理４０７の手順を示す説明図である。 FIG. 8 is an explanatory diagram illustrating a procedure of the service distribution process 407 by the job scheduling unit 112 according to the first embodiment of this invention.

ＣＰＵ２１は、ジョブ９８の終了予定時刻がジョブ終了時間１０３を超過すると判定された実行計算機１０を１台選択する（ステップ４２１）。そして、選択された実行計算機１０に、後述する処理を実行する。 The CPU 21 selects one execution computer 10 that is determined that the scheduled end time of the job 98 exceeds the job end time 103 (step 421). Then, a process to be described later is executed on the selected execution computer 10.

ＣＰＵ２１は、ステップ４２１において選択された実行計算機１０に振り分けられたサービス９９毎に、ステップ４２３〜ステップ４２５の処理を繰り返す（ステップ４２２）。 The CPU 21 repeats the processing from step 423 to step 425 for each service 99 assigned to the execution computer 10 selected in step 421 (step 422).

同一サービス９９に割り当てられたデプロイ処理及び実行処理が、２以上か否か判定する（ステップ４２３）。そして、同一サービス９９に２以上のデプロイ処理及び実行処理が割り当てられている場合、ＣＰＵ２１は、振り分け対象として当該サービス９９を仮選択する（ステップ４２５）。 It is determined whether or not the deployment process and the execution process assigned to the same service 99 are two or more (step 423). When two or more deployment processes and execution processes are assigned to the same service 99, the CPU 21 temporarily selects the service 99 as a distribution target (step 425).

また、図６で説明したとおり、サービス９９を別実行計算機１０上で並列に実行可能か否かを判定する（ステップ４２４）。なおサービス９９には、並列に実行可能であるか否かを示す情報があらかじめ含まれている。サービス９９が並列に実行可能な場合、ＣＰＵ２１は、並列に実行可能なサービス９９を、振り分け対象として仮選択する（ステップ４２５）。 Further, as described with reference to FIG. 6, it is determined whether or not the service 99 can be executed in parallel on another execution computer 10 (step 424). Note that the service 99 includes in advance information indicating whether or not the services 99 can be executed in parallel. When the service 99 can be executed in parallel, the CPU 21 provisionally selects the service 99 that can be executed in parallel as a distribution target (step 425).

その後ＣＰＵ２１は、振り分け対象として仮選択したサービス９９があるか否かを判定する（ステップ４２６）。そして、仮選択したサービス９９がないと判定した場合、振り分け不可と判定し、サービス振り分け処理４０７を終了する。 Thereafter, the CPU 21 determines whether or not there is a service 99 temporarily selected as a distribution target (step 426). If it is determined that there is no provisionally selected service 99, it is determined that distribution is impossible, and the service distribution process 407 is terminated.

ＣＰＵ２１は、仮選択したサービス９９のうち、サービス９９のデプロイ処理及び実行処理の合計時間が最大のサービス９９を、振り分け対象サービス９９として選択する（ステップ４２７）。そして別実行計算機１０のうち、実行計算機１０においてサービス９９が実行された後の時間帯に、選択された振り分け対象サービス９９を割り当て可能な実行計算機１０が存在するか否かを判定する（ステップ４２８）。 The CPU 21 selects, from among the provisionally selected services 99, the service 99 whose total deployment and execution time is the largest as the distribution target service 99 (step 427). The CPU 21 then determines whether, among the other execution computers 10, there is an execution computer 10 to which the selected distribution target service 99 can be assigned in a time slot after the services 99 already scheduled on that execution computer 10 have been executed (step 428).

ステップ４２８において、振り分け対象サービス９９を振り分ける別の実行計算機１０が存在しない場合、ＣＰＵ２１は、振り分け不可と判定し、サービス振り分け処理４０７を終了する。 In step 428, when there is no other execution computer 10 to which the distribution target service 99 is distributed, the CPU 21 determines that the distribution is impossible and ends the service distribution process 407.

ステップ４２８において、振り分け対象サービス９９を振り分ける別の実行計算機１０が存在する場合、ＣＰＵ２１は、ステップ４２５において選択された振り分け対象サービス９９のデプロイ処理及び実行処理を、ステップ４２８において判定された実行計算機１０に、割り当て直し、スケジュール管理テーブル１３１を更新する（ステップ４２９）。そして、振り分け可能と判定し、サービス振り分け処理４０７を終了する。 If, in step 428, there is another execution computer 10 to which the distribution target service 99 can be distributed, the CPU 21 reassigns the deployment process and the execution process of the distribution target service 99 selected in step 425 to the execution computer 10 determined in step 428, and updates the schedule management table 131 (step 429). The CPU 21 then determines that distribution is possible and ends the service distribution process 407.
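The selection logic of the service distribution process 407 can be sketched as below. Several points are assumptions rather than facts from the description: the two checks of steps 423 and 424 are treated as alternatives (either one qualifies a service), exactly one deploy/execute pair is moved per call, and the first available computer is taken as the destination. All function names, parameter names, and time values are hypothetical.

```python
def select_distribution_target(pairs_on_computer, stats, parallelizable):
    """Steps 422-427: pick the candidate with the largest deploy + execute time."""
    candidates = [
        service
        for service, pairs in pairs_on_computer.items()
        if pairs >= 2 or service in parallelizable  # steps 423 and 424 lead to step 425
    ]
    if not candidates:                              # step 426: nothing can be moved
        return None
    return max(candidates, key=lambda s: sum(stats[s]))  # step 427


def distribute_one(pairs_on_computer, stats, parallelizable, available_computers):
    """Steps 428-429: move one pair to another computer, or return None if impossible."""
    service = select_distribution_target(pairs_on_computer, stats, parallelizable)
    if service is None or not available_computers:  # step 428: no destination exists
        return None
    destination = available_computers[0]
    pairs_on_computer[service] -= 1                 # step 429: reassign one pair
    return service, destination


# Placeholder (deploy, execute) times and pair counts; hypothetical values.
stats = {"A": (5, 10), "B": (5, 5), "C": (5, 10), "D": (5, 15)}
pairs = {"A": 2, "B": 2, "C": 2, "D": 2}
print(distribute_one(pairs, stats, set("ABCD"), ["10-2"]))  # ('D', '10-2')
```

A caller would repeat this step and recheck the scheduled end time after each move, mirroring the loop between steps 406 and 408.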

図９Ａは、本発明の第１の実施形態のジョブスケジューリング処理のサービス９９の具体例を示す説明図である。 FIG. 9A is an explanatory diagram illustrating a specific example of the service 99 for job scheduling processing according to the first embodiment of this invention.

後述するケース１及びケース２におけるジョブ９８は、サービス９９Ａ、サービス９９Ｂ、サービス９９Ｃ及びサービス９９Ｄを含む。サービス９９Ａは、サービス９９Ｂ及びサービス９９Ｃに先行して実行され、サービス９９Ｂ及びサービス９９Ｃは、サービス９９Ａの実行が終了した後に実行される。サービス９９Ｄは、サービス９９Ｂ及びサービス９９Ｃに後続して実行され、サービス９９Ｂ及びサービス９９Ｃは、サービス９９Ｄに先行して実行される。 A job 98 in case 1 and case 2 described later includes a service 99A, a service 99B, a service 99C, and a service 99D. The service 99A is executed prior to the service 99B and the service 99C, and the service 99B and the service 99C are executed after the execution of the service 99A is completed. The service 99D is executed subsequent to the service 99B and the service 99C, and the service 99B and the service 99C are executed prior to the service 99D.

また、サービス９９Ｂ及びサービス９９Ｃは、お互いに並列に実行可能である。さらに、サービスＡ及びサービスＤは、自らのサービスＡ及びサービスＤと並行に実行可能である。 Further, the service 99B and the service 99C can be executed in parallel with each other. In addition, the service A and the service D can each be executed in parallel with another instance of the same service.

図９Ｂは、本発明の第１の実施形態の統計情報管理テーブル１３２の具体例を示す説明図である。 FIG. 9B is an explanatory diagram illustrating a specific example of the statistical information management table 132 according to the first embodiment of this invention.

統計情報管理テーブル１３２は、図９Ａに示す各サービス９９のデプロイ時間１３２−２及び実行時間１３２−３を、入力データ量１０２に対する相対時間として保持する。図９Ｂに示す統計情報管理テーブル１３２は、サービス９９Ａ、サービス９９Ｂ、サービス９９Ｃ及びサービス９９Ｄのデプロイ時間１３２−２及び実行時間１３２−３を保持する。 The statistical information management table 132 holds the deployment time 132-2 and execution time 132-3 of each service 99 shown in FIG. 9A as relative time with respect to the input data amount 102. The statistical information management table 132 shown in FIG. 9B holds the deployment time 132-2 and execution time 132-3 of the service 99A, service 99B, service 99C, and service 99D.

図１０Ａは、本発明の第１の実施形態のケース１における入力値１０１の具体例を示す説明図である。 FIG. 10A is an explanatory diagram illustrating a specific example of the input value 101 in case 1 according to the first embodiment of this invention.

ケース１におけるジョブ９８の実行開始時刻は、２：００であり、ジョブ終了時間１０３の値は、単位が分である。また、図１０Ａに示すケース１における入力値１０１は、入力データ量１０２が１００であり、ジョブ終了時間１０３が１３０（分）であり、障害安全係数１０４が１である。 The execution start time of the job 98 in case 1 is 2:00, and the unit of the value of the job end time 103 is minutes. Further, in the input value 101 in case 1 shown in FIG. 10A, the input data amount 102 is 100, the job end time 103 is 130 (minutes), and the failure safety coefficient 104 is 1.

図１０Ｂは、本発明の第１の実施形態のケース１におけるジョブスケジューリング結果を示す説明図である。 FIG. 10B is an explanatory diagram illustrating a job scheduling result in case 1 of the first embodiment of the invention.

図１０Ｂは、実行計算機１０−１へ割り当てられたサービス９９を示す。また、図１０Ｂは、ジョブスケジューリング部１１２が、図９Ａに示すジョブ９８、図９Ｂに示す統計情報管理テーブル１３２及び図１０Ａに示す入力値１０１に基づいて、実行計算機１０−１においてサービス９９Ａ〜サービス９９Ｄをスケジューリングした結果を示す。図１０Ｂにおいて、「Ａデ」は、「サービスＡのデプロイ処理」を示し、「Ａ実」は、「サービスＡの実行処理」を示す。 FIG. 10B shows the services 99 assigned to the execution computer 10-1. Specifically, FIG. 10B shows the result of the job scheduling unit 112 scheduling the services 99A to 99D on the execution computer 10-1 based on the job 98 shown in FIG. 9A, the statistical information management table 132 shown in FIG. 9B, and the input value 101 shown in FIG. 10A. In FIG. 10B, the label "Ａデ" denotes the deployment process of the service A, and the label "Ａ実" denotes the execution process of the service A.

ケース１におけるＣＰＵ２１は、図７に示すステップ４０３〜４０５において、障害安全係数１０４「１」に１を加算して「２」を取得する。そして、各サービス９９のデプロイ処理及び実行処理を２回ずつ割り当てる。また、ステップ４０６において、ジョブ終了予定時間が、ジョブ終了時間１０３よりも小さいと判定し、ジョブスケジューリング処理を終了する。 In Steps 403 to 405 shown in FIG. 7, the CPU 21 in Case 1 adds “1” to the failure safety coefficient 104 “1” to obtain “2”. Then, the deployment process and the execution process of each service 99 are assigned twice. In step 406, it is determined that the scheduled job end time is smaller than the job end time 103, and the job scheduling process is ended.

図１０Ｃは、本発明の第１の実施形態のケース１におけるスケジュール管理テーブルを示す説明図である。 FIG. 10C is an explanatory diagram illustrating a schedule management table in Case 1 according to the first embodiment of this invention.

ジョブスケジューリング部１１２は、図１０Ｂに示すスケジュール結果を、スケジュール管理テーブル１３１に格納する。図１０Ｃに示す実行計算機名１３１−１の値は、すべて実行計算機１０−１である。ＣＰＵ２１は、図１０Ｃに示すスケジュール管理テーブル１３１によって、各実行計算機１０にサービス９９を割り当てる。そして実行計算機１０−１は、割り当てられたサービス９９終了後、処理結果を処理データ１４１に送信する。 The job scheduling unit 112 stores the schedule result illustrated in FIG. 10B in the schedule management table 131. The values of the execution computer name 131-1 shown in FIG. 10C are all execution computers 10-1. The CPU 21 assigns a service 99 to each execution computer 10 by the schedule management table 131 shown in FIG. 10C. The execution computer 10-1 transmits the processing result to the processing data 141 after the assigned service 99 ends.

ケース１におけるＣＰＵ２１は、図９Ａに示すジョブ９８と、図１０Ａに示す入力値１０１を受信した場合、統計情報管理テーブル１３２の情報に従って、図１０Ｂに示すタイムスケジュールのようにジョブ９８をスケジュールする。ケース１におけるＣＰＵ２１は、図７に示すステップ４０６において、１台の実行計算機１０−１において１２０分以内に、ジョブ９８を実行可能と判定する。このため、ケース１におけるジョブ９８は、すべて１台の実行計算機１０−１においてスケジュールされる。 When receiving the job 98 shown in FIG. 9A and the input value 101 shown in FIG. 10A, the CPU 21 in Case 1 schedules the job 98 according to the information in the statistical information management table 132 as shown in the time schedule shown in FIG. 10B. In step 406 shown in FIG. 7, the CPU 21 in case 1 determines that the job 98 can be executed within 120 minutes in one execution computer 10-1. For this reason, all the jobs 98 in case 1 are scheduled in one execution computer 10-1.

図１０Ｂにおいて示すタイムチャートにおける各サービス９９のデプロイ処理及び実行処理の開始時刻１３１−２、終了時刻１３１−３は、スケジュール管理テーブル１３１に格納される。ＣＰＵ２１は、入力値１０１の障害安全係数１０４に基づいて、実行計算機１０−１において全てのサービス９９のデプロイ処理及び実行処理を、２回ずつスケジューリングする。このため、スケジュール管理テーブル１３１には、すべてのサービス９９のデプロイ処理及び実行処理の組が二つずつ格納される。 The start time 131-2 and end time 131-3 of the deployment process and execution process of each service 99 in the time chart shown in FIG. 10B are stored in the schedule management table 131. Based on the failure safety coefficient 104 of the input value 101, the CPU 21 schedules the deployment process and the execution process of all the services 99 in the execution computer 10-1 twice. For this reason, the schedule management table 131 stores two sets of deployment processing and execution processing for all services 99.

ジョブ９８が実行され、サービス９９が正常に実行された場合、複数回スケジューリングされたサービス９９のデプロイ処理及び実行処理は、スキップされ、実行されない。また、ジョブ９８が実行され、サービス９９のデプロイ処理時又は実行処理時に障害が発生した場合、ＣＰＵ２１は、障害が発生した旨を検知し、障害の要因に従って、処理を継続する実行計算機１０を選択する。 When the job 98 is executed and a service 99 completes normally, the additional deployment and execution processes scheduled for that service 99 are skipped and not executed. When the job 98 is executed and a failure occurs during the deployment process or the execution process of a service 99, the CPU 21 detects the failure and selects, according to the cause of the failure, the execution computer 10 that continues the processing.

障害の要因がサービス９９のソフトウェア障害等であり、実行制御プログラム９２が処理可能な場合、ＣＰＵ２１は、同一実行計算機１０において再度サービス９９のデプロイ処理及び実行処理を実行させる。 When the cause of the failure is a software failure of the service 99 and the execution control program 92 can process, the CPU 21 causes the same execution computer 10 to execute the deployment process and the execution process of the service 99 again.

障害の要因が実行計算機１０におけるハードウェア障害又はネットワーク障害等であり、実行制御プログラム９２が処理不可能な場合、ＣＰＵ２１は、直ちに別の実行計算機１０に残りのサービス９９のスケジュールを割り当て直し、処理を続行させる。残りのサービス９９を新たに割り当てられた実行計算機１０は、障害の発生した実行計算機１０において実行されていたサービス９９のデプロイ処理から実行する。 If the cause of the failure is a hardware failure, a network failure, or the like in the execution computer 10 and the execution control program 92 cannot handle it, the CPU 21 immediately reassigns the schedule of the remaining services 99 to another execution computer 10 and has the processing continue. The execution computer 10 to which the remaining services 99 are newly assigned starts from the deployment process of the service 99 that was being executed on the failed execution computer 10.

これによって、ＣＰＵ２１は、ジョブ９８実行中に障害安全係数１０４によって指定された数である、「１」回の障害が発生した場合においても、ジョブ終了時間１０３の１３０分以内にジョブを終了可能である。 As a result, even when one failure, the number designated by the failure safety coefficient 104, occurs during execution of the job 98, the CPU 21 can finish the job within the 130 minutes of the job end time 103.
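The choice of recovery computer described above depends only on the kind of failure. A minimal sketch follows; `FailureKind` and `choose_recovery_computer` are hypothetical names, and taking the first entry of `other_computers` is an arbitrary simplification, since the description does not say how the replacement computer is chosen.

```python
from enum import Enum


class FailureKind(Enum):
    SOFTWARE = "software"  # recoverable by the execution control program
    HARDWARE = "hardware"  # not recoverable on the failed computer
    NETWORK = "network"    # not recoverable on the failed computer


def choose_recovery_computer(failed_computer, kind, other_computers):
    """Return the computer on which the failed service is deployed and executed again."""
    if kind is FailureKind.SOFTWARE:
        return failed_computer          # retry on the same execution computer
    if not other_computers:
        raise RuntimeError("no other execution computer is available")
    return other_computers[0]           # restart from the deployment process elsewhere


print(choose_recovery_computer("10-1", FailureKind.SOFTWARE, ["10-2"]))  # 10-1
print(choose_recovery_computer("10-1", FailureKind.HARDWARE, ["10-2"]))  # 10-2
```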

ケース２は、ケース１と異なり、ジョブ終了時間１０３が７５分である場合、ジョブスケジューリング部１１２によってスケジューリングされた各実行計算機１０へのサービス９９の割り当てを示す。 Case 2 differs from Case 1 in that the job end time 103 is 75 minutes, and shows the assignment of the services 99 to each execution computer 10 as scheduled by the job scheduling unit 112.

図１１Ａは、本発明の第１の実施形態のケース２における入力値１０１の具体例を示す説明図である。 FIG. 11A is an explanatory diagram illustrating a specific example of the input value 101 in case 2 according to the first embodiment of this invention.

ケース２の入力値１０１は、入力データ量１０２が１００であり、ジョブ終了時間１０３が７５（分）であり、障害安全係数１０４が１である。 In the input value 101 of case 2, the input data amount 102 is 100, the job end time 103 is 75 (minutes), and the failure safety coefficient 104 is 1.

ケース２におけるＣＰＵ２１は、図７に示す４０６においてジョブ終了予定時刻が、ジョブ終了時間１０３よりも多いため、図８に示すサービス振り分け処理を行う。 In step 406 shown in FIG. 7, the scheduled job end time is later than the job end time 103, so the CPU 21 in Case 2 performs the service distribution process shown in FIG. 8.

図１１Ｂは、本発明の第１の実施形態のケース２におけるジョブスケジューリング結果を示す説明図である。 FIG. 11B is an explanatory diagram illustrating a job scheduling result in case 2 according to the first embodiment of this invention.

図１１Ｂは、実行計算機１０−１及び実行計算機１０−２へ割り当てられたサービス９９を示す。サービス９９Ａは、二つの実行計算機１０において並列に実行される。 FIG. 11B shows the service 99 assigned to the execution computer 10-1 and the execution computer 10-2. The service 99A is executed in parallel on the two execution computers 10.

サービス９９Ｃは、サービス９９Ａの実行後であり、サービス９９Ｂと並列に実行されてもよいため、実行計算機１０−２の２：１５からと、実行計算機１０−１の２：３５にスケジュールされる。サービス９９Ｄは、サービス９９Ｂ及びサービス９９Ｃの後続にスケジュールされる。 Since the service 99C is executed after the service 99A and may be executed in parallel with the service 99B, the service 99C is scheduled from 2:15 of the execution computer 10-2 and 2:35 of the execution computer 10-1. Service 99D is scheduled after service 99B and service 99C.

ケース２におけるＣＰＵ２１は、図８に示すステップ４２３〜ステップ４２５において、サービス９９Ａ〜サービス９９Ｄの全てを仮選択する。そして、ステップ４２７において、仮選択したサービス９９の中から、デプロイ処理及び実行処理の合計時間が最大のサービス９９を選択し、実行計算機１０−２に割り当てる。 The CPU 21 in case 2 temporarily selects all of the services 99A to 99D in steps 423 to 425 shown in FIG. In step 427, the service 99 having the maximum total time of the deployment process and the execution process is selected from the provisionally selected service 99 and assigned to the execution computer 10-2.

ここで、サービス９９Ｂは、最もデプロイ処理及び実行処理に要する時間が少ないため、実行計算機１０−２に振り分けられる順番は遅くなる。ケース２におけるＣＰＵ２１は、デプロイ処理及び実行処理に要する時間の多い順に、サービス９９Ｄ、サービス９９Ａ、サービス９９Ｃを実行計算機１０−２に振り分け（ステップ４０７）、その後、ジョブ終了時間内にジョブ９８が終了する予定であると判定する（ステップ４０６）。このため、ケース２におけるサービス９９Ｂは、実行計算機１０−２に振り分けられない。 Here, since the service 99B requires the least time for the deployment process and the execution process, the order of distribution to the execution computer 10-2 is delayed. In the case 2, the CPU 21 distributes the service 99D, the service 99A, and the service 99C to the execution computer 10-2 in the order of the time required for the deployment process and the execution process (step 407), and then the job 98 ends within the job end time. (Step 406). For this reason, the service 99B in Case 2 is not distributed to the execution computer 10-2.

図１１Ｃは、本発明の第１の実施形態のケース２におけるスケジュール管理テーブル１３１を示す説明図である。 FIG. 11C is an explanatory diagram illustrating the schedule management table 131 in Case 2 according to the first embodiment of this invention.

図１１Ｃに示す実行計算機名１３１−１は、実行計算機１０−１と実行計算機１０−２とが格納される。図１１に示すスケジュール管理テーブル１３１には、図１１Ｂに示すスケジュールに基づいて、値が格納される。ケース２において、ジョブ９８の終了予定時刻は、サービス９９Ｄの終了時刻１３１−３が示す３：１０であり、ジョブ９８が実行されてから７５分間で終了する予定である。 The execution computer name 131-1 illustrated in FIG. 11C stores the execution computer 10-1 and the execution computer 10-2. Values are stored in the schedule management table 131 shown in FIG. 11 based on the schedule shown in FIG. 11B. In Case 2, the scheduled end time of the job 98 is 3:10 indicated by the end time 131-3 of the service 99D, and is scheduled to end in 75 minutes after the job 98 is executed.

ケース２におけるＣＰＵ２１は、図９Ａに示すジョブ９８と、図１１Ａに示す入力値１０１を受信した場合、統計情報管理テーブル１３２の情報に従って、図１１Ｂに示すタイムスケジュールのようにジョブ９８をスケジュールする。ケース２におけるＣＰＵ２１は、図７に示すステップ４０６において、２台の実行計算機１０−１及び実行計算機１０−２において７０分以内に、ジョブ９８を実行可能と判定する。このため、ケース２におけるジョブ９８は、２台の実行計算機１０−１及び実行計算機１０−２においてスケジュールされる。 When receiving the job 98 shown in FIG. 9A and the input value 101 shown in FIG. 11A, the CPU 21 in Case 2 schedules the job 98 according to the information in the statistical information management table 132 as shown in the time schedule shown in FIG. 11B. In step 406 shown in FIG. 7, the CPU 21 in Case 2 determines that the job 98 can be executed within 70 minutes in the two execution computers 10-1 and 10-2. For this reason, the job 98 in case 2 is scheduled in the two execution computers 10-1 and 10-2.

ケース１とは異なり、ケース２におけるＣＰＵ２１は、複数のサービス９９のデプロイ処理及び実行処理を２台の実行計算機１０に割り当てる。ジョブ９８が実行され、複数の実行計算機１０に割り当てられたサービス９９が正常に実行された場合、いずれか一つの実行計算機１０は、処理結果を処理データ１４１に反映する。 Unlike case 1, the CPU 21 in case 2 assigns the deployment processing and execution processing of a plurality of services 99 to the two execution computers 10. When the job 98 is executed and the service 99 assigned to the plurality of execution computers 10 is normally executed, any one execution computer 10 reflects the processing result in the processing data 141.

スケジュール管理計算機１１は、実行計算機１０毎に優先順位を設けることによって、いずれの実行計算機１０の処理結果を処理データ１４１に反映するかを定めてもよい。また、一つの実行計算機１０が、処理結果を反映している間、スケジュール管理計算機１１上又は処理データ１４１上に、処理結果を反映中であることを示す情報を保持してもよい。そして、処理結果を反映中であることを示す情報を他の実行計算機１０が参照することによって、他の実行計算機１０が反映処理をスキップしてもよい。 The schedule management computer 11 may determine which execution computer 10 has its processing result reflected in the processing data 141 by assigning a priority to each execution computer 10. Further, while one execution computer 10 is reflecting its processing result, information indicating that the processing result is being reflected may be held on the schedule management computer 11 or on the processing data 141. The other execution computers 10 may then skip the reflection process by referring to that information.
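One way to picture the skip behaviour is a store that accepts only the first result per service. The sketch below is a simplified in-process stand-in, not the described system: the class `ProcessingData` and its methods are hypothetical, a `threading.Lock` plays the role of the "reflection in progress" information, and the priority-based variant is not modelled.

```python
import threading


class ProcessingData:
    """Keeps one reflected result per service; later attempts are skipped."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the "reflection in progress" marker
        self._results = {}

    def reflect(self, service, computer, result):
        """Store the result and return True, or return False if already reflected."""
        with self._lock:
            if service in self._results:
                return False           # another execution computer got there first
            self._results[service] = (computer, result)
            return True

    def result_of(self, service):
        return self._results.get(service)


data = ProcessingData()
print(data.reflect("A", "10-1", "output-from-10-1"))  # True
print(data.reflect("A", "10-2", "output-from-10-2"))  # False
```

In a real deployment the marker would have to live in storage shared by the execution computers, which is why the description places it on the schedule management computer 11 or on the processing data 141.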

本実施形態によれば、各実行計算機１０において、割り当てられたサービス９９のデプロイ処理時及び実行処理時に障害が発生した場合、残りの実行計算機１０によって同一のサービス９９が実行されるため、ジョブ９８は、継続して実行される。ジョブ実行中に１回の障害が発生した場合、障害安全係数１０４によって「１」が指定されていれば、ジョブ終了時間１０３の７５分以内にジョブを終了させることができる。 According to the present embodiment, if a failure occurs during the deployment process and the execution process of the assigned service 99 in each execution computer 10, the same service 99 is executed by the remaining execution computers 10. Is continuously executed. If a failure occurs once during job execution, if “1” is specified by the failure safety coefficient 104, the job can be ended within 75 minutes of the job end time 103.

例えば、実行計算機１０−１においてサービス９９Ａが実行されている際に、障害が発生した場合、実行計算機１０−２においてもサービス９９Ａが同時に実行されているため、サービス９９Ｂ及びサービス９９Ｃは、継続してスケジュール通り実行される。なお、実行計算機１０−１及び１０−２においてサービス９９Ａが正常に実行された場合、前述の通り、優先順位に従っていずれかの実行計算機１０による処理結果が処理データ１４１に反映されてもよいし、より早く送信された処理結果が処理データ１４１に反映されてもよい。 For example, when a failure occurs while the service 99A is being executed in the execution computer 10-1, the service 99B and the service 99C continue because the service 99A is also executed in the execution computer 10-2. Will be executed as scheduled. When the service 99A is normally executed in the execution computers 10-1 and 10-2, as described above, the processing result by any of the execution computers 10 may be reflected in the processing data 141 according to the priority order. The processing result transmitted earlier may be reflected in the processing data 141.

また例えば、実行計算機１０−２においてサービス９９Ｃが実行されている際に、障害が発生した場合、二つ目のサービス９９Ｃがスケジュール通り実行されるため、後続のサービス９９Ｄもスケジュール通り実行される。なお、サービス９９Ｃが正常に実行された場合、サービス９９Ｃはスキップされ、サービス９９Ｄはスケジュール通り実行される。 Further, for example, if a failure occurs while the service 99C is being executed on the execution computer 10-2, the second scheduled service 99C is executed as scheduled, so the subsequent service 99D is also executed as scheduled. When the first service 99C completes normally, the second scheduled service 99C is skipped and the service 99D is executed as scheduled.

また、図１１Ｂに示すサービス９９Ｂは直列に実行されるが、サービス９９Ｂにおいて障害が発生した場合、ケース１と同様に、スケジュール管理計算機１１は、障害の内容に従って実行計算機１０を選択すればよい。 The service 99B shown in FIG. 11B is executed in series. However, when a failure occurs in the service 99B, the schedule management computer 11 may select the execution computer 10 according to the content of the failure as in the case 1.

前述のスケジューリングによって、スケジュール管理計算機１１は、障害が発生しても、予定した３：１０にジョブ９８を終了させることができる。 Due to the scheduling described above, the schedule management computer 11 can finish the job 98 at the scheduled 3:10 even if a failure occurs.

図１２は、本発明の第１の実施形態のディスプレイ装置２２に表示されるジョブ投入画面を示す説明図である。 FIG. 12 is an explanatory diagram illustrating a job input screen displayed on the display device 22 according to the first embodiment of this invention.

前述したジョブスケジューリングシステムに対するジョブ９８の投入は、システムが自動的に行う場合、又は、ユーザによってジョブ９８の投入がオペレーションされる場合が考えられる。ジョブ９８の投入とは、実行されるジョブ９８の情報をシステムに入力することである。 The job 98 can be input to the job scheduling system described above when the system automatically performs the operation, or when the user inputs the job 98. The input of the job 98 is to input information on the job 98 to be executed to the system.

ユーザがジョブを投入する場合、スケジュール管理計算機１１は、図１２に示すジョブ投入画面５０１をディスプレイ装置２２に表示する。そしてユーザは、ジョブ９８を実行するために必要な情報を、ジョブ投入画面５０１を介して入力する。そしてスケジュール管理計算機１１は、図７及び図８に示すジョブスケジューリングを行い、ジョブ９８の実行可否情報をユーザに通知する。 When the user submits a job, the schedule management computer 11 displays the job submission screen 501 shown in FIG. 12 on the display device 22. The user then inputs the information necessary for executing the job 98 via the job submission screen 501. The schedule management computer 11 then performs the job scheduling shown in FIGS. 7 and 8 and notifies the user of whether the job 98 can be executed.

ジョブ投入画面５０１は、ジョブ名５０２、入力データファイル名５０３、ジョブ実行時刻５０４、障害安全係数５０５及びジョブ投入ボタン５０６を含む入力情報領域と、ジョブ投入結果５０７を含む結果情報領域とを含む。 The job submission screen 501 includes an input information area including a job name 502, an input data file name 503, a job execution time 504, a failure safety factor 505, and a job submission button 506, and a result information area including a job submission result 507.

ジョブ名５０２は、実行するジョブを指定するための領域である。 A job name 502 is an area for designating a job to be executed.

入力データファイル名５０３は、入力データ量１０２を指定するための領域である。また、入力データファイル名５０３に示される入力データファイル以外の形式によってジョブ９８の入力データ量を指定する場合、別の入力値から算出した値を用いてもよい。 The input data file name 503 is an area for designating the input data amount 102. Further, when the input data amount of the job 98 is specified by a format other than the input data file indicated by the input data file name 503, a value calculated from another input value may be used.

ジョブ実行時刻５０４は、ジョブ終了時間１０３、すなわち、スケジュール管理テーブル１３１の開始時刻１３１−２、終了時刻１３１−３を指定するための領域である。 The job execution time 504 is an area for designating the job end time 103, that is, the start time 131-2 and the end time 131-3 of the schedule management table 131.

障害安全係数５０５は、障害安全係数１０４を入力するための領域である。当該指定値については、図１において説明したように、ジョブ９８の重要度を示す数値によってユーザに候補となる指定値を提示してもよく、実行計算機１０の稼働率などの情報に基づいて、システムが自動的に指定してもよい。 The failure safety factor 505 is an area for inputting the failure safety factor 104. As for the designated value, as described in FIG. 1, a candidate designated value may be presented to the user by a numerical value indicating the importance of the job 98, and based on information such as the operating rate of the execution computer 10, The system may specify automatically.

The job submission button 506 is a button for instructing the system to submit the job. After the job is submitted, the system schedules the job 98 as described above.

The job submission result 507 is an area for notifying the user of the result of scheduling the job 98: whether the job 98 can be scheduled, and its scheduled start time and scheduled end time. The job submission result 507 may also include information such as the number of execution computers 10 to which the services 99 are assigned, or the service execution schedule of each execution computer 10.
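The fields of the job submission screen 501 can be pictured as a pair of records, one for the input information area and one for the result information area. The following is only an illustrative sketch: the class names, field names, and validation rules (including the assumption that the failure safety factor is a number of at least 1) are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class JobSubmission:
    """Input information area of the job submission screen 501 (hypothetical model)."""
    job_name: str                 # job name 502
    input_data_file: str          # input data file name 503
    start_time: datetime          # job execution time 504 (start time 131-2)
    end_time: datetime            # job execution time 504 (end time 131-3)
    failure_safety_factor: float  # failure safety factor 505

    def validate(self) -> None:
        """Reject obviously inconsistent input (rules are assumptions)."""
        if self.end_time <= self.start_time:
            raise ValueError("end time must be later than start time")
        if self.failure_safety_factor < 1:
            raise ValueError("failure safety factor is assumed to be at least 1")


@dataclass
class JobSubmissionResult:
    """Result information area: job submission result 507 (hypothetical model)."""
    schedulable: bool                          # whether the job can be scheduled
    scheduled_start: Optional[datetime] = None
    scheduled_end: Optional[datetime] = None
    computers_used: Optional[int] = None       # optional additional information
```

A record of this kind would be filled in from the screen, validated, and then handed to the scheduler, which would return the result record for display.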

According to the embodiment of the present invention, the services 99 of the job 98 can be assigned to the execution computers 10 so that the job end time is satisfied even if a failure occurs during job execution. In addition, because the present invention distributes the services 99 to the execution computers 10 within the range that satisfies the job end time 103 and does not schedule the services 99 redundantly, the job 98 can be scheduled so as to make effective use of the resources of the execution computers 10.
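The idea summarized above, reserving enough time on each execution computer for every service to be executed the number of times given by the failure safety factor and moving services to another computer only when the job end time would otherwise be missed, can be sketched roughly as follows. This is a simplified greedy illustration, not the procedure of FIGS. 7 and 8: the names `Service` and `assign_services` are hypothetical, all services are assumed to be independent and able to run in parallel on different computers, and deployment time is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    processing_time: float  # estimated time for one execution (arbitrary unit)


def assign_services(services: List[Service], safety_factor: float,
                    time_limit: float, num_computers: int) -> Optional[List[List[str]]]:
    """Return one list of service names per execution computer used, or None.

    Each service reserves safety_factor * processing_time so that it can be
    re-executed after a failure without exceeding time_limit (the time
    available until the job end time).
    """
    plan: List[List[str]] = [[]]
    used = 0.0
    for svc in services:
        reserved = svc.processing_time * safety_factor
        if reserved > time_limit:
            return None  # cannot meet the end time even on an empty computer
        if used + reserved > time_limit:
            if len(plan) == num_computers:
                return None  # no further execution computer is available
            plan.append([])  # continue on the next execution computer
            used = 0.0
        plan[-1].append(svc.name)
        used += reserved
    return plan
```

For example, with services A, B, and C taking 10, 20, and 15 time units, a safety factor of 2, and 60 units available, the sketch keeps A and B on the first computer and places C on a second one; with a safety factor of 1, all three fit on a single computer.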

The preferred embodiment of the present invention has been described above; however, the present invention is not limited to that embodiment and can be modified as appropriate without departing from the spirit of the invention. For example, the following embodiments are conceivable.

(Second Embodiment)
In the first embodiment, job scheduling was performed on the assumption that the service 99 is not yet deployed on any execution computer 10. When the service 99 to be distributed has already been deployed on an execution computer 10, the schedule management computer 11 in the second embodiment can shorten the execution time of the job 98 by omitting the initial deployment process of the service 99.

Furthermore, the schedule management computer 11 can shorten the execution time of the job 98 by managing a list of the services 99 already deployed on each execution computer 10 and, in step 402 shown in FIG. 7 and step 428 shown in FIG. 8, preferentially assigning execution computers 10 on which the service 99 has already been deployed.
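The preference for execution computers on which the service is already deployed can be illustrated with a short sketch. The function names and the dictionary that maps each computer to its set of deployed services are hypothetical; the specification states only that such a list is managed, not how it is represented.

```python
from typing import Dict, List, Set


def order_candidates(service: str, computers: List[str],
                     deployed: Dict[str, Set[str]]) -> List[str]:
    """Order candidate computers so that those with `service` deployed come first.

    sorted() is stable, so computers keep their original relative order
    within the "deployed" and "not deployed" groups.
    """
    return sorted(computers, key=lambda c: service not in deployed.get(c, set()))


def estimated_time(service: str, computer: str, deployed: Dict[str, Set[str]],
                   deploy_time: float, run_time: float) -> float:
    """Estimated processing time; deployment is skipped if already done."""
    if service in deployed.get(computer, set()):
        return run_time
    return deploy_time + run_time
```

Trying candidates in this order means that the initial deployment time is paid only when no computer with the service already deployed can take it.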

(Third Embodiment)
In the first embodiment, in step 429 shown in FIG. 8, the subsequent service 99 is assigned to another execution computer 10 in a time slot after the preceding service has ended. However, if idle time at least as long as the time required to deploy the service 99 exists before the preceding service 99 is executed, the execution time of the job 98 can be shortened by assigning the initial deployment process of the service 99 to that idle time. Note that the deployment process can generally be executed in parallel with other services.
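Finding an idle period into which the initial deployment can be moved can be sketched as a search over the reserved intervals of the target execution computer. The sketch below is an assumption-laden illustration: the function name, the representation of reservations as `(start, end)` pairs, and the `must_finish_by` parameter (the time by which deployment has to be complete, for example the planned start of the subsequent service) are all hypothetical.

```python
from typing import List, Optional, Tuple


def find_predeploy_slot(busy: List[Tuple[float, float]], deploy_time: float,
                        must_finish_by: float) -> Optional[Tuple[float, float]]:
    """Return the earliest idle slot of length deploy_time, or None.

    busy: reserved (start, end) intervals on the target execution computer.
    The returned slot ends no later than must_finish_by.
    """
    cursor = 0.0  # earliest time at which the computer might be idle
    for start, end in sorted(busy):
        gap_end = min(start, must_finish_by)
        if gap_end - cursor >= deploy_time:
            return (cursor, cursor + deploy_time)
        cursor = max(cursor, end)
        if cursor >= must_finish_by:
            return None  # no idle time left before the limit
    if must_finish_by - cursor >= deploy_time:
        return (cursor, cursor + deploy_time)
    return None
```

If a slot is found, only the execution process of the subsequent service remains to be scheduled after the preceding service ends; otherwise the deployment stays where the first embodiment would place it.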

10, 10-1 to 10-n Execution computer
11 Schedule management computer
21 CPU
22 Display device
23 Keyboard
24 Mouse
25 Network interface card (NIC)
26 Hard disk
27 Memory
28 Bus
30 Operating system
91 Scheduling control program
92 Execution control program
99, 99A to 99F Service
101 Input value
102 Input data amount
103 Job end time
104 Failure safety factor
111 Job acceptance unit
112 Job scheduling unit
113 Execution instruction unit
114 Statistical information management unit
121 Service deployment unit
122 Service execution unit
123 Statistical information transfer unit
131 Schedule management table
132 Statistical information management table
141 Processing data

Claims

1. A management computer connected to one or more execution computers that process a plurality of jobs, the management computer assigning the plurality of jobs to the one or more execution computers, wherein
each job includes a plurality of services,
each service includes a program module processed by the execution computer, and
the management computer:
acquires a predetermined coefficient and an end time of each job;
assigns each service to the execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient; and
when a failure occurs in the execution computer to which a service is assigned while the service is being processed by the execution computer, causes the execution computer to process the assigned service again.

2. The management computer according to claim 1, wherein the management computer:
assigns each service to a first execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient;
calculates the time required to process the job based on the total time required to process the assigned services;
when the calculated time exceeds the acquired end time of the job, assigns to the first execution computer those services, among the services assigned to the first execution computer, that can be processed within the end time of the job; and
assigns the services that were not assigned to the first execution computer to a second execution computer.

3. The management computer according to claim 2, wherein the management computer:
extracts services that can be processed in parallel from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
assigns the extracted services to the first execution computer and the second execution computer.

4. The management computer according to claim 3, wherein the management computer:
extracts a service that takes a long time to process from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
assigns the extracted service to the second execution computer.

5. The management computer according to claim 1, wherein
the processing of each service includes a preparation process of the service and an execution process of the service, and
the management computer:
selects an execution computer on which the preparation process of the service has been completed; and
assigns, to the selected execution computer, the service whose preparation process has been completed on the selected execution computer.

6. The management computer according to claim 2, wherein
the processing of each service includes a preparation process of the service and an execution process of the service, and
the management computer:
extracts, from among the services assigned to the second execution computer, a service whose preparation process can be performed during a time in which no service is being processed on the second execution computer; and
assigns the preparation process of the extracted service to the time in which no service is being processed.

7. A job scheduling method in which a management computer, connected to one or more execution computers that process a plurality of jobs, assigns the plurality of jobs to the one or more execution computers, wherein
each job includes a plurality of services,
each service includes a program module processed by the execution computer, and
the method comprises:
a procedure in which the management computer acquires a predetermined coefficient and an end time of each job;
a procedure in which the management computer assigns each service to the execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient; and
a procedure in which, when a failure occurs in the execution computer to which a service is assigned while the service is being processed by the execution computer, the management computer causes the execution computer to process the assigned service again.

8. The job scheduling method according to claim 7, wherein the procedure of assigning the services to the execution computer includes:
a procedure in which the management computer assigns each service to a first execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient;
a procedure in which the management computer calculates the time required to process the job based on the total time required to process the assigned services;
a procedure in which, when the calculated time exceeds the acquired end time of the job, the management computer assigns to the first execution computer those services, among the plurality of services assigned to the first execution computer, that can be processed within the end time of the job; and
a procedure in which the management computer assigns the services that were not assigned to the first execution computer to a second execution computer.

9. The job scheduling method according to claim 8, wherein the procedure of assigning the services to the execution computer includes:
a procedure in which the management computer extracts services that can be processed in parallel from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
a procedure in which the management computer assigns the extracted services to the first execution computer and the second execution computer.

10. The job scheduling method according to claim 9, wherein the procedure of assigning the services to the execution computer includes:
a procedure in which the management computer extracts a service that takes a long time to process from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
a procedure in which the management computer assigns the extracted service to the second execution computer.

11. The job scheduling method according to claim 7, wherein
the processing of each service includes a preparation process of the service and an execution process of the service, and
the procedure of assigning the services to the execution computer includes:
a procedure in which the management computer selects an execution computer on which the preparation process of the service has been completed; and
a procedure in which the management computer assigns, to the selected execution computer, the service whose preparation process has been completed on the selected execution computer.

12. The job scheduling method according to claim 8, wherein
the processing of each service includes a preparation process of the service and an execution process of the service, and
the procedure of assigning the services to the execution computer includes:
a procedure in which the management computer extracts, from among the services assigned to the second execution computer, a service whose preparation process can be performed during a time in which no service is being processed on the second execution computer; and
a procedure in which the management computer assigns the preparation process of the extracted service to the time in which no service is being processed.

13. A job scheduling program for a management computer that is connected to one or more execution computers that process a plurality of jobs and that assigns the plurality of jobs to the one or more execution computers, wherein
each job includes a plurality of services,
each service includes a program module processed by the execution computer, and
the job scheduling program causes the management computer to:
acquire a predetermined coefficient and an end time of each job;
assign each service to the execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient; and
when a failure occurs in the execution computer to which a service is assigned while the service is being processed by the execution computer, cause the execution computer to process the assigned service again.

14. The job scheduling program according to claim 13, wherein the job scheduling program causes the management computer to:
assign each service to a first execution computer so that the service is executed the number of times indicated by the acquired predetermined coefficient;
calculate the time required to process the job based on the total time required to process the assigned services;
when the calculated time exceeds the acquired end time of the job, assign to the first execution computer those services, among the plurality of services assigned to the first execution computer, that can be processed within the end time of the job; and
assign the services that were not assigned to the first execution computer to a second execution computer.

15. The job scheduling program according to claim 14, wherein the job scheduling program causes the management computer to:
extract services that can be processed in parallel from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
assign the extracted services to the first execution computer and the second execution computer.

16. The job scheduling program according to claim 15, wherein the job scheduling program causes the management computer to:
extract a service that takes a long time to process from among the plurality of services assigned to the first execution computer so as to be executed the number of times indicated by the predetermined coefficient; and
assign the extracted service to the second execution computer.