JP5842704B2

JP5842704B2 - Estimation apparatus, program, and estimation method

Info

Publication number: JP5842704B2
Application number: JP2012072234A
Authority: JP
Inventors: 功作木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-03-27
Filing date: 2012-03-27
Publication date: 2016-01-13
Anticipated expiration: 2032-03-27
Also published as: JP2013205970A

Description

The present invention relates to accurately estimating the execution time of a batch job.

In a batch job network, in which batch jobs are linked to one another in a network, all batch jobs must complete by the scheduled end time. This requires predicting the execution time of each batch job for its input data and setting an appropriate start time.

For example, it has been proposed to use a model that estimates execution time from the amount of input data. One proposal creates a model that estimates the output data amount and execution time from the input data amount of each past batch job, uses that model to estimate the output data amount and execution time from the current input data amount, and then uses the estimated output data amount as the input data amount when estimating the execution time of the subsequent batch job. Another proposal prepares multiple models for obtaining batch job statistics, selects the best model and its parameters from measurement results on small-scale data, and uses the selected parameters to estimate the statistics for processing large-scale data.

It is also known to analyze how a change in the value range of an input data item affects application maintenance. This is done by extracting the programs that read the files or databases storing the input data items and, for each program, performing an impact search starting from the variable in the input record that corresponds to each target item.

JP 2004-295731 A
JP 2010-061417 A
Domestic re-publication of PCT application No. 2009-011057

Sturges, H.A., "The choice of a class interval", J. American Statistical Association, pp.65-66

In the prior art described above, the output data amount and the execution time are estimated from the input data amount alone. For a batch job that processes records of input data composed of multiple fields one at a time, the output data amount and the execution time vary greatly with the distribution of the input data values, so this kind of estimation suffers from a large estimation error.

The disclosed execution time estimation device includes a storage unit, a setting unit, and an output frequency estimation unit. The storage unit stores a frequency distribution table and a frequency distribution map. In the frequency distribution table, the values of an input data field that affects the output data are divided into classes, and the input frequency of the input data classified into each class is set. The frequency distribution map is obtained by setting the input frequency of sampled input data in each class of the frequency distribution table and performing predetermined processing on the sampled input data to obtain the output frequency of the output data. The setting unit sets, for each class of the frequency distribution table stored in the storage unit, the input frequency of the target input data of the predetermined processing. The output frequency estimation unit obtains, for each class, the output frequency of the output data after the predetermined processing of the target input data by a linear expression whose coefficients are the values of the frequency distribution map, and estimates the output frequency for the target input data by summing the output frequencies obtained for the classes.

Accordingly, an object of the present invention is to enable the output data amount and the execution time of a batch job to be estimated with better accuracy.

The disclosed execution time estimation device includes a storage unit and an execution time estimation unit. The storage unit stores a frequency distribution table indicating the distribution of input data frequencies for each class, and a map, based on the frequency distribution table, from input data to result information of predetermined processing. The execution time estimation unit estimates the execution time of the predetermined processing for the input data based on a calculation that takes the stored frequency distribution table as input and uses the stored map to the result information.

As means for solving the above problem, the invention may also take the form of a program that causes a computer to function as the execution time estimation device, an execution time estimation method, or a recording medium on which the program is recorded.

With the disclosed technology, the execution time of a batch job can be estimated accurately even if the source information of the batch job is unknown.

A diagram for explaining the batch job execution time estimation method according to the present embodiment.
A diagram (part 1) for explaining the frequency distribution map generation method.
A diagram (part 2) for explaining the frequency distribution map generation method.
A diagram (part 1) for explaining the execution time map generation method.
A diagram (part 2) for explaining the execution time map generation method.
A diagram for explaining the output data frequency distribution estimation method.
A diagram for explaining the execution time estimation method.
A diagram for explaining the variable field determination method.
A diagram for explaining the influence degree calculation method.
A diagram showing an example of the functional configuration of the system according to the present embodiment.
A diagram showing the hardware configuration of a computer device.
A diagram for explaining the frequency distribution map creation process.
A diagram for explaining the execution time map creation process.
A diagram for explaining the output data frequency distribution estimation process.
A diagram for explaining the execution time estimation process.
A diagram for explaining the variable field determination process.
A diagram for explaining the process of calculating the degree of influence on the output data.
A diagram for explaining an example of batch processing of temperature sensor data.
A diagram showing an example of the schema of each data item and the estimated number of records.
A diagram (part 1) for explaining an example of determining the variable field in the first example.
A diagram (part 2) for explaining an example of determining the variable field in the first example.
A diagram showing an example of a three-dimensional frequency distribution in the first example.
A diagram (part 1) for explaining an example of creating the frequency distribution map in the first example.
A diagram (part 2) for explaining an example of creating the frequency distribution map in the first example.
A diagram for explaining an example of estimating the frequency distribution of the output data in the first example.
A diagram (part 1) for explaining an example of creating the execution time map in the second example.
A diagram (part 2) for explaining an example of creating the execution time map in the second example.
A diagram for explaining another example of estimating the execution time in the second example.
A diagram for explaining the estimation error.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the present embodiment, for a batch job that processes records of input data composed of multiple fields one at a time, the frequency distribution of the output data and the execution time are estimated from the frequency distribution of the input data by using a frequency distribution map and an execution time map. The present embodiment makes it possible to estimate the output data amount and the execution time with high accuracy even when the source information of the batch job is unknown.

FIG. 1 is a diagram for explaining the batch job execution time estimation method according to the present embodiment. In FIG. 1, the batch job execution time estimation method determines the field used as a variable (hereinafter referred to as a variable field) in each of the frequency distribution 3 of the input data and the frequency distribution 4 of the output data, and creates a frequency distribution table for each of the two distributions (variable field determination method). The variable field determination method is described later.

The following are generated, and the frequency distribution 4 of the output data and the execution time 5 are estimated.

・Generate the frequency distribution 3 of the input data.
The frequency distribution 3 of the input data is in practice a multidimensional frequency distribution 3-2 composed of multiple variables, and the batch job execution time estimation method handles such a multidimensional frequency distribution 3-2.

・Generate the frequency distribution map f.

The frequency distribution map f indicates to which class of the frequency distribution 4 of the output data the records belonging to each class of the frequency distribution 3 of the input data are output.

・The execution time map g represents the average batch job execution time per record for each class of the frequency distribution 3 of the input data.
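The multidimensional frequency distribution mentioned in the list above can be pictured as a counter keyed by one class index per variable field. The following is a minimal sketch under stated assumptions: the field names, value ranges, and class widths in `VARIATE_FIELDS` are illustrative and do not come from the patent.

```python
from collections import Counter

def class_index(value, low, width, n_classes):
    """0-based class of a value, clamped to the valid class range."""
    return min(max(int((value - low) // width), 0), n_classes - 1)

# (field name, lower bound, class width, number of classes): assumed values
VARIATE_FIELDS = [
    ("temperature", 0.0, 5.0, 20),
    ("hour", 0.0, 6.0, 4),
]

def multidim_frequency(records):
    """Count records per combination of classes, one class per variable field."""
    dist = Counter()
    for rec in records:
        key = tuple(
            class_index(rec[name], low, width, n)
            for name, low, width, n in VARIATE_FIELDS
        )
        dist[key] += 1
    return dist

records = [
    {"temperature": 3.0, "hour": 1.0},
    {"temperature": 4.5, "hour": 2.0},
    {"temperature": 12.0, "hour": 13.0},
]
dist = multidim_frequency(records)
```

Each key is a tuple such as `(0, 0)`, so a one-dimensional table is simply the special case with a single variable field.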

Next, the frequency distribution map generation method for generating the frequency distribution map f will be described. FIGS. 2 and 3 are diagrams for explaining the frequency distribution map generation method. In FIG. 2, the number of input/output data records is estimated for the known input/output data schema 6, which includes information such as field names, types, and value ranges, and processing is performed by batch job 2 (step S11). The estimated number of records is, for example, 1000.

The input/output data schema 6 is known and is information that defines the data type of each item. Each data item is defined by a field, a type, a value range, and so on. The field column holds item names such as "sensor ID", "date/time", and "temperature". For example, "sensor ID" has data type "int" and value range [0, 9999], "date/time" has data type "date" and value range [0:00, 23:59], and "temperature" has data type "float" and value range [0, 100].

For batch job 2, a field whose degree of influence on the output data is at or above a certain level is determined as the variable field 7 (step S12). Assume here that the temperature field has been determined as the variable field 7 of the input data.

Based on the value range of the input/output data and the estimated number of records, the number of classes and the class width of each frequency distribution are determined, and the frequency distribution table 3f is created (step S13). For example, for the temperature field determined as the variable field 7, the classes are determined as follows: class (1) covers 0°C to 5°C, class (2) covers 5°C to 10°C, class (3) covers 10°C to 15°C, class (4) covers 15°C to 20°C, class (5) covers 20°C to 25°C, and so on.
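Step S13 can be illustrated with equal-width classes over the value range. The sketch below is an assumption-laden illustration, not the patent's procedure: this passage does not state how the number of classes is derived, so Sturges' rule (from the cited non-patent literature) appears only as one possible choice, and the function names are hypothetical.

```python
import math

def sturges_class_count(n_records: int) -> int:
    """Sturges' rule, k = 1 + log2(n), rounded up (an assumed choice)."""
    return math.ceil(1 + math.log2(n_records))

def class_edges(low: float, high: float, n_classes: int) -> list[float]:
    """Equal-width class boundaries covering [low, high]."""
    width = (high - low) / n_classes
    return [low + i * width for i in range(n_classes + 1)]

def class_index(value: float, low: float, high: float, n_classes: int) -> int:
    """0-based class of a value; the upper bound falls in the last class."""
    width = (high - low) / n_classes
    return min(int((value - low) // width), n_classes - 1)

# The temperature example in the text uses 5-degree classes over [0, 100],
# which corresponds to 20 classes.
edges = class_edges(0.0, 100.0, 20)
k_sturges = sturges_class_count(1000)
```

Note that the 5-degree classes of the example give 20 classes, which differs from what Sturges' rule yields for 1000 records, so the example classes were evidently not chosen by that rule alone.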

Then, for the input data, a predetermined number (for example, 100) of random data items is generated for each class of the frequency distribution table 3f (step S14). The frequency column of the frequency distribution table 3f shows the number of random data items generated.

In FIG. 3, the random input data generated for each class of the frequency distribution table 3f in step S14 is fed into batch job 2 (step S15). For each class, batch job 2 processes the records one at a time, and a count is kept of how many of the predetermined number (100) of input data items of that class are output to each class of the frequency distribution 4 of the output data (step S16). The counted values are recorded in the input/output correspondence table 2a.

After all the input data has been processed, the values in the input/output correspondence table 2a are divided by the number of items per class (100) to obtain the frequency distribution map f (step S17).
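Steps S14 to S17 can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_batch_job` is a hypothetical stand-in for batch job 2 (it keeps only records of 10 degrees or more), and only the first five temperature classes are used.

```python
import random

CLASS_WIDTH = 5.0
N_CLASSES = 5            # classes (1) to (5): 0-5, 5-10, ..., 20-25 degrees
SAMPLES_PER_CLASS = 100  # predetermined number of samples per class

def class_index(temp: float) -> int:
    """0-based class of a temperature value."""
    return min(int(temp // CLASS_WIDTH), N_CLASSES - 1)

def toy_batch_job(temp: float):
    """Hypothetical batch job: emit the record only if it is 10 degrees or more."""
    return temp if temp >= 10.0 else None

def build_frequency_map(rng: random.Random) -> list[list[float]]:
    # counts[i][j]: records of input class i that were output to class j
    counts = [[0] * N_CLASSES for _ in range(N_CLASSES)]
    for i in range(N_CLASSES):
        low = i * CLASS_WIDTH
        for _ in range(SAMPLES_PER_CLASS):            # step S14: random data
            temp = low + CLASS_WIDTH * rng.random()
            out = toy_batch_job(temp)                 # step S15: run the job
            if out is not None:
                counts[i][class_index(out)] += 1      # step S16: count outputs
    # step S17: divide by the number of samples per class
    return [[c / SAMPLES_PER_CLASS for c in row] for row in counts]

f = build_frequency_map(random.Random(0))
```

With this toy job, the rows for classes (1) and (2) are all zero and the rows for classes (3) to (5) map each class onto itself. A real batch job would generally spread one input class over several output classes.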

Next, the execution time map generation method for generating the execution time map g will be described. FIGS. 4 and 5 are diagrams for explaining the execution time map generation method. In FIG. 4, the number of input/output data records is estimated for the known input/output data schema 6, which includes information such as field names, types, and value ranges, and processing is performed by batch job 2 (step S21). The estimated number of records is, for example, 1000. The input/output data schema 6 is the same as the schema shown in FIG. 2.

For batch job 2, a field whose degree of influence on the output data is at or above a certain level is determined as the variable field 7 (step S22). Assume here that the temperature field has been determined as the variable field 7 of the input data.

Based on the value range of the input/output data and the estimated number of records, the number of classes and the class width of each frequency distribution are determined, and the frequency distribution table 3f is created (step S23). For example, for the temperature field determined as the variable field 7, the classes are determined as follows: class (1) covers 0°C to 5°C, class (2) covers 5°C to 10°C, class (3) covers 10°C to 15°C, class (4) covers 15°C to 20°C, class (5) covers 20°C to 25°C, and so on.

Then, for the input data, a predetermined number (for example, 100) of random data items is generated for each class of the frequency distribution table 3f (step S24). The frequency column of the frequency distribution table 3f shows the number of random data items generated.

In FIG. 5, the random input data generated for each class of the frequency distribution table 3f in step S24 is fed into batch job 2 (step S25). For each class, batch job 2 processes the records one at a time while the execution time is measured, and the total execution time taken by the predetermined number (100) of input data items of that class is recorded in the execution time correspondence table 2b (step S26).

After all the input data has been processed, the values in the execution time correspondence table 2b are divided by the number of items per class (100) to obtain the execution time map g (step S27).
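Steps S24 to S27 follow the same pattern as the frequency distribution map, with elapsed time recorded instead of output counts. The sketch below assumes a hypothetical `toy_batch_job` and measures wall-clock time with `time.perf_counter`; the patent does not specify a measurement method here.

```python
import random
import time

CLASS_WIDTH = 5.0
N_CLASSES = 5
SAMPLES_PER_CLASS = 100

def toy_batch_job(temp: float):
    """Hypothetical per-record processing; a real job would do far more work."""
    return temp if temp >= 10.0 else None

def build_execution_time_map(rng: random.Random) -> list[float]:
    g = []
    for i in range(N_CLASSES):
        low = i * CLASS_WIDTH
        # step S24: random input data for this class
        records = [low + CLASS_WIDTH * rng.random()
                   for _ in range(SAMPLES_PER_CLASS)]
        start = time.perf_counter()
        for temp in records:                      # step S25: run the job
            toy_batch_job(temp)
        total = time.perf_counter() - start       # step S26: total time (seconds)
        g.append(total / SAMPLES_PER_CLASS)       # step S27: average per record
    return g

g = build_execution_time_map(random.Random(0))
```

Each entry of `g` is the average time per record for one class, which is the coefficient later multiplied by that class's frequency.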

Next, the output data frequency distribution estimation method for estimating the frequency distribution of the output data will be described. FIG. 6 is a diagram for explaining the output data frequency distribution estimation method. In the method shown in FIG. 6, the frequency distribution table 3f-2 of the actual input data 2d is created (step S31). The frequency distribution table 3f-2 is the frequency distribution table 3f with its frequencies set to the record counts of the actual input data 2d. Assume that a frequency distribution table 3f-2 has been created in which class (1) has a frequency of 50, class (2) has 92, class (3) has 81, class (4) has 73, and class (5) has 42, so that the actual input data 2d totals 338 records.

Then, the frequency of each class of the output data is calculated from the frequency of each class in the frequency distribution table 3f-2 by a linear expression whose coefficients are the values of the frequency distribution map f (step S32). The frequency distribution estimated value 4f of the output data, estimated for the frequency distribution table 3f-2 of the actual input data 2d, is output. For the frequency distribution table 3f-2, the frequency distribution estimated value 4f shows a frequency of 0 for class (1), 0 for class (2), 81 for class (3), 73 for class (4), and 42 for class (5). The output data amount predicted for the actual input data 2d is therefore 196 records in total.
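The linear expression of step S32 amounts to multiplying the input frequencies by the frequency distribution map and summing per output class. In the sketch below the input frequencies are those given in the text, while the values of `f` are an assumption chosen to be consistent with the stated result, since the figure itself is not reproduced here.

```python
def estimate_output_frequencies(input_freq, f):
    """Output frequency of class j = sum over i of input_freq[i] * f[i][j]."""
    n_out = len(f[0])
    return [
        sum(input_freq[i] * f[i][j] for i in range(len(input_freq)))
        for j in range(n_out)
    ]

# Frequencies of classes (1) to (5) in frequency distribution table 3f-2
input_freq = [50, 92, 81, 73, 42]

# Assumed map: classes (1) and (2) produce no output,
# classes (3) to (5) are output unchanged.
f = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

output_freq = estimate_output_frequencies(input_freq, f)
total_output = sum(output_freq)
```

Here `output_freq` is `[0.0, 0.0, 81.0, 73.0, 42.0]` and `total_output` is `196.0`, matching the 196 records stated above.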

Next, the execution time estimation method for estimating the execution time will be described. FIG. 7 is a diagram for explaining the execution time estimation method. In the method shown in FIG. 7, the frequency distribution table 3f-2 of the actual input data 2d is created (step S41). The frequency distribution table 3f-2 is created in the same way as in step S31 of FIG. 6.

Then, the execution time is calculated by a linear expression in which each value of the execution time map g is the coefficient of the frequency of the same class in the frequency distribution table 3f-2 (step S42). From the frequency distribution table 3f-2 and the execution time map g shown in FIG. 7,
Estimated execution time = 0.1 * 50 + 0.1 * 92 + 0.2 * 81 + 0.2 * 73 + 0.2 * 42
is calculated, giving the execution time estimated value 5 (53.4 msec) based on the frequency distribution table 3f-2 of the actual input data 2d.
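The calculation of step S42 is a dot product of the class frequencies and the per-record execution times. The sketch below uses the frequencies and coefficients from the expression above; the function name is hypothetical.

```python
def estimate_execution_time(input_freq, g):
    """Sum over classes of (frequency of the class) * (time per record of the class)."""
    if len(input_freq) != len(g):
        raise ValueError("frequency table and execution time map differ in length")
    return sum(n * t for n, t in zip(input_freq, g))

input_freq = [50, 92, 81, 73, 42]    # frequency distribution table 3f-2
g_msec = [0.1, 0.1, 0.2, 0.2, 0.2]   # execution time map g, msec per record

estimate_msec = estimate_execution_time(input_freq, g_msec)
```

The terms are 5.0, 9.2, 16.2, 14.6, and 8.4, so `estimate_msec` equals 53.4 up to floating-point rounding.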

Next, the variable field determination method will be described. FIG. 8 is a diagram for explaining the variable field determination method. In the method shown in FIG. 8, two kinds of input data sets are created: a data set A that can be reused for every field, and data sets B1 and B2, each created for a particular field a.

The data set A is created by generating the estimated number of records, with the value of each field set by a normal random number having a given mean and standard deviation. In each of the data sets B1 and B2, every record takes the same values as the record at the same position in the data set A for all fields other than field a, and only field a is changed by using a normal random number whose mean or standard deviation differs from that used for the data set A. Each of these data sets also contains the estimated number of records.

The data set A and the data sets B1 and B2 are each processed by batch job 2, and the resulting output data 2e are compared. The extent to which they differ is quantified as the degree of influence on the output data 2e, and a field whose degree of influence is at or above a predetermined value is determined as a variable field. The degree of influence on the output data 2e is calculated from the number of records, the number of fields with differing values, and so on.

In FIG. 8, within the output data 2e, the result of processing the data set A with batch job 2 is shown as data set A-2, the result of processing the data set B1 is shown as data set B1-2, and the result of processing the data set B2 is shown as data set B2-2.

The output data A-2, obtained by processing the data set A with batch job 2, serves as the reference. The data set B1 is a data set in which the "sensor ID" field has been changed, and the output data B1-2 after processing by batch job 2 shows an example in which no value other than the "sensor ID" field has changed. The data set B2 is a data set in which the "temperature" field has been changed, and the output data B2-2 after processing by batch job 2 shows an example in which the data amount has changed (increased or decreased) from that of the output data A-2. The data amount is expressed, for example, as the number of records.
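The construction of data set A and a changed data set such as B2 can be sketched as follows. The field names and the means and standard deviations in `FIELD_PARAMS` are illustrative assumptions; the patent gives no concrete parameters in this passage.

```python
import random

# (mean, standard deviation) per field: assumed values for illustration
FIELD_PARAMS = {
    "sensor_id": (5000.0, 1500.0),
    "time_minutes": (720.0, 200.0),
    "temperature": (20.0, 5.0),
}

def make_dataset_a(n_records: int, rng: random.Random) -> list[dict]:
    """Data set A: every field drawn from its own normal distribution."""
    return [
        {name: rng.gauss(mu, sigma) for name, (mu, sigma) in FIELD_PARAMS.items()}
        for _ in range(n_records)
    ]

def make_dataset_b(dataset_a, field, mu, sigma, rng: random.Random) -> list[dict]:
    """Copy data set A record by record, redrawing only the chosen field."""
    return [{**rec, field: rng.gauss(mu, sigma)} for rec in dataset_a]

rng = random.Random(0)
data_a = make_dataset_a(1000, rng)
# Change only "temperature", using a different mean than for data set A
data_b2 = make_dataset_b(data_a, "temperature", 30.0, 5.0, rng)
```

Because each record of `data_b2` is derived from the record at the same position in `data_a`, any difference in the batch job output can be attributed to the changed field.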

Next, the influence degree calculation method for calculating the degree of influence α on the output data will be described. FIG. 9 is a diagram for explaining the influence degree calculation method. As shown in FIG. 9, the method first compares the number of records in the output data (comparison I). If comparison I finds that the number of records differs from that of the reference output data A-2, the degree of influence α on the output data is set to 1.

For example, if the output data A-2 of the data set A has 300 records while the output data B2-2 of the data set B2, in which the "temperature" field was changed, has 200 records, the change to the "temperature" field is judged to have a degree of influence α = 1 on the output data. Whether the number of records in the output data B2-2 is smaller or larger than that of the reference output data A-2, the degree of influence α is 1 whenever the two do not match.

If comparison I finds that the number of records in the output data equals the number of records in the reference output data A-2, the fields other than the changed target field are compared across all records of the output data (comparison II).

In this case, the degree of influence α is calculated by the following formula.

α = (number of fields whose values differ) / (number of fields in all records)
In the output data B1-2 of the data set B1, in which the "sensor ID" field was changed, the number of fields in all records is obtained by multiplying the number of records, 300, by the number of items excluding "sensor ID", 4. The number of fields in all records is thus 300 * 4 = 1200. In the output data B1-2, nothing other than the changed "sensor ID" field differs from the reference output data A-2, so the number of fields whose values differ is 0. Therefore, the degree of influence is α = 0 / 1200 = 0.

For another output data C in which 480 fields differ, the number of fields in all records is again 300 * 4 = 1200. Since the number of fields whose values differ is 480, the degree of influence is α = 480 / 1200 = 0.4.

The output data B1-2 and the output data C both have 300 records, but their degrees of influence α differ.
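Comparisons I and II can be combined into one function. The sketch below assumes that output records are dictionaries with identical keys and that records are compared position by position; the field names `f1` to `f4` are hypothetical placeholders for the four items other than "sensor ID".

```python
def influence(reference, candidate, changed_field):
    """Degree of influence alpha of changing `changed_field` on the output data."""
    # Comparison I: a different record count means alpha = 1
    if len(reference) != len(candidate):
        return 1.0
    if not reference:
        return 0.0
    # Comparison II: compare every field except the changed one
    other_fields = [k for k in reference[0] if k != changed_field]
    total = len(reference) * len(other_fields)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for ref, cand in zip(reference, candidate)
        for k in other_fields
        if ref[k] != cand[k]
    )
    return differing / total

# Reference output A-2: 300 records, "sensor_id" plus four other fields
out_a = [{"sensor_id": i, "f1": 0, "f2": 0, "f3": 0, "f4": 0} for i in range(300)]
# B1-2: only "sensor_id" differs -> 0 / 1200
out_b1 = [{**r, "sensor_id": r["sensor_id"] + 1} for r in out_a]
# C: all four other fields differ in the first 120 records -> 480 / 1200
out_c = [
    {**r, "f1": 1, "f2": 1, "f3": 1, "f4": 1} if i < 120 else dict(r)
    for i, r in enumerate(out_a)
]
# B2-2: only 200 records -> alpha = 1
out_b2 = out_a[:200]

alpha_b1 = influence(out_a, out_b1, "sensor_id")
alpha_c = influence(out_a, out_c, "sensor_id")
alpha_b2 = influence(out_a, out_b2, "temperature")
```

The three calls reproduce the values in the text: 0 for B1-2, 0.4 for C, and 1 for B2-2.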

The system 1000 that executes the batch job execution time estimation method is described below. FIG. 10 is a diagram showing an example of the functional configuration of the system according to the present embodiment. In FIG. 10, the system 1000 mainly includes a frequency distribution generation unit 40 and an execution time estimation unit 50. The frequency distribution generation unit 40 and the execution time estimation unit 50 may be implemented on separate computer devices, serving as a frequency distribution generation device and an execution time estimation device, respectively. Alternatively, the frequency distribution generation unit 40 and the execution time estimation unit 50 may be implemented on the same computer device. The frequency distribution generation unit 40 and the execution time estimation unit 50 are realized by the CPU 11, described later, executing the corresponding programs.

The frequency distribution generation unit 40 is a processing unit that generates the frequency distribution table 3f for the variable field 7, which affects the output data when the input data is processed by batch job 2, and creates, based on the frequency distribution table 3f, the frequency distribution map f and/or the execution time map g used to estimate the output data amount and/or the execution time. It includes a variable field determination unit 41, a frequency distribution table generation unit 42, a frequency distribution map creation unit 43, and an execution time map creation unit 44.

The variable field determination unit 41 is a processing unit that determines the variable field based on the degree of influence α on the output data after the batch job. When generating input data based on the input/output data schema 6, the variable field determination unit 41 creates a data set in which a certain field has been changed and calculates the degree of influence α on the output data after the batch job. The variable field determination unit 41 then determines the variable field based on the calculated degree of influence α.

The frequency distribution table generation unit 42 is a processing unit that refers to the input/output data schema 6 and creates the frequency distribution table 3f by determining the classes and the class width of each frequency distribution from the value range of each variable field 7 determined by the variable field determination unit 41 and from the estimated number of records.

度数分布写像作成部４３は、度数分布表３ｆに基づくランダムな入力データ２ｄに対してバッチジョブ２を行うことによって、度数分布写像ｆを作成する処理部である。実行時間写像作成部４４は、度数分布表３ｆに基づくランダムな入力データ２ｄに対してバッチジョブ２を行うことによって、実行時間写像ｇを作成する処理部である。 The frequency distribution map creation unit 43 is a processing unit that creates a frequency distribution map f by performing batch job 2 on random input data 2d based on the frequency distribution table 3f. The execution time map creation unit 44 is a processing unit that creates an execution time map g by performing batch job 2 on random input data 2d based on the frequency distribution table 3f.

The execution time estimation unit 50 is a processing unit that estimates the execution time using the frequency distribution table 3f-2 of the actual input data together with the frequency distribution map f and/or the execution time map g. It includes an input data frequency distribution creation unit 51, an execution time estimation unit 52, and an output data frequency distribution estimation unit 53.

The input data frequency distribution creation unit 51 is a processing unit that creates the frequency distribution table 3f-2 by setting, in the frequency distribution table 3f, the frequencies given by the record counts of the actual input data.

実行時間推定部５２は、入力データ度数分布作成部５１によって度数分布表３ｆに度数を設定することによって作成された度数分布表３ｆ−２に基づいて、実行時間を推定する処理部である。実行時間推定部５２は、実行時間推定値５を、度数分布表３ｆ−２の度数の総和に基づいて計算する。又は、実行時間推定部５２は、実行時間推定値５を、実行時間写像ｇの各値を係数とした一次式で計算する。 The execution time estimation unit 52 is a processing unit that estimates the execution time based on the frequency distribution table 3f-2 created by setting the frequency in the frequency distribution table 3f by the input data frequency distribution creation unit 51. The execution time estimation unit 52 calculates the execution time estimation value 5 based on the sum of frequencies in the frequency distribution table 3f-2. Alternatively, the execution time estimation unit 52 calculates the execution time estimated value 5 by a linear expression using each value of the execution time map g as a coefficient.

The output data frequency distribution estimation unit 53 is a processing unit that creates the frequency distribution estimated value 4f of the output data. Starting from the frequency distribution table 3f-2, which the input data frequency distribution creation unit 51 created from the frequency distribution table 3f, it calculates the frequency of each class of the output data with a linear expression whose coefficients are the values of the frequency distribution map f.

The hardware configuration of the computer device 10 that implements the frequency distribution generation unit 40 and the execution time estimation unit 50 will now be described. FIG. 11 is a diagram illustrating the hardware configuration of the computer device. In FIG. 11, the computer device 10 is a computer-controlled terminal that includes a CPU (Central Processing Unit) 11, a main storage device 12, an auxiliary storage device 13, an input device 14, a display device 15, an output device 16, a communication I/F (interface) 17, and a drive 18, all of which are connected to a bus B.

The CPU 11 controls the computer device 10 according to a program stored in the main storage device 12. The main storage device 12 uses RAM (Random Access Memory), ROM (Read-Only Memory), and the like, and stores the programs executed by the CPU 11, the data required for processing by the CPU 11, the data obtained by that processing, and so on. Part of the main storage device 12 is allocated as a work area used for processing by the CPU 11.

補助記憶装置１３には、ハードディスクドライブが用いられ、各種処理を実行するためのプログラム等のデータを格納する。補助記憶装置１３に格納されているプログラムの一部が主記憶装置１２にロードされ、ＣＰＵ１１に実行されることによって、各種処理が実現される。記憶部１３０は、主記憶装置１２及び／又は補助記憶装置１３を有する。 The auxiliary storage device 13 uses a hard disk drive and stores data such as programs for executing various processes. A part of the program stored in the auxiliary storage device 13 is loaded into the main storage device 12 and executed by the CPU 11, whereby various processes are realized. The storage unit 130 includes the main storage device 12 and / or the auxiliary storage device 13.

入力装置１４は、マウス、キーボード等を有し、ユーザがコンピュータ装置１０による処理に必要な各種情報を入力するために用いられる。表示装置１５は、ＣＰＵ１１の制御のもとに必要な各種情報を表示する。出力装置１６は、プリンタ等を有し、ユーザからの指示に応じて各種情報を出力するために用いられる。通信Ｉ／Ｆ１７は、例えばインターネット、ＬＡＮ（Local Area Network）等に接続し、外部装置との間の通信制御をするための装置である。 The input device 14 includes a mouse, a keyboard, and the like, and is used for a user to input various information necessary for processing by the computer device 10. The display device 15 displays various information required under the control of the CPU 11. The output device 16 has a printer or the like and is used for outputting various types of information in accordance with instructions from the user. The communication I / F 17 is a device that is connected to, for example, the Internet, a LAN (Local Area Network), etc., and controls communication with an external device.

コンピュータ装置１０によって行われる処理を実現するプログラムは、例えば、ＣＤ−ＲＯＭ（Compact Disc Read-Only Memory）等の記憶媒体１９によってコンピュータ装置１０に提供される。即ち、プログラムが保存された記憶媒体１９がドライブ１８にセットされると、ドライブ１８が記憶媒体１９からプログラムを読み出し、その読み出されたプログラムがバスＢを介して補助記憶装置１３にインストールされる。そして、プログラムが起動されると、補助記憶装置１３にインストールされたプログラムに従ってＣＰＵ１１がその処理を開始する。尚、プログラムを格納する媒体としてＣＤ−ＲＯＭに限定するものではなく、コンピュータが読み取り可能な媒体であればよい。コンピュータ読取可能な記憶媒体として、ＣＤ−ＲＯＭの他に、ＤＶＤディスク、ＵＳＢメモリ等の可搬型記録媒体、フラッシュメモリ等の半導体メモリであっても良い。 A program that realizes processing performed by the computer apparatus 10 is provided to the computer apparatus 10 by a storage medium 19 such as a CD-ROM (Compact Disc Read-Only Memory). That is, when the storage medium 19 storing the program is set in the drive 18, the drive 18 reads the program from the storage medium 19, and the read program is installed in the auxiliary storage device 13 via the bus B. . When the program is activated, the CPU 11 starts its processing according to the program installed in the auxiliary storage device 13. The medium for storing the program is not limited to a CD-ROM, and any medium that can be read by a computer may be used. As a computer-readable storage medium, in addition to a CD-ROM, a portable recording medium such as a DVD disk or a USB memory, or a semiconductor memory such as a flash memory may be used.

また、コンピュータ装置１０によって行われる処理を実現するプログラムが、通信Ｉ／Ｆ１７を介して外部装置から提供されてもよい。或いは、外部装置へ該プログラムを提供し、後述される各処理は外部装置で実現されるように構成してもよい。通信Ｉ／Ｆ１７による通信は無線又は有線に限定されるものではない。 Further, a program that realizes processing performed by the computer device 10 may be provided from an external device via the communication I / F 17. Alternatively, the program may be provided to an external device, and each process described below may be realized by the external device. Communication by the communication I / F 17 is not limited to wireless or wired.

When the computer device 10 implements the frequency distribution generation unit 40, the storage unit 130 stores the input/output data schema 6 and the like. When the computer device 10 implements the execution time estimation unit 50, the storage unit 130 stores the input data 2d, the frequency distribution map f, the frequency distribution data 34f (which includes the frequency distribution table 3f and the frequency distribution estimated value 4f), the execution time map g, and the like.

また、度数分布生成部４０と、実行時間推定部５０とを一つのコンピュータ装置１０で実現する場合には、コンピュータ装置１０がシステム１０００全体に相当する。 Further, when the frequency distribution generation unit 40 and the execution time estimation unit 50 are realized by one computer apparatus 10, the computer apparatus 10 corresponds to the entire system 1000.

The frequency distribution map creation process performed by the frequency distribution map creation unit 43 will now be described. FIG. 12 is a diagram for explaining the frequency distribution map creation process. In the process shown in FIG. 12, the CPU 11 acquires the input data schema from the input/output data schema 6 (step S101); specifically, it reads the input data schema selected by the user from the input/output data schema 6.

ＣＰＵ１１は、入力データの変量フィールドを決定する変量フィールド決定処理を実行する（ステップＳ１０２）。その後、ＣＰＵ１１は、入力データの度数分布表３ｆを作成する度数分布表生成処理を実行する（ステップＳ１０３）。度数分布表生成処理により、階級数、及び階級幅が決定される。 The CPU 11 executes a variable field determination process for determining a variable field of the input data (step S102). Thereafter, the CPU 11 executes a frequency distribution table generation process for creating the frequency distribution table 3f of the input data (step S103). The number of classes and the class width are determined by the frequency distribution table generation process.

The CPU 11 generates random input data for each class of the input data frequency distribution 3 (step S104), processes the records one at a time with the batch job 2, and counts which class of the output data frequency distribution 4 each record is output to (step S105).

After the batch job 2 has processed all of the generated input data, the CPU 11 creates the frequency distribution map f by dividing the count for each class by the total number of input records (the frequency of each class) (step S106). The CPU 11 then ends the frequency distribution map creation process.
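Steps S104 to S106 can be sketched as follows. This is a minimal illustration rather than the implementation described in the specification: the function name `build_frequency_map`, the callbacks `sample_record` and `output_class`, and the toy batch job are all assumptions introduced for the example.

```python
import random
from collections import Counter

def build_frequency_map(batch_job, input_classes, sample_record, output_class,
                        n_per_class=100, seed=0):
    """Return f[in_class][out_class]: share of inputs from in_class output to out_class."""
    rng = random.Random(seed)
    f = {}
    for in_class in input_classes:
        counts = Counter()
        for _ in range(n_per_class):
            record = sample_record(in_class, rng)      # S104: random input for this class
            for out in batch_job([record]):            # S105: process one record at a time
                counts[output_class(out)] += 1         # count the output class it lands in
        # S106: divide each count by the number of inputs generated for the class
        f[in_class] = {oc: c / n_per_class for oc, c in counts.items()}
    return f

# Toy batch job (hypothetical): keep only records whose x is at least 10.
def keep_large_x(records):
    return [r for r in records if r["x"] >= 10]

classes = [(0, 10), (10, 20)]                          # classes as (low, high) ranges of x
f = build_frequency_map(
    keep_large_x,
    classes,
    sample_record=lambda c, rng: {"x": rng.uniform(c[0], c[1])},
    output_class=lambda r: (10, 20),                   # single output class in this toy case
)
```

In this toy case every record from the class (10, 20) survives the job and none from (0, 10) does, so the map records a ratio of 1.0 for the former and no entry for the latter.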

The execution time map creation process performed by the execution time map creation unit 44 will be described with reference to FIG. 13. FIG. 13 is a diagram for explaining the execution time map creation process. In the process shown in FIG. 13, the CPU 11 acquires the input data schema from the input/output data schema 6 (step S111); specifically, it reads the input data schema selected by the user from the input/output data schema 6.

ＣＰＵ１１は、入力データの変量フィールドを決定する変量フィールド決定処理を実行する（ステップＳ１１２）。また、変量フィールドの決定後、ＣＰＵ１１は、入力データの度数分布表３ｆを生成する度数分布表生成処理を実行する（ステップＳ１１３）。度数分布表生成処理により、階級数、及び階級幅が決定される。 The CPU 11 executes a variable field determination process for determining a variable field of the input data (step S112). After determining the variable field, the CPU 11 executes a frequency distribution table generation process for generating the frequency distribution table 3f of the input data (step S113). The number of classes and the class width are determined by the frequency distribution table generation process.

ＣＰＵ１１は、入力データの度数分布３の階級毎にランダムな入力データを生成して（ステップＳ１１４）、バッチジョブで１件ずつ処理して、１件ごとの実行時間を測定する（ステップＳ１１５）。 The CPU 11 generates random input data for each class of the frequency distribution 3 of the input data (step S114), processes it one by one in a batch job, and measures the execution time for each item (step S115).

そして、ＣＰＵ１１は、階級毎の実行時間の平均から実行時間写像ｇを作成する（ステップＳ１１６）。 Then, the CPU 11 creates an execution time map g from the average execution time for each class (step S116).
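Steps S114 to S116 can be sketched in the same style. The names `build_execution_time_map`, `sample_record`, and the injectable `clock` parameter are assumptions made for this illustration; the specification only states that the per-record execution time is measured and averaged per class.

```python
import random
import time

def build_execution_time_map(batch_job, input_classes, sample_record,
                             n_per_class=50, seed=0, clock=time.perf_counter):
    """Return g[in_class]: mean measured time to process one record of that class."""
    rng = random.Random(seed)
    g = {}
    for in_class in input_classes:
        total = 0.0
        for _ in range(n_per_class):
            record = sample_record(in_class, rng)   # S114: random input for this class
            start = clock()
            batch_job([record])                     # S115: process one record, timed
            total += clock() - start
        g[in_class] = total / n_per_class           # S116: average per class
    return g

# Demonstration with a fake clock that advances by 1 on every call,
# so each measurement is exactly 1 time unit.
ticks = iter(range(1_000_000))
g = build_execution_time_map(
    lambda records: records,                        # hypothetical no-op batch job
    ["low", "high"],
    sample_record=lambda c, rng: {"x": rng.random()},
    n_per_class=3,
    clock=lambda: next(ticks),
)
```

Passing the clock as a parameter is only a convenience for testing; with the default `time.perf_counter` the map holds wall-clock seconds per record.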

出力データ度数分布推定部５３による出力データ度数分布推定処理について図１４で説明する。図１４は、出力データ度数分布推定処理を説明するための図である。バッチジョブ２が単体の場合と、バッチジョブ２が多段の場合とで、出力データ度数分布推定部５３による出力データ度数分布推定処理が異なる。 The output data frequency distribution estimation processing by the output data frequency distribution estimation unit 53 will be described with reference to FIG. FIG. 14 is a diagram for explaining the output data frequency distribution estimation process. The output data frequency distribution estimation processing by the output data frequency distribution estimation unit 53 differs between the case where the batch job 2 is a single unit and the case where the batch job 2 is multistage.

FIG. 14(A) illustrates the output data frequency distribution estimation process for a single batch job 2. In FIG. 14(A), the CPU 11 reads the frequency distribution table 3f-2 of the actual input data 2d and the frequency distribution map f of the batch job 2 from the storage unit 130 (step S131), and calculates the frequency of each class of the output data with a linear expression whose coefficients are the values of the frequency distribution map f (step S132).

The CPU 11 then generates the frequency distribution estimated value 4f of the output data from the calculated class frequencies (step S133) and ends the process for a single batch job 2.
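The linear expression of steps S132 and S133 amounts to a weighted sum: the estimated frequency of an output class is the sum, over the input classes, of the input frequency multiplied by the corresponding value of f. A minimal sketch, assuming f is stored as a nested dictionary (a representation chosen for this example, not taken from the specification):

```python
def estimate_output_distribution(input_freq, f):
    """out[j] = sum over input classes i of f[i][j] * input_freq[i]."""
    out = {}
    for in_class, n in input_freq.items():
        for out_class, ratio in f.get(in_class, {}).items():
            out[out_class] = out.get(out_class, 0.0) + ratio * n
    return out

# Illustrative numbers only.
input_freq = {"a": 100, "b": 200}                       # frequency distribution table 3f-2
f = {"a": {"p": 0.5, "q": 0.5}, "b": {"q": 0.25}}       # frequency distribution map f
estimate = estimate_output_distribution(input_freq, f)  # estimated value 4f
```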

図１４（Ｂ）では、バッチジョブ２が多段の場合の処理を説明する。図１４（Ｂ）において、ＣＰＵ１１は、入力データの度数分布表３ｆを生成する（ステップＳ１４１）。 In FIG. 14B, processing when the batch job 2 has multiple stages will be described. In FIG. 14B, the CPU 11 generates a frequency distribution table 3f of input data (step S141).

ＣＰＵ１１は、当該バッチジョブの出力データ度数分布推定処理を実行する（ステップＳ１４２）。当該バッチジョブの出力データ度数分布推定処理にて出力データの度数分布推定値４ｆを生成した後、ＣＰＵ１１は、後段バッチジョブが存在するか否かを判断する（ステップＳ１４３）。 The CPU 11 executes output data frequency distribution estimation processing for the batch job (step S142). After generating the output data frequency distribution estimated value 4f in the output data frequency distribution estimation process of the batch job, the CPU 11 determines whether or not a subsequent batch job exists (step S143).

後段バッチジョブが存在すると判断した場合、ＣＰＵ１１は、当該バッチジョブの出力データの度数分布推定値４ｆを入力データの度数分布表３ｆに設定して（ステップＳ１４４）、ステップＳ１４２へと戻り、上述同様の処理を繰り返す。一方、後段バッチジョブが存在しないと判断した場合、ＣＰＵ１１は、この処理を終了する。 When determining that there is a subsequent batch job, the CPU 11 sets the frequency distribution estimated value 4f of the output data of the batch job in the frequency distribution table 3f of the input data (step S144), returns to step S142, and the same as described above. Repeat the process. On the other hand, if it is determined that there is no subsequent batch job, the CPU 11 ends this process.
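The loop of steps S142 to S144 can be sketched as repeated application of the single-job estimate, with each stage's estimate becoming the next stage's input table. The function name `propagate` and the list-of-maps representation are assumptions for this illustration.

```python
def propagate(input_freq, stage_maps):
    """Apply each stage's frequency distribution map in order; return every stage's estimate."""
    estimates = []
    current = dict(input_freq)
    for f in stage_maps:                       # S142: estimate this stage's output
        nxt = {}
        for in_class, n in current.items():
            for out_class, ratio in f.get(in_class, {}).items():
                nxt[out_class] = nxt.get(out_class, 0.0) + ratio * n
        estimates.append(nxt)
        current = nxt                          # S144: output estimate feeds the next stage
    return estimates                           # S143: loop ends when no stage remains

# Illustrative two-stage network.
stages = [{"a": {"p": 0.5}}, {"p": {"z": 0.25}}]
estimates = propagate({"a": 1000}, stages)
```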

実行時間推定部５２による実行時間推定処理について図１５で説明する。図１５は、実行時間推定処理を説明するための図である。バッチジョブ２が単体の場合と、バッチジョブ２が多段の場合とで、実行時間推定部５２による実行時間推定処理が異なる。 The execution time estimation process by the execution time estimation unit 52 will be described with reference to FIG. FIG. 15 is a diagram for explaining the execution time estimation process. The execution time estimation processing by the execution time estimation unit 52 differs between the case where the batch job 2 is a single unit and the case where the batch job 2 is multistage.

FIG. 15(A) illustrates the execution time estimation process for a single batch job 2. In FIG. 15(A), the CPU 11 reads the frequency distribution table 3f-2 of the actual input data 2d and the execution time map g of the batch job 2 from the storage unit 130 (step S151), and calculates the execution time with a linear expression whose coefficients are the values of the execution time map g (step S152). The CPU 11 then ends this process.
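The linear expression of step S152 can be sketched as the sum, over the input classes, of the class frequency multiplied by the per-record time held in g. The dictionary representation and the function name are assumptions for this example.

```python
def estimate_execution_time(input_freq, g):
    """T = sum over input classes i of g[i] * input_freq[i]."""
    return sum(g[in_class] * n for in_class, n in input_freq.items())

# Illustrative numbers only: per-record times in seconds.
input_freq = {"a": 100, "b": 200}          # frequency distribution table 3f-2
g = {"a": 0.5, "b": 0.25}                  # execution time map g
total_time = estimate_execution_time(input_freq, g)
```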

図１５（Ｂ）では、バッチジョブ２が多段の場合の処理を説明する。図１５（Ｂ）において、ＣＰＵ１１は、入力データの度数分布表３ｆを生成する（ステップＳ１６１）。 In FIG. 15B, processing when the batch job 2 has multiple stages will be described. In FIG. 15B, the CPU 11 generates a frequency distribution table 3f of input data (step S161).

ＣＰＵ１１は、当該バッチジョブの実行時間推定処理を実行する（ステップＳ１６２）。当該バッチジョブの実行時間推定処理にて実行時間を計算した後、ＣＰＵ１１は、後段バッチジョブが存在するか否かを判断する（ステップＳ１６３）。 The CPU 11 executes the execution time estimation process for the batch job (step S162). After calculating the execution time in the execution time estimation process of the batch job, the CPU 11 determines whether there is a subsequent batch job (step S163).

後段バッチジョブが存在すると判断した場合、ＣＰＵ１１は、バッチジョブネットワークの総実行時間に当該バッチジョブの実行時間を加算して（ステップＳ１６４）、ステップＳ１６２へと戻り、上述同様の処理を繰り返す。一方、後段バッチジョブが存在しないと判断した場合、ＣＰＵ１１は、この処理を終了する。 When determining that there is a subsequent batch job, the CPU 11 adds the execution time of the batch job to the total execution time of the batch job network (step S164), returns to step S162, and repeats the same processing as described above. On the other hand, if it is determined that there is no subsequent batch job, the CPU 11 ends this process.
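A multi-stage estimate can be sketched by combining both maps: each stage contributes the time computed from its own g, and its f supplies the input table of the next stage. This sketch sums the time of every stage, including the last one, which is an assumption about the intended total; the pairing of f and g per stage and the name `estimate_network_time` are likewise introduced only for illustration.

```python
def estimate_network_time(input_freq, stages):
    """stages: list of (f, g) pairs, one per batch job, in execution order."""
    total = 0.0
    current = dict(input_freq)
    for f, g in stages:
        # Execution time of this stage from its execution time map g.
        total += sum(g.get(c, 0.0) * n for c, n in current.items())
        # Output estimate of this stage becomes the next stage's input table.
        nxt = {}
        for c, n in current.items():
            for out_class, ratio in f.get(c, {}).items():
                nxt[out_class] = nxt.get(out_class, 0.0) + ratio * n
        current = nxt
    return total

# Illustrative two-stage network: 100 records, half survive stage 1.
stages = [({"a": {"p": 0.5}}, {"a": 0.5}), ({}, {"p": 2.0})]
network_time = estimate_network_time({"a": 100}, stages)
```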

The variable field determination process in step S102 of FIG. 12 and step S112 of FIG. 13 will be described with reference to FIG. 16. FIG. 16 is a diagram for explaining the variable field determination process. In the process shown in FIG. 16, the CPU 11 determines whether the data being examined is terminal data (step S171). If it is terminal data, the CPU 11 ends this process. If it is not, the CPU 11 generates a data set A by filling each field of the input data 2d with normally distributed random numbers having a given mean or standard deviation (step S172).

The CPU 11 then generates a data set B in which, for each record, every field other than field a has the same value as the field at the same position in data set A, and only field a is changed using normally distributed random numbers whose mean or standard deviation differs from that used for data set A (step S173).

The CPU 11 processes data sets A and B with the batch job 2 and obtains data sets C and D as the respective outputs (step S174). Using data sets C and D, the CPU 11 calculates the degree of influence α on the output data (step S175).

The CPU 11 determines whether the degree of influence α on the output data is at or above a given threshold (step S176). If it is below the threshold, the CPU 11 proceeds to step S178. If it is at or above the threshold, the CPU 11 determines field a to be a variable field (step S177) and, for the preceding batch job, changes field a into a variable field of the input data 2d (step S178). The CPU 11 then ends this process.
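Steps S172 to S177 can be sketched as below. The influence calculation itself is passed in as a callback, so the sketch does not commit to the procedure of FIG. 17; the record-count-based `toy_influence`, the `shift` applied to the mean, the field parameters, and all function names are assumptions made for this illustration. The handling of terminal data (S171) and of the preceding batch job (S178) is omitted.

```python
import random

def determine_variable_fields(batch_job, field_params, influence,
                              threshold=0.3, n_records=1000, shift=10.0, seed=0):
    """field_params: {field name: (mean, standard deviation)} for data set A."""
    rng = random.Random(seed)
    # S172: data set A, every field drawn from a normal distribution.
    data_a = [{name: rng.gauss(mu, sigma) for name, (mu, sigma) in field_params.items()}
              for _ in range(n_records)]
    out_c = batch_job(data_a)                               # S174: output C
    variable_fields = []
    for field, (mu, sigma) in field_params.items():
        # S173: data set B, identical to A except field a, drawn with a shifted mean.
        data_b = [dict(rec, **{field: rng.gauss(mu + shift, sigma)}) for rec in data_a]
        out_d = batch_job(data_b)                           # S174: output D
        alpha = influence(out_c, out_d, field)              # S175: degree of influence
        if alpha >= threshold:                              # S176
            variable_fields.append(field)                   # S177
    return variable_fields

# Hypothetical batch job: keep records with 8 <= x <= 18.
def extract_by_x(records):
    return [r for r in records if 8 <= r["x"] <= 18]

# Hypothetical influence measure: relative change in the number of output records.
def toy_influence(out_c, out_d, field):
    return abs(len(out_c) - len(out_d)) / max(len(out_c), len(out_d), 1)

result = determine_variable_fields(
    extract_by_x, {"sensor_id": (500.0, 100.0), "x": (12.0, 3.0)}, toy_influence)
```

With these toy settings, changing `sensor_id` leaves the extracted records unchanged, while shifting `x` moves most records out of the extraction range, so only `x` passes the threshold.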

The process of calculating the degree of influence on the output data in step S175 of FIG. 16 will be described with reference to FIG. 17. FIG. 17 is a diagram for explaining this process. In the process shown in FIG. 17, the CPU 11 compares data sets C and D (step S181) and determines whether the number of records in data set C differs from that in data set D (step S182). This comparison corresponds to comparison I in FIG. 9.

レコード数が一致する場合、ＣＰＵ１１は、出力データの影響度αに「１」を設定して（ステップＳ１８３）、この処理を終了する。 If the number of records matches, the CPU 11 sets “1” as the influence α of the output data (step S183), and ends this process.

一方、レコード数が異なる場合、ＣＰＵ１１は、更に、値を変更したフィールドａ以外で値が異なるフィールドが存在するか否かを判断する（ステップＳ１８４）。値が異なるフィールドが存在しない場合、ＣＰＵ１１は、データセットＣとＤ間で値が異なるフィールド数を全レコードのフィールド数で割ることによって、出力データの影響度αを計算し（ステップＳ１８５）、この処理を終了する。一方、値が異なるフィールドが存在する場合、ＣＰＵ１１は、この処理を終了する。 On the other hand, if the number of records is different, the CPU 11 further determines whether there is a field with a different value other than the field a whose value has been changed (step S184). If there is no field with different values, the CPU 11 calculates the influence α of the output data by dividing the number of fields with different values between the data sets C and D by the number of fields of all records (step S185). The process ends. On the other hand, if there are fields with different values, the CPU 11 ends this process.

The following describes, with reference to FIG. 18, a case in which the present embodiment is applied to batch processing of temperature sensor data. FIG. 18 is a diagram for explaining an example of such batch processing. In the example shown in FIG. 18, the average temperature of room A is obtained from the data of a large number of temperature sensors 8 installed in a building (hereinafter, temperature sensor data). The building as a whole measures 25 × 25 (m²). Each sensor transmits temperature sensor data whenever the temperature changes by 0.1 °C, so the transmission interval is irregular.

温度センサデータは一箇所に収集され建屋全体の温度センサデータＤ１に格納される。温度センサデータＤ１のフィールドの項目は、「センサＩＤ」、「日時」、「温度」、「ｘ」、「ｙ」等である。 The temperature sensor data is collected at one place and stored in the temperature sensor data D1 of the entire building. The field items of the temperature sensor data D1 are “sensor ID”, “date / time”, “temperature”, “x”, “y”, and the like.

The batch job J1 extracts the temperature sensor data D2 of room A. When the position of room A is expressed, for example, in the x and y directions shown in FIG. 18 by the coordinates (8, 8) and (18, 16) of two diagonally opposite corners of room A, the records of the sensors identified by sensor ID that lie within the area of room A are extracted. The fields of the temperature sensor data D2 are the same as those of the temperature sensor data D1.
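The extraction performed by batch job J1 can be sketched as a simple range filter on the x and y fields. The English field names (`sensor_id`, `temperature`, `x`, `y`), the sample records, and the treatment of the room boundary as inclusive are assumptions for this illustration.

```python
def extract_room_a(records, x_range=(8, 18), y_range=(8, 16)):
    """Keep the records whose (x, y) position lies inside room A."""
    return [
        r for r in records
        if x_range[0] <= r["x"] <= x_range[1] and y_range[0] <= r["y"] <= y_range[1]
    ]

# Illustrative records of temperature sensor data D1.
d1 = [
    {"sensor_id": 1, "temperature": 21.5, "x": 10, "y": 10},   # inside room A
    {"sensor_id": 2, "temperature": 19.0, "x": 3, "y": 12},    # x outside
    {"sensor_id": 3, "temperature": 23.1, "x": 15, "y": 20},   # y outside
]
d2 = extract_room_a(d1)
```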

バッチジョブＪ２は、バッチジョブＪ１によって抽出された部屋Ａの温度センサデータＤ２に基づいて、位置毎の平均温度を計算する。ｘｙで示される各位置の平均温度データＤ３が出力される。平均温度データＤ３のフィールドの項目は、「ｘ」、「ｙ」、「平均温度」等である。 The batch job J2 calculates an average temperature for each position based on the temperature sensor data D2 of the room A extracted by the batch job J1. Average temperature data D3 at each position indicated by xy is output. The items of the field of the average temperature data D3 are “x”, “y”, “average temperature”, and the like.
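The aggregation performed by batch job J2 can be sketched as grouping the records by position and averaging the temperature in each group. The field names and sample values are assumptions for this illustration.

```python
from collections import defaultdict

def average_temperature_by_position(records):
    """Return one record per (x, y) position with the mean temperature at that position."""
    sums = defaultdict(lambda: [0.0, 0])              # (x, y) -> [temperature sum, count]
    for r in records:
        acc = sums[(r["x"], r["y"])]
        acc[0] += r["temperature"]
        acc[1] += 1
    return [
        {"x": x, "y": y, "average_temperature": total / count}
        for (x, y), (total, count) in sorted(sums.items())
    ]

# Illustrative records of temperature sensor data D2.
d2 = [
    {"x": 10, "y": 10, "temperature": 20.0},
    {"x": 10, "y": 10, "temperature": 22.0},
    {"x": 12, "y": 9, "temperature": 25.0},
]
d3 = average_temperature_by_position(d2)
```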

図１９は、各データのスキーマと件数見積値の例を示す図である。図１９において、温度センサデータＤ１のスキーマ６−１は、フィールド、型、値域等で各データを定義している。「センサＩＤ」、「日時」、「温度」、「ｘ」、及び「ｙ」フィールドの各々に対して、型、値域等が定義されている。 FIG. 19 is a diagram illustrating an example of the schema of each data and the estimated number of cases. In FIG. 19, the schema 6-1 of the temperature sensor data D1 defines each data by field, type, value range, and the like. For each of the “sensor ID”, “date / time”, “temperature”, “x”, and “y” fields, a type, a value range, and the like are defined.

温度センサデータＤ２のスキーマ６−２は、フィールド、型、値域等で各データを定義している。「センサＩＤ」、「日時」、「温度」、「ｘ」、及び「ｙ」フィールドの各々に対して、型、値域等が定義されている。 The schema 6-2 of the temperature sensor data D2 defines each data by field, type, value range, and the like. For each of the “sensor ID”, “date / time”, “temperature”, “x”, and “y” fields, a type, a value range, and the like are defined.

平均温度データＤ３のスキーマ６−３は、フィールド、型、値域等で各データを定義している。「ｘ」、「ｙ」、及び「平均温度」フィールドの各々に対して、型、値域等が定義されている。 The schema 6-3 of the average temperature data D3 defines each data with fields, types, value ranges, and the like. For each of the “x”, “y”, and “average temperature” fields, a type, a value range, and the like are defined.

各データＤ１、Ｄ２、及びＤ３の件数見積値は、ともに１０００件とする。 Assume that the estimated number of cases of each data D1, D2, and D3 is 1000 cases.

以下に、このようなスキーマ６−１から６−３と各件数見積値とに基づいて、度数分布写像ｆのみを使用して、実行時間を推定する第１実施例について説明する。 Hereinafter, a description will be given of a first embodiment in which the execution time is estimated using only the frequency distribution map f based on the schemas 6-1 to 6-3 and the estimated number of cases.

The procedure in the first embodiment is as follows.
(1) The frequency distribution estimated value 4f of the output data (temperature sensor data D2) is calculated from the frequency distribution table 3f-2 of the input data (temperature sensor data D1) using the frequency distribution map f, and the output data amount of the batch job J1 (the data amount of the temperature sensor data D2) is obtained from the frequency distribution estimated value 4f.
(2) The execution time is estimated from the output data amount (the data amount of the temperature sensor data D2) using the method disclosed in Patent Document 1, yielding the execution time estimated value 5. The output data amount is easily obtained as the sum of the frequencies in the frequency distribution estimated value 4f.
(3) The frequency distribution estimated value 4f of the output data (temperature sensor data D2) is used as the frequency distribution table 3f of the input data of the subsequent batch job J2. The execution time of the subsequent batch job J2 is estimated in the same way, from the output data amount (the data amount of the average temperature data D3 for each position), using the method disclosed in Patent Document 1.
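Steps (1) and (2) can be sketched as follows. The output data amount is the sum of the estimated frequencies, as stated in step (2). The method of Patent Document 1 is not reproduced in this section, so the linear model in `time_from_amount`, together with its parameters, is purely a hypothetical stand-in for it.

```python
def output_data_amount(estimate_4f):
    """Output data amount = sum of the frequencies in the estimated distribution 4f."""
    return sum(estimate_4f.values())

def time_from_amount(amount, seconds_per_record, fixed_overhead=0.0):
    """Hypothetical stand-in for a data-amount-based model: time grows linearly with amount."""
    return fixed_overhead + seconds_per_record * amount

# Illustrative numbers only.
estimate_4f = {"class_1": 120.0, "class_2": 80.0}
amount = output_data_amount(estimate_4f)
estimated_time = time_from_amount(amount, seconds_per_record=0.5, fixed_overhead=1.0)
```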

An example of determining the variable fields in the first embodiment will be described with reference to FIG. 20. FIGS. 20 and 21 are diagrams for explaining this example. In FIG. 20, three versions of the temperature sensor data D1 are prepared with 1000 records each: the reference temperature sensor data D1-0, temperature sensor data D1-1 in which only “sensor ID” is changed relative to D1-0, and temperature sensor data D1-2 in which only “x” is changed relative to D1-0.

The temperature sensor data D1-1, in which only “sensor ID” is changed, and the temperature sensor data D1-2, in which only “x” is changed relative to D1-0, are used as examples below; for each of the fields other than “sensor ID” and “x”, temperature sensor data D1 in which only that field is changed is prepared as well.

基準となる温度センサデータＤ１−０をバッチジョブＪ１で処理することによって、温度センサデータＤ１−０からは部屋Ａの温度センサデータＤ２−０が抽出され、抽出後の基準データとして使用される。 By processing the reference temperature sensor data D1-0 with the batch job J1, the temperature sensor data D2-0 of the room A is extracted from the temperature sensor data D1-0 and used as the reference data after extraction.

「センサＩＤ」のみを変更した温度センサデータＤ１−１をバッチジョブＪ１で処理することによって、温度センサデータＤ１−１からは部屋Ａの温度センサデータＤ２−１が抽出され、「センサＩＤ」以外の値について温度センサデータＤ２−０と比較される。この例では、「センサＩＤ」以外の値が全て同じであったため、影響度α（センサＩＤ）に「０」が設定される。 By processing the temperature sensor data D1-1 in which only the “sensor ID” is changed by the batch job J1, the temperature sensor data D2-1 of the room A is extracted from the temperature sensor data D1-1, and other than “sensor ID”. Is compared with the temperature sensor data D2-0. In this example, since all values other than “sensor ID” are the same, “0” is set to the influence degree α (sensor ID).

「ｘ」のみを変更した温度センサデータＤ１−２をバッチジョブＪ１で処理することによって、温度センサデータＤ１−２からは部屋Ａの温度センサデータＤ２−２が抽出され、「ｘ」以外の値について温度センサデータＤ２−０と比較される。この例では、「ｘ」以外の値が全て同じであったため、影響度α（ｘ）に「０」が設定される。 By processing the temperature sensor data D1-2 in which only “x” is changed by the batch job J1, the temperature sensor data D2-2 of the room A is extracted from the temperature sensor data D1-2, and a value other than “x”. Is compared with the temperature sensor data D2-0. In this example, since all values other than “x” are the same, “0” is set to the influence degree α (x).

In the same way, the degree of influence α is also calculated for each temperature sensor data D1 in which only one of the fields other than “sensor ID” and “x” is changed.

Further, in FIG. 21, each field a whose degree of influence α is at or above a given value (for example, 0.3) is treated as a variable field. Based on the degree of influence α for the batch job J1, “x” and “y” are set as the variable fields of the temperature sensor data D1.

Likewise, based on the degree of influence α for the subsequent batch job J2, “temperature”, “x”, and “y” are set as the variable fields of the temperature sensor data D2. Because the temperature sensor data D1 and D2 are data sets with identical fields, the variable fields of D1 must include the variable fields of D2. Therefore “temperature”, the field by which the two sets of variable fields differ, is added to the variable fields of D1.

そして、終端データである平均温度データＤ３の変量フィールドには、「ｘ」、「ｙ」、及び「平均温度」を設定する。 Then, “x”, “y”, and “average temperature” are set in the variable field of the average temperature data D3 that is the terminal data.

Next, an example of creating the frequency distribution table 3f of the input data D1 will be described. First, the number of classes and the class width are determined using Sturges' formula (Non-Patent Document 1). With the estimated number of records set to 1000, the calculation of Equation 1 gives k = 11 classes for the variable field “temperature” of the temperature sensor data D1.

また、数２の計算によって、変量フィールド「温度」の階級幅ｈとして、凡そ９．０９を得る。 Further, by calculation of Equation 2, approximately 9.09 is obtained as the class width h of the variable field “temperature”.

他の変量フィールド「ｘ」及び「ｙ」の各々について、上述したように階級数ｋ及び階級幅ｈを計算する。 For each of the other variable fields “x” and “y”, the class number k and the class width h are calculated as described above.
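
As a rough check of the figures above, the following Python sketch assumes that Equation 1 is the usual Sturges rule k = 1 + log2(N) rounded to the nearest integer, and that Equation 2 divides the value range by k. Neither equation is reproduced in this passage, so both forms, as well as the value ranges (0 to 100 for "temperature", 0 to 50 for "x" and "y"), are assumptions chosen to be consistent with the quoted results.

```python
import math


def sturges_class_count(n_records):
    """Number of classes by Sturges' rule, rounded to the nearest integer."""
    return round(1 + math.log2(n_records))


def class_width(value_min, value_max, k):
    """Equal class width over the value range."""
    return (value_max - value_min) / k


k = sturges_class_count(1000)               # 1 + log2(1000) is about 10.97
h_temperature = class_width(0.0, 100.0, k)  # assumed range 0 to 100
h_x = class_width(0.0, 50.0, k)             # assumed range 0 to 50
```

Under these assumptions k is 11, the class width for "temperature" is about 9.09, and the class width for "x" is about 4.545, which the text gives as 4.54.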

温度センサデータＤ１の度数分布３ｆは、図２２に示されるような、３次元度数分布３８で表される。図２２は、３次元度数分布の例を示す図である。図２２にて、３次元度数分布３８は、「温度」の階級３８ａ、「ｘ」の階級３８ｂ、「ｙ」の階級３８ｃの次元で表される。 The frequency distribution 3f of the temperature sensor data D1 is represented by a three-dimensional frequency distribution 38 as shown in FIG. FIG. 22 is a diagram illustrating an example of a three-dimensional frequency distribution. In FIG. 22, the three-dimensional frequency distribution 38 is represented by the dimensions of a “temperature” class 38a, an “x” class 38b, and a “y” class 38c.

変量フィールド「温度」の各階級は、階級幅ｈ＝９．０９で区切られて、「温度」の階級３８ａのようなデータ例を示す。「ｘ」の各階級は、階級幅ｈ＝４．５４で区切られて、「ｘ」の階級３８ｂのようなデータ例を示す。「ｙ」の各階級は、階級幅ｈ＝４．５４で区切られて、「ｙ」の階級３８ｃのようなデータ例を示す。 Each class of the variable field “temperature” is delimited by a class width h = 9.09, as in the data example of the “temperature” class 38a. Each class of “x” is delimited by a class width h = 4.54, as in the data example of the “x” class 38b. Each class of “y” is delimited by a class width h = 4.54, as in the data example of the “y” class 38c.

次に、第１実施例における度数分布写像ｆの作成例について説明する。図２３及び図２４は、第１実施例における度数分布写像の作成例を説明するための図である。図２３（Ａ）では、図２２の３次元度数分布３８に基づく度数分布表３ｆを示している。度数分布表３ｆでは、変量フィールド「温度」、「ｘ」、及び「ｙ」の組み合せ毎に度数「１００」が設定されている。 Next, an example of creating the frequency distribution map f in the first embodiment will be described. 23 and 24 are diagrams for explaining an example of creating a frequency distribution map in the first embodiment. FIG. 23A shows a frequency distribution table 3f based on the three-dimensional frequency distribution 38 of FIG. In the frequency distribution table 3f, the frequency “100” is set for each combination of the variable fields “temperature”, “x”, and “y”.

組み合せ毎に示される度数分（１００個）のサンプルデータ３９を生成する。例えば、「温度」の「０〜９．０９」階級、「ｘ」の「０〜４．５４」階級、及び「ｙ」の「０〜４．５４」階級の組み合せに対する（１００２、１２：２０、２．１、３、２）等を含む１００個のデータが生成される。 The sample data 39 corresponding to the frequency (100 pieces) indicated for each combination is generated. For example, for a combination of the “temperature” “0-9.09” class, the “x” “0-4.54” class, and the “y” “0-4.54” class (1002, 12:20). , 2.1, 3, 2), etc. are generated.
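
The sample generation step can be sketched in Python as follows. The text only states that data are generated for each class combination; drawing each value uniformly within its class interval is an assumption, as are the helper names. The sensor ID and time fields of the records are omitted here because they are not variable fields.

```python
import itertools
import random


def class_edges(value_min, width, k):
    """Return the (low, high) bounds of k equal-width classes."""
    return [(value_min + i * width, value_min + (i + 1) * width) for i in range(k)]


def generate_samples(edges_by_field, per_class, rng):
    """Yield (class index tuple, values) with per_class records per class combination."""
    fields = list(edges_by_field)
    index_ranges = [range(len(edges_by_field[f])) for f in fields]
    for combo in itertools.product(*index_ranges):
        for _ in range(per_class):
            values = {
                f: rng.uniform(*edges_by_field[f][i]) for f, i in zip(fields, combo)
            }
            yield combo, values


rng = random.Random(0)  # fixed seed so the sketch is repeatable
edges = {
    "temperature": class_edges(0.0, 100.0 / 11, 11),  # assumed range 0 to 100
    "x": class_edges(0.0, 50.0 / 11, 11),             # assumed range 0 to 50
    "y": class_edges(0.0, 50.0 / 11, 11),
}
samples = list(generate_samples(edges, per_class=100, rng=rng))
```

With 11 classes per field and 100 records per combination, this produces 11 × 11 × 11 × 100 sample records.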

そして、サンプルデータ３９をバッチジョブＪ１で１件ずつ処理し、出力データの度数分布推定値４ｆのどの階級に出力されたのかを入出力対応表２ａでカウントした値を記録する。 Then, the sample data 39 is processed one record at a time by the batch job J1, and the input/output correspondence table 2a records a count of the class of the frequency distribution estimated value 4f of the output data to which each record was output.

図２３（Ｂ）に例示するように、便宜上、階級の組み合せの一部にＡ、Ｂ、Ｃ、Ｄ、及びＥの階級名を付ける。図２３（Ｃ）にて、バッチジョブＪ１の場合の、図２３（Ｂ）の階級名を用いた入出力対応表２ａの例を示す。 As illustrated in FIG. 23B, for convenience, class names of A, B, C, D, and E are given to some of the class combinations. FIG. 23C shows an example of the input / output correspondence table 2a using the class name of FIG. 23B in the case of the batch job J1.

バッチジョブＪ１への入力データである温度センサデータＤ１と、バッチジョブＪ１後の温度センサデータＤ２との対応付けを示す、階級名Ａ〜Ｅのマトリクスを含む入出力対応表２ａにおいて、入力（Ｄ１）の階級Ｂから出力（Ｄ２）の階級Ｂへと３８回出力され、入力（Ｄ１）の階級Ｃから出力（Ｄ２）の階級Ｃへと１００回出力され、入力（Ｄ１）の階級Ｂから出力（Ｄ２）の階級Ｂへと６４回出力される。 The input/output correspondence table 2a contains a matrix of the class names A to E that associates the temperature sensor data D1, which is the input data to the batch job J1, with the temperature sensor data D2 after the batch job J1. In this table, records are output 38 times from class B of the input (D1) to class B of the output (D2), 100 times from class C of the input (D1) to class C of the output (D2), and 64 times from class B of the input (D1) to class B of the output (D2).

図２３（Ｃ）に示される入出力対応表２ａから図２４に示されるような度数分布写像ｆを得る。図２４は、第１実施例における度数分布写像の例を示す図である。図２４において、図２３（Ｃ）に示される入出力対応表２ａの各値を階級毎の度数（１００件）で割ることによって、バッチジョブＪ１の度数分布写像ｆを取得する。 A frequency distribution map f as shown in FIG. 24 is obtained from the input/output correspondence table 2a shown in FIG. 23C. FIG. 24 is a diagram illustrating an example of a frequency distribution map in the first embodiment. In FIG. 24, the frequency distribution map f of the batch job J1 is obtained by dividing each value of the input/output correspondence table 2a shown in FIG. 23C by the frequency of each class (100 records).
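
The division step can be sketched in Python as below. Only the counts 38 (class B to class B) and 100 (class C to class C) are taken from the text; the nested-dictionary layout and the function name are assumptions made for this illustration, and the remaining entries of FIG. 23C are simply left empty here.

```python
def frequency_distribution_map(counts, per_class):
    """Divide every count in the correspondence table by the samples per input class."""
    return {
        in_cls: {out_cls: n / per_class for out_cls, n in row.items()}
        for in_cls, row in counts.items()
    }


# Rows are input (D1) classes, columns are output (D2) classes.
counts = {
    "A": {},
    "B": {"B": 38},   # from the text
    "C": {"C": 100},  # from the text
    "D": {},          # entries not reproduced in this sketch
    "E": {},
}
f_map = frequency_distribution_map(counts, per_class=100)
```

Each coefficient is therefore the fraction of the 100 sample records of an input class that landed in a given output class, for example 0.38 for class B.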

次に、第１実施例における出力データの度数分布推定例について説明する。図２５は、第１実施例における出力データの度数分布推定例を説明するための図である。図２５（Ａ）では、実際の入力データＤ１の度数分布表３ｆ−２を作成する。作成された度数分布表３ｆ−２は、例えば、階級Ａの度数は「５０」、階級Ｂの度数は「９２」、階級Ｃの度数は「８１」、階級Ｄの度数は「７３」、階級Ｅの度数は「４２」等を示す。 Next, a frequency distribution estimation example of output data in the first embodiment will be described. FIG. 25 is a diagram for explaining an example of estimating the frequency distribution of output data in the first embodiment. In FIG. 25A, a frequency distribution table 3f-2 of actual input data D1 is created. The created frequency distribution table 3f-2 includes, for example, the frequency of class A is “50”, the frequency of class B is “92”, the frequency of class C is “81”, the frequency of class D is “73”, and the class The frequency of E indicates “42” or the like.

そして、度数分布写像ｆの各値を係数とした一次式で出力データの度数分布の各階級の度数を推定する。図２５（Ｂ）では、図２４に示す度数分布写像ｆを用いて、出力されるデータＤ２の度数分布推定値４ｆの作成例を示す。 Then, the frequency of each class of the frequency distribution of the output data is estimated by a linear expression using each value of the frequency distribution map f as a coefficient. FIG. 25B shows an example of creating the frequency distribution estimated value 4f of the output data D2 using the frequency distribution map f shown in FIG.

データＤ２の階級Ｂの度数を、Ｄ２［Ｂ］と表記し、
Ｄ２［Ｂ］＝・・・＋０＊Ｄ１［Ａ］＋０．３８＊Ｄ１［Ｂ］＋０＊Ｄ１［Ｃ］＋０＊Ｄ１［Ｄ］＋０＊Ｄ１［Ｅ］＋・・・
で求められる。 The frequency of class B of the data D2 is written as D2[B] and is obtained by
D2[B] = ... + 0 * D1[A] + 0.38 * D1[B] + 0 * D1[C] + 0 * D1[D] + 0 * D1[E] + ...

他階級についても同様の計算を行うことによって、度数分布推定値４ｆを得ることができる。 By performing the same calculation for the other classes, the frequency distribution estimated value 4f can be obtained.

そして、得られた度数分布推定値４ｆを用いて出力データ量を算出する。 Then, the output data amount is calculated using the obtained frequency distribution estimated value 4f.

出力データ量＝・・・＋Ｄ２［Ａ］＋Ｄ２［Ｂ］＋Ｄ２［Ｃ］＋Ｄ２［Ｄ］＋Ｄ２［Ｅ］＋・・・
のようにして得られる。得られた温度差センサデータＤ２の出力データ量に基づいて、特許文献１の手法を用いてバッチジョブＪ１の実行時間推定値５を算出する。 The output data amount is obtained as
output data amount = ... + D2[A] + D2[B] + D2[C] + D2[D] + D2[E] + ...
Based on the output data amount obtained for the temperature difference sensor data D2, the execution time estimated value 5 of the batch job J1 is calculated using the method of Patent Document 1.
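
The two linear expressions above (the per-class estimate and the sum over classes) can be sketched in Python as follows. The input frequencies are those of the frequency distribution table 3f-2 for classes A to E, and the coefficients 0.38 and 1.0 follow from the counts 38 and 100 quoted for classes B and C. The function name and the restriction to these five classes are assumptions, so the resulting total is only a partial sum for illustration.

```python
def estimate_output_distribution(f_map, d1):
    """Estimate D2[out] as the sum over input classes of f[in][out] * D1[in]."""
    d2 = {}
    for in_cls, row in f_map.items():
        for out_cls, coeff in row.items():
            d2[out_cls] = d2.get(out_cls, 0.0) + coeff * d1.get(in_cls, 0)
    return d2


# Coefficients for classes B and C only; all other coefficients are taken as 0.
f_map = {"B": {"B": 0.38}, "C": {"C": 1.0}}
d1 = {"A": 50, "B": 92, "C": 81, "D": 73, "E": 42}

d2 = estimate_output_distribution(f_map, d1)
output_amount = sum(d2.values())  # output data amount = sum of D2 frequencies
```

Here D2[B] is 0.38 × 92 = 34.96 and D2[C] is 81, so the partial output data amount over these classes is 115.96 records.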

また、度数分布推定値４ｆを後続バッチジョブＪ２の入力データの度数分布表３ｆ−２に設定し、上述した同様の処理を行うことによって、各位置の平均温度データＤ３の出力データ量から後続バッチジョブＪ２の実行時間推定値５を算出することができる。 Further, by setting the frequency distribution estimated value 4f as the frequency distribution table 3f-2 of the input data of the subsequent batch job J2 and performing the same processing as described above, the execution time estimated value 5 of the subsequent batch job J2 can be calculated from the output data amount of the average temperature data D3 for each position.

以下に、度数分布写像ｆと実行時間写像ｇとを使用して、実行時間を推定する第２実施例について説明する。 The second embodiment for estimating the execution time using the frequency distribution map f and the execution time map g will be described below.

第２実施例における手順は下記の通りである。
（１）入力データ（温度差センサデータＤ１）の度数分布表３ｆ−２から度数分布写像ｆを用いて出力データ（温度差センサデータＤ２）の度数分布推定値４ｆを計算する。
（２）入力データ（温度差センサデータＤ１）の度数分布表３ｆ−２から実行時間写像ｇを用いて実行時間を推定して、実行時間推定値５を得る。
（３）出力データ（温度差センサデータＤ２のデータ量）の度数分布推定値４ｆは、後続のバッチジョブＪ２の入力データの度数分布表３ｆとして用いる。後続バッチジョブＪ２の実行時間の推定は、上述同様に、実行時間写像ｇを用いて行う。 The procedure in the second embodiment is as follows.
(1) The frequency distribution estimated value 4f of the output data (temperature difference sensor data D2) is calculated from the frequency distribution table 3f-2 of the input data (temperature difference sensor data D1) using the frequency distribution map f.
(2) The execution time is estimated from the frequency distribution table 3f-2 of the input data (temperature difference sensor data D1) using the execution time map g, and the execution time estimated value 5 is obtained.
(3) The frequency distribution estimated value 4f of the output data (data amount of the temperature difference sensor data D2) is used as the frequency distribution table 3f of the input data of the subsequent batch job J2. The execution time of the subsequent batch job J2 is estimated using the execution time map g as described above.
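
Steps (1) to (3) can be sketched as a simple loop over the batch jobs in Python. This is only an illustration under stated assumptions: the maps for J2 and the function names are hypothetical, the J1 coefficients are the example values used elsewhere in this description, and only classes A to E are included.

```python
def apply_frequency_map(f_map, dist):
    """Step (1): estimate the output frequency distribution from the input one."""
    out = {}
    for in_cls, row in f_map.items():
        for out_cls, coeff in row.items():
            out[out_cls] = out.get(out_cls, 0.0) + coeff * dist.get(in_cls, 0.0)
    return out


def estimate_time(g_map, dist):
    """Step (2): linear expression with the execution time map values as coefficients."""
    return sum(g_map.get(cls, 0.0) * n for cls, n in dist.items())


def estimate_network(jobs, dist):
    """Step (3): feed each estimated output distribution to the subsequent job."""
    times = {}
    for name, f_map, g_map in jobs:
        times[name] = estimate_time(g_map, dist)
        dist = apply_frequency_map(f_map, dist)
    return times, dist


jobs = [
    ("J1", {"B": {"B": 0.38}, "C": {"C": 1.0}},
     {"A": 0.1, "B": 0.3, "C": 0.3, "D": 0.3, "E": 0.1}),
    # Hypothetical maps for the subsequent job J2.
    ("J2", {"B": {"B": 1.0}, "C": {"C": 1.0}}, {"B": 0.2, "C": 0.2}),
]
d1 = {"A": 50, "B": 92, "C": 81, "D": 73, "E": 42}
times, d3 = estimate_network(jobs, d1)
```

The point of the design is that only frequency distributions flow from one job to the next, so no job has to be executed on the real data in order to estimate the time of a later job.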

第２実施例における変量フィールドの決定は、第１実施例で説明した通りであるので、その詳細な説明を省略する。 Since the determination of the variable field in the second embodiment is as described in the first embodiment, its detailed description is omitted.

第２実施例における実行時間写像ｇの作成例について説明する。図２６及び図２７は、第２実施例における実行時間写像の作成例を説明するための図である。図２６（Ａ）では、図２２の３次元度数分布３８に基づく度数分布表３ｆを示している。度数分布写像ｆの作成時と同様の、図２２の３次元度数分布３８に基づく度数分布表３ｆを用いる。度数分布表３ｆでは、変量フィールド「温度」、「ｘ」、及び「ｙ」の組み合せ毎に度数「１００」が設定されている。 An example of creating the execution time map g in the second embodiment will be described. FIGS. 26 and 27 are diagrams for explaining an example of creating an execution time map in the second embodiment. FIG. 26A shows the frequency distribution table 3f based on the three-dimensional frequency distribution 38 of FIG. 22. The same frequency distribution table 3f, based on the three-dimensional frequency distribution 38 of FIG. 22, is used as when the frequency distribution map f is created. In the frequency distribution table 3f, the frequency “100” is set for each combination of the variable fields “temperature”, “x”, and “y”.

度数分布写像ｆの作成時と同様に、組み合せ毎に示される度数分（１００個）のサンプルデータ３９を生成する。例えば、「温度」の「０〜９．０９」階級、「ｘ」の「０〜４．５４」階級、及び「ｙ」の「０〜４．５４」階級の組み合せに対する（１００２、１２：２０、２．１、３、２）等を含む１００個のデータが生成される。 Similarly to the creation of the frequency distribution map f , sample data 39 corresponding to the frequency (100) indicated for each combination is generated. For example, for a combination of the “temperature” “0-9.09” class, the “x” “0-4.54” class, and the “y” “0-4.54” class (1002, 12:20). , 2.1, 3, 2), etc. are generated.

そして、サンプルデータ３９をバッチジョブＪ１で１件ずつ処理し、実行時間５を実行時間対応表２ｂでカウントした値を記録する。 Then, the sample data 39 is processed one by one with the batch job J1, and the value obtained by counting the execution time 5 in the execution time correspondence table 2b is recorded.

図２６（Ｂ）に例示するように、便宜上、階級の組み合せの一部にＡ、Ｂ、Ｃ、Ｄ、及びＥの階級名を付ける。図２６（Ｃ）にて、バッチジョブＪ１の場合の、図２６（Ｂ）の階級名を用いた実行時間対応表２ｂの例を示す。実行時間対応表２ｂでは、階級毎の１００件の総実行時間が示される。 As illustrated in FIG. 26B, for convenience, the class names A, B, C, D, and E are given to some of the class combinations. FIG. 26C shows an example of the execution time correspondence table 2b using the class names of FIG. 26B in the case of the batch job J1. The execution time correspondence table 2b shows, for each class, the total execution time of its 100 records.

図２６（Ｃ）に示される実行時間対応表２ｂから図２７に示されるような実行時間写像ｇを得る。図２６（Ｄ）では、バッチジョブＪ１の実行時間写像ｇが例示される。図２６（Ｄ）において、図２６（Ｃ）に示される実行時間対応表２ｂの各値を階級毎の度数（１００件）で割ることによって、バッチジョブＪ１の実行時間写像ｇを取得する。 An execution time map g as shown in FIG. 27 is obtained from the execution time correspondence table 2b shown in FIG. 26C. FIG. 26D illustrates the execution time map g of the batch job J1. In FIG. 26D, the execution time map g of the batch job J1 is obtained by dividing each value of the execution time correspondence table 2b shown in FIG. 26C by the frequency of each class (100 records).
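
The averaging step can be sketched in Python as below. The total execution times are hypothetical values in milliseconds, chosen only so that the resulting averages equal the 0.1 and 0.3 coefficients used in the worked example of this description; the actual totals are those of FIG. 26C. The function name is likewise an assumption.

```python
def execution_time_map(total_times, per_class):
    """Average execution time per record for each class."""
    return {cls: total / per_class for cls, total in total_times.items()}


# Hypothetical total execution time (ms) of the 100 sample records in each class.
total_times_ms = {"A": 10.0, "B": 30.0, "C": 30.0, "D": 30.0, "E": 10.0}
g_map = execution_time_map(total_times_ms, per_class=100)
```

Each value of the map is then the average time that the batch job spends on one record of that class.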

図２７は、第２実施例における実行時間の推定例を説明するための図である。図２７（Ａ）では、実際の入力データＤ１の度数分布表３ｆ−２の作成例が示される。度数分布表３ｆ−２の一部である階級Ａ、Ｂ、Ｃ、Ｄ、Ｅの夫々に対して、度数「５０」、「９２」、「８１」、「７３」、及び「４２」を示す例である。 FIG. 27 is a diagram for explaining an example of estimating the execution time in the second embodiment. FIG. 27A shows an example of creating the frequency distribution table 3f-2 of actual input data D1. The frequencies “50”, “92”, “81”, “73”, and “42” are shown for each of the classes A, B, C, D, and E which are part of the frequency distribution table 3f-2. It is an example.

そして、図２７（Ｂ）に例示される実行時間写像ｇを用いて実行時間推定値５を計算する。図２７（Ｂ）に示される実行時間写像ｇは、バッチジョブＪ１の実行時間写像ｇ（図２６（Ｄ））に相当する。実行時間写像ｇの各値を係数とした一次式で、実行時間推定値５が計算される。例えば、バッチジョブＪ１の実行時間ｔ［Ｊ１］は、以下のように計算される。 Then, the execution time estimated value 5 is calculated using the execution time map g illustrated in FIG. The execution time map g shown in FIG. 27 (B) corresponds to the execution time map g (FIG. 26 (D)) of the batch job J1. An execution time estimated value 5 is calculated by a linear expression using each value of the execution time map g as a coefficient. For example, the execution time t [J1] of the batch job J1 is calculated as follows.

ｔ［Ｊ１］＝・・・＋０．１＊Ｄ１［Ａ］＋０．３＊Ｄ１［Ｂ］＋０．３＊Ｄ１［Ｃ］＋０．３＊Ｄ１［Ｄ］＋０．１＊Ｄ１［Ｅ］
＝０．１＊（１１２５−９２−８１−７３）＋０．３＊（９２＋８１＋７３）
＝１６１．７
但し、温度センサデータＤ１の実際のレコード数は１１２５件とする。また、Ｂ、Ｃ、Ｄ以外の階級の平均実行時間は全て０．１ｍｓとする。
t[J1] = ... + 0.1 * D1[A] + 0.3 * D1[B] + 0.3 * D1[C] + 0.3 * D1[D] + 0.1 * D1[E]
= 0.1 * (1125 - 92 - 81 - 73) + 0.3 * (92 + 81 + 73)
= 161.7
Here, the actual number of records of the temperature sensor data D1 is 1125, and the average execution time of every class other than B, C, and D is 0.1 ms.

入力データのレコード件数Ｎが同一であるが異なる度数分布により推定時間が異なる場合について図２８及び図２９で説明する。 A case where the number of records N of the input data is the same but the estimated time is different due to different frequency distributions will be described with reference to FIGS.

図２８は、第２実施例における他の実行時間の推定例を説明するための図である。図２８では、入力データＤ１とＤ１'のレコード件数Ｎがどちらも１１２５件であるとする。 FIG. 28 is a diagram for explaining another estimation example of the execution time in the second embodiment. In FIG. 28, it is assumed that the record number N of the input data D1 and D1 ′ is 1125.

入力データＤ１の度数分布表３ｆ−２では、その一部分において、階級Ａでは「５０」度数を示し、階級Ｂでは「９２」度数を示し、階級Ｃでは「８１」度数を示し、階級Ｄでは「７３」度数を示し、階級Ｅでは「４２」度数を示す。 In a part of the frequency distribution table 3f-2 of the input data D1, class A shows a frequency of “50”, class B a frequency of “92”, class C a frequency of “81”, class D a frequency of “73”, and class E a frequency of “42”.

入力データＤ１'の度数分布表３ｆ−２'では、その一部分において、階級Ａでは「２０」度数を示し、階級Ｂでは「１５」度数を示し、階級Ｃでは「２４」度数を示し、階級Ｄでは「３０」度数を示し、階級Ｅでは「２５」度数を示す。 In the frequency distribution table 3f-2 ′ of the input data D1 ′, in a part thereof, the class A indicates “20” frequency, the class B indicates “15” frequency, the class C indicates “24” frequency, and the class D Indicates “30” frequency, and class E indicates “25” frequency.

また、予め作成された実行時間写像ｇによって示される階級毎の実行時間は、一部分において、階級Ａでは「０．１」msecを示し、階級Ｂでは「０．３」msecを示し、階級Ｃでは「０．３」msecを示し、階級Ｄでは「０．３」msecを示し、階級Ｅでは「０．１」msecを示す。 In addition, the execution time for each class indicated by the execution time map g created in advance shows “0.1” msec in class A, “0.3” msec in class B, and “C” in class C. “0.3” msec is shown, “0.3” msec is shown in class D, and “0.1” msec is shown in class E.

このような場合において、入力データの度数分布表３ｆ−２及び実行時間写像を用いない従来手法による実行時間を推定するための一次式がｔ［Ｊ１］＝０．１Ｎ＋５０である場合、
ｔ［Ｊ１］＝０．１＊１１２５＋５０＝１６２．５
として求められる。 In such a case, if the linear expression for estimating the execution time by the conventional method, which uses neither the frequency distribution table 3f-2 of the input data nor the execution time map, is t[J1] = 0.1N + 50, the execution time is obtained as
t[J1] = 0.1 * 1125 + 50 = 162.5

一方、本実施の形態による入力データＤ１とＤ１'に対する実行時間推定値５は、
入力データＤ１の場合、
ｔ［Ｊ１］＝０．１＊（１１２５−９２−８１−７３）＋０．３＊（９２＋８１＋７３）＝１６１．７
となる。 On the other hand, the execution time estimated value 5 according to the present embodiment for the input data D1 and D1' is as follows. For the input data D1,
t[J1] = 0.1 * (1125 - 92 - 81 - 73) + 0.3 * (92 + 81 + 73) = 161.7

また、入力データＤ１'の場合、
ｔ［Ｊ１］＝０．１＊（１１２５−１５−２４−３０）＋０．３＊（１５＋２４＋３０）＝１２６．３
となる。 For the input data D1',
t[J1] = 0.1 * (1125 - 15 - 24 - 30) + 0.3 * (15 + 24 + 30) = 126.3

このように、本実施の形態では、入力データＤ１と入力データＤ１'とでは、異なる実行時間推定値５が求まるのに対して、従来手法では、入力データ件数が同一である場合には、一つ実行時間推定値５しか得られない。つまり、従来手法では、推定された実行時間には誤差を含むことを意味しており、一方、本実施の形態では、推定誤差をより小さくし高精度に実行時間推定値５を求めることができる。 As described above, in the present embodiment, different execution time estimated values 5 are obtained for the input data D1 and the input data D1', whereas the conventional method yields only a single execution time estimated value 5 when the number of input records is the same. This means that the execution time estimated by the conventional method contains an error, while the present embodiment can reduce the estimation error and obtain the execution time estimated value 5 with high accuracy.
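
The comparison above can be reproduced with a short Python sketch. All numbers are taken from the worked example (N = 1125, 0.3 ms for classes B, C, and D, and 0.1 ms for every other class); the function and variable names are assumptions made for this illustration.

```python
N_RECORDS = 1125
SLOW_CLASS_MS = {"B": 0.3, "C": 0.3, "D": 0.3}  # average time per record (ms)
DEFAULT_MS = 0.1                                # all classes other than B, C, D


def conventional_estimate(n_records):
    """Conventional linear expression t = 0.1N + 50, based on the record count only."""
    return 0.1 * n_records + 50


def distribution_estimate(dist, n_records):
    """Estimate using the execution time map and the input frequency distribution."""
    slow = sum(ms * dist[cls] for cls, ms in SLOW_CLASS_MS.items())
    rest = n_records - sum(dist[cls] for cls in SLOW_CLASS_MS)
    return slow + DEFAULT_MS * rest


d1 = {"A": 50, "B": 92, "C": 81, "D": 73, "E": 42}        # table 3f-2
d1_prime = {"A": 20, "B": 15, "C": 24, "D": 30, "E": 25}  # table 3f-2'

t_conventional = round(conventional_estimate(N_RECORDS), 1)
t_d1 = round(distribution_estimate(d1, N_RECORDS), 1)
t_d1_prime = round(distribution_estimate(d1_prime, N_RECORDS), 1)
```

The conventional expression returns 162.5 for both data sets, while the distribution-based expression returns 161.7 for D1 and 126.3 for D1'.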

図２８で例示した実行時間の推定値の違い（推定誤差）について図２９で説明する。図２９は、推定誤差を説明するための図である。図２９において、バッチジョブＪ１は、範囲［℃_１、℃_２］のデータを抽出し、所定処理を行うバッチジョブである。 The difference in the estimated execution time illustrated in FIG. 28 (the estimation error) will be described with reference to FIG. 29. FIG. 29 is a diagram for explaining the estimation error. In FIG. 29, the batch job J1 is a batch job that extracts data in the range [℃_1, ℃_2] and performs a predetermined process.

この例において、レコード件数に相当する入力データ量Ｎが１１２５件であったとする。同一の入力データ量Ｎであっても、温度の度数分布が異なる場合がある。入力データＤ１では度数分布２９ａ（図２８のＤ１の度数分布表３ｆ−２）を示し、入力データＤ１'では度数分布２９ｂ（図２８のＤ１'の度数分布表３ｆ−２'）を示す等がある。例えば、入力データＤ１は夏のデータであり、入力データＤ１'は冬のデータである場合等である。 In this example, it is assumed that the input data amount N, which corresponds to the number of records, is 1125. Even with the same input data amount N, the frequency distribution of temperature may differ. For example, the input data D1 may show a frequency distribution 29a (the frequency distribution table 3f-2 of D1 in FIG. 28), while the input data D1' shows a frequency distribution 29b (the frequency distribution table 3f-2' of D1' in FIG. 28). This occurs, for example, when the input data D1 is summer data and the input data D1' is winter data.

従来手法による実行時間推定値は、上述したように、入力データＤ１及び入力データＤ１'に対してｔ［Ｊ１］＝１６２．５で同じ結果となる。度数分布２９ａと度数分布２９ｂとの違いに関わらず同一の実行時間推定値が算出される。 As described above, the estimated execution time value according to the conventional method has the same result at t [J1] = 162.5 with respect to the input data D1 and the input data D1 ′. Regardless of the difference between the frequency distribution 29a and the frequency distribution 29b, the same execution time estimated value is calculated.

本実施の形態を適用した場合、入力データ量Ｎが同じ１１２５件であったとしても、度数分布２９ａを得た場合、度数分布２９ａ（図２８のＤ１の度数分布表３ｆ−２）に基づいてバッチジョブＪ１後の出力データ量に係る度数分布２９ｃが求まる。また、実行時間写像ｇを用いた一次式により、ｔ［Ｊ１］＝１６１．７を得る。 When this embodiment is applied, even if the input data amount N is the same 1125 cases, when the frequency distribution 29a is obtained, the frequency distribution 29a (frequency distribution table 3f-2 of D1 in FIG. 28) is obtained. A frequency distribution 29c relating to the output data amount after the batch job J1 is obtained. Further, t [J1] = 161.7 is obtained by a linear expression using the execution time map g.

また、度数分布２９ｂ（図２８のＤ１'の度数分布表３ｆ−２'）を得た場合、度数分布２９ｂに基づいてバッチジョブＪ１後の出力データ量に係る度数分布２９ｄが求まる。また、実行時間写像ｇを用いた一次式により、ｔ［Ｊ１］＝１２６．３を得る。 Further, when the frequency distribution 29b (the frequency distribution table 3f-2' of D1' in FIG. 28) is obtained, the frequency distribution 29d relating to the output data amount after the batch job J1 is obtained based on the frequency distribution 29b. In addition, t[J1] = 126.3 is obtained by the linear expression using the execution time map g.

この推定値ｔ［Ｊ１］＝１６１．７と推定値ｔ［Ｊ１］＝１２６．３との違いが、従来手法の場合における実行時間の推定誤差に相当する。 The difference between the estimated value t[J1] = 161.7 and the estimated value t[J1] = 126.3 corresponds to the estimation error of the execution time in the case of the conventional method.

更に、従来手法において、入力データ量から出力データ量を推測する場合、その出力データ量にも推定誤差が生じる。バッチジョブネットワークでは、出力データが後続バッチジョブの入力データとなるため、出力データ量の推定誤差が大きい場合、バッチジョブ毎に推定誤差が蓄積される。また、出力データ量の推定のみならず実行時間の推定にも影響を与えるため、適切な実行時間を推定することができない場合がある。 Further, in the conventional method, when the output data amount is estimated from the input data amount, an estimation error also occurs in the output data amount. In the batch job network, the output data becomes input data of the subsequent batch job. Therefore, when the estimation error of the output data amount is large, the estimation error is accumulated for each batch job. In addition, since it affects not only the estimation of the output data amount but also the estimation of the execution time, it may not be possible to estimate an appropriate execution time.

一方、本実施の形態では、入力データ量Ｎが同じ件数であっても、度数分布２９ａ及び度数分布２９ｂの各々に基づいて、出力データ量及び実行時間を精度良く推定する。従って、バッチジョブネットワークにおいても、推定誤差による影響を少なくすることができ、出力データ量及び実行時間を精度良く推定できる。 On the other hand, in the present embodiment, even if the input data amount N is the same number, the output data amount and the execution time are accurately estimated based on each of the frequency distribution 29a and the frequency distribution 29b. Therefore, even in the batch job network, the influence of the estimation error can be reduced, and the output data amount and the execution time can be estimated with high accuracy.

以下に、実行時間写像ｇのみを使用して、実行時間を推定する第３実施例について説明する。 A third embodiment for estimating the execution time using only the execution time map g will be described below.

第３実施例では、バッチジョブが単一のバッチジョブのみで構成されている場合に、入力データの度数分布表３ｆ−２から実行時間写像ｇを用いて実行時間を推定して、実行時間推定値５を得る。 In the third embodiment, when the batch job consists of only a single batch job, the execution time is estimated from the frequency distribution table 3f-2 of the input data using the execution time map g, and the execution time estimated value 5 is obtained.

第３実施例において、変量フィールドの決定は、第１実施例等と同様であるので、その詳細な説明を省略する。また、実行時間写像ｇの作成は、第２実施例等と同様であるので、その詳細な説明を省略する。 In the third embodiment, the determination of the variable field is the same as that in the first embodiment and the detailed description thereof is omitted. The creation of the execution time map g is the same as that in the second embodiment and the detailed description thereof is omitted.

以上より、本実施の形態では、入力データの値の分布によって出力データ量と実行時間とを推定するため、入力データ量のみで出力データ量と実行時間とを推定する従来手法に比べて、推定誤差を小さくすることができる。 As described above, in this embodiment, since the output data amount and the execution time are estimated based on the distribution of the input data values, the estimation is performed as compared with the conventional method in which the output data amount and the execution time are estimated only by the input data amount. The error can be reduced.

本実施の形態では、複数のフィールドで構成される入力データのレコードを１件ずつ処理するようなバッチジョブ２について、入力データの度数分布から度数分布写像又は／及び実行時間写像を用いて実行時間を推定することで、バッチジョブ２に関するソース情報、設定ファイル等が不明であっても、実行時間を精度良く推定できる。また、入力データの度数分布と度数分布写像とから出力データの度数分布を取得でき、出力データ量を高精度で推定することができる。 In this embodiment, for a batch job 2 that processes input data records composed of a plurality of fields one by one, the execution time using the frequency distribution map or / and the execution time map from the frequency distribution of the input data. By estimating the execution time, the execution time can be accurately estimated even if the source information, the setting file, etc. regarding the batch job 2 are unknown. Further, the frequency distribution of the output data can be acquired from the frequency distribution of the input data and the frequency distribution map, and the output data amount can be estimated with high accuracy.

本発明は、具体的に開示された実施例に限定されるものではなく、特許請求の範囲から逸脱することなく、種々の変形や変更が可能である。 The present invention is not limited to the specifically disclosed embodiments, and various modifications and changes can be made without departing from the scope of the claims.

以上の第１から第３実施例を含む実施形態に関し、更に以下の付記を開示する。
（付記１）
階級毎に入力データの度数の分布を示す度数分布表と、該度数分布表に基づく入力データから所定処理による結果情報への写像を記憶する記憶部と、
前記記憶部に記憶された前記度数分布表を入力とする該記憶部に記憶された前記結果情報への写像を用いた計算結果に基づいて、前記入力データに対する前記所定処理の実行時間を推定する実行時間推定部と
を有する実行時間推定装置。
（付記２）
前記記憶部に記憶される前記写像は、前記入力データの度数分布表から出力データの度数分布推定値への前記所定処理による度数分布写像であり、
前記入力データの度数分布表を入力とする、前記記憶部に記憶される前記度数分布写像を用いることによって、前記出力データの前記度数分布推定値を取得する出力データ度数分布推定部と、
推定された前記度数分布推定値で示される度数の総和による出力データ量に基づいて実行時間を推定する実行時間推定部と
を有する付記１記載の実行時間推定装置。
（付記３）
前記記憶部に記憶される前記写像は、前記入力データの度数分布表から前記所定処理による実行時間写像であり、
前記記憶部に記憶される実行時間写像の各値を前記入力データの度数分布表の同一階級の度数の係数とした一次式で、実行時間を推定する実行時間推定部
を有する付記１記載の実行時間推定装置。
（付記４）
前記記憶部は、前記入力データの度数分布表から出力データの度数分布推定値への前記所定処理による度数分布写像を更に記憶し、
前記記憶部に記憶される度数分布写像の各値を前記入力データの度数分布表の同一階級の度数の係数とした一次式で、前記出力データの度数分布推定値を取得する出力データ度数分布推定部
を有する付記３記載の実行時間推定装置。
（付記５）
前記出力データ度数分布推定部は、前記度数分布写像を用いた一次式で取得した前記出力データの度数分布推定値を、前記所定処理の後段の処理に対する入力データの度数分布表に設定する付記３記載の実行時間推定装置。
（付記６）
前記階級毎に入力データの度数を示す度数分布表を前記記憶部に作成する入力データ度数分布作成部を有する付記１乃至５のいずれか一項記載の実行時間推定装置。
（付記８）
記憶部に記憶された階級毎に入力データの度数の分布を示す度数分布表を入力とする、該記憶部に記憶された該度数分布表に基づく入力データから所定処理による結果情報への写像を計算し、
前記計算の結果に基づいて、前記入力データに対する前記所定処理の実行時間を推定する
処理をコンピュータに実行させるプログラム。
（付記９）
コンピュータによって実行される実行時間推定方法であって、
記憶部に記憶された階級毎に入力データの度数の分布を示す度数分布表を入力とする、該記憶部に記憶された該度数分布表に基づく入力データから所定処理による結果情報への写像を計算し、
前記計算の結果に基づいて、前記入力データに対する前記所定処理の実行時間を推定する実行時間推定方法。
（付記１０）
階級毎に入力データの度数の分布を示す度数分布表と、該度数分布表に基づく入力データから所定処理による結果情報への写像を記憶する記憶部と、
前記度数分布表と写像とを記憶する度数分布生成部と、
前記記憶部に記憶された前記度数分布表を入力とする該記憶部に記憶された前記実行情報への写像を用いた計算結果に基づいて、前記入力データに対する前記所定処理の実行時間を推定する実行時間推定部と
を有するシステム。
（付記１１）
前記度数分布生成部は、
入出力データのスキーマを取得して、該スキーマに基づく前記入力データのフィールドのうち、前記所定処理によって出力データに影響を与える変量フィールドを決定する変量フィールド決定部と、
前記変量フィールド決定部によって決定された前記変量フィールドの値域とレコードの見積もり件数とに基づいて階級数と階級幅とを決定して、前記入力データの度数分布表を前記記憶部に生成する度数分布表生成部と、
前記実行情報への前記写像を作成する写像作成部と
を有する付記１０記載のシステム。
（付記１２）
前記変量フィールド決定部は、
各フィールドの値を平均又は標準偏差を持った正規乱数を用いて設定したレコードを見積もり件数分含む基準入力データセットと、対象フィールド以外は該基準入力データセットの同一番目のフィールドと同じ値とし、該対象フィールドを該基準入力データセットとは異なる平均又は標準偏差を持った正規乱数を用いて変更した入力データセットを生成する生成部と、
前記基準入力データセットに対して前記所定処理を行って得た基準出力データセットと、前記入力データセットに対して該所定処理を行って得た出力データセットとを比較することによって、前記対象フィールドが該所定処理によって該出力データセットに与える影響度を算出する算出部と、
前記影響度が所定値以上である場合、前記対象フィールドを前記変量フィールドに決定する決定部と
を有する付記１１記載のシステム。
（付記１３）
前記変量フィールド決定部は、
前記基準出力データセットと前記出力データセットのレコード数が異なる場合、前記影響度に最大値を設定する第一設定部と、
前記出力データセットと前記基準出力データセットと比較において、対象フィールド以外で値が異なるフィールドがある場合、値が異なるフィールド数の全レコードのフィールド数に対する割合を前記影響度に設定する第二設定部と
を有する付記１２記載のシステム。
（付記１４）
前記写像作成部は、
前記入力フィールドの度数分布表の階級毎に度数分のランダムなデータを生成するデータ生成部と、
前記所定処理で１件ずつ処理して出力データが出力された階級をカウントし、カウント結果を該階級の度数で割ることによって、度数分布を示す前記写像を作成する写像作成部と、
を有する付記１１乃至１３のいずれか一項記載のシステム。
（付記１５）
前記写像作成部は、
前記入力フィールドの度数分布表の階級毎に度数分のランダムなデータを生成するデータ生成部と、
前記所定処理で１件ずつ処理して測定した実行時間を階級毎に平均することによって、各階級の実行時間を示す前記写像を作成する写像作成部と、
を有する付記１１乃至１３のいずれか一項記載のシステム。 Regarding the embodiment including the first to third examples, the following additional notes are disclosed.
(Appendix 1)
A frequency distribution table showing the frequency distribution of the input data for each class, and a storage unit for storing a mapping from the input data based on the frequency distribution table to result information by a predetermined process;
The execution time of the predetermined process for the input data is estimated based on a calculation result using a mapping to the result information stored in the storage unit that receives the frequency distribution table stored in the storage unit. An execution time estimation device having an execution time estimation unit.
(Appendix 2)
The mapping stored in the storage unit is a frequency distribution map by the predetermined processing from the frequency distribution table of the input data to the frequency distribution estimated value of the output data,
An output data frequency distribution estimation unit that obtains the frequency distribution estimated value of the output data by using the frequency distribution map stored in the storage unit, which has the frequency distribution table of the input data as an input;
The execution time estimation device according to appendix 1, further comprising an execution time estimation unit that estimates an execution time based on an output data amount based on a sum of frequencies indicated by the estimated frequency distribution estimated value.
(Appendix 3)
The mapping stored in the storage unit is an execution time mapping by the predetermined process from the frequency distribution table of the input data,
The execution according to appendix 1, further comprising an execution time estimation unit that estimates an execution time by a linear expression using each value of the execution time map stored in the storage unit as a coefficient of the frequency of the same class of the frequency distribution table of the input data. Time estimation device.
(Appendix 4)
The storage unit further stores a frequency distribution map by the predetermined processing from the frequency distribution table of the input data to the frequency distribution estimated value of the output data,
Output data frequency distribution estimation that obtains a frequency distribution estimated value of the output data by a linear expression using each value of the frequency distribution map stored in the storage unit as a coefficient of the frequency of the same class of the frequency distribution table of the input data The execution time estimation apparatus according to supplementary note 3 having a unit.
(Appendix 5)
The output data frequency distribution estimation unit sets the frequency distribution estimation value of the output data acquired by a linear expression using the frequency distribution map in the frequency distribution table of the input data for the subsequent process of the predetermined process The execution time estimation apparatus described.
(Appendix 6)
The execution time estimation device according to any one of appendices 1 to 5, further comprising an input data frequency distribution creation unit that creates a frequency distribution table indicating the frequency of input data for each class in the storage unit.
(Appendix 8)
A frequency distribution table indicating the frequency distribution of the input data for each class stored in the storage unit is used as an input, and mapping from the input data based on the frequency distribution table stored in the storage unit to result information by a predetermined process is performed. Calculate
A program that causes a computer to execute a process of estimating an execution time of the predetermined process for the input data based on a result of the calculation.
(Appendix 9)
An execution time estimation method executed by a computer,
A frequency distribution table indicating the frequency distribution of the input data for each class stored in the storage unit is used as an input, and mapping from the input data based on the frequency distribution table stored in the storage unit to result information by a predetermined process is performed. Calculate
An execution time estimation method for estimating an execution time of the predetermined process for the input data based on a result of the calculation.
(Appendix 10)
A frequency distribution table showing the frequency distribution of the input data for each class, and a storage unit for storing a mapping from the input data based on the frequency distribution table to result information by a predetermined process;
A frequency distribution generation unit for storing the frequency distribution table and the mapping;
Based on a calculation result using a mapping to the execution information stored in the storage unit that receives the frequency distribution table stored in the storage unit, an execution time of the predetermined process for the input data is estimated. A system having an execution time estimation unit.
(Appendix 11)
The frequency distribution generation unit
A variable field determination unit that acquires a schema of input / output data and determines a variable field that affects output data by the predetermined processing among the fields of the input data based on the schema;
Frequency distribution for determining a class number and a class width based on the range of the variable field determined by the variable field determination unit and the estimated number of records, and generating a frequency distribution table of the input data in the storage unit A table generator,
The system according to appendix 10, further comprising: a mapping creation unit that creates the mapping to the execution information.
(Appendix 12)
The variable field determination unit includes:
A generation unit that generates a reference input data set containing the estimated number of records, in which the value of each field is set using normal random numbers having a given average or standard deviation, and an input data set in which every field other than a target field has the same value as the field at the same position in the reference input data set and the target field is changed using normal random numbers having an average or standard deviation different from that of the reference input data set;
By comparing the reference output data set obtained by performing the predetermined processing on the reference input data set and the output data set obtained by performing the predetermined processing on the input data set, the target field Calculating a degree of influence of the predetermined processing on the output data set;
The system according to claim 11, further comprising: a determining unit that determines the target field as the variable field when the influence degree is equal to or greater than a predetermined value.
(Appendix 13)
The system according to appendix 12, wherein the variable field determination unit includes:
A first setting unit that sets the degree of influence to its maximum value when the reference output data set and the output data set differ in the number of records; and
A second setting unit that, when the comparison between the output data set and the reference output data set finds fields other than the target field whose values differ, sets the degree of influence to the ratio of the number of fields with differing values to the number of fields in all records.
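The degree-of-influence rules of appendices 12 and 13 can be sketched as follows. This is a minimal illustration under stated assumptions: output data sets are modeled as lists of tuples, the maximum degree of influence is assumed to be 1.0, and the names `influence_degree`, `is_variable_field`, and `target_index` are hypothetical.

```python
MAX_INFLUENCE = 1.0  # assumed maximum value of the degree of influence


def influence_degree(reference_output, output, target_index=None):
    """Compare two output data sets given as lists of records (tuples).

    target_index, if given, is the position of the target field, which is
    excluded from the comparison of field values.
    """
    # Appendix 13, first rule: differing record counts give the maximum.
    if len(reference_output) != len(output):
        return MAX_INFLUENCE
    total_fields = sum(len(rec) for rec in reference_output)
    if total_fields == 0:
        return 0.0
    # Appendix 13, second rule: ratio of differing fields to all fields.
    differing = 0
    for ref_rec, rec in zip(reference_output, output):
        for idx, (ref_val, val) in enumerate(zip(ref_rec, rec)):
            if idx != target_index and ref_val != val:
                differing += 1
    return differing / total_fields


def is_variable_field(reference_output, output, threshold, target_index=None):
    """Appendix 12: the target field is a variable field when the degree of
    influence is equal to or greater than the predetermined value."""
    return influence_degree(reference_output, output, target_index) >= threshold
```

With two records of three fields each and two differing field values, the degree of influence under these assumptions is 2/6.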
(Appendix 14)
The system according to any one of appendices 11 to 13, wherein the mapping creation unit includes:
A data generation unit that generates, for each class of the frequency distribution table of the input field, as many random data items as the frequency of that class; and
A mapping creation unit that creates the mapping indicating the frequency distribution by processing the data one item at a time with the predetermined processing, counting the class into which each output data item falls, and dividing the count by the frequency of the class.
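A non-limiting sketch of the mapping creation in appendix 14 is given below. It assumes that each record is a single numeric field value drawn uniformly within its class, that the predetermined processing returns `None` when a record produces no output, and that the caller supplies a function assigning an output value to an output class. All names (`create_frequency_mapping`, `classify_output`, and so on) are hypothetical.

```python
import random


def create_frequency_mapping(process, class_bounds, samples_per_class,
                             classify_output, num_output_classes, seed=0):
    """Return mapping[i][j]: output records in output class j per input
    record sampled from input class i."""
    rng = random.Random(seed)
    mapping = []
    for lo, hi in class_bounds:
        counts = [0] * num_output_classes
        for _ in range(samples_per_class):
            record = rng.uniform(lo, hi)   # random data within the class
            result = process(record)       # process one record at a time
            if result is not None:
                counts[classify_output(result)] += 1
        # Divide each count by the frequency (sample count) of the class.
        mapping.append([c / samples_per_class for c in counts])
    return mapping
```

For instance, a process that keeps only values of 5 or more yields a coefficient of 0 for a class covering 0 to 4 and a coefficient of 1 for a class covering 6 to 10.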
(Appendix 15)
The system according to any one of appendices 11 to 13, wherein the mapping creation unit includes:
A data generation unit that generates, for each class of the frequency distribution table of the input field, as many random data items as the frequency of that class; and
A mapping creation unit that creates the mapping indicating the execution time of each class by measuring the execution time of the predetermined processing for each class and averaging the measured execution times.
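The execution time mapping of appendix 15 can be sketched in the same spirit. The fragment below is only an illustration: records are assumed to be single numeric values drawn uniformly within each class, and the function name, the `timer` parameter, and the sampling scheme are hypothetical choices rather than details taken from the source.

```python
import random
import time


def create_execution_time_mapping(process, class_bounds, samples_per_class,
                                  seed=0, timer=time.perf_counter):
    """Return mapping[i]: average measured execution time per record for
    input class i. The timer is injectable so the sketch can be tested."""
    rng = random.Random(seed)
    mapping = []
    for lo, hi in class_bounds:
        elapsed = []
        for _ in range(samples_per_class):
            record = rng.uniform(lo, hi)
            start = timer()
            process(record)
            elapsed.append(timer() - start)
        # Average the measured execution times of the class.
        mapping.append(sum(elapsed) / len(elapsed))
    return mapping
```

In practice the measured times depend on the machine and its load, so averaging over many samples per class is what makes the per-class coefficient usable.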

2 Batch job
6 Schema of input/output data
11 CPU
12 Main storage device
13 Auxiliary storage device
14 Input device
15 Display device
16 Output device
17 Communication I/F
18 Drive
19 Storage medium
40 Frequency distribution generation unit
41 Variable field determination unit
42 Frequency distribution table generation unit
43 Frequency distribution mapping creation unit
44 Execution time mapping creation unit
50 Execution time estimation unit
51 Input data frequency distribution creation unit
52 Execution time estimation unit
53 Output data frequency distribution estimation unit
130 Storage unit
1000 System
f Frequency distribution mapping
g Execution time mapping

Claims

An estimation apparatus comprising:
a storage unit that stores a frequency distribution table, in which values of a field of input data that affects output data are classified into classes and an input frequency of the input data is set for each class, and a frequency distribution mapping obtained by setting the input frequency of input data sampled in each class of the frequency distribution table and acquiring an output frequency of output data produced by performing predetermined processing on the sampled input data;
a setting unit that sets, for each class of the frequency distribution table stored in the storage unit, the input frequency of target input data of the predetermined processing; and
an output frequency estimation unit that acquires, for each class, the output frequency of the output data after the predetermined processing for the target input data by a linear expression using the values of the frequency distribution mapping as coefficients, and estimates the output frequency for the target input data by totaling the acquired output frequencies for each class.
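By way of a non-limiting illustration, the linear expression of this claim can be read as a weighted sum in which the frequency distribution mapping supplies the coefficients. The sketch below assumes the mapping is held as a matrix `freq_mapping[i][j]` (output records in output class j per input record in input class i); a purely per-class coefficient corresponds to a diagonal matrix. The function name and data layout are hypothetical.

```python
def estimate_output_frequency(input_freq, freq_mapping):
    """input_freq[i]: number of target input records in input class i.
    freq_mapping[i][j]: coefficient from input class i to output class j.

    Returns the per-class output frequencies and their total.
    """
    num_out = len(freq_mapping[0])
    per_class = [
        # Linear expression with the mapping values as coefficients.
        sum(freq_mapping[i][j] * input_freq[i] for i in range(len(input_freq)))
        for j in range(num_out)
    ]
    return per_class, sum(per_class)
```

For example, with 100 and 50 input records in two classes and coefficients 0.5 and 1.0 on the diagonal, each output class receives an estimated 50 records, for a total of 100.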

The estimation apparatus according to claim 1, wherein
the storage unit stores an execution time mapping indicating the execution time of the predetermined processing per input data item for each class of the frequency distribution table of the input data, and
the estimation apparatus further comprises an execution time estimation unit that acquires, for each class, the execution time of the predetermined processing for the target input data by a linear expression using the values of the execution time mapping as coefficients, and estimates the execution time for the target input data by summing the acquired execution times for each class.
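As a hedged illustration of this claim, the execution time estimate can be sketched as a per-class product of the execution time mapping and the input frequency, followed by a sum. The names `estimate_execution_time`, `time_mapping`, and `input_freq` are hypothetical, and the time unit is whatever unit the mapping was measured in.

```python
def estimate_execution_time(input_freq, time_mapping):
    """input_freq[i]: number of target input records in class i.
    time_mapping[i]: execution time per input record for class i.

    Returns the per-class execution times and their sum.
    """
    per_class = [g * n for g, n in zip(time_mapping, input_freq)]
    return per_class, sum(per_class)
```

With 100 records at 0.5 time units each and 50 records at 2.0 time units each, the per-class times are 50 and 100, giving an estimate of 150 time units.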

The estimation apparatus according to claim 1, wherein
the storage unit stores an execution time mapping indicating the execution time of the predetermined processing per input data item for each class of the frequency distribution table of the input data, and
the estimation apparatus further comprises an execution time estimation unit that uses the output frequency for each class acquired by the output frequency estimation unit as the target input data for each class, acquires, for each class, the execution time of the predetermined processing for the target input data by a linear expression using the values of the execution time mapping stored in the storage unit as coefficients, and estimates the execution time for the target input data by summing the acquired execution times for each class.
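The chaining in this claim, in which the estimated per-class output frequencies are fed into the execution time expression, can be sketched as follows. The claim text does not spell out which process the execution time mapping was measured on, so the sketch takes the mapping as a plain parameter; it only assumes that the output classes line up with the classes of the execution time mapping. All names are hypothetical.

```python
def chain_estimate(input_freq, freq_mapping, time_mapping):
    """Estimate per-class output frequencies, then use them as the per-class
    input frequencies of the execution time expression.

    freq_mapping[i][j]: coefficient from input class i to output class j.
    time_mapping[j]: execution time per record for class j.
    """
    num_out = len(freq_mapping[0])
    out_freq = [
        sum(freq_mapping[i][j] * input_freq[i] for i in range(len(input_freq)))
        for j in range(num_out)
    ]
    # The output frequencies serve as the target input data for each class.
    total_time = sum(g * n for g, n in zip(time_mapping, out_freq))
    return out_freq, total_time
```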

A program that causes a computer to execute a process comprising:
taking as input a frequency distribution table, stored in a storage unit, in which values of a field of input data that affects output data are classified into classes and an input frequency of the input data is set for each class, setting the input frequency of input data sampled in each class of the frequency distribution table, calculating a frequency distribution mapping by acquiring an output frequency of output data produced by performing predetermined processing on the sampled input data, and storing the frequency distribution mapping in the storage unit;
setting, for each class of the frequency distribution table stored in the storage unit, the input frequency of target input data of the predetermined processing;
acquiring, for each class, the output frequency of the output data after the predetermined processing for the target input data by a linear expression using the values of the frequency distribution mapping as coefficients; and
estimating the output frequency for the target input data by totaling the acquired output frequencies for each class.

An estimation method executed by a computer, the method comprising:
taking as input a frequency distribution table, stored in a storage unit, in which values of a field of input data that affects output data are classified into classes and an input frequency of the input data is set for each class, setting the input frequency of input data sampled in each class of the frequency distribution table, calculating a frequency distribution mapping by acquiring an output frequency of output data produced by performing predetermined processing on the sampled input data, and storing the frequency distribution mapping in the storage unit;
setting, for each class of the frequency distribution table stored in the storage unit, the input frequency of target input data of the predetermined processing;
acquiring, for each class, the output frequency of the output data after the predetermined processing for the target input data by a linear expression using the values of the frequency distribution mapping as coefficients; and
estimating the output frequency for the target input data by totaling the acquired output frequencies for each class.