JP2014215764A

JP2014215764A - Set value generation device, distribution processor, set value generation method and set value generation program

Info

Publication number: JP2014215764A
Application number: JP2013091645A
Authority: JP
Inventors: Yuji Yamada (山田 佑二); Akihiro Yamanaka (山中 章裕)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-04-24
Filing date: 2013-04-24
Publication date: 2014-11-17

Abstract

PROBLEM TO BE SOLVED: To apply an appropriate setting value to each job for a request process submitted to a distributed processing apparatus. SOLUTION: A distributed processing apparatus 10 includes a distributed processing base unit 130 that executes jobs using a plurality of task execution units 132, and an interface providing unit 110 that converts an input query into job instructions that the distributed processing base unit 130 can process. In this apparatus, a setting value generation unit 120 that generates a setting value for each job includes: a runtime information acquisition unit 121 that acquires runtime information of a job when the job starts; and a setting value generation processing unit 123 that generates the setting value used for executing the job by referring to the acquired runtime information and a setting value generation model 122, and outputs it to the distributed processing base unit 130. The distributed processing base unit 130 executes the job using the output setting value.

Description

The present invention relates to a setting value generation device, a distributed processing device, a setting value generation method, and a setting value generation program.

Distributed processing systems with scale-out characteristics are attracting attention as a way to analyze large volumes of diverse data in a short time. In this technique, a plurality of computers (nodes) cooperate and each takes a share of the data processing, which makes large-scale data processing possible as a whole. A distributed processing platform (MapReduce or the like) handles management such as deciding which computer is given which processing, so the user only has to describe the desired processing as instructions to the distributed processing platform. Because writing these instructions is complicated, interface tools (Hive, Pig, and the like) have appeared that let the user specify in a simple way the processing requested of the distributed processing platform (the request process).

When such an interface tool receives a request process written in a highly abstract format that is easy for the user to understand, it converts the request process into instructions in units of execution that the distributed processing platform can process.

Non-Patent Document 1: J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI'04, 2004, pp. 137-149.
Non-Patent Document 2: A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using Hadoop. In ICDE'10, IEEE, 2010, pp. 996-1005.
Non-Patent Document 3: Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD'08, ACM, 2008.

In general, the high-level processing that the user specifies (the request process) is converted into processing made up of a plurality of execution units (jobs) and executed on the distributed processing platform. However, the setting values specified by the user cannot be changed for each job that makes up the request process, and whether those setting values are appropriate is unknown until the job is executed. As a result, the setting values of the individual jobs are insufficiently optimized for the request process, and the distributed processing system cannot deliver its full performance.

For example, among the plurality of jobs generated from a request process (a query or the like) in a distributed processing system, a later job may take over the output data of an earlier job. With the prior art, it was not possible to set an appropriate value for each job, for example by changing the setting value of the later job according to the amount of data output by the earlier job.

Accordingly, an object of the present invention is to solve the above problem and to let a distributed processing system deliver its full performance by giving each job an appropriate setting value for a request process submitted to the distributed processing system.

To solve the above problem, the present invention is a setting value generation device that generates a setting value for each job executed in a distributed processing device, the distributed processing device comprising: a distributed processing base unit that executes one or more jobs using one or more nodes; and an interface providing unit that, upon receiving input of a request process for the distributed processing base unit, converts the content of the request process into job instructions that the distributed processing base unit can process and executes the one or more jobs using the distributed processing base unit based on the converted instructions. The setting value generation device comprises: a runtime information acquisition unit that acquires runtime information of a job from the distributed processing base unit when execution of the job starts; a storage unit that stores a setting value generation model for obtaining the setting value of the job based on the runtime information of the job; and a setting value generation processing unit that uses the runtime information of the job and the setting value generation model to generate the setting value used for executing the job and outputs it to the distributed processing base unit.

According to the present invention, each job is given an appropriate setting value for a request process submitted to the distributed processing system, so the distributed processing system can deliver its full performance.

FIG. 1 is a diagram showing the configuration of the distributed processing apparatus.
FIG. 2 is a diagram for explaining the function f used in the setting value generation model.
FIG. 3 is a diagram for explaining the optimum value of the division size M for the input data amount D.
FIG. 4 is a flowchart showing the processing procedure of the distributed processing apparatus.
FIG. 5 is a graph showing the division size M for the input data amount D and the measured execution times.
FIG. 6 is a diagram showing a computer that executes the distributed processing program.

[Configuration]
Hereinafter, an embodiment of the distributed processing apparatus (distributed processing system) of the present invention is described with reference to the drawings. As shown in FIG. 1, when the distributed processing apparatus 10 receives input of a request process (a query or the like) from an external apparatus or the like, it interprets the content of the request process and executes data processing. The data processing is executed by one or more task execution units 132 of the distributed processing apparatus 10. Each task execution unit (node) 132 is realized by a computer (not shown) of the distributed processing apparatus 10.

The distributed processing apparatus 10 includes an interface providing unit 110, a setting value generation unit 120, and a distributed processing base unit 130.

When the interface providing unit 110 receives input of a request process (a query or the like) for the distributed processing apparatus 10, it interprets the content of the request process and converts it into instructions for one or more execution units (jobs) that the distributed processing base unit 130 can process. The interface providing unit 110 then executes each job using the distributed processing base unit 130 based on the converted instructions. That is, the interface providing unit 110 instructs the distributed processing base unit 130 to execute each job and receives the execution result of the job from the distributed processing base unit 130. The interface providing unit 110 is realized by Hive described in Non-Patent Document 2, Pig described in Non-Patent Document 3, or the like.

The interface providing unit 110 includes a conversion unit 111 and an execution unit 112. When the conversion unit 111 receives input of a request process, it converts the content of the request process into instructions for one or more jobs that the distributed processing base unit 130 can process. The execution unit 112 executes the jobs using the distributed processing base unit 130 based on the instructions converted by the conversion unit 111. If the converted instructions contain dependencies between jobs (for example, the output of job 1 becomes the input of job 2), the execution unit 112 follows them when executing the jobs using the distributed processing base unit 130, and outputs the execution result to an external apparatus or the like. When the distributed processing base unit 130 executes a job, the execution unit 112 uses the setting value for that job output from the setting value generation unit 120.

The setting value generation unit 120 generates the setting value used in a job from the runtime information of the job (for example, the input data amount D of the job) and outputs the generated setting value to the distributed processing base unit 130. The setting value used in a job is, for example, the maximum data amount M per task making up the job, chosen under the requirement that resources be used as fully as possible to minimize the execution time of the job. The setting value generation unit 120 includes a runtime information acquisition unit 121, a setting value generation model 122, and a setting value generation processing unit 123. A model update unit 124 may or may not be provided; the case where it is provided is described later.

The runtime information acquisition unit 121 acquires the runtime information of a job from the distributed processing base unit 130. The runtime information is information obtained when execution of each job starts, and it depends on the execution results of the jobs executed before that job. The runtime information is, for example, the input data amount D that is input to the job after the preceding job has finished, and the number N of task execution units 132.

The setting value generation model 122 is a model for obtaining the optimum setting value of a job based on the runtime information of the job, and is given by, for example, the function f shown in FIG. 2. In this function f, (x_1, ..., x_n) are the arguments given to the model and are the values of the runtime information of the job (for example, the input data amount D of the job). (w_1, ..., w_l) are constants that appear in the model, for example the number of tasks w processed by each task execution unit 132 in the job. (c_1, ..., c_m), obtained from the function f, are the outputs of the model and are the optimum setting values of the job. The setting value generation model 122 is stored in a predetermined area of a storage unit (not shown) of the distributed processing apparatus 10 and is referred to by the setting value generation processing unit 123. The function f in the setting value generation model 122, the kinds of parameters used in the function f, the constants, and so on are determined in advance by experiment or the like.

Here, the optimum setting value of a job is, for example, the value of the maximum data amount per task (that is, the division size of the input data amount D) M that makes the execution time of the job as short as possible. The optimum value of the division size M for the input data amount D is obtained by, for example, the setting value generation model 122 shown in Equation (1) below.

In Equation (1), w is a constant given in advance.
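The formula for Equation (1) itself does not appear in this text. A form consistent with the surrounding description is given here as a reconstruction under that assumption, not as the original formula. D is the input data amount of the job, N is the number of task execution units 132, and w is the constant number of tasks processed by each task execution unit:

```latex
M = \frac{D}{w \times N} \qquad (1)
```

With w = 3 and N = 2 this gives M = D/6, which matches the six-way division in the FIG. 3 example. With w = 1 and w = 2 it gives M = D/N and M = D/2N, the two division sizes reported as fastest for FIG. 5.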

By using the setting value generation model 122 shown in Equation (1), the setting value generation processing unit 123 can obtain a division size M for the input data amount D that makes the execution time of the job as short as possible. The reason is explained with reference to FIG. 3.

For example, suppose that two task execution units 132 (task execution unit 132a and task execution unit 132b) execute the job and that w = 3. As shown in (a), when the input data amount D is divided into five, that is, when the maximum data amount M per task is D/5, a waiting time arises at the task execution unit 132b. The job execution time of the task execution units 132 as a whole is then the time until the task execution unit 132a finishes the fifth task.

On the other hand, as shown in (b), when the input data amount D is divided into six (that is, the number of tasks processed w (3) × the number N of task execution units 132 (2)), no waiting time arises at the task execution unit 132b, and the task execution units 132a and 132b can finish the job at almost the same time. This reduces the job execution time of the task execution units 132 as a whole.

In this way, by using Equation (1) above as the setting value generation model 122, the setting value generation processing unit 123 can obtain a division size M for the input data amount D that reduces the job execution time of the task execution units 132 as a whole.
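The FIG. 3 argument can be checked with a small simulation. The following is a minimal sketch and not part of the patent: the names `equal_splits` and `makespan` are hypothetical, each task's run time is assumed to be proportional to its data size, and tasks are assumed to be handed to whichever task execution unit becomes free first.

```python
import heapq

def equal_splits(total, count):
    """Divide `total` units of input data into `count` equal tasks."""
    return [total / count] * count

def makespan(task_sizes, num_workers):
    """Time until the last worker finishes, with each task given to the
    worker that becomes free first (cost assumed proportional to size)."""
    finish_times = [0.0] * num_workers
    heapq.heapify(finish_times)
    for size in task_sizes:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)

# Illustrative values: N and w follow the FIG. 3 example, D is arbitrary.
D, N, w = 60.0, 2, 3

five_way = makespan(equal_splits(D, 5), N)      # case (a): one unit runs 3 tasks, the other 2
six_way = makespan(equal_splits(D, w * N), N)   # case (b): both units run 3 tasks

print(five_way, six_way)
```

With these values the five-way division takes 36 time units and the six-way division takes 30, because in case (a) one unit sits idle while the other runs the fifth task.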

The setting value generation processing unit 123 refers to the runtime information of a job and the setting value generation model 122 to generate the setting value used in the job. For example, when the setting value generation processing unit 123 acquires the input data amount D of job 2 as the runtime information of job 2, it determines the division size M for the input data amount D in job 2 using Equation (1), which is the setting value generation model 122. It then outputs the determined division size M to the distributed processing base unit 130 as the setting value. The distributed processing base unit 130, having received this setting value, executes job 2 using it.

Based on the job execution instruction output from the interface providing unit 110 and the setting value output from the setting value generation unit 120, the distributed processing base unit 130 distributes the execution of the job among the task execution units 132 under its management. It then returns data that aggregates the execution results of the task execution units 132 to the interface providing unit 110. The distributed processing base unit 130 is realized by, for example, MapReduce described in Non-Patent Document 1. The distributed processing base unit 130 includes a job execution management unit 131 and one or more task execution units 132.

The job execution management unit 131 manages how the task execution units 132 execute a job as tasks, based on the job execution instruction and the setting value of the job. For example, the job execution management unit 131 places tasks so that, as far as possible, each task execution unit 132 processes data held on its own local disk. The job execution management unit 131 then returns data that aggregates the execution results of the task execution units 132 to the interface providing unit 110.

When a task execution unit 132 receives a task and a setting value from the job execution management unit 131, it reads the corresponding input data from its local disk or by network transfer and executes the task according to the setting value.

[Processing procedure]
Next, the processing procedure of the distributed processing apparatus 10 is described with reference to FIG. 4. First, when the interface providing unit 110 receives input of a request process for the distributed processing apparatus 10 (S1), the conversion unit 111 interprets the content of the request process and converts it into job instructions that the distributed processing base unit 130 can process (S2). For example, the request process is converted into instructions for jobs 1, 2, and 3, as shown in FIG. 1.

Then, the runtime information acquisition unit 121 acquires the runtime information of a job from the distributed processing base unit 130 (S3). For example, it acquires the input data amount D of job 1 and the number N of task execution units 132 that execute job 1. The setting value generation processing unit 123 then refers to this runtime information and the setting value generation model 122 to generate the setting value of the job (for example, job 1) (S4), and outputs the generated setting value to the distributed processing base unit 130.

The execution unit 112 executes the job by means of the distributed processing base unit 130 based on the instructions converted by the conversion unit 111 (S5). For example, the execution unit 112 has the distributed processing base unit 130 execute job 1 based on the setting value of job 1.

After S5, if there is an unexecuted job (Yes in S6), the process returns to S3. For example, if the instructions converted by the conversion unit 111 specify that job 2 is executed following job 1, the process returns to S3. Then, after job 1 has been executed and when execution of job 2 starts, the runtime information acquisition unit 121 acquires the input data amount D of job 2 and the number N of task execution units 132 that execute job 2 from the distributed processing base unit 130 (S3). The setting value generation processing unit 123 refers to the input data amount D of job 2, the number N of task execution units 132 that execute job 2, and the setting value generation model 122 to generate the setting value of job 2 (S4). After that, the execution unit 112 executes job 2 by means of the distributed processing base unit 130 using the generated setting value of job 2 (S5).

After S5, if there is no unexecuted job (No in S6), the execution unit 112 outputs the job execution result (S7) and ends the process.
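The loop over S3 to S6 can be sketched as follows. This is an illustrative sketch only: `RuntimeInfo`, `split_size_model`, and `run_request` are hypothetical names, real jobs are replaced by an assumed output-to-input size ratio, and the placeholder model divides D by w × N, which matches the FIG. 3 example but is an assumption, since the formula of Equation (1) is not reproduced in this text.

```python
from dataclasses import dataclass

@dataclass
class RuntimeInfo:
    input_size: float   # D: input data amount of the job
    num_workers: int    # N: number of task execution units

def split_size_model(info, w=1):
    """Placeholder setting value generation model (assumed form D / (w * N))."""
    return info.input_size / (w * info.num_workers)

def run_request(output_ratios, initial_size, num_workers, model):
    """Run the jobs of one request in order.

    `output_ratios` is a hypothetical stand-in for real jobs: each entry is
    the ratio of a job's output size to its input size.
    """
    settings = []
    size = initial_size
    for ratio in output_ratios:                  # S6: repeat while jobs remain
        info = RuntimeInfo(size, num_workers)    # S3: acquire runtime information
        setting = model(info)                    # S4: generate the setting value
        settings.append(setting)
        size = size * ratio                      # S5: "execute" the job; its output feeds the next job
    return size, settings                        # S7: output the result

final_size, settings = run_request([0.5, 0.25], 80.0, 4, split_size_model)
print(final_size, settings)
```

The point of the sketch is that the setting value for the second job is computed from the size actually produced by the first job, which is not known when the request is submitted.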

In this way, the distributed processing apparatus 10 can give each job an optimum setting value for a request process, and can therefore deliver its full performance. In addition, the user of the distributed processing apparatus 10 no longer needs to adjust the setting values job by job.

[Experimental results]
FIG. 5 shows the relationship between the division size M for the input data amount D of a job in the distributed processing apparatus 10 and the measured execution time of the job. Here, the number N of task execution units 132 used to execute the job was 10 and the input data amount D was 18 GB. As FIG. 5 shows, the job execution time is shortest when the division size M is D/N and when it is D/2N. This also indicates that Equation (1) above is suitable for calculating the optimum value of the division size M for the input data amount D.

The actual input data of the distributed processing apparatus 10 is held in a distributed manner across a plurality of computers (task execution units 132). This affects the choice of the amount of data processed by one task. If one task is instructed to process more data than a single computer holds, data must be received from other computers, which adds a transfer cost. To avoid that cost, data spanning computers is rarely treated as a single piece of data. In actual prior-art implementations, the user cannot specify the amount of data processed by one task and can specify only its maximum (this is called the division size M). The job execution management unit 131 divides data that exceeds the division size (which also increases the number of tasks). Remainder data smaller than the division size M forms a piece of divided data by itself, to avoid transfer cost, and is executed as a task. That is, the division size M and the amount of data actually processed by a task do not necessarily match. For example, when the division size M is 6 and a computer holds a data amount of 10, there will be a task that processes divided data of 6 and a task that processes divided data of 4. Because strictly equal allocation is not possible in this way, the setting value that minimizes the execution time has to be found experimentally. The experiment showed that, in the environment of the experiment, adopting the setting value obtained when the input data amount D is divided by the division size M given by Equation (1) works well, and so Equation (1), which calculates that value, is adopted as the model.
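The division rule described above (full pieces of size M per computer, with the remainder forming its own smaller piece) can be sketched as follows. The function names and the per-computer dictionary are hypothetical; the sketch only illustrates the rule as stated in the text and does not reproduce any particular platform's implementation.

```python
def split_local_data(local_size, max_split):
    """Divide the data held by one computer into pieces of at most `max_split`.

    The remainder is kept as its own smaller piece rather than being merged
    with data from another computer, which would require a network transfer.
    """
    full, remainder = divmod(local_size, max_split)
    sizes = [max_split] * int(full)
    if remainder > 0:
        sizes.append(remainder)
    return sizes

def plan_tasks(data_per_computer, max_split):
    """Apply the division rule independently to each computer's local data."""
    return {name: split_local_data(size, max_split)
            for name, size in data_per_computer.items()}

print(split_local_data(10, 6))                      # the example from the text: [6, 4]
print(plan_tasks({"node_a": 10, "node_b": 7}, 6))   # hypothetical two-computer layout
```

Because each computer produces its own remainder piece, the actual task sizes can differ from M, which is why the text falls back on experiment to confirm the setting value.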

[Other embodiments]
(Example 1 of the setting value generation model)
The setting value generation processing unit 123 may generate, as the setting value of a job, the amount of memory allocated to the job. For example, it may generate the optimum amount of memory used when processing (for example, sorting) the intermediate output of the job, the optimum amount of memory used when transferring the intermediate output, and so on. The optimum amount of memory here is, for example, the amount of memory that makes the execution time of the job as short as possible.

In this case, a function f for obtaining the optimum amount of memory used when processing the intermediate output of the job and the optimum amount of memory used when transferring the intermediate output is used as the setting value generation model 122. The parameters of this function f are, for example, the input data amount D of the job, the network transfer rate v_t per unit time of the network connecting the task execution units 132 (that is, the computers that perform data processing as task execution units 132), and the maximum amount of memory available to each task execution unit 132. The input data amount D of the job is the value acquired by the runtime information acquisition unit 121. The network transfer rate v_t and the maximum amount of memory available to each task execution unit 132 are assumed to be acquired in advance from the distributed processing base unit 130 or the like.

In this way, the distributed processing apparatus 10 gives, as setting values of a job, the optimum amount of memory used when processing the intermediate output of the job, the optimum amount of memory used when transferring the intermediate output, and so on. This avoids the situation where the memory for processing the intermediate output is too large, leaving too little memory for transferring the processed intermediate output and lengthening the job execution time. It also avoids the situation where the memory for processing the intermediate output is too small, so that the processing itself takes a long time and lengthens the job execution time.

(Example 2 of the setting value generation model)
Further, the setting value generation processing unit 123 may generate, as a setting value of the job, a value indicating whether the job's intermediate output data is to be compressed before it is transferred.

In this case, the setting value generation model 122 used by the setting value generation processing unit 123 is, for example, a model that calculates the transfer time when the job's intermediate output data is compressed (including decompression time) and the transfer time when it is not compressed, and adopts whichever option gives the shorter transfer time. That is, the model decides to compress the intermediate output data when the transfer time with compression is shorter, and decides not to compress it when the transfer time without compression is shorter.

In this case, the setting value generation processing unit 123 uses, as runtime information, the input data amount D of the job and the user function f used in the job. For the transfer of the intermediate output data obtained by applying the user function f to the input data amount D, the setting value generation model 122 compares two times: the uncompressed transfer time, which is the time to transfer the intermediate output data without compression, and the compressed transfer time, which is the sum of the time required to compress and decompress the intermediate output data and the time to transfer the compressed intermediate output data. The model decides not to compress when the uncompressed transfer time is shorter than the compressed transfer time, and to compress when it is longer. The setting value of the job generated by the setting value generation processing unit 123 is the value, determined by this model, indicating whether the intermediate output data is compressed in the job.

The transfer time without compression is estimated, for example, by the following equation (2).

In equation (2), v_t is the network transfer rate, and g(D, f) is the amount of output data (that is, the amount of intermediate output data) produced when the input data amount D is processed by the user function f.

The transfer time with compression is estimated, for example, by the following equation (3).

As described above, the input data amount D of the job and the user function f in equations (2) and (3) are acquired by the runtime information acquisition unit 121. The network transfer rate v_t, the data compression speed v_c, and the data decompression speed v_d are values acquired in advance from the distributed processing platform unit 130 or the like.
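The comparison between the two estimates can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's own equations (2) and (3): it assumes the uncompressed time is g(D, f) / v_t and the compressed time is g / v_c + r * g / v_t + g / v_d. The compression ratio r (compressed size divided by original size) is an assumed parameter that the text does not define, as is the convention that v_c and v_d are measured against the uncompressed data volume. All function names are hypothetical, and the case of equal times is resolved as "do not compress" by assumption.

```python
def uncompressed_transfer_time(g_bytes: float, v_t: float) -> float:
    """Time to send the intermediate output as-is: g(D, f) / v_t."""
    return g_bytes / v_t


def compressed_transfer_time(g_bytes: float, v_t: float, v_c: float,
                             v_d: float, ratio: float) -> float:
    """Compression time + transfer of the compressed data + decompression time.

    ratio is an assumed parameter (compressed size / original size).
    v_c and v_d are assumed to be given in uncompressed units per second.
    """
    compress = g_bytes / v_c
    transfer = (g_bytes * ratio) / v_t
    decompress = g_bytes / v_d
    return compress + transfer + decompress


def should_compress(g_bytes: float, v_t: float, v_c: float,
                    v_d: float, ratio: float) -> bool:
    """Return True when the compressed path is estimated to be strictly faster.

    A tie is treated as "do not compress" (an assumption; the text only
    covers the shorter and longer cases).
    """
    with_compression = compressed_transfer_time(g_bytes, v_t, v_c, v_d, ratio)
    without_compression = uncompressed_transfer_time(g_bytes, v_t)
    return with_compression < without_compression
```

For example, with 1000 MB of intermediate output, v_t = 100 MB/s, v_c = 500 MB/s, v_d = 1000 MB/s and r = 0.5, the estimates are 10 s uncompressed and 8 s compressed, so the sketch chooses compression. Raising v_t to 1000 MB/s gives 1 s against 3.5 s, so it chooses not to compress.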

In this way, the distributed processing apparatus 10 sets, as a setting value of a job, whether to compress the intermediate output in that job, so the job's execution time can be made as short as possible.

(Model update unit)
The setting value generation unit 120 may acquire statistical information on jobs executed with the generated setting values and correct (update) the setting value generation model 122 based on that statistical information. In that case, the setting value generation unit 120 further includes the model update unit 124 indicated by the broken line in FIG. 1, and the runtime information acquisition unit 121 acquires the job's statistical information as runtime information of the job. The statistical information acquired here is, for example, the execution time of the job.

The model update unit 124 acquires the statistical information of each job via the runtime information acquisition unit 121 or the like, and acquires the setting values set for the job from the setting value generation processing unit 123 or the like. The model update unit 124 then corrects the function f of the setting value generation model 122 using the acquired statistical information and setting values. Thereafter, the setting value generation processing unit 123 generates the setting values of a job using the corrected function f and outputs them to the distributed processing platform unit 130. The model update unit 124 then corrects the function f further, using the setting values based on the corrected model and the statistical information of the job executed with those setting values. By repeating this processing, the model update unit 124 can turn the setting value generation model 122 into a model that calculates appropriate setting values.
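The generate, execute, update cycle described above can be sketched as follows. The text does not say how the function f is corrected, so this sketch swaps in a deliberately simple stand-in: the model keeps the average observed execution time for each candidate setting value, tries untested candidates first, and otherwise returns the candidate with the lowest average. The class SettingValueModel, the candidate list, and simulated_execution_time are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


class SettingValueModel:
    """Hypothetical stand-in for the setting value generation model 122."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.total_time = defaultdict(float)  # summed execution time per setting
        self.runs = defaultdict(int)          # number of observed jobs per setting

    def generate(self):
        """Return a setting value: untested candidates first, then the best average."""
        for candidate in self.candidates:
            if self.runs[candidate] == 0:
                return candidate
        return min(self.candidates,
                   key=lambda c: self.total_time[c] / self.runs[c])

    def update(self, setting, execution_time):
        """Role of the model update unit: fold one job's statistics into the model."""
        self.total_time[setting] += execution_time
        self.runs[setting] += 1


def simulated_execution_time(setting):
    """Made-up job behaviour for the example: fastest when the setting is 128."""
    return abs(setting - 128) + 10.0


model = SettingValueModel([64, 128, 256])
for _ in range(6):
    setting = model.generate()                    # setting value generation
    elapsed = simulated_execution_time(setting)   # job runs on the platform
    model.update(setting, elapsed)                # statistics fed back
```

After the three candidates have each been tried once, the sketch keeps returning 128, the setting with the shortest observed execution time. A real model would more likely adjust parameters of f than choose from a fixed list; the fixed list only keeps the example short.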

(Setting value generation device)
In the embodiment described above, the setting value generation unit 120 is a component inside the distributed processing apparatus 10, but the configuration is not limited to this. For example, the setting value generation unit 120 may be implemented as a device separate from the distributed processing apparatus 10, acquiring the runtime information of each job from the distributed processing apparatus 10 via a network and outputting the setting values of each job to the distributed processing apparatus 10.

(Examples of job setting values)
The job setting value in the embodiment described above was the maximum data amount M per task constituting the job, under the requirement of minimizing the job's execution time by making maximum use of resources; however, the following setting values may be used instead. For example, when a plurality of request processes (queries or the like) are input to the distributed processing apparatus 10, the task execution units 132 may execute a plurality of jobs at the same time. In such a case, for each request process to complete within a time set by the user, and for the time from the start of execution to the completion of all request processes to be as short as possible, it is preferable to minimize the resources, such as CPU and memory, consumed by each task execution unit 132. Possible job setting values in this case therefore include the number of task execution units 132 that execute the tasks of the job, the amount of memory allocated to each task, and the number of tasks executed by each task execution unit 132. Further, when the requirement is to minimize the total power consumption of the task execution units 132, a possible job setting value is, for example, the number of task execution units 132, which are the computing resources.
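As an illustration of a resource-minimizing setting value, the sketch below picks the smallest number of task execution units that is still estimated to finish a job within a user-set time. It assumes, purely for the example, that execution time scales as input amount / (units x per-unit throughput), which ignores scheduling and transfer overhead. The function name and all parameters are hypothetical.

```python
import math
from typing import Optional


def min_task_units(input_amount: float, throughput_per_unit: float,
                   deadline_sec: float, max_units: int) -> Optional[int]:
    """Smallest unit count whose estimated run time fits within the deadline.

    Assumes perfectly linear scaling: time = input_amount / (n * throughput).
    Returns None when even max_units cannot meet the deadline.
    """
    needed = math.ceil(input_amount / (throughput_per_unit * deadline_sec))
    needed = max(needed, 1)  # at least one unit is always required
    return needed if needed <= max_units else None
```

With an input of 1000 MB, 10 MB/s per unit, and a 30 s limit, three units would take about 33 s and four units 25 s, so the sketch returns 4. The units left over remain available to the other jobs running at the same time.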

(Program)
The processing executed by the distributed processing apparatus 10 described above can also be realized by a program (distributed processing program) written in a computer-executable language. In this case, the same effects as in the embodiment described above are obtained when a computer executes the program. Further, the program may be recorded on a computer-readable recording medium, and a computer may read and execute the program recorded on the medium to realize the same processing as in the embodiment. An example of a computer that executes a program realizing the same functions as the distributed processing apparatus 10 is described below.

FIG. 6 is a diagram illustrating a computer 1000 that executes the program. As illustrated in FIG. 6, the computer 1000 includes, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, and a network interface 1070, which are connected to one another by a bus 1080.

As illustrated in FIG. 6, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041.

As illustrated in FIG. 6, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program described above is stored, for example, in the hard disk drive 1031 as a program module in which the instructions to be executed by the computer 1000 are written.

The various data described in the embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1031. The CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1031 into the RAM 1012 as necessary, and realizes the functions of the interface providing unit 110, the distributed processing platform unit 130, the runtime information acquisition unit 121, the setting value generation processing unit 123, and the model update unit 124 described above.

The program module 1093 and the program data 1094 of the program need not be stored in the hard disk drive 1031; they may, for example, be stored in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, they may be stored in another computer connected via a network and read by the CPU 1020 via the network interface 1070.

DESCRIPTION OF SYMBOLS
10 Distributed processing apparatus
40 Node
110 Interface providing unit
111 Conversion unit
112 Execution unit
120 Setting value generation unit
121 Runtime information acquisition unit
122 Setting value generation model
123 Setting value generation processing unit
124 Model update unit
130 Distributed processing platform unit
132 Task execution unit

Claims

A setting value generation device for generating a setting value of each job executed in a distributed processing apparatus, the distributed processing apparatus including a distributed processing platform unit that executes one or more jobs using one or more nodes, and an interface providing unit that, on receiving input of a request process for the distributed processing platform unit, converts the content of the request process into job commands that the distributed processing platform unit can process and causes the distributed processing platform unit to execute the one or more jobs based on the converted commands, the setting value generation device comprising:
a runtime information acquisition unit that acquires runtime information of the job from the distributed processing platform unit when execution of the job starts;
a storage unit that stores a setting value generation model for obtaining a setting value of the job based on the runtime information of the job; and
a setting value generation processing unit that generates, by the setting value generation model using the runtime information of the job, a setting value used for execution of the job, and outputs the setting value to the distributed processing platform unit.

The setting value generation device according to claim 1, wherein one of the setting value generation models is a model that optimizes the setting value under the requirement of shortening the execution time of the job.

The setting value generation device according to claim 1, wherein
the runtime information is the amount of input data to the job,
the setting value generation model is a model that obtains, as the setting value, a division size that is the maximum amount of data processed by one task, with respect to the input data amount, across all tasks that execute the job, and
the generated setting value of the job is the division size obtained by the setting value generation model.

The setting value generation device, wherein
the runtime information is the amount of input data to the job and a user function used in the job,
the setting value generation model is a model that, for the transfer of intermediate output data obtained by applying the user function to the input data amount, compares an uncompressed transfer time, which is the time to transfer the intermediate output data without compression, with a compressed transfer time, which is the sum of the time required to compress and decompress the intermediate output data and the time to transfer the compressed intermediate output data, and determines not to compress when the uncompressed transfer time is shorter than the compressed transfer time and to compress when the uncompressed transfer time is longer, and
the generated setting value of the job is a value, determined by the model, indicating whether the intermediate output data is to be compressed in the job.

The setting value generation device according to claim 1, wherein
the runtime information acquisition unit further acquires statistical information of the job executed with the generated setting value, and
the setting value generation device further comprises a model update unit that updates the setting value generation model with reference to the generated setting value and the statistical information of the job.

A distributed processing apparatus comprising:
a distributed processing platform unit that executes one or more jobs using one or more nodes;
an interface providing unit that, on receiving input of a request process, converts the content of the request process into job commands that the distributed processing platform unit can process and causes the distributed processing platform unit to execute the one or more jobs based on the converted commands;
a runtime information acquisition unit that acquires runtime information of the job from the distributed processing platform unit when execution of the job starts;
a storage unit that stores a setting value generation model for obtaining a setting value of the job based on the runtime information of the job; and
a setting value generation processing unit that generates a setting value used for execution of the job with reference to the acquired runtime information of the job and the setting value generation model, and outputs the setting value to the distributed processing platform unit,
wherein the distributed processing platform unit executes the job using the output setting value of the job.

A setting value generation method for generating a setting value of each job executed in a distributed processing apparatus, the distributed processing apparatus including a distributed processing platform unit that executes one or more jobs using one or more nodes, and an interface providing unit that, on receiving input of a request process for the distributed processing platform unit, converts the content of the request process into job commands that the distributed processing platform unit can process and causes the distributed processing platform unit to execute the one or more jobs based on the converted commands, wherein a computer that stores a setting value generation model for obtaining the setting value of the job based on runtime information of the job executes:
a step of acquiring the runtime information of the job from the distributed processing platform unit when execution of the job starts; and
a setting value generation processing step of generating a setting value used for execution of the job with reference to the acquired runtime information of the job and the setting value generation model, and outputting the setting value to the distributed processing platform unit.

A setting value generation program for generating a setting value of each job executed in a distributed processing apparatus, the distributed processing apparatus including a distributed processing platform unit that executes one or more jobs using one or more nodes, and an interface providing unit that, on receiving input of a request process for the distributed processing platform unit, converts the content of the request process into job commands that the distributed processing platform unit can process and causes the distributed processing platform unit to execute the one or more jobs based on the converted commands, the program causing a computer that stores a setting value generation model for obtaining the setting value of the job based on runtime information of the job to execute:
a step of acquiring the runtime information of the job from the distributed processing platform unit when execution of the job starts; and
a setting value generation processing step of generating a setting value used for execution of the job with reference to the acquired runtime information of the job and the setting value generation model, and outputting the setting value to the distributed processing platform unit.