JP2010218307A - Distributed calculation controller and method

Info

Publication number: JP2010218307A
Application number: JP2009065232A
Authority: JP
Inventors: Hiroyasu Nishiyama; 博泰西山
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-03-17
Filing date: 2009-03-17
Publication date: 2010-09-30

Abstract

PROBLEM TO BE SOLVED: Conventional Map/Reduce type distributed calculation systems assume that each calculation node is a physical computer. They take no account of how to distribute calculation processing in an environment that uses system virtualization technology, or of how virtual machines communicate with one another, and calculation processing performance deteriorates as a result. SOLUTION: In this distributed calculation controller and method, a plurality of second calculation processes constituting a first calculation process are distributed among a plurality of virtual machines operating on one or more physical machines. The various arrangement patterns in which the second calculation processes can be distributed among the virtual machines are detected, the cost of each arrangement pattern is calculated, the arrangement pattern with the minimum cost is selected based on the calculation results, and the second calculation processes are distributed among the virtual machines according to that arrangement pattern. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a distributed calculation control apparatus and method, and is suitable for application to a distributed calculation system that performs distributed calculation processing using a system virtualization function.

In recent years, demand has grown for processing large volumes of large-scale data such as Web data, input data from sensors, audio data, and video data. Map/Reduce type distributed calculation systems exist as processing systems for such data (see Non-Patent Document 1).

In a Map/Reduce type distributed calculation system, a virtual distributed file system is provided on the calculation nodes, and a single calculation process is distributed among a plurality of calculation nodes. The target calculation process is defined as two separate calculation processes: a Map process and a Reduce process. The Map process converts a data set expressed as <key, value> pairs into another set of <key', value'> pairs. The Reduce process aggregates the results of the Map process.

Input and output data between the Map process and the Reduce process are exchanged through the distributed file system built on the distributed calculation system. Operations on the distributed file system are performed through communication between the management program of the distributed file system and the data processing program that executes the Map/Reduce processing. In Map/Reduce type distributed calculation processing, the data on the distributed file system is divided into units called data blocks, and the Map process and the Reduce process are applied to these data blocks in sequence.

With such a Map/Reduce type distributed calculation system, the series of calculation processes for each data block can be distributed among a plurality of calculation nodes and executed simultaneously. Processes are basically placed on the calculation nodes by assigning them to nodes in the order in which earlier processes complete.

Meanwhile, techniques for running a plurality of virtual machines on a physical machine have become known in recent years. Such a virtualization technique is called a system virtualization technique, and a virtual machine running on a physical machine is simply called a virtual machine. To the software running on it, a virtual machine looks the same as a physical machine.

Non-Patent Document 1: J. Dean et al., "MapReduce: Simplified Data Processing on Large Clusters", In Proceedings of OSDI'04, 2004

A conventional Map/Reduce type distributed calculation system assumes that each calculation node executing a Map process or a Reduce process is a physical machine. Conventional systems therefore give no consideration to how calculation processing should be distributed in an environment that uses system virtualization technology, or to how virtual machines communicate with one another. As a result, when a conventional Map/Reduce type distributed calculation system runs in an environment that uses system virtualization technology, the calculation processing performance of the system as a whole deteriorates.

The present invention has been made in view of the above, and proposes a distributed calculation control apparatus and method capable of improving calculation processing performance in an environment that uses system virtualization technology.

To solve this problem, the present invention provides a distributed calculation control apparatus that distributes a plurality of second calculation processes constituting a first calculation process among a plurality of virtual machines operating on one or more physical machines. The apparatus comprises: an arrangement pattern detection unit that detects the various arrangement patterns in which the second calculation processes can be distributed among the virtual machines; a cost calculation unit that calculates the cost of each arrangement pattern; and a distributed arrangement unit that, based on the calculation results of the cost calculation unit, selects the arrangement pattern with the minimum cost and distributes the second calculation processes among the virtual machines according to that arrangement pattern.

The present invention also provides a distributed calculation control method for distributing a plurality of second calculation processes constituting a first calculation process among a plurality of virtual machines operating on one or more physical machines. The method comprises: a first step of detecting the various arrangement patterns in which the second calculation processes can be distributed among the virtual machines; a second step of calculating the cost of each arrangement pattern; and a third step of selecting, based on the results of that calculation, the arrangement pattern with the minimum cost and distributing the second calculation processes among the virtual machines according to that arrangement pattern.

According to the present invention, calculation processing performance in an environment that uses system virtualization technology can be improved.

An embodiment of the present invention is described in detail below with reference to the drawings.

(1) First Embodiment
In FIG. 1, reference numeral 100 denotes the distributed calculation system according to this embodiment as a whole. The distributed calculation system 100 comprises a distributed calculation control node 101, first and second distributed calculation nodes 102 and 103, and a network 104. FIG. 1 illustrates a case with two distributed calculation nodes (the first and second distributed calculation nodes 102 and 103), but the present invention also applies when the distributed calculation system 100 contains three or more distributed calculation nodes.

The distributed calculation control node 101 comprises hardware 110 and an external storage device 111. The hardware 110 includes information processing resources such as a CPU (Central Processing Unit) 110A and a shared memory, and a distributed calculation control system 112 runs on the hardware 110.

The distributed calculation control system 112 is a function realized when the CPU 110A of the hardware 110 executes the corresponding program stored in the external storage device 111. It reads the distributed calculation definition 113 stored in the external storage device 111 and, based on that definition, distributes calculation processing to the first and second distributed calculation nodes 102 and 103. The distributed calculation control system 112 includes a job placement processing unit 114, the function that executes the job placement process characteristic of the distributed calculation system 100.

The first and second distributed calculation nodes 102 and 103 comprise hardware 120 and 130 and external storage devices 121 and 131. The hardware 120 and 130 includes information processing resources such as CPUs (Central Processing Units) 120A and 130A and shared memories 120B and 130B, and system virtualization functions 122 and 132 run on the hardware 120 and 130.

The system virtualization functions 122 and 132 are realized when the CPUs 120A and 130A of the hardware 120 and 130 execute the corresponding programs stored in the external storage devices 121 and 131 of the same distributed calculation nodes 102 and 103. Virtual machines 123 and 124 run on the system virtualization function 122 of the first distributed calculation node 102, and distributed processing applications 125 and 126 run on these virtual machines 123 and 124. Likewise, virtual machines 134 and 135 run on the system virtualization function 132 of the second distributed calculation node 103, and distributed processing applications 136 and 136 run on these virtual machines 134 and 135. FIG. 1 illustrates a case where two virtual machines run on each of the system virtualization functions 122 and 132, but the present invention also applies when three or more virtual machines run on each of them.

Virtual machines 123 and 124, or 134 and 135, running on the same distributed calculation node (the first or second distributed calculation node 102, 103) can transfer data to each other at high speed over communication paths 127 and 137, which use the shared memories 120B and 130B within that node. Virtual machines running on different distributed calculation nodes, by contrast, transfer data over a slower communication path 140 that passes through the network 104.
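The choice between the two communication paths can be sketched as a lookup on the node that hosts each virtual machine. This is only an illustration: the names `VM_TO_NODE` and `channel` are hypothetical, and the VM labels follow the "VM#1" to "VM#4" naming used later for the four virtual machines of FIG. 1.

```python
# Hypothetical mapping of virtual machines to the distributed calculation
# node that hosts them (two VMs per node, as in FIG. 1).
VM_TO_NODE = {
    "VM#1": "node102",
    "VM#2": "node102",
    "VM#3": "node103",
    "VM#4": "node103",
}


def channel(src_vm: str, dst_vm: str) -> str:
    """Return the kind of path used between two distinct virtual machines."""
    if VM_TO_NODE[src_vm] == VM_TO_NODE[dst_vm]:
        return "shared_memory"  # fast path inside one node (127 or 137)
    return "network"            # slower path 140 over the network 104


print(channel("VM#1", "VM#2"))
print(channel("VM#2", "VM#3"))
```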

FIG. 2 shows a configuration example of a conventional distributed calculation system 200. The distributed calculation system 200 is configured by connecting a distributed calculation control node 201 and first to fourth distributed calculation nodes 202 to 205 via a network 206.

The distributed calculation control node 201 comprises hardware 210 and an external storage device 211, and a distributed calculation control system 212 runs on the hardware 210. The distributed calculation control system 212 is a function realized when the CPU of the hardware 210 executes the corresponding program stored in the external storage device 211. It reads the distributed calculation definition 213 stored in the external storage device 211 and distributes calculation processing to the first to fourth distributed calculation nodes 202 to 205.

The first to fourth distributed calculation nodes 202 to 205 comprise hardware 220, 230, 240, 250 and external storage devices 221, 231, 241, 251, respectively. The hardware 220, 230, 240, 250 consists of information processing resources such as a CPU, memory, and a communication interface, and distributed calculation applications 222, 232, 242, 252 run on the respective hardware. Data transfer among the distributed calculation applications 222, 232, 242, 252 takes place over a communication path 260 that uses the relatively slow network 206.

FIG. 3 shows the definition of the distributed calculation processing that the present invention targets. The distributed calculation processing consists of a Map process 301 and a Reduce process 302. The Map process 301 takes a plurality of <key, value> pairs as input from the data blocks of the distributed file system stored in the external storage devices 121 and 131 on the distributed calculation nodes 102 and 103, and outputs <key, 1> pairs. The Reduce process counts the number of occurrences of each key and outputs <key, total>. In the following, each individual Map process 301 and Reduce process is referred to, where appropriate, as a unit calculation process.
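The Map and Reduce definitions above amount to a key-counting job. A minimal single-process sketch follows; the function names and the sample data blocks are illustrative assumptions, not taken from FIG. 3.

```python
from collections import defaultdict


def map_block(block):
    """Map: for every <key, value> pair in one data block, emit <key, 1>."""
    return [(key, 1) for key, _value in block]


def reduce_pairs(pairs):
    """Reduce: count the occurrences of each key, giving <key, total>."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Hypothetical data blocks, each a list of <key, value> pairs.
blocks = [
    [("a", "x"), ("b", "y"), ("a", "z")],
    [("b", "w"), ("c", "v")],
]

pairs = [pair for block in blocks for pair in map_block(block)]
counts = reduce_pairs(pairs)
print(counts)
```

In a real deployment each `map_block` call and each reduction would be a separate unit calculation process placed on some virtual machine; here they run in one process purely to show the data flow.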

FIG. 5 shows an example of how this series of calculation processes is distributed in the conventional distributed calculation system 200 described above with reference to FIG. 2. In the example of FIG. 5, a Map process 502 and/or a Reduce process 503 is executed on each item of data 501 by the hardware 220, 230, 240, 250 in the first to fourth distributed calculation nodes 202 to 205, and data 504 is output as the final result.

In this case, because the first to fourth distributed calculation nodes 202 to 205 are independent of one another, data transfer among the hardware 220, 230, 240, 250 must go through the slow communication path 260 that uses the network 206. For example, in FIG. 5, for the Reduce process "RED#1" to use both the result of the Map process 503 "MAP#1" on the data 501 "BLK#1" and the result of the Map process 503 "MAP#2" on the data 501 "BLK#2", the result of the Map process 503 "MAP#2" must be transferred to the hardware (one of the hardware 220, 230, 240, 250) that executes the Reduce process "RED#1".

FIGS. 4(A) and 4(B), by contrast, show examples of how the series of calculation processes shown in FIG. 3 is distributed in the distributed calculation system 100 (FIG. 1) according to this embodiment.

FIG. 4(A) shows a pattern (pattern #1) in which the two Map processes 402 "MAP#1" and "MAP#2" and the subsequent Reduce process 403 "RED#1" are executed on the same hardware 120, and the two Map processes 403 "MAP#3" and "MAP#4" and the subsequent Reduce process 404 "RED#2" are executed on the same hardware 130.

FIG. 4(B) shows a pattern (pattern #2) in which the two Map processes 403 "MAP#1" and "MAP#2" and the subsequent Reduce process 404 "RED#1" are executed on separate hardware 120 and 130, and the two Map processes 403 "MAP#3" and "MAP#4" and the subsequent Reduce process 404 "RED#2" are executed on separate hardware 130 and 120.

In the pattern of FIG. 4(A) (pattern #1), the data transfer needed for the Reduce process 404 to use the results of the Map process 403 can go through the high-speed communication path 127 (FIG. 1). In the pattern of FIG. 4(B) (pattern #2), that transfer must go through the slow communication path 140 (FIG. 1).

FIG. 6 shows the specific configuration of the distributed calculation control system 112 of FIG. 1. As shown in FIG. 6, the distributed calculation control system 112 comprises a job control unit 601 that controls the distributed calculation processing as a whole, task control units 602 that control individual unit calculation processes, data management units 603 that manage individual data on the distributed file system, and a metadata management unit 604 that manages the logical data of the distributed file system.

A plurality of task control units 602 and data management units 603 exist on the first and second distributed calculation nodes 102 and 103. The job control unit 601 includes the job placement processing unit 114, which executes the job placement process characteristic of the distributed calculation system 100. Taking the distributed calculation definition 113 as input, the job placement processing unit 114 calculates which virtual machine each individual distributed calculation process should be placed on, and controls the task control units 602 based on the result.

FIG. 7 shows the procedure of the job placement process executed by the job placement processing unit 114. The job placement processing unit 114 starts the job placement process shown in FIG. 7 in response to, for example, a user operation. It first obtains the flow graph of the calculation processing specified by the user (hereinafter, the processing flow graph G) and the set of virtual machines available at that time (hereinafter, the virtual machine set M) (SP1).

The processing flow graph G can be obtained with existing compiler implementation techniques. The processing of step SP1 yields a processing flow graph G such as those shown in FIGS. 4(A) and 4(B). The virtual machine set M can be obtained by collecting information on all virtual machines in the distributed calculation system 100 and extracting the available ones.

Next, the job placement processing unit 114 obtains the set (hereinafter, the arrangement pattern set Q) of patterns (hereinafter, arrangement patterns q) in which each element (unit calculation process) of the processing flow graph G is placed at random on one of the virtual machines belonging to the virtual machine set M (SP2). The job placement processing unit 114 then stores the information on the arrangement pattern set Q obtained in this way in a memory (not shown) within the hardware 110 (FIG. 1).

FIG. 8 shows the arrangement pattern set Q for the case where the series of calculation processes shown in FIG. 4 is executed on the distributed calculation system 100 of FIG. 1. This arrangement pattern set Q contains every arrangement pattern q in which the four Map processes "MAP#1" to "MAP#4" and the two Reduce processes "RED#1" and "RED#2" are distributed among the four virtual machines 123, 124, 133, 134, labeled "VM#1" to "VM#4".

In FIG. 8, each row represents one arrangement pattern q. For example, the arrangement pattern q with item number "1" places all unit calculation processes (all four Map processes and both Reduce processes) on the virtual machine 123 "VM#1". The arrangement pattern q with item number "2" places the four Map processes and one Reduce process ("RED#1") on the virtual machine 123 "VM#1" and the other Reduce process ("RED#2") on the virtual machine 124 "VM#2".
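A table like the one described for FIG. 8 can be generated as the Cartesian product of the virtual machines over the unit calculation processes. The sketch below is an assumption about how such a table could be built, not the procedure of the embodiment; `all_placements` is a hypothetical helper. With the task order chosen here, the first two generated patterns coincide with item numbers "1" and "2" as described above.

```python
from itertools import product

TASKS = ["MAP#1", "MAP#2", "MAP#3", "MAP#4", "RED#1", "RED#2"]
VMS = ["VM#1", "VM#2", "VM#3", "VM#4"]


def all_placements(tasks, vms):
    """Yield every arrangement pattern q as a dict {task: vm}."""
    for combo in product(vms, repeat=len(tasks)):
        yield dict(zip(tasks, combo))


patterns = list(all_placements(TASKS, VMS))
print(len(patterns))   # 4 ** 6 patterns
print(patterns[0])     # every task on VM#1
print(patterns[1])     # as above, except RED#2 on VM#2
```

Exhaustive enumeration grows as |M| to the power of the number of unit calculation processes, so a practical system would likely need to prune or sample the set.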

Next, the job placement processing unit 114 calculates the total cost value Cq of each arrangement pattern q in the arrangement pattern set Q (SP3). It then selects the arrangement pattern q with the lowest total cost value Cq (SP4) and ends the job placement process.
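Steps SP2 to SP4 can be summarized as "enumerate, score, take the minimum". The following sketch assumes the cost of a pattern is supplied as a function; `place_job` and the toy cost used in the example are hypothetical and stand in for the cost calculation of the embodiment.

```python
from itertools import product


def place_job(tasks, vms, pattern_cost):
    """Return (best_pattern, best_cost) over all placements of tasks on vms."""
    # SP2: build the arrangement pattern set Q.
    patterns = [dict(zip(tasks, combo))
                for combo in product(vms, repeat=len(tasks))]
    # SP3: compute the total cost value Cq of every pattern q.
    costs = [pattern_cost(q) for q in patterns]
    # SP4: select the pattern with the lowest total cost
    # (ties go to the pattern enumerated first).
    best = min(range(len(patterns)), key=costs.__getitem__)
    return patterns[best], costs[best]


# Toy cost for illustration only: the number of distinct VMs a pattern uses.
best_pattern, best_cost = place_job(
    ["MAP#1", "RED#1"], ["VM#1", "VM#2"],
    lambda q: len(set(q.values())),
)
print(best_pattern, best_cost)
```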

FIG. 9 shows the specific processing performed by the job placement processing unit 114 in step SP3 of the job placement process. On reaching step SP3, the job placement processing unit 114 starts the cost calculation process shown in FIG. 9. It first reads from memory the information on the arrangement pattern set Q obtained in step SP2 of the job placement process (SP10).

Next, the job placement processing unit 114 determines whether the arrangement pattern set Q is empty (SP11). If it is empty (SP11: YES), the unit ends the cost calculation process and returns to the job placement process.

If the arrangement pattern set Q is not empty (SP11: NO), the job placement processing unit 114 takes one element (an arrangement pattern q) from the arrangement pattern set Q and deletes it from the set. It also obtains the set of virtual machines to which the unit calculation processes of that arrangement pattern q are assigned (hereinafter, the assigned virtual machine set P), and initializes the total cost value Cq to "0" (SP12).

Next, the job placement processing unit 114 determines whether the assigned virtual machine set P is empty (SP13). If it is not empty (SP13: NO), the unit selects one element p (a virtual machine to which a unit calculation process is assigned) from the assigned virtual machine set P and removes it from the set. It then calculates the cost Cp for the selected virtual machine and adds it to the total cost value Cq (SP14).

The job placement processing unit 114 then returns to step SP13 and repeats the same processing until the assigned virtual machine set P of the current arrangement pattern q becomes empty (SP13-SP14-SP13).

Once the job placement processing unit 114 has summed the costs Cp of all virtual machines in the assigned virtual machine set P of the current arrangement pattern q into the total cost value Cq, and thus obtains an affirmative result in step SP13 (SP13: YES), it stores the total cost value Cq of that arrangement pattern q (SP15) and returns to step SP11.

The job placement processing unit 114 then repeats the same processing until it obtains an affirmative result in step SP11 (SP11 to SP15). When it has processed every arrangement pattern q in the arrangement pattern set Q acquired in step SP10 and obtains an affirmative result in step SP11, it ends the cost calculation process.
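The two nested loops of FIG. 9 (over arrangement patterns q, then over the assigned virtual machines p) can be sketched as follows. The per-VM cost Cp is passed in as a function because its definition depends on the optimization target; `total_costs` and the load-based `vm_cost` in the example are hypothetical.

```python
from collections import deque


def total_costs(patterns, vm_cost):
    """Return a list of (q, Cq) for every arrangement pattern q."""
    pending = deque(patterns)            # SP10: load the pattern set Q
    results = []
    while pending:                       # SP11: stop when Q is empty
        q = pending.popleft()            # SP12: take q and remove it from Q
        assigned = set(q.values())       #        assigned VM set P
        cq = 0                           #        Cq initialized to 0
        while assigned:                  # SP13: stop when P is empty
            p = assigned.pop()           # SP14: take p and remove it from P
            cq += vm_cost(p, q)          #        add Cp to Cq
        results.append((q, cq))          # SP15: store Cq for this pattern
    return results


# Illustrative Cp: square of the number of tasks placed on VM p.
def vm_cost(p, q):
    return sum(1 for vm in q.values() if vm == p) ** 2


demo = total_costs(
    [{"MAP#1": "VM#1", "RED#1": "VM#1"}, {"MAP#1": "VM#1", "RED#1": "VM#2"}],
    vm_cost,
)
print([cq for _q, cq in demo])
```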

In the cost calculation for the individual unit calculation processes in step SP15 of the cost calculation process, conditions can be set as appropriate for whatever is to be optimized. Possible conditions include execution time, power consumption, and/or I/O volume. FIG. 10 shows an example of such conditions. When a condition shown in the condition column 1000 is matched, the cost of assigning that process to that virtual machine is calculated as the cost shown in the cost column 1001.

In this embodiment, the aim is to assign a Map process and the Reduce process that uses its results to virtual machines running on the same hardware 120 or 130. Accordingly, the cost is set low when a Map process and a Reduce process in this relationship are distributed among one or more virtual machines running on the same hardware 120 or 130 (see item number "1" in FIG. 10), and the cost is set high when they are distributed among one or more virtual machines running on different hardware 120 and 130 (see item number "2" in FIG. 10).

As a result, only the condition of item number "1" in FIG. 10 applies to the arrangement pattern of FIG. 4(A), so the cost of this arrangement pattern is calculated as "1". The conditions of item numbers "2" to "5" in FIG. 10 apply to the arrangement pattern of FIG. 4(B), so the cost of this arrangement pattern is calculated as "18". Therefore, in the example of FIGS. 4(A) and 4(B), the job placement processing unit 114 selects the arrangement pattern of FIG. 4(A) in step SP4 of FIG. 7, which makes a higher-performance distributed arrangement possible.

For this reason, in the present embodiment, a condition table such as that shown in FIG. 10, in which the conditions used in cost calculation are registered, is stored in the external storage device 111, and the job placement processing unit 114 refers to this condition table when executing the cost calculation in step SP15 of the cost calculation processing.
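The condition table lookup can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Condition` class, the two predicates, and the cost value 10 for item number 2 are hypothetical, and the sketch assumes that the cost of an arrangement pattern is the sum of the costs of all matching table rows. Only the cost value 1 for item number 1 follows from the FIG. 4(A) example in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str]   # (VM running a Map process, VM running the Reduce process that uses its result)
HostOf = Dict[str, str]  # virtual machine -> hardware it operates on


@dataclass(frozen=True)
class Condition:
    item: int                                      # item number of the table row
    matches: Callable[[List[Edge], HostOf], bool]  # condition column
    cost: int                                      # cost column


def any_pair_on_same_hardware(edges: List[Edge], host_of: HostOf) -> bool:
    return any(host_of[src] == host_of[dst] for src, dst in edges)


def any_pair_on_different_hardware(edges: List[Edge], host_of: HostOf) -> bool:
    return any(host_of[src] != host_of[dst] for src, dst in edges)


# Hypothetical table: item 1 is cheap (same hardware), item 2 is expensive
# (different hardware). The value 10 is an assumption for illustration only.
CONDITION_TABLE = [
    Condition(1, any_pair_on_same_hardware, 1),
    Condition(2, any_pair_on_different_hardware, 10),
]


def pattern_cost(edges: List[Edge], host_of: HostOf,
                 table: List[Condition] = CONDITION_TABLE) -> int:
    """Sum the cost of every table row whose condition matches the pattern."""
    return sum(row.cost for row in table if row.matches(edges, host_of))
```

With this table, a pattern whose Map and Reduce processes all share one piece of hardware matches only item number 1, which mirrors the FIG. 4(A) case.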

FIG. 11 shows an example of the cost calculation executed in step SP15 of the cost calculation processing for each element p of the assigned virtual machine set P of the arrangement pattern q under consideration, for the case where execution time is the condition.

On proceeding to step SP15 of the cost calculation processing described above with reference to FIG. 9, the job placement processing unit 114 starts the element cost calculation processing shown in FIG. 11 and initializes both the cost Cp described above and the total communication time pt described later to "0" (SP20).

Next, the job placement processing unit 114 obtains the set of virtual machines (hereinafter, communication destination virtual machines) with which the element p of the assigned virtual machine set P under consideration (a virtual machine to which a unit calculation process is assigned) communicates (hereinafter, the communication destination set D) (SP21), and determines whether this communication destination set D is empty (SP22).

If the result of this determination is negative (SP22: NO), the job placement processing unit 114 takes one element d (a communication destination virtual machine) out of the communication destination set D and deletes it from the set. It then adds the communication time t between the virtual machine to which the unit calculation process under consideration is assigned (hereinafter, the communication source virtual machine) and that communication destination virtual machine to the total communication time pt (SP23).

The job placement processing unit 114 then returns to step SP22 and repeats the same processing until the communication destination set D of the communication source virtual machine under consideration becomes empty (SP22-SP23-SP22).

When the job placement processing unit 114 eventually obtains an affirmative result in step SP22 (SP22: YES), having obtained the total communication time pt, which is the sum of the communication times t between the communication source virtual machine under consideration and each of its communication destination virtual machines, it obtains the cost Cp for the element p of the assigned virtual machine set P under consideration from that total communication time pt and the conditions described above with reference to FIG. 10, which are included in the distributed calculation definition 113 (SP24). It then returns to the cost calculation processing (FIG. 9).
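The loop of steps SP20 to SP24 can be sketched in Python as follows. This is only an illustration under stated assumptions: `comm_time` and `cost_from_time` are hypothetical callables standing in for the measured or estimated communication time between two virtual machines and for the lookup against the conditions of FIG. 10, neither of which the text specifies in detail.

```python
from typing import Callable, Iterable


def element_cost(source_vm: str,
                 destinations: Iterable[str],
                 comm_time: Callable[[str, str], float],
                 cost_from_time: Callable[[float], float]) -> float:
    """Cost Cp of one element p, with execution time as the condition."""
    cost_cp = 0.0                    # SP20: initialize cost Cp
    total_pt = 0.0                   # SP20: initialize total communication time pt
    pending = set(destinations)      # SP21: communication destination set D
    while pending:                   # SP22: repeat until D is empty
        dest_vm = pending.pop()      # SP23: take one element d out of D
        total_pt += comm_time(source_vm, dest_vm)
    cost_cp = cost_from_time(total_pt)  # SP24: derive Cp from pt and the conditions
    return cost_cp
```

Passing a different pair of callables, for example one returning estimated power consumption per transfer, would reuse the same loop for the other conditions.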

When the cost calculation condition is power consumption or I/O processing, the cost can be calculated by the same algorithm as in FIG. 11.

As described above, the distributed calculation system 100 according to the present embodiment obtains every arrangement pattern in which the individual unit calculation processes are distributed to the virtual machines to which they can be assigned, calculates the cost of each of these arrangement patterns q, and selects the arrangement pattern q with the lowest cost. A Map process and the Reduce process that uses its processing result can thereby be distributed to virtual machines operating on the same physical machine (the first or second distributed calculation node 102, 103), which improves the processing performance of distributed calculation processing in the distributed calculation system 100.
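The overall flow of the first embodiment (enumerate all arrangement patterns, cost each one, select the minimum) can be sketched as below. The sketch makes simplifying assumptions that are not in the text: every process may be assigned to every virtual machine, and the stand-in cost function `cross_host_edges` simply counts Map/Reduce pairs that end up on different hardware. All function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Pattern = Dict[str, str]  # unit calculation process -> virtual machine


def enumerate_patterns(processes: List[str], vms: List[str]) -> Iterator[Pattern]:
    """Yield every assignment of processes to virtual machines (exhaustive search)."""
    for assignment in product(vms, repeat=len(processes)):
        yield dict(zip(processes, assignment))


def cross_host_edges(pattern: Pattern,
                     edges: List[Tuple[str, str]],
                     host_of: Dict[str, str]) -> int:
    """Stand-in cost: number of Map->Reduce pairs placed on different hardware."""
    return sum(host_of[pattern[m]] != host_of[pattern[r]] for m, r in edges)


def select_min_cost_pattern(processes: List[str], vms: List[str],
                            cost_of: Callable[[Pattern], float]) -> Pattern:
    """Return an arrangement pattern with the lowest cost."""
    return min(enumerate_patterns(processes, vms), key=cost_of)
```

Exhaustive enumeration grows as the number of virtual machines raised to the number of processes, so this form is practical only for small jobs.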

(2) Second Embodiment
According to the first embodiment, a Map process and the Reduce process that uses its processing result can be distributed to virtual machines operating on the same physical machine.

In this case, virtual machines operating on the same physical machine can transfer data over the high-speed communication path 127 (FIG. 1) that uses the shared memories 120B, 130B (FIG. 1). Transferring the processing result of the Map process to the Reduce process over this communication path 127 therefore speeds up distributed calculation processing.

FIG. 12 shows the processing for setting communication paths between virtual machines (hereinafter, communication path setting processing), which the job placement processing unit 114 executes from this viewpoint. The job placement processing unit 114 executes the communication path setting processing shown in FIG. 12 based on the corresponding program stored in the external storage device 111 (FIG. 1).

That is, on completing the job placement processing described above with reference to FIG. 7, the job placement processing unit 114 starts the communication path setting processing shown in FIG. 12 and first obtains the assigned virtual machine set P of the arrangement pattern q selected in step SP4 of the job placement processing (SP30).

Next, the job placement processing unit 114 determines whether the assigned virtual machine set P is empty (SP31). If it is not empty (SP31: NO), the job placement processing unit 114 selects one element p of the assigned virtual machine set P (a virtual machine to which a unit calculation process is assigned) and obtains its communication destination set D (SP32). At the same time, it deletes that element p from the assigned virtual machine set P.

Next, the job placement processing unit 114 determines whether the communication destination set D is empty (SP33). If the result is negative (SP33: NO), it selects one element d of the communication destination set D (a communication destination virtual machine) (SP34) and determines whether that communication destination virtual machine and the virtual machine selected in step SP32 (the communication source virtual machine) operate on the same hardware 120, 130 (SP35). At this time, the job placement processing unit 114 deletes the element d from the communication destination set D.

If the result of this determination is affirmative (SP35: YES), the job placement processing unit 114 sets the communication path 127, which uses the shared memories 120B, 130B, as the communication path between the communication source virtual machine and the communication destination virtual machine (SP36). It then returns to step SP33.

If, on the other hand, the result of the determination in step SP35 is negative (SP35: NO), the job placement processing unit 114 sets the communication path 140 (FIG. 1), which uses the network 104 (FIG. 1), as the communication path between the communication source virtual machine and the communication destination virtual machine (SP37). It then returns to step SP33.

When the job placement processing unit 114 eventually obtains an affirmative result in step SP33, having completed the same processing for every element d of the communication destination set D of the virtual machine selected in step SP32 (the communication source virtual machine), it returns to step SP31 and thereafter processes step SP31 onward in the same way.

When the job placement processing unit 114 eventually obtains an affirmative result in step SP31 (SP31: YES), having completed the same processing for every element p of the assigned virtual machine set P of the arrangement pattern q selected in step SP4 of the job placement processing (FIG. 7), it ends this communication path setting processing.
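The nested loop of steps SP30 to SP37 can be sketched in Python as follows. The function name, the dictionary-based inputs, and the string labels "shared_memory" and "network" are hypothetical stand-ins for the communication paths 127 and 140; how a path is actually configured on the hardware is outside the scope of this sketch.

```python
from typing import Dict, Iterable, Tuple


def set_communication_paths(assigned_vms: Iterable[str],
                            destinations_of: Dict[str, Iterable[str]],
                            host_of: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Choose a path type for every (source VM, destination VM) pair."""
    paths: Dict[Tuple[str, str], str] = {}
    pending_p = set(assigned_vms)                    # SP30: assigned virtual machine set P
    while pending_p:                                 # SP31: repeat until P is empty
        src = pending_p.pop()                        # SP32: select p and delete it from P
        pending_d = set(destinations_of.get(src, ()))  # SP32: communication destination set D
        while pending_d:                             # SP33: repeat until D is empty
            dst = pending_d.pop()                    # SP34: select d and delete it from D
            if host_of[src] == host_of[dst]:         # SP35: same hardware?
                paths[(src, dst)] = "shared_memory"  # SP36: path 127
            else:
                paths[(src, dst)] = "network"        # SP37: path 140
    return paths
```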

As described above, according to the present embodiment, the communication path 127 that uses the shared memories 120B, 130B can be set as the communication path for data transfer between virtual machines operating on the same hardware 120, 130, so distributed calculation processing can be sped up.

(3) Third Embodiment
According to the first embodiment, a Map process and the Reduce process that uses its processing result can be distributed to virtual machines operating on the same physical machine. Executing a Map process and a Reduce process having this input/output relationship in the same virtual machine speeds up distributed calculation processing still further.

FIG. 13 is a flowchart showing the procedure of the job summarization processing, which places a Map process and a Reduce process having this input/output relationship together in a single virtual machine. The job placement processing unit 114 executes the job summarization processing shown in FIG. 13 based on a program stored in the external storage device 111 (FIG. 1), for example after the job placement processing described above with reference to FIG. 7 has finished and before the communication path setting processing described above with reference to FIG. 12 is executed.

That is, when the job placement processing ends, the job placement processing unit 114 starts this job summarization processing. It first obtains, for the arrangement pattern q selected in step SP4 of the job placement processing, the set of pairs each consisting of a Map process and the Reduce process that uses its processing result (hereinafter, the related process set Y) (SP40), and determines whether the related process set Y is empty (SP41).

If the related process set Y is not empty (SP41: NO), the job placement processing unit 114 selects one element y of the related process set Y (a pair of a Map process and a Reduce process) (SP42) and determines, for that element y, whether the virtual machine that executes the Map process and the virtual machine that executes the Reduce process operate on the same hardware 120, 130 (SP43).

If the result of this determination is negative, the job placement processing unit 114 returns to step SP41. If the result is affirmative, it determines whether the Map process and the Reduce process can be merged (SP44).

If the result of this determination is negative, the job placement processing unit 114 returns to step SP41. If the result is affirmative, it merges the Map process and the Reduce process, reassigns the resulting combined Map/Reduce process to one of the virtual machines to which the Map process or the Reduce process had been assigned (SP45), and then returns to step SP41.

Thereafter, the job placement processing unit 114 repeats the same processing until it obtains an affirmative result in step SP41 (SP41 to SP45-SP41). When it eventually obtains an affirmative result in step SP41, having completed the same processing for every element y belonging to the related process set Y, the job placement processing unit 114 ends this job summarization processing.
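Steps SP40 to SP45 can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the text does not say how mergeability is decided, so `can_fuse` is a hypothetical predicate supplied by the caller, and merging is modeled simply by placing the Reduce process on the virtual machine of the Map process (the text allows either of the two virtual machines).

```python
from typing import Callable, Dict, Iterable, Tuple


def summarize_jobs(placement: Dict[str, str],
                   pairs: Iterable[Tuple[str, str]],
                   host_of: Dict[str, str],
                   can_fuse: Callable[[str, str], bool]) -> Dict[str, str]:
    """Return a new placement with mergeable Map/Reduce pairs co-located."""
    result = dict(placement)                 # leave the caller's placement untouched
    pending = list(pairs)                    # SP40: related process set Y
    while pending:                           # SP41: repeat until Y is empty
        map_proc, reduce_proc = pending.pop()    # SP42: select one element y
        map_vm, reduce_vm = result[map_proc], result[reduce_proc]
        if host_of[map_vm] != host_of[reduce_vm]:  # SP43: different hardware, skip
            continue
        if not can_fuse(map_proc, reduce_proc):    # SP44: not mergeable, skip
            continue
        # SP45: merge and reassign. Choosing the Map process's virtual machine
        # is an assumption of this sketch.
        result[reduce_proc] = map_vm
    return result
```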

As described above, according to the present embodiment, a Map process and the Reduce process that uses its processing result can be placed together in the same virtual machine, so distributed calculation processing can be sped up still further.

(4) Other Embodiments
The first to third embodiments above describe the case where the present invention is applied to the distributed calculation system 100 configured as shown in FIG. 1. The present invention is not limited to this, however, and can be applied widely to distributed calculation systems of various other configurations.

The first to third embodiments above also describe the case where a single job placement processing unit 114 implements three functions: an arrangement pattern detection unit that detects the various arrangement patterns in which Map processes and Reduce processes are distributed to a plurality of virtual machines; a cost calculation unit that calculates the cost of each arrangement pattern; and a distributed arrangement unit that selects the arrangement pattern with the minimum cost based on the calculation result of the cost calculation unit and distributes the Map processes and Reduce processes to the plurality of virtual machines according to that arrangement pattern. The present invention is not limited to this, however. For example, a plurality of distributed calculation control nodes 101 may be provided and these three functions distributed among them.

The present invention can be applied to distributed calculation systems of various configurations that perform distributed calculation processing using a system virtualization function.

FIG. 1 is a block diagram showing the overall configuration of the distributed calculation system according to the first to third embodiments.
FIG. 2 is a block diagram showing a configuration example of a conventional distributed calculation system.
FIG. 3 is a flow diagram showing a definition example of distributed processing.
FIGS. 4(A) and 4(B) are flow graphs showing examples of the distributed arrangement of calculation processing in the distributed calculation system of FIG. 1.
FIG. 5 is a flow graph showing an example of the distributed arrangement of calculation processing in the distributed calculation system of FIG. 2.
FIG. 6 is a block diagram showing a configuration example of a distributed processing system.
FIG. 7 is a flowchart showing the procedure of the job placement processing.
FIG. 8 is a chart used to explain the arrangement pattern set.
FIG. 9 is a flowchart showing the procedure of the cost calculation processing.
FIG. 10 is a conceptual diagram showing a configuration example of the condition table.
FIG. 11 is a flowchart showing the procedure of the element cost calculation processing with execution time as the condition.
FIG. 12 is a flowchart showing the procedure of the communication path setting processing.
FIG. 13 is a flowchart showing the procedure of the job summarization processing.

DESCRIPTION OF SYMBOLS
100 ... distributed calculation system, 101 ... distributed calculation control node, 102, 103 ... distributed calculation node, 104 ... network, 110, 120, 130 ... hardware, 110A, 120A, 130A ... CPU, 112 ... distributed calculation control system, 113 ... distributed calculation definition, 114 ... job placement processing unit, 120B, 130B ... shared memory, 111, 121, 131 ... external storage device, 122, 132 ... system virtualization function, 123, 124, 133, 134 ... virtual machine, 301, 402 ... Map process, 302, 403 ... Reduce process.

Claims

A distributed calculation control device that distributes and arranges, in a plurality of virtual machines operating on one or a plurality of physical machines, a plurality of second calculation processes constituting a first calculation process, the device comprising:
an arrangement pattern detection unit that detects various arrangement patterns in which the plurality of second calculation processes are distributed and arranged in the plurality of virtual machines;
a cost calculation unit that calculates a cost for each arrangement pattern; and
a distributed arrangement unit that selects the arrangement pattern that minimizes the cost based on a calculation result of the cost calculation unit, and distributes and arranges the plurality of second calculation processes in the plurality of virtual machines according to that arrangement pattern.

The distributed calculation control device according to claim 1, wherein
the cost calculation unit calculates the cost for each arrangement pattern on the condition of at least one of execution time, power consumption, and I/O amount.

The distributed calculation control device according to claim 1, wherein
the physical machine has a shared memory shared by the plurality of virtual machines operating on that physical machine, and
the distributed arrangement unit,
based on the calculation result of the cost calculation unit, arranges a plurality of the second calculation processes having an input/output relationship in virtual machines operating on the same physical machine, and sets the corresponding physical machine so that data transfer between the virtual machines in which those second calculation processes are arranged is performed via the shared memory.

The distributed calculation control device, wherein
the distributed arrangement unit merges a plurality of the second calculation processes having an input/output relationship into one calculation process based on a calculation result of the cost calculation unit and arranges it in the virtual machine.

A distributed calculation control method for distributing and arranging, in a plurality of virtual machines operating on one or a plurality of physical machines, a plurality of second calculation processes constituting a first calculation process, the method comprising:
a first step of detecting various arrangement patterns in which the plurality of second calculation processes are distributed and arranged in the plurality of virtual machines;
a second step of calculating a cost for each arrangement pattern; and
a third step of selecting the arrangement pattern that minimizes the cost based on a calculation result of the calculation, and distributing and arranging the plurality of second calculation processes in the plurality of virtual machines according to that arrangement pattern.

The distributed calculation control method according to claim 5, wherein,
in the second step, the cost for each arrangement pattern is calculated on the condition of at least one of execution time, power consumption, and I/O amount.

The distributed calculation control method according to claim 5, wherein
the physical machine has a shared memory shared by the plurality of virtual machines operating on that physical machine, and,
in the third step,
based on the calculation result of the cost calculation unit, a plurality of the second calculation processes having an input/output relationship are arranged in virtual machines operating on the same physical machine, and the corresponding physical machine is set so that data transfer between the virtual machines in which those second calculation processes are arranged is performed via the shared memory.

The distributed calculation control method according to claim 5, wherein,
in the third step, a plurality of the second calculation processes having an input/output relationship are merged into one calculation process and arranged in the virtual machine.