JP5331549B2

JP5331549B2 - Distributed processing system and distributed processing method

Info

Publication number: JP5331549B2
Application number: JP2009095062A
Authority: JP
Inventors: 晩煕趙; 一郎岡島; 博川上; 俊博鈴木; 大介越智; 智大永田; 基成小林; 勇輝大薮
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2009-04-09
Filing date: 2009-04-09
Publication date: 2013-10-30
Anticipated expiration: 2029-04-09
Also published as: JP2010244470A

Description

The present invention relates to a distributed processing system including a plurality of computers and a plurality of relay devices, and to a distributed processing method executed by that system.

Various techniques have been developed to increase the processing efficiency of distributed processing systems. For example, Patent Document 1 below discloses a technique that determines the node to which a calculation process is assigned while taking into account the time required to change the startup state of software. Specifically, this technique compares environment information, acquired from each node and indicating the software currently running, with environment condition information indicating the software required for the process, and preferentially selects as the assignment destination a node that needs fewer new software startups.

Patent Document 1: JP 2008-140120 A

Depending on the type of distributed processing, the speed at which data is read from a hard disk or the like, or the communication speed when data is moved between computers, can reduce the overall processing speed. For example, when little intermediate data is generated, as in MapReduce grep processing, the speed of reading data from the hard disk in each computer becomes the bottleneck. Conversely, when a large amount of intermediate data is generated, as in MapReduce sort processing, a large amount of traffic concentrates on relay devices such as L2 switches, and those devices become the bottleneck.

The present invention has been made to solve the above problem, and its object is to provide a distributed processing system and a distributed processing method capable of processing data at high speed.

The distributed processing system of the present invention is a distributed processing system comprising a plurality of groups, each group including a relay device that relays data and at least one computer directly connected to that relay device. The system comprises: acquisition means for acquiring, for each computer that has executed distributed processing, the data amount of the input data of the processing and the data amount of the output data resulting from the processing; setting means for determining the change in data amount before and after the distributed processing based on the input data amount and output data amount acquired by the acquisition means and, when the processing has increased the data amount, setting the degree of distribution of the plurality of pieces of divided data to be processed lower than when the processing decreases the data amount; and arrangement means for arranging the plurality of pieces of divided data to be processed in one of the plurality of groups when the degree of distribution set by the setting means is equal to or less than a predetermined threshold, and otherwise arranging the plurality of pieces of divided data in two or more of the plurality of groups.

The distributed processing method of the present invention is a method executed by a distributed processing system comprising a plurality of groups, each group including a relay device that relays data and at least one computer directly connected to that relay device. The method comprises: an acquisition step of acquiring, for each computer that has executed distributed processing, the data amount of the input data of the processing and the data amount of the output data resulting from the processing; a setting step of determining the change in data amount before and after the distributed processing based on the input data amount and output data amount acquired in the acquisition step and, when the processing has increased the data amount, setting the degree of distribution of the plurality of pieces of divided data to be processed lower than when the processing decreases the data amount; and an arrangement step of arranging the plurality of pieces of divided data to be processed in one of the plurality of groups when the degree of distribution set in the setting step is equal to or less than a predetermined threshold, and otherwise arranging the plurality of pieces of divided data in two or more of the plurality of groups.

According to such a distributed processing system and distributed processing method, whether the distributed processing has increased the data amount is determined from the input data amount and output data amount in each computer, and the degree of distribution of the divided data is decided based on that determination. The degree of distribution is set relatively low when the data amount has increased and relatively high when the data amount has decreased. When the set degree of distribution is equal to or less than a predetermined threshold, the divided data is consolidated in one group; otherwise, the divided data is distributed across a plurality of groups.

When the data amount has increased, suppressing network traffic in subsequent processing speeds up the processing. When the data amount has decreased, on the other hand, distributing the data to be processed next across as many computers as possible, so as to draw out the full processing capacity of the system, speeds up the processing. Setting the degree of distribution based on the change in data amount before and after processing, and arranging the divided data based on that degree of distribution, therefore achieves this kind of flexible data placement and, as a result, allows data to be processed at high speed.

In the distributed processing system of the present invention, the distributed processing is preferably executed using the MapReduce programming model.

In this case, data can be processed at high speed in MapReduce processing.

According to such a distributed processing system and distributed processing method, the degree of distribution is set based on the change in data amount before and after processing, and the divided data is arranged based on that degree of distribution, so data can be processed at high speed.

FIG. 1 is a diagram showing the overall configuration of a distributed processing system according to an embodiment.

FIG. 2 is a diagram showing the functional configuration of the master node shown in FIG. 1.

FIG. 3 is a diagram showing the hardware configuration of the master node shown in FIG. 1.

FIG. 4 is a diagram showing an example of the information stored in the distribution degree table shown in FIG. 2.

FIG. 5 is a diagram showing the functional configuration of the slave node shown in FIG. 1.

FIG. 6 is a diagram showing an example of the group information stored in the rack configuration table shown in FIG. 5.

FIG. 7 is a sequence diagram showing the operation of the distributed processing system shown in FIG. 1.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are given the same reference numerals, and duplicate description is omitted.

First, the functions and configuration of the distributed processing system 1 according to the embodiment are described with reference to FIGS. 1 to 6.

The distributed processing system 1 is a computer system that executes distributed processing using the programming model called MapReduce. The data to be processed is assumed to have been distributed within the distributed processing system 1 in advance by a predetermined method; below, this distributed data is referred to as divided data. As shown in FIG. 1, the distributed processing system 1 includes one master node 10, a plurality of slave nodes 20, and a plurality of network switches (hereinafter simply "switches") 30 that connect the nodes. The master node 10 is a computer that supervises the distributed processing, and each slave node 20 is a computer that actually processes data. A switch 30 is a device that relays data transmitted between nodes, for example an L2 switch, an L3 switch, or a router.

The master node 10, each slave node 20, and each switch 30 are housed in one of a plurality of racks R. In the example of FIG. 1, the distributed processing system 1 consists of one master node 10, eleven slave nodes 20, four switches 30, and four racks R. Each rack R houses three nodes (the master node 10 or slave nodes 20) and one switch 30, and within each rack R every node is directly connected to the switch 30. The slave nodes 20 housed in one rack R form one group (hereinafter a "node group").

Next, the master node 10 is described. As shown in FIG. 2, the master node 10 includes, as functional components, a reception unit 11, a distribution degree table 12, an instruction unit 13, and a distribution degree setting unit (setting means) 14.

As shown in FIG. 3, the master node 10 consists of a CPU 101 that executes the operating system, application programs, and the like; a main storage unit 102 composed of ROM and RAM; an auxiliary storage unit 103 composed of a hard disk or the like; a communication control unit 104 composed of a network card or the like; an input unit 105 such as a keyboard and mouse; and an output unit 106 such as a monitor. Each function of the master node 10 is realized by loading predetermined software onto the CPU 101 and the main storage unit 102, operating the communication control unit 104 under the control of the CPU 101, and reading and writing data in the main storage unit 102 and the auxiliary storage unit 103.

Returning to FIG. 2, the reception unit 11 accepts an execution instruction for distributed processing that is entered by user operation or received from another computer system (not shown). The execution instruction contains application information identifying the distributed processing program and, in some cases, a parameter indicating the degree of distribution. Here, the degree of distribution is a numerical value indicating how widely the plurality of pieces of intermediate data (interim results of the distributed processing) are spread when they are placed. If the degree of distribution is small, the intermediate data is placed in a narrow range (for example, within one node group (rack R)); if it is large, the intermediate data is placed in a wide range (for example, all slave nodes 20 in the system 1). The reception unit 11 outputs the accepted execution instruction to the instruction unit 13.

The distribution degree table 12 stores the degree of distribution set for each application. For example, the distribution degree table 12 stores per-application degrees of distribution as shown in FIG. 4. According to FIG. 4, when application A is executed, the intermediate data is collected in a group consisting only of specific slave nodes; when application B is executed, the intermediate data is placed across a wider range of slave nodes; and when application C is executed, it is placed across an even wider range of slave nodes than for application B.

The instruction unit 13 instructs each slave node 20 to execute the distributed processing. Specifically, the instruction unit 13 transmits to each slave node 20 an instruction signal containing the input application information and the degree of distribution corresponding to that application.

When a degree of distribution has been input, the instruction unit 13 includes that degree of distribution, that is, the one specified by the user or by another system, in the instruction signal and transmits it. When no degree of distribution has been input, the instruction unit 13 reads the degree of distribution corresponding to the application information from the distribution degree table 12 and, if one is found, includes it in the instruction signal. If none is found, the instruction unit 13 includes a default degree of distribution that it holds in advance. A degree of distribution may be unavailable, for example, when an application is executed in the distributed processing system 1 for the first time.
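The fallback order described above (explicit parameter, then table entry, then default) can be sketched as follows. This is only an illustration: the function name `resolve_degree`, the dictionary standing in for the distribution degree table 12, and the default value of 2 are assumptions, since the text does not state a default.

```python
from typing import Dict, Optional

# Hypothetical default; the patent does not state the default value.
DEFAULT_DEGREE = 2


def resolve_degree(app_id: str,
                   requested: Optional[int],
                   degree_table: Dict[str, int],
                   default: int = DEFAULT_DEGREE) -> int:
    """Choose the degree of distribution to put in the instruction signal."""
    # 1. A degree supplied with the execution instruction takes priority.
    if requested is not None:
        return requested
    # 2. Otherwise use the value recorded for this application, if any.
    if app_id in degree_table:
        return degree_table[app_id]
    # 3. Otherwise (e.g. first run of the application) use the default.
    return default
```

With a table of `{"A": 1}`, a request for application A without a parameter would use 1, while an unknown application would fall through to the default.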

The distribution degree setting unit 14 determines the change in data amount before and after the distributed processing based on the input data amount and output data amount received from each slave node 20 and, when the processing has increased the data amount, sets the degree of distribution of the plurality of pieces of divided data to be processed lower than when the data amount decreases. The input data amount is the data amount of the divided data, and the output data amount is the data amount of the operation result data; both are explained again below.

The distribution degree setting unit 14 compares the input data amount with the output data amount; if the output data amount is larger, it sets the degree of distribution to a predetermined value a or less, and otherwise it sets the degree of distribution to a value larger than a. For example, for each slave node 20, the distribution degree setting unit 14 calculates the ratio of the output data amount to the input data amount, r = (output data amount) / (input data amount), and then calculates the average r_a of these ratios. If r_a > 1, it sets the degree of distribution to 1; if r_a ≦ 1, it sets the degree of distribution to 2 or more, with the degree of distribution becoming larger as r_a becomes smaller. The distribution degree setting unit 14 may also calculate the average r_a and set the degree of distribution in the same way using only the input and output data amounts received within a predetermined time, that is, only the data amounts received from some of the slave nodes 20.
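A minimal sketch of this rule follows. The text fixes only the two cases (degree 1 when r_a > 1, degree 2 or more and growing as r_a shrinks when r_a ≦ 1); the linear mapping, the `max_degree` cap of 4, and the decision to skip reports with zero input are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def set_distribution_degree(reports: Iterable[Tuple[int, int]],
                            max_degree: int = 4) -> int:
    """Derive a degree of distribution from (input_amount, output_amount) reports.

    Each report comes from one slave node. max_degree is a hypothetical cap
    (for example, the number of node groups).
    """
    # Per-node ratio r = output / input; nodes with no input are skipped (assumption).
    ratios = [out / inp for inp, out in reports if inp > 0]
    if not ratios:
        raise ValueError("no usable data-amount reports")
    r_a = sum(ratios) / len(ratios)  # average ratio over reporting nodes

    if r_a > 1:
        return 1  # data grew: keep intermediate data close together
    # Assumed linear mapping: r_a = 1 gives 2, r_a = 0 gives max_degree.
    degree = 2 + int((1 - r_a) * (max_degree - 2))
    return max(2, min(max_degree, degree))
```

Passing only the reports that arrived within a time window corresponds to the variant in which a subset of slave nodes is used.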

Next, the distribution degree setting unit 14 stores the calculated degree of distribution in the distribution degree table 12 in association with the application information accepted by the reception unit 11. The degree of distribution for the running application is thereby registered or updated, and it is referenced the next time an execution instruction for the same application is input.

Next, the slave node 20 is described. As shown in FIG. 5, the slave node 20 includes, as functional components, a rack configuration table 21, a processing unit (arrangement means) 22, and an acquisition unit (acquisition means) 23. The hardware configuration of the slave node 20 is the same as that shown in FIG. 3, and the way each function of the slave node 20 is realized on the hardware is also the same as for the master node 10.

The rack configuration table 21 stores information about each switch 30 and the slave nodes 20 (node group) directly connected to that switch 30. Specifically, the rack configuration table 21 stores group information in which a switch ID identifying a switch 30, node IDs identifying the slave nodes 20 directly connected to that switch 30, and a rack ID identifying the node group formed by those slave nodes 20 are associated with one another. If the distributed processing system 1 has the configuration shown in FIG. 1, the rack configuration table 21 stores the group information shown in FIG. 6. From this information it can be seen, for example, that the three slave nodes 20 with node IDs "a", "b", and "c" form one node group and that these nodes 20 are directly connected to the switch 30 identified by the switch ID "SW1".
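One way to picture the group information is as a list of records keyed by rack ID, switch ID, and node IDs. The sketch below is illustrative only: the record type `GroupInfo`, the rack IDs "R1" and "R2", the switch ID "SW2", and the nodes "d" to "f" are hypothetical; only the association of nodes "a", "b", and "c" with "SW1" comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class GroupInfo:
    rack_id: str                # identifies the node group (hypothetical values)
    switch_id: str              # switch the nodes are directly connected to
    node_ids: Tuple[str, ...]   # slave nodes in this group


RACK_TABLE: List[GroupInfo] = [
    GroupInfo("R1", "SW1", ("a", "b", "c")),   # from the example in the text
    GroupInfo("R2", "SW2", ("d", "e", "f")),   # hypothetical second group
]


def group_of(node_id: str, table: List[GroupInfo]) -> Optional[GroupInfo]:
    """Return the group record containing node_id, or None if it is unknown."""
    for info in table:
        if node_id in info.node_ids:
            return info
    return None
```

For instance, `group_of("b", RACK_TABLE)` yields the record whose switch ID is "SW1".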

The information in the rack configuration table 21 is sent at predetermined times from the master node 10, which manages the group information. Storing this information in the rack configuration table 21 keeps the rack configuration table synchronized between the master node 10 and the slave nodes 20.

The processing unit 22 performs operations on the divided data based on the information sent from the master node 10. The processing unit 22 receives the instruction signal sent from the master node 10 and extracts the application information and the degree of distribution. It then determines, based on the application information and the divided data it already stores, whether processing is needed in its own node. If it determines that no processing is needed, the processing unit 22 waits until a request arrives from another node. If it determines that processing is needed, the processing unit 22 runs the application program indicated by the application information and performs a predetermined operation (Map processing) on the divided data.

The processing unit 22 performs Map processing on the divided data to generate operation result data. Reduce processing is then performed on each part of the operation result data.

At that time, the processing unit 22 selects the node groups that will execute the next stage of distributed processing, based on the input degree of distribution and the group information read from the rack configuration table 21. The method of selecting node groups is not limited to a single one. For example, the processing unit 22 selects only one node group when the degree of distribution is equal to or less than a threshold c, and selects two or more node groups when the degree of distribution exceeds the threshold c. Which specific node groups are chosen is also arbitrary, but the selection preferably includes the node group that stores the largest number of pieces of divided data.
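The example selection rule can be sketched as below. The text leaves open how many groups to pick above the threshold and how to rank them; here the number of groups is assumed to equal the degree of distribution (at least two, capped at the number of groups), and groups are ranked by how many pieces of divided data they store so that the largest one is always included. All names are hypothetical.

```python
from typing import Dict, List


def select_groups(degree: int,
                  threshold: int,
                  data_counts: Dict[str, int]) -> List[str]:
    """Select node groups for the next stage of distributed processing.

    data_counts maps a rack ID to the number of pieces of divided data
    stored in that node group.
    """
    # Rank groups by stored divided data, largest first; ties broken by ID.
    ranked = sorted(data_counts, key=lambda g: (-data_counts[g], g))
    if degree <= threshold:
        return ranked[:1]                      # consolidate in one group
    k = max(2, min(degree, len(ranked)))       # assumed: degree = number of groups
    return ranked[:k]
```

If the system has only one node group, the slice simply returns that group regardless of the degree.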

The threshold c may be set to match the criterion of whether the average r_a exceeds 1, that is, whether the data amount increased across the distributed processing. In that case, only one node group is selected when the data amount increased, and multiple node groups are selected otherwise. The threshold c may of course also be set independently of whether the average r_a exceeds 1. In that case, only one node group is selected only when the data amount has increased by at least a certain proportion.

Next, the processing unit 22 selects one of the slave nodes 20 in the selected node groups as a Reducer and transmits part of the operation result data to the selected slave node 20. The transmitted operation result data can be regarded as intermediate data that requires further processing. The processing unit 22 repeats this Reducer selection and transmission until all of the operation result data has been handled.

When the processing unit 22 of each slave node 20 performs this processing, the plurality of pieces of Map-processed intermediate data are placed on slave nodes 20 within the racks R, after which Reduce processing is executed according to the conventional MapReduce programming model. In other words, the processing unit 22 functions as the arrangement means.

The acquisition unit 23 acquires the data amount of the divided data (input data) processed by the processing unit 22 and the data amount of the operation result data (output data). The acquisition unit 23 obtains these data amounts by monitoring the operations of the processing unit 22 and transmits them to the master node 10.

Next, the operation of the distributed processing system shown in FIG. 1 and the distributed processing method according to the present embodiment are described with reference to FIG. 7.

In the master node 10, when the reception unit 11 accepts an execution instruction (step S11), the instruction unit 13 acquires the degree of distribution corresponding to the specified application (step S12). Specifically, the instruction unit 13 acquires the degree of distribution by reading it from the distribution degree table 12, extracting it from the execution instruction, or using the default degree of distribution that it holds in advance. The instruction unit 13 then transmits the application information and the degree of distribution to all slave nodes 20 (step S13).

In each slave node 20, the processing unit 22 determines whether processing is needed based on the application information and the divided data it already stores and, if processing is needed, first executes Map processing on the divided data (step S14). At this time, the acquisition unit 23 acquires the data amount of the divided data, which is the input data, and the data amount of the operation result data, which is the output data (step S14, acquisition step).

Next, in order to transmit the operation result data part by part, the processing unit 22 selects one or more node groups based on the degree of distribution and the group information read from the rack configuration table 21 (step S15, arrangement step) and selects one slave node from the selected node groups (step S16, arrangement step). The processing unit 22 then transmits part of the operation result data to that slave node 20 (step S17, arrangement step). Steps S15 to S17 are repeated until all of the operation result data has been handled.
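The repetition of steps S16 and S17 over the parts of the operation result data can be sketched as a loop that assigns each part to one slave node in the selected node groups. The round-robin choice of Reducer, the in-memory dictionary standing in for network transmission, and all names are assumptions; the text does not specify how the slave node is picked.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def place_chunks(chunks: Sequence[str],
                 selected_groups: Sequence[str],
                 group_nodes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Assign each part of the operation result data to a Reducer node.

    Returns a mapping from node ID to the parts "sent" to it. A real system
    would transmit each part over the network instead (step S17).
    """
    # Candidate Reducers: every slave node in the selected node groups.
    candidates = [n for g in selected_groups for n in group_nodes[g]]
    placement: Dict[str, List[str]] = {n: [] for n in candidates}
    # Step S16 (select one node) and S17 (send one part), repeated per part.
    for chunk, node in zip(chunks, cycle(candidates)):
        placement[node].append(chunk)
    return placement
```

With a single selected group, every part stays behind one switch; with several groups, the same loop spreads parts across racks.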

When the processing unit 22 of each slave node 20 performs the processing described above, the plurality of pieces of Map-processed intermediate data are collected in one or more groups, after which Reduce processing is executed in the slave nodes 20 within those groups (step S18). Although FIG. 7 shows only one "selected slave node", in practice there are a plurality of such slave nodes 20.

In each slave node 20 that has executed Map processing, the acquisition unit 23 transmits the data amounts of the divided data and of the operation result data to the master node 10 in parallel with steps S15 to S17 (step S19).

In response, in the master node 10, the distribution degree setting unit 14 calculates the ratio of those data amounts and sets the degree of distribution based on that ratio (step S20, setting step). For example, when the distributed processing has increased the data amount, the distribution degree setting unit 14 sets the degree of distribution so that the intermediate data gathers in one node group (rack R) the next time the same application is executed. When the distributed processing has decreased the data amount, the distribution degree setting unit 14 sets a degree of distribution higher than in the case where the data amount increased, so that the intermediate data is distributed across two or more node groups the next time the same application is executed. The distribution degree setting unit 14 stores the degree of distribution set in this way in the distribution degree table 12 (step S21, setting step). This degree of distribution is referenced when an execution instruction for the same application is input.

As described above, according to the present embodiment, whether the distributed processing has increased the data amount is determined based on the data amounts of the divided data and of the operation result data in each slave node 20, and the degree of distribution of the intermediate data is determined based on that determination. The degree of distribution is set relatively low when the data amount has increased (for example, to 1) and relatively high when the data amount has decreased (for example, to 3). When the set degree of distribution is less than or equal to a predetermined threshold, the intermediate data is aggregated into one group; otherwise, it is distributed over a plurality of groups.
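The threshold comparison described above can be illustrated with a short sketch. The function name `select_groups`, the threshold value of 1, and the `preferred` parameter are assumptions introduced for illustration. The description does not state how many groups are used when the degree exceeds the threshold, so the sketch simply allows all of them.

```python
from typing import List, Optional

THRESHOLD = 1  # hypothetical value of the predetermined threshold


def select_groups(degree: int, groups: List[str],
                  preferred: Optional[str] = None,
                  threshold: int = THRESHOLD) -> List[str]:
    """Return the groups (racks) in which intermediate data may be placed."""
    if not groups:
        raise ValueError("at least one group is required")
    if degree <= threshold:
        # Aggregate: all intermediate data goes to a single group.
        return [preferred if preferred in groups else groups[0]]
    # Distribute: intermediate data may be placed in any of the groups.
    return list(groups)
```

With a degree of 1 and the threshold above, `select_groups(1, ["R1", "R2", "R3"], preferred="R2")` returns only `["R2"]`, whereas a degree of 3 returns all three racks.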

When the data amount increases, performing the distributed processing within a single group (rack R), so that the data does not pass through multiple switches 30, leads to faster processing. Conversely, when the data amount decreases, distributing the data to as many slave nodes 20 as possible, regardless of which group (rack R) they belong to, draws out the full processing capacity of the system 1 and likewise leads to faster processing. Therefore, setting the degree of distribution based on the change in data amount before and after processing, and placing the intermediate data based on that degree of distribution, realizes the flexible data placement described above, and as a result the data can be processed at high speed.

For example, when sort processing is executed in the distributed processing system 1, the data amount is larger after the operation, so the intermediate data (operation result data) is gathered in one group (rack R). This avoids generating a large amount of data that moves across racks and concentrating traffic on the switches 30. In the case of grep processing, on the other hand, the data amount is smaller after the operation, so the intermediate data is distributed over a plurality of groups (racks R), which raises processing efficiency. In contrast, when sort processing is performed using the conventional technique, the intermediate data is distributed over a plurality of groups (racks R), so a large amount of traffic concentrates on the switches and the overall processing speed drops.
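The effect on inter-rack traffic can be illustrated with a toy cost model. The model is an assumption made for this sketch and does not appear in the description: it counts only the bytes of intermediate data that leave the rack where they were produced, and for the distributed case it assumes each rack's output is spread evenly over all racks. The rack names and data sizes are made up.

```python
from typing import Dict


def cross_rack_bytes_aggregated(produced: Dict[str, int], target: str) -> int:
    """Bytes that cross racks if all intermediate data is gathered in `target`."""
    return sum(size for rack, size in produced.items() if rack != target)


def cross_rack_bytes_spread(produced: Dict[str, int]) -> float:
    """Bytes that cross racks if each rack's output is spread evenly over all racks."""
    if not produced:
        raise ValueError("at least one rack is required")
    n = len(produced)
    # Each rack keeps 1/n of its own output and sends the rest elsewhere.
    return sum(produced.values()) * (n - 1) / n


# Made-up intermediate data sizes per rack for a job with large intermediate output.
produced = {"R1": 600, "R2": 200, "R3": 100}
aggregated = cross_rack_bytes_aggregated(produced, "R1")  # 300
spread = cross_rack_bytes_spread(produced)                # 600.0
```

Under these assumptions, gathering the data in rack R1 moves half as many bytes across racks as spreading it evenly, which is consistent with the behavior described for sort processing.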

Furthermore, according to the present embodiment, the degree of distribution is set automatically for each application, so a more suitable degree of distribution can be set flexibly than when it is set manually as in the conventional approach.

Furthermore, according to the present embodiment, when the operation result data is placed in one group, it is gathered in the group that stores the largest amount of divided data, which reduces the communication and processing required to move the data into that one group.
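Choosing the destination group for aggregation can be sketched as below. The function name `pick_aggregation_group` and the per-rack byte counts are hypothetical. The description does not say how ties are resolved; this sketch returns the first group with the largest amount.

```python
from typing import Dict


def pick_aggregation_group(divided_bytes_by_group: Dict[str, int]) -> str:
    """Return the group (rack) that stores the largest amount of divided data."""
    if not divided_bytes_by_group:
        raise ValueError("at least one group is required")
    # max() returns the first key with the largest value on ties.
    return max(divided_bytes_by_group, key=lambda g: divided_bytes_by_group[g])
```

For the made-up amounts `{"R1": 40, "R2": 120, "R3": 60}`, the function selects `"R2"`, so only the data held in R1 and R3 would need to be moved.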

The present invention has been described above in detail based on its embodiment. However, the present invention is not limited to the above embodiment and can be modified in various ways, such as the following, without departing from its gist.

The present invention can also be applied when distributed processing is executed by a technique other than MapReduce.

DESCRIPTION OF SYMBOLS: 1 ... distributed processing system, 10 ... master node, 11 ... reception unit, 12 ... distribution degree table, 13 ... instruction unit, 14 ... distribution degree setting unit (setting means), 20 ... slave node (computer), 21 ... rack configuration table, 22 ... processing unit (arrangement means), 23 ... acquisition unit (acquisition means), 30 ... switch (relay device).

Claims

A distributed processing system comprising a plurality of groups, each group including a relay device that relays data and at least one computer directly connected to the relay device, the distributed processing system comprising:
acquisition means for acquiring, for each computer that has executed distributed processing, the amount of input data of the processing and the amount of output data resulting from the processing;
setting means for determining a change in the data amount before and after the distributed processing based on the input data amount and the output data amount acquired by the acquisition means, and for setting the degree of distribution of a plurality of divided data to be distributed lower when the data amount is increased by the processing than when the data amount is decreased by the processing; and
arrangement means for arranging the plurality of divided data to be distributed in one group among the plurality of groups when the degree of distribution set by the setting means is less than or equal to a predetermined threshold, and otherwise arranging the plurality of divided data in two or more groups among the plurality of groups.

The distributed processing system according to claim 1, wherein the distributed processing is executed using the MapReduce programming model.

A distributed processing method executed by a distributed processing system comprising a plurality of groups, each group including a relay device that relays data and at least one computer directly connected to the relay device, the method comprising:
an acquisition step of acquiring, for each computer that has executed distributed processing, the amount of input data of the processing and the amount of output data resulting from the processing;
a setting step of determining a change in the data amount before and after the distributed processing based on the input data amount and the output data amount acquired in the acquisition step, and of setting the degree of distribution of a plurality of divided data to be distributed lower when the data amount is increased by the processing than when the data amount is decreased by the processing; and
an arrangement step of arranging the plurality of divided data to be distributed in one group among the plurality of groups when the degree of distribution set in the setting step is less than or equal to a predetermined threshold, and otherwise arranging the plurality of divided data in two or more groups among the plurality of groups.