JP2013088863A

JP2013088863A - Parallel distributed processing method and parallel distributed processing system

Info

Publication number: JP2013088863A
Application number: JP2011225967A
Authority: JP
Inventors: Makoto Nakayama; 誠中山; Satoshi Tanaka; 聡田中
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2011-10-13
Filing date: 2011-10-13
Publication date: 2013-05-13

Abstract

PROBLEM TO BE SOLVED: To enable a master device to ascertain the processing efficiency of each slave device in a master/slave type system without introducing an external monitoring computer or monitoring program. SOLUTION: A parallel distributed processing method executed by a parallel distributed processing system 4 includes steps in which: the slaves 1 execute tasks, measure the execution of the tasks using a measurement unit predetermined within the parallel distributed processing system 4, store the measurement results in a storage unit 11, and transmit the stored measurement results to a master 2; the master 2 stores the received measurement results in a collection unit 21 as a collection result, in association with information identifying the respective transmission source slaves 1, generates instruction information for the slaves 1 on the basis of the stored collection result, and transmits the instruction information to the slaves 1; and each slave 1 controls its own operation on the basis of the received instruction information.

Description

The present invention relates to a parallel distributed processing method and a parallel distributed processing system for processing tasks in a parallel and distributed manner.

Today, methods are in use in which a very large number of individually small, uniform tasks are distributed among and processed by a plurality of computers. In such methods, the computers that process the individual tasks are often called “slaves”, and the computer that assigns tasks to the slaves and aggregates their processing results is often called the “master”.

Examples of such master/slave systems include the MapReduce technology proposed by Google (registered trademark), its open source implementation Hadoop, Dryad (registered trademark) proposed by Microsoft (registered trademark), and OpenMPI.

In a master/slave system, monitoring the operating status of each computer is essential. One purpose of monitoring is to find computers that, for some reason such as a failure or overload, are not delivering the expected processing efficiency. If processing efficiency has dropped because a particular computer has become a bottleneck, the bottleneck is resolved; if the cause is a failure, measures such as repair or replacement are taken.

Known techniques for monitoring the operating status of a computer include, for example, those described in Patent Document 1 and Non-Patent Document 1 below. Patent Document 1 discloses preparing a dedicated monitoring computer and monitoring the monitored computer from outside that computer. Non-Patent Document 1 discloses installing and running dedicated monitoring software on the monitored computer and monitoring the monitored process (the monitored software in its running state) from outside that process.

On the other hand, known techniques for monitoring in a server/client system without the help of an external monitoring computer or monitoring software include, for example, those described in Patent Document 2 and Patent Document 3 below. Patent Document 2 discloses that a server monitors the communication speed between itself and a client and changes its behavior according to that speed, for example by lowering the resolution of the image data provided to the client. Patent Document 3 discloses that a server polls each client, monitors whether each client is alive, and automatically excludes from the system any client judged to be inactive.

Patent Document 1: JP 2001-325161 A
Patent Document 2: JP 2002-32295 A
Patent Document 3: JP-A-7-271701

Non-Patent Document 1: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience, Parallel Computing, Volume 30, Issue 7

In a master/slave system, the master is required to devise the assignment of tasks to slaves (scheduling) so that all tasks finish sooner. To realize such scheduling, the master needs to monitor the task processing efficiency of each slave and take measures such as preferentially assigning tasks to slaves with higher processing efficiency or, when the computer pool in the system contains unused computers, stopping a slave with low processing efficiency and making another computer in the pool a new slave.

With the techniques described in Patent Document 1 and Non-Patent Document 1, general-purpose, objectively observable indicators such as CPU usage and memory usage are monitored from outside the monitored computer or process. These techniques can detect that a computer or process with high CPU or memory usage is under heavy load, but they cannot conclude from this that its task processing efficiency is low, because CPU and memory usage may be high precisely because tasks are being processed efficiently. In other words, these techniques can only infer task processing efficiency indirectly, from indicators that can be monitored externally.

In addition, the technique described in Patent Document 1 incurs the cost and effort of preparing and operating a monitoring computer, and the technique described in Non-Patent Document 1 incurs extra load from running the monitoring software.

The techniques described in Patent Document 2 and Patent Document 3, on the other hand, do not require an external monitoring computer or monitoring software. However, the technique of Patent Document 2 only monitors the communication speed between the server and the client, and task processing efficiency cannot be measured from communication speed alone. Likewise, the technique of Patent Document 3 only monitors the activity status of clients, and task processing efficiency cannot be measured from activity status alone.

The present invention has been made to solve the above problems, and its object is to provide a parallel distributed processing method and a parallel distributed processing system that enable the master, in a master/slave system, to ascertain the processing efficiency of the slaves without introducing an external monitoring computer or monitoring program.

A parallel distributed processing method according to one aspect of the present invention is executed by a parallel distributed processing system that comprises one or more slaves and a master connected to the slaves via a network and that processes tasks on the slaves in a parallel and distributed manner. The method includes: a processing step in which a slave executes a task assigned to it, measures the execution of the task using a measurement unit predetermined within the parallel distributed processing system, and stores the measurement result in the storage means of the slave; a first reporting step in which the slave transmits the measurement result stored in its storage means to the master; a communication step in which the master receives the measurement result from the slave, associates it with information identifying the transmission source slave, and stores it in the aggregation means of the master as an aggregated result; a determination step in which the master generates instruction information for the slave on the basis of the aggregated results stored in the aggregation means and transmits it to the slave; a second reporting step in which the slave receives the instruction information from the master; and a control step in which the slave controls its own operation on the basis of the instruction information received in the second reporting step.

In this way, the slave measures task execution using a measurement unit predetermined within the parallel distributed processing system and transmits the measurement result to the master, so the master can ascertain the processing efficiency of the slave without introducing an external monitoring computer or monitoring program. The master then issues instructions to the slave on the basis of the slave's processing efficiency, and the slave controls its operation according to those instructions, so the master can dynamically control the operation of the entire system on the basis of the processing efficiency of the slaves.

In the determination step, the master may compare, among the aggregated results stored in the aggregation means, the aggregated result per reception from the transmission source slave of the communication step with the aggregated results per reception from the slaves other than the transmission source slave, generate instruction information for the transmission source slave on the basis of the comparison result, and transmit it to the transmission source slave. In this case, the master can issue instructions to the target slave on the basis of a comparison between the target slave's aggregated result per reception and those of the other slaves, so it can ascertain the processing efficiency of the target slave more accurately and dynamically control the operation of the entire system more efficiently.
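The per-reception comparison described above could be sketched as follows. This is only an illustration under stated assumptions: the table layout, the function name `instruct_by_latest`, the `ratio` parameter, and the instruction values `"stop"` and `"continue"` are hypothetical and do not come from the source, which leaves the comparison rule and the content of the instruction information open.

```python
from statistics import mean

def instruct_by_latest(aggregated, source, ratio=0.5):
    """Compare the source slave's latest per-reception result with the other slaves'.

    `aggregated` maps a slave id to the list of measurement results received
    from that slave, one entry per reception (hypothetical layout).
    """
    own = aggregated[source][-1]
    others = [counts[-1] for sid, counts in aggregated.items()
              if sid != source and counts]
    if not others:
        return "continue"  # no other slave to compare against
    # Assumed policy: a slave far below the mean of the others is told to stop.
    return "stop" if own < ratio * mean(others) else "continue"

# Example table; the measurement unit is assumed to be "number of tasks processed".
aggregated = {
    "slave-a": [10, 12, 11],
    "slave-b": [9, 11, 10],
    "slave-c": [3, 2, 4],
}
print(instruct_by_latest(aggregated, "slave-c"))  # stop
```

Only the most recent reception of each slave enters the comparison here, so a single slow interval is enough to trigger the instruction.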

In the communication step, the master may further associate with the measurement result the time elapsed since the previous measurement result was received from the transmission source slave, and store it in the aggregation means of the master as part of the aggregated result. In the determination step, the master may then compare, among the aggregated results stored in the aggregation means, the per-unit-time value of the aggregated result of one reception from the transmission source slave with the per-unit-time values of the aggregated results of one reception from the slaves other than the transmission source slave, generate instruction information for the transmission source slave on the basis of the comparison result, and transmit it to the transmission source slave. In this case, the master can issue instructions to the target slave on the basis of a comparison between the per-unit-time value of the target slave's aggregated result of one reception and those of the other slaves, so it can ascertain the processing efficiency of the target slave more accurately and dynamically control the operation of the entire system more efficiently.
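A minimal sketch of the per-unit-time variant follows, assuming the unit time is one second and the measurement unit is the number of tasks processed. The `Reception` record, the function names, the `ratio` parameter, and the instruction values are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Reception:
    count: int        # measurement result carried by one report
    elapsed_s: float  # time since the previous report from the same slave

def rate(r):
    """Per-unit-time value of one reception (tasks per second in this sketch)."""
    return r.count / r.elapsed_s if r.elapsed_s > 0 else 0.0

def instruct_by_rate(latest, source, ratio=0.5):
    """Compare the source slave's latest rate with the mean rate of the others."""
    own = rate(latest[source])
    others = [rate(r) for sid, r in latest.items() if sid != source]
    if not others:
        return "continue"
    # Assumed policy: stop a slave whose rate is below `ratio` times the others' mean.
    return "stop" if own < ratio * (sum(others) / len(others)) else "continue"

latest = {
    "slave-a": Reception(count=10, elapsed_s=1.0),  # 10 tasks/s
    "slave-b": Reception(count=10, elapsed_s=5.0),  # 2 tasks/s
    "slave-c": Reception(count=8, elapsed_s=1.0),   # 8 tasks/s
}
print(instruct_by_rate(latest, "slave-b"))  # stop
```

In this example slave-a and slave-b report the same count, and only the elapsed time reveals that slave-b is five times slower, which is the situation the elapsed-time association is meant to expose.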

In the determination step, the master may compare, among the aggregated results stored in the aggregation means, the average of all aggregated results of the transmission source slave with the average of all aggregated results of the slaves other than the transmission source slave, generate instruction information for the transmission source slave on the basis of the comparison result, and transmit it to the transmission source slave. In this case, the master can issue instructions to the target slave on the basis of a comparison between the average of all of the target slave's aggregated results and the average of all of the other slaves' aggregated results, so it can ascertain the processing efficiency of the target slave more accurately and dynamically control the operation of the entire system more efficiently.
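The averaged comparison can be illustrated with the sketch below. It assumes the other slaves' results are pooled into a single average; the function name `instruct_by_average`, the `ratio` parameter, and the instruction values are hypothetical.

```python
from statistics import mean

def instruct_by_average(aggregated, source, ratio=0.5):
    """Compare the average of the source slave's results with the others' average.

    `aggregated[source]` is assumed to be non-empty, since the source slave
    has just reported a measurement result.
    """
    own_avg = mean(aggregated[source])
    other_values = [c for sid, counts in aggregated.items()
                    if sid != source for c in counts]
    if not other_values:
        return "continue"
    # Assumed policy: stop a slave whose average is below `ratio` times the others'.
    return "stop" if own_avg < ratio * mean(other_values) else "continue"

history = {
    "slave-a": [10, 12, 2],   # one slow interval, average 8
    "slave-b": [11, 10, 12],
    "slave-c": [3, 4, 2],     # consistently slow, average 3
}
print(instruct_by_average(history, "slave-a"))  # continue
print(instruct_by_average(history, "slave-c"))  # stop
```

Averaging over all receptions smooths out a single slow interval (slave-a) while still flagging a slave that is slow throughout (slave-c).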

In the communication step, the master may further associate with the measurement result the time elapsed since the previous measurement result was received from the transmission source slave, and store it in the aggregation means of the master as part of the aggregated result. In the determination step, the master may then compare, among the aggregated results stored in the aggregation means, the average of the per-unit-time values of all aggregated results of the transmission source slave with the average of the per-unit-time values of all aggregated results of the slaves other than the transmission source slave, generate instruction information for the transmission source slave on the basis of the comparison result, and transmit it to the transmission source slave. In this case, the master can issue instructions to the target slave on the basis of a comparison between these two averages, so it can ascertain the processing efficiency of the target slave more accurately and dynamically control the operation of the entire system more efficiently.

In the determination step, the master may perform statistical processing on the aggregated results per reception from the slaves in the communication step, taken over all slaves, among the aggregated results stored in the aggregation means, generate instruction information for the transmission source slave on the basis of the result of the statistical processing, and transmit it to the transmission source slave. In this case, the master can issue instructions to a slave on the basis of statistics computed over the per-reception aggregated results of all slaves, so it can ascertain the processing efficiency of the slaves more accurately and dynamically control the operation of the entire system more efficiently.
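The source does not say which statistical processing is used. As one possible illustration, the sketch below assumes a z-score test against the mean and population standard deviation of the latest per-reception results of all slaves; the function name `slow_outliers` and the cutoff `k` are hypothetical.

```python
from statistics import mean, pstdev

def slow_outliers(latest, k=1.5):
    """Return the ids of slaves whose latest result lies more than k standard
    deviations below the mean over all slaves (an assumed choice of statistic)."""
    values = list(latest.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all slaves report the same value; nothing stands out
    return sorted(sid for sid, v in latest.items() if (v - mu) / sigma < -k)

latest_counts = {"a": 10, "b": 11, "c": 9, "d": 10, "e": 2}
print(slow_outliers(latest_counts))  # ['e']
```

A master built along these lines could send a stop or slow-down instruction to each id returned, but that mapping is a design choice outside what the text specifies.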

In the communication step, the master may further associate with the measurement result the time elapsed since the previous measurement result was received from the transmission source slave, and store it in the aggregation means of the master as part of the aggregated result. In the determination step, the master may then perform statistical processing on the per-unit-time values of the aggregated results of one reception from the slaves in the communication step, taken over all slaves, among the aggregated results stored in the aggregation means, generate instruction information for the transmission source slave on the basis of the result of the statistical processing, and transmit it to the transmission source slave. In this case, the master can issue instructions to a slave on the basis of statistics computed over the per-unit-time values of all slaves, so it can ascertain the processing efficiency of the slaves more accurately and dynamically control the operation of the entire system more efficiently.

In the determination step, when the average over all slaves of the aggregated results per reception from the slaves in the communication step, among the aggregated results stored in the aggregation means, exceeds a predetermined threshold, the master may generate instruction information instructing a slave to reduce the load on the master and transmit it to that slave. In this case, the master can detect, for example, that more slaves are connected than it can handle, and the number of connected slaves can be controlled dynamically according to the master's capacity or conditions set according to that capacity.
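The threshold check described above reduces to a few lines. In this sketch the function name, the instruction value `"reduce_master_load"`, and the example numbers are hypothetical; what a slave does on receiving such an instruction is not specified in this passage.

```python
def master_load_instruction(latest, threshold):
    """Return an instruction when the average per-reception result over all
    slaves exceeds the predetermined threshold, otherwise None."""
    if not latest:
        return None
    avg = sum(latest.values()) / len(latest)
    # "Exceeds" is read strictly: an average equal to the threshold does not trigger.
    return "reduce_master_load" if avg > threshold else None

print(master_load_instruction({"a": 120, "b": 100}, threshold=100))  # reduce_master_load
```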

Incidentally, the invention relating to the parallel distributed processing method can also be regarded as an invention of a system, which provides the same operation and effects. The system invention can be described as follows.

A parallel distributed processing system according to one aspect of the present invention comprises one or more slaves and a master connected to the slaves via a network, and processes tasks on the slaves in a parallel and distributed manner. Each slave includes: processing means for executing a task assigned to the slave, measuring the execution of the task using a measurement unit predetermined within the parallel distributed processing system, and storing the measurement result in the storage means of the slave; reporting means for transmitting the measurement result stored in the storage means of the slave to the master and receiving instruction information from the master as a response to that transmission; and control means for controlling the operation of the slave on the basis of the instruction information received by the reporting means. The master includes: communication means for receiving the measurement result from the slave, associating it with information identifying the transmission source slave, and storing it in the aggregation means of the master as an aggregated result; and determination means for generating instruction information for the slave on the basis of the aggregated results stored in the aggregation means and transmitting it to the slave.

According to the present invention, in a master/slave system, the master can ascertain the processing efficiency of the slaves without introducing an external monitoring computer or monitoring program.

FIG. 1 is a conceptual diagram of a parallel distributed processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a slave according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a master according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the hardware configuration of the slave and the master according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the processing operation of the parallel distributed processing method in the slave according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the processing operation (part 1) of the parallel distributed processing method in the master according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the processing operation (part 2) of the parallel distributed processing method in the master according to an embodiment of the present invention.
FIG. 8 is a flowchart showing the processing operation (part 3) of the parallel distributed processing method in the master according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example (part 1) of the table data of the aggregation unit according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example (part 2) of the table data of the aggregation unit according to an embodiment of the present invention.
FIG. 11 is a diagram showing an example (part 3) of the table data of the aggregation unit according to an embodiment of the present invention.
FIG. 12 is a diagram showing an example (part 4) of the table data of the aggregation unit according to an embodiment of the present invention.
FIG. 13 is a diagram showing an example (part 5) of the table data of the aggregation unit according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description of the drawings, identical or equivalent elements are denoted by the same reference numerals, and redundant description is omitted.

(Overview of parallel distributed processing system)
First, an overall picture of a parallel distributed processing system according to an embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a diagram showing an overview of a parallel distributed processing system 4. The parallel distributed processing system 4 includes one or more slaves 1 and a master 2. In the parallel distributed processing system 4, each slave 1 and the master 2 can communicate with each other via a network 3.

The slaves 1 and the master 2 are each server devices, but are not limited thereto; each may instead be a PC (Personal Computer), a notebook PC, a mobile communication terminal such as a mobile phone, or the like. The network 3 is configured from, for example, the Internet or a mobile communication network, but is not limited to these.

The parallel distributed processing system 4 is a so-called master/slave cluster (a set of computers). In the parallel distributed processing system 4, parallel distributed processing is performed by distributing a large number of tasks (units of processing), each of which is small and uniform, among the one or more slaves 1. The master 2 manages all of the slaves 1.

(Configuration of slave 1)
FIG. 2 is a diagram showing the configuration of the slave 1 according to an embodiment of the present invention. The slave 1 shown in FIG. 2 includes a processing unit 10 (processing means), a storage unit 11 (storage means), a reporting unit 12 (reporting means), and a control unit 13 (control means).

FIG. 4 shows an example of the hardware configuration of the slave 1. As hardware, the slave 1 includes a CPU 50, a RAM 51, a ROM 52, an input unit 53 composed of a keyboard, a numeric keypad, or the like, a communication unit 54 that communicates with the outside, an auxiliary storage device 55, and a component 56 composed of a display or the like. The functions of the functional blocks of the slave 1 described above are realized by loading programs and data into the RAM 51 or the like and executing the programs under the control of the CPU 50. The master 2 described later has the same hardware configuration as that shown in FIG. 4.

The functional blocks of the slave 1 shown in FIG. 2 are described below. The processing unit 10 executes the tasks assigned to the slave 1, measures the execution of those tasks using a measurement unit predetermined within the parallel distributed processing system 4, and stores the measurement result in the storage unit 11. The processing unit 10 repeats this operation until no tasks assigned to the slave 1 remain.

The measurement unit must be agreed in advance among all slaves 1 and the master 2 in the parallel distributed processing system 4. Any unit that can be measured (or counted) identically on all slaves 1 may be used, for example “number of tasks processed” or “number of seconds”. In this embodiment, the measurement unit is “number of tasks processed”. The embodiment targets situations in which the total number of tasks is very large and the individual tasks are small and uniform.

The processing unit 10 stores the measurement result in the storage unit 11 at fixed intervals. The interval may be a fixed length of time or a fixed number of processed tasks, but it must be agreed in advance among all slaves 1 and the master 2 in the parallel distributed processing system 4. After storing the measurement result in the storage unit 11, the processing unit 10 resets the measurement result it holds to zero.

The processing unit 10 may also decide, when it is about to store the measurement result in the storage unit 11, whether to store it. Examples of decision criteria include “store if at least one second has elapsed since the previous store” and “store each time one task is processed”. The criterion must be the same on all slaves 1 in the parallel distributed processing system 4; it may be fixed in the program or configured through a settings file or the like. In this embodiment, the processing unit 10 uses the criterion “store each time one task is processed” to decide whether to store the measurement result in the storage unit 11.

格納部１１には、処理部１０により計測結果が格納される。格納部１１では、計測結果が格納される度に、格納される計測結果を上書きすることなく、それまでに格納済みの計測結果に加算する。 The measurement result is stored in the storage unit 11 by the processing unit 10. Each time the measurement result is stored in the storage unit 11, the stored measurement result is added to the previously stored measurement result without being overwritten.
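The add-without-overwriting behaviour of the storage unit 11, together with the extract-and-reset behaviour that the reporting unit 12 applies to it, can be sketched as follows. This is only a minimal illustration: the class name `Storage`, its method names, and the use of a lock are assumptions and do not appear in the specification.

```python
import threading


class Storage:
    """Hypothetical stand-in for the storage unit 11."""

    def __init__(self):
        # Assumption: a lock guards the counter, because the processing
        # unit and the reporting unit run on different threads.
        self._lock = threading.Lock()
        self._count = 0

    def add(self, measured):
        """Add a new measurement result to what is already stored."""
        with self._lock:
            self._count += measured

    def take(self):
        """Return the stored measurement result and reset it to zero."""
        with self._lock:
            taken, self._count = self._count, 0
            return taken


storage = Storage()
storage.add(1)           # one task processed
storage.add(1)           # a second task: added, not overwritten
first = storage.take()   # the accumulated value is extracted
second = storage.take()  # nothing has been stored since the reset
```

With the measurement unit set to the number of processed tasks, each call to `add` corresponds to one storing operation by the processing unit 10.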

報告部１２は、格納部１１に格納された計測結果を取り出し、取り出した計測結果をマスター２に送信する。そして、報告部１２は、当該送信の応答としてマスター２から指示情報を受信し、受信した指示情報を制御部１３に出力する。 The reporting unit 12 extracts the measurement result stored in the storage unit 11 and transmits the extracted measurement result to the master 2. The reporting unit 12 then receives instruction information from the master 2 as a response to the transmission, and outputs the received instruction information to the control unit 13.

報告部１２は、格納部１１に格納された計測結果を取り出すと、格納部１１の計測結果をゼロにリセットする。また、報告部１２は、取り出した計測結果をマスター２に送信する際に、取り出した計測結果をマスター２に報告すべきかどうか、つまり、計測結果が「１」以上であるかどうかを判断してもよい。この場合、報告部１２は、計測結果がゼロである際は、報告すべき計測結果が無いため待機し、報告すべき計測結果がある際は、取り出した計測結果を実際にマスター２に送信する。また、報告部１２は、取り出した計測結果をマスター２に送信する際に、従来の並列分散処理技術における、スレーブからマスターへの処理結果の逐次報告と一緒に送信してもよい。 When the reporting unit 12 extracts the measurement result stored in the storage unit 11, it resets the measurement result in the storage unit 11 to zero. When transmitting the extracted measurement result to the master 2, the reporting unit 12 may also determine whether the extracted measurement result should be reported to the master 2, that is, whether the measurement result is “1” or more. In this case, when the measurement result is zero, the reporting unit 12 waits because there is no measurement result to report; when there is a measurement result to report, it actually transmits the extracted measurement result to the master 2. Further, the reporting unit 12 may transmit the extracted measurement result together with the sequential reports of processing results from slave to master used in conventional parallel distributed processing techniques.

制御部１３は、報告部１２から出力された指示情報を取得し、当該指示情報に基づいてスレーブ１の動作を制御する。指示情報の具体例としては、「特に何もしない」、「処理部１０による処理が実行されている主スレッド、及び報告部１２による処理が実行されている別スレッドの動作を即刻停止する」「処理部１０が計測結果を格納部１１に格納する際に用いる判断基準を変更する」等が挙げられる。 The control unit 13 acquires the instruction information output from the reporting unit 12 and controls the operation of the slave 1 based on that instruction information. Specific examples of the instruction information include “do nothing in particular”, “immediately stop the main thread, on which the processing by the processing unit 10 is executed, and the other thread, on which the processing by the reporting unit 12 is executed”, and “change the determination criterion that the processing unit 10 uses when storing the measurement result in the storage unit 11”.
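As a rough illustration of how the control unit 13 might act on the three kinds of instruction information listed above, the following sketch dispatches on an instruction value. The names `Instruction`, `SlaveState`, and `apply_instruction`, and the representation of the determination criterion as "store every N tasks", are assumptions made for this example; the specification does not define a wire format for instruction information.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Instruction(Enum):
    """Hypothetical encoding of the instruction information."""
    NOOP = auto()              # "do nothing in particular"
    STOP = auto()              # immediately stop both threads
    CHANGE_CRITERION = auto()  # change the storing criterion


@dataclass
class SlaveState:
    running: bool = True
    store_every_n_tasks: int = 1  # criterion used in this embodiment


def apply_instruction(state, instruction, new_every_n=None):
    """Update the slave state according to the received instruction."""
    if instruction is Instruction.STOP:
        state.running = False
    elif instruction is Instruction.CHANGE_CRITERION:
        if new_every_n is None or new_every_n < 1:
            raise ValueError("a positive task count is required")
        state.store_every_n_tasks = new_every_n
    # Instruction.NOOP leaves the state untouched.
    return state


state = SlaveState()
apply_instruction(state, Instruction.NOOP)
unchanged = (state.running, state.store_every_n_tasks)
apply_instruction(state, Instruction.CHANGE_CRITERION, new_every_n=5)
apply_instruction(state, Instruction.STOP)
```

In a real slave, the `running` flag would be checked by both threads so that a stop instruction ends them promptly.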

（マスター２の構成）
図３は、本発明の一実施形態に係るマスター２の構成を示す図である。図３に示すマスター２は、通信部２０（通信手段）、集計部２１（集計手段）、判断部２２（判断手段）、及び表示部２３を含んで構成される。 (Configuration of Master 2)
FIG. 3 is a diagram showing the configuration of the master 2 according to an embodiment of the present invention. The master 2 shown in FIG. 3 includes a communication unit 20 (communication means), a totaling unit 21 (totaling means), a determination unit 22 (determination means), and a display unit 23.

以下、図３に示すマスター２の各機能ブロックについて説明する。通信部２０は、スレーブ１から計測結果を受信し、当該計測結果と送信元の当該スレーブ１を識別する情報とを関連付け、集計結果として集計部２１に格納する。スレーブ１を識別する情報の具体例としては、スレーブ１のＩＰアドレス等が挙げられる。通信部２０は、送信元のスレーブ１からの前回の計測結果の受信からの経過時間、又は送信元のスレーブ１からの計測結果の受信時刻を更に関連付け、集計結果として集計部２１に格納してもよい。 Hereinafter, each functional block of the master 2 shown in FIG. 3 will be described. The communication unit 20 receives a measurement result from a slave 1, associates the measurement result with information identifying the transmission source slave 1, and stores them in the totaling unit 21 as a totaling result. A specific example of the information identifying the slave 1 is the IP address of the slave 1. The communication unit 20 may further associate either the time elapsed since the previous measurement result was received from the transmission source slave 1 or the time at which the measurement result was received from the transmission source slave 1, and store it in the totaling unit 21 as part of the totaling result.

集計部２１には、通信部２０により、集計結果が格納される。集計部２１では、例えば、集計結果に含まれるスレーブ１を識別する情報ごとに集計結果を格納及び集計するようにしてもよい。 The totaling result is stored in the totaling unit 21 by the communication unit 20. For example, the counting unit 21 may store and count the counting result for each piece of information for identifying the slave 1 included in the counting result.
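The per-slave storing and totaling described above can be sketched as a mapping from the identifying information to a history of measurement results. The class name `Totals`, its methods, and the example IP addresses are assumptions chosen for illustration; the specification only requires that each result be associated with information identifying its transmission source.

```python
from collections import defaultdict
from statistics import mean


class Totals:
    """Hypothetical stand-in for the totaling unit 21."""

    def __init__(self):
        # source ID -> list of (measurement result, reception time or None)
        self._records = defaultdict(list)

    def store(self, source_id, count, received_at=None):
        """Store one measurement result under its transmission source."""
        self._records[source_id].append((count, received_at))

    def counts(self, source_id):
        """Return the history of measurement results for one slave."""
        return [count for count, _ in self._records.get(source_id, [])]

    def average(self, source_id):
        """Average measurement result per reception for one slave."""
        return mean(self.counts(source_id))


totals = Totals()
totals.store("192.0.2.10", 10)  # slave identified by its IP address
totals.store("192.0.2.10", 12)
totals.store("192.0.2.11", 5)
```

Keeping the full history rather than only the latest value makes it possible to use averages, as the first processing example of the master 2 does.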

判断部２２は、集計部２１に格納された集計結果に基づいて、スレーブ１、マスター２及び並列分散処理システム４の状況や当該状況に伴う予測や指示等を判断し、判断結果に基づいてスレーブ１に対する指示情報を生成し、当該スレーブ１に送信する。判断内容としては、処理効率の低いスレーブ１の検出、マスター２自身の負荷の程度、又は並列分散処理システム４全体の処理効率から何らかの予測を行うこと等が挙げられる。以下では、判断部２２の様々な処理パターンについて列挙する。これらの処理パターンの具体例については、後述のマスター２の各種実施形態の処理にて説明する。 Based on the totaling results stored in the totaling unit 21, the determination unit 22 determines the status of the slaves 1, the master 2, and the parallel distributed processing system 4, as well as predictions, instructions, and the like that follow from that status; it then generates instruction information for a slave 1 based on the determination result and transmits it to that slave 1. Examples of what is determined include detecting a slave 1 with low processing efficiency, assessing the degree of load on the master 2 itself, and making some prediction from the processing efficiency of the entire parallel distributed processing system 4. Various processing patterns of the determination unit 22 are listed below. Specific examples of these processing patterns are described later in the processing of the various embodiments of the master 2.

判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、通信部２０における送信元のスレーブ１からの一受信当たりの集計結果と、通信部２０における送信元のスレーブ１以外のスレーブ１からの一受信当たりの集計結果とを比較し、比較結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。また、判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、通信部２０における送信元のスレーブ１からの一受信の集計結果の単位時間当たりの値と、通信部２０における送信元のスレーブ１以外のスレーブ１からの一受信の集計結果の単位時間当たりの値とを比較し、比較結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。 In the master 2, the determination unit 22 may compare, among the totaling results stored in the totaling unit 21, the totaling result per reception by the communication unit 20 from the transmission source slave 1 with the totaling result per reception by the communication unit 20 from the slaves 1 other than the transmission source slave 1, generate instruction information for the transmission source slave 1 based on the comparison result, and transmit it to the transmission source slave 1. The determination unit 22 may also compare the per-unit-time value of the totaling result of one reception from the transmission source slave 1 with the per-unit-time value of the totaling result of one reception from the slaves 1 other than the transmission source slave 1, generate instruction information for the transmission source slave 1 based on the comparison result, and transmit it to the transmission source slave 1.

また、判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、送信元のスレーブ１の全ての集計結果の平均値と、送信元のスレーブ１以外のスレーブ１の全ての集計結果の平均値とを比較し、比較結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。また、判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、送信元のスレーブ１の全ての集計結果の単位時間当たりの値の平均値と、送信元のスレーブ１以外のスレーブ１の全ての集計結果の単位時間当たりの値の平均値とを比較し、比較結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。 In the master 2, the determination unit 22 may also compare, among the totaling results stored in the totaling unit 21, the average value of all the totaling results of the transmission source slave 1 with the average value of all the totaling results of the slaves 1 other than the transmission source slave 1, generate instruction information for the transmission source slave 1 based on the comparison result, and transmit it to the transmission source slave 1. Likewise, the determination unit 22 may compare the average of the per-unit-time values of all the totaling results of the transmission source slave 1 with the average of the per-unit-time values of all the totaling results of the slaves 1 other than the transmission source slave 1, generate instruction information for the transmission source slave 1 based on the comparison result, and transmit it to the transmission source slave 1.

また、判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、通信部２０におけるスレーブ１からの一受信当たりの集計結果の、全てのスレーブ１に関する集計結果に対して統計処理を行い、統計処理結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。また、判断部２２は、マスター２が、集計部２１に格納された集計結果のうち、通信部２０におけるスレーブ１からの一受信の集計結果の単位時間当たりの値の、全てのスレーブ１に関する値に対して統計処理を行い、統計処理結果に基づいて送信元のスレーブ１に対する指示情報を生成し、送信元のスレーブ１に送信してもよい。 In addition, the determination unit 22 performs statistics on the aggregation results for all slaves 1 of the aggregation results per reception from the slave 1 in the communication unit 20 among the aggregation results stored in the aggregation unit 21. Processing may be performed to generate instruction information for the transmission source slave 1 based on the statistical processing result and transmit the instruction information to the transmission source slave 1. In addition, the determination unit 22 is a value related to all the slaves 1 among the totaling results stored in the totaling unit 21 by the master 2 as a result of the totaling result of one reception from the slave 1 in the communication unit 20. May be subjected to statistical processing, instruction information for the transmission source slave 1 may be generated based on the statistical processing result, and transmitted to the transmission source slave 1.

また、判断部２２は、集計部２１に格納された集計結果のうち、通信部２０におけるスレーブ１からの一受信当たりの集計結果の、全てのスレーブ１に関する平均値が、予め定められた閾値を超えた場合に、マスター２が、当該マスターの負荷を下げる旨のスレーブ１に対する指示情報を生成し、当該スレーブ１に送信してよい。 In addition, the determination unit 22 determines that the average value for all slaves 1 of the total results per reception from the slave 1 in the communication unit 20 among the total results stored in the total unit 21 is a predetermined threshold value. When it exceeds, the master 2 may generate instruction information for the slave 1 to reduce the load on the master and transmit the instruction information to the slave 1.

表示部２３は、並列分散処理システム４のユーザに対して、並列分散処理システム４に関する情報を表示する。 The display unit 23 displays information related to the parallel distributed processing system 4 to the user of the parallel distributed processing system 4.

（スレーブ１の処理の説明）
続いて、スレーブ１における並列分散処理方法の処理の手順を、図５を参照して説明する。まず、スレーブ１は、タスクが割り当てられると、主スレッドともう一つの別スレッドを開始する。主スレッド上では処理部１０による処理、具体的にはステップ１０１〜１０４が実行される。一方、別スレッド上では報告部１２による処理、具体的にはステップ２０１〜２０５が実行される。 (Description of slave 1 processing)
Next, the processing procedure of the parallel distributed processing method in the slave 1 will be described with reference to FIG. First, when a task is assigned, the slave 1 starts a main thread and another thread. On the main thread, processing by the processing unit 10, specifically, steps 101 to 104 are executed. On the other hand, the processing by the reporting unit 12, specifically, steps 201 to 205 are executed on another thread.

はじめに、スレーブ１の主スレッド上での処理について説明する。まず、処理部１０は、スレーブ１に割り当てられたタスクがまだ残っているかどうかの判断を行う（ステップ１０１）。ステップ１０１にて、タスクが残っていない場合、スレーブ１の動作は終了となり、主スレッドと別スレッドの両方が停止する。一方、ステップ１０１にて、タスクが残っている場合、処理部１０は、タスクを一つ実行し、その傍らで予め定められた計測単位を用いてタスクの実行を計測する（ステップ１０２）。次に、処理部１０は、計測結果を格納部１１へ格納するかどうかを判断する（ステップ１０３）。ステップ１０３にて、格納部１１へ格納すると判断した場合、処理部１０は、計測した計測結果を格納部１１へ格納し、処理部１０の計測結果をゼロにリセットする（ステップ１０４）。なお、格納部１１に以前の計測結果が残っていた場合、上書きすることなく、今回の計測結果を加算する。ステップ１０３にて、格納部１１へ格納しないと判断した場合、ステップ１０１に進む。スレーブ１の主スレッドは、以上のステップ１０１〜１０４を、スレーブ１に割り当てられたタスクがなくなるまで繰り返す。 First, processing on the main thread of the slave 1 will be described. First, the processing unit 10 determines whether there are still tasks assigned to the slave 1 (step 101). If no task remains in step 101, the operation of the slave 1 is terminated, and both the main thread and another thread are stopped. On the other hand, if a task remains in step 101, the processing unit 10 executes one task and measures the execution of the task using a predetermined measurement unit beside the task (step 102). Next, the processing unit 10 determines whether or not to store the measurement result in the storage unit 11 (step 103). When it is determined in step 103 that the data is stored in the storage unit 11, the processing unit 10 stores the measured measurement result in the storage unit 11, and resets the measurement result of the processing unit 10 to zero (step 104). If the previous measurement result remains in the storage unit 11, the current measurement result is added without being overwritten. If it is determined in step 103 that the data is not stored in the storage unit 11, the process proceeds to step 101. The main thread of slave 1 repeats the above steps 101 to 104 until there is no task assigned to slave 1.

続いて、スレーブ１の別スレッド上での処理について説明する。まず、報告部１２は、格納部１１から計測結果を取り出し、格納部１１の計測結果をゼロにリセットする（ステップ２０１）。次に、報告部１２は、取り出した計測結果をマスター２に報告すべきかどうか、つまり、計測結果が「１」以上であるかどうかを判断する（ステップ２０２）。ステップ２０２にて、計測結果がゼロであった場合は、報告すべき計測結果が無いため、判断結果は「ＮＯ」となり、ステップ２０１に進む。ステップ２０２にて、報告すべき計測結果がある場合は、報告部１２は、計測結果をマスター２へ送信する（ステップ２０３）。次に、報告部１２は、マスター２へ送信した計測結果に対する応答として、マスター２からの指示を受信する（ステップ２０４）。次に、報告部１２は、マスター２から受信した指示を制御部１３へ引き渡す（ステップ２０５）。指示を受け取った制御部１３は、指示の内容に従って当該スレーブ１の動作を制御する。スレーブ１の別スレッドは、主スレッドが終了するまで、以上のステップ２０１〜２０５を繰り返す。 Next, processing on another thread of the slave 1 will be described. First, the reporting unit 12 retrieves the measurement result from the storage unit 11 and resets the measurement result in the storage unit 11 to zero (step 201). Next, the reporting unit 12 determines whether or not the taken measurement result should be reported to the master 2, that is, whether or not the measurement result is “1” or more (step 202). If the measurement result is zero in step 202, the determination result is “NO” because there is no measurement result to be reported, and the process proceeds to step 201. If there is a measurement result to be reported in step 202, the reporting unit 12 transmits the measurement result to the master 2 (step 203). Next, the reporting unit 12 receives an instruction from the master 2 as a response to the measurement result transmitted to the master 2 (step 204). Next, the reporting unit 12 hands over the instruction received from the master 2 to the control unit 13 (step 205). Upon receiving the instruction, the control unit 13 controls the operation of the slave 1 according to the content of the instruction. Another thread of the slave 1 repeats the above steps 201 to 205 until the main thread is finished.
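The two loops of the slave 1 (steps 101 to 104 and steps 201 to 205) can be sketched with two threads sharing a counter. This is a simplified illustration under several assumptions: the names `Storage`, `main_loop`, `report_loop`, and `run_slave` are invented here, the master is replaced by a local function, a worker thread stands in for the main thread, and the short sleep when there is nothing to report is an implementation choice, not something the specification prescribes.

```python
import threading
import time


class Storage:
    """Hypothetical storage unit 11: an accumulating, resettable counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def add(self, measured):
        with self._lock:
            self._count += measured

    def take(self):
        with self._lock:
            taken, self._count = self._count, 0
            return taken


def main_loop(tasks, storage, done):
    for task in tasks:       # step 101: tasks remaining?
        task()               # step 102: execute one task and measure it
        storage.add(1)       # steps 103-104: store after every task
    done.set()               # no tasks left: the slave finishes


def report_loop(storage, send_to_master, control, done):
    while not done.is_set():
        count = storage.take()               # step 201: extract and reset
        if count == 0:                       # step 202: nothing to report
            time.sleep(0.001)                # assumption: brief pause
            continue
        instruction = send_to_master(count)  # steps 203-204
        control(instruction)                 # step 205


def run_slave(tasks, send_to_master, control):
    storage, done = Storage(), threading.Event()
    reporter = threading.Thread(
        target=report_loop, args=(storage, send_to_master, control, done))
    worker = threading.Thread(target=main_loop, args=(tasks, storage, done))
    reporter.start()
    worker.start()
    worker.join()
    reporter.join()
    return storage.take()  # counts stored after the last report, if any


reports = []


def fake_master(count):
    """Stand-in for the master 2: record the report, answer 'do nothing'."""
    reports.append(count)
    return "noop"


leftover = run_slave([lambda: None] * 100, fake_master, lambda instr: None)
```

How many tasks end up in each report depends on how long the round trip to the master takes relative to the task processing speed, which is the quantity the master later evaluates.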

ここで、スレーブ１における主スレッドと別スレッドは互いに非同期に動作する。主スレッドは、別スレッドの動作とは無関係に、ベストエフォートでタスクの処理、計測、及び計測結果の格納を繰り返す。別スレッドも、主スレッドの動作とは無関係に、計測結果の取り出し、マスターへの計測結果の送信、マスターからの指示の受信、制御部１３への指示の受け渡し、という一連の動作を、ベストエフォートで主スレッドが終了するまで繰り返す。ところで、並列分散処理システム４全体には１台以上のスレーブ１が存在するが、スレーブ１の台数と、ある一つのスレーブ１の主スレッドの処理効率との間には何ら因果関係は無く、無関係である。一方、スレーブ１の台数と、ある一つのスレーブ１の別スレッドの効率には因果関係がある。なぜならば、スレーブ１の台数が増えると、マスター２が受信及び応答すべき報告の送信元スレーブ１が増えるため、マスター２の負荷が高くなり、ある一つのスレーブ１から見ると、マスター２からの応答が返ってくるまでの時間（以下、応答時間と呼ぶ）が長くなるためである。本実施形態のポイントは、スレーブ１における主スレッドと別スレッドの効率の比率を利用する、というところにある。 Here, the main thread and the other thread in the slave 1 operate asynchronously with each other. The main thread repeats task processing, measurement, and storage of the measurement result on a best-effort basis, regardless of the operation of the other thread. The other thread, likewise regardless of the operation of the main thread, repeats on a best-effort basis the sequence of extracting the measurement result, transmitting it to the master, receiving an instruction from the master, and passing the instruction to the control unit 13, until the main thread ends. The parallel distributed processing system 4 as a whole contains one or more slaves 1, but there is no causal relationship between the number of slaves 1 and the processing efficiency of the main thread of any one slave 1. On the other hand, there is a causal relationship between the number of slaves 1 and the efficiency of the other thread of any one slave 1. This is because, as the number of slaves 1 increases, the number of transmission source slaves 1 whose reports the master 2 must receive and respond to increases, so the load on the master 2 rises and, seen from any one slave 1, the time until a response is returned from the master 2 (hereinafter referred to as the response time) becomes longer. The key point of this embodiment is that it uses the ratio between the efficiency of the main thread and that of the other thread in the slave 1.

（マスター２の処理の第１の例の説明）
続いて、マスター２による並列分散処理方法の処理の第１の例を、図６を参照して説明する。まず、通信部２０は、各スレーブ１からの計測結果を受信する（ステップ３０１）。次に、通信部２０は、時間情報を利用するか否かを判断する（ステップ３０２）。なお、以下の説明では、計測結果の送信元スレーブ１毎の最新の計測結果だけを用いるのではなく、送信元スレーブ１毎の計測結果の履歴を集計部２１に記憶し、その平均値を用いることとする。 (Description of the first example of processing of the master 2)
Next, a first example of the processing of the parallel distributed processing method performed by the master 2 will be described with reference to FIG. 6. First, the communication unit 20 receives a measurement result from each slave 1 (step 301). Next, the communication unit 20 determines whether to use time information (step 302). In the following description, rather than using only the latest measurement result of each transmission source slave 1, the history of measurement results for each transmission source slave 1 is stored in the totaling unit 21 and its average value is used.

ステップ３０２にて、時間情報を利用しないと判断した場合、通信部２０は、受信した計測結果と、当該計測結果の送信元のスレーブ１を識別する送信元ＩＤとを関連付け、集計結果として集計部２１に格納する（ステップ３０３）。なお、時間情報を利用しないケースは、全スレーブ１の処理能力が均一、例えば、全スレーブ１のハードウェア構成が均一で、かつ、各スレーブ１とマスター２との間の接続経路も均一、すなわち通信遅延の差異を無視できる状況において有効なケースである。 If it is determined in step 302 that the time information is not used, the communication unit 20 associates the received measurement result with the transmission source ID for identifying the slave 1 that is the transmission source of the measurement result, and calculates the totaling unit as a totaling result. 21 (step 303). In the case where the time information is not used, the processing capability of all slaves 1 is uniform, for example, the hardware configuration of all slaves 1 is uniform, and the connection path between each slave 1 and master 2 is also uniform. This is an effective case in a situation where the difference in communication delay can be ignored.

図９は、ステップ３０３の直後に、集計部２１に格納された集計結果のテーブル例を示す図である。なお、図９（及び後述の図１０〜図１３）において、計測単位は「処理したタスク数」である。図９に示すテーブル例から、送信元スレーブ１毎に計測結果の平均値を算出すると、送信元ＩＤがＡＡＡＡＡのスレーブ１（以降、ＡＡＡＡＡと呼ぶ。他の送信ＩＤのスレーブ１についても同様）、ＣＣＣＣＣ、及びＥＥＥＥＥの計測結果の平均値は「１０」であり、ＢＢＢＢＢの平均値は「５」、ＤＤＤＤＤの平均値は「１５」である。 FIG. 9 is a diagram showing an example of the table of totaling results stored in the totaling unit 21 immediately after step 303. In FIG. 9 (and in FIGS. 10 to 13 described later), the measurement unit is “number of processed tasks”. When the average value of the measurement results is calculated for each transmission source slave 1 from the table example shown in FIG. 9, the average for the slave 1 whose transmission source ID is AAAAA (hereinafter referred to as AAAAA; the same applies to the slaves 1 with other transmission source IDs), for CCCCC, and for EEEEE is “10”, the average for BBBBB is “5”, and the average for DDDDD is “15”.

ステップ３０３に続いて、判断部２２は、集計結果の平均値に関して、集計結果が異常であるか否かを判断する（ステップ３０４）。判断部２２は、例えばＢＢＢＢＢに関しては、次のような判断を行ってもよい。つまり、全スレーブ１とマスター２との間の接続経路が均一ならば、各スレーブ１から見た応答時間もほぼ均一なはずなのに、その間に処理できるタスク数が、ＢＢＢＢＢでは他の半分程度ということである。本実施形態では、各タスクは一様であることを前提とするため、ＢＢＢＢＢは何らかの原因により処理効率が低下していると判断できる。なぜなら、ハードウェア構成が均一である前提から、飛び抜けて低い値となるのはおかしいためである。一方、判断部２２は、例えばＤＤＤＤＤに関しては、次のような判断を行ってもよい。つまり、タスクは一様であり、かつ、ハードウェア構成が均一である前提から、ＤＤＤＤＤだけが飛び抜けて処理効率が高くなることも考え難い。つまり、何らかの原因、例えばネットワーク３接続の不調等により、ＤＤＤＤＤから見た応答時間が長くなっていると判断できる。なお「計数結果がいくつ以上（あるいは以下）ならば異常と見なすか」という判断のポリシーは、プログラム的に固定されていてもよいし、ポリシーファイル等で設定できるようになっていてもよい。 Subsequent to step 303, the determination unit 22 determines, with respect to the average value of the totaling results, whether the totaling result is abnormal (step 304). For BBBBB, for example, the determination unit 22 may reason as follows. If the connection paths between all the slaves 1 and the master 2 are uniform, the response time seen from each slave 1 should also be almost uniform, yet the number of tasks BBBBB processes in that time is about half that of the others. Since this embodiment assumes that the tasks are uniform, it can be determined that the processing efficiency of BBBBB has dropped for some reason, because a value far below the others is implausible when the hardware configuration is assumed to be uniform. For DDDDD, on the other hand, the determination unit 22 may reason as follows. Given that the tasks are uniform and the hardware configuration is uniform, it is also hard to imagine that the processing efficiency of DDDDD alone would be far higher than that of the others. It can therefore be determined that the response time seen from DDDDD has become longer for some reason, such as a malfunction of its connection to the network 3. Note that the policy for deciding “at or above (or at or below) what count a result is regarded as abnormal” may be fixed in the program or may be configurable through a policy file or the like.

判断部２２による以上のような判断を踏まえ、判断部２２が今回受信した計測結果がＢＢＢＢＢやＤＤＤＤＤからのものであった場合、ステップ３０４にて異常であると判断し（ステップ３０４の判断「ＹＥＳ」）、例えば当該スレーブ１の動作を即刻停止する等の相応の指示情報を生成して、当該スレーブ１に返信する（ステップ３０８）。判断部２２が今回受信した計測結果がＢＢＢＢＢやＤＤＤＤＤからのものでなかった場合（ステップ３０４の判断「ＮＯ」）は、例えば「何もしない」という指示情報等の正常応答を生成して、当該スレーブ１に返信する（ステップ３０７）。 Based on the above, if the measurement result received this time is from BBBBB or DDDDD, the determination unit 22 determines in step 304 that it is abnormal (determination “YES” in step 304), generates corresponding instruction information, for example an instruction to stop the operation of the slave 1 immediately, and returns it to the slave 1 (step 308). If the measurement result received this time is not from BBBBB or DDDDD (determination “NO” in step 304), the determination unit 22 generates a normal response, for example instruction information meaning “do nothing”, and returns it to the slave 1 (step 307).
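The determination of steps 304, 307, and 308 without time information can be sketched as a comparison between one slave's average and the average of the other slaves. The contents of FIG. 9 are not reproduced in this text, so the histories below are hypothetical values chosen only to yield the averages stated above (10, 5, 10, 15, 10). The 30% tolerance is likewise an assumed policy; the specification leaves the policy to the program or a policy file.

```python
from statistics import mean

# Hypothetical histories of measurement results (tasks per report).
history = {
    "AAAAA": [10, 10, 10],
    "BBBBB": [5, 5, 5],
    "CCCCC": [9, 10, 11],
    "DDDDD": [15, 15, 15],
    "EEEEE": [10, 10, 10],
}


def is_abnormal(source_id, history, tolerance=0.3):
    """Step 304: compare this slave's average with the others' average.

    Requires at least two slaves in `history`.
    """
    own = mean(history[source_id])
    others = mean(
        mean(counts) for sid, counts in history.items() if sid != source_id)
    return abs(own - others) > tolerance * others


def instruction_for(source_id, history):
    """Step 308 (stop) for abnormal slaves, step 307 (no-op) otherwise."""
    return "stop" if is_abnormal(source_id, history) else "noop"


flagged = sorted(sid for sid in history if is_abnormal(sid, history))
```

With these values, BBBBB is flagged for reporting far fewer tasks per report than the others and DDDDD for reporting far more, matching the two cases discussed in the text.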

なお、判断部２２は、特定のスレーブ１において何らかの異常が発生したと判断した場合、マスター２はその情報を記憶しておき、例えば、タスクのスレーブ１への割り当て（スケジューリング）を工夫する等、その後の並列分散処理システム４全体の制御に役立てたり、例えば、表示部２３により異常が発生したスレーブ１をユーザに報告する等、並列分散処理システム４のユーザに対して情報を提示したりすることができる。 When the determination unit 22 determines that some abnormality has occurred in the specific slave 1, the master 2 stores the information, for example, devise assignment (scheduling) of tasks to the slave 1, etc. Information is presented to the user of the parallel distributed processing system 4, for example, for the control of the entire parallel distributed processing system 4 thereafter, for example, reporting to the user the slave 1 in which an abnormality has occurred by the display unit 23 Can do.

続いて、ステップ３０２にて時間情報を利用すると判断したケースについて説明する。時間情報を利用すると判断したケースは、全スレーブ１の処理能力が均一で、かつ、各スレーブ１とマスター２との間の接続経路も均一であるという前提を置くことができない状況において有効なケースである。 Next, a case where it is determined in step 302 that time information is used will be described. The case where it is determined that the time information is used is effective in a situation where it is impossible to assume that the processing capability of all the slaves 1 is uniform and the connection path between each slave 1 and the master 2 is uniform. It is.

ステップ３０２にて、時間情報を利用すると判断した場合、通信部２０は、受信した計測結果と、当該計測結果の送信元のスレーブ１を識別する送信元ＩＤと、当該計測結果を受信した時刻とを関連付け、集計結果として集計部２１に格納する（ステップ３０５）。 When it is determined in step 302 that the time information is used, the communication unit 20 receives the measurement result, the transmission source ID for identifying the slave 1 that is the transmission source of the measurement result, and the time when the measurement result is received. Are stored in the totaling unit 21 as a totaling result (step 305).

図１０は、ステップ３０５の直後に、集計部２１に格納された集計結果のテーブル例を示す図である。図１０に示すテーブル例では、一見すると通信部２０による一受信当りの計測結果の平均値は全て「１０」となり、問題なさそうに見える。しかしながら、時間当り（以降の実施形態では「秒」を用いる）の計測結果の平均値を見ると、ＡＡＡＡＡ、ＢＢＢＢＢ、ＤＤＤＤＤ、ＥＥＥＥＥでは「２」なのに対し、ＣＣＣＣＣでは「１」となっていることがわかる。 FIG. 10 is a diagram showing an example of the table of totaling results stored in the totaling unit 21 immediately after step 305. In the table example shown in FIG. 10, the average values of the measurement results per reception by the communication unit 20 are all “10”, so at first glance there appears to be no problem. However, looking at the average values of the measurement results per unit of time (“seconds” are used in the following embodiments), the value is “2” for AAAAA, BBBBB, DDDDD, and EEEEE, whereas it is “1” for CCCCC.

ステップ３０５に続いて、判断部２２は、集計結果の平均値に関して、集計結果が異常であるか否かを判断する（ステップ３０６）。判断部２２は、ＣＣＣＣＣから計測結果を受信する周期は他のスレーブ１の倍であるが、これはＣＣＣＣＣのネットワーク接続が元来他よりも性能が劣るものであるためか、それとも、ＣＣＣＣＣのネットワーク接続に何か不調が発生したためか、判断することはできない。しかしながら、判断部２２は、ＣＣＣＣＣについて、単位時間あたりのタスクの処理効率が他よりも劣っていることを判断する。なお、判断部２２は、ＣＣＣＣＣの処理効率が元来他よりも劣っているためか、それとも、ＣＣＣＣＣに何か故障が発生したためかの判断はつかないが、とにかくＣＣＣＣＣの処理効率が他よりも劣っていることは判断する。 Subsequent to step 305, the determination unit 22 determines, with respect to the average value of the totaling results, whether the totaling result is abnormal (step 306). The interval at which measurement results are received from CCCCC is twice that of the other slaves 1, but the determination unit 22 cannot determine whether this is because the network connection of CCCCC is inherently slower than the others or because some malfunction has occurred in the network connection of CCCCC. Nevertheless, the determination unit 22 determines that the task processing efficiency of CCCCC per unit time is lower than that of the others. The determination unit 22 cannot tell whether the processing efficiency of CCCCC is inherently lower than that of the others or whether some failure has occurred in CCCCC, but in either case it determines that the processing efficiency of CCCCC is lower than that of the others.

判断部２２による以上のような判断を踏まえ、判断部２２が今回受信した計測結果がＣＣＣＣＣからのものであった場合、判断部２２は、ステップ３０６にて異常であると判断し（ステップ３０６の判断「ＹＥＳ」）、ステップ３０８に進み、ＣＣＣＣＣからのものでなかった場合、判断部２２は、ステップ３０６にて異常でないと判断し（ステップ３０６の判断「ＮＯ」）、ステップ３０７に進む。 Based on the above determination by the determination unit 22, if the measurement result received by the determination unit 22 is from the CCCCC, the determination unit 22 determines that there is an abnormality in step 306 (in step 306). If “No” (determination “YES”), the process proceeds to step 308, and if it is not from the CCCCC, the determination unit 22 determines that there is no abnormality in step 306 (determination “NO” in step 306) and proceeds to step 307.
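The time-based determination of steps 305 and 306 can be sketched by dividing each measurement result by the time elapsed since the previous reception. The contents of FIG. 10 are not reproduced in this text, so the records below are hypothetical values chosen to match the stated averages (10 per reception for every slave; 2 per second for most slaves and 1 per second for CCCCC). The function names and the 30% tolerance are assumptions.

```python
from statistics import mean

# Hypothetical records:
# source ID -> list of (tasks counted, seconds since previous reception)
records = {
    "AAAAA": [(10, 5.0), (10, 5.0)],
    "BBBBB": [(10, 5.0), (10, 5.0)],
    "CCCCC": [(10, 10.0), (10, 10.0)],
    "DDDDD": [(10, 5.0), (10, 5.0)],
    "EEEEE": [(10, 5.0), (10, 5.0)],
}


def per_reception_average(rows):
    """Average measurement result per reception."""
    return mean(count for count, _ in rows)


def per_second_average(rows):
    """Average of the per-second values of each reception."""
    return mean(count / elapsed for count, elapsed in rows)


def is_abnormal(source_id, records, tolerance=0.3):
    """Step 306: compare this slave's per-second rate with the others'."""
    own = per_second_average(records[source_id])
    others = mean(
        per_second_average(rows)
        for sid, rows in records.items() if sid != source_id)
    return abs(own - others) > tolerance * others


per_reception = {sid: per_reception_average(r) for sid, r in records.items()}
per_second = {sid: per_second_average(r) for sid, r in records.items()}
abnormal = [sid for sid in records if is_abnormal(sid, records)]
```

The per-reception averages alone would hide the problem here; only the per-second values separate CCCCC from the rest.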

（マスター２の処理の第２の例の説明）
続いて、マスター２による並列分散処理方法の処理の第２の例を、図７を参照して説明する。まず、通信部２０は、各スレーブ１から計測結果を受信する（ステップ４０１）。次に、通信部２０は、受信した計測結果と、当該計測結果の送信元のスレーブ１を識別する送信元ＩＤとを関連付け、集計結果として集計部２１に格納する（ステップ４０２）。図１１及び図１２はそれぞれ、ステップ４０２の直後に、集計部２１に格納された集計結果のテーブル例を示す図である。図１１に示すテーブル例では、スレーブ１の台数が５台、図１２に示すテーブル例では、スレーブ１の台数が１０台の場合の例である。 (Description of second example of processing of master 2)
Subsequently, a second example of the parallel distributed processing method performed by the master 2 will be described with reference to FIG. First, the communication unit 20 receives a measurement result from each slave 1 (step 401). Next, the communication unit 20 associates the received measurement result with the transmission source ID for identifying the slave 1 that is the transmission source of the measurement result, and stores it in the totaling unit 21 as the totaling result (step 402). FIG. 11 and FIG. 12 are diagrams showing examples of tabulation result tables stored in the tabulation unit 21 immediately after step 402, respectively. In the example of the table shown in FIG. 11, the number of slaves 1 is five, and in the example of the table shown in FIG. 12, the number of slaves 1 is ten.

次に、判断部２２は、報告当りの集計結果を算出する（ステップ４０３）。ここで算出する報告当りの集計結果は、スレーブ１別のものではなく、全スレーブ１に関する平均値である。ここでの算出には、各スレーブ１の最新の計測結果のみを用いてもよいし、各スレーブ１の全計測結果を用いてもよい。図１１に示すテーブル例では、算出結果は「１０」となり、図１２に示すテーブル例では、「２０」となる。これは、図１２に示すテーブル例では図１１に示すテーブル例と比較して、マスター２が応答すべきスレーブ１の数が２倍になっており、各スレーブ１から見た応答時間が倍になる一方、各スレーブ１の主スレッドがタスクを処理する効率は並列分散処理システム４全体のスレーブ１の数に依存しないためである。 Next, the determination unit 22 calculates a total result per report (step 403). The tabulated result per report calculated here is not the one for each slave 1 but the average value for all the slaves 1. In this calculation, only the latest measurement result of each slave 1 may be used, or all measurement results of each slave 1 may be used. In the table example shown in FIG. 11, the calculation result is “10”, and in the table example shown in FIG. 12, “20”. In the table example shown in FIG. 12, the number of slaves 1 to which the master 2 should respond is doubled compared to the table example shown in FIG. 11, and the response time viewed from each slave 1 is doubled. On the other hand, this is because the efficiency with which the main thread of each slave 1 processes a task does not depend on the number of slaves 1 in the parallel distributed processing system 4 as a whole.

次に、判断部２２は、算出した報告当りの集計結果が閾値を超えているかどうかを判断する（ステップ４０４）。この閾値は、プログラム的に固定されていてもよいし、設定ファイル等で設定できるようになっていてもよい。なお、ここでの閾値は「１５」に設定されているものとする。これは、「報告当りの集計結果が１５を超える場合、応答時間が長過ぎる、つまりスレーブ１の数が多すぎる」ということを意味する。 Next, the determination unit 22 determines whether or not the calculated total result per report exceeds a threshold value (step 404). This threshold value may be fixed programmatically or may be set by a setting file or the like. Note that the threshold here is set to “15”. This means that if the total result per report exceeds 15, the response time is too long, that is, the number of slaves 1 is too large.

ステップ４０４にて、集計部２１に格納された集計結果が、図１１に示すテーブル例の場合、判断部２２による判断結果は「ＮＯ」となり、判断部２２は、計測結果の送信元スレーブ１に対して、正常応答、例えば「何もしない」という指示を含む応答を返信する（ステップ４０５）。一方、ステップ４０４にて、集計部２１に格納された集計結果が、図１２に示すテーブル例の場合、判断部２２による判断結果は「ＹＥＳ」となり、判断部２２は、マスター２の負荷を下げるための方法を判断する。例えば、判断部２２は、処理効率の一番低いスレーブ１を選択し（ステップ４０６）、当該スレーブ１に対して停止指示を返信してもよい。図１２に示すテーブル例の場合では、処理効率の一番低いＪＪＪＪＪが選択される。 In step 404, when the totaling results stored in the totaling unit 21 are those of the table example shown in FIG. 11, the determination result of the determination unit 22 is “NO”, and the determination unit 22 returns a normal response, for example a response containing the instruction “do nothing”, to the transmission source slave 1 of the measurement result (step 405). On the other hand, when the totaling results stored in the totaling unit 21 are those of the table example shown in FIG. 12, the determination result of the determination unit 22 is “YES”, and the determination unit 22 decides how to reduce the load on the master 2. For example, the determination unit 22 may select the slave 1 with the lowest processing efficiency (step 406) and return a stop instruction to that slave 1. In the case of the table example shown in FIG. 12, JJJJJ, which has the lowest processing efficiency, is selected.

ステップ４０６に続いて、判断部２２は、計測結果を送信元スレーブ１が、ステップ４０６で選択されたスレーブ１からのものであるかどうかを判断する（ステップ４０７）。ステップ４０７にて、スレーブ１からのものであると判断した場合（ステップ４０７の判断「ＹＥＳ」）、判断部２２は、送信元スレーブ１への返信として相応の指示情報、例えばスレーブ１を強制終了する等の指示情報を返信する（ステップ４０８）。ステップ４０７にて、スレーブ１からのものでないと判断した場合（ステップ４０７の判断「ＮＯ」）、計測結果の送信元スレーブ１に対して正常応答、例えば「何もしない」という指示を含む応答を返信する（ステップ４０５）。なお、ステップ４０６で選択したスレーブ（ここではＪＪＪＪＪ）をマスター２内で記憶しておき、次にＪＪＪＪＪから報告を受信した際に、ステップ４０８へ進み、判断部２２は、相応の指示を返信するようにしてもよい。 Subsequent to step 406, the determination unit 22 determines whether the received measurement result came from the slave 1 selected in step 406 (step 407). If it determines in step 407 that the measurement result came from the selected slave 1 (determination “YES” in step 407), the determination unit 22 returns corresponding instruction information to the transmission source slave 1, for example an instruction to forcibly terminate the slave 1 (step 408). If it determines in step 407 that the measurement result did not come from the selected slave 1 (determination “NO” in step 407), it returns a normal response, for example a response containing the instruction “do nothing”, to the transmission source slave 1 of the measurement result (step 405). Alternatively, the slave selected in step 406 (here, JJJJJ) may be stored in the master 2, so that the next time a report is received from JJJJJ the processing proceeds to step 408 and the determination unit 22 returns the corresponding instruction.
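Steps 403 to 408 can be sketched as a single response function on the master side. The contents of FIGS. 11 and 12 are not reproduced in this text, so the tables below are hypothetical values chosen to give the stated averages (10 for five slaves, 20 for ten slaves) with JJJJJ as the least efficient slave. Treating "lowest processing efficiency" as "smallest latest measurement result" is an assumption of this sketch, as are the function and variable names.

```python
from statistics import mean

# Hypothetical latest measurement results (tasks per report).
five_slaves = {sid: 10 for sid in
               ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "EEEEE"]}
ten_slaves = {sid: 21 for sid in
              ["AAAAA", "BBBBB", "CCCCC", "DDDDD", "EEEEE",
               "FFFFF", "GGGGG", "HHHHH", "IIIII"]}
ten_slaves["JJJJJ"] = 11


def respond(source_id, latest, threshold=15):
    """Return the instruction for the slave that just reported."""
    average = mean(list(latest.values()))   # step 403: all-slave average
    if average <= threshold:                # step 404: threshold exceeded?
        return "noop"                       # step 405: normal response
    slowest = min(latest, key=latest.get)   # step 406: least efficient slave
    if source_id == slowest:                # step 407: is it the reporter?
        return "stop"                       # step 408: stop instruction
    return "noop"                           # step 405: normal response


average_five = mean(list(five_slaves.values()))
average_ten = mean(list(ten_slaves.values()))
```

Because the stop instruction is only sent as a reply, the selected slave keeps running until its next report reaches the master, which is why the text also allows the master to remember the selection.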

(Description of the third example of processing by the master 2)
Next, a third example of the processing of the parallel distributed processing method by the master 2 is described with reference to FIG. 8. First, the communication unit 20 receives a measurement result from each slave 1 (step 501). Next, the communication unit 20 determines whether to use time information (step 502). If it determines in step 502 that time information is not to be used, the communication unit 20 associates the received measurement result with the source ID identifying the slave 1 that sent it, and stores them in the aggregation unit 21 as an aggregation result (step 503). Next, the determination unit 22 performs statistical processing of the aggregation results per reception by the communication unit 20, for example calculating an average value (step 504).

If it is determined in step 502 that time information is to be used, the communication unit 20 associates the received measurement result, the source ID identifying the slave 1 that sent it, and the time at which the measurement result was received, and stores them in the aggregation unit 21 as an aggregation result (step 505). FIG. 13 shows an example table of the aggregation results stored in the aggregation unit 21 immediately after step 505. Next, the determination unit 22 performs statistical processing of the aggregation results per unit time, for example calculating an average value (step 506). From the table example in FIG. 13, for instance, the determination unit 22 determines that the parallel distributed processing system 4 as a whole is processing an average of 10 tasks per second.
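As a rough illustration of the per-time statistical processing of step 506, the following Python sketch computes the average number of tasks processed per second from timestamped records. The record layout, the function name, and the sample rows are assumptions made for this example; the actual contents of FIG. 13 are not reproduced here.

```python
from typing import List, Tuple

# (source ID, tasks reported in this reception, reception time in seconds).
# This layout is an assumption; it mirrors the three items associated in step 505.
Record = Tuple[str, int, float]


def system_throughput(records: List[Record], start_time: float = 0.0) -> float:
    """Average tasks per second over the whole system (a sketch of step 506)."""
    if not records:
        return 0.0
    total_tasks = sum(count for _, count, _ in records)
    elapsed = max(received_at for _, _, received_at in records) - start_time
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return total_tasks / elapsed


# Hypothetical rows chosen so that 150 tasks are reported within 15 seconds.
SAMPLE_RECORDS = [("AAAAA", 50, 5.0), ("BBBBB", 40, 10.0), ("AAAAA", 60, 15.0)]
```

With the assumed rows, `system_throughput(SAMPLE_RECORDS)` gives 10.0 tasks per second, matching the figure quoted in the text.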

Next, based on the statistical processing of step 504 or 506, the determination unit 22 makes a predictive determination or performs predictive processing (step 508). For example, suppose that the total number of tasks to be processed by the parallel distributed processing system 4 as a whole is 1000. From the table example in FIG. 13, 150 tasks have been processed in 15 seconds, an average of 10 tasks per second, so the determination unit 22 can predict that about 85 more seconds are needed to finish all tasks (the remaining 850 tasks at 10 tasks per second). The determination unit 22 can then, for example, output this prediction to the display unit 23, and the display unit 23 can present it to the user of the parallel distributed processing system 4. Next, the determination unit 22 returns an instruction to the source slave 1 (step 505). The determination unit 22 may instruct some specific processing according to the prediction-based determination, or the instruction may be "do nothing in particular".
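The prediction in this example is simple arithmetic, shown here as a minimal Python sketch. The function name and parameters are assumptions for illustration; the numbers are those given in the text (1000 tasks in total, 150 processed in 15 seconds).

```python
def remaining_seconds(total_tasks: int, done_tasks: int, elapsed_seconds: float) -> float:
    """Estimate the time still needed, assuming the observed average rate continues."""
    if done_tasks <= 0 or elapsed_seconds <= 0:
        raise ValueError("need at least one processed task and a positive elapsed time")
    rate = done_tasks / elapsed_seconds       # tasks per second, here 150 / 15 = 10
    return (total_tasks - done_tasks) / rate  # here 850 / 10 = 85


print(remaining_seconds(1000, 150, 15.0))  # prints 85.0
```

The estimate assumes the average rate stays constant, so it would need to be recomputed as new measurement results arrive.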

The operation and effects of this embodiment are described below.

According to the parallel distributed processing method of this embodiment, each slave 1 measures task execution using a measurement unit predetermined within the parallel distributed processing system 4 and transmits the measurement result to the master 2, so the master 2 can grasp the processing efficiency of each slave 1 without introducing an external monitoring computer or monitoring program. The master 2 then issues an instruction to each slave 1 based on that slave's processing efficiency, and each slave 1 controls its own operation according to the instruction, so the master 2 can dynamically control the operation of the parallel distributed processing system 4 as a whole based on the processing efficiency of each slave 1.

The determination unit 22 of the master 2 may also compare, among the aggregation results stored in the aggregation unit 21, the aggregation result per reception by the communication unit 20 from the source slave 1 with the aggregation results per reception from the slaves 1 other than the source slave 1, generate instruction information for the source slave 1 based on the comparison result, and transmit it to the source slave 1. In that case, the master 2 can instruct the target slave 1 based on a comparison between the target slave 1's aggregation result per reception and those of the other slaves 1, so it can grasp the processing efficiency of the target slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.

The communication unit 20 of the master 2 may also additionally associate the time elapsed since the previous measurement result was received from the source slave 1 and store it in the aggregation unit 21 of the master 2 as part of the aggregation result. The determination unit 22 may then compare, among the aggregation results stored in the aggregation unit 21, the per-unit-time value of the aggregation result of one reception from the source slave 1 with the per-unit-time values of the aggregation results of one reception from the slaves 1 other than the source slave 1, generate instruction information for the source slave 1 based on the comparison result, and transmit it to the source slave 1. In that case, the master 2 can instruct the target slave 1 based on a comparison between the per-unit-time value of the target slave 1's aggregation result for one reception and the corresponding values for the other slaves 1, so it can grasp the processing efficiency of the target slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.
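One possible reading of this per-unit-time comparison is sketched below in Python. The text does not specify the comparison rule or the resulting instruction, so the rule used here (stop a slave whose rate falls below a fixed fraction of the mean rate of the other slaves), the `ratio` parameter, the instruction strings, and all sample values are assumptions for illustration only.

```python
from statistics import mean
from typing import Dict


def per_unit_time(tasks_in_report: int, elapsed_since_previous: float) -> float:
    """Value of one report divided by the time elapsed since the previous report."""
    if elapsed_since_previous <= 0:
        raise ValueError("elapsed time must be positive")
    return tasks_in_report / elapsed_since_previous


def instruction_for(sender_id: str, rates: Dict[str, float], ratio: float = 0.5) -> str:
    """Compare the sender's rate with the mean rate of all other slaves.

    The rule and the returned strings are hypothetical, not from the specification.
    """
    others = [r for slave_id, r in rates.items() if slave_id != sender_id]
    if not others:
        return "continue"  # nothing to compare against
    if rates[sender_id] < ratio * mean(others):
        return "stop"
    return "continue"


# Hypothetical latest reports: (tasks, seconds since the previous report).
SAMPLE_RATES = {
    "AAAAA": per_unit_time(20, 2.0),
    "BBBBB": per_unit_time(18, 2.0),
    "JJJJJ": per_unit_time(4, 2.0),
}
```

Dividing by the elapsed time makes reports sent at different intervals comparable, which is the point of storing the elapsed time alongside each measurement result.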

The determination unit 22 of the master 2 may also compare, among the aggregation results stored in the aggregation unit 21, the average of all aggregation results of the source slave 1 with the average of all aggregation results of the slaves 1 other than the source slave 1, generate instruction information for the source slave 1 based on the comparison result, and transmit it to the source slave 1. In that case, the master 2 can instruct the target slave 1 based on a comparison between the average of all aggregation results of the target slave 1 and the average of all aggregation results of the other slaves 1, so it can grasp the processing efficiency of the target slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.

The communication unit 20 of the master 2 may also additionally associate the time elapsed since the previous measurement result was received from the source slave 1 and store it in the aggregation unit 21 of the master 2 as part of the aggregation result. The determination unit 22 may then compare, among the aggregation results stored in the aggregation unit 21, the average of the per-unit-time values of all aggregation results of the source slave 1 with the average of the per-unit-time values of all aggregation results of the slaves 1 other than the source slave 1, generate instruction information for the source slave 1 based on the comparison result, and transmit it to the source slave 1. In that case, the master 2 can instruct the target slave 1 based on a comparison between these two averages, so it can grasp the processing efficiency of the target slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.

The determination unit 22 of the master 2 may also perform statistical processing, over all slaves 1, on the aggregation results per reception by the communication unit 20 from a slave 1 among the aggregation results stored in the aggregation unit 21, generate instruction information for the source slave 1 based on the result of the statistical processing, and transmit it to the source slave 1. In that case, the master 2 can instruct each slave 1 based on statistics of the per-reception aggregation results across all slaves 1, so it can grasp the processing efficiency of each slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.

The communication unit 20 of the master 2 may also additionally associate the time elapsed since the previous measurement result was received from the source slave 1 and store it in the aggregation unit 21 of the master 2 as part of the aggregation result. The determination unit 22 may then perform statistical processing, over all slaves 1, on the per-unit-time values of the aggregation results of one reception from a slave 1 among the aggregation results stored in the aggregation unit 21, generate instruction information for the source slave 1 based on the result of the statistical processing, and transmit it to the source slave 1. In that case, the master 2 can instruct each slave 1 based on statistics of these per-unit-time values across all slaves 1, so it can grasp the processing efficiency of each slave 1 more accurately and can dynamically control the operation of the parallel distributed processing system 4 as a whole more efficiently.

The determination unit 22 of the master 2 may also generate instruction information directing a slave 1 to reduce the load on the master 2, and transmit it to that slave 1, when the average over all slaves 1 of the aggregation results per reception from a slave 1 by the communication unit 20, among the aggregation results stored in the aggregation unit 21, exceeds a predetermined threshold. In that case, the master 2 can detect, for example, that more slaves 1 are connected than its capacity allows, and can dynamically control the number of connected slaves 1 according to its capacity or to conditions set according to that capacity.
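The threshold check described here can be sketched in Python as follows. What quantity each slave reports, the function name, and the sample values are assumptions for this example; the text only states that the average over all slaves is compared with a predetermined threshold.

```python
from statistics import mean
from typing import Dict


def master_overloaded(latest_per_slave: Dict[str, float], threshold: float) -> bool:
    """Return True when the average over all slaves exceeds the threshold.

    `latest_per_slave` maps a slave ID to its most recent per-reception value
    (hypothetical layout).
    """
    if not latest_per_slave:
        return False
    return mean(list(latest_per_slave.values())) > threshold


# Hypothetical values: the average is 4.0.
SAMPLE_VALUES = {"AAAAA": 3.0, "BBBBB": 5.0}
```

With these assumed values, a threshold of 3.5 is exceeded and a threshold of 4.0 is not, since the condition is that the average strictly exceeds the threshold.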

The master 2 must manage all slaves 1 in the parallel distributed processing system 4, but its processing capacity is finite, so it cannot manage an unlimited number of slaves 1. Therefore, when the master 2 is asked to manage more slaves 1 than its capacity allows, it can detect this by monitoring its own load and can choose an appropriate behavior. For example, the master 2 may stop slaves 1 with low processing efficiency and keep the parallel distributed processing system 4 in the best state that the master 2 can manage properly.

Furthermore, according to the parallel distributed processing system 4 of this embodiment, the master 2 can, having grasped the processing efficiency of the system as a whole, perform predictive operations such as presenting the predicted completion time of all tasks to the user.

As an application example, the parallel distributed processing system 4 of this embodiment can be used in a conventional parallel distributed processing system so that, without the help of an external monitoring computer or monitoring software, the master 2 monitors the processing efficiency of each slave 1, monitors the degree of its own load, or grasps the processing efficiency of the system as a whole and makes predictive determinations about its operation, and on that basis generates instructions for the slaves 1 and controls the behavior of the parallel distributed processing system 4 as a whole.

As described above, according to the parallel distributed processing system 4 of this embodiment, the master 2 in a master/slave system can grasp the processing efficiency of each slave 1 without introducing an external monitoring computer or monitoring program, and can use that information, for example, for assigning tasks to the slaves 1 (scheduling). In addition, the master 2 itself can detect that more slaves 1 are connected than its capacity allows and can, for example, dynamically control the number of connected slaves 1 according to its capacity (or conditions set according to that capacity). Furthermore, because the master 2 can grasp the processing efficiency of each slave 1, it can make predictions about the operation of the parallel distributed processing system 4 as a whole, provide services such as presenting to the user the predicted time until all tasks are complete, and give the slaves 1 instructions based on those predictions.

DESCRIPTION OF SYMBOLS 1 ... slave, 2 ... master, 3 ... network, 4 ... parallel distributed processing system, 10 ... processing unit, 11 ... storage unit, 12 ... reporting unit, 13 ... control unit, 20 ... communication unit, 21 ... aggregation unit, 22 ... determination unit, 23 ... display unit.

Claims

A parallel distributed processing method executed by a parallel distributed processing system that includes one or more slaves and a master connected to the slaves via a network and that performs parallel distributed processing of tasks on the slaves, the method comprising:
A processing step in which the slave executes the task assigned to the slave, measures the execution of the task using a measurement unit predetermined in the parallel distributed processing system, and stores the measurement result in storage means of the slave;
A first reporting step in which the slave transmits the measurement result stored in the storage means of the slave to the master;
A communication step in which the master receives the measurement result from the slave, associates the measurement result with information identifying the transmission source slave, and stores it in aggregation means of the master as an aggregation result;
A determination step in which the master generates instruction information for the slave based on the aggregation result stored in the aggregation means and transmits the instruction information to the slave;
A second reporting step in which the slave receives the instruction information from the master; and
A control step in which the slave controls the operation of the slave based on the instruction information received in the second reporting step.

In the determination step, the master compares, among the aggregation results stored in the aggregation means, the aggregation result per reception from the transmission source slave in the communication step with the aggregation results per reception from the slaves other than the transmission source slave in the communication step, generates instruction information for the transmission source slave based on the comparison result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the communication step, the master further associates the time elapsed since the previous measurement result was received from the transmission source slave, and stores it in the aggregation means of the master as the aggregation result,
In the determination step, the master compares, among the aggregation results stored in the aggregation means, the value per unit time of the aggregation result of one reception from the transmission source slave in the communication step with the values per unit time of the aggregation results of one reception from the slaves other than the transmission source slave in the communication step, generates instruction information for the transmission source slave based on the comparison result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the determination step, the master compares, among the aggregation results stored in the aggregation means, the average value of all the aggregation results of the transmission source slave with the average value of all the aggregation results of the slaves other than the transmission source slave, generates instruction information for the transmission source slave based on the comparison result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the communication step, the master further associates the time elapsed since the previous measurement result was received from the transmission source slave, and stores it in the aggregation means of the master as the aggregation result,
In the determination step, the master compares, among the aggregation results stored in the aggregation means, the average of the values per unit time of all the aggregation results of the transmission source slave with the average of the values per unit time of all the aggregation results of the slaves other than the transmission source slave, generates instruction information for the transmission source slave based on the comparison result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the determination step, the master performs statistical processing, over all the slaves, on the aggregation results per reception from the slave in the communication step among the aggregation results stored in the aggregation means, generates instruction information for the transmission source slave based on the statistical processing result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the communication step, the master further associates the time elapsed since the previous measurement result was received from the transmission source slave, and stores it in the aggregation means of the master as the aggregation result,
In the determination step, the master performs statistical processing, over all the slaves, on the values per unit time of the aggregation results of one reception from the slave in the communication step among the aggregation results stored in the aggregation means, generates instruction information for the transmission source slave based on the statistical processing result, and transmits it to the transmission source slave,
The parallel distributed processing method according to claim 1.

In the determination step, when the average value over all the slaves of the aggregation results per reception from the slave in the communication step, among the aggregation results stored in the aggregation means, exceeds a predetermined threshold value, the master generates instruction information directing the slave to reduce the load on the master, and transmits the instruction information to the slave,
The parallel distributed processing method according to any one of claims 1 to 7.

A parallel distributed processing system that includes one or more slaves and a master connected to the slaves via a network, and that performs parallel distributed processing of tasks on the slaves, wherein
The slave comprises:
Processing means for executing the task assigned to the slave, measuring the execution of the task using a measurement unit predetermined in the parallel distributed processing system, and storing the measurement result in storage means of the slave;
Reporting means for transmitting the measurement result stored in the storage means of the slave to the master and receiving instruction information from the master as a response to the transmission; and
Control means for controlling the operation of the slave based on the instruction information received by the reporting means;
and
The master comprises:
Communication means for receiving the measurement result from the slave, associating the measurement result with information identifying the transmission source slave, and storing it in aggregation means of the master as an aggregation result; and
Determination means for generating instruction information for the slave based on the aggregation result stored in the aggregation means and transmitting the instruction information to the slave.