JP4485428B2

JP4485428B2 - Network system, management computer, cluster management method, and computer program

Info

Publication number: JP4485428B2
Application number: JP2005222647A
Authority: JP
Inventors: 信也和田
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2004-08-02
Filing date: 2005-08-01
Publication date: 2010-06-23
Anticipated expiration: 2025-08-01
Also published as: JP2006072987A

Description

The present invention relates to an efficient distributed computing technique performed by a plurality of computers connected to a computer network in a broadband environment.
More particularly, the present invention relates to a network system capable of distributed computing, its components, and a cluster management method in a network.

Today, it is common for a plurality of computers connected to a network to cooperate in processing a single job in a distributed manner. Conventionally, distributing a job requires a server that knows in advance the processing capabilities of all computers that can connect to the network, so that it can decide which computer each job is assigned to.
The server first identifies the size of the job load. It then identifies the surplus processing capacity (computational resources) of each computer connected to the network at the time the processing is to be distributed. It sequentially assigns computers whose surplus capacity matches the load and receives the job execution results from them.

Computers may connect to or disconnect from the network at any time. In a conventional server-based distributed processing method, it is therefore very difficult for the server to keep quick, accurate track of their surplus processing capacity. The server must also receive the execution results from the computers to which it delegated the job and forward them to the job's requester, which increases its overhead. As a result, both the time needed to execute jobs and the time needed to transmit data over the network often increase substantially.
The main object of the present invention is to provide a distributed processing mechanism that solves these conventional problems.

The present invention solves the above problems with a network system, its components, and a cluster management method. Together, these enable a plurality of computers to efficiently execute information processing in a distributed manner even when the size, type, and number of jobs vary and are expected to change from moment to moment.
The term “computer” as used here refers to an apparatus including a processor that operates according to a computer program. Devices, processor boards, and processor chips that do not take the form of an apparatus, as well as a processor itself, are also included in the concept of “computer”. For example, each individual processor in a multiprocessor system can also be a “computer”.

The network system of the first configuration provided by the present invention is a network system that a plurality of computers can freely join and leave. Each of these computers can be clustered with other computers.
This network system includes two tables:
- a first table that records clusterability information, indicating whether each computer is in a state in which it can be clustered; and
- a second table for recording ease-of-addition information, indicating how easily a cluster already formed by one or more computers can add other computers.
Any one of the computers has cluster forming means. This means forms a cluster that includes the computer itself and other computers whose clusterability information in the first table indicates a clusterable state. It then updates the clusterability information of every computer in the formed cluster to indicate a non-clusterable state, and records the ease-of-addition information for the formed cluster in the second table. The computer that formed the cluster also has cluster growth means. When a candidate computer for addition to the cluster exists, this means decides whether to add that candidate based on the ease-of-addition information recorded in the second table.

In a network system configured in this way, any one of the computers autonomously forms a cluster based on the clusterability information of the individual computers recorded in the first table. It then records the ease-of-addition information for that cluster in the second table. The number of computers in the cluster may be a default number determined when the cluster is formed. Alternatively, it may be determined flexibly according to the size or other attributes of jobs that have actually been submitted or are expected to be submitted in the near future. In either case, the cluster is formed from computers whose clusterability information indicates a clusterable state.
After the cluster is formed, whenever a candidate computer for addition exists, the computer that formed the cluster decides whether to add that candidate. It bases this decision on the ease-of-addition information recorded in the second table.
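The form-and-grow cycle of the first configuration can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The dict-based tables, the class and method names (`ClusterTables`, `form_cluster`, `try_grow`), and the reading of the ease-of-addition value as an acceptance probability in [0, 1] are all hypothetical.

```python
import random


class ClusterTables:
    """Hypothetical in-memory stand-ins for the first and second tables."""

    def __init__(self, computer_ids):
        # First table: computer id -> True if the computer can currently be clustered.
        self.clusterable = {cid: True for cid in computer_ids}
        # Second table: cluster id -> ease-of-addition value (assumed to lie in [0, 1]).
        self.ease = {}
        # Member list of each formed cluster.
        self.members = {}

    def form_cluster(self, cluster_id, founder, size, ease):
        """Form a cluster of `size` clusterable computers that includes the founder."""
        free = [c for c, ok in self.clusterable.items() if ok and c != founder]
        if not self.clusterable.get(founder) or len(free) < size - 1:
            return None  # not enough clusterable computers
        chosen = [founder] + free[: size - 1]
        for cid in chosen:
            self.clusterable[cid] = False  # members become non-clusterable
        self.members[cluster_id] = chosen
        self.ease[cluster_id] = ease  # record ease-of-addition in the second table
        return chosen

    def try_grow(self, cluster_id, candidate, rng=random):
        """Decide, using the second table, whether to add a candidate computer."""
        if not self.clusterable.get(candidate):
            return False  # the candidate must still be clusterable
        # Assumption: the ease value is used as the probability of admitting the candidate.
        if rng.random() < self.ease[cluster_id]:
            self.clusterable[candidate] = False
            self.members[cluster_id].append(candidate)
            return True
        return False
```

How the ease value shapes the outcome:
- With an ease value of 1.0, every clusterable candidate is admitted.
- With 0.0, no candidate is admitted.
- A computer already marked non-clusterable in the first table is always rejected.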

The ease-of-addition information lets the computer that formed the cluster decide autonomously whether adding a candidate computer is appropriate, and various kinds of information can be used for this. For example, it could be a condition such as admitting only the first few computers that apply for addition, or admitting only a computer whose identification number is a randomly generated number. To keep the decision simple, however, it is desirable to express a measure such as ease of addition as a number that can be compared against: the higher the value, the more easily candidate computers are added to the cluster. Two examples of such a value:
- Among inquiries about whether addition is permitted, which arrive at irregular times, only one in every five is answered at random and the addition is allowed. The value in this case is 1/5, i.e., 20%.
- Candidate computers are added only for requests accepted during time windows that total 10% of one day's system operating time. The value in this case is 24 (hours) × 60 (minutes) × 60 (seconds) × 0.1 × α (a probability value).
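The two example values above can be reproduced with simple arithmetic. In the sketch below, the class name `OneInFiveGate` is hypothetical. "Randomly answering one of every five inquiries" is interpreted, as an assumption, as accepting one randomly chosen inquiry in each consecutive group of five.

```python
import random


class OneInFiveGate:
    """Accepts exactly one randomly chosen inquiry in each group of `group` inquiries."""

    def __init__(self, group=5, rng=None):
        self.group = group
        self.rng = rng or random.Random()
        self.count = 0                           # position within the current group
        self.accept_slot = self.rng.randrange(group)

    def inquire(self):
        """Answer one addition inquiry; True means the addition is allowed."""
        accepted = self.count == self.accept_slot
        self.count += 1
        if self.count == self.group:             # start a new group of inquiries
            self.count = 0
            self.accept_slot = self.rng.randrange(self.group)
        return accepted


def daily_window_seconds(fraction=0.1):
    """Seconds per day in which additions are accepted, before the factor alpha."""
    return 24 * 60 * 60 * fraction
```

Over any whole number of groups the acceptance rate is exactly 1/5 = 20%. A 10% daily window is 86,400 × 0.1 = 8,640 seconds, which the text then multiplies by the probability value α.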

The value may be held constant regardless of how many candidate computers are added, or it may be allowed to change after being recorded in the second table. If it can change, it can be updated each time a candidate computer is added to the cluster. A cluster that attracts candidate computers easily then becomes even more attractive to further candidates. The advantage is that clusters sized to actual demand can be formed easily.
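The self-reinforcing effect of a variable ease-of-addition value can be seen in a small simulation. This sketch rests on two assumptions not stated in the text:
- each candidate joins a cluster with probability proportional to that cluster's current value; and
- the value increases by one with every addition.

Under those assumptions the process is a form of preferential attachment, so a few clusters tend to grow much larger than the rest.

```python
import random


def grow_clusters(num_clusters, num_candidates, seed=0, increment=1.0):
    """Simulate candidates joining clusters whose ease value rises with each addition."""
    rng = random.Random(seed)
    ease = [1.0] * num_clusters   # second-table value for each cluster
    sizes = [1] * num_clusters    # each cluster starts with its founding computer
    for _ in range(num_candidates):
        # Assumption: a higher ease value makes a cluster proportionally more likely to admit.
        idx = rng.choices(range(num_clusters), weights=ease, k=1)[0]
        sizes[idx] += 1
        ease[idx] += increment    # the value changes on every addition
    return sizes, ease
```

With the default increment of 1.0, each cluster's value always equals its size. The chance of gaining the next member therefore grows in proportion to the members already gained.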

In this way, as candidate computers are added to clusters one after another, clusters of various sizes form and grow on the plurality of computers in the network system. The result is distributed computing that efficiently executes information processing even when the size, type, and number of jobs vary and are expected to change from moment to moment. This solves the conventional problems.

In one embodiment of the network system of the first configuration, the computer that formed the cluster further has cluster dissolution means. This means dissolves the cluster when the cluster finishes executing its job. It also restores two sets of information to their state before the cluster was formed: the ease-of-addition information recorded in the second table, and the clusterability information recorded in the first table for all computers that belonged to the dissolved cluster.
A cluster grows based on the ease-of-addition information described above until its job is executed, and dissolves once execution is finished. Computers can therefore be used more effectively than when clusters are prepared in a fixed manner.
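Cluster dissolution amounts to undoing the bookkeeping done when the cluster was formed. Below is a minimal sketch using a hypothetical dict-based, in-memory model of the tables; the function name `dissolve_cluster` is illustrative. Here "restoring the state before formation" is read as two actions: marking every former member clusterable again, and deleting the cluster's entry from the second table.

```python
def dissolve_cluster(first_table, second_table, members, cluster_id):
    """Dissolve a finished cluster and restore both tables to their pre-cluster state.

    first_table:  computer id -> clusterable flag
    second_table: cluster id -> ease-of-addition value
    members:      cluster id -> list of member computer ids
    """
    for cid in members.pop(cluster_id, []):
        first_table[cid] = True          # every former member is clusterable again
    second_table.pop(cluster_id, None)   # the cluster had no entry before it was formed
```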

The network system of the second configuration provided by the present invention is a network system that a plurality of management computers can freely join and leave. Each management computer has under its control one or more computers, each of which can be clustered with other computers.
“Managing one or more computers as computers under its control” means controlling the operations of the computers connected to the management computer and monitoring their operating states.
This network system includes two tables:
- a first table that records clusterability information, indicating whether each computer is in a state in which it can be clustered; and
- a second table that records ease-of-addition information, indicating how easily a cluster already formed by one or more computers can add other computers.
At least one of the management computers has cluster forming means. This means forms a cluster of computers whose clusterability information in the first table indicates a clusterable state. These computers may be under the management computer's own control or under the control of other management computers. The means then updates the clusterability information of every computer in the formed cluster to indicate a non-clusterable state. The management computer that formed the cluster further has cluster growth means. When a candidate computer for addition exists, this means decides whether to add that candidate based on the ease-of-addition information recorded in the second table.

In a network system configured in this way, any one of the management computers autonomously forms a cluster based on the clusterability information of the computers recorded in the first table. It then records the ease-of-addition information for that cluster in the second table. After the cluster is formed, whenever a candidate computer for addition exists, the management computer that formed the cluster decides whether to add that candidate. It bases this decision on the ease-of-addition information recorded in the second table. The clusterability and ease-of-addition information are handled in the same way as in the network system of the first configuration, and the same criteria apply to clustering and cluster growth.
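Cluster formation across management computers in the second configuration might look like the sketch below. The following are assumptions for illustration only:
- the `Manager` dataclass and the function name;
- the policy of taking clusterable computers under the forming manager's own control before those of other managers.

The text itself only requires that members be clusterable in the first table.

```python
from dataclasses import dataclass, field


@dataclass
class Manager:
    """A management computer and the computers under its control (hypothetical model)."""
    name: str
    subordinates: list = field(default_factory=list)


def form_cross_manager_cluster(me, others, first_table, size):
    """Form a cluster of `size` clusterable computers, own subordinates first.

    first_table maps computer id -> clusterable flag. Returns the chosen ids,
    or None if not enough clusterable computers are available.
    """
    pool = list(me.subordinates)
    for other in others:
        pool.extend(other.subordinates)
    chosen = [c for c in pool if first_table.get(c)][:size]
    if len(chosen) < size:
        return None
    for c in chosen:
        first_table[c] = False  # members are no longer clusterable
    return chosen
```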

In one embodiment of the network system of the second configuration, the management computer further has cluster dissolution means. This means dissolves the formed cluster when that cluster finishes executing its job. It also restores two sets of information to their state before the cluster was formed: the second table, and the information recorded in the first table for all computers that belonged to the dissolved cluster.
A cluster grows based on the ease-of-addition information described above until its job is executed, and dissolves once execution is finished. Computers can therefore be used more effectively than when clusters are prepared in a fixed manner.

In the network system of the second configuration, the second table may reside anywhere. Normally, however, it is held by whichever management computer generates it. In that case, the management computer holds at most as many second tables as there are computers under its control.
Each second table is a master table or a slave table, and at least one of these is present:
- A master table is generated by the management computer that holds it when that management computer forms a first cluster that includes a first computer under its control. The table is generated for that first computer.
- A slave table is generated by the management computer that holds it when a second computer under its control is added to a second cluster formed by another management computer. The table is generated for that second computer, whose operation the holding management computer monitors and controls.

The ease-of-addition information is recorded in the master table. A management computer holding a master table acts as a master management computer: it leads the information processing involved in forming the first cluster, changing the number of computers in it, and dissolving it. A management computer holding a slave table acts as a slave management computer for the second cluster.
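The master/slave table bookkeeping described above can be modeled for each management computer as follows. This is a hypothetical sketch. The modeling assumptions are:
- the names `SecondTable` and `ManagementComputer`;
- the string roles "master" and "slave";
- the rule of at most one second table per subordinate computer, which yields the "at most as many tables as computers under its control" bound.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SecondTable:
    """One second table, held by a management computer for one of its computers."""
    role: str                     # "master" or "slave"
    cluster_id: str
    ease: Optional[float] = None  # ease-of-addition is kept only in master tables


class ManagementComputer:
    def __init__(self, subordinates):
        self.subordinates = set(subordinates)
        self.tables: Dict[str, SecondTable] = {}  # at most one table per subordinate

    def open_master(self, computer, cluster_id, ease):
        """Create a master table when forming a cluster that includes `computer`."""
        self._check(computer)
        self.tables[computer] = SecondTable("master", cluster_id, ease)

    def open_slave(self, computer, cluster_id):
        """Create a slave table when `computer` joins another manager's cluster."""
        self._check(computer)
        self.tables[computer] = SecondTable("slave", cluster_id)

    def _check(self, computer):
        if computer not in self.subordinates:
            raise ValueError(f"{computer} is not under this management computer")
        if computer in self.tables:
            raise ValueError(f"{computer} already has a second table")
```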

To promote clustering, the master management computer has search means. This means looks for candidate computers to add to the first cluster by asking any of the management computers whether it has a computer in a clusterable state. Each such management computer decides whether to add a candidate computer under its own control to the first cluster. It bases this decision on the ease-of-addition information for the first cluster formed by the master management computer.
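The search means can be read as a simple inquiry loop between the master and the other management computers. In the sketch below, all names are hypothetical. The responding manager's decision is modeled, as an assumption, as accepting with a probability equal to the first cluster's ease-of-addition value. The master passes this value along with each inquiry.

```python
import random


class Manager:
    """Hypothetical management computer that answers cluster-search inquiries."""

    def __init__(self, name, clusterable, rng=None):
        self.name = name
        self.clusterable = list(clusterable)  # its own computers that can be clustered
        self.rng = rng or random.Random()

    def offer_candidate(self, ease):
        """Offer one clusterable computer if the cluster's ease value permits it."""
        if self.clusterable and self.rng.random() < ease:
            return self.clusterable.pop(0)    # handed over; no longer clusterable here
        return None


def search_candidates(ease, managers, wanted):
    """Master side: ask each manager in turn until `wanted` computers are found."""
    found = []
    for manager in managers:
        while len(found) < wanted:
            candidate = manager.offer_candidate(ease)
            if candidate is None:
                break                         # this manager declined; ask the next one
            found.append(candidate)
    return found
```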

The network system of the third configuration provided by the present invention is a network system that a plurality of computers can freely join and leave. Each of these computers can be clustered with other computers. For each cluster already formed by one or more computers, the system includes a table. This table lists the identification information of the other clusters with which the computers belonging to that cluster are associated. The computer that formed a cluster has cluster growth means. When a candidate computer for addition exists, this means decides whether to add the candidate to another cluster associated with its own cluster, based on the identification information listed in the table. “Associated” means, for example, that the clusters can communicate with each other and perform cooperative processing. With this configuration, clustering that follows a power-law distribution is possible even without ease-of-addition information.
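One possible reading of the third configuration is a redirection rule. A candidate that reaches a cluster may instead be added to a cluster listed in that cluster's related-cluster table. This is an assumption, not the patent's stated algorithm, and `redirect_prob` is a hypothetical parameter. Under such a rule, clusters that appear in many tables receive more redirected candidates. This is one known mechanism for heavy-tailed, power-law-like size distributions.

```python
import random


def place_candidate(cluster_id, related, sizes, rng, redirect_prob=0.5):
    """Place a candidate computer that arrived at `cluster_id`.

    related: cluster id -> ids of other clusters its members are associated with
    sizes:   cluster id -> number of member computers
    With probability `redirect_prob` (hypothetical), the candidate is added to a
    cluster picked from the related-cluster table instead of `cluster_id` itself.
    """
    targets = related.get(cluster_id, [])
    if targets and rng.random() < redirect_prob:
        chosen = rng.choice(targets)
    else:
        chosen = cluster_id
    sizes[chosen] += 1
    return chosen
```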

The present invention also solves the above problems by providing a management computer for configuring, for example, the network system of the second configuration.
This management computer has under its control one or more computers, each of which can be clustered with other computers. It comprises: network connection means for connecting to a computer network that it can freely join and leave together with other management computers having the same functions; table management means that enables access to a first table recording clusterability information, indicating whether each computer is in a state in which it can be clustered, and to a second table for recording ease-of-addition information, indicating how easily a cluster already formed by one or more computers can add other computers; cluster forming means that forms a cluster of computers, under its own control or under the control of other management computers, whose clusterability information in the first table indicates a clusterable state, updates the clusterability information of every computer in the formed cluster to indicate a non-clusterable state, and records the ease-of-addition information for that cluster in the second table; and cluster growth means that, when a candidate computer for addition exists, decides whether to add that candidate to the cluster based on the ease-of-addition information recorded in the second table. The management computer may further have cluster dissolution means. This means dissolves the formed cluster when that cluster finishes executing its job. It also restores the ease-of-addition information recorded in the second table, and the clusterability information recorded in the first table for all computers that belonged to the dissolved cluster, to their state before the cluster was formed.

The present invention also solves the above problems with cluster management methods executed by the plurality of computers included in, for example, the network systems of the first and third configurations described above.
The first cluster management method is used in a network system that a plurality of computers, each of which can be clustered with other computers, can freely join and leave. It comprises:
- a step in which each computer records, in a first table, clusterability information indicating whether it is in a state in which it can be clustered;
- a step in which any one of the computers forms a cluster including itself and other computers whose clusterability information in the first table indicates a clusterable state, updates the clusterability information of every computer in the formed cluster to indicate a non-clusterable state, and records, in a second table, ease-of-addition information indicating how easily the cluster can add other computers; and
- a step of deciding, when a candidate computer for addition exists, whether to add that candidate to the cluster based on the ease-of-addition information recorded in the second table.

The method may further include a step in which the computer that formed the cluster dissolves it when the cluster finishes executing its job. In this step, the computer also restores the second table, and the information recorded in the first table for all computers that belonged to the dissolved cluster, to their state before the cluster was formed.

The second cluster management method is used in a network system that a plurality of computers, each of which can be clustered with other computers, can freely join and leave. It comprises:
- a step in which a computer that has already formed a cluster with one or more computers lists, in a predetermined table, the identification information of other clusters with which the computers belonging to that cluster are associated; and
- a step in which the computer that formed the cluster decides, when a candidate computer for addition exists, whether to add that candidate to another cluster associated with the cluster, based on the identification information recorded in the table.

The present invention also solves the above problems by providing computer programs that give a computer predetermined functions.
The first computer program is read and executed by any computer in a network system that a plurality of computers, each of which can be clustered with other computers, can freely join and leave. It causes that computer to function as:
- table management means that enables access to a first table recording clusterability information, indicating whether each computer is in a state in which it can be clustered, and to a second table for recording ease-of-addition information, indicating how easily a cluster already formed by one or more computers can add other computers;
- cluster forming means that forms a cluster including itself and other computers whose clusterability information in the first table indicates a clusterable state, updates the clusterability information in the first table for every computer in the formed cluster to indicate a non-clusterable state, and records the ease-of-addition information for that cluster in the second table; and
- cluster growth means that, when a candidate computer for addition exists, decides whether to add that candidate to the cluster based on the ease-of-addition information recorded in the second table.

The second computer program is read and executed by a management computer that has under its control one or more computers, each of which can be clustered with other computers. It causes the management computer to function as:
- network connection means for connecting to a computer network that it can freely join and leave together with other management computers having the same functions;
- table management means that enables access to a first table recording clusterability information, indicating whether each computer is in a state in which it can be clustered, and to a second table for recording ease-of-addition information, indicating how easily a cluster already formed by one or more computers can add other computers;
- cluster forming means that forms a cluster of computers, under its own control or under the control of other management computers, whose clusterability information in the first table indicates a clusterable state, updates the clusterability information of every computer in the formed cluster to indicate a non-clusterable state, and records the ease-of-addition information for that cluster in the second table; and
- cluster growth means that, when a candidate computer for addition exists, decides whether to add that candidate to the cluster based on the ease-of-addition information recorded in the second table.

The third computer program is read and executed by any computer in a network system that a plurality of computers, each of which can be clustered with other computers, can freely join and leave. It causes that computer to function as:
- cluster management means that, once a cluster has been formed with one or more computers, lists in a predetermined table the identification information of other clusters with which the computers belonging to that cluster are associated; and
- cluster growth means that, when a candidate computer for addition exists, decides whether to add that candidate to another cluster associated with its own cluster, based on the identification information listed in the table.

These computer programs are recorded on portable recording media and distributed on the market. Alternatively, they are downloaded from a program server or the like through a computer network that the computer or management computer can access.

According to the present invention, clusters of various sizes are formed on any of the plurality of computers in the network system and grow based on information recorded in predetermined tables. Clusters of any size can therefore be obtained easily. The distinctive result is distributed computing that can efficiently execute even uncertain information processing, in which the size, type, and number of jobs vary and any of them is expected to change from moment to moment.

<Network system architecture>
First, the architecture of a network system to which the present invention is applied will be described.
FIG. 1 is an overall view of a network system 101 to which the present invention is applied. The network system 101 includes a computer network 104. Examples of the computer network 104 include a local area network (LAN), a global network such as the Internet, and other computer networks.

Management computers 106, each having one or more computers under its control, can connect to the computer network 104 at any time. Each management computer 106 can communicate bidirectionally with other management computers 106 and with the computers 108 under their control. Each computer 108 can come under the control of any management computer 106 at any time. In other words, the computers 106 and 108 can join the network system 101 and leave it freely at any time.

In this embodiment, a “computer” is an apparatus that includes a processor operating according to a computer program. Devices, processor boards, and combinations of these that do not take the form of a standalone apparatus are also included in the concept of “computer”. Further, “having one or more computers 108 under its control” means controlling the operation of the computers 108 connected to the management computer itself and monitoring their operating states.

Examples of the management computer 106 include a computer having a server function, a game console with a communication function, a computing device, and a processor board.
Examples of the computers 108 under a management computer 106 include personal computers, game consoles with a communication function and other wired or wireless computers, computing devices, and processor boards.

The computers 108 can be connected to a management computer 106 in various ways. FIG. 1 shows the following examples:
- A star type, in which a plurality of computers 108 are connected directly to a management computer 106 that is itself connected to the computer network 104.
- A local network type, in which a plurality of computers 108 are connected via a local network to which the management computer 106 is connected.
- A wide area network type, in which a plurality of computers 108 are connected to the management computer 106 via the computer network 104.
The local network in the local network type can also take various forms. Some local networks are connected directly to the computer network 104, and others are connected to the computer network 104 via the management computer 106.

As shown in FIG. 2, each management computer 106 includes a storage device 1061 such as a hard disk, a communication device 1063, a semiconductor memory 1065, and a processor 1067, all connected to one another via a bus B11. In a management computer 106 with these hardware resources, the processor 1067 reads and executes a computer program loaded into the semiconductor memory 1065 from a recording medium such as a CD-ROM. As shown in FIG. 4, this stores a plurality of management tables 206 (described later) in the storage device 1061. It also builds the functions of a table management unit 212, a cluster management unit 216, a communication control unit 236, and a job execution unit 256 in the computer body.

The table management unit 212 accesses the management tables 206 and updates their recorded contents. The cluster management unit 216 performs the information processing involved in forming, growing, and eliminating clusters. These are the clusters to which computers 108 under its own control, and computers 108 under the control of other management computers 106, belong. That is, depending on how a cluster's state is changing, the cluster management unit 216 functions as cluster forming means, cluster growing means, or cluster eliminating means.

The communication control unit 236 enables communication, through the communication device 1063, with the computers 108 under its control and with other management computers 106. The job execution unit 256 submits a job to a cluster that includes computers 108 under its control and has the cluster execute it. The job execution unit 256 may also be given a function for calculating job size, processing time, and the like. This makes it easier to determine the optimal cluster size, that is, the optimal number of computers.

Each computer 108 has a storage device, communication device, semiconductor memory, and processor similar to those of the management computer 106. In this embodiment, a status table (described later) is stored in the storage device. The processor reads and executes a computer program loaded into the semiconductor memory from a recording medium such as a CD-ROM. This builds various functions related to cluster management and job execution in the computer body. Preferably, all computers 108 have the same instruction set architecture (ISA), or ISAs that can be regarded as the same, so that each can execute the required processing according to the same instruction set.

The number of computers 108 under each management computer 106 is arbitrary. In a given management computer 106, the number of computers 108 is allocated according to the processing capacity needed to execute the jobs issued by various applications.
When the computers 108 under a management computer 106 all have identical ISAs, or ISAs regarded as identical, the adaptability of the management computer 106 and of the network system 101 can be improved dramatically.

Each management computer 106 forms clusters so that jobs can be executed cluster by cluster. Each cluster contains one or more computers 108, drawn from those under its own control or under the control of other management computers 106. For the purpose of forming a cluster, computers 108 under its own control and computers 108 under other management computers 106 have no differences in performance and no restrictions. Handled this way, it matters little which computer 108 under which management computer 106 executes a given job. It is enough to designate a recipient for the job's execution result. The recipient can be the management computer 106 that requested the job, a computer 108 under it, or any computer 108 that will execute a subsequent job. Individual jobs can therefore easily be executed in a distributed manner among the computers 108 under the plurality of management computers 106 connected to the computer network 104.

Suppose all computers 108 have the same ISA, or ISAs regarded as the same, as described above. Then no additional software layer is needed to make the computers 108 compatible, which avoids its computational burden. Many of the problems of mixing heterogeneous computer networks can also be prevented. It is therefore desirable, depending on the application, to configure the network system 101 in this way. Such a network system 101 absorbs architectural differences among the management computers 106 and computers 108 connected to the computer network 104. As shown in FIG. 3, this forms a large-scale information processing integrated body WO capable of broadband processing. Within it, each of the many computers 108 under each management computer 106 functions like a cell of information processing (Cell).

Physically, each computer 108 in the large-scale information processing integrated body WO is managed by the management computer 106 to which it belongs. It either operates as a standalone computer or is clustered with other computers 108 under the same management computer 106 to work cooperatively. Logically, however, management computers 106 impose no boundaries, and a computer 108 can also be clustered with computers 108 under different management computers 106. When computers are clustered in this way, a single job can be executed in a distributed manner by the computers 108 belonging to the same cluster.

<Distributed computing>
The present invention provides an efficient distributed computing mechanism that uses the architecture of the network system 101 described above. To enable it, each computer 108 making up a network, such as the large-scale information processing integrated body WO shown in FIG. 3, is treated as a node. Two processes can then be repeated. The first is “growth”: the network starts from a small number of nodes, and the number of nodes increases over time. The second is “selective connection”, a method of connecting links between nodes.
“Selective connection” means that when a node newly joining the network establishes a link, nodes that already have more links are easier to connect to. This is the criterion for choosing the node to link to. Together, growth and selective connection give the network a scale-free property as it grows. Repeating growth and selective connection makes the number of links per node follow a power-law distribution. This is shown in detail in, for example, Albert Barabasi, Reka Albert, Hawoong, “Mean-field theory for scale-free random networks”.
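The growth-plus-selective-connection process can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's own procedure. Every new node adds exactly one link, and "selective connection" is modelled as linear preferential attachment: the target is chosen with probability proportional to its current number of links. This is the standard Barabási-Albert construction. The function name `grow_preferential` is hypothetical.

```python
import random

def grow_preferential(n_nodes: int, seed: int = 0) -> list[int]:
    """Return the link count (degree) of each node after growing a network
    one node at a time with preferential ("selective") attachment.

    Assumption: each new node adds exactly one link.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start from two nodes joined by a single link
    endpoints = [0, 1]   # every link contributes both of its endpoints here
    for new in range(2, n_nodes):
        # A uniform pick from `endpoints` selects node i with probability
        # degree[i] / sum(degree): well-linked nodes attract more links.
        target = rng.choice(endpoints)
        degree.append(1)
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

degrees = grow_preferential(5000, seed=42)
print(max(degrees), sum(degrees) // 2)  # largest hub, number of links
```

Because each link puts both endpoints into `endpoints`, drawing uniformly from that list yields degree-proportional selection without recomputing probabilities. With a few thousand nodes, a few hubs typically collect dozens of links while most nodes keep one. That is the broad, hub-dominated spread of link counts described above.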

FIG. 5 plots measured values for two cases: nodes connected randomly and nodes connected selectively. The horizontal axis is the number of links held by a node, and the vertical axis is the number of nodes having that number of links. FIG. 5(a) shows random connection. In this case, representative nodes with a typical number of links appear. FIG. 5(b) shows selective connection. No representative node appears, and the number of links spreads over a wide range. This is called scale-free in the sense that no typical scale appears. The distribution of FIG. 5(b) is expressed as follows. It is a very broad, long-tailed distribution that appears as a straight line on a log-log graph.
$$P(k) = A\,k^{-r} \qquad (1)$$
where k is the number of links, P(k) is the number of nodes having k links, and A and r are constants.
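Taking logarithms of equation (1) gives log P(k) = log A - r log k. This is why the distribution plots as a straight line of slope -r on log-log axes. The short check below uses illustrative values A = 1000 and r = 2.5, chosen arbitrarily and not taken from the patent. It generates exact power-law counts and recovers the exponent with an ordinary least-squares fit in log-log space.

```python
import math

def power_law(k: float, A: float, r: float) -> float:
    """Equation (1): number of nodes P(k) that have k links."""
    return A * k ** (-r)

def loglog_slope(ks, ps) -> float:
    """Least-squares slope of log P(k) against log k."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ks = range(1, 101)
ps = [power_law(k, A=1000.0, r=2.5) for k in ks]
print(round(loglog_slope(ks, ps), 6))  # -2.5, i.e. the slope is -r
```

On measured data such as FIG. 5(b), the same fit gives only an estimate of r, and in practice it is usually restricted to the straight-line part of the plot.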

The present invention applies these two features by interpreting each node as a cluster. A link is interpreted as two computers 108 included in a cluster, which correspond to nodes joined to each other. Clustering can therefore follow an arbitrary distribution, including, of course, the power-law distribution shown in FIG. 5(b), and this enables efficient distributed computing. In a network that grows in this way, the number of links corresponds to the number of computers 108 in a cluster. That number is the cluster size, and it also reflects the cluster's computing capacity. Plot the number of computers 108 in each cluster on the horizontal axis, and the number of clusters (or the frequency of clustering) on the vertical axis. The result should be a distribution similar to the power-law distribution shown in FIG. 5(b).

The following describes what is needed to enable clustering of this kind. It covers how each computer 108 operates, and how each management computer 106 should monitor and control the operation of the computers 108 and of other management computers 106 of the same kind.

Each computer 108 is managed by a computer cluster status (clusterability information), which indicates whether the computer can currently be clustered. The computer cluster status is recorded in a status table. It is updated as the state of the computer 108 changes, either by the computer 108 itself or by the management computer 106 that has it under its control. The status table may reside in any memory area accessible to the computer 108 or the management computer 106, for example in the storage device of the computer 108.
In this embodiment, the computer cluster status recorded in the status table takes one of three values: “clustered”, “run”, and “free”. “Clustered” indicates a computer that has already been clustered and is now waiting to execute a job. “Run” indicates a computer that is executing a job. “Free” indicates a computer in a non-operating state. A computer 108 whose status is “clustered” or “run” cannot be clustered, so these two values represent a non-clusterable state. “Free”, on the other hand, represents a clusterable state.
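The three computer cluster status values and the clusterability rule can be expressed as a small enumeration. This is a hedged illustration: the enum, the helper `is_clusterable`, and the dictionary standing in for the status table are hypothetical names, not part of the patent.

```python
from enum import Enum

class ComputerClusterStatus(Enum):
    """Computer cluster status (clusterability information) in a status table."""
    CLUSTERED = "clustered"  # already clustered, waiting for a job
    RUN = "run"              # currently executing a job
    FREE = "free"            # not operating; the only clusterable state

def is_clusterable(status: ComputerClusterStatus) -> bool:
    """'clustered' and 'run' are non-clusterable; only 'free' is clusterable."""
    return status is ComputerClusterStatus.FREE

# Hypothetical status table: computer ID -> computer cluster status.
status_table = {
    "pc-1": ComputerClusterStatus.FREE,
    "pc-2": ComputerClusterStatus.RUN,
    "pc-3": ComputerClusterStatus.FREE,
    "pc-4": ComputerClusterStatus.CLUSTERED,
}
clusterable = [cid for cid, st in status_table.items() if is_clusterable(st)]
print(clusterable)  # ['pc-1', 'pc-3']
```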

As described above, one role of the management computer 106 is to manage the computers 108 under its control, and the management tables 206 are used for this purpose. The management tables 206 are stored, for example, in the storage device 1061 of the management computer 106. A management computer 106 has at most as many management tables 206 as there are computers 108 under its control.

There are two types of management table 206. The first is a “master table”. When a management computer 106 first forms a cluster that includes computers 108 under its own control, it records information on the state of that cluster in a master table. The second is a “slave table”. When a computer 108 is added to a cluster formed by another management computer 106, information on the state of that cluster is recorded in a slave table.

A management computer 106 behaves as a “master management computer” when it uses a master table to monitor and control the computers 108 in a cluster that it manages. It behaves as a “slave management computer” when it uses a slave table to monitor and control a computer 108 under its control that belongs to a cluster managed by another management computer 106. Thus a single management computer 106 can act in as many management roles as there are computers 108 under its control.

A master table is generated by the management computer 106 acting as the master management computer. A slave table is generated by the management computer 106 acting as a slave management computer. The existence of a slave table therefore means that a corresponding master table exists in some management computer 106.

Specific examples of the contents of the master table and the slave table in this embodiment will now be described. FIG. 6(a) shows an example of the contents of a master table 2161, and FIG. 6(b) shows an example of the contents of a slave table 2162.
The master management computer records the following components (values) of the cluster in the master table 2161, shown in the right-hand fields of FIG. 6(a):
- cluster ID 2361 (“ID” means identification information; the same applies hereinafter)
- cluster size 2362
- computer list 2363
- cluster status 2364
- maximum number of computers 2365
- minimum number of computers 2366
- total number of computers 2367
- cluster connection ratio 2368

The cluster ID 2361 is a unique ID assigned to a formed cluster. It only needs to be unique for as long as the cluster exists. For example, the cluster ID can be the pair formed by the ID of the master management computer and the ID of the first computer 108 included in the cluster.
The cluster size 2362 is the total number of computers 108 in the cluster, counting both those included at formation and those added later. The master management computer can determine it by counting the computers in the cluster.

The computer list 2363 is a list of the identification information of the computers 108 included in the cluster.
The cluster status 2364 represents the current state of the cluster. In this embodiment, the cluster status takes one of three states: “idle”, “run”, and “wait”. “Idle” means the cluster is not executing a job, and computers 108 can be added to it while it is in this state. “Run” means the cluster is executing a job. “Wait” means the number of computers 108 held has reached the maximum number of computers, and no more computers 108 can be added to the cluster.

The maximum number of computers 2365 is the largest number of computers the cluster can hold. It is defined either by the user or as a constant held by the system. The illustrated example shows a cluster that can hold at most 200 computers.
The minimum number of computers 2366 is the smallest number of computers the cluster must hold to be a cluster. It is defined either by the user or as a constant held by the network system 101. The illustrated example shows that three computers 108 form a cluster.
The total number of computers 2367 is the total number of computers 108 that may come to be included in the cluster. It can be obtained as a statistically expected value. The illustrated example shows that 48 computers 108 may be included.
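The master-table components described above map naturally onto a record type. The sketch below shows one possible shape, assuming the cluster ID is the (master management computer ID, first member ID) pair suggested above. It also assumes the cluster size is derived from the computer list rather than stored separately. The class and method names are illustrative only. The `add_computer` rule follows the cluster status descriptions: only an "idle" cluster below its maximum accepts computers, and reaching the maximum switches it to "wait". Raising `ValueError` on a refused addition is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

class ClusterStatus(Enum):
    IDLE = "idle"  # no job running; computers may be added
    RUN = "run"    # the cluster is executing a job
    WAIT = "wait"  # maximum number of computers reached; no additions

@dataclass
class MasterTable:
    cluster_id: tuple[str, str]    # (master management computer ID, first member ID)
    computer_list: list[str]       # computer list 2363
    max_computers: int             # maximum number of computers 2365
    min_computers: int             # minimum number of computers 2366
    total_computers: int           # expected total number of computers 2367
    connection_ratio: float = 0.0  # cluster connection ratio 2368
    status: ClusterStatus = ClusterStatus.IDLE  # cluster status 2364

    @property
    def cluster_size(self) -> int:
        """Cluster size 2362, derived here from the computer list."""
        return len(self.computer_list)

    def can_add(self) -> bool:
        return self.status is ClusterStatus.IDLE and self.cluster_size < self.max_computers

    def add_computer(self, computer_id: str) -> None:
        if not self.can_add():
            raise ValueError(f"cannot add to a cluster in state {self.status.value!r}")
        self.computer_list.append(computer_id)
        if self.cluster_size >= self.max_computers:
            self.status = ClusterStatus.WAIT  # full: stop accepting computers

# Usage: a three-computer cluster with a deliberately small maximum of 4.
table = MasterTable(("mgr-A", "pc-1"), ["pc-1", "pc-2", "pc-3"],
                    max_computers=4, min_computers=3, total_computers=48)
table.add_computer("pc-4")
print(table.cluster_size, table.status.value)  # 4 wait
```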

The connection ratio 2368 is one example of ease-of-addition information. In this embodiment, the connection ratio is a value, such as a probability, that determines how easily computers are added. For example, when a computer 108 that is a candidate for addition (a candidate computer) exists, it may be added to the cluster with a probability of 20%. Alternatively, the ratio may be set to the required cluster size divided by the total number of computers. The higher this value, the more easily computers 108 are added to the cluster and the more easily the cluster grows. The illustrated example shows the connection ratio for a current cluster size of 23 and a total of 48 computers: 23/48 ≈ 0.48.
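The sketch below shows the arithmetic behind the illustrated value. It also shows the two uses of the ratio that the text mentions: a fixed acceptance probability such as 20%, and cluster size divided by the total number of computers. Applying the ratio as a per-candidate acceptance probability, drawn against a uniform random number, is an assumption about how "probability" is used. The function names are hypothetical.

```python
import random

def connection_ratio(cluster_size: int, total_computers: int) -> float:
    """Ratio as cluster size / total number of computers (illustrated example)."""
    return cluster_size / total_computers

def accept_candidate(ratio: float, rng: random.Random) -> bool:
    """Add a candidate computer with probability `ratio` (0.0 never, 1.0 always)."""
    return rng.random() < ratio

print(round(connection_ratio(23, 48), 2))  # 0.48

# A fixed 20% ratio: roughly one candidate in five is accepted.
rng = random.Random(0)
accepted = sum(accept_candidate(0.2, rng) for _ in range(10_000))
print(accepted / 10_000)
```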

The connection ratio 2368 can be a fixed value or a variable value. A fixed value can later be replaced by a variable value, and vice versa. The connection ratio may be set automatically as a default when a cluster is formed. Alternatively, an application program that has the cluster execute a job may set it as one of its parameters in the connection ratio 2368 field of the master table 2161.
As in FIG. 5, the number of computers 108 in each cluster can be plotted on the horizontal axis and the number of clusters (or the frequency of clustering) on the vertical axis. The shape of the resulting distribution depends largely on how the connection ratio is set. For example, to obtain a normal distribution as in FIG. 5(a), the connection ratio is set to a fixed value, and the distribution becomes normal, centered on a representative value. Alternatively, the connection ratio may vary as computers are added and the cluster size grows. For example, it may be raised each time one or several computers are added. A power-law distribution as in FIG. 5(b) can then be obtained easily.
The set of components held in the master table 2161 described above is merely an example, and components can be added or removed as appropriate.
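A toy simulation can show how the connection-ratio policy affects the cluster-size distribution. Everything here is an assumption for illustration. Each cluster starts at the minimum size of 3, receives a fixed 100 candidate offers before its job arrives, and accepts each offer with probability equal to its connection ratio. With a fixed ratio, final sizes are binomially distributed and bunch around a representative value, as in FIG. 5(a). With a ratio that rises with cluster size (here size/48, capped at 1), larger clusters grow faster. This gives a long right tail and a much wider spread. The toy model does not claim to reproduce an exact power law.

```python
import random
import statistics

def grow_cluster(rng, ratio_fn, offers=100, start=3, max_size=200):
    """Offer candidates one at a time; accept each with probability ratio_fn(size)."""
    size = start
    for _ in range(offers):
        if size >= max_size:  # cluster full ("wait"): no more additions
            break
        if rng.random() < ratio_fn(size):
            size += 1
    return size

def constant_ratio(size):
    return 0.2  # fixed connection ratio, independent of size

def increasing_ratio(size, total=48):
    return min(1.0, size / total)  # ratio rises as the cluster grows

rng = random.Random(1)
const_sizes = [grow_cluster(rng, constant_ratio) for _ in range(2000)]
incr_sizes = [grow_cluster(rng, increasing_ratio) for _ in range(2000)]
for name, sizes in (("constant", const_sizes), ("increasing", incr_sizes)):
    print(name, round(statistics.mean(sizes), 1),
          round(statistics.pstdev(sizes), 1), max(sizes))
```

Plotting a histogram of each list reproduces the qualitative contrast between the two panels of FIG. 5. The exact shape depends on the assumed number of offers and on the ratio function.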

The slave table 2162 records a cluster ID 2461 and a computer id 2462. The cluster ID 2461 identifies the cluster and is the same as the cluster ID 2361 recorded in the master table 2161. The computer id 2462 identifies the computer 108 included in the cluster.
Once the cluster ID 2461 and computer id 2462 are known, the slave table 2162 is linked to the master table 2161 that has the same cluster ID. The slave table therefore records fewer components (values) than the master table 2161.
The set of components (values) held in the slave table 2162 is likewise only an example, and components can be added or removed as appropriate.
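Because the slave table stores only the cluster ID and a computer ID, everything else is reached through the master table with the same cluster ID. A minimal sketch of that link follows. The in-memory dictionary `master_tables` and the helper `master_for` are hypothetical. In the described system, this lookup would be a query to the master management computer over the network.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SlaveTable:
    cluster_id: tuple[str, str]  # same value as cluster ID 2361 in the master table
    computer_id: str             # the slave manager's own computer in that cluster

def master_for(slave: SlaveTable, master_tables: dict) -> dict:
    """Follow the shared cluster ID from a slave table to its master table."""
    return master_tables[slave.cluster_id]

# Hypothetical registry of master tables, keyed by cluster ID.
master_tables = {
    ("mgr-A", "pc-1"): {"cluster_size": 23, "total_computers": 48,
                        "connection_ratio": 0.48},
}
slave = SlaveTable(cluster_id=("mgr-A", "pc-1"), computer_id="pc-77")
print(master_for(slave, master_tables)["cluster_size"])  # 23
```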

<Mode of operation>
Next, an example of how the network system 101 of this embodiment operates distributed computing will be described.
As shown in FIG. 7, distributed computing in this embodiment executes jobs in a distributed manner while repeating a three-stage cycle of cluster state changes:
- The first stage is caused by the formation and growth of a new cluster (step S1).
- The second stage is caused by the submission of a job requested by an application program or the like (step S2).
- The third stage is caused by the disappearance of the cluster after the job has been executed (step S3).
When a cluster disappears, all the computers 108 that belonged to it become candidate computers (nodes) that can newly join other clusters from that point on.
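The three-stage cycle of FIG. 7 can be sketched as a fixed sequence of stages applied to a cluster's members. The mapping from stage to computer cluster status is an assumption drawn from the status definitions given earlier, not a table from the patent: "clustered" after formation, "run" while the job executes, and "free" after the cluster disappears.

```python
from enum import Enum

class Stage(Enum):
    FORM_AND_GROW = "S1"  # a new cluster is formed and grows
    JOB_SUBMITTED = "S2"  # a job is submitted to the cluster
    DISSOLVED = "S3"      # the cluster disappears after the job

# Assumed computer cluster status given to every member at each stage.
MEMBER_STATUS = {
    Stage.FORM_AND_GROW: "clustered",
    Stage.JOB_SUBMITTED: "run",
    Stage.DISSOLVED: "free",  # members become candidates for other clusters
}

def run_cycle(members, status_table):
    """Walk one cluster through S1 -> S2 -> S3 and return the stages visited."""
    visited = []
    for stage in Stage:  # Enum iteration follows definition order
        for cid in members:
            status_table[cid] = MEMBER_STATUS[stage]
        visited.append(stage)
    return visited

status = {"pc-1": "free", "pc-2": "free", "pc-3": "free"}
print([s.value for s in run_cycle(["pc-1", "pc-2"], status)])  # ['S1', 'S2', 'S3']
```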

The first-stage state change in this cycle occurs whether or not there are any jobs. A management computer 106 becomes a master management computer, forms a cluster, and grows it in cooperation with slave management computers. How far the cluster grows depends on the operating states of the individual computers 108 and on the connection ratio. A computer 108 whose operating state is “free” connects to other computers 108 more easily the higher the connection ratio. The three stages of state change are described in detail below.

<First-stage state change>
[Formation and growth of new clusters 1]
FIG. 8 is a procedure diagram showing how a master management computer forms and grows a cluster. The master management computer here is the cluster management unit 216 of a management computer 106 acting in that role.
The master management computer first checks the operating states of the computers 108 under its control. Specifically, it checks the computer cluster status recorded in the status table held by each computer 108 (step S101). It then forms one cluster from several computers 108 that are not operating, that is, whose computer cluster status is “free” (step S102). How many computers the initial cluster contains is determined, for example, by a prior setting. Normally, the master management computer leaves some of the computers 108 under its control out of the cluster. These remaining computers 108 are kept available, as far as possible, for addition to other clusters, which makes it easier for other clusters to grow. This state is shown in FIG. 12(a).
In the example of FIG. 12(a), one cluster C11 is formed from three computers 108. The master management computer then generates one master table and sets its components (values) (step S103). The master management computer may be responsible for managing the status table of each computer 108. In that case, it updates the computer cluster status of every computer 108 in the cluster from “free” to “clustered” (step S104).

Thereafter, the master management computer searches the computers 108 under other, randomly selected management computers 106 for candidate computers to add to the cluster it has formed. That is, it inquires whether there are other computers 108 whose computer cluster status is “free” (step S105). The search is performed, for example, by randomly selecting several other management computers 106 at a short logical distance from itself and then progressively expanding that range. If a candidate computer exists (step S106: Yes), the master management computer checks the connection ratio in the master table, adds the candidate computer to the cluster according to that connection ratio, and records information about the candidate computer in the master table (step S107). It then updates the computer cluster status of the added candidate computer from “free” to “clustered” (step S108).
The management computer 106 that manages the added candidate computer becomes the slave management computer for that candidate computer 108 and generates a slave table. FIG. 12(b) shows the state when candidate computers exist. In the example of FIG. 12(b), two computers 108 have been added as candidate computers, and the cluster has grown into cluster C14, which contains five computers 108. The master management computer appends information on the newly added computers 108 to the master table.
If no candidate computer exists in step S106 (step S106: No), the process returns to step S105.
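The master-driven growth loop of steps S105 to S108 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the mapping from connection ratio to an acceptance probability (`ratio / (ratio + 1)`), the optional rule that raises the ratio after each addition, and all names (`MasterTable`, `grow_cluster`) are assumptions. The sketch only encodes what the text and claims state: only computers whose status is “free” are eligible, a larger connection ratio makes addition easier, and the ratio may be held constant or changed when a candidate is added.

```python
from dataclasses import dataclass


@dataclass
class MasterTable:
    """Hypothetical subset of a master table."""
    cluster_id: str
    computers: list               # computer list (cf. 2363)
    connection_ratio: float       # addition-ease value (cf. 2368); larger = easier to join
    max_computers: int            # maximum number of computers (cf. 2365)
    variable_ratio: bool = False  # assumed switch: raise the ratio after each addition


def grow_cluster(table, status, candidates, rng):
    """Sketch of steps S105-S108.

    status maps computer id -> computer cluster status ("free", "clustered", ...).
    candidates are the computers found by inquiring of other management computers.
    rng is any object with a random() method, e.g. random.Random(seed).
    Returns the ids that were added, in order.
    """
    added = []
    for cid in candidates:                    # S105: candidates from the search
        if status.get(cid) != "free":         # S106: only free computers qualify
            continue
        if len(table.computers) >= table.max_computers:
            break                             # cluster is already at its upper limit
        r = table.connection_ratio
        if rng.random() < r / (r + 1.0):      # S107: assumed ratio -> probability mapping
            table.computers.append(cid)       # record the new member in the master table
            status[cid] = "clustered"         # S108
            added.append(cid)
            if table.variable_ratio:
                table.connection_ratio += 1.0  # assumed update rule for a variable ratio
    return added
```

Holding `connection_ratio` fixed corresponds to the constant-value variant in the claims; `variable_ratio=True` is only one possible way to make the value change each time a candidate is added.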

[Formation and growth of a new cluster (2)]
FIG. 9 is a procedure diagram showing cluster growth driven by a management computer 106 that is not the master management computer.
As shown in FIG. 8, cluster growth is in principle driven by the master management computer acting actively. However, a management computer 106 that is not a master management computer can also drive it by accessing the master management computer.
That is, a management computer 106 that has an unclustered computer 108 under its control searches for another cluster to which that computer 108 can be added, for example by inquiring of other management computers 106 selected at random (step S201). Based on the connection ratio of the destination cluster, it then decides whether to add its own computer 108 to that cluster (step S202). If it decides not to (step S203: No), the process returns to step S201.
If it decides to add the computer, the management computer becomes the slave management computer for that computer 108 with respect to the destination cluster. It therefore generates a slave table for that computer 108 and sets each component (value) of the slave table (step S204). It then updates the computer cluster status of the computer 108 being added from “free” to “clustered” (step S205). Finally, it has the master management computer of the destination cluster update its master table (step S206). Alternatively, the slave management computer may access the master management computer and perform this update itself.
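Seen from the non-master side, steps S201 to S206 amount to a retrying search loop. The sketch below is hypothetical: the dictionary layouts, the `ratio / (ratio + 1)` acceptance rule, and the `max_tries` bound are assumptions, and the master-table update of step S206 is modeled as a direct write rather than as a request to the master management computer.

```python
def join_from_slave_side(computer_id, status, masters, slave_tables, rng, max_tries=10):
    """Sketch of steps S201-S206, run by the management computer that owns computer_id.

    masters: cluster id -> {"computers": [...], "connection_ratio": float}
             (hypothetical view of each destination cluster's master table)
    slave_tables: computer id -> slave table held by this management computer
    rng: object with choice() and random(), e.g. random.Random(seed)
    Returns the id of the cluster joined, or None.
    """
    if status.get(computer_id) != "free" or not masters:
        return None                                    # only unclustered computers may join
    cluster_ids = sorted(masters)
    for _ in range(max_tries):
        cid = rng.choice(cluster_ids)                  # S201: inquire at a random peer
        ratio = masters[cid]["connection_ratio"]
        if rng.random() >= ratio / (ratio + 1.0):      # S202/S203: No -> search again
            continue
        slave_tables[computer_id] = {"cluster_id": cid}  # S204: create the slave table
        status[computer_id] = "clustered"                # S205
        masters[cid]["computers"].append(computer_id)    # S206: master table updated
        return cid
    return None
```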

<Second-stage state change>
[Job submission]
The second-stage state change starts with an action by whichever management computer 106 has had a job submitted to it at the request of an application or the like. Job execution is parameterized by the amount of computation and the time the job requires, expressed for example as the number of computers 108 needed. This parameter may be generated automatically by the processor of that management computer 106 based on the size of the submitted job, may be supplied as appropriate by the user of the application, or may be a parameter that the application already holds.

FIG. 10 is a procedure diagram showing how the management computer 106 to which a job has been submitted uses a cluster. Based on the amount of computation described above, the management computer 106 searches for a master management computer that has formed a cluster of the required size, either by referring to the cluster list 208 or by inquiring of other management computers (step S301). If one is found (step S302: Yes), it requests that master management computer to execute the job (step S303). The execution request is made, for example, by sending the master management computer a packet containing the job, the program and data needed to execute it, and an address designating the requester as the destination for the execution result.
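Step S301 and the request of step S303 might look like the following. The tuple layout of the cluster-list entries, the rule of choosing the smallest cluster that is large enough, the skipping of clusters whose status is “run”, and the `JobRequest` field names are all assumptions made for illustration; the text only requires a cluster of the necessary size and a packet carrying the job, its program and data, and a reply address.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Hypothetical packet for step S303."""
    job: str
    program: bytes
    data: bytes
    reply_to: str      # address to which the execution result should be sent


def find_master(cluster_list, required_computers):
    """S301/S302: return the address of a master managing a big-enough cluster.

    cluster_list holds (master_address, cluster_size, cluster_status) tuples.
    Among clusters not already running a job, the smallest adequate one is chosen
    so that larger clusters stay available for larger jobs (assumed policy).
    """
    fits = [e for e in cluster_list
            if e[1] >= required_computers and e[2] != "run"]
    if not fits:
        return None                                   # S302: No
    return min(fits, key=lambda e: e[1])[0]


def submit(cluster_list, job, program, data, my_address, required_computers):
    """S301-S303: pick a master and build the request that would be sent to it."""
    master = find_master(cluster_list, required_computers)
    if master is None:
        return None
    return master, JobRequest(job, program, data, reply_to=my_address)
```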

The master management computer updates the computer cluster status of each computer 108 belonging to the cluster that will execute the job from “idle” or “wait” to “run”, and then starts executing the job by distributed processing across those computers 108. FIG. 12(c) shows the master management computer that manages cluster C14, grown as in FIG. 12(b), executing a job. The master management computer sends the execution result to the requesting management computer 106, which forwards it to the source that submitted the job (step S304: Yes, S305).
If a job is submitted directly to a master management computer that already manages a cluster of the required size, that master management computer performs steps S301 to S304 itself. If it is determined in step S302 that the job must be executed across multiple clusters, for example because image processing and audio processing must run in separate clusters, execution is requested from the master management computer of each of those clusters.
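The status change made before execution can be written as an all-or-nothing transition, and a job spanning several clusters becomes one request per master. In this hypothetical sketch, refusing to start unless every member is “idle” or “wait” is one reasonable reading of the text rather than a documented rule, and the function names and mapping layout are assumptions.

```python
READY = ("idle", "wait")   # statuses from which a member may move to "run"


def start_job(status, members):
    """Switch every member of the executing cluster to "run" before execution.

    Raises ValueError, leaving all statuses unchanged, if any member is not ready.
    """
    not_ready = [m for m in members if status.get(m) not in READY]
    if not_ready:
        raise ValueError(f"members not ready: {not_ready}")
    for m in members:
        status[m] = "run"


def requests_per_cluster(parts, master_for_part):
    """Job spanning several clusters: one execution request per master.

    parts: part name -> payload, e.g. {"image": ..., "audio": ...}
    master_for_part: part name -> master management computer address (assumed mapping)
    """
    return [(master_for_part[name], name, payload) for name, payload in parts.items()]
```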

<Third-stage state change>
[Dissolution of the cluster]
A cluster is dissolved by the master management computer that manages the cluster that executed the job. FIG. 11 is a diagram of the processing procedure in the master management computer when it dissolves this cluster.
When the job ends (step S401: Yes), the master management computer updates the computer cluster status of every computer 108 in the cluster that executed the job to “free” (step S402). It also clears the components (values) of that cluster's master table and, through the slave management computers that manage computers belonging to the cluster, clears the components (values) of the slave tables. In other words, every computer is returned to its state before cluster formation (step S403). All computers 108 that belonged to the cluster are now idle and available for clustering: some computers 108 may immediately form a new cluster, while others become available to be added to other clusters. FIG. 12(d) shows this state.
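Steps S402 and S403 reset three kinds of state: the member statuses, the master table, and the slave tables held by the slave management computers. A minimal sketch, to be called once the job has ended (step S401: Yes), assuming dictionaries stand in for those tables (the layouts and the function name are hypothetical):

```python
def dissolve_cluster(master_table, slave_tables_by_manager, status):
    """Sketch of steps S402-S403, run by the master management computer.

    master_table: {"cluster_id": ..., "computers": [...], ...}
    slave_tables_by_manager: slave manager id -> {computer id: slave table}
    status: computer id -> computer cluster status
    Returns the computers released back to the "free" pool.
    """
    members = list(master_table.get("computers", []))
    for c in members:
        status[c] = "free"                      # S402
    master_table.clear()                        # clear the master table's values
    for tables in slave_tables_by_manager.values():
        for c in members:
            tables.pop(c, None)                 # each slave manager clears its slave table
    return members                              # S403: back to the pre-cluster state
```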

As described above, according to this embodiment, even in a network system with many uncertainties, where the size, type, or number of submitted jobs varies and is expected to change from moment to moment, each of the multiple management computers 106 acts, for each computer 108 under its control, as either a master management computer or a slave management computer while forming clusters of optimal size and growing them. Job execution can therefore be assigned to a cluster of appropriate size, and jobs can be executed efficiently. In addition, because processes ranging from very large to very small are all handled by one uniform procedure, highly flexible distributed computing can be realized. In particular, when process sizes follow a power-law distribution, the addition-ease information can be made variable so that cluster sizes also follow a power law; processes can then be handled most effectively, and computer resources are used more efficiently.
<Modifications>
In the above examples, for convenience, it was assumed that each computer 108 holds its own status table and that the management computer 106 holds the management table 206 (master table/slave table). However, these tables need only reside in memory areas that the computers 108 and the management computers 106 can access. For example, a management computer 106 may hold the status tables of the computers 108 under its control, or the management tables 206 may be stored together on a server or the like that the table management unit 212 of the management computer 106 can access.
Also, in the above examples, the slave management computer decides whether a candidate computer under its control is added to another cluster; alternatively, all decisions on whether to add candidate computers to a cluster may be made by the master management computer.
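The power-law remark in the summary paragraph above can be illustrated with a small simulation. This is not taken from the patent: it assumes a variable addition-ease value equal to a cluster's current size, so each arriving free computer either starts a new cluster with probability `p_new` or joins an existing cluster with probability proportional to that cluster's size. This is Simon's preferential-growth model, which is known to produce a heavy-tailed, power-law-like size distribution.

```python
import random
from collections import Counter


def simulate_cluster_sizes(n_computers, p_new, rng):
    """Grow clusters one computer at a time and return the final cluster sizes.

    With probability p_new the arriving computer starts a new cluster; otherwise it
    joins an existing cluster chosen with probability proportional to its size.
    """
    sizes = [1]     # the first computer forms the first cluster
    owner = [0]     # owner[i] = cluster of computer i; sampling a random computer and
                    # taking its cluster is the same as size-proportional selection
    for _ in range(n_computers - 1):
        if rng.random() < p_new:
            sizes.append(1)
            owner.append(len(sizes) - 1)
        else:
            k = owner[rng.randrange(len(owner))]
            sizes[k] += 1
            owner.append(k)
    return sizes


if __name__ == "__main__":
    sizes = simulate_cluster_sizes(10_000, 0.1, random.Random(1))
    hist = Counter(sizes)                 # cluster size -> number of clusters
    for size in sorted(hist)[:10]:
        print(size, hist[size])
    print("largest cluster:", max(sizes))
```

Plotting `hist` on log-log axes is a quick way to see whether the tail looks roughly linear, as a power law would.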

<Embodiment without management computers>
The above description showed an example of a network system 101 consisting of a set of management computers 106, each with multiple computers 108 under its control. However, the present invention can also be implemented as a network system that each of the computers 108 can join and leave directly, without going through a management computer 106.

When implemented as such a network system, each computer 108 also records the connection ratio in its own status table and manages that status table itself. For example, when a computer adds itself to a cluster formed by another computer 108, it updates its own computer cluster status to “clustered”, and when the cluster finishes executing the job, it returns its own computer cluster status to “free”. In addition, any of the computers 108 is given the same functions as the management computer 106 described above: when a computer takes the lead in forming a cluster, it generates a master table recording the contents of that cluster.

The computer 108 that formed the cluster searches for candidate computers to add, that is, other computers whose latest computer cluster status in their own status tables is “free”, and decides whether to add each candidate found in this way based on the cluster's connection ratio. The computer 108 that formed the cluster then has the cluster grown in this way execute a job and dissolves the cluster once execution is complete.
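In this variant each computer carries its own status and connection ratio, and whichever computer leads a cluster also holds the master table. The class below is a hypothetical sketch of that division of responsibility; the acceptance rule `ratio / (ratio + 1)` is again an assumption, and status writes made by the leader stand in for each member updating its own status.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)   # compare peers by identity; a leader's table contains itself
class Peer:
    name: str
    status: str = "free"            # own computer cluster status (own status table)
    connection_ratio: float = 1.0   # also recorded in the peer's own status table
    master_table: list = field(default_factory=list)  # used only by a cluster leader

    def form_cluster(self):
        """Take the lead: mark self as clustered and start a master table."""
        self.status = "clustered"
        self.master_table = [self]

    def recruit(self, others, rng):
        """Add free peers, deciding with this cluster's connection ratio."""
        r = self.connection_ratio
        for p in others:
            if p is self or p.status != "free":
                continue
            if rng.random() < r / (r + 1.0):   # assumed ratio -> probability mapping
                p.status = "clustered"          # stands in for the peer's own update
                self.master_table.append(p)

    def finish_job(self):
        """After the job ends, every member returns to "free" and the cluster is gone."""
        for p in self.master_table:
            p.status = "free"
        self.master_table = []
```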

<Embodiment as a multiprocessor / multi-core processor>
The present invention can also be implemented as a network system built from multiple processors or multi-core processors. In this case, as illustrated in FIG. 13(a), the internal bus serves as the computer network 104 described above; any one or several of the processors or multi-core processors connected to the internal bus operate as the management computer 106 described above, and the remaining processors and the like operate as the computers 108 described above.
Each multi-core processor illustrated in FIG. 13(b) can itself also be implemented as a network system. In this case, any one or several of the cores (processor cores) connected to the internal bus, together with the input/output control unit (I/O) and the cache memory, realize the operation of the management computer 106 described above, and the remaining cores operate as the computers 108 described above.
[Other embodiments]

The network system of the present invention also admits an embodiment in which, when a computer 108 of one cluster accesses another cluster, the computer 108 is added to a further cluster associated with that other cluster. This mode of embodiment is called “duplication mode”. Duplication mode is described in detail in the following paper:
Title: Growing networks with local rules: preferential attachment, clustering hierarchy and degree correlations. Author: Alexei Vázquez. Journal: Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2003 May; 67(5 Pt 2): 056104. Epub 2003 May 7.

In duplication mode, instead of the connection ratio 2368 described above, a cluster list is used that lists the identification information (connection addresses and the like) of the other clusters with which computers belonging to the accessed cluster are associated. The cluster list is provided, for example, in the master table.

FIG. 14 shows an example of the contents of the master table in this case. This master table 2163 has a cluster ID 2361, cluster size 2362, computer list 2363, cluster status 2364, maximum number of computers 2365, and minimum number of computers 2366, with the same contents as in the master table 2161 shown in FIG. 6(a), plus a cluster list 2369. Each time a new cluster becomes associated with the cluster it manages, the master management computer adds the identification information of the new cluster to the cluster list 2369. “Association” in this embodiment means that a computer belonging to one cluster and a computer belonging to another cluster are logically connected (in contact with each other) so that they can, for example, perform coordinated processing.
The master management computer reads the identification information of one cluster listed in the cluster list 2369, for example in response to a request from a slave management computer, and notifies that slave management computer of it. In applications where security is not a concern, the cluster list 2369 may instead be made directly viewable by, for example, slave management computers.

Entries may be read from the cluster list 2369 in random order or in a predetermined order. In the former case, the order in which cluster identification information is listed in the cluster list 2369 may be arbitrary. In the latter case, the listing order is weighted; for example, clusters more closely related to the processing performed by the cluster that the master management computer manages are read first. Relatedness can be determined, for example, from the number of accesses between the clusters, the number of communications between them within a given time, or the degree to which their distributed processing is interrelated (for example, audio processing that must be synchronized with image processing).
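The two reading orders for the cluster list 2369 can be sketched as below. Using a relevance score built from inter-cluster access counts is only one of the measures the text mentions, and the class and method names are hypothetical.

```python
import random
from collections import Counter


class ClusterList:
    """Hypothetical cluster list held in a duplication-mode master table."""

    def __init__(self):
        self.entries = []               # identification info of associated clusters
        self.relevance = Counter()      # e.g. number of accesses between the clusters

    def associate(self, cluster_id):
        """Record a newly associated cluster (done each time an association is made)."""
        if cluster_id not in self.entries:
            self.entries.append(cluster_id)

    def record_access(self, cluster_id, n=1):
        self.relevance[cluster_id] += n

    def read_order(self, weighted, rng=None):
        """Order in which entries are handed out to slave management computers."""
        if weighted:
            # More relevant clusters first; ties keep their listing order (stable sort).
            return sorted(self.entries, key=lambda c: -self.relevance[c])
        order = list(self.entries)
        (rng or random).shuffle(order)  # random order: listing order does not matter
        return order
```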

Duplication mode operates, for example, as follows.
Assume that the network system is in the initial state shown in the upper part of FIG. 15. In the example of FIG. 15, multiple clusters C6 to C11 have already been formed, and one or more computers (the computers 108 described above; likewise below) belong to each of clusters C6 to C11.

In this state, suppose that a new cluster C20 is formed in the network system, as shown in the middle part of FIG. 15. The management computer that formed cluster C20 is assumed to manage, in addition to the computers belonging to C20, several computers intended to help other clusters grow, that is, candidate computers (not shown). As described earlier, when such a candidate computer is provided to another cluster, the management computer that manages it becomes a slave management computer.
The management computer that manages cluster C20 locates cluster C7 among clusters C6 to C11. It then queries the master management computer that manages C7 and obtains, from its master table 2163, information on how C7 is associated with other clusters. In the middle part of FIG. 15, C7 is associated only with C11. The management computer managing C20 therefore adds the candidate computer to C11 and associates this candidate computer with C20 (that is, with the computers belonging to C20) (lower part of FIG. 15). It also updates the contents of the slave table that it holds.
Meanwhile, the master management computer that manages C11 adds the newly added computer (the former candidate computer) to the computer list 2363 in its master table 2163 and adds the identification information of C20 to the cluster list 2369.
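The walk-through of FIG. 15 reduces to one lookup and two table updates. In this hypothetical sketch, each master table is a dictionary whose `computers` and `clusters` lists stand in for the computer list 2363 and cluster list 2369; choosing the visited cluster and the associated cluster at random is an assumption, since the text does not say how C7 is found.

```python
def duplication_step(new_cluster, candidate, tables, rng, slave_table):
    """Place `candidate` (held for new_cluster's manager) using duplication mode.

    tables: cluster id -> {"computers": [...], "clusters": [...]}
    slave_table: the slave tables held by new_cluster's management computer
    rng: object with a choice() method, e.g. random.Random(seed)
    Returns the id of the cluster the candidate joined, or None.
    """
    others = sorted(c for c in tables if c != new_cluster)
    if not others:
        return None
    visited = rng.choice(others)                        # e.g. C7 in FIG. 15
    related = [c for c in tables[visited]["clusters"] if c != new_cluster]
    if not related:
        return None                                     # nothing associated; search again
    target = rng.choice(related)                        # e.g. C11, C7's only association
    tables[target]["computers"].append(candidate)       # target's computer list
    if new_cluster not in tables[target]["clusters"]:
        tables[target]["clusters"].append(new_cluster)  # target's cluster list
    slave_table[candidate] = {"cluster_id": target}     # this manager is now slave manager
    return target
```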

The above is an example in which clusters C20 and C11 (their respective computers) are associated one-to-one, but associations are not limited to one-to-one and may be one-to-N (where N is a natural number of 2 or more). These associations are also independent of the physical connections between computers within a console.

For example, in FIG. 16(a), suppose that a cluster C20 containing a computer newly joining the network system has been formed, and that the management computer managing C20, while searching for a destination for a candidate computer under its control, reaches cluster C11. If C11's cluster list contains clusters C6 and C7, each associated one-to-one with a computer of C11, as well as C8 and C10, each associated with multiple computers of C11, then the candidate computer is added to one of clusters C6, C7, C8, and C10. The same applies to FIG. 16(b).
In these cases too, the information recorded in the slave table and the master table is updated, as in the example shown in FIG. 15.

In duplication mode, the cluster list 2369 and the computer list 2363 can also be used together. An example of this mode of operation is described below.
The process is the same as in the examples of FIG. 16(a) and (b) up to the point where a new cluster C20 joining the network system has been formed and the management computer managing C20, searching for a destination for a candidate computer under its control, reaches cluster C11.
As shown in FIG. 17, according to C11's computer list 2363, computers #1 to #5 belong to C11, and according to its cluster list 2369, computers #1 and #3 are associated with cluster C10, computer #4 with cluster C7, and computer #5 with cluster C6, while computer #2 is associated with no cluster. If the computer list 2363 is also read in random order (although this is not required), then when computer #1 is selected from the computer list 2363, the candidate computer is added to cluster C10, with which #1 is associated. If computer #2 is selected, a computer is selected again, because no cluster is associated with #2.
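The combined use of the computer list 2363 and the cluster list 2369 in FIG. 17 can be sketched as a pick-and-retry loop. The mapping from each member computer to its associated clusters is a hypothetical representation of what the two lists record, and the retry bound is an assumption that keeps the loop finite.

```python
def pick_destination(computer_list, associations, rng, max_tries=100):
    """Choose the cluster a candidate computer is added to (FIG. 17 style).

    computer_list: member computers of the reached cluster (e.g. C11's #1-#5)
    associations: member -> list of clusters it is associated with
    rng: object with a choice() method, e.g. random.Random(seed)
    A member with no associated cluster (e.g. #2) causes a reselection.
    """
    if not any(associations.get(m) for m in computer_list):
        return None                            # no member is associated with anything
    for _ in range(max_tries):
        member = rng.choice(computer_list)     # random read order of the computer list
        linked = associations.get(member, [])
        if linked:
            return rng.choice(linked)
    return None
```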

In duplication mode, clustering that follows a power-law distribution is possible without setting the connection ratio 2368, so computers can be added to clusters more smoothly.

The present invention is applicable to network systems intended to facilitate clustering in which the number of nodes connecting computers follows, for example, a normal distribution or a power-law distribution. It is also applicable to distributed computing in general, wherever information processing must be executed efficiently while the size, type, and number of submitted jobs vary and change from moment to moment.

FIG. 1 is an overall view of a network system to which the present invention is applied.
FIG. 2 shows the general architecture of the management computer according to this embodiment.
FIG. 3 is an explanatory diagram of a large-scale information processing integrated body.
FIG. 4 is a functional block diagram of the management computer according to this embodiment.
FIG. 5 shows measured plots with the number of links on the horizontal axis and the number of nodes having that number of links on the vertical axis: (a) is an example of random connection, and (b) is an example of selective connection.
FIG. 6 shows examples of the contents of (a) the master table and (b) the slave table.
FIG. 7 is an overview of the distributed computing procedure according to this embodiment.
FIG. 8 is a procedure diagram of cluster formation and growth by a management computer.
FIG. 9 is a procedure diagram of cluster formation and growth by a management computer.
FIG. 10 is a procedure diagram of job submission and execution by a management computer.
FIG. 11 is a procedure diagram of the processing performed by a management computer when a cluster is dissolved.
FIG. 12(a) to (d) illustrate the formation, growth, and dissolution of a cluster according to this embodiment.
FIG. 13(a) is a schematic diagram of a network system realized by a multiprocessor, and (b) is a schematic diagram of a network system realized by a single multi-core processor.
FIG. 14 shows an example of the contents of the master table used in duplication mode.
FIG. 15 shows an example of operation in duplication mode.
FIG. 16(a) and (b) show other examples of operation in duplication mode.
FIG. 17 shows another example of operation in duplication mode.

Explanation of symbols

101 ... Network system
104 ... Computer network
106 ... Management computer
108 ... Computer (candidate computer)
206 ... Management table
208 ... Cluster list
212 ... Table management unit
216 ... Cluster management unit
236 ... Communication control unit
256 ... Job execution unit
315 ... Shared DRAM
1061 ... Storage device
1063 ... Communication device
1065 ... Semiconductor memory
1067 ... Processor
2161, 2163 ... Master table
2162 ... Slave table
2361 to 2369 ... Master table components (values)
2461 to 2462 ... Slave table components (values)
B11 ... Bus
WO ... Large-scale information processing integrated body

Claims

A network system in which a plurality of computers each capable of clustering with other computers can freely join and leave;
A first table that records clusterability information indicating whether each computer is in a clusterable state;
A second table for recording additional ease information indicating how easily a cluster already formed by one or more computers can add another computer;
Any one of the computers forms a cluster including self and other computers whose clusterability information in the first table indicates a clusterable state, and for all the computers included in the formed cluster. A cluster forming unit that updates the clusterability information to information indicating a non-clusterable state, and further records the additional ease information about the formed cluster in the second table;
The computer that has formed the cluster indicates whether or not to add the candidate computer to the cluster when there is a candidate computer to be added to the cluster in the additional ease information recorded in the second table. Having a cluster growth means to determine based on; network system.

The computer that formed the cluster disappears when the execution of the job by the cluster is completed, and belongs to the additional ease information recorded in the second table and the disappeared cluster. Cluster extinction means for returning the clustering availability information recorded in the first table for all computers to the state before cluster formation;
The network system according to claim 1.

The additional ease information is a numerical value obtained by quantifying the ease of addition, and the cluster growth means facilitates adding the candidate computer to the cluster as the numerical value increases.
The network system according to claim 1.

The cluster growth means holds the numerical value at a constant value regardless of addition of candidate computers.
The network system according to claim 3.

The cluster growth means is variable after recording the numerical value in the second table;
The network system according to claim 3.

The cluster growth means changes the numerical value when the candidate computer is added to the cluster.
The network system according to claim 5.

A network system in which a plurality of management computers each having one or a plurality of computers that can be clustered with other computers can freely join and leave;
A first table that records clusterability information indicating whether each computer is in a clusterable state;
A second table recording addability information indicating how easily a cluster already formed by one or more computers can add another computer;
At least one of the management computers has cluster forming means for forming a cluster that includes a computer under its own control and one or more computers, under the control of other management computers, whose clusterability information in the first table indicates a clusterable state, and for updating the clusterability information of all computers included in the formed cluster to information indicating a non-clusterable state;
The management computer that formed the cluster has cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to the cluster based on the addability information recorded in the second table;
Network system.

Cluster dissolution means by which the management computer that formed the cluster dissolves the cluster when execution of the job by the cluster is completed, and returns the addability information recorded in the second table, and the clusterability information recorded in the first table for all computers belonging to the dissolved cluster, to their state before cluster formation;
The network system according to claim 7.

Each of the management computers has at most as many second tables as there are computers under its control;
The network system according to claim 8.

The second table includes at least one of: a master table, which the owning management computer generates for a first computer under its control when it forms a first cluster including that first computer; and a slave table, which the owning management computer generates for a second computer under its control when it monitors and controls the operation of that second computer after the second computer has been added to a second cluster formed by another management computer;
The addability information is recorded in the master table;
The management computer having the master table acts as a master management computer that mainly performs the information processing related to formation of the first cluster, changes in the number of computers in the first cluster, and dissolution of the first cluster, and the management computer having the slave table acts as a slave management computer for the second cluster;
The network system according to claim 9.
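The master/slave split of the second table, together with the cap of one table per owned computer, can be modeled in a short sketch. `Role`, `SecondTable`, `ManagementComputer`, and the method names are hypothetical; storing the addability value only in master tables follows the claim.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Role(Enum):
    MASTER = "master"  # cluster formed by this manager around one of its computers
    SLAVE = "slave"    # one of its computers joined a cluster formed elsewhere


@dataclass
class SecondTable:
    cluster_id: str
    computer: str                       # owned computer the table was generated for
    role: Role
    addability: Optional[float] = None  # recorded only in master tables


@dataclass
class ManagementComputer:
    name: str
    owned: List[str]
    tables: List[SecondTable] = field(default_factory=list)

    def _new_table(self, cluster_id: str, computer: str, role: Role,
                   addability: Optional[float]) -> SecondTable:
        if computer not in self.owned:
            raise ValueError(f"{computer} is not owned by {self.name}")
        # One table per owned computer, so the count never exceeds len(owned).
        if any(t.computer == computer for t in self.tables):
            raise RuntimeError(f"{computer} already has a second table")
        table = SecondTable(cluster_id, computer, role, addability)
        self.tables.append(table)
        return table

    def start_cluster(self, cluster_id: str, computer: str, addability: float) -> SecondTable:
        return self._new_table(cluster_id, computer, Role.MASTER, addability)

    def join_cluster(self, cluster_id: str, computer: str) -> SecondTable:
        return self._new_table(cluster_id, computer, Role.SLAVE, None)
```

Tying each table to a single owned computer is what enforces the upper bound on the number of second tables.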

The master management computer has search means for searching for candidate computers to be added to the first cluster by inquiring of each of the management computers whether it has a computer in a clusterable state.
The network system according to claim 10.
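The search means can be sketched as the master walking through each manager's answer to its inquiry and collecting clusterable computers. The snapshot dictionary standing in for the inquiry replies, the `limit` parameter, and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def search_candidates(managers: Dict[str, Dict[str, bool]], limit: int) -> List[Tuple[str, str]]:
    """Collect up to `limit` (manager, computer) pairs that report clusterable.

    `managers` is a hypothetical snapshot of each manager's answer to the
    master's inquiry: manager name -> {owned computer: clusterable?}.
    """
    found: List[Tuple[str, str]] = []
    if limit <= 0:
        return found
    for manager, computers in managers.items():
        for computer, clusterable in computers.items():
            if clusterable:
                found.append((manager, computer))
                if len(found) == limit:
                    return found
    return found
```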

Any of the management computers determines, based on the addability information about the first cluster formed by the master management computer, whether to add a candidate computer under its own control to the first cluster.
The network system according to claim 10.
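In this claim the decision is made on the candidate's side: a management computer reads the first cluster's addability value and decides whether to offer its own computers. The threshold policy and the `min_addability` parameter below are hypothetical; the claim does not say how the value is compared.

```python
from typing import Dict, List


def offer_computers(own_first_table: Dict[str, bool], cluster_addability: float,
                    min_addability: float) -> List[str]:
    """Decide, on the candidate's side, which owned computers to offer.

    Hypothetical policy: offer every owned clusterable computer when the first
    cluster's addability value (read from the master's table) reaches this
    manager's own threshold `min_addability`; otherwise offer none.
    """
    if cluster_addability < min_addability:
        return []
    return [c for c, ok in own_first_table.items() if ok]
```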

The addability information is a numerical value quantifying the ease of adding a candidate computer to the first cluster;
The cluster growth means of the master management computer makes it easier to add the candidate computer to the first cluster as the numerical value increases.
The network system according to claim 10.

The cluster growth means holds the numerical value constant regardless of the addition of candidate computers.
The network system according to claim 13.

The cluster growth means makes the numerical value variable after recording it in the second table.
The network system according to claim 13.

The cluster growth means varies the numerical value when the candidate computer is added to the first cluster.
The network system according to claim 15.

A network system that a plurality of computers, each capable of clustering with other computers, can freely join and leave;
A table that, for each cluster already formed by one or more computers, lists identification information of other clusters associated with the computers belonging to that cluster;
The computer that formed the cluster has cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to another cluster associated with its own cluster, based on the identification information listed in the table;
Network system.
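The table of associated clusters can be sketched as a mapping from a cluster id to the ids of its associated clusters; a candidate is then routed to the first associated cluster that can take it. The `accepting` set, which stands for clusters currently willing to grow, and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Set


def route_candidate(related: Dict[str, List[str]], own_cluster: str,
                    accepting: Set[str]) -> Optional[str]:
    """Pick a cluster associated with `own_cluster` to receive a candidate.

    `related` is the table: cluster id -> ids of associated clusters.
    `accepting` is a hypothetical set of cluster ids currently willing to grow.
    Returns None when no associated cluster should receive the candidate.
    """
    for other in related.get(own_cluster, []):
        if other in accepting:
            return other
    return None
```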

The table lists the associated clusters in priority order so that the candidate computer is preferentially added to an associated cluster whose processing is more closely related to the processing executed by the computer's own cluster.
The network system according to claim 17.
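The priority ordering can be produced by sorting associated clusters by a relevance score, most relevant first. The numeric score is an assumption; the claim only says that clusters more relevant to the own cluster's processing are listed preferentially.

```python
from typing import Dict, List


def build_related_list(relevance: Dict[str, float]) -> List[str]:
    """Order associated clusters so the most relevant one is tried first.

    `relevance` maps an associated cluster id to a hypothetical score of how
    closely its processing relates to the own cluster's processing. Ties keep
    insertion order because Python's sort is stable even with reverse=True.
    """
    return sorted(relevance, key=relevance.get, reverse=True)
```

Walking this list in order, and stopping at the first cluster that can accept the candidate, gives the preferential behavior described in the claim.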

Further including a first table that records clusterability information indicating whether each computer is in a clusterable state;
Any one of the computers has cluster forming means for forming a cluster that includes itself and other computers whose clusterability information in the first table indicates a clusterable state, and for updating the clusterability information of all computers included in the formed cluster to information indicating a non-clusterable state;
The network system according to claim 17.

A management computer that owns one or more computers, each capable of being clustered with other computers;
Network connection means for connecting to a computer network that the management computer can freely join and leave together with other management computers having similar functions;
Table management means enabling access to a first table that records clusterability information indicating whether each computer can be clustered, and to a second table that records addability information indicating how easily a cluster already formed by one or more computers can add another computer;
Cluster forming means for forming a cluster that includes computers, among those under the control of this management computer and of other management computers, whose clusterability information in the first table indicates a clusterable state, for updating the clusterability information of all computers included in the formed cluster to information indicating a non-clusterable state, and for recording the addability information for the cluster in the second table;
Cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to the cluster based on the addability information recorded in the second table;
Management computer.

Further comprising cluster dissolution means for dissolving the formed cluster when execution of the job by the cluster is completed, and for returning the addability information recorded in the second table, and the clusterability information recorded in the first table for all computers belonging to the dissolved cluster, to their state before cluster formation;
The management computer according to claim 20.

A cluster management method in a network system that a plurality of computers, each capable of clustering with other computers, can freely join and leave, comprising:
A step in which each computer records, in a first table, clusterability information indicating whether its own state permits clustering;
A step in which one of the computers forms a cluster that includes itself and other computers whose clusterability information in the first table indicates a clusterable state, updates the clusterability information of all computers included in the formed cluster to information indicating a non-clusterable state, and records, in a second table, addability information indicating how easily the cluster can add another computer;
A step of determining, when there is a candidate computer to be added, whether to add the candidate computer to the cluster based on the addability information recorded in the second table;
Cluster management method.

Further including a step in which, when execution of the job by the formed cluster is completed, the computer that formed the cluster dissolves the cluster and returns the addability information recorded in the second table, and the clusterability information recorded in the first table for all computers belonging to the dissolved cluster, to their state before cluster formation.
The cluster management method according to claim 22.

A cluster management method in a network system that a plurality of computers, each capable of clustering with other computers, can freely join and leave, comprising:
A step in which a computer that has formed a cluster with one or more computers lists, in a predetermined table, identification information of other clusters with which the computers belonging to the cluster are associated;
A step in which, when there is a candidate computer to be added, the computer that formed the cluster determines, based on the identification information listed in the table, whether to add the candidate computer to another cluster associated with the cluster;
Cluster management method.

A computer program that is read and executed by any computer in a network system that a plurality of computers, each capable of clustering with other computers, can freely join and leave, and that causes the computer to function as:
Table management means enabling access to a first table that records clusterability information indicating whether each computer can be clustered, and to a second table that records addability information indicating how easily a cluster already formed by one or more computers can add another computer;
Cluster forming means for forming a cluster that includes the computer itself and other computers whose clusterability information in the first table indicates a clusterable state, for updating the clusterability information in the first table for all computers included in the formed cluster to information indicating a non-clusterable state, and for recording the addability information for the cluster in the second table;
Cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to the cluster based on the addability information recorded in the second table;
Computer program.

A computer program that is read and executed by a management computer owning one or more computers, each capable of clustering with other computers, and that causes the management computer to function as:
Network connection means for connecting to a computer network that the management computer can freely join and leave together with other management computers having similar functions;
Table management means enabling access to a first table that records clusterability information indicating whether each computer can be clustered, and to a second table that records addability information indicating how easily a cluster already formed by one or more computers can add another computer;
Cluster forming means for forming a cluster that includes computers, among those under the control of this management computer and of other management computers, whose clusterability information in the first table indicates a clusterable state, for updating the clusterability information of all computers included in the formed cluster to information indicating a non-clusterable state, and for recording the addability information for the cluster in the second table;
Cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to the cluster based on the addability information recorded in the second table;
Computer program.

A computer program that is read and executed by any computer in a network system that a plurality of computers, each capable of clustering with other computers, can freely join and leave, and that causes the computer to function as:
Cluster management means for listing in a predetermined table, once a cluster has been formed by one or more computers, identification information of other clusters associated with the computers belonging to the cluster;
Cluster growth means for determining, when there is a candidate computer to be added, whether to add the candidate computer to another cluster associated with its own cluster, based on the identification information listed in the table;
Computer program.