JP2009086741A

JP2009086741A - Distributed processing control method in heterogeneous node existing distributed environment and its system and its program

Info

Publication number: JP2009086741A
Application number: JP2007252169A
Authority: JP
Inventors: Takashi Kono; 高志光野; Mitsutaka Shimada; 光高嶋田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2007-09-27
Filing date: 2007-09-27
Publication date: 2009-04-23

Abstract

PROBLEM TO BE SOLVED: In a distributed environment where nodes with different performance or platforms coexist, to optimize load distribution and improve performance in response to changes in the state of each node (load fluctuation, failure occurrence, and addition of new nodes), to continue distributed processing when a failure occurs, and to make computer resources easy to add.

SOLUTION: The distributed processing control method and distributed processing control program include:
- a distributed object communication base that achieves platform independence and location transparency using the OMG-standard CORBA and SDO;
- a state monitoring unit that collects the load state of each node;
- a resource management unit that executes distributed processing by deciding how distributed processing units are allocated according to those load states;
- a dynamic fail-over control unit that takes over distributed processing by dynamically changing the configuration when a node failure is detected;
- a distributed automatic participation control unit that, when a new node is detected, makes it join the distributed processing automatically.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to distributed processing control. More specifically, it relates to a distributed processing control method and a distributed processing control program for a distributed environment in which nodes (computers or computer systems) with different performance and platforms coexist. The method and program perform distributed processing control autonomously in response to changes in the state of each node (load fluctuation, failure occurrence, addition of a new node).

For load distribution in distributed environments, a widely known conventional approach allocates the processing units to be distributed according to the load on each node.
- Patent Document 1 allocates each transaction to a lightly loaded server.
- Patent Document 2 converts the urgency and priority of processing on a per-program basis.
- Patent Document 3 dynamically relocates the execution site of an object itself according to the volume of communication between objects.

These methods distribute the load at the granularity of the transaction, program, or object concerned.

As a dynamic fail-over technique for failures in a distributed environment, switching to a standby server is widely known. Patent Document 4 switches to a standby server on failure and changes the server location. Patent Document 5 changes the communication path for the failed work from the active system to the standby system.

Patent Document 6 describes a general form of distributed processing. The load is shared between a dedicated server, which distributes transactions and controls their execution, and multiple servers that share the load. In this method, adding a new node requires changing system configuration settings or programs in advance.

Patent Document 1: JP 2003-296289 A
Patent Document 2: JP H7-282013 A
Patent Document 3: JP 2000-242609 A
Patent Document 4: JP 2000-99483 A
Patent Document 5: JP H9-293059 A
Patent Document 6: JP 2006-113827 A

In the techniques of Patent Documents 1 to 3, load distribution can become unbalanced when a single distributed processing unit (a transaction, program, or object) requires a large amount of computer resources. In the techniques of Patent Documents 4 and 5, dynamic fail-over after a failure requires a standby server. It also requires processing to change the server location or the communication path.

Accordingly, the present invention targets distributed processing systems in which nodes with different performance and platforms coexist. Its object is to allocate distributed processing units appropriately to each node according to changes in that node's state (load fluctuation, failure occurrence, addition of a new node). This balances the load and improves performance.

Other objects of the present invention are to continue distributed processing when a failure occurs and to make computer resources easy to add.

One aspect of the present invention is distributed processing control that allocates distributed processing units to the nodes of a distributed processing system in which n nodes with different CPU performance coexist. A management node performs the following steps:
1. It collects the CPU usage rate of each node.
2. It averages the collected rates over a predetermined period.
3. It computes each node's margin rate from that node's average CPU usage rate, that node's CPU performance, and the sum of the CPU performance of all nodes.
4. It allocates distributed processing units to each node in accordance with the node's margin rate.

Another aspect of the present invention is distributed processing control that keeps allocating distributed processing units to each node in accordance with its margin rate when the node count changes. This applies when one of the n nodes fails, leaving an (n−1)-node system, and when a new node joins, giving an (n+1)-node system.

Allocating distributed processing units according to each node's load state, based on the margin rate, achieves fine-grained load balancing and improves distributed processing performance.

Embodiments are described with reference to the drawings. FIG. 1 covers a distributed processing system in which nodes with different performance and platforms coexist, with load fluctuation as the state change considered. It illustrates the optimal allocation of distributed processing units according to the load of each node.

The management node 100 in FIG. 1 manages the entire distributed processing system. The computation node 110 executes distributed processing units. There are generally multiple computation nodes, but the others are omitted from the figure.

FIG. 1 shows the management node 100 and the computation node 110 as separate nodes. However, when the management workload is light, the management node's spare capacity is also used for computation.

The SDO middleware 300 controls distributed processing according to node state. It uses two specifications from the OMG (Object Management Group), the standards body for distributed object technology:
- SDO (Super Distributed Objects), the OMG-standard specification for super distributed objects;
- CORBA (Common Object Request Broker Architecture), the OMG-standard object specification.

The SDO middleware 300 is common to all distributed processing applications. A distributed processing application consists of user program A 200 and user program B 210. User program A 200 is the requester of distributed processing: it generates the distributed processing parameters and issues the distributed processing request. User program B 210 is the executor: in response to the request, it executes the distributed processing units allocated to it.

With reference to FIG. 1, the mechanism for optimal load distribution is described in this order:
1. monitoring of the load state;
2. calculation of the margin rate;
3. determination of how many distributed processing units to allocate;
4. execution of the allocated units.

First, load state monitoring. The state monitoring unit 310 of each node periodically reports its own load state, including the CPU usage rate, to the resource management unit 320 of the management node. It does so via DDS (Data Distribution Service), a publish/subscribe delivery service.

The processing of the state monitoring unit 310 is described with reference to FIG. 5. This processing starts periodically and, where necessary, on events such as a failure. The steps are:
1. Acquire the failure status information of the node itself (S500). In the normal state, this information indicates that the node is normal.
2. Acquire CPU load information and available memory capacity information (S505, S510).
3. Send the acquired information to the resource management unit 320 of the management node 100 (S515).
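A minimal Python sketch of one activation of the FIG. 5 routine follows. The three reader callbacks and the `publish` callback are hypothetical stand-ins. The patent names DDS as the transport but gives no API, so no real DDS binding is used here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StatusReport:
    node_id: str
    failure_state: str   # "normal" when no failure is present (S500)
    cpu_usage: float     # CPU usage rate in percent (S505)
    free_memory_mb: int  # usable memory capacity (S510)
    timestamp: float = field(default_factory=time.time)


def monitor_once(node_id: str,
                 read_failure: Callable[[], str],
                 read_cpu: Callable[[], float],
                 read_memory: Callable[[], int],
                 publish: Callable[[StatusReport], None]) -> StatusReport:
    """One periodic or event-triggered activation of the state monitoring unit."""
    report = StatusReport(
        node_id=node_id,
        failure_state=read_failure(),
        cpu_usage=read_cpu(),
        free_memory_mb=read_memory(),
    )
    # Hypothetical stand-in for a DDS publish to the resource management unit (S515).
    publish(report)
    return report
```

In a real deployment, the scheduler that calls `monitor_once` every collection interval p would also call it when a failure event fires.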

The resource management unit 320 calculates the average CPU usage rate U(t) of each node using Equation (1).
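The image for Equation (1) is not reproduced in this text. Read back from the variable definitions in the next paragraph, it is presumably a simple moving average of the last c samples. Here the U inside the sum denotes the instantaneous usage sampled at t, t − p, ..., t − p(c − 1). Treat this as a reconstruction, not the published formula:

```latex
U(t) = \frac{1}{c} \sum_{k=1}^{c} U\{t - p(k-1)\}
```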

Here:
- U(t) is the average CPU usage rate at time t;
- c is the number of cycles averaged;
- p is the cycle length (collection interval);
- k is an integer of 1 or more;
- ΣU{t−p(k−1)} is the sum of the CPU usage rates collected over that period.

The average is used for a specific reason. Suppose the instantaneous CPU usage rates collected from each node were used directly to decide the allocation. Large swings in those instantaneous values could then skew the allocation of distributed processing units. Averaging prevents this.
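That smoothing can be done with a fixed-length window, sketched below. The sketch assumes one sample arrives per collection interval p. It also assumes that, before the first c samples arrive, the average is taken over whatever samples exist. `CpuUsageAverager` is a hypothetical name.

```python
from collections import deque


class CpuUsageAverager:
    """Moving average of the last c CPU-usage samples for one node."""

    def __init__(self, cycles: int):
        if cycles < 1:
            raise ValueError("cycles must be >= 1")
        # deque with maxlen silently drops the oldest sample once c are stored.
        self.samples = deque(maxlen=cycles)

    def add(self, usage: float) -> float:
        """Record one instantaneous sample and return the updated average."""
        self.samples.append(usage)
        return self.average()

    def average(self) -> float:
        if not self.samples:
            raise ValueError("no samples collected yet")
        return sum(self.samples) / len(self.samples)
```

With c = 3, a single spike to 100% moves the average by at most about a third as much as the raw value would. This damping is the effect the paragraph above describes.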

Next, the margin rate: an index of how much computer resource each node can offer to distributed processing. The margin rate is defined as the remaining capacity of a node's computer resources, that is, the share of CPU usage available for allocating distributed processing. It is expressed as (100 − average CPU usage rate). The resource management unit 320 combines the average CPU usage rate with the absolute evaluation value of each computation node's performance (such as a benchmark score). From these it calculates the margin rate F(i) of each node using Equation (2).
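Equation (2) is also missing from this text. The form below is consistent with the definitions in the next paragraph and with the 2:1 example that follows them. U(t) is read as the average CPU usage rate of node i:

```latex
F(i) = \{100 - U(t)\} \times \frac{P(i)}{\sum_{j=1}^{n} P(j)}
```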

Here:
- F(i) is the margin rate of node i;
- U(t) is the average CPU usage rate at time t;
- P(i) is the absolute evaluation value of node i's performance (such as a benchmark score);
- n is the total number of nodes in this distributed control system;
- j is an integer of 1 or more;
- ΣP(j) is the sum of the absolute evaluation values of all nodes.

The absolute performance value is used in the formula for the following reason. If (100 − average CPU usage rate) alone were used as the margin rate, load could not be allocated appropriately where nodes with different performance coexist.

For example, suppose two nodes have the same margin rate, but one is twice as fast as the other. Allocating by margin rate alone gives a 1:1 split. In reality, the faster node has twice the spare capacity, so the ratio of their margin rates is 2:1 and the optimal allocation ratio is also 2:1.

Each node's margin rate must therefore be based on the absolute evaluation value of its performance. Equation (2) is the formula for this.
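The margin-rate calculation can be sketched as follows. It assumes F(i) = (100 − U(t)) × P(i) / ΣP(j), with U(t) taken as node i's average CPU usage rate; this is a reading of Equation (2) from its variable definitions. The node names and benchmark values in the check are made up to reproduce the 2:1 example above.

```python
from typing import Dict


def margin_rates(avg_cpu: Dict[str, float], perf: Dict[str, float]) -> Dict[str, float]:
    """Margin rate per node: spare CPU share weighted by relative performance.

    avg_cpu: node -> average CPU usage rate in percent (output of Equation (1))
    perf:    node -> absolute evaluation value P(i), e.g. a benchmark score
    """
    total_perf = sum(perf.values())  # sum of P(j) over all n nodes
    return {node: (100.0 - avg_cpu[node]) * perf[node] / total_perf for node in perf}


# Two nodes, both 50% busy, one twice as fast as the other.
f = margin_rates({"A": 50.0, "B": 50.0}, {"A": 2000.0, "B": 1000.0})
print(f["A"] / f["B"])  # approximately 2.0
```

Dividing by ΣP(j) does not change the ratios between nodes. It only keeps F(i) on a bounded scale, so margin rates stay comparable as nodes join or leave.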

Calculating the margin rate with the absolute evaluation value puts nodes of different performance on the same scale. As a result, in a distributed control system where such nodes coexist, distributed processing units can be allocated appropriately according to each node's load.

FIG. 6 shows the processing performed by the resource management unit 320 of the management node 100:
1. Receive state monitoring information from each node, such as failure status information, CPU load information, and available memory capacity (S600).
2. Calculate the average CPU usage rate of each node from this information (S605).
3. Calculate the margin rate of each node from its average CPU usage rate.

Here the margin rate is derived from the CPU usage rate (the CPU load information). It will be apparent to those skilled in the art, however, that the available memory capacity, which is also obtained as state monitoring information, may constrain which nodes are eligible to receive distributed processing units.
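One way to apply that memory constraint is to filter out failed or memory-starved nodes before computing margin rates. The sketch assumes each distributed processing unit needs a known amount of memory; `min_free_memory_mb` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def eligible_nodes(status: Dict[str, dict], min_free_memory_mb: int) -> List[str]:
    """Nodes that may receive distributed processing units.

    status maps node -> latest monitoring info with the hypothetical keys
    "failure_state" and "free_memory_mb". A node qualifies only if it reports
    no failure and has at least min_free_memory_mb of usable memory.
    """
    return [node for node, s in status.items()
            if s["failure_state"] == "normal"
            and s["free_memory_mb"] >= min_free_memory_mb]
```

Only the nodes this returns would take part in the margin-rate ratio. An excluded node then receives zero units instead of a share it cannot hold in memory.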

Next, determining the allocation of distributed processing units and executing the allocated units. When the distributed processing requester submits a processing request, the resource management unit 320 uses Equation (3) to decide how many units each node receives, in proportion to the calculated margin rates. It then distributes the units to each computation node 110 via CORBA. Each computation node 110 executes the units allocated to it.
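Equation (3) does not appear in this text either. The definitions in the next paragraph state that units are allocated in proportion to the margin rates. From that, Equation (3) presumably reads:

```latex
D(i) = D_{\mathrm{all}} \times \frac{F(i)}{\sum_{j=1}^{n} F(j)}
```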

Equation (3) gives D(i), the number of distributed processing units allocated to each node. Here:
- D(i) is the number of units allocated to node i;
- D_all is the total number of units to be allocated;
- F(i) is the margin rate of node i;
- n is the total number of nodes in this distributed control system;
- j is an integer of 1 or more;
- ΣF(j) is the sum of the margin rates of all nodes.
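Equation (3) generally yields fractional counts, and the patent does not say how they are rounded to whole units. The sketch below assumes the largest-remainder method, so the counts always sum to D_all. `allocate_units` is a hypothetical helper.

```python
import math
from typing import Dict


def allocate_units(d_all: int, margins: Dict[str, float]) -> Dict[str, int]:
    """Split d_all units in proportion to each node's margin rate F(i)."""
    total = sum(margins.values())  # sum of F(j) over all nodes
    if total <= 0:
        raise ValueError("no node has spare capacity")
    exact = {node: d_all * f / total for node, f in margins.items()}
    alloc = {node: math.floor(x) for node, x in exact.items()}
    leftover = d_all - sum(alloc.values())
    # Give the remaining units to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


print(allocate_units(10, {"A": 2.0, "B": 1.0}))  # {'A': 7, 'B': 3}
```

With margins of 2.0 and 1.0 and ten units, the exact shares are 6.67 and 3.33. Flooring gives 6 and 3, and the one leftover unit goes to the node with the larger fractional part, giving 7 and 3.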

In summary, the mechanism of FIG. 1 allocates distributed processing units according to each node's load state. This achieves fine-grained load balancing and improves distributed processing performance.

FIGS. 2 and 3 illustrate dynamic fail-over in a distributed processing system in which nodes with different performance and platforms coexist. Here, failure occurrence is the state change considered.

FIG. 2 illustrates dynamic fail-over when the management node 100 fails. With reference to FIG. 2, the mechanism is described in this order:
1. failure detection;
2. selection of a new management node;
3. registration of reference information;
4. takeover of distributed processing.

First, failure detection. To realize this dynamic fail-over, every computation node 110 deploys and runs a resource management unit 320 that includes a dynamic fail-over control unit. The management node 100 issues operation notifications periodically. Each computation node 110 monitors these notifications and thereby detects a failure of the management node 100.

Next, selection of a new management node to replace the failed management node 100. The new management node is determined automatically, as follows:
1. The resource management unit 320 of the computation node 101 that detected the failure sends other nodes a representative selection message carrying its own priority. Node 101 later becomes the new management node, but at this point it is still a computation node.
2. Each computation node (for example, computation node 110) compares the priority in the received message with its own.
   - If its own priority is lower, it continues to function as a computation node.
   - If its own priority is higher, it sends other nodes a new representative selection message carrying its own priority.
3. A node that has sent a representative selection message becomes the new management node if no message with a higher priority arrives within a predetermined time.

Thus, if the computation node 101 that detected the failure has the highest priority, it becomes the new management node. A new management node can be selected in this way without depending on the location of a substitute node or on any prior arrangement.

The selection process is described with reference to FIG. 7. It runs in the resource management unit 320 of each computation node 101. The steps are:
1. Monitor the operation notifications that the management node 100 issues periodically, and detect a failure of the management node 100 (S700).
2. Send the node's own priority to the other computation nodes (S705).
3. For a predetermined time (S710), receive priority information from other computation nodes (S715).
4. Compare each received priority with the node's own priority sent in S705 (S720). If a received priority is higher, end the process.
5. If no priority higher than the node's own arrives within the predetermined time, the node determines that it is the new management node (S725).
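The FIG. 7 election can be simulated in a single process, as below. This is a sketch, not the patent's implementation. It assumes all priorities are distinct and replaces real message passing and the S710 timeout with synchronous rounds. The node names are hypothetical.

```python
from typing import Dict


def elect_new_manager(priorities: Dict[str, int], detector: str) -> str:
    """Return the node that ends up as the new management node.

    priorities lists only the surviving computation nodes (the failed
    management node is excluded); a higher number means a higher priority.
    """
    announced = {detector}  # nodes that have broadcast a selection message (S705)
    best = detector         # highest priority broadcast so far
    while True:
        # S715/S720: every silent node that outranks the best message re-broadcasts.
        challengers = [n for n, p in priorities.items()
                       if n not in announced and p > priorities[best]]
        if not challengers:
            # S725: nothing higher arrived before the timeout, so `best` takes over.
            return best
        announced.update(challengers)
        best = max(challengers, key=priorities.__getitem__)
```

Whichever node detects the failure first, the winner is the highest-priority survivor. That is why no node needs to be designated as the substitute in advance.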

Next, registration of new reference information. Once the new management node 101 is determined, its resource management unit 320 (the new representative) registers new reference information with the naming service. From then on, the objects of the new management node can be referenced.

To take over the intermediate results of distributed processing during dynamic fail-over, the system uses table references in a distributed shared memory (not shown).

In summary, the dynamic fail-over mechanism of FIG. 2 lets distributed processing be taken over and continued when the management node fails, without depending on the location of a substitute node. Afterwards, the new management node 101 and the computation nodes 110 continue to perform distributed processing according to load.

The new management node 101 that replaces the failed management node 100 carries a higher load, because it also performs the management functions. The distributed control mechanism of FIG. 1 is therefore applied to the n−1 nodes that make up the system without the management node 100, so that distributed processing according to load remains possible. In this case, the units to be reallocated should preferably be those left unprocessed by the new management node 101. Reallocating the unprocessed units of all nodes would incur a large processing overhead.

FIG. 3 illustrates dynamic fail-over when the computation node 110 fails. With reference to FIG. 3, the mechanism is described in this order:
1. failure detection;
2. SDO restart;
3. registry registration.

A failure of the computation node 110 is detected by the resource management unit 320 of the management node 100. There are two detection methods:
- Timeout: if the state monitoring unit 310 of the computation node 110 reports no state information within a predetermined time, the management node treats this as a failure of the computation node 110.
- Registry access check: the state monitoring unit 310 on each computation node periodically accesses the SDO registry, which stores and manages information about SDOs, node addresses, and so on. If the node cannot access the registry, it reports "failure" as its state information to the resource management unit 320 of the management node 100.
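Both detection methods can be sketched as a table of last-report times plus a set of nodes that reported "failure". The `timeout` value and the injectable `clock` are assumptions made for illustration and testing; the patent says only "a predetermined time".

```python
import time
from typing import Callable, Dict, Set


class FailureDetector:
    """Tracks computation nodes from the management node's point of view."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout  # the "predetermined time", in seconds
        self.clock = clock
        self.last_seen: Dict[str, float] = {}
        self.reported_failed: Set[str] = set()

    def on_report(self, node: str, failure_state: str = "normal") -> None:
        """Record a status report; a non-normal state marks the node as failed."""
        self.last_seen[node] = self.clock()
        if failure_state == "normal":
            self.reported_failed.discard(node)
        else:
            # e.g. the node could not reach the SDO registry
            self.reported_failed.add(node)

    def failed_nodes(self) -> Set[str]:
        """Nodes that are silent past the timeout or that reported a failure."""
        now = self.clock()
        silent = {n for n, t in self.last_seen.items() if now - t > self.timeout}
        return silent | self.reported_failed
```

Passing a fake clock, as the checks do, makes the timeout logic testable without real delays.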

Next, SDO restart. When a computation node failure is detected, the resource management unit 320 (which includes the dynamic fail-over control unit) instructs the fail-over destination, computation node 111, to restart the SDO (fail over).

Next, registry registration, described with reference to FIG. 8. On the computation node 111 that restarted the SDO, the SDO management module registers the SDO in the SDO registry (S800). This module is part of the dynamic fail-over control unit and handles SDO registration, update, deletion, and similar operations. When the SDO starts on the node, it broadcasts an SDO start notification one-to-many over an event channel (S805). The notification contains the SDO ID (a unique identifier) and the CORBA object reference.

When a node receives an SDO start notification for an SDO ID that it manages, it updates the node address information in the SDO registry. After this registry registration, the objects on the new computation node can be referenced.
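A minimal in-memory model of the S805 broadcast and the per-node registry update follows. The event channel is replaced by a plain function call to every node. The class names, field names, and sample values (such as the object-reference string and address) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SdoStartNotice:
    sdo_id: str        # unique SDO identifier
    object_ref: str    # stringified CORBA object reference (placeholder here)
    node_address: str  # address of the node that restarted the SDO


class NodeRegistryView:
    """One node's copy of the SDO registry entries it is responsible for."""

    def __init__(self, managed_sdo_ids: Set[str], registry: Dict[str, dict]):
        self.managed = managed_sdo_ids
        self.registry = registry

    def on_start_notice(self, notice: SdoStartNotice) -> bool:
        # Only notices for SDOs this node manages update its registry.
        if notice.sdo_id not in self.managed:
            return False
        self.registry[notice.sdo_id] = {"address": notice.node_address,
                                        "ref": notice.object_ref}
        return True


def broadcast(notice: SdoStartNotice, nodes: List[NodeRegistryView]) -> List[bool]:
    """Stand-in for one-to-many delivery over an event channel."""
    return [node.on_start_notice(notice) for node in nodes]
```

Because every node receives the notice but only the responsible ones act on it, the restarted node never needs to know which nodes hold references to its SDO.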

As with a management node failure, table references in the distributed shared memory are used to take over the intermediate results of distributed processing during dynamic fail-over.

In summary, the dynamic fail-over mechanism of FIG. 3 lets distributed processing be taken over and continued when a computation node fails, without depending on the location of a substitute node.

After the computation node 110 fails, the distributed control mechanism of FIG. 1 is applied to the n−1 nodes that make up the system without the computation node 110, so that distributed processing according to load remains possible. In this case, the units to be reallocated should preferably be those left unprocessed by the computation node 110. Reallocating the unprocessed units of all nodes would incur a large processing overhead.
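Reallocating only one node's unprocessed units can be sketched against a progress table like the one kept in distributed shared memory. That node is the failed computation node here, or the new management node 101 in the management-node case. The table layout is an assumption. The sketch spreads the pending units over the receiving nodes in proportion to their margin rates using a D'Hondt-style divisor rule, a technique chosen here for illustration rather than taken from the patent.

```python
from typing import Dict


def reassign_unprocessed(progress: Dict[int, dict], source: str,
                         margins: Dict[str, float]) -> Dict[int, str]:
    """Move `source`'s unfinished units to the nodes listed in `margins`.

    progress: unit_id -> {"owner": node, "done": bool} (assumed table layout)
    margins:  node -> F(i) for the nodes that will receive work; the caller
              omits a failed node and may include node 101 with its new margin.
    """
    targets = {n: f for n, f in margins.items() if f > 0}
    if not targets:
        raise ValueError("no node has spare capacity")
    pending = sorted(u for u, e in progress.items()
                     if e["owner"] == source and not e["done"])
    counts = {n: 0 for n in targets}
    moves: Dict[int, str] = {}
    for unit in pending:
        # D'Hondt rule: the next unit goes to the largest F(i) / (units so far + 1).
        node = max(targets, key=lambda n: targets[n] / (counts[n] + 1))
        counts[node] += 1
        moves[unit] = node
        progress[unit]["owner"] = node  # update the shared table entry
    return moves
```

Finished units and units owned by other nodes are left untouched. This matches the point above that reallocating every node's backlog would cost more than it saves.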

FIG. 4 addresses another state change, the addition of a new node, and illustrates how a new node joins distributed processing automatically. With reference to FIG. 4, the mechanism is described in this order:
1. new node detection;
2. preparation of the new node for participation;
3. addition of the new node;
4. the participation icon.

First, new node detection. The resource management unit 320 of the management node 100 monitors the state information that each computation node sends periodically. When it receives state information from a node that is not under its management, it recognizes that node as the new node 120. In this way, the management node 100 detects the new node 120 automatically.

Next, preparation of the new node for participation. When the resource management unit 320 of the management node 100 detects the new node 120, it instructs the new node 120 to participate in distributed processing.

The processing of the distributed automatic participation control unit 330 of the new node 120 is described with reference to FIG. 9:
1. The unit notifies the management node 100 of the preset absolute performance evaluation value of its own node 120 (S900).
2. It receives from the management node 100, via FTP (File Transfer Protocol), the definition files and DLLs (Dynamic Link Libraries) needed to participate in distributed processing (S905).

Through this preparation, the new node automatically makes itself ready for distributed processing.

The absolute performance evaluation value is either a preset value or is obtained by running a benchmark on the new node 120 after the management node 100 has instructed it to participate.
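The two options above (a preset value, or a benchmark run on the new node) might look like the sketch below. The patent does not specify the benchmark or how its result is scaled, so the workload `run_workload`, the function `absolute_score`, and the convention that a node finishing in `reference_seconds` scores 100 are all assumptions for illustration.

```python
import time
from typing import Optional


def run_workload(n: int = 200_000) -> int:
    """Fixed, deterministic CPU-bound kernel (hypothetical stand-in for a real benchmark)."""
    return sum(i * i for i in range(n))


def absolute_score(preset: Optional[float] = None,
                   reference_seconds: float = 0.05,
                   repeats: int = 3) -> float:
    """Return the preset score if configured; otherwise benchmark this node.

    ASSUMED scaling: a node that runs the workload in reference_seconds scores 100,
    and faster nodes score proportionally higher.
    """
    if preset is not None:
        return float(preset)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_workload()
        best = min(best, time.perf_counter() - start)  # best-of-N reduces timing noise
    return 100.0 * reference_seconds / max(best, 1e-9)


print(absolute_score(preset=150))  # 150.0
print(absolute_score())            # machine-dependent
```

Taking the best of several runs rather than the mean filters out interference from other processes that happen to be running on the node during the benchmark.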

After the FTP transfer completes, the distributed automatic participation control unit 330 of the new node 120 notifies the resource management unit 320 of the management node 100 that preparation is complete; this notification includes the new node's absolute evaluation value (S910).
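The ordering of steps S900, S905, and S910 on the new node can be sketched against an in-memory stand-in for the management node. `ManagementStub`, `join`, and the file names are hypothetical; in a real deployment the `fetch` call would be an FTP download (in Python, for example, via `ftplib.FTP.retrbinary`).

```python
from dataclasses import dataclass, field


@dataclass
class ManagementStub:
    """In-memory stand-in for management node 100: a file store plus a message log."""
    files: dict
    log: list = field(default_factory=list)

    def notify_score(self, node: str, score: float) -> None:
        self.log.append(("S900", node, score))

    def fetch(self, name: str) -> bytes:
        return self.files[name]  # stands in for an FTP download

    def notify_ready(self, node: str, score: float) -> None:
        self.log.append(("S910", node, score))


def join(node: str, score: float, mgmt: ManagementStub, needed: list) -> dict:
    """Participation preparation on the new node, following S900 -> S905 -> S910."""
    mgmt.notify_score(node, score)                        # S900: report absolute evaluation value
    local = {name: mgmt.fetch(name) for name in needed}  # S905: receive definitions and DLLs
    mgmt.notify_ready(node, score)                        # S910: only after every transfer finished
    return local


mgmt = ManagementStub(files={"job.def": b"params", "calc.dll": b"\x4d\x5a"})
local = join("new-120", 150.0, mgmt, ["job.def", "calc.dll"])
print([step for step, _, _ in mgmt.log])  # ['S900', 'S910']
```

Because S910 is sent only after the last transfer returns, a failed download (here a `KeyError`) leaves the management node without a preparation-complete notification, so the node is never added half-prepared.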

Next, node addition. On receiving the preparation-complete notification, the resource management unit 320 of the management node 100 adds the new node's information to the distributed environment management table. With this addition, the new node 120 has joined the distributed environment, because the resource management unit 320 consults the distributed environment management table whenever it decides how to allocate distributed processing units.

Next, the distributed-participation icon. The resource management unit 320 of the management node 100 sends the new node 120 a participation-complete notification, indicating that the node has joined the distributed environment (that is, that it has been added to the distributed environment management table). When the distributed automatic participation control unit of the new node 120 receives this notification, it displays an icon indicating that the node is participating in distributed processing.

The icon both indicates that the node is participating in the distributed environment and lets the user remove the node from it by operating the icon. When the icon is operated to make the node leave, the node's information is deleted from the distributed environment management table of the management node 100. This deletion takes the node out of the distributed environment, again because the resource management unit 320 consults that table when allocating distributed processing units. After the node has left, the icon is removed.
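Because allocation consults only the distributed environment management table, joining and leaving reduce to adding and deleting a row, as the minimal sketch below shows. The class `DistributedEnvTable` is hypothetical, and it weights nodes by their absolute evaluation value alone; this is a simplification, since the FIG. 1 mechanism also takes current load into account.

```python
class DistributedEnvTable:
    """Hypothetical distributed environment management table: node ID -> performance score."""

    def __init__(self) -> None:
        self.rows: dict = {}

    def add(self, node: str, score: float) -> None:
        self.rows[node] = score      # on 'preparation complete': the node has joined

    def remove(self, node: str) -> None:
        self.rows.pop(node, None)    # on the 'leave' icon operation: the node has left

    def allocate(self, n_units: int) -> dict:
        """Allocation reads only the table, so membership changes need no other step."""
        counts = {node: 0 for node in self.rows}
        if not counts:
            return counts
        for _ in range(n_units):
            # Next unit goes to the node whose load per unit of performance stays lowest.
            node = min(counts, key=lambda k: ((counts[k] + 1) / self.rows[k], k))
            counts[node] += 1
        return counts


table = DistributedEnvTable()
table.add("calc-110", 100.0)
table.add("new-120", 200.0)
print(table.allocate(6))  # {'calc-110': 2, 'new-120': 4}
table.remove("new-120")
print(table.allocate(3))  # {'calc-110': 3}
```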

As described above, the automatic participation mechanism of FIG. 4 carries a new node through the whole sequence automatically: detection, configuration for participation in the distributed processing, and the start of distributed processing.

Once the new node 120 has been added, the distributed control mechanism shown in FIG. 1 is applied to all n + 1 nodes of the distributed control system, including the new node 120, so the system operates with load-dependent distribution. Here, the timing at which distributed processing units are first allocated to the new node 120 should be considered. It is preferable to start allocating to the new node when the next distributed processing units are generated, because redistributing the unprocessed units of all nodes would incur a large processing overhead.
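The preferred timing (no redistribution of pending units; the new node receives work from the next generated batch) can be sketched with per-node queues. `Scheduler` is hypothetical, and choosing the shortest queue is a simple stand-in for the margin-rate allocation of FIG. 1.

```python
from collections import deque


class Scheduler:
    """Per-node queues of pending distributed processing units (hypothetical sketch)."""

    def __init__(self, nodes):
        self.queues = {n: deque() for n in nodes}

    def add_node(self, node: str) -> None:
        # Joining only creates an empty queue; pending units elsewhere are NOT moved.
        self.queues.setdefault(node, deque())

    def dispatch(self, units) -> None:
        # Newly generated units go to the shortest queue, so a new node fills up first.
        for u in units:
            target = min(self.queues, key=lambda n: (len(self.queues[n]), n))
            self.queues[target].append(u)


s = Scheduler(["a", "b"])
s.dispatch(range(4))  # a: [0, 2], b: [1, 3]
s.add_node("c")       # no redistribution of units 0-3
s.dispatch([4, 5])    # both new units go to the empty queue of c
print({n: list(q) for n, q in s.queues.items()})
# {'a': [0, 2], 'b': [1, 3], 'c': [4, 5]}
```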

As described above, in a distributed environment where nodes with different performance and platforms coexist, distributed processing units can be allocated optimally in response to each node's state changes (load fluctuation, failure occurrence, new node addition), improving performance. When a failure occurs, distributed processing continues without a dedicated standby server, and a new node joins the distributed processing automatically simply by being connected to the LAN.

For example, the processing is divided into processing units (distributed processing units), each the smallest unit of calculation parameters that can be processed in parallel, and these units are allocated optimally to the nodes according to each node's load. This achieves very fine-grained load distribution and improves distributed processing performance.

The reason fine-grained, load-dependent allocation improves performance is as follows. If the granularity of distribution is coarse, for example one transaction per unit, the whole transaction is assigned to the node with the lowest load. When that transaction requires many computer resources, the node that was the least loaded can become overloaded instead. If fine-grained distributed processing units are instead allocated according to each node's load, the load is leveled without imbalance, and distributed processing performance improves as a result.
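A toy calculation makes the argument concrete. The sketch assumes three identical nodes with hypothetical current loads of 30, 20, and 10 and a new piece of work of size 60. Sending it whole to the least-loaded node (coarse) is compared with splitting it into 60 equal units, each sent to whichever node is least loaded at that moment (fine).

```python
def coarse(loads, work):
    """Whole transaction to the currently least-loaded node."""
    loads = list(loads)
    i = loads.index(min(loads))
    loads[i] += work
    return loads


def fine(loads, work, units):
    """Same work split into equal units, each sent to the least-loaded node at that moment."""
    loads = list(loads)
    piece = work / units
    for _ in range(units):
        i = loads.index(min(loads))
        loads[i] += piece
    return loads


start = [30.0, 20.0, 10.0]       # current load of three identical nodes (hypothetical)
print(coarse(start, 60.0))       # [30.0, 20.0, 70.0]: the idlest node is now the busiest
print(fine(start, 60.0, 60))     # [40.0, 40.0, 40.0]: load is leveled
```

The peak load drops from 70 to 40. Because the slowest node determines when a batch of parallel work completes, this leveling is what shortens the overall processing time.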

In addition, no dedicated standby server is needed when a node fails: dynamic failover automatically takes over the intermediate results of the distributed processing and continues it. The user program does not need to change server location settings or communication path settings. Furthermore, computer resources can be added even while distributed processing is in progress, simply by connecting a new node to the LAN without special settings, and the distributed processing continues.
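The takeover idea can be illustrated by reassigning only the failed node's unfinished units to the survivors while keeping the results of completed units. This is a sketch under stated assumptions: `fail_over` is hypothetical, completed results are assumed to be stored where the surviving nodes can reach them, and round-robin replaces the margin-rate allocation for brevity.

```python
def fail_over(assignments: dict, completed: set, failed: str) -> dict:
    """Reassign a failed node's unfinished units to the surviving nodes.

    assignments: node -> list of unit IDs; completed: units whose results are already kept.
    """
    survivors = sorted(n for n in assignments if n != failed)
    if not survivors:
        raise RuntimeError("no surviving node to take over the processing")
    # Results of completed units are kept, so only unfinished units need re-running.
    orphaned = [u for u in assignments.get(failed, []) if u not in completed]
    result = {n: list(assignments[n]) for n in survivors}
    for k, u in enumerate(orphaned):
        result[survivors[k % len(survivors)]].append(u)  # round robin (simplification)
    return result


plan = {"a": [1, 2], "b": [3, 4, 5], "c": [6]}
print(fail_over(plan, completed={3}, failed="b"))
# {'a': [1, 2, 4], 'c': [6, 5]}
```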

Brief Description of Drawings

FIG. 1 is a diagram explaining load-dependent optimal allocation of distributed processing in a distributed environment where nodes with different performance and platforms coexist.
FIG. 2 is a diagram explaining dynamic failover when the management node fails.
FIG. 3 is a diagram explaining dynamic failover when a calculation node fails.
FIG. 4 is a diagram explaining automatic participation of a new node in the distributed processing.
FIG. 5 is a flowchart of the processing of the state monitoring unit.
FIG. 6 is a flowchart of the processing of the resource management unit.
FIG. 7 is a flowchart of the processing for selecting a new management node.
FIG. 8 is a flowchart of registry registration on the failover-destination calculation node.
FIG. 9 is a flowchart of the processing of the distributed automatic participation control unit of a new node.

Explanation of symbols

100 ... management node, 110 ... calculation node, 300 ... SDO middleware, 310 ... state monitoring unit, 320 ... resource management unit, 330 ... distributed automatic participation control unit

Claims

1. A distributed processing control method for allocating distributed processing units to each of n nodes in a distributed processing system in which the n nodes have different CPU performance, wherein a certain node: collects the CPU usage rate of each node; averages the collected CPU usage rate of each node over a predetermined period; obtains a margin rate for each node using the averaged CPU usage rate of that node, the CPU performance of that node, and the sum of the CPU performance of all the nodes; and allocates the distributed processing units to the nodes in accordance with their margin rates.

2. The distributed processing control method according to claim 1, wherein, in response to detection of a failure of the certain node, another of the n nodes: collects the CPU usage rate of each of the n−1 nodes other than the certain node; averages the collected CPU usage rates over a predetermined period; obtains a margin rate for each of the n−1 nodes using its averaged CPU usage rate, its CPU performance, and the sum of the CPU performance of the n−1 nodes; and allocates the distributed processing units to the n−1 nodes in accordance with their margin rates.

3. The distributed processing control method according to claim 1, wherein, in response to detection of a failure of a node other than the certain node, the certain node: collects the CPU usage rate of each of the n−1 nodes excluding the failed node; averages the collected CPU usage rates over a predetermined period; obtains a margin rate for each of the n−1 nodes using its averaged CPU usage rate, its CPU performance, and the sum of the CPU performance of the n−1 nodes; and allocates the distributed processing units to the n−1 nodes in accordance with their margin rates.

4. The distributed processing control method according to claim 1, wherein, in response to a new node joining the nodes constituting the distributed processing system, the certain node: collects the CPU usage rate of each of the n+1 nodes including the joined node; averages the collected CPU usage rates over a predetermined period; obtains a margin rate for each of the n+1 nodes using its averaged CPU usage rate, its CPU performance, and the sum of the CPU performance of the n+1 nodes; and allocates the distributed processing units to the n+1 nodes in accordance with their margin rates.

5. A distributed processing system that allocates distributed processing units to nodes having different CPU performance, comprising: a node that collects the CPU usage rate of each node, averages the collected CPU usage rates over a predetermined period, obtains a margin rate for each node using its averaged CPU usage rate, its CPU performance, and the sum of the CPU performance of all the nodes, and allocates the distributed processing units to the nodes in accordance with their margin rates; and other nodes that each report their CPU usage rate in response to a CPU usage rate collection request and execute the allocated distributed processing units.

6. A distributed processing control program for a certain node that allocates distributed processing units to each node in a distributed processing system in which nodes having different CPU performance coexist, the program causing a computer to execute: a step of collecting the CPU usage rate of each node; a step of averaging the collected CPU usage rate of each node over a predetermined period; a step of obtaining a margin rate for each node using its averaged CPU usage rate, its CPU performance, and the sum of the CPU performance of all the nodes; and a step of allocating the distributed processing units to the nodes in accordance with their margin rates.
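The claimed computation (average each node's CPU usage over a period, derive a margin rate from that average, the node's CPU performance, and the total performance, then allocate in proportion) can be sketched as below. The claims name the inputs to the margin rate but not the formula, so the sketch ASSUMES margin_i = (P_i / ΣP) × (1 − ū_i / 100), that is, the node's share of total performance times its idle fraction. `UsageHistory`, `margin_rates`, and `allocate` are hypothetical names, and largest-remainder rounding is one reasonable way to turn proportional shares into whole units. The same functions cover the failure and join cases (n−1 and n+1 nodes) simply by passing a different membership.

```python
from collections import deque


class UsageHistory:
    """CPU-usage samples (percent) for one node over a fixed window: the 'predetermined period'."""

    def __init__(self, window: int) -> None:
        self.samples = deque(maxlen=window)  # oldest samples fall out automatically

    def add(self, pct: float) -> None:
        self.samples.append(pct)

    def average(self) -> float:
        return sum(self.samples) / len(self.samples)


def margin_rates(avg_usage: dict, perf: dict) -> dict:
    """ASSUMED formula: share of total CPU performance times the node's idle fraction."""
    total = sum(perf.values())  # sum of CPU performance of the current members
    return {n: (perf[n] / total) * (1.0 - avg_usage[n] / 100.0) for n in perf}


def allocate(n_units: int, margins: dict) -> dict:
    """Split n_units in proportion to the margin rates (largest-remainder rounding)."""
    s = sum(margins.values())
    quotas = {n: n_units * m / s for n, m in margins.items()}
    alloc = {n: int(q) for n, q in quotas.items()}
    leftover = n_units - sum(alloc.values())
    # Hand the remaining units to the nodes with the largest fractional remainders.
    for n in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[n] += 1
    return alloc


hist_a = UsageHistory(window=3)
for pct in (50.0, 10.0, 20.0, 30.0):  # the first sample falls out of the window
    hist_a.add(pct)

# Node B is three times faster than A but busier.
perf = {"A": 100.0, "B": 300.0}
avg = {"A": hist_a.average(), "B": 60.0}  # A averages (10 + 20 + 30) / 3 = 20.0
print(allocate(10, margin_rates(avg, perf)))  # {'A': 4, 'B': 6}

# n - 1 nodes after B fails: recompute over the survivors only.
print(allocate(10, margin_rates({"A": 20.0}, {"A": 100.0})))  # {'A': 10}

# n + 1 nodes after C joins.
perf3 = {"A": 100.0, "B": 300.0, "C": 200.0}
avg3 = {"A": 20.0, "B": 60.0, "C": 10.0}
print(allocate(12, margin_rates(avg3, perf3)))  # {'A': 2, 'B': 4, 'C': 6}
```

Under this assumed formula, a fast but busy node (B) and a slow but idle node (A) end up with shares reflecting their actual spare capacity rather than their raw performance.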