JP2011065469A

JP2011065469A - Distributed file system and node start-up method in distributed file system

Info

Publication number: JP2011065469A
Application number: JP2009216023A
Authority: JP
Inventors: Masato Terashita; 雅人寺下; Akihiko Nishitani; 明彦西谷; Tomohiko Ogishi; 智彦大岸
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-09-17
Filing date: 2009-09-17
Publication date: 2011-03-31

Abstract

PROBLEM TO BE SOLVED: To start a node with low power consumption when a node is added because the disk usage rate in a distributed file system has increased.

SOLUTION: A distributed file system has a management server 1 and a plurality of nodes 2. Each node 2 includes an information collection unit 21 that calculates its own disk usage rate and a node activation execution unit 23 that starts the node itself from a standby state. The management server 1 includes an information receiving unit 11 that receives disk usage rate information from each node 2; a node disk capacity management unit 12 that calculates the system disk usage rate of the whole system composed of the plurality of nodes; a node activation determination unit 15 that, when the system disk usage rate exceeds a prescribed threshold, selects a node that is in the standby state as the node to start; and a node activation command unit 16 that transmits a node activation command to the node selected by the node activation determination unit.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a distributed file system in which a plurality of nodes are arranged on a network and share disks. In particular, it relates to a node activation method that adds a node when the disk usage rate of the entire system becomes high or when a node fails, and to a distributed file system for that purpose.

A distributed file system is a technique for virtually integrating and using the directories or files of a plurality of computers scattered across a network. As technology of this kind, distributed platforms that combine the disks of a plurality of machines to function as a single file system have been proposed, as shown in Non-Patent Document 1 and Non-Patent Document 2.
Gfarm, shown in Non-Patent Document 1, is a scalable distributed file system platform that meets the demands of large-capacity, large-scale data processing on a wide area network, and is suited to efficient file sharing over a wide area network.
Hadoop, shown in Non-Patent Document 2, processes large amounts of data that cannot be stored on a single disk quickly and efficiently by parallelizing the work, and is a distributed platform suited to I/O of relatively large files that are basically never updated.

In a distributed file system, the network configuration and the placement and addition of nodes are decided by the operator, who therefore monitors disk capacity and operating status daily. In particular, nodes must be added when the disk usage rate of the entire system becomes high or when a node fails. Because adding a node takes time for configuration and installation, bringing one up immediately requires an organization capable of emergency response, which increases the operational burden.
On the other hand, oversupplying nodes to reduce the burden of urgent node additions consumes more power than necessary and raises operating costs. In either case, there is concern that the burden on the operator increases.

Therefore, to reduce the burden on the operator, Patent Document 1 proposes a technique that automatically determines which type of resource (server or storage device) is lacking according to the operating status of a system composed of a plurality of servers, and adds (starts) that resource (node).

Patent Document 1: JP 2006-11860 A

Non-Patent Document 1: URL: http://datafarm.apgrid.org/index.ja.html
Non-Patent Document 2: URL: http://hadoop.apache.org/

However, the method described in Patent Document 1 adds storage devices on the assumption that the target is a LAN, or deploys hot-standby nodes that are supplied with power and makes them available to the system according to the system load. According to Patent Document 1, the server that determines whether a node may be used is connected to the nodes through a switch and a local network is assumed, so the method cannot be used in a system that targets a WAN or other wide area network.
In addition, hot standby keeps unused nodes powered on, so, as with the node oversupply described above, the increase in operating cost due to power consumption becomes a problem.

The present invention has been proposed in view of the above circumstances. Its object is to provide a distributed file system, and a node activation method in a distributed file system, that can start a node with low power consumption when a node is added, for example because the disk usage rate in the distributed file system has become high.

To achieve the above object, the present invention prepares cold-standby nodes that are not supplied with power in a distributed file system and automatically starts a standby node when the disk usage rate reaches or exceeds a desired threshold, thereby increasing the disk capacity of the system according to usage.

That is, claim 1 is a node activation method in a distributed file system having a plurality of nodes and a plurality of standby nodes connected to a management server, characterized in that:
the standby nodes wait in a state in which power is not supplied (cold standby);
a standby node is started when a failed node exists among the plurality of nodes; and
the disk usage rate of the entire system composed of the plurality of nodes is calculated and, when that disk usage rate exceeds a predetermined threshold, standby nodes are automatically started until the disk usage rate becomes equal to or less than the predetermined threshold.

Claim 2 is a node activation method in a distributed file system having a plurality of nodes and a plurality of standby nodes connected to a management server, characterized in that:
the standby nodes wait in a state in which power is not supplied (cold standby);
the plurality of nodes and the plurality of standby nodes are divided into a plurality of networks;
when a failed node exists among the plurality of nodes, a standby node in the network containing the failed node is started;
the disk usage rate of the nodes in each network is calculated and, when that disk usage rate exceeds a predetermined threshold, a standby node is automatically started; and
the disk usage rate of the entire system composed of the nodes of the plurality of networks is calculated and, when that disk usage rate exceeds a predetermined threshold, standby nodes are automatically started until the disk usage rate becomes equal to or less than the predetermined threshold.

The distributed file system of claim 3 has a management server and a plurality of nodes connected to the management server.
The nodes include active nodes that are started during system operation and standby nodes placed in a standby state in which power is not supplied (cold standby).
Each node includes an information collection unit that calculates its own disk usage rate and a node activation execution unit that starts the node itself from the standby state.
The management server includes: an information receiving unit that receives disk usage rate information from each node; a disk capacity management unit that calculates, from the disk usage rates of the nodes received by the information receiving unit, the system disk usage rate of the entire system composed of the plurality of nodes; a node activation determination unit that, when the system disk usage rate exceeds a predetermined threshold, selects a node that is in the standby state as the node to start; and a node activation command unit that transmits a node activation command to the node selected by the node activation determination unit.

According to the present invention, standby nodes are prepared and a standby node can be started automatically when the disk usage rate reaches or exceeds a desired threshold. Even when the disk usage rate indicates that a node must be added, starting a standby node removes the need for urgent node-addition work and reduces the operational burden.

Furthermore, making the standby nodes cold standby suppresses the increase in power consumption that oversupplying nodes would cause across the system.

In addition, the data acquired for the node activation determination is held temporarily, and failed nodes are identified by comparing the node list in the previous data with the node list in the newly acquired data. When a failed node exists, a standby node in the network containing that node is started, which prevents the failure from reducing the disk capacity of that network.

FIG. 1 is a model diagram showing the overall configuration of the distributed file system of the present invention.
FIG. 2 is a block diagram for explaining the functions of the distributed file system of the present invention.
FIG. 3 is a processing flowchart for explaining the operation when standby nodes are started for the system as a whole in the distributed file system.
FIG. 4 is a processing flowchart for explaining the operation when standby nodes are started for each network in the distributed file system.

An example embodiment of the distributed file system of the present invention will be described with reference to the drawings.
As shown in FIG. 1, the distributed file system consists of a management server 1 and a plurality of nodes 2 connected to the management server 1. The nodes 2 may reside in the same network as the management server 1, or may form a plurality of networks in groups of several nodes. That is, a plurality of networks exist in the system, and each network contains several nodes. The nodes 2 comprise a plurality of active nodes that operate while the system is in use and a plurality of standby nodes kept in cold standby without power.

A network here need not be a separate subnet; it is a group of several nodes 2 that the management server 1 recognizes as one set.
The n nodes 2 (N_1 to N_n; the number n is not limited) are accessed by users through the management server 1 and are thereby presented to users as a single storage, without the users needing to be aware of which network a node belongs to.

The processing functions of the management server 1 and each node 2 that make up the distributed file system will be described with reference to FIG. 2. A plurality of nodes 2 are connected to the management server 1. As described above, the nodes 2 include active nodes and standby nodes, but their configuration is the same. For simplicity, FIG. 2 shows a single node 2 (an active node or a standby node) connected to the management server 1. The distributed file system is divided into a plurality of networks, and each network contains a plurality of nodes 2 (a plurality of active nodes and a plurality of standby nodes).
A program for node activation in the distributed file system is installed on the management server 1 and on each node 2, and the following operations are performed according to the program.

The management server 1 comprises: an information receiving unit 11 that receives disk usage rate information from the nodes 2; a node disk capacity management unit 12 that periodically manages the disk usage rate information of the nodes 2; a network management unit 13 that manages the network partition information of each node; a network disk capacity management unit 14 that calculates the disk usage rate of each network; a node activation determination unit 15 that searches for failed nodes and selects the nodes to start; and a node activation command unit 16 that issues node activation commands.

Each node 2 connected to the management server 1 has a recording unit for storing files, and comprises an information collection unit 21 that manages the disk usage rate of the node, an information notification unit 22 that reports the disk usage rate of the node to the management server 1, and a node activation execution unit 23 that starts the node.
Each node 2 is in one of three states: "active" (operating in the system), "failed" (a previously operating node that has stopped for some reason), or "standby" (waiting to be started according to the operating state of the whole system). The management server 1 manages which of the three kinds of node ("active node", "failed node", "standby node") each node 2 is.

Next, the function of each unit of the management server 1 will be described.
The information receiving unit 11 periodically receives the disk usage rate information of each of the nodes 2, as reported by the information notification unit 22 of each node 2, and passes it to the node disk capacity management unit 12.

The node disk capacity management unit 12 has a memory. It notifies the node activation determination unit 15 of the disk usage rate information for the entire system calculated from the disk usage rates of the nodes 2 reported this time (ND_n) and of the disk usage rate information of the nodes 2 (entire system) reported last time (ND_L). The disk usage rate information (ND_n) of the nodes 2 (entire system) is also passed to the network management unit 13.

The network management unit 13 passes to the network disk capacity management unit 14 the disk usage rates (ND_n) of the nodes 2 received from the node disk capacity management unit 12, together with the network partition information of each node 2 that it holds. The network partition information includes the network to which each node 2 (active node or standby node) belongs, the disk capacity of each network, and the like.

The network disk capacity management unit 14 calculates the network disk usage rate (the disk usage rate of each network) from the disk usage rate (ND_n) of each node 2 and the network partition information of each node 2 received from the network management unit 13, and notifies the node activation determination unit 15.
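The text does not give a formula for the per-network or system-wide disk usage rate. The following is a minimal sketch, assuming each node reports used and total bytes and that a usage rate is total used divided by total capacity over the nodes in question. The `NodeReport` structure and its field names are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeReport:
    """One node's disk report (hypothetical structure, not from the patent)."""
    node_id: str
    network: str          # network the management server assigns the node to
    used_bytes: int
    capacity_bytes: int


def usage_rate(reports):
    """Return total used / total capacity over the given reports (0.0 to 1.0)."""
    capacity = sum(r.capacity_bytes for r in reports)
    if capacity == 0:
        return 0.0
    return sum(r.used_bytes for r in reports) / capacity


def network_usage_rates(reports):
    """Group reports by network and compute one usage rate per network."""
    by_network = defaultdict(list)
    for r in reports:
        by_network[r.network].append(r)
    return {name: usage_rate(rs) for name, rs in by_network.items()}
```

Weighting by capacity rather than averaging per-node percentages is a design choice of this sketch; it keeps a small, full node from dominating the rate of a network that also contains large, mostly empty nodes.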

The node activation determination unit 15 searches for failed nodes and selects the nodes to start, using the current disk usage rate information (ND_n) and the previous disk usage rate information (ND_L) received from the node disk capacity management unit 12. To search for failed nodes, it creates a node list (NL_n) of the running nodes from the disk usage rate information (ND_n) and a node list (NL_L) of the running nodes from the disk usage rate information (ND_L). By comparing the current and previous data, it judges a node that is present in NL_L but absent from NL_n to be a failed node.
It also selects the node to start from among the standby nodes within a network, based on the network disk usage rate received from the network disk capacity management unit 14.
Furthermore, it calculates the disk usage rate of the entire system (Sd) from the disk usage rate (ND_n) of each node 2 and selects the node to start from among the standby nodes.
The node activation determination unit 15 passes the list of selected nodes to the node activation command unit 16.
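The failed-node search amounts to a set difference between the previous and current node lists. A minimal sketch, assuming each round of disk usage data is held as a mapping from node ID to usage rate (this layout and the function name are illustrative assumptions):

```python
def find_failed_nodes(previous_report, current_report):
    """Return IDs present in the previous report but missing from the current one.

    Each report maps node ID -> disk usage rate (hypothetical layout).
    """
    nl_prev = set(previous_report)   # NL_L: nodes that reported last time
    nl_curr = set(current_report)    # NL_n: nodes that reported this time
    return sorted(nl_prev - nl_curr)
```

For example, `find_failed_nodes({"n1": 0.4, "n2": 0.5}, {"n1": 0.45})` returns `["n2"]`. A node that appears only in the current report, such as a newly started standby node, is not treated as failed.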

Based on the list of nodes to start received from the node activation determination unit 15, the node activation command unit 16 sends a node activation command to the node activation execution unit 23 of each corresponding node 2 (standby node). The node activation command unit 16 also manages the three node states: "active", "failed", and "standby". When a node failure is detected, the state changes automatically from "active" to "failed". When a standby node is automatically added to the active nodes, the state changes automatically from "standby" to "active". When the operations administrator has recovered a node from a failure, the state is changed manually from "failed" to "standby". A standby node can also be made active manually.
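The state handling described above can be sketched as a small transition table. This is an illustrative model only: the enum, the table, and the decision to reject every transition the text does not mention (for example "failed" directly to "active") are assumptions of the sketch.

```python
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    STANDBY = "standby"


# (from, to) -> how the transition is triggered, as described in the text.
ALLOWED_TRANSITIONS = {
    (NodeState.ACTIVE, NodeState.FAILED): "automatic",            # failure detected
    (NodeState.STANDBY, NodeState.ACTIVE): "automatic or manual",  # node started
    (NodeState.FAILED, NodeState.STANDBY): "manual",               # operator recovery
}


def transition(current, target):
    """Return the new state, or raise ValueError for an undescribed transition."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"transition {current.value} -> {target.value} is not allowed")
    return target
```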

The information collection unit 21 of a node 2 periodically acquires the disk usage rate of the node. The information notification unit 22 reports the acquired disk usage rate to the information receiving unit 11 of the management server 1.

The node activation execution unit 23 receives an activation command from the node activation command unit 16 of the management server 1 and starts its own node. As the method for starting a node automatically, known techniques such as Magic Packet or IPMI may be used.
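As an illustration of the Magic Packet option, the sketch below builds a Wake-on-LAN magic packet, which consists of six 0xFF bytes followed by the target MAC address repeated 16 times, and sends it as a UDP broadcast. The function names, the broadcast address, and UDP port 9 are assumptions chosen for the example; the patent does not specify how the activation command is transported, and how the packet reaches a node in another network is outside this sketch.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a magic packet: 6 bytes of 0xFF, then the 6-byte MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the packet as a UDP broadcast (address and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

A call such as `send_magic_packet("00:11:22:33:44:55")` would target a node whose network interface has that (made-up) MAC address.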

The procedure for automatically starting nodes in the distributed file system will be described with reference to FIG. 3. In this example, standby nodes in the system are started automatically without dividing the system into a plurality of networks.

First, a search for failed nodes is performed (step 31).
The disk usage rate data collected from each node is held temporarily. When data is collected again, the node list in the previously collected data (NL_L) is compared with the node list in the newly collected data (NL_n) to find nodes that are present in NL_L but absent from NL_n.
If failed nodes exist (step 32), standby nodes are started in descending order of disk capacity, as many as there are failed nodes (step 33).
When several nodes have the same disk capacity, they are selected in the order in which they were registered in the distributed file system.
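The selection rule in step 33 (largest disk capacity first, ties broken by registration order, one replacement per failed node) can be sketched as a sort with a compound key. The `StandbyNode` structure and its `registration_order` field are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyNode:
    node_id: str
    capacity_bytes: int
    registration_order: int   # lower value = registered earlier (hypothetical field)


def select_replacements(standby_nodes, failed_count):
    """Pick up to failed_count standby nodes, largest capacity first.

    Nodes with equal capacity are taken in registration order.
    """
    ranked = sorted(
        standby_nodes,
        key=lambda n: (-n.capacity_bytes, n.registration_order),
    )
    return ranked[:failed_count]
```

If fewer standby nodes exist than failed nodes, the sketch simply returns all of them; the text does not say what should happen in that case.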

Next, the disk usage rate of the entire system, derived from the disk usage rates of the nodes, is obtained from the management server (step 34).
The disk usage rate is compared with a preset threshold (step 35). If it is at or below the threshold, the process ends without doing anything.
If the disk usage rate exceeds the threshold, the standby node with the largest disk capacity is started (step 36).

Next, the disk usage rate of the entire system, now including the started standby node, is obtained again from the management server (step 34) and compared with the threshold (step 35). If it is at or below the threshold, the process ends.
If it exceeds the threshold, the standby node with the largest disk capacity is started (step 36), the usage rate of the entire system is obtained again (step 34), and standby nodes continue to be started until the rate is at or below the threshold.
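The loop over steps 34 to 36 can be sketched as follows. The sketch assumes that a newly started standby node contributes its full capacity and no used space, that the total capacity is positive, and that the loop stops when no standby nodes remain; the text does not cover the last case. All names are illustrative.

```python
def start_until_below_threshold(used, capacity, standby_capacities, threshold):
    """Start standby nodes, largest first, until used / capacity <= threshold.

    Returns (capacities of the nodes started, final usage rate).
    """
    remaining = sorted(standby_capacities, reverse=True)
    started = []
    # Steps 34-35: re-evaluate the system-wide rate before each start.
    while used / capacity > threshold and remaining:
        added = remaining.pop(0)   # step 36: start the largest standby node
        capacity += added          # assumption: a fresh node adds empty capacity
        started.append(added)
    return started, used / capacity
```

For instance, with 900 units used out of 1000, standby nodes of 100, 200, and 50 units, and a threshold of 0.8, the sketch starts only the 200-unit node, which brings the rate to 900 / 1200 = 0.75.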

In the example of FIG. 3, the plurality of nodes 2 in the distributed file system are all treated equally. The procedure for automatically starting nodes within each network, when the nodes are grouped into a plurality of networks, will now be described with reference to FIG. 4.

First, a search for failed nodes is performed (step 41).
The disk usage rate data collected from each node is held temporarily. When data is collected again, the node list in the previously collected data (NL_L) is compared with the node list in the newly collected data (NL_n) to find nodes that are present in NL_L but absent from NL_n.

Whether a failed node exists is determined (step 42). If failed nodes exist, standby nodes in the network containing the failed nodes are started, as many as there are failed nodes (step 43).
Next, the disk usage rate of each network (Nd_1 to Nd_n) is obtained from the management server (step 44). If any of Nd_1 to Nd_n exceeds the network threshold (NL) (step 45), standby nodes are started until the disk usage rate of every network that exceeded the threshold is at or below NL (step 46).

After the above processing, the disk usage rate of each network (Nd_1 to Nd_n) and the disk usage rate of the entire system (Sd) are obtained from the management server (step 47). If the disk usage rate of the entire system (Sd) exceeds the system threshold (SL) (step 48), a standby node is started in the network with the highest disk usage rate among Nd_1 to Nd_n (step 49), and Nd_1 to Nd_n and Sd are obtained again (step 47).

This processing continues until the disk usage rate of the entire system (Sd) is at or below the system threshold (SL) (step 48), and then ends. The network threshold (NL) and the system threshold (SL) are set so that NL ≥ SL. As a specific example, the network threshold is 90% and the system threshold is 80%.
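The two-level check of steps 44 to 49 can be sketched as below, using the example thresholds of 90% (NL) and 80% (SL). The dictionary layout, the choice to start the largest standby node within a network, and the fallback to the busiest network that still has a standby node are assumptions of this sketch; the text only says that a standby node in the network with the highest usage rate is started.

```python
def network_rate(net):
    """Usage rate of one network: used / capacity."""
    return net["used"] / net["capacity"]


def start_one(net):
    """Start the largest standby node in a network; return False if none is left."""
    if not net["standby"]:
        return False
    net["standby"].sort()
    net["capacity"] += net["standby"].pop()   # a fresh node adds empty capacity
    return True


def balance(networks, network_threshold=0.90, system_threshold=0.80):
    """Apply the per-network check, then the system-wide check (mutates networks)."""
    # Steps 44-46: bring every network to or below the network threshold NL.
    for net in networks.values():
        while network_rate(net) > network_threshold and start_one(net):
            pass
    # Steps 47-49: while the system rate Sd exceeds SL, start a standby node
    # in the network with the highest usage rate.
    while True:
        used = sum(n["used"] for n in networks.values())
        capacity = sum(n["capacity"] for n in networks.values())
        if used / capacity <= system_threshold:
            break
        candidates = [n for n in networks.values() if n["standby"]]
        if not candidates:
            break   # assumption: stop when no standby nodes remain anywhere
        start_one(max(candidates, key=network_rate))
    return {name: network_rate(n) for name, n in networks.items()}
```

Each network is given as a dictionary with `used`, `capacity`, and `standby` (a list of standby node capacities), for example `{"A": {"used": 950, "capacity": 1000, "standby": [100, 100]}}`.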

With the standby node activation method described above, a standby node in the network containing the failed node is started, so the change in each network's disk usage rate before and after the failure is kept to a minimum. It also becomes possible to restore the data of the failed node onto a node in the same network.

When the entire system is divided into networks as in FIG. 4, setting the network threshold (NL) higher than the system threshold (SL) makes it possible to reduce the number of standby nodes in each network.
When the method is used with a distributed file system that stores data in networks with low disk usage rates, the disk usage rates of the networks can be evened out and the load distributed.

Conventionally, in a system that provides storage, such as a distributed file system, adding a node only after disk capacity has run out is too late. The used disk capacity is therefore checked as appropriate, and nodes or storage are added when capacity is close to running out. In that case, the time available before the node must be added is very short, so the operational burden is large. Judging earlier that capacity will run out can lower the urgency of adding a node, but the node may then not actually be needed at the time it is added, and the resulting oversupply can waste power.

With the distributed file system described above, a node is not added at the moment when addition has become urgent; instead, it can be added at a threshold value set ahead of that point (that is, at a desired timing). Standby nodes without power supply are installed but are not started until needed, so capacity is not oversupplied and power is not wasted.

DESCRIPTION OF SYMBOLS
1: management server
2: node (active node or standby node)
11: information receiving unit
12: node disk capacity management unit
13: network management unit
14: network disk capacity management unit
15: node activation determination unit
16: node activation command unit
21: information collection unit
22: information notification unit
23: node activation execution unit

Claims

A node activation method in a distributed file system having a plurality of nodes and a plurality of standby nodes connected to a management server, characterized in that:
the standby nodes wait in a state in which power is not supplied;
a standby node is started when a failed node exists among the plurality of nodes; and
the disk usage rate of the entire system composed of the plurality of nodes is calculated and, when that disk usage rate exceeds a predetermined threshold, standby nodes are automatically started until the disk usage rate becomes equal to or less than the predetermined threshold.

A node activation method in a distributed file system having a plurality of nodes and a plurality of standby nodes connected to a management server, characterized in that:
the standby nodes wait in a state in which power is not supplied;
the plurality of nodes and the plurality of standby nodes are divided into a plurality of networks;
when a failed node exists among the plurality of nodes, a standby node in the network containing the failed node is started;
the disk usage rate of the nodes in each network is calculated and, when that disk usage rate exceeds a predetermined threshold, a standby node is automatically started; and
the disk usage rate of the entire system composed of the nodes of the plurality of networks is calculated and, when that disk usage rate exceeds a predetermined threshold, standby nodes are automatically started until the disk usage rate becomes equal to or less than the predetermined threshold.

A distributed file system having a management server and a plurality of nodes connected to the management server, characterized in that:
the nodes include active nodes that are started during system operation and standby nodes placed in a standby state in which power is not supplied;
each of the nodes comprises:
an information collection unit that calculates its own disk usage rate; and
a node activation execution unit that starts the node itself from the standby state; and
the management server comprises:
an information receiving unit that receives disk usage rate information from each of the nodes;
a disk capacity management unit that calculates, from the disk usage rate of each node received by the information receiving unit, the system disk usage rate of the entire system composed of the plurality of nodes;
a node activation determination unit that, when the system disk usage rate exceeds a predetermined threshold, selects a node that is in the standby state as the node to start; and
a node activation command unit that transmits a node activation command to the node selected by the node activation determination unit.