JP4891657B2

JP4891657B2 - Data storage system, file search device and program

Info

Publication number: JP4891657B2
Application number: JP2006149025A
Authority: JP
Inventors: 弘美宇和田
Original assignee: Nomura Research Institute Ltd
Current assignee: Nomura Research Institute Ltd
Priority date: 2006-05-29
Filing date: 2006-05-29
Publication date: 2012-03-07
Anticipated expiration: 2026-05-29
Also published as: JP2007317138A

Description

The present invention relates to a data storage system in which a plurality of nodes are connected in a grid, and to a file search device and a program used in that system.

As the number of files exchanged over networks and the size of individual files increase, the storage capacity required to build a database grows year by year. However, securing sufficient storage capacity by building a storage area network (SAN) or the like is very expensive.

There is therefore a demand for building a large-capacity storage system from a plurality of relatively inexpensive servers or personal computers. Even if the storage capacity of each individual node is insufficient, such a system is required to integrate the nodes logically, store data as if they formed a single storage area, and retrieve that data.
JP 7-129450 A

In a system using a plurality of nodes as described above, when the nodes and the connections between them follow some regular pattern, index data containing pointers to the storage locations of files is prepared in advance and used when data is retrieved. If the correspondence between the storage area for index data and the storage area for actual data is designed poorly, problems can arise: the system may scale poorly, file searches may take a long time, or the volume of index data and the volume of actual data may fall out of balance.

The present invention has been made in view of these circumstances, and its object is to provide a technique for building a data storage system that manages files held on a plurality of nodes, each having a storage device, with a common index so that the nodes function as a single database.

One aspect of the present invention is a data storage system that manages files held on a plurality of nodes, each having a storage device, with a common index so that the nodes function as a single database. The nodes are arranged in a grid, and each node is communicably connected to the nodes in front of, behind, to the left of, and to the right of it. The grid is divided so that the file storage nodes, which actually store files, and the index nodes, which store index data for the file storage nodes, each form square sub-grids. The file storage nodes correspond to the leaves of a tree structure and the index nodes to its root or internal nodes, and the index nodes hold the tree-structure information used to manage the files and the index data. The address information of a file storage node is stored in an index node that is uniquely determined from file identification information, which identifies a file held in that file storage node.

According to this aspect, the storage areas of the individual nodes can be logically integrated and used as a single storage area. Inexpensive computers or servers can therefore be combined as a substitute for a large-capacity storage device. Moreover, because the address information of a file storage node is held in an index node determined from the file identification information, the storage location of a desired file can be found easily from the file identification information alone, even when the address of the file storage node that actually holds the file is unknown. The "file identification information" may be any information unique to a file, such as the file name, the creation time, the update time, the creator, the name of the computer on which the file was created, or a combination of these.

The number of nodes in the vertical direction and the number of nodes in the horizontal direction of the grid may be coprime, and the grid may be divided into a plurality of square sub-grids using the Euclidean algorithm. In this way, any m × n grid of nodes with m and n coprime can easily be divided into square sub-grids. In addition, with the square division that follows naturally from the Euclidean algorithm, nodes that are parent and child or siblings in the tree structure are located close to each other, which reduces access time in the parent-child or sibling direction when a file is searched for or stored. Furthermore, when the file storage nodes form a square sub-grid, there are multiple routes to the destination node during a file transfer, so the transfer can still be completed even when some of the connections between nodes are broken.

The file identification information may be encoded according to a predetermined rule, and the node in which the corresponding file should be stored may be determined from the resulting code. In this case, a hash value may be obtained by applying a hash function to the code, and the index node that should hold the address information of the file storage node may be determined from the hash value.

Another aspect of the present invention is a file search program for the data storage system described above. The file search program runs on an index node and includes: a function of receiving a file search request; a function of encoding the file identification information of the file according to a predetermined rule and determining, from the resulting code, the file storage node in which the corresponding file is stored; and a function of obtaining a hash value by applying a hash function to the code and determining, from the hash value, the index node that holds the address information of the file storage node.

Yet another aspect of the present invention is a data storage system in which a plurality of nodes, each having a storage device, are arranged in a grid with each node communicably connected to the nodes in front of, behind, to the left of, and to the right of it, and in which the files held on the nodes are managed with a B-tree structure. The root, the internal nodes, and the leaves of the B-tree are associated one-to-one with nodes of the grid. A grid node associated with a leaf actually stores files; a grid node associated with an internal node stores address information pointing to the grid nodes associated with the leaves in its subtree; and the grid node associated with the root stores address information pointing to the grid nodes associated with the internal nodes.

According to this aspect, combining a plurality of nodes arranged in a grid with the well-known B-tree makes it possible to manage files distributed over the nodes efficiently with the B-tree.

Any combination of the above components, and any expression of the present invention as a method, a device, a system, a recording medium, or a computer program, is also valid as an aspect of the present invention.

According to the present invention, files held on a plurality of nodes, each having a storage device, can be managed together so that the nodes function as a single database.

One embodiment of the present invention is a data storage system that manages the set of storage devices located at the nodes of a system in which a plurality of nodes, each having a processor, are arranged in a grid. The embodiment is characterized in particular by its indexing technique for locating where a file is stored. The embodiment is described in detail below with reference to the drawings.

FIG. 1 is an overall configuration diagram of a data storage system 100 according to one embodiment of the present invention and a client terminal 12 connected to it. The "data storage system" addressed by this embodiment is a system in which a plurality of nodes, each having a processor, such as servers or personal computers, are connected in a grid and data is distributed across the nodes.

As shown in FIG. 1, the data storage system 100 includes a grid array 10 that, in response to a search request issued from the client terminal 12, searches for a file stored in the system and provides it. In the grid array 10, nodes 20 are arranged to form a grid of multiple rows and columns (5 rows and 7 columns in FIG. 1). Each node in FIG. 1 corresponds to one server or personal computer and is drawn as a white square. Each node in the grid is configured to communicate with the nodes above, below, to the left of, and to the right of it. Although the grid array 10 in FIG. 1 has 5 rows and 7 columns, it may of course consist of more or fewer nodes, as described later.

Each node 20 in the grid array 10 is connected via a router 16 to a network 14 such as the Internet, a LAN, or a WAN. The data storage system 100 is installed in a corporate data center or the like and can respond to many search requests simultaneously.

The grid array 10 functions as a single database as a whole. The programs and information such as IP addresses needed for this function are given to each node in advance by the system administrator. In another example, once the role of each node (described later) has been determined, the programs and data stored on a specific node may be sent to the other nodes.

The client terminal 12 may be a personal computer with input devices such as a keyboard and a mouse and an output device such as a display, or a mobile phone with equivalent input/output devices. A mobile phone is assumed to communicate wirelessly. The user issues a search request to the data storage system 100 using a web browser or the like on the client terminal 12.

FIG. 2 is a hardware configuration diagram of each node in the grid array 10. A node includes at least a processor 92 that executes various processes according to programs, a memory 94 that temporarily stores data and programs, a storage device 96 whose contents are retained across a restart of the node, a network interface 98 that connects to an adjacent node and performs input/output processing, and a bus 90 that interconnects these components. A node has up to four network interfaces 98 so that it can connect to the nodes above, below, to the left of, and to the right of it, as in FIG. 1. The storage device 96 may be a hard disk device, an optical disk device, a magnetic disk device, a nonvolatile memory, or any other suitable device. Each node may also have input devices such as a keyboard and a mouse and an output device such as a display, as needed.

Each node may be a blade server, that is, a compact unit suited to building the data storage system 100 that carries a processor, memory, a hard disk, a bus, and so on. Arranging many such blade servers side by side in racks is preferable for linking the servers in the grid array 10, but other arrangements are possible.

FIG. 3 shows how each node 20 is linked to the nodes above, below, to the left of, and to the right of it. As illustrated, the node 20 is wired to each of these neighbors as a peer-to-peer connection, and an IP address is assigned to each link in advance. Each node therefore needs as many network interfaces as it has adjacent nodes.
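The relationship between a node's position and the number of network interfaces it needs can be sketched as follows. This is only an illustration: the function name `neighbors` and the zero-based (row, column) coordinates are assumptions introduced here, and the 5 × 7 default merely mirrors FIG. 1.

```python
def neighbors(row, col, rows=5, cols=7):
    """Return the grid coordinates of the nodes directly above, below,
    left of, and right of (row, col) that actually exist in the grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# One interface per neighbor: corner, edge, and interior nodes differ.
for position in [(0, 0), (0, 3), (2, 3)]:
    print(position, len(neighbors(*position)))
```

Under these assumptions a corner node needs two interfaces, an edge node three, and an interior node four, which is consistent with the maximum of four network interfaces 98 per node.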

Each node 20 is configured so that it can run a plurality of OSs simultaneously using a known virtualization technology such as Hyper-Threading or VMware. With virtualization, a node does not need two or more CPUs to run several OSs at once. One OS manages application execution (hereinafter the "application OS"), and the other manages routing (hereinafter the "autonomous disk OS"). If each node 20 is viewed as virtually split into an application node 22 running the application OS and an autonomous disk node 24 running the autonomous disk OS, the application node 22 leaves the choice of communication route to other nodes 20 to the autonomous disk node 24, which enables communication between any two nodes 20 in the grid array. Because routing is provided by the autonomous disk node 24 and the network interfaces, no switches or routers are needed between nodes. The embodiment can nevertheless also be implemented with a conventional network configuration in which nodes are grouped by row or column and a router is placed for each group.

In the following description, the application OS and the autonomous disk OS are assumed to run on every node, so that every node can perform both routing and application execution. Storing and retrieving data files on the storage device and the hash calculation described later are performed by the application node 22, while file transfer and routing are performed by the autonomous disk node 24; the description below does not distinguish between these roles.

Consider a data storage system made up of nodes arranged in a grid as shown in FIG. 4. Conventionally, the system administrator decides how many nodes to allocate according to the storage capacity expected to be needed. Suppose that 2 × 2 = 4 nodes are allocated initially, as in FIG. 4(a). If later operation requires more storage capacity than initially expected, the administrator must allocate more nodes. Suppose the additional nodes are allocated as in FIG. 4(b). The placement rule for the data files already stored on each node then no longer agrees with the placement rule for data files stored after the nodes are added, and the index data becomes inconsistent. To use a large number of grid-arranged nodes as a database, the nodes on which data is placed must therefore be chosen according to a fixed rule so that the index data stays consistent.

Next, consider increasing the storage capacity starting from the state shown in FIG. 5, in which index nodes occupying 2 rows and 1 column and actual-data storage nodes occupying 2 rows and 2 columns have been allocated. As the actual data grows, so does the index data. The index data must then also be distributed over several nodes, which calls for a rule for dividing it. If, for example, the nodes storing index data are fixed as the leftmost column of the block, as in FIGS. 5(a) and 5(b), the capacity for actual data grows quadratically, as the product of height and width, while the capacity for index data grows only linearly, as the length of one column. The index storage area then becomes the bottleneck that limits access to the actual data, and the index data inevitably has to be relocated. If this relocation is neglected, every node holding an index has to be queried to find the actual data.
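The imbalance between linear and quadratic growth can be made concrete with a small calculation. The sketch below assumes that the data block stays square as it grows (rows × rows data nodes next to a single column of rows index nodes), which matches the 2 × 1 and 2 × 2 starting state of FIG. 5 but is otherwise an assumption; the function name `left_column_scheme` is hypothetical.

```python
def left_column_scheme(rows):
    """Node counts when the index occupies one column of `rows` nodes
    next to a rows x rows block of actual-data nodes (assumed shape)."""
    index_nodes = rows          # grows linearly
    data_nodes = rows * rows    # grows quadratically
    return index_nodes, data_nodes

for rows in (2, 4, 8, 16):
    index_nodes, data_nodes = left_column_scheme(rows)
    # The last column is the number of data nodes each index node must cover.
    print(rows, index_nodes, data_nodes, data_nodes // index_nodes)
```

Under this assumption the number of data nodes that each index node has to cover equals the side length, so it keeps increasing as the block is enlarged.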

A method is therefore needed that, given a grid array, determines an appropriate placement of the actual data and the index data. This embodiment solves the above problems by making the numbers of nodes along the vertical and horizontal sides of the grid array a pair of coprime integers. More specifically, it combines the physical placement of nodes in the grid array with index management by the well-known B-tree, which organizes index data hierarchically.

As a precondition for managing index data according to this embodiment, the nodes of the grid array must be divided into "file storage nodes", which actually store files, and "index nodes", which store index data containing the address information of file storage nodes. Here, "index data" is the data needed to identify a file storage node; in the example described later, it is a pair consisting of the node number and the address information of a file storage node.

The procedure for dividing the grid array 10 is described with reference to the flowchart of FIG. 6. First, letting m be the number of nodes in the vertical direction and n the number of nodes in the horizontal direction (m and n are natural numbers), the system administrator builds a grid array in which m and n are coprime and sets the IP address of each node (S10). The numbers of nodes along the two sides are made coprime because this guarantees that the index nodes can be organized hierarchically by the square division based on the Euclidean algorithm described below.
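As a hedged illustration of the precondition in step S10, the coprimality of the two side lengths can be checked with Python's standard `math.gcd`. The function name `is_valid_grid` is an assumption introduced here and does not appear in the embodiment.

```python
import math

def is_valid_grid(m, n):
    """Return True when m and n are natural numbers with no common factor
    other than 1, i.e. when the grid satisfies the coprime precondition."""
    return m >= 1 and n >= 1 and math.gcd(m, n) == 1

print(is_valid_grid(5, 7))  # the 5-row, 7-column grid of FIG. 1
print(is_valid_grid(4, 6))  # both sides share the factor 2
```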

Next, the system administrator divides the grid array into a plurality of square sub-grids of different sizes (S12). The well-known Euclidean algorithm is used to divide the grid array 10, whose numbers of nodes along the two sides are coprime, into square sub-grids. The following describes how the Euclidean algorithm is applied to the grid of nodes in this embodiment to divide it into square sub-regions with equal numbers of nodes along both sides.

1. A grid array whose side lengths, counted in nodes, are natural numbers m and n with m > n is written K(m, n).
2. Square sub-grids K(n, n) are packed into K(m, n) from the left. Letting q_0 be the number of squares, q_0 is the quotient of m divided by n, and the remainder is m - q_0·n.
3. Writing the remainder as r_1 = m - q_0·n, the part left after packing the squares K(n, n) is K(r_1, n).
4. Square sub-grids K(r_1, r_1) are then packed into the rectangle K(r_1, n) from the bottom. Letting q_1 be the number of these squares, q_1 is the quotient of n divided by r_1, and the remainder is n - q_1·r_1.
5. Repeating this operation divides the grid array K(m, n) into a plurality of square sub-grids in a finite number of steps.
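The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the embodiment's implementation: the function name `square_partition` and the (side, count) return format are assumptions, and the two side lengths are treated symmetrically so that the longer one always plays the role of m.

```python
def square_partition(m, n):
    """Split a K(m, n) grid into squares with the Euclidean algorithm.

    Returns a list of (side, count) pairs, largest squares first, where
    each count is one of the quotients q_0, q_1, ... of the algorithm.
    """
    if m <= 0 or n <= 0:
        raise ValueError("side lengths must be natural numbers")
    long_side, short_side = max(m, n), min(m, n)
    pieces = []
    while short_side > 0:
        quotient, remainder = divmod(long_side, short_side)
        pieces.append((short_side, quotient))
        # The leftover rectangle is K(remainder, short_side).
        long_side, short_side = short_side, remainder
    return pieces

pieces = square_partition(7, 5)
print(pieces)
# Every node belongs to exactly one square, so the areas add up to m * n.
print(sum(side * side * count for side, count in pieces))
```

The side of the last square produced is the greatest common divisor of m and n, so when the sides are coprime the division always ends in 1 × 1 squares.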

Dividing a region into square sub-grids with the Euclidean algorithm is well known, as described for example in Toshikazu Sunada, "Geometry of Division: Two Theorems by Dehn" (分割の幾何学 デーンによる２つの定理), Nippon Hyoron-sha, pp. 34-38, so further details are omitted.

A concrete example of steps 1 to 5 is shown with reference to FIG. 7.
1. The grid array 10 is written K(7, 5).
2. Packing square sub-grids K(5, 5) into K(7, 5) from the left, only one square fits, so q_0 = 1 and the remainder is 7 - 5·1 = 2.
3. The rectangle K(2, 5) therefore remains.
4. Packing square sub-grids K(2, 2) into K(2, 5) from the bottom, two squares fit, so q_1 = 2 and K(2, 1) is left over.
5. Finally, two single nodes, each a K(1, 1), remain.
As a result, the grid array 10 is divided into the five square sub-grids outlined with thick lines in FIG. 7.
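To visualize this example, the following sketch labels every cell of a 7-wide, 5-high grid with the number of the square it falls into, packing squares alternately from the left and from the bottom as in the steps above. The function name `layout`, the label numbering, and the exact positions of the squares are assumptions made for illustration; FIG. 7 itself is not reproduced here.

```python
def layout(width, height):
    """Label each cell of a width x height grid with the index of the
    square sub-grid it belongs to (row 0 is the top row)."""
    grid = [[None] * width for _ in range(height)]
    x0, y0, w, h = 0, 0, width, height  # remaining rectangle
    label = 0
    while w > 0 and h > 0:
        side = min(w, h)
        if w >= h:
            # Wider than tall: place a square at the left edge.
            top, left = y0, x0
            x0, w = x0 + side, w - side
        else:
            # Taller than wide: place a square at the bottom edge.
            top, left = y0 + h - side, x0
            h -= side
        for dy in range(side):
            for dx in range(side):
                grid[top + dy][left + dx] = label
        label += 1
    return grid

for row in layout(7, 5):
    print(" ".join(str(cell) for cell in row))
```

In the printed grid, label 0 marks the 5 × 5 square, labels 1 and 2 the two 2 × 2 squares, and labels 3 and 4 the two single nodes, five squares in total.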

Returning to FIG. 6, the system administrator designates each square sub-grid as either file storage nodes or index nodes (S14). In FIG. 7, the nodes in the largest sub-grid, on the left, are designated "file storage nodes", and the nodes in the remaining sub-grids are designated "index nodes". The system administrator then assigns a number to each node to identify it (S16). The result is shown in FIG. 8. As illustrated, there are 25 file storage nodes c0 to c24, four secondary index nodes b0 to b3, and one primary index node a0. How the nodes a0' and b0' to b3' in the figure are used is described later.

In this embodiment, files are distributed over the file storage nodes, and pointers to their storage locations are managed with the well-known B-tree as the index data used to find those locations. The IP addresses or MAC addresses shown in FIG. 3 are used as the pointers. Each file storage node corresponds one-to-one to a "leaf" of the B-tree, and each index node corresponds one-to-one to an "internal node" of the B-tree. The 1 × 1 node obtained by the last square division corresponds to the "root" of the B-tree. The resulting B-tree is shown in FIG. 9. As illustrated, the primary index node a0 has four branches connecting it to the secondary index nodes b0 to b3. Each of the secondary index nodes b0 to b3 has some of the file storage nodes as its leaves. The association between file storage nodes and secondary index nodes is described later.
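The shape of this tree can be sketched as a small data structure. The node names a0, b0 to b3, and c0 to c24 come from the description, but the rule that assigns each leaf to a secondary index node is not given at this point, so the sketch simply assumes that leaf c<x> hangs under b<x mod 4>. The class `TreeNode` and the function `build_tree` are likewise hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    name: str
    children: list = field(default_factory=list)

def build_tree(num_secondary=4, num_leaves=25):
    """Build root a0, internal nodes b0..b3, and leaves c0..c24."""
    root = TreeNode("a0")
    root.children = [TreeNode(f"b{i}") for i in range(num_secondary)]
    for x in range(num_leaves):
        # Assumed placement rule: leaf c<x> belongs to b<x mod num_secondary>.
        root.children[x % num_secondary].children.append(TreeNode(f"c{x}"))
    return root

root = build_tree()
for branch in root.children:
    print(branch.name, [leaf.name for leaf in branch.children])
```

Any other rule that maps each file storage node to exactly one secondary index node would give a tree of the same depth; only the grouping of the leaves would change.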

The system administrator installs, on the primary index node, the secondary index nodes, and the file storage nodes, a program for executing the search process described later.

FIG. 10 is a functional block diagram of the primary index node while the search program is running. The file receiving unit 102 receives either a file name, for an inquiry about where a file is stored, or a transferred file. The search unit 104 includes an encoding unit 106 and a hash calculation unit 108. The encoding unit 106 encodes (converts to a number) the file name by a method described later, producing a "storage location code" that identifies the file storage node that should hold the file. Here, "encoding" means converting an arbitrary file name into an integer no greater than the number of file storage nodes, as explained later with a concrete example. The hash calculation unit 108 applies a hash function to the storage location code to calculate a hash value. The file transfer unit 110 transfers a file to another node according to address information. The table holding unit 112 holds, in table form, the address information of the node corresponding to each storage location code or hash value. The information acquisition unit 114 acquires information such as node numbers and IP addresses given by the system administrator.

The secondary index nodes and the file storage nodes have the same configuration as the primary index node, but, as described later, the function of the search unit 104 and the index data stored in the table holding unit 112 differ.

Next, with reference to the flowchart of FIG. 11, the process of determining, from a file name, the file storage node in which the file should be stored, and then determining the index node in which the index data of each file storage node should be stored, is described. After this process, files in the data storage system can be searched for using a B-tree such as the one shown in FIG. 9.

Consider the case where, after the B-tree has been constructed according to the procedure of FIG. 6, a file is sent to the data storage system from outside and stored in an appropriate node. The system administrator supplies the primary index node in advance with the address information of the secondary index nodes b0 to b3 and the file storage nodes c0 to c24, and the primary index node records this address information in the table holding unit 112 in association with the node numbers (S20). A file arriving from outside is sent to the primary index node by the router. The file receiving unit 102 of the primary index node receives the file and passes it to the encoding unit 106. The encoding unit 106 encodes the file name of the received file according to a predetermined rule and calculates the storage location code x (S22). The file transfer unit 110 transfers the file to the file storage node designated by the storage location code x (S24). Next, the hash calculation unit 108 applies a hash function to the storage location code x to calculate a hash value (S26). Writing this hash function as $h(x)$ and the number of secondary index nodes as $r_1^2$, the hash function is chosen so that $h(x) \in \{0, \ldots, r_1^2 - 1\}$. The file transfer unit 110 sends the address information of the file storage node to the secondary index node designated by the hash value (S28). The secondary index node records the node number of the file storage node in association with the address information in its own table holding unit 112 (S30).
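Steps S20 to S30 can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the `Node` class, the `build_system` and `store_file` helpers, the `b<i>`/`c<i>` table keys, and the IP addresses are all hypothetical, the encoder is passed in as a function, and the hash is taken to be the remainder modulo the number of secondary index nodes, as in the worked example later in the description.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A grid node: stored files plus the table of holding unit 112."""
    address: str
    files: dict = field(default_factory=dict)  # file name -> contents
    table: dict = field(default_factory=dict)  # node label -> address


def build_system(num_storage=25, num_secondary=4):
    """S20: register every node address at the primary index node."""
    primary = Node("10.0.0.1")  # hypothetical addresses throughout
    secondary = [Node(f"10.0.1.{i}") for i in range(num_secondary)]
    storage = [Node(f"10.0.2.{i}") for i in range(num_storage)]
    for i, node in enumerate(secondary):
        primary.table[f"b{i}"] = node.address
    for i, node in enumerate(storage):
        primary.table[f"c{i}"] = node.address
    return primary, secondary, storage


def store_file(name, data, encode, primary, secondary, storage):
    """Steps S22 to S30 for one incoming file."""
    x = encode(name)                       # S22: storage location code
    storage[x].files[name] = data          # S24: transfer file to node c<x>
    h = x % len(secondary)                 # S26: hash value h(x), assumed x mod r_1^2
    address = primary.table[f"c{x}"]       # address known to the primary index node
    secondary[h].table[f"c{x}"] = address  # S28, S30: b<h> records the entry
    return x, h
```

Calling `store_file` with an encoder that returns 9 places the file on node c9 and adds a single entry for c9 to the table of secondary index node b1.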

Each step of FIG. 11 is explained below using a specific example. For simplicity, all file names in this example are assumed to be written in hiragana, and encoding is based on the vowel of each hiragana character.

The encoding unit 106 of each node holds in advance a vowel code table such as the one shown in FIG. 12. Following this table, the encoding unit 106 assigns to each character in the file name the code "1" if its vowel is "a" (あ), "2" if it is "i" (い), "3" if it is "u" (う), "4" if it is "e" (え), and "5" if it is "o" (お). The code "0" is assigned to geminate consonants (sokuon), contracted sounds (yōon), long vowels (chōon), and "n" (ん).

FIG. 13 shows specific examples of file names and how the storage location code is calculated from a file name. When the file name is "せみなさんか" (seminasanka), the encoding unit 106 converts each hiragana character according to the vowel code table: the code "4" for the vowel "e" of "se" (せ), the code "2" for the vowel "i" of "mi" (み), and so on. Once all characters have been converted, the codes are summed. In this example, the codes "4", "2", "1", "1", "0", and "1", corresponding to "se", "mi", "na", "sa", "n", and "ka", add up to the storage location code "9". This number is the number of the file storage node (namely c9) that should store the file. The same calculation for the other file names "けいひ" (keihi), "よさん" (yosan), "かし" (kashi), and "たいさく" (taisaku) yields the file storage nodes c8, c6, c3, and c7, respectively. If the sum of the codes is "25", the total number of file storage nodes, or more, the remainder after dividing the sum by 25 is used as the storage location code of the file.
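The vowel-based encoding can be sketched as follows. FIG. 12 is not reproduced here, so the `VOWEL_ROWS` table is an assumed stand-in that groups the basic and voiced hiragana by vowel; the function name `storage_location_code` is likewise hypothetical.

```python
# Hiragana grouped by vowel; an assumed stand-in for the table of FIG. 12.
VOWEL_ROWS = {
    1: "あかさたなはまやらわがざだばぱ",
    2: "いきしちにひみりぎじぢびぴ",
    3: "うくすつぬふむゆるぐずづぶぷ",
    4: "えけせてねへめれげぜでべぺ",
    5: "おこそとのほもよろをごぞどぼぽ",
}
VOWEL_CODE = {ch: code for code, row in VOWEL_ROWS.items() for ch in row}


def storage_location_code(file_name, num_storage_nodes=25):
    """Sum the vowel codes of all characters, then reduce modulo the node count."""
    # Characters absent from the table (ん, っ, small ゃゅょ, ー) contribute 0.
    total = sum(VOWEL_CODE.get(ch, 0) for ch in file_name)
    return total % num_storage_nodes


for name in ["せみなさんか", "けいひ", "よさん", "かし", "たいさく"]:
    print(name, "-> c%d" % storage_location_code(name))
```

For the five file names of FIG. 13 this yields c9, c8, c6, c3, and c7, matching the values given in the text. Taking the remainder unconditionally is equivalent to the rule in the text, because a sum below 25 is left unchanged.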

With further reference to FIG. 13, the procedure for determining the node that should store the index data of a file storage node is described. The hash calculation unit 108 calculates, as the hash value, the remainder of the storage location code modulo the number of secondary index nodes. The hash value is then taken as the number of the secondary index node that should hold the index data. For example, if the storage location code is "9", then 9 = 4 · 2 + 1, so the secondary index node is b1. Accordingly, the index data of file storage nodes c0, c4, c8, ..., c24 is stored in secondary index node b0; that of c1, c5, c9, ..., c21 in b1; that of c2, c6, c10, ..., c22 in b2; and that of c3, c7, c11, ..., c23 in b3.
Note that any other function may be used as the hash function as long as it outputs different hash values for consecutive storage location codes.
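The modulo hash and the resulting grouping of file storage nodes can be checked with a few lines. This is a sketch only; the function names are hypothetical, and the defaults of 25 and 4 nodes are the values of the worked example.

```python
def secondary_index_number(storage_code, num_secondary=4):
    """Hash value: storage location code modulo the number of secondary index nodes."""
    return storage_code % num_secondary


def group_by_secondary(num_storage=25, num_secondary=4):
    """Map each secondary index node number to the file storage nodes it indexes."""
    groups = {b: [] for b in range(num_secondary)}
    for c in range(num_storage):
        groups[secondary_index_number(c, num_secondary)].append(c)
    return groups


groups = group_by_secondary()
```

Here `groups[1]` runs c1, c5, c9, ..., c21, so storage location code 9 is indexed by b1, as in the text. Because 25 is not a multiple of 4, b0 indexes seven file storage nodes and the other three index six each.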

FIG. 14 shows the nodes and the data stored in them as the B-tree structure built according to the above example. The primary index node holds the index data (that is, pairs of node number and address information) of all secondary index nodes and all file storage nodes. Each secondary index node, on the other hand, holds index data of file storage nodes, which correspond to the leaves of the B-tree. A secondary index node does not hold the index data of every file storage node; it holds only the index data of those file storage nodes that store files for which the result of the above hash calculation matches its own node number. In this specification, this is called "holding the index data of a subtree".

As shown in FIG. 14, in this embodiment the actual files are held in the file storage nodes, and the index data needed to search for those files is stored in the secondary index nodes and the primary index node. In this way, files and index data are stored on different nodes.

Furthermore, because files can be managed with the B-tree structure regardless of the total number of file storage nodes, the numbers of rows and columns of nodes can be extended.

Because of the division into square sub-grids described above, $r_0$ and $r_1$ are relatively prime, so $r_0^2$ and $r_1^2$, which give the ratio of the number of file storage nodes to the number of secondary index nodes, are always relatively prime as well. Consequently, applying a hash function at the level where the index is traversed is expected to distribute the index and the actual data more effectively than when the number of secondary index nodes and the number of file storage nodes share a common divisor.
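The step from the side lengths to their squares can be spelled out briefly. Any prime that divided both squares would also divide both side lengths, which contradicts their being relatively prime. The numeric instance uses the node counts of the worked example (25 file storage nodes and 4 secondary index nodes), which imply side lengths of 5 and 2.

```latex
\gcd(r_0, r_1) = 1
\;\Longrightarrow\;
\gcd(r_0^2, r_1^2) = 1,
\qquad
\text{e.g. } r_0 = 5,\ r_1 = 2:\quad \gcd(25, 4) = 1 .
```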

When a file request arrives at the data storage system from outside, the request is passed to the primary index node. The primary index node encodes the file name, looks up in its table the address of the file storage node whose number equals the storage location code, and passes the file request to that file storage node. The file storage node that receives the request reads the file from its storage device and sends it to the address of the requester.

Once the B-tree structure has been built, files that were originally stored on each node can also be relocated to match the B-tree. This process is described with reference to the flowchart of FIG. 15.

First, each node obtains the file names of the data files it currently stores and calculates the storage location code of each (S40). The node determines whether the calculated storage location code matches the node number assigned to it (S42). If they match (Y in S42), the data file with that file name belongs on this node, and the flow ends. If they do not match (N in S42), the data file belongs on another node. The file storage node therefore asks its higher-level index node for the address of the file storage node that should store the file (S44).

On receiving the file name, the secondary index node calculates the storage location code and then the hash value (S46). If the calculated hash value matches the node number assigned to the secondary index node (Y in S48), its table contains the address of the file storage node matching the storage location code, so it sends the address information to the inquiring file storage node (S50). If the calculated hash value does not match the node number assigned to the secondary index node (N in S48), the secondary index node passes the file-name inquiry on to the primary index node (S52).
Because the primary index node records the address information of all nodes, it looks up the address of the file storage node corresponding to the storage location code and sends the address information to the inquiring file storage node (S54).

Using the address information it receives, the inquiring file storage node sends the file directly to the designated file storage node and requests that it be stored (S56). The file storage node that receives the file encodes the file name, confirms that the file is one it should store, and then stores the file in its storage device (S58).
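The address lookup of steps S42 to S54 can be sketched as below. The sketch assumes that the higher-level index node of file storage node c<n> is the secondary index node numbered n mod 4, which follows from the grouping in the worked example. The tables, the `addr-c<x>` strings, and the function names are hypothetical, and the secondary tables are filled in completely for brevity.

```python
NUM_STORAGE, NUM_SECONDARY = 25, 4

# Hypothetical address tables. The primary index node knows every file
# storage node; secondary index node b<k> knows only codes with x mod 4 == k.
primary_table = {x: f"addr-c{x}" for x in range(NUM_STORAGE)}
secondary_tables = [
    {x: addr for x, addr in primary_table.items() if x % NUM_SECONDARY == k}
    for k in range(NUM_SECONDARY)
]


def resolve_address(code, asking_node):
    """Return (address of node c<code>, label of the index node that answered)."""
    parent = asking_node % NUM_SECONDARY   # S44: ask the parent b<parent>
    hash_value = code % NUM_SECONDARY      # S46: hash of the storage location code
    if hash_value == parent:               # S48 Y: the entry lies in this subtree
        return secondary_tables[parent][code], f"b{parent}"  # S50
    return primary_table[code], "a0"       # S52, S54: the primary index node answers


def relocate(code, current_node):
    """S42 onward: return None when the file already sits on the right node."""
    if code == current_node:
        return None
    return resolve_address(code, current_node)
```

For a file on node c5 whose storage location code is 9, the parent b1 can answer directly; if the code is 8, the inquiry is passed on to the primary index node a0.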

The above procedure applies to file storage nodes; it differs slightly for index nodes. A secondary index node, if encoding and hash calculation show that the destination is a file storage node in the subtree it manages, looks up that node's address and requests storage of the file. If the destination is not in the subtree it manages, it asks the primary index node for the address of the file storage node and then requests storage of the file.

Because the primary index node holds the addresses of all nodes, it looks up the address of the file storage node whose number equals the storage location code and requests storage of the file. In this way, files that were held in the storage device of each node before the B-tree was created can be relocated.

The procedure of FIG. 15 can also be applied when an application running on a node in the data storage system needs a file stored in the system and searches for it. The steps up to the file-name inquiry are the same as in FIG. 15; once the address of the node storing the desired file is known, a file request is issued to that address. The file storage node that receives the request encodes the file name, confirms that the file is one it stores, and sends the file to the requesting file storage node.

The addresses of all nodes in the grid array are held by the primary index node. Therefore, if each file storage node is told only the address of the primary index node in advance, it can learn the address of any file storage node by asking the primary index node where a needed file is stored. With this configuration, however, all inquiries concentrate on the primary index node. In this embodiment, by contrast, when the requested file lies in the subtree managed by the secondary index node, the secondary index node itself can supply the address, so the search load on the primary index node is reduced.

Each file storage node may be sent in advance the address information of all secondary index nodes, not only the address of the secondary index node of its own subtree. Then, when a file storage node searches for a file and encoding and hash calculation show that the file is not in its own storage device, it can directly ask the secondary index node that manages the subtree containing the node where the file is stored. Since no inquiry from a secondary index node to the primary index node arises, unlike in FIG. 15, the processing load on the primary index node can be lowered.

Instead of storing the addresses of all nodes in the primary index node, only the addresses of the secondary index nodes may be stored there. In this case, when a file request arrives from outside, the primary index node performs the encoding and hash calculation to find the address of the secondary index node that manages, as part of its subtree, the file storage node storing the file. The primary index node forwards the file request to that secondary index node. The secondary index node extracts the file name from the request, obtains the storage location code, and looks up the address of the file storage node in its table. It then passes the file request to that file storage node. Delegating the final address lookup to the secondary index nodes in this way further reduces the search load on the primary index node.

As shown in FIG. 8, when the nodes of the grid are divided into squares and assigned the roles of file storage node and index node, some nodes inevitably end up outside the B-tree. In FIG. 8, nodes a0' and b0' to b3' are not part of the B-tree. These unused nodes may therefore be used as a backup area for search data.

FIG. 16 shows replication of the index nodes. When the Euclidean algorithm described above leaves unused nodes, their regions are always the same size as the square sub-grids that make up the index nodes. The index data stored in an index node of the same size is therefore replicated to create a backup node. In FIG. 16, node a0' replicates the primary index node a0, and nodes b0' to b3' replicate the secondary index nodes b0 to b3, respectively (hereinafter, the node that was originally part of the B-tree is called the "primary" and its replica the "backup"). Duplicating the index data in this way improves the reliability of the index. This is equivalent to replicating some levels of the B-tree.
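The way the Euclidean algorithm produces spare squares of the same size as the index-node squares can be sketched as follows. The 5 × 7 grid is an assumption: it is consistent with the node counts of the example (25 file storage nodes, 4 secondary index nodes, 1 primary index node, plus the spare nodes a0' and b0' to b3'), but this passage does not state the grid dimensions. The function name is hypothetical.

```python
def square_division(rows, cols):
    """Split a rows x cols grid into squares, largest first.

    Returns a list of (side_length, number_of_squares) pairs, one pair per
    division step of the Euclidean algorithm.
    """
    long_side, short_side = max(rows, cols), min(rows, cols)
    squares = []
    while short_side > 0:
        count, remainder = divmod(long_side, short_side)
        squares.append((short_side, count))
        long_side, short_side = short_side, remainder
    return squares


print(square_division(5, 7))  # [(5, 1), (2, 2), (1, 2)]
```

Under this assumption the grid splits into one 5 × 5 square (the file storage nodes), two 2 × 2 squares (the secondary index nodes and their backups), and two 1 × 1 squares (the primary index node and its backup).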

With the procedure based on the Euclidean algorithm, the final step always yields two or more 1×1 nodes. Therefore, as long as the numbers of rows and columns of the grid are relatively prime, the primary index node can be replicated. With a backup node in place, the router can distribute external file requests across the multiple primary index nodes (primary and backup) using a known algorithm such as round robin or hashing. This relieves the concentration of accesses on the primary index node, which is the root, and lowers its processing load as the number of accesses grows. The same effect is obtained for file requests from file storage nodes to secondary index nodes by distributing them across the primary and backup secondary index nodes.
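Round-robin distribution over a primary and its backup can be sketched in a few lines. This is an illustration only; the router's actual mechanism is not specified in the text, and the function name and node labels are hypothetical.

```python
from itertools import cycle


def make_round_robin(replicas):
    """Return a function that picks the next replica on each call."""
    ring = cycle(replicas)
    return lambda: next(ring)


# Alternate external requests between the primary index node and its backup.
pick_primary_index = make_round_robin(["a0", "a0'"])
targets = [pick_primary_index() for _ in range(4)]
```

After four requests, `targets` alternates between `a0` and `a0'`, so each replica serves half of the requests.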

In addition, when the grid is divided into square sub-grids by the Euclidean algorithm, the direction in which the primary and the backup are lined up alternates from one level to the next. That is, as shown in FIG. 17, if the primary and backup of the primary index node are arranged horizontally, the primary and backup of the secondary index nodes are arranged vertically. As a result, the data flow for synchronizing index data between primary and backup (the hatched arrows in FIG. 17) is normally orthogonal to the data flow for exchanging file requests and address information during a file search (the white arrows in FIG. 17). These flows therefore do not compete for the same link, and a situation in which some links become a bottleneck and degrade the performance of the whole system is unlikely.

As described above, this embodiment combines nodes arranged in a grid with a B-tree structure, assigning the nodes of the grid system one-to-one to the "leaves", "internal nodes", and "root" of the B-tree. The nodes corresponding to leaves store the actual data, as in a B-tree, and the nodes corresponding to internal nodes and the root hold index data for routing to the leaf nodes. In a conventional B-tree, both the index data and the actual data are confined to a single computer; this embodiment is characterized by extending the structure across multiple nodes. Another feature is that the partitioning of the nodes into regions to form the B-tree and the way index data is assigned are paired through a hash.

Also, by using the Euclidean algorithm, data can in principle be managed for any number of nodes as long as the numbers of rows and columns of the grid are relatively prime. The system is therefore highly extensible.

According to this embodiment, when nodes arranged in a grid are used as a data storage system, a known B-tree structure is used to arrange files and index nodes hierarchically and in a distributed manner. An index node serves as a pointer to the address of the node where a file is actually stored. Because the number of index nodes and the number of actual file storage nodes are relatively prime, the index data can be distributed over multiple index nodes. Because the number of actual file storage nodes and the number of secondary index nodes are also relatively prime, skew in the data placed on the file storage nodes can also be spread out. Since the node-count ratio between levels consists of squares of relatively prime numbers, the node counts of adjacent levels are relatively prime at every level, which reduces bias in the distribution of data. Moreover, because the file search load is spread fairly evenly over the leaf nodes, the processing load of any particular node is prevented from becoming disproportionately high.

In the embodiment, encoding is computed from "the vowels of a Japanese file name" for simplicity, and because file names are limited in length in practice, the distribution effect is small. If, however, encoding uses ASCII codes, for example, the storage location codes are expected to be more widely scattered, so the distribution effect should be greater. Any other technique may also be used to encode a character string such as a file name: simply counting the characters in the file name, assigning numbers in advance to the alphanumeric characters of the file name, applying an encryption technique, and so on.
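One possible reading of the ASCII-based alternative is to sum the character codes of the file name and reduce the sum modulo the number of file storage nodes. The text does not fix the exact rule, so the sketch below is an assumption, and the function name is hypothetical.

```python
def ascii_storage_code(file_name, num_storage_nodes=25):
    """Sum the ASCII codes of the name and reduce modulo the node count.

    Raises UnicodeEncodeError if the name contains non-ASCII characters.
    """
    return sum(file_name.encode("ascii")) % num_storage_nodes
```

For the name `a.txt` the character codes sum to 495, which gives storage location code 20 with 25 file storage nodes.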

In this embodiment, dividing the nodes arranged in a grid into square sub-regions also has the following effect. Branch nodes in a parent-child or sibling relationship are located close to each other in the grid system. Parent and child nodes depend on each other's index data, and sibling nodes complement each other's index data. Because they are adjacent, access time in the parent-child and sibling directions is shortened.
In addition, because the file storage nodes are reserved as a square region, there are multiple routes to a given node when a file is transferred, so the transfer can still be completed even when some of the links between nodes are cut.

The present invention has been described above on the basis of several embodiments. Those skilled in the art will understand that these embodiments are illustrative, that various modifications are possible in the combinations of their constituent elements and processes, and that such modifications also fall within the scope of the present invention.

Those skilled in the art will also understand that the functions to be performed by the constituent features recited in the claims are realized by the individual functional blocks shown in this embodiment or by their cooperation.

In the embodiment, a hash value obtained from the file storage node number is used to determine the index node that stores the index data for looking up the address of the file storage node. The B-tree described above, however, cannot support range searches. Index data may therefore be managed with an n-ary search tree. An n-ary search tree can be built by forming internal nodes and leaves from the storage location codes that arise when calculating the hash values. The index data uses the storage location code as is, while for the actual data the leaf in which to store it is determined by the hash value. In a range search, this n-ary search tree is traversed to obtain the actual data stored in the leaves.

The file name was used to calculate the storage location code, but the code may instead be calculated from other file-specific information, such as the file's creation time, update time, size, creator, the name of the computer on which it was created, or a combination of these. Two or more kinds of index, that is, of storage location code, may also be used together.

When the data to be stored grows and the system administrator needs more file storage nodes or index nodes, the numbers of rows and columns of the grid can be increased after the fact. In this case, the primary index node may rebuild the B-tree by the procedure described above, based on the number of nodes in the new grid and the address information of each node. The rebuild changes the ratio between the number of file storage nodes and the number of secondary index nodes, which can change the secondary index node that should store given index data; an increase in the number of file storage nodes can likewise change the file storage node that should store a given data file. When this happens, each node may relocate its data according to the relocation procedure described above.

As described in the embodiment, a preferable logical configuration, with no unused nodes and with nodes reserved for replication, is obtained when the system administrator chooses the numbers of rows and columns of the grid so that the desired square sub-grids can be cut out according to the specifications, and builds the grid accordingly. The square division can, however, also be performed by a program rather than by the system administrator. In that case the numbers of rows and columns of the target grid array 10 may be arbitrary, and a program running on one of the nodes may cut out the square sub-grids by executing the steps described for FIG. 7 in order.

Depending on the shape of the grid, the B-tree could be given levels at or above a tertiary index, but because the method of the present invention determines the file storage node by the time the secondary index nodes have been searched, higher levels are unnecessary.

Multiple servers or personal computers may be placed under a single node 20 of FIG. 1 by connecting them to one router.

FIG. 1 is an overall configuration diagram of a data storage system according to an embodiment of the present invention and the client terminals connected to it.
FIG. 2 is a hardware configuration diagram of each node in the grid array.
FIG. 3 shows how each node is linked to the nodes above, below, and to the left and right of it.
FIGS. 4(a) and 4(b) illustrate the prior art in a data storage system having a grid array.
FIGS. 5(a) and 5(b) illustrate the prior art in a data storage system having a grid array.
FIG. 6 is a flowchart showing the procedure for dividing the grid array into a plurality of square sub-grids.
FIG. 7 shows a specific example of dividing the grid array.
FIG. 8 shows the result of dividing the grid array into square sub-grids and the node number assigned to each node.
FIG. 9 shows the configuration of the B-tree.
FIG. 10 is a functional block diagram of the primary index node executing the search program.
FIG. 11 is a flowchart showing the process of determining the file storage node and the index node.
FIG. 12 shows the vowel code table.
FIG. 13 shows specific examples of file names and the method of calculating a code from a file name.
FIG. 14 shows the B-tree and the data stored in each node.
FIG. 15 is a flowchart showing the process of relocating files to match the B-tree.
FIG. 16 shows replication of the index nodes.
FIG. 17 shows the data flow for synchronizing index data and the data flow during a file search.

Explanation of symbols

10 grid array, 12 client terminal, 14 network, 16 router, 20 node, 96 storage device, 98 network interface, 100 data storage system, 102 file receiving unit, 104 search unit, 106 encoding unit, 108 hash calculation unit, 110 file transfer unit, 112 table holding unit, 114 information acquisition unit.

Claims

A data storage system in which files held in a plurality of nodes, each having a storage device, are managed using a common index so that the nodes function as a single database, wherein:
the plurality of nodes are arranged in a grid, each node is communicably connected to its front, rear, left and right neighbors, and the nodes are divided into file storage nodes that actually store files and index nodes that store index data of the file storage nodes, such that each group forms a square sub-grid;
the file storage nodes correspond to the leaves of a tree structure, the index nodes correspond to the root or internal nodes of the tree structure, and information on the tree structure for managing the files and the index data is held in the index nodes;
address information of a file storage node is stored in an index node that is uniquely determined based on file specifying information for specifying a file held in that file storage node; and
in the grid, the number of nodes arranged in the vertical direction and the number of nodes arranged in the horizontal direction are relatively prime, and the grid is divided into a plurality of square sub-grids using the Euclidean algorithm.
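
As an illustration only, the division by the Euclidean algorithm recited above can be sketched in Python as repeatedly cutting the largest possible square off the remaining rectangle of nodes. The function name `split_into_squares` and the `(top, left, side)` tuple layout are assumptions made for this sketch and do not appear in the specification.

```python
def split_into_squares(rows, cols):
    """Greedily cut the largest possible square off a rows x cols grid.

    Returns a list of (top, left, side) tuples, one per square sub-grid.
    The side lengths follow the steps of the Euclidean algorithm applied
    to (rows, cols), so relatively prime dimensions end in 1 x 1 squares.
    """
    squares = []
    top, left = 0, 0
    while rows > 0 and cols > 0:
        side = min(rows, cols)
        squares.append((top, left, side))
        if rows >= cols:
            # The square spans the full width; continue below it.
            top += side
            rows -= side
        else:
            # The square spans the full height; continue to its right.
            left += side
            cols -= side
    return squares


if __name__ == "__main__":
    for top, left, side in split_into_squares(5, 3):
        print(f"square of side {side} at row {top}, column {left}")
```

For a grid of 5 by 3 nodes, this sketch yields squares of side 3, 2, 1 and 1, which together cover all 15 nodes.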

The data storage system of claim 1, wherein, of the sub-grids obtained by dividing the nodes arranged in the grid, the nodes included in the largest sub-grid are the file storage nodes, and the nodes included in the other sub-grids are the index nodes.

3. The data storage system according to claim 1, wherein the file specifying information is encoded according to a predetermined rule, a file storage node to store the file corresponding to the file specifying information is determined according to the obtained code, a hash value is obtained by applying a hash function to the code, and an index node that should hold address information of the file storage node is determined according to the hash value.
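
The two-step lookup recited above (code to file storage node, hash of the code to index node) might be sketched as follows. Everything concrete here is an assumption for illustration: the `VOWEL_CODES` table is hypothetical and is not the table shown in the drawings, the modulo mapping onto node numbers is one possible choice, and SHA-256 merely stands in for the unspecified hash function.

```python
import hashlib

# Hypothetical vowel table; the actual table from the drawings is not reproduced here.
VOWEL_CODES = {"a": 1, "i": 2, "u": 3, "e": 4, "o": 5}


def encode_name(file_name):
    """Map each vowel in the name to a digit and read the digits as an integer."""
    digits = [str(VOWEL_CODES[ch]) for ch in file_name.lower() if ch in VOWEL_CODES]
    return int("".join(digits)) if digits else 0


def storage_node_for(code, num_storage_nodes):
    """Pick a file storage node number from the code (assumed modulo mapping)."""
    return code % num_storage_nodes


def index_node_for(code, num_index_nodes):
    """Pick an index node number from a hash of the code (SHA-256 as a stand-in)."""
    digest = hashlib.sha256(str(code).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_index_nodes


if __name__ == "__main__":
    code = encode_name("report.txt")
    print(code, storage_node_for(code, 9), index_node_for(code, 6))
```

Because both mappings depend only on the file name, any node that receives a search request can recompute the same storage node and index node without consulting a central directory.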

4. The data storage system according to claim 3, wherein a name assigned to the file is used as the file specifying information.

The data storage system of claim 1, wherein, when the division yields a plurality of sub-grids of the same size, the index data is replicated across those sub-grids to create a backup of the index nodes.

A file search program for a data storage system in which files held in a plurality of nodes, each having a storage device, are managed using a common index so that the nodes function as a single database, wherein:
the plurality of nodes are arranged in a grid, each node is communicably connected to its front, rear, left and right neighbors, and the nodes are divided into file storage nodes that actually store files and index nodes that store index data of the file storage nodes, such that each group forms a square sub-grid;
the file storage nodes correspond to the leaves of a tree structure, the index nodes correspond to the root or internal nodes of the tree structure, and information on the tree structure for managing the files and the index data is held in the index nodes; and
the file search program, when executed in an index node, causes the index node to implement:
a function of receiving a file search request;
a function of encoding file specifying information of the file according to a predetermined rule and determining, according to the obtained code, the file storage node in which the file corresponding to the file specifying information is stored; and
a function of obtaining a hash value by applying a hash function to the code and determining, according to the hash value, the index node in which address information of the file storage node is held.

A file search apparatus for a data storage system in which files held in a plurality of nodes, each having a storage device, are managed using a common index so that the nodes function as a single database, wherein:
the plurality of nodes are arranged in a grid, each node is communicably connected to its front, rear, left and right neighbors, and the nodes are divided into file storage nodes that actually store files and index nodes that store index data of the file storage nodes, such that each group forms a square sub-grid;
the file storage nodes correspond to the leaves of a tree structure, the index nodes correspond to the root or internal nodes of the tree structure, and information on the tree structure for managing the files and the index data is held in the index nodes; and
an index node functions as the file search apparatus and comprises:
a file receiving unit that receives a file search request;
an encoding unit that encodes file specifying information of the file according to a predetermined rule and determines, according to the obtained code, the file storage node in which the file corresponding to the file specifying information is stored; and
a hash calculation unit that obtains a hash value by applying a hash function to the code and determines, according to the hash value, the index node that holds address information of the file storage node.