JP2000322292A

JP2000322292A - Cluster type data server system and data storage method

Info

Publication number: JP2000322292A
Application number: JP11128273A
Authority: JP
Inventors: Koichi Konishi; 弘一小西; Yuichi Aiba; 雄一相場
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-05-10
Filing date: 1999-05-10
Publication date: 2000-11-24

Abstract

PROBLEM TO BE SOLVED: To speed up processing in an entire cluster system by, in a shared-nothing cluster system, adaptively spreading at run time the load of processing that depends on specific data on a data storage device over plural disks and computer nodes, so that the load on the individual data storage devices and computer nodes is equalized. SOLUTION: Data to be stored in disk devices is coarsely divided and then further subdivided. All subdivided data 100-i1 to 100-im derived from the same coarsely divided data is stored on the same disk, while copies 110-i1 to 110-im, ..., 1r0-i1 to 1r0-im of the respective subdivided data are stored on disks different from the disk holding the original data and different from one another. Processing requests requiring individual subdivided data are distributed, in consideration of the respective load conditions, between the computer node having the disk where the original data is stored and the computer nodes having the disks where the copies are stored.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a cluster system composed of a plurality of computer nodes, and more particularly to a server system that, in a shared-nothing cluster system in which each computer node has its own local data storage device, executes a server application that processes requests from clients outside the cluster using the data in the local data storage device of each node.

[0002]

[Prior Art] A system built from a plurality of independent computers in order to execute a single application is called a cluster system. A cluster system differs from a distributed system in that the entire cluster is always involved in executing the application. It differs from a parallel computer in that it is configured as a set of independent computers, each of which can also be used on its own.

[0003] The purposes of a cluster system fall broadly into three: higher performance, larger capacity, and improved reliability. That is, using more computers achieves higher processing performance and allows more data to be handled, and using multiple computers allows processing to continue on the remaining computers even if some of them stop.

[0004] If these purposes are achieved, a cluster system is well suited to running large-scale server applications. This is because such applications generally need to process large amounts of data on disk devices (including magnetic disk devices, optical disk devices, and the like) at high speed, and also need to respond to client requests at all times without interruption.

[0005] Known server applications that process large amounts of data on disk devices include online transaction processing servers on database management systems, multimedia stream servers that supply audio and video data to the clients that play it back, and full-text search servers that find, among a huge number of documents, those containing keywords specified by the user and return a list of them.

[0006] Cluster systems are broadly classified into two types according to how the disks and computers are connected. One, called the "shared disk" type, connects computers and disk devices via a network so that any computer can access any disk device by the same procedure. The other, called the "shared nothing" type, connects a local disk device directly to each computer and provides a network for interconnecting the computers.

[0007] Compared with the shared-nothing type, the shared-disk type has the advantage that processing requiring specific data on a disk device can be performed on any computer node in the cluster.

[0008] Accordingly, some transaction processing servers and the like built on shared-disk cluster systems have conventionally included means that constantly monitors the load and operating state of each computer node and, on accepting a new processing request from a client, selects a running computer node with a lighter load and assigns the processing to it. In a transaction processing server, this means is called a transaction monitor.

[0009] With this configuration, when a computer node is stopped or heavily loaded, any other computer node in the cluster can be selected in its place to take over and continue the processing.

[0010] In this way, a shared-disk cluster system can relatively easily continue operating when a computer node stops and can equalize the load across the computer nodes.

[0011] In a shared-nothing cluster, on the other hand, for one computer node to access a disk device locally attached to another computer node, the first node must request the other node to perform the input/output on that disk device and have the other node carry it out on its behalf. This is called "remote disk access".

[0012] In this case, the content of the requested input/output operation and the accompanying data are exchanged between the two computer nodes over the network, so access time and network traffic both increase compared with a computer node accessing a local disk device directly attached to itself.

[0013] Furthermore, if a computer node in the cluster stops, the other computer nodes lose all means of accessing the data stored on the local disk device directly attached to the stopped node.

[0014] For this reason, it is difficult for a shared-nothing cluster system, as it stands, to continue operating when a computer node stops.

[0015] Redundant configurations such as disk mirroring and redundant disk arrays are well known as techniques for continuing operation when a computer node stops in a shared-nothing cluster system. In these techniques, the same data is stored on disks attached to a plurality of different computer nodes.

[0016] As a result, even if a computer node or the local disk device attached to it stops, processing can continue using a copy of the same data on another disk device.

[0017] The distributed file system "Tiger Shark" described below is an example of such a system. For "Tiger Shark", see R. L. Haskin, "Tiger Shark -- A scalable file system for multimedia", IBM J. RES. DEVELOP., VOL. 42, NO. 2, March 1998, pp. 187-189.

[0018] One feature of "Tiger Shark" is a technique called striping, in which a file is distributed across disk devices attached to many different computer nodes. In outline, given n computer nodes and their local disk devices, the i-th block of a file is stored on the local disk device of the (i mod n)-th computer node (mod being the remainder operator).
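The (i mod n) placement rule can be illustrated with a minimal Python sketch. It is not taken from "Tiger Shark" itself; the function names `stripe_node` and `stripe_layout` and the zero-based numbering of blocks and nodes are assumptions made for illustration.

```python
from __future__ import annotations


def stripe_node(block_index: int, num_nodes: int) -> int:
    """Return the node whose local disk holds the given block (i mod n)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return block_index % num_nodes


def stripe_layout(num_blocks: int, num_nodes: int) -> dict[int, list[int]]:
    """Map each node index to the list of file blocks stored on its disk."""
    layout: dict[int, list[int]] = {node: [] for node in range(num_nodes)}
    for block in range(num_blocks):
        layout[stripe_node(block, num_nodes)].append(block)
    return layout
```

For example, `stripe_layout(7, 3)` assigns blocks 0, 3, 6 to node 0, blocks 1, 4 to node 1, and blocks 2, 5 to node 2, so a sequential read touches the nodes in rotation.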

[0019] Another feature of "Tiger Shark" is a distributed redundant storage technique in which copies of a file are created block by block and placed on disk devices different from the one holding the original. This distributed file system is used to implement video stream servers.

[0020] Striping has the following effect. A server that supplies stream data such as video and audio signals must be able to supply a large amount of data per unit time. The amount of data supplied per unit time is called "throughput".

[0021] The throughput required of a stream data server often exceeds what a single disk device or computer node can provide.

[0022] By striping the files so that a plurality of computer nodes read data in parallel from different disk devices and send it to the clients, the cluster system as a whole can achieve high data supply performance.

[0023] The effect of the distributed redundant storage technique, on the other hand, is as follows.

[0024] To begin with, keeping copies of the data on a plurality of disk devices allows data supply to continue even when some disk devices or computer nodes fail. "Tiger Shark" additionally uses a technique called "uniform random striping" so that, when a failure occurs, the load taken over from the failed computer node or disk device is spread evenly across the remaining healthy disk devices and computer nodes.

[0025] In uniform random striping, when a file is distributed over k disk devices, the blocks making up the file are divided from the beginning into groups of k blocks, and the blocks of each group are placed in a randomly chosen order that differs from group to group. The copies of each file are likewise stored with uniform random striping. Then, when the data on some disk device becomes unavailable because of a failure or the like, its copies reside on different disks, in equal proportion, from one block group to the next, so as long as the file is read sequentially from the beginning, the read load is spread evenly over the different disk devices.
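The grouping described above can be sketched in Python as follows. This is only an illustration of the per-group random ordering for one copy of a file; the function name, the `seed` parameter, and the zero-based disk numbers are assumptions, and the sketch does not reproduce how "Tiger Shark" itself coordinates the placement of originals and copies.

```python
from __future__ import annotations

import random


def uniform_random_striping(num_blocks: int, k: int, seed: int = 0) -> list[int]:
    """Return, for each block, the disk (0..k-1) it is placed on.

    Blocks are taken in groups of k. Each group gets its own random
    permutation of the k disks, so every disk holds exactly one block
    of every full group.
    """
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    placement: list[int] = []
    for start in range(0, num_blocks, k):
        order = list(range(k))
        rng.shuffle(order)  # a fresh random order for this group
        group_size = min(k, num_blocks - start)  # last group may be short
        placement.extend(order[:group_size])
    return placement
```

Because each full group is a permutation of the k disks, a sequential read of the file visits every disk once per group, whichever order was drawn.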

[0026]

[Problems to be Solved by the Invention] However, conventional systems such as the "Tiger Shark" described above have the following problems.

[0027] The first problem is that, when such a system is used for purposes other than supplying multimedia stream data, the processing load on the disk devices and computer nodes is likely to become unbalanced.

[0028] The reason is that "Tiger Shark" assumes the files it handles are multimedia stream data such as video and audio, and is specialized for an access pattern in which a relatively large file is read in order from beginning to end.

[0029] That is, "Tiger Shark" stores each file striped uniformly across the disk devices of all computer nodes. Consequently, even if access frequency is skewed from file to file, the processing load is not skewed from computer node to computer node.

[0030] In online transaction processing and index-based search processing, however, the access frequency varies from place to place within a single file, and the range of data accessed differs each time.

[0031] In search processing that reads and scans an entire file in order from beginning to end, the disk access pattern is the same as for stream data, but even when the same amount of data is read, the amount of processing at each computer node can vary greatly depending on how many data items match the search condition.

[0032] The "Tiger Shark" described above has no countermeasure against such load imbalance.

[0033] The second problem is that, when used for purposes other than supplying multimedia stream data, conventional striping and uniform random striping do not necessarily improve processing performance and may even degrade it.

[0034] The reason is basically the same as for the first problem. With stream data access, it can be predicted that a relatively large file will have to be read in its entirety, in order from beginning to end, so the file can be striped and all disk devices instructed to start reading at once, processing in parallel and thereby improving throughput.

[0035] In transaction processing and search processing using index files, however, accesses target not the whole file but small parts of it, and occur not in order from the beginning but in an unpredictable, unspecified order.

[0036] Consequently, each individual access to a striped file touches only a few disk devices rather than all of them, and parallel processing yields little benefit.

[0037] Because a series of accesses is likely to be directed at different disk devices, parallel processing by multiple disk devices might be expected to help. Unlike stream data, however, it cannot be predicted which parts of the file will be accessed or in what order, so each access can be started only after its request arrives, and in the end parallel processing yields little benefit.

[0038] Moreover, compared with a configuration that stores each file on a single disk device without striping, the overhead of performing remote disk access to multiple disk devices may actually lower processing performance.

[0039] The third problem is that abandoning striping produces a large skew in the load distribution.

[0040] The reason is that if striping is abandoned entirely and all blocks of a file are stored on a single disk device, accesses concentrate on that disk device.

[0041] Even short of such an extreme situation, another problem arises.

[0042] For example, as an intermediate approach, given k disks, one could set the block size large enough that the file fits in k blocks and then apply uniform random striping.

[0043] In this case only one block group is formed, so the copy of the block placed on a given disk device is stored on only one other disk device.

[0044] Consequently, when a disk device stops, the load taken over from it concentrates on that one other disk device, and the benefit of uniform random striping is lost.

[0045] To avoid this problem, one could set the block size so that the file fits in (m × k) blocks and apply uniform random striping. As m is increased, the load on the substitute disk devices is distributed better, but the second problem described above becomes more pronounced.

[0046] The present invention has therefore been made in view of the above problems. Its object is to provide a system and method that, in a shared-nothing cluster system, adaptively spreads at run time the load of processing that depends on specific data on a data storage device over a plurality of disks and computer nodes, thereby equalizing the load on each data storage device and computer node and, as a result, speeding up processing in the cluster system as a whole.

[0047] Another object of the present invention is to provide a system and method that, even when some data storage devices or computer nodes stop, distributes their share of the load so that it is taken over evenly by a plurality of running computer nodes.

[0048]

[Means for Solving the Problems] To achieve the above object, the cluster server system according to the first invention of the present application is a server system that performs processing dependent on data on data storage devices, running on a shared-nothing cluster system composed of a plurality of computer nodes each having a local data storage device, wherein: the data to be stored in the data storage devices is coarsely divided into coarsely divided data; the coarsely divided data is further divided into subdivided data; all subdivided data derived from the same coarsely divided data is stored in the same data storage device; and the copies of all the subdivided data derived from that coarsely divided data are stored in data storage devices that differ from that of the originals and from one another.

[0049] The second invention of the present application is the server system of the first invention, further comprising input/output request allocating means for directing input/output for a given piece of subdivided data so that it is executed using either the data storage device storing the original of that subdivided data or a data storage device storing a copy of it.

[0050] The third invention of the present application is the server system of the first invention, further comprising processing request allocating means for directing a request for processing that requires a given piece of subdivided data so that it is executed using either the computer node having the data storage device storing the original of that subdivided data or a computer node having a data storage device storing a copy of it.

[0051] The fourth invention of the present application is the server system of the second invention, comprising: means for estimating the load on the data storage device recording the original of a given piece of subdivided data and the load on the data storage devices recording copies of it; and request allocating means for directing input/output requests for the subdivided data so that the input/output is executed using the data storage device estimated to have the lighter load.

[0052] The fifth invention of the present application is the server system of the third invention, comprising: means for estimating the load on the computer node having the data storage device recording the original of a given piece of subdivided data and the load on the computer nodes having data storage devices recording copies of it; and request allocating means for directing processing requests that require the subdivided data so that the processing is executed using the computer node estimated to have the lighter load.

[0053] The sixth invention of the present application is the server system of the second invention, comprising: means for monitoring the operating state of each computer node; and request allocating means for directing input/output requests for a given piece of subdivided data so that the input/output is executed using a running computer node from among the computer node having the data storage device recording the original of the subdivided data and the computer nodes having data storage devices recording copies of it.

[0054] The seventh invention of the present application is the server system of the third invention, comprising: means for monitoring the operating state of each computer node; and request allocating means for directing processing requests that require a given piece of subdivided data so that the processing is executed using a running computer node from among the computer node having the data storage device recording the original of the subdivided data and the computer nodes having data storage devices recording copies of it.

[0055]

[Embodiments of the Invention] Embodiments of the present invention will now be described. First, the principle and operation of the invention are explained.

[0056] In a form in which the present invention is applied to a server system with a shared-nothing cluster configuration, the system comprises a plurality (n) of computer nodes, each having a data storage device (2_i) (where i is an integer from 1 to n), request processing means (7_i), and a file system (5_i), and a network (3) connecting all of these computer nodes.

[0057] In the present invention, the data to be stored on the data storage devices is roughly divided to form coarse partial data, and the coarse partial data is further divided to form fine partial data. Fine partial data derived from the same coarse partial data is stored in the data storage device of the same computer node, and one or more copies of each piece of fine partial data are stored in data storage devices that differ from one another and from that of the original.
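One way to satisfy these placement constraints is sketched below in Python. The source states only the constraints, not a concrete mapping, so everything specific here is an assumption made for illustration: the function name `place_fine_parts`, the use of one coarse part per node (coarse part i on node i), zero-based indices, and the rotating offset rule for replicas.

```python
from __future__ import annotations


def place_fine_parts(n: int, m: int, r: int) -> dict[tuple[int, int], list[int]]:
    """Map (coarse, fine) -> [original_node, replica_node_1, ..., replica_node_r].

    n: number of nodes (assumed equal to the number of coarse parts)
    m: fine parts per coarse part
    r: copies kept of each fine part
    """
    if not 1 <= r <= n - 1:
        raise ValueError("need 1 <= r <= n - 1 so copies avoid the original and each other")
    placement: dict[tuple[int, int], list[int]] = {}
    for coarse in range(n):
        for fine in range(m):
            # Original: every fine part of one coarse part lives on the same node.
            nodes = [coarse]
            for copy in range(r):
                # Hypothetical rule: offset is in 1..n-1, so it never hits the
                # original node, and the r offsets of one fine part are distinct.
                offset = 1 + (fine * r + copy) % (n - 1)
                nodes.append((coarse + offset) % n)
            placement[(coarse, fine)] = nodes
    return placement
```

With `n=4, m=6, r=2`, the originals of coarse part 0 all sit on node 0 while their copies rotate over nodes 1, 2, and 3, which is the spreading that lets several nodes share node 0's load.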

[0058] Requests for processing that requires data on the data storage devices are distributed, in units of fine partial data, between the data storage device holding the original of the data and the data storage devices holding copies.

[0059] Specifically, request allocating means (8_i) is provided that decides at run time, based on information about the operating state and load of each data storage device or computer node, whether a processing request is assigned to the original of the fine partial data or to one of its copies.

[0060] In the present invention, all fine partial data derived from one piece of coarse partial data is stored in the same data storage device, whereas its copies are stored distributed over a plurality of data storage devices.

[0061] Each time fine partial data is needed to satisfy a processing request, the request allocating means decides whether to use the original of that fine partial data or a copy.

[0062] Consequently, when all data storage devices and computer nodes are running and the load is roughly even, access to a relatively wide range of data that fits within one piece of coarse partial data can be performed as access to the single data storage device storing it, with at most one remote disk access.

[0063] When the load on a data storage device, or on the computer node having it, is high, the request allocating means raises the proportion of decisions to use copies instead of the originals stored on that data storage device. This spreads the load of that data storage device or computer node over a plurality of data storage devices or computer nodes and evens out load imbalance among data storage devices or computer nodes.

[0064] Furthermore, when a data storage device or computer node is stopped, the request allocating means decides to use copies instead of the originals on that data storage device (or on the data storage device of that computer node), so that the load of the stopped data storage device or computer node is taken over by a plurality of data storage devices or computer nodes.
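The run-time decision made by the request allocating means can be sketched as follows. This is a simplified illustration, not the patent's procedure: it always picks the running candidate with the lightest estimated load rather than adjusting a proportion, and the function name `choose_node`, the `load` and `up` dictionaries, and the tie-breaking in favor of the original are assumptions.

```python
from __future__ import annotations


def choose_node(candidates: list[int], load: dict[int, float], up: dict[int, bool]) -> int:
    """Pick the node that should serve a request for one piece of fine partial data.

    candidates[0] holds the original; the remaining entries hold copies.
    Stopped nodes are skipped. Among running nodes the lightest estimated
    load wins, and ties go to the earliest candidate (the original first).
    """
    running = [node for node in candidates if up.get(node, False)]
    if not running:
        raise RuntimeError("no running node holds this fine partial data")
    # min() returns the first minimal element, which gives the tie-breaking above.
    return min(running, key=lambda node: load.get(node, 0.0))
```

Under even load the original is used, so a range of accesses within one coarse part stays on one node; when the original's node is busy or stopped, requests for different fine parts fall through to copies on different nodes.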

[0065] In the present invention, the data storage device is not limited to an HDD (hard disk drive) and includes various auxiliary storage devices whose media can be freely written and read. The data storage device of each computer node need only logically constitute one data storage device; it is of course not limited to a single drive controller or drive unit.

[0066] Embodiments of the present invention will now be described in more detail with reference to the drawings.

[0067]

[Embodiment 1] FIG. 1 is a diagram showing the configuration of a first embodiment of the present invention. In the following description, the data storage devices are assumed to be disk devices. Referring to FIG. 1, the first embodiment comprises a plurality of computer nodes 1_1 to 1_n (n being an integer of 2 or more), disk devices 2_1 to 2_n, and a SAN (System Area Network; intra-system network) 3, and each of the computer nodes 1_1 to 1_n is also connected to a LAN (Local Area Network) 4.

[0068] The computer nodes 1_1 to 1_n comprise file systems 5_1 to 5_n, request receiving means 6_1 to 6_n, and request processing means 7_1 to 7_n, respectively.

[0069] The file systems 5_1 to 5_n comprise request allocating means 8_1 to 8_n, respectively.

[0070] Each request receiving means 6_i (i is an integer from 1 to n) of the computer nodes 1_1 to 1_n receives a processing request that requires data on the disk devices 2_1 to 2_n, divides it into a plurality of partial processing requests, sends each partial processing request to one of the request processing means 7_1 to 7_n, and merges the partial processing results returned in response into a single processing result.

[0071] The request processing means 7_1 to 7_n create partial processing results in response to the partial processing requests from the request receiving means 6_i.

[0072] The file systems 5_1 to 5_n receive input/output requests for files and perform input/output on the data on the disk devices.

[0073] The request allocating means 8_1 to 8_n decide whether an input/output request for a file is processed using the original of the file or one of its one or more copies.

[0074] FIG. 2 illustrates the processing of the first embodiment. Referring to FIG. 2, the data on the disk devices is divided into n pieces of coarse partial data, and each piece of coarse partial data is further divided into m pieces of fine partial data 100_ij (i is an integer from 1 to n, and j is an integer from 1 to m). Each piece of fine partial data is placed, together with its r copies 110_ij, 120_ij, ..., 1r0_ij (r is an integer of 1 or more), as follows.

[0075] Disk device 2_i (i is an integer from 1 to n) holds the originals 100_i1 to 100_im of the fine partial data (m is an integer from 1 to n-1). The copies 110_ij to 1r0_ij of original 100_ij (j is an integer from 1 to m) are held on disk devices 2_k (k is an integer from 1 to n) that differ from disk device 2_i and from one another.

[0076] Next, the operation of the first embodiment will be described with reference to FIGS. 1 and 2.

[0077] The request receiving means 6_1 to 6_n each accept processing requests via the LAN 4. On accepting a processing request, request receiving means 6_k (k is an integer from 1 to n) creates n partial processing requests 200_1 to 200_n. Partial processing request 200_i (i is an integer from 1 to n) requests the part of the processing that can be executed using only the fine partial data 100_i1 to 100_im whose originals are stored on disk device 2_i.

[0078] It then sends each partial processing request 200_i to request processing means 7_i via the SAN 3.

[0079] Request processing means 7_i issues input/output requests for the fine partial data 100_i1 to 100_im to file system 5_i.

[0080] On receiving an input/output request for fine partial data 100_ij, file system 5_i passes it to request allocating means 8_i.

[0081] Request allocating means 8_i determines on which computer nodes' disk devices the original 100_ij and the copies 110_ij to 1r0_ij of the fine partial data are stored. It then estimates the operating state of each disk device holding the original or a copy and of the computer node directly connected to each such disk device, together with the load on those disk devices and computer nodes. Among the operating disk devices connected to operating computer nodes, it preferentially selects a more lightly loaded disk device or computer node and allocates the input/output request for fine partial data 100_ij to it.

[0082] File system 5_i executes the input/output requests allocated to computer node 1_i and returns the results to request processing means 7_i.

[0083] File system 5_i also asks file system 5_h of another computer node 1_h (h is an integer from 1 to n other than i), via the SAN 3, to process the input/output requests allocated to that node, receives the results, and returns them to request processing means 7_i.

[0084] In addition, file system 5_i accepts input/output requests from any other file system 5_l (l is an integer from 1 to n other than i) via the SAN 3, executes them, and returns the results to file system 5_l.

[0085] When request processing means 7_i has obtained the results of all the necessary file input/output and completed its partial processing, it returns the result to request receiving means 6_k.

[0086] When request receiving means 6_k has received the partial processing results from all of the request processing means 7_1 to 7_n, it merges them into a single processing result and returns it via the LAN 4 to the sender of the processing request.

[0087] Next, the effects of the first embodiment will be described.

[0088] In the first embodiment, when the load on disk device 2_i is higher than that on the other disk devices, the request allocating means 8_1 to 8_n decide to reduce the proportion of requests that use the fine partial data on disk device 2_i. Thus, even when an unforeseeable input/output pattern makes the load on a particular disk device stand out, that load can be lowered, and evening out the load across the disk devices improves the processing performance of the system as a whole.

[0089] A second effect of the first embodiment is that, when disk device 2_i is stopped, the request allocating means 8_1 to 8_n decide to use, in place of every piece of fine partial data on disk device 2_i, its copy or original held on a plurality of other disk devices. Processing can therefore continue even if disk device 2_i stops, whether for an unexpected reason such as a failure or for a planned reason such as maintenance.

[0090] A third effect of the first embodiment follows from the fact that the copies of the several originals stored on one disk device reside on mutually different disk devices. Redirecting input/output requests for the originals on a disk device to the other disk devices that hold the corresponding copies spreads the load of that one disk device over a plurality of disk devices. The load can thus be spread over many disk devices in few steps.

【００９１】[0091]

[Example 1] Next, the first embodiment described above is explained in more detail by way of a concrete example, in which the present invention is applied to a full-text search server system. The basic configuration of this example is the same as that of the embodiment described with reference to FIG. 1.

[0092] Full-text search here means searching a given set of documents for the documents that contain a given set of character strings and producing a list of them. In this example, full-text search is implemented with an index file. An index file lets the list of all documents containing any word that occurs in the document set be retrieved from that word. It is built in advance by finding, for every word occurring in the document set, the list of documents containing it, and storing in a file a data structure that associates each word with its document list by means of a tree or a hash table.
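As a rough illustration of the index file described above, the following sketch builds a word-to-document-list mapping with a hash table and answers a query by intersecting the lists. It is only a minimal sketch under stated assumptions: the function names `build_index` and `search` are hypothetical, words are split on whitespace, the index is kept in memory rather than in a file, and a query is taken to match only documents containing every given word.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():  # assumption: whitespace tokenization
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}

def search(index, words):
    """Return the sorted IDs of the documents that contain every given word."""
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

# Hypothetical three-document collection used only for illustration.
documents = {1: "cluster disk node", 2: "disk load", 3: "cluster load disk"}
index = build_index(documents)
```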

[0093] In this example, sub-indexes created by dividing the index file serve as the fine partial data.

[0094] A sub-index of the index file of a document set is defined as the index of a partial document set obtained by dividing that document set.

[0095] FIG. 3 shows an example of the configuration of a full-text search server according to one example of the present invention. Referring to FIG. 3, the server comprises eight computer nodes. Each disk device 2_i (i is an integer from 1 to 8) stores four sub-index originals 500_i1 to 500_i4 and the sub-index copies 600_jk (k is an integer from 1 to 4, and j is an integer with j = (i - k) mod 8).

[0096] Request allocating means 8_i comprises an input/output request allocation table 9_i, which records the input/output requests allocated to each computer node, and state monitoring means 10_i, which monitors the operating state of each computer node.

[0097] FIG. 4 is a schematic diagram illustrating how the sub-index originals 500_ij and copies 600_ij are constructed in this example.

[0098] There are 80,000 documents to be searched, with document IDs 00001 to 80000. They are divided in order from the beginning into eight groups of 10,000 documents, and one group is assigned to each of the disk devices 2_1 to 2_8.

[0099] The documents assigned to disk device 2_i (i is an integer from 1 to 8) are further divided into four groups of 2,500 documents, and a sub-index is created for each group; these are the sub-indexes 500_i1 to 500_i4.

[0100] A copy of each sub-index 500_ij (j is an integer from 1 to 4) is then created as 600_ij.
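The two-level division of the 80,000 documents can be expressed as simple integer arithmetic. The sketch below maps a document ID to the pair (i, j) identifying the sub-index 500_ij that covers it; the function name `sub_index_for` and the constants are hypothetical names introduced for illustration.

```python
DOCS_PER_DISK = 10_000      # 80,000 documents divided over 8 disk devices
DOCS_PER_SUBINDEX = 2_500   # each disk's documents divided into 4 sub-indexes
TOTAL_DOCS = 80_000

def sub_index_for(doc_id):
    """Return (i, j) such that sub-index 500_ij covers document doc_id."""
    if not 1 <= doc_id <= TOTAL_DOCS:
        raise ValueError("document ID out of range")
    offset = doc_id - 1                                     # zero-based position
    i = offset // DOCS_PER_DISK + 1                         # disk device 2_i
    j = (offset % DOCS_PER_DISK) // DOCS_PER_SUBINDEX + 1   # sub-index within the disk
    return i, j
```

For example, document 2501 is the first document of sub-index 500_12, and document 10001 is the first document of sub-index 500_21.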

[0101] FIG. 5 is a schematic diagram showing the placement of the sub-index originals 500_ij and copies 600_ij on the disk devices 2_1 to 2_8. Referring to FIG. 5, the originals 500_i1 to 500_i4 (i is an integer from 1 to 8) are all placed on disk device 2_i.

[0102] Copy 600_ij is placed on disk device 2_h, where h = (i + j) mod 8.
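The placement rule can be checked with a short sketch. It reads the rule as placing the copy of sub-index 500_ij on disk device 2_h with h = (i + j) mod 8, which agrees with the relation j = (i - k) mod 8 given for FIG. 3. Because the disk devices are numbered 1 to 8, the sketch assumes that a result of 0 denotes disk device 2_8; that reading, and the names `copy_disk` and `placement`, are assumptions made for illustration.

```python
N_DISKS = 8
SUBINDEXES_PER_DISK = 4

def copy_disk(i, j, n=N_DISKS):
    """Disk number h holding copy 600_ij; assumes (i + j) mod n == 0 means disk n."""
    return (i + j - 1) % n + 1

def placement(n=N_DISKS, m=SUBINDEXES_PER_DISK):
    """Map each disk number to the (i, j) pairs of the originals and copies it stores."""
    disks = {d: {"originals": [], "copies": []} for d in range(1, n + 1)}
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            disks[i]["originals"].append((i, j))                 # 500_ij on disk 2_i
            disks[copy_disk(i, j, n)]["copies"].append((i, j))   # 600_ij on disk 2_h
    return disks
```

Under this reading, the four copies of the originals on any one disk device land on the next four disk devices, and every disk device ends up holding four originals and four copies.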

[0103] FIG. 6 shows an example of the configuration of the input/output request allocation table 9_1 in this example. Input/output request allocation table 9_i comprises an input/output request queue 30_1 for disk device 2_i, which holds the originals of data 500_i1 to 500_i4, and input/output request queues 30_2 to 30_5, one for each of the disk devices 2_h to 2_k (h = (i + 1) mod 8, k = (i + 4) mod 8) that hold the copies 600_i1 to 600_i4 of data 500_i1 to 500_i4.

[0104] FIG. 7 is a sequence diagram illustrating the operation of this example. It shows the flow of processing when a query arrives at request receiving means 6_1 of computer node 1_1.

[0105] Referring to FIG. 7, request receiving means 6_1 copies the query unchanged and submits it to the request processing means 7_1 to 7_8 on all the computer nodes.

[0106] To perform the search, each request processing means 7_i (1 ≤ i ≤ 8) issues read requests for the sub-indexes 500_i1 to 500_i4 to file system 5_i.

[0107] File system 5_i asks request allocating means 8_i to allocate the computer node that is to process each read request for sub-index 500_ij.

[0108] Request allocating means 8_i attaches each read request to one of the queues 30_i to 30_(i+4).

[0109] If read requests are attached to queue 30_i, file system 5_i takes them from the head in order and reads from disk device 2_i. If read requests are attached to any queue 30_(i+q) (q is an integer from 1 to 4), it sends them to file system 5_(i+q) via the SAN 3.

[0110] File system 5_(i+q) reads the requested data from disk device 2_(i+q) and returns it to file system 5_i via the SAN 3.

[0111] File system 5_i returns the read results received from disk device 2_i or file system 5_(i+q) to request processing means 7_i.

[0112] When request processing means 7_i obtains the result of a read request from disk device 2_i or file system 5_(i+q), it removes that read request from the queue.

[0113] In this way, request processing means 7_i uses the originals or copies of the sub-indexes 500_i1 to 500_i4 to create the list of IDs of the documents that match the query, and returns the list to request receiving means 6_1.

[0114] Request receiving means 6_1 receives the matching document ID lists from the request processing means 7_1 to 7_8, merges them into one, and returns it to the issuer of the query.
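The query flow of FIG. 7 is a scatter-gather pattern: the same query goes to every request processing means, and the returned document ID lists are merged. The following is a minimal in-process sketch of that pattern only. The names `make_processor` and `handle_query` are hypothetical, each request processor is modelled as a plain function rather than a remote node reached over the SAN, and the calls are made sequentially for simplicity.

```python
def make_processor(local_index):
    """Hypothetical stand-in for a request processing means 7_i.

    local_index maps a query word to the IDs of the matching documents
    covered by that node's sub-indexes.
    """
    def process(query):
        return list(local_index.get(query, []))
    return process

def handle_query(query, request_processors):
    """Send the same query to every request processor and merge the ID lists."""
    merged = []
    for process in request_processors:
        merged.extend(process(query))   # partial result from one node
    return sorted(merged)               # single combined result for the client
```

Because each node covers a disjoint range of document IDs in this example, merging needs no duplicate removal.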

[0115] FIG. 8 is a flowchart illustrating the operation of request allocating means 8_i in this example. The operation of request allocating means 8_i is described below with reference to FIG. 8.

[0116] On being asked to allocate a read request for fine partial data 500_ij, request allocating means 8_i determines which disk device holds the copy 600_ij of 500_ij (step S101). Here it is assumed to be on disk device 2_h.

[0117] Next, it consults state monitoring means 10_i to check whether disk device 2_h is operating (step S102).

[0118] If disk device 2_h is stopped, it then checks whether disk device 2_i is operating (step S108).

[0119] If disk device 2_i is also stopped, it reports that the read has failed and terminates (step S109).

[0120] If, on the other hand, disk device 2_i is operating at step S108, it attaches the read request to queue 30_i and terminates.

[0121] If disk device 2_h is operating at step S102, it then checks whether disk device 2_i is operating (step S103). If disk device 2_i is stopped, it attaches the read request to queue 30_h (step S107) and terminates.

[0122] If disk device 2_i is operating at step S103, it consults allocation table 9_i and examines the lengths of queues 30_i and 30_h (step S104).

[0123] If queue 30_i would still be at least four times as long as queue 30_h even after the read request were attached to queue 30_h, it attaches the request to queue 30_h (step S107); if queue 30_i would be less than four times as long, it attaches the request to queue 30_i (step S106). It then terminates.
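The decision procedure of FIG. 8 can be summarized as one small function. The sketch below is an interpretation rather than the patented implementation: the name `choose_queue` and its parameters are hypothetical, queue lengths are passed in as plain integers, and the comparison is read as "the original's queue is at least four times as long as the copy's queue would be after the request joined it".

```python
RATIO = 4  # threshold used in this example; other values are possible

def choose_queue(original_len, copy_len, original_up=True, copy_up=True, ratio=RATIO):
    """Return "original", "copy", or None (the read fails) for one read request.

    original_len: current length of the queue for the disk holding the original
    copy_len:     current length of the queue for the disk holding the copy
    """
    if not copy_up:                        # S102 fails, so go to S108
        return "original" if original_up else None   # None corresponds to S109
    if not original_up:                    # S103 fails, so go to S107
        return "copy"
    # S104: compare the queue lengths, counting the request as already
    # attached to the copy's queue.
    if original_len >= ratio * (copy_len + 1):
        return "copy"                      # S107
    return "original"                      # S106
```

With three requests waiting on the original and none on the copy, the function keeps the request on the original; with four waiting it diverts the request to the copy.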

[0124] Next, the operation of request allocating means 8_i is described in detail using a concrete example.

[0125] In FIG. 6, queue 30_1 already contains input/output requests a, b, and c, and queue 30_4 contains d. Suppose input/output request 40_1 for 500_11 is now submitted. The original of 500_11 is on disk device 2_1 and its copy 600_11 is on disk device 2_2. Comparing the queues for these two disk devices, queue 30_1 still contains only three requests, so although queue 30_2 is empty, input/output request 40_1 is placed in queue 30_1.

[0126] Next, suppose input/output request 40_2 for 500_12 arrives. The original of 500_12 is on disk device 2_1 and its copy 600_12 is on disk device 2_3. Comparing the queues for these two disk devices, queue 30_1 contains four requests and queue 30_3 is empty, so input/output request 40_2 is placed in queue 30_3.

[0127] Next, suppose input/output request 40_3 for sub-index 500_13 arrives. The original of 500_13 is on disk device 2_1 and its copy 600_13 is on disk device 2_4. Comparing the queues for these two disk devices, queue 30_1 contains four requests and queue 30_4 already contains one, so input/output request 40_3 is placed in queue 30_1.

[0128] Next, suppose that before input/output request 40_4 for sub-index 500_14 is allocated, processing on disk device 2_1 advances and input/output requests a and b are removed from queue 30_1.

[0129] The original of 500_14 is on disk device 2_1 and its copy is on disk device 2_5. Comparing the queues for these two disk devices, queue 30_1 contains only three requests, so although queue 30_5 is empty, request 40_4 is placed in queue 30_1.

[0130] In this example, the choice between the original and the copy depends on whether the ratio of the numbers of requests in the two queues is at least four or less than four, but any value may be used as the threshold ratio. The optimal value is determined by the cost of accessing the original and the cost of accessing the copy.

[0131] The choice may instead depend on whether the difference in the numbers of requests, rather than their ratio, is at least a fixed value or less than it.

[0132] The choice may also be based on the recent response time or the cumulative access time of each disk device. Alternatively, as soon as a disk device finishes one input/output request, the oldest request for data on that disk device may be allocated to it.
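The alternative criteria mentioned above differ only in the test that decides whether a request goes to the copy. The sketch below writes three of them as interchangeable predicates. All names and default thresholds are hypothetical; in particular the gap of 3 and the choice to count the new request as already attached to the copy's queue are assumptions made for illustration.

```python
def prefer_copy_by_ratio(original_len, copy_len, ratio=4):
    """Ratio criterion: the original's queue is at least ratio times as long."""
    return original_len >= ratio * (copy_len + 1)

def prefer_copy_by_difference(original_len, copy_len, min_gap=3):
    """Difference criterion: the original's queue is longer by at least min_gap."""
    return original_len - (copy_len + 1) >= min_gap

def prefer_copy_by_response_time(original_ms, copy_ms):
    """Response-time criterion: the copy's disk has recently answered faster."""
    return copy_ms < original_ms
```

Any of these predicates could replace the queue-length comparison at step S104 without changing the rest of the flow.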

【０１３３】[0133]

[Embodiment 2] Next, a second embodiment of the present invention will be described. FIG. 9 shows the configuration of the second embodiment. Referring to FIG. 9, the second embodiment differs from the first embodiment shown in FIG. 1 in that, instead of the request allocating means 8_1 to 8_n, the request receiving means 6_1 to 6_n comprise second request allocating means 21_1 to 21_n.

[0134] On receiving a processing request from a client via the LAN 4, request receiving means 6_i creates n × m partial processing requests, each requiring a different piece of fine partial data 500_ij (i is an integer from 1 to n, and j is an integer from 1 to m).

[0135] It then asks second request allocating means 21_i to decide to which request processing means each partial processing request is sent. Second request allocating means 21_i decides whether a partial processing request requiring fine partial data 500_ij is submitted to request processing means 7_i, which has the original on disk device 2_i, or to request processing means 7_h (h is an integer from 1 to n different from i), which has the copy 600_ij on disk device 2_h.

[0136] Request receiving means 6_i sends the partial processing requests to the request processing means in accordance with the decisions of second request allocating means 21_i.

[0137] Request processing means 7_k accesses the original or copy of the fine partial data on disk device 2_k via file system 5_k, executes the partial processing request, and returns the result to request receiving means 6_i.

[0138] The request receiving means uses the results of the n partial processing requests to carry out the processing request and returns the result to the client.

[0139] Next, the effects of the second embodiment will be described.

[0140] In the second embodiment, when the load on computer node 1_i (i is an integer from 1 to n), whose disk device 2_i holds the original 500_ij of the fine partial data, is higher than that on the other computer nodes, second request allocating means 21_i allocates partial processing requests requiring fine partial data 500_ij to another computer node 1_h, whose disk device 2_h holds the copy 600_ij. Thus, even when the load on a particular computer node stands out, that load can be lowered.

[0141] A second effect of the second embodiment is that, when disk device 2_i or computer node 1_i is stopped, the second request allocating means 21_1 to 21_n decide to use, in place of computer node 1_i, a plurality of other computer nodes whose disk devices hold copies of the fine partial data stored on disk device 2_i.

[0142] Processing can therefore continue even if a disk device stops, whether for an unexpected reason such as a failure or for a planned reason such as maintenance.

[0143] A third effect of the second embodiment follows from the fact that the copies of the several originals stored on one disk device are placed on mutually different disk devices. Allocating partial processing requests that require the originals on a disk device to the computer nodes whose disk devices hold the corresponding copies spreads the load of that one computer node over a plurality of computer nodes. The load can thus be spread over many computer nodes in few steps.

【０１４４】[0144]

[Example 2] Next, the operation of the second embodiment is described by way of a concrete example.

[0145] FIG. 10 shows the configuration of a full-text search server system according to a second example of the present invention. It differs from the configuration of the example shown in FIG. 3 in that each of the request receiving means 6_1 to 6_8 comprises one of the second request allocating means 21_1 to 21_8, and in that a ninth computer node 1_9 is provided and connected to the LAN 4.

[0146] The second request allocating means 21_1 to 21_8 comprise second state monitoring means 22_1 to 22_8, respectively, and computer node 1_9 comprises load monitoring means 23 and load distribution recalculating means 24.

[0147] Each of the second request allocating means 21_1 to 21_8 stores a copy use target ratio for each piece of data 500_ij and allocates partial processing requests between the original and the copy of each piece of data so as to meet that ratio.

[0148] Each time one of the request processing means 7_1 to 7_8 processes a partial processing request, it measures the real time the request took and sends the measured time to load monitoring means 23 via the LAN 4.

[0149] Load monitoring means 23 records, for each request processing means, the cumulative real time spent on processing as its search load. At fixed intervals it sends these values to load distribution recalculating means 24, resets the cumulative values to zero, and continues recording.

[0150] Each time load distribution recalculating means 24 receives the search loads of the request processing means 7_1 to 7_8 for the most recent fixed period, it determines a new copy use target ratio from the current copy use target ratio of each piece of data 500_ij and sends the new ratios to all of the second request allocating means 21_1 to 21_8.

[0151] On receiving a new copy use target ratio, a second request allocating means stores it and uses that value for subsequent allocation of partial processing requests.

[0152] FIG. 11 is a flowchart illustrating the processing of second request allocating means 21_i in the second example. Referring to FIG. 11, it first determines which disk devices hold the original of fine partial data 500_ij and its copy 600_ij (step S201). Here the original is assumed to be on disk device 2_i and the copy on disk device 2_h.

[0153] Next, it consults state monitoring means 22_i to check whether computer nodes 1_i and 1_h are operating (steps S202 and S208). If neither is operating, it reports that allocation has failed and terminates (step S209).

【０１５４】計算機ノード1_iだけが稼働していれば、
計算機ノード1_iに部分処理要求を割り当て（ステップS
206）、計算機ノード1_hだけが稼働していれば、計算機
ノード1_hに部分処理要求を割り当てる（ステップS20
7）。If only the computer node 1_i is operating, the partial processing request is allocated to the computer node 1_i (step S206); if only the computer node 1_h is operating, the partial processing request is allocated to the computer node 1_h (step S207).

【０１５５】両方が稼働していれば、乱数を用い（ステ
ップS204）、計算機ノード1_hを選択する確率が500_ij
の複製使用目標比率となるように、計算機ノード1_h
（ステップS207）、または計算機ノード1_iに部分処
理要求を割り当てる。If both are operating, a random number is used (step S204) to allocate the partial processing request either to the computer node 1_h (step S207) or to the computer node 1_i, such that the probability of selecting the computer node 1_h equals the copy use target ratio of 500_ij.
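
The flow of FIG. 11 described in paragraphs [0152] to [0155] can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function name `allocate_partial_request`, its parameters, and the `is_running` callback (standing in for the state monitoring means) are hypothetical, and the lookup of step S201 is assumed to have already produced the two node identifiers passed in.

```python
import random
from typing import Callable, Optional


def allocate_partial_request(
    original_node: str,
    copy_node: str,
    is_running: Callable[[str], bool],
    copy_target_ratio: float,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return the node chosen for the request, or None on allocation failure.

    original_node / copy_node: nodes whose disks hold the original and the
    copy of the fine partial data (result of step S201, assumed given).
    is_running: hypothetical stand-in for the state monitoring means.
    copy_target_ratio: copy use target ratio of the data item, in [0, 1].
    """
    if rng is None:
        rng = random.Random()

    original_up = is_running(original_node)  # steps S202 / S208
    copy_up = is_running(copy_node)

    if not original_up and not copy_up:
        return None                          # step S209: allocation failure
    if original_up and not copy_up:
        return original_node                 # step S206
    if copy_up and not original_up:
        return copy_node                     # step S207

    # Both nodes are running: draw a random number (step S204) so that the
    # copy node is selected with probability copy_target_ratio.
    if rng.random() < copy_target_ratio:
        return copy_node
    return original_node
```

Passing a seeded `random.Random` instance makes the choice reproducible, which is convenient when checking that the observed share of requests sent to the copy approaches the target ratio.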

【０１５６】次に、負荷配分再計算手段13が新しい複製
使用目標比率を定める手順を説明する。Next, the procedure by which the load distribution recalculating means 13 determines a new copy use target ratio will be described.

【０１５７】細部分データ500_ijに関する新しい目標比
率R_ijは、原本500_ijを記録するディスク2_iを持つ計
算機ノード1_i上の要求処理手段7_iの最近一時間の検索
負荷をL_i、複製600_ijを記録するディスク2_hを持つ計
算機ノード上の要求処理手段7_hの最近の一定時間の検
索負荷をL_h、L_iとL_hの平均をM_ih、古い目標比率をR
_ij_oldとすると、新しい目標比率Rは次式のように定め
る。Let L_i be the search load, over the most recent fixed period, of the request processing means 7_i on the computer node 1_i, which has the disk 2_i holding the original 500_ij. Let L_h be the search load, over the most recent fixed period, of the request processing means 7_h on the computer node that has the disk 2_h holding the copy 600_ij. Let M_ih be the average of L_i and L_h, and let R_ij_old be the old target ratio. The new target ratio R_ij for the fine partial data 500_ij is then determined by the following equations.

【０１５８】R_ij = R_ij_old + (1 - R_ij_old) × (L_i - M_ih) / L_i ; if L_i >= M_ih

【０１５９】R_ij = R_ij_old × L_i / M_ih ; if L_i < M_ih

【０１６０】これにより、原本側の負荷が高ければ複製
を使う比率をより高く、複製側の負荷が高ければ複製
を使う比率をより低く定めることになる。As a result, if the load on the original side is high, the ratio of using the copy is set higher, and if the load on the copy side is high, the ratio of using the copy is set lower.
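
The update rule of paragraphs [0158] and [0159] can be written as a small function, shown here as a sketch. The function and parameter names are assumptions introduced for illustration. The handling of the case where both loads are zero (returning the old ratio unchanged) is also an assumption, since the equations in the specification do not define a value when L_i is zero.

```python
def new_copy_target_ratio(r_old: float, load_original: float, load_copy: float) -> float:
    """Compute R_ij from R_ij_old, L_i (load_original) and L_h (load_copy)."""
    mean = (load_original + load_copy) / 2.0  # M_ih
    if load_original >= mean:
        if load_original == 0:
            # Assumption: with no load on either side, keep the old ratio
            # (the equation would otherwise divide by zero).
            return r_old
        # Original side is at least as loaded as the average:
        # move the ratio toward 1 (use the copy more often).
        return r_old + (1.0 - r_old) * (load_original - mean) / load_original
    # Original side is less loaded than the average:
    # scale the ratio down (use the copy less often).
    return r_old * load_original / mean
```

For example, with an old ratio of 0.5, a load of 30 on the original side and 10 on the copy side gives M_ih = 20 and a new ratio of 0.5 + 0.5 × 10 / 30, about 0.667; swapping the two loads gives 0.5 × 10 / 20 = 0.25.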

【０１６１】本実施例では、複製使用目標比率を満たす
ために、乱数を発生して確率的にこの比率を満たす
が、本発明の実施方法はこれに限られるものではない。
たとえば、原本及び複製それぞれを用いた回数を記録
しておき、複製使用実績比率を求めて、これが目標比
率を下回っているならば複製を、そうでなければ原本
を用いるよう選択してもよい。In this embodiment, a random number is generated so that the copy use target ratio is satisfied probabilistically, but the method of carrying out the present invention is not limited to this. For example, the number of times the original and the copy have each been used may be recorded, the actual copy use ratio computed, and the copy selected if this ratio is below the target ratio and the original selected otherwise.
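
The counting alternative mentioned in paragraph [0161] might look like the following sketch. The class name `CountingSelector` and its members are hypothetical, and treating the actual ratio as zero before any request has been recorded is an assumption not stated in the specification.

```python
class CountingSelector:
    """Hypothetical sketch: choose original or copy from usage counts."""

    def __init__(self, copy_target_ratio: float) -> None:
        self.copy_target_ratio = copy_target_ratio
        self.copy_count = 0
        self.original_count = 0

    def choose(self) -> str:
        total = self.copy_count + self.original_count
        # Actual copy use ratio so far (assumed 0.0 before the first use).
        actual = self.copy_count / total if total else 0.0
        if actual < self.copy_target_ratio:
            self.copy_count += 1
            return "copy"
        self.original_count += 1
        return "original"
```

Unlike the random-number approach, this selection is deterministic, so the actual ratio tracks the target closely even over short runs of requests.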

【０１６２】また、本実施例では、新しい目標比率を、
特定の数式により算出しているが、本発明において、新
しい目標比率の設定方法は、この特定の式に限定される
ものではない。In this embodiment, the new target ratio is calculated using a specific formula, but in the present invention the method of setting the new target ratio is not limited to this specific formula.

【０１６３】[0163]

【発明の効果】以上説明したように、本発明によれば下
記記載の効果を奏する。As described above, according to the present invention, the following effects can be obtained.

【０１６４】本発明の第１の効果は、粗分割データに収
まる比較的広い範囲のデータに対するアクセスが、多く
の場合、高々１回の遠隔ディスクアクセスで行うことが
できる、ということである。A first effect of the present invention is that, in many cases, access to a relatively wide range of data that fits within one unit of coarsely divided data can be completed with at most one remote disk access.

【０１６５】その理由は、本発明においては、同一粗分
割データに由来する全ての細分割データが一つのディス
ク装置に格納されるようにデータを配置しているため、
そのディスク装置が稼働しており、特に負荷が高くない
限り、該ディスク装置に対するアクセスのみで、用が足
りるためである。The reason is that, in the present invention, data is placed so that all subdivided data derived from the same coarsely divided data is stored on one disk device; as long as that disk device is operating and its load is not particularly high, access to that disk device alone suffices.

【０１６６】本発明の第２の効果は、ディスク装置また
は計算機ノードの負荷の偏りを削減できる、ということ
である。A second effect of the present invention is that the load imbalance of a disk device or a computer node can be reduced.

【０１６７】その理由は、本発明においては、実行時に
ファイルの原本と複製のどちらにアクセスするかを決定
することにより、負荷の高いディスクまたは計算機ノー
ドから他の複数のディスク装置または計算機ノードに負
荷を拡散させることができるためである。The reason is that, in the present invention, deciding at run time whether to access the original or a copy of a file makes it possible to spread load from a heavily loaded disk or computer node to a plurality of other disk devices or computer nodes.

【０１６８】本発明の第３の効果は、システムを構成す
る一部のディスク装置または計算機ノードが停止してい
る時にも、処理を継続することができる、ということで
ある。その理由は、本発明においては、あるディスク装
置または計算機ノードが停止している場合、該ディスク
装置または該計算機ノードが持つディスク装置に記録さ
れている細部分データの原本に代えて、対応する複製を
用いるように、実行時に切替制御する構成としたためで
ある。A third effect of the present invention is that processing can continue even when some of the disk devices or computer nodes constituting the system are stopped. The reason is that, in the present invention, when a disk device or computer node is stopped, control is switched at run time so that the corresponding copies are used in place of the originals of the fine partial data recorded on that disk device or on the disk device of that computer node.

【０１６９】本発明の第４の効果は、あるディスク装置
または計算機ノードの負荷を、他の数多くのディスク装
置または計算機ノードに少ないステップで拡散して、代
替させることができる、ということである。A fourth effect of the present invention is that the load of a given disk device or computer node can be spread over many other disk devices or computer nodes, which take it over, in a small number of steps.

【０１７０】その理由は、本発明において、一つのディ
スク装置が記憶するデータの複製が、複数のディスク装
置に分散して記憶されているので、一つのディスク装置
上の原本に換えて、複数のディスク装置上の複製を用い
ることができ、また一つの原本を記憶しているディスク
装置を持つ計算機ノードに換えて、対応する複製を記憶
しているディスク装置を持つ複数の計算機ノードを用い
ることができるからである。The reason is that, in the present invention, the copies of the data stored on one disk device are distributed over a plurality of disk devices. The copies on a plurality of disk devices can therefore be used in place of the originals on one disk device, and a plurality of computer nodes having the disk devices that store the corresponding copies can be used in place of the one computer node having the disk device that stores the originals.
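
The placement property behind this effect can be illustrated with a short sketch. The rotation rule used here (the copy of subdivided data j of coarse division i goes to disk ((i - 1 + j) mod n) + 1) is purely an assumption chosen for illustration; the specification only requires that copies be stored on disks different from the original's disk and from one another. One copy per subdivided data item is also assumed, and the function name `place_data` is hypothetical.

```python
from typing import Dict, Tuple


def place_data(num_disks: int, num_subdivisions: int) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map (i, j) to (original_disk, copy_disk), with 1-based indices.

    i: index of the coarsely divided data (one per disk).
    j: index of the subdivided data within coarse division i.
    """
    if not 0 < num_subdivisions < num_disks:
        raise ValueError("need 0 < num_subdivisions < num_disks")
    placement: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for i in range(1, num_disks + 1):
        for j in range(1, num_subdivisions + 1):
            # All originals of coarse division i stay together on disk i.
            # Assumed rotation rule: each copy goes to a different disk,
            # none of which is disk i.
            copy_disk = (i - 1 + j) % num_disks + 1
            placement[(i, j)] = (i, copy_disk)
    return placement
```

With 8 disks and 4 subdivisions per coarse division, the copies of the data whose originals are on disk 1 land on disks 2, 3, 4, and 5, so if disk 1 stops or becomes overloaded its work can be taken over by four different disks in a single step.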

[Brief description of the drawings]

【図１】本発明の第１の実施の形態の構成を示す図であ
る。FIG. 1 is a diagram showing a configuration of a first exemplary embodiment of the present invention.

【図２】本発明の第１の実施の形態における細部分デー
タの原本及び複製の作成手順を示す模式図である。FIG. 2 is a schematic diagram showing a procedure for creating an original and a copy of fine part data according to the first embodiment of the present invention.

【図３】本発明の第１の実施例をなす全文検索システム
の構成を示す図である。FIG. 3 is a diagram showing a configuration of a full-text search system according to a first embodiment of the present invention.

【図４】本発明の第１の実施例におけるサブインデクス
の原本及び複製の作成手順を示す図である。FIG. 4 is a diagram showing a procedure for creating an original and a copy of a sub-index according to the first embodiment of the present invention.

【図５】本発明の第１の実施例におけるサブインデクス
の原本及び複製のディスク上の配置を示す図である。FIG. 5 is a diagram showing an arrangement of sub-index originals and duplicates on a disk in the first embodiment of the present invention.

【図６】本発明の第１の実施例における要求割り当て手
段の動作の具体例を示す図である。FIG. 6 is a diagram showing a specific example of the operation of the request assignment unit in the first embodiment of the present invention.

【図７】本発明の第１の実施例の動作を示すシーケンス
図である。FIG. 7 is a sequence diagram showing an operation of the first exemplary embodiment of the present invention.

【図８】本発明の第１の実施例における要求割り当て手
段の処理を説明するためのフローチャートである。FIG. 8 is a flowchart illustrating a process performed by a request allocating unit according to the first embodiment of this invention.

【図９】本発明の第２の実施の形態の構成を示す図であ
る。FIG. 9 is a diagram showing a configuration of a second exemplary embodiment of the present invention.

【図１０】本発明の第２の実施例をなす全文検索システ
ムの構成を示す図である。FIG. 10 is a diagram showing a configuration of a full-text search system according to a second embodiment of the present invention.

【図１１】本発明の第２の実施例における第２の要求割
り当て手段の処理を示すフローチャートである。FIG. 11 is a flowchart illustrating a process of a second request allocating unit according to the second embodiment of this invention.

[Explanation of symbols]

1_1〜1_n 計算機ノード
2_1〜2_n ディスク
３ SAN(System Area Network、システム内ネットワーク)
４ LAN(Local Area Network、構内ネットワーク)
5_1〜5_n ファイルシステム
6_1〜6_n 要求受付手段
7_1〜7_n 要求処理手段
8_1〜8_n 要求割り当て手段
9_1〜9_n 要求割り当て表
10_1〜10_n 稼働状態監視手段
21_1〜21_n 第２の要求割り当て手段
22_1〜22_n 第２の稼働状態監視手段
23 負荷状態監視手段
24 負荷配分再計算手段
30_1〜30_5 入出力要求キュー
40_1〜40_4 入出力要求
100_11〜100_nm 細部分データの原本
1r0_11〜1r0_nm 細部分データの複製
200_1〜200_n 部分処理要求
500_11〜500_84 サブインデクスの原本
600_11〜600_84 サブインデクスの複製

1_1 to 1_n: Computer nodes
2_1 to 2_n: Disks
3: SAN (System Area Network)
4: LAN (Local Area Network)
5_1 to 5_n: File systems
6_1 to 6_n: Request receiving means
7_1 to 7_n: Request processing means
8_1 to 8_n: Request allocating means
9_1 to 9_n: Request allocation tables
10_1 to 10_n: Operating state monitoring means
21_1 to 21_n: Second request allocating means
22_1 to 22_n: Second operating state monitoring means
23: Load state monitoring means
24: Load distribution recalculating means
30_1 to 30_5: Input/output request queues
40_1 to 40_4: Input/output requests
100_11 to 100_nm: Originals of fine partial data
1r0_11 to 1r0_nm: Copies of fine partial data
200_1 to 200_n: Partial processing requests
500_11 to 500_84: Originals of sub-indexes
600_11 to 600_84: Copies of sub-indexes

Claims

[Claims]

1. A server system that, in a non-shared cluster system comprising a plurality of computer nodes each having its own local data storage device, performs processing that depends on data on the data storage devices, the server system comprising: means for generating coarsely divided data by roughly dividing the data to be stored in the data storage devices, and for generating subdivided data by further dividing the coarsely divided data; and means for storing all subdivided data derived from the same coarsely divided data in the same data storage device, and for storing the copies of the subdivided data derived from the same coarsely divided data in a plurality of data storage devices that differ from the data storage device storing the originals and from one another.

2. The server system according to claim 1, further comprising input/output request allocating means for allocating an input/output request for given subdivided data so that the request is executed using either the data storage device storing the original of said subdivided data or a data storage device storing a copy of said subdivided data.

3. The server system according to claim 1, further comprising processing request allocating means for allocating a request for processing that requires given subdivided data to either the computer node having the data storage device storing the original of said subdivided data or a computer node having a data storage device storing a copy of said subdivided data.

4. The server system according to claim 2, further comprising: means for estimating the load on the data storage device recording the original of given subdivided data and the load on a data storage device recording a copy of said subdivided data; and request allocating means for distributing input/output requests for said subdivided data so that input/output for said subdivided data is executed using the data storage device estimated by the load estimating means to be more lightly loaded.

5. The server system according to claim 3, further comprising: means for estimating the load on the computer node having the data storage device recording the original of given subdivided data and the load on a computer node having a data storage device recording a copy of said subdivided data; and request allocating means for distributing processing requests requiring said subdivided data so that the processing requiring said subdivided data is executed using the computer node estimated by the load estimating means to be more lightly loaded.

6. The server system according to claim 2, further comprising: means for monitoring the operating state of each of the computer nodes; and request allocating means for distributing input/output requests for given subdivided data so that input/output for said subdivided data is executed using an operating computer node among the computer node having the data storage device recording the original of said subdivided data and the computer nodes having data storage devices recording copies of said subdivided data.

7. The server system according to claim 3, further comprising: means for monitoring the operating state of each of the computer nodes; and request allocating means for distributing processing requests requiring given subdivided data so that the processing requiring said subdivided data is executed using an operating computer node among the computer node having the data storage device recording the original of said subdivided data and the computer nodes having data storage devices recording copies of said subdivided data.

8. A cluster system in which a plurality of computer nodes are connected to one another to form a cluster, wherein: each of the computer nodes comprises a local data storage device, a file system that receives input/output requests for files and performs input/output of data on the data storage device, request receiving means for receiving requests from clients, and request processing means; the request receiving means of each computer node receives from a client a processing request requiring data, divides the processing request into a plurality of partial processing requests, sends these partial processing requests to the request processing means of any of the plurality of computer nodes, and integrates the partial processing results returned in response into one processing result; the request processing means creates a partial processing result in response to a partial processing request from the request receiving means; the file system of each computer node comprises request allocating means for determining whether an input/output request for a file is to be processed using the original of the file or one of its one or more copies; and the data to be stored in the data storage devices is first roughly divided and then further subdivided, all subdivided data derived from the same coarsely divided data is stored in the same data storage device, and a copy of each subdivided data is stored in a data storage device different from that of the original.

9. The server system according to claim 8, further comprising: an input/output request allocation table for recording the input/output requests allocated to each of the computer nodes; and state monitoring means for monitoring the operating state of each of the computer nodes, wherein processing requests requiring each subdivided data are distributed between the computer node having the data storage device storing the original and the computer nodes having data storage devices storing the copies, based on their respective load conditions.

10. A server system in which a plurality of computer nodes are connected to one another to form a cluster, wherein: each of the computer nodes comprises a local data storage device, a file system that receives input/output requests for files and performs input/output of data on the data storage device, request receiving means for receiving requests from clients, and request processing means; the data to be stored in the data storage devices is first roughly divided and then further subdivided, all subdivided data derived from the same coarsely divided data is stored in the same data storage device, and a copy of each subdivided data is stored in a data storage device different from that of the original; the request receiving means comprises request allocating means; upon receiving a processing request from a client, the request receiving means creates a plurality of partial processing requests that each require different fine partial data, and asks the request allocating means to determine to which computer node's request processing means each partial processing request is to be sent; the request allocating means determines whether a partial processing request requiring given fine partial data is to be sent to the request processing means that has the original of that data on its data storage device or to another request processing means that has a copy of it on its data storage device; the request receiving means sends the partial processing requests to the request processing means in accordance with the determination of the request allocating means; the request processing means executes each partial processing request by accessing, through the file system, the original or a copy of the fine partial data on the data storage device, and returns the result to the request receiving means; and the request receiving means uses the results to execute the processing request and returns the result to the client.

11. The server system according to claim 10, wherein: one of the plurality of computer nodes comprises load monitoring means and load distribution recalculating means; the request allocating means of the other computer nodes each determine the operating state of their own node; the request allocating means holds a copy use target ratio for each data item and allocates partial processing requests to the original and the copy of each data item so as to satisfy this ratio; each time it processes one partial processing request, the request processing means measures the real time required for that partial processing and sends the measured time to the load monitoring means; the load monitoring means records, as the search load, the accumulated real time required for processing for each request processing means, and at fixed time intervals sends these values to the load distribution recalculating means and resets the accumulated values; each time it receives the most recent search loads of the request processing means, the load distribution recalculating means determines a new copy use target ratio from the copy use target ratio of each data item and sends it to the request allocating means; and upon receiving the new copy use target ratio, the request allocating means stores it and thereafter allocates partial processing requests using that value.

12. A data storage method for a non-shared cluster system comprising a plurality of computer nodes each having its own local data storage device, the method comprising: first roughly dividing the data to be stored in the data storage devices; further subdividing the coarsely divided data; storing all subdivided data derived from the same coarsely divided data in the same data storage device; and storing the copies of the subdivided data in data storage devices that differ from that of the original and from one another.

13. A load distribution method for a non-shared cluster system, for processing data stored in the data storage devices by the data storage method according to claim 12, wherein processing requests requiring each subdivided data are distributed between the computer node having the data storage device storing the original and the computer nodes having data storage devices storing the copies, based on their respective load conditions.