JPH0991261A

JPH0991261A - Parallel computer

Info

Publication number: JPH0991261A
Application number: JP7266474A
Authority: JP
Inventors: Masahiko Nakahara; Hiroyuki Kumazaki; Kenichi Soejima
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1995-09-20
Filing date: 1995-09-20
Publication date: 1997-04-04

Abstract

PROBLEM TO BE SOLVED: To provide a parallel computer that processes simultaneous input/output from many nodes at high speed. SOLUTION: The calculation nodes 10 to 15 and the input/output node 30 of the parallel computer are arranged in a matrix. Some of the calculation nodes are provided with caches 50 to 52 and serve as cache nodes 20 to 22, and the input/output node 30 is provided with a cache 53. According to a designation made when the parallel computer is started up, one or more cache nodes 20 to 22 that can communicate with a given input/output node 30 are set, together with one or more calculation nodes 10 to 15 that can communicate with each cache node. When a calculation node reads data from the secondary storage device 40, it sends a data request message to a cache node it can communicate with. If the data is in the cache node's cache 50 to 52, the cache node sends the data to the calculation node; if not, the cache node reads the data from the secondary storage device 40 and sends it to the calculation node.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel computer that has a plurality of calculation nodes, each with a local memory and a processor, and a plurality of input/output nodes, each with a local memory, a processor and a secondary storage device. These nodes are interconnected by a network, and each node executes programs in parallel.

[0002]

[Prior Art] Parallel computers fall broadly into those intended for business use and those intended for scientific and technical computing. Business-oriented parallel computers center on data processing, so they are generally configured with a secondary storage device such as a disk attached to every node, each node consisting of a processor, a memory and a network communication device. In parallel computers intended for scientific and technical computing, by contrast, data computation makes up most of the work, so the machine is built from calculation nodes, which have no secondary storage device and only perform data computation, and input/output nodes, which have a secondary storage device and perform input/output processing in addition to data computation. There are generally fewer input/output nodes than calculation nodes. When scientific and technical computations run on a parallel computer, the same program is often executed almost simultaneously on all nodes. As a result, identical input/output operations arise at the same time, and input/output processing is likely to concentrate on a single input/output node. When processing concentrates on one node, the waiting time grows as the number of nodes increases, and the input/output node that receives the requests may also be unable to accept all of them. Particularly in massively parallel computers with more than 1000 nodes, concentration of processing on a specific node affects the performance of the whole system. As a way of avoiding such concentration, Japanese Patent Laid-Open No. 3-209531, for example, discloses a method of dynamically distributing a service process on which processing has concentrated to other nodes. However, this does not solve the concentration problem described above, in which the processing can be performed only by the input/output node to which the secondary storage device is connected.

[0003]

[Problems to Be Solved by the Invention] When programs running on multiple calculation nodes of a parallel computer simultaneously read data held on the same input/output node, communications requesting data reads from that input/output node occur simultaneously. The input/output node receives and processes the read requests, but one input/output node can process the request of only one node at a time. If N calculation nodes simultaneously send requests to the same input/output node, the input/output time per node becomes on average N/2 times that of a single request, and completing the input/output of all nodes takes N times as long. In a massively parallel computer with more than 1000 calculation nodes, the waiting time becomes 1000 times or more, with a large impact on overall performance. A cache can also be placed on each calculation node to cache input/output data, but the same problem occurs, because the first acquisition of any input/output data still requires communication with the input/output node. Scientific and technical computing programs, the typical applications of parallel computers, run the same program simultaneously on multiple calculation nodes, so this problem is very likely to occur. Accordingly, an object of the present invention is to provide a parallel computer that reduces the waiting time associated with the input/output communication of programs executed in parallel, and that processes simultaneous input/output from many nodes at high speed.

[0004]

[Means for Solving the Problems] To achieve the above object, the present invention provides a parallel computer having a plurality of calculation nodes, a plurality of input/output nodes, and a network interconnecting them. Each calculation node has a processor, a memory holding the programs and data used by the processor, and a device for communicating with other calculation nodes and input/output nodes over the network. Each input/output node has a processor, a memory holding the programs and data used by the processor, a secondary storage device holding programs and the data they use, and a device for communicating with other calculation nodes and input/output nodes over the network. In this parallel computer, some of the calculation nodes are provided with caches and serve as cache nodes. According to a designation made when the parallel computer is started up, one or more cache nodes that can communicate with a given input/output node are set, and one or more calculation nodes that can communicate with each cache node are set. When one or more calculation nodes simultaneously issue data read or write requests to the same secondary storage device, the requested data is supplied to the calculation nodes from the cache node instead of from the input/output node that has the secondary storage device.

Further, each calculation node adds to the data request message it sends information indicating which calculation nodes will request the same data. According to this added information in the data request messages received from the calculation nodes, the cache node sets, for each calculation node that communicates with it, an identifier indicating whether that node will request the same data and an identifier indicating whether that request has been executed. The cache node holds the data in its cache until the data has been supplied to all requesting calculation nodes and the values of the two identifiers all match.

Further, each calculation node adds to the data request message, besides information about the current request, information about its next read or its next write. When the data corresponding to this added information is not in the cache, the cache node prefetches that data from the secondary storage device via the input/output node.

Furthermore, the invention applies to a parallel computer having a plurality of calculation nodes and a plurality of input/output nodes, each with a communication area, and a network interconnecting them. Each of these nodes has a transmitting circuit that sends over the network a message containing the data to be transferred to a destination node, the identifier of the communication area assigned to the destination user process that is to use the data, and information specifying the location in the destination node's memory where the data is to be written; each node also has a receiving circuit that receives messages transferred over the network from a source node. In such a parallel computer, in which one source node can send data to a plurality of destination nodes simultaneously, some of the calculation nodes are made cache nodes by using a cache as the node's communication area, and when the cached data of any cache is changed, the change is sent simultaneously to the other cache nodes holding the same cached data.

[0005]

[Operation] Because each cache node that caches data receives input/output requests from only a small number of calculation nodes, the waiting time of the requesting calculation nodes is reduced, and the performance of the parallel computer as a whole improves.

[0006]

[Embodiment] One embodiment of the present invention is described in detail below.

[0007] FIG. 1 shows the configuration of the calculation nodes and input/output nodes in a parallel computer according to the present invention. FIG. 2 is a flowchart of one embodiment of the data read procedure when the invention is applied to a parallel computer, and FIG. 3 is a flowchart of one embodiment of the procedure when several calculation nodes issue data read requests simultaneously. In FIG. 1, 10 to 15 are calculation nodes; 20 to 22 are calculation nodes that have caches, i.e., cache nodes; and 30 is an input/output node that has a cache and a secondary storage device and performs input/output processing in addition to data computation. 40 is a secondary storage device, 50 to 53 are caches, and 60 to 68 are the network links interconnecting the calculation nodes and input/output nodes.

[0008] Next, the processing of each part in FIG. 1 is described following the flowchart of FIG. 2. The nodes of the parallel computer are arranged in a matrix. When the computer is started up, it is specified in advance which calculation nodes become cache nodes, which calculation nodes can communicate with which cache node, and which cache nodes can communicate with which input/output node. In the example of FIG. 1, calculation nodes 10 and 11 are designated to communicate with cache node 20, calculation nodes 12 and 13 with cache node 21, and calculation nodes 14 and 15 with cache node 22, and cache nodes 20, 21 and 22 are designated to communicate with input/output node 30.

[0009] When calculation node 10 requests data held in secondary storage device 40, it first sends a data read request to cache node 20, which may be caching data from secondary storage device 40 (110). Cache node 20 receives the read request from calculation node 10 (120) and checks whether the requested data is in its internal cache 50 (130). If it is, cache node 20 sends that data to calculation node 10 (220). If not, cache node 20 sends a data read request to input/output node 30 (140). Input/output node 30 receives the request from cache node 20 (150) and checks whether the requested data is in its cache 53 (160). If it is, input/output node 30 sends the data to cache node 20 (190). If cache 53 does not contain the data, input/output node 30 reads the data from secondary storage device 40 (170), puts it in the cache (180), and then sends it to cache node 20 (190). Cache node 20 receives the data (200), puts it in cache 50 (210), and sends it on to calculation node 10 (220), which receives it. Although the embodiment of FIG. 2 describes only the case where calculation node 10 requests data, the processing is the same for calculation nodes 11 to 15.

[0010] Next, the processing when several calculation nodes issue requests simultaneously is described with reference to FIG. 3. When calculation nodes 10 and 11 simultaneously request data held in secondary storage device 40, they send data requests to cache node 20 (300, 400). When cache node 20 receives the requests (500), it starts with the read request it accepted first. The embodiment of FIG. 3 shows the case where the request from calculation node 10 is accepted first. Cache node 20 first checks whether cache 50 holds the data requested by calculation node 10 (510). If it does, cache node 20 sends the data to calculation node 10 (520) and moves on to the request from calculation node 11 (530). If it does not, cache node 20 sends a data request to input/output node 30 (511), waits for the response, puts the data in cache 50 (512), and then sends the data to calculation node 10 (520). Calculation node 10 receives the data (310), and processing on the calculation node 10 side ends.

[0011] Cache node 20 then processes the data request from calculation node 11. It checks whether cache 50 holds the data requested by calculation node 11 (530). If it does, cache node 20 sends the data to calculation node 11 (540), and its processing ends. If it does not, cache node 20 sends a data request to input/output node 30 (531), waits for the response, puts the data in cache 50 (532), and then sends the data to calculation node 11 (540). Calculation node 11 receives the data (410), and processing on the calculation node 11 side also ends.

[0012] The embodiment of FIG. 3 describes the case where calculation nodes 10 and 11 issue their requests at the same time. The processing is the same when the requests are issued at different times, for example when cache node 20 receives the request from calculation node 11 while it is still processing the request from calculation node 10. Likewise, although FIG. 3 is described for calculation nodes 10 and 11 and cache node 20, the processing is the same for calculation nodes 12 to 15 and cache nodes 21 and 22.

[0013] According to this embodiment, the cache nodes relieve the concentration of processing requests on the input/output node, and providing several cache nodes also reduces the request traffic handled by each cache node. This shortens the waiting time and improves the performance of the parallel computer as a whole. For example, in the embodiment of FIG. 1, take the processing time of one data read request to a node as 1. If calculation nodes 10 to 15 sent their requests directly to input/output node 30, the processing time would be 3 on average and 6 until all nodes finish. If the data is already in caches 50 to 52, it is only 1.5 on average and 2 until all nodes finish.

[0014] In the above embodiment, only one cache node is placed between the requesting calculation node and the input/output node, but two or more cache nodes may of course be placed there. Also, although the above embodiment describes data read requests, the same effect is obtained for data write requests and other requests on cached data. Furthermore, in the above embodiment the program issuing read requests runs only on calculation nodes 10 to 15, which have no cache, but it may also run on cache nodes 20 to 22 or on input/output node 30. In that case, if the data is in the local cache, no communication wait occurs and processing becomes even faster.
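With two or more cache nodes between a calculation node and the input/output node, a read walks a chain of caches and fills every level that missed on the way back. The following is a minimal Python sketch that assumes each cache node is a plain dictionary and that a single `storage` mapping stands in for the input/output node together with its secondary storage device. The function name and the returned depth index are illustrative assumptions.

```python
from typing import Any


def hierarchical_read(key: str, levels: list[dict[str, Any]],
                      storage: dict[str, Any]) -> tuple[Any, int]:
    """Look up key through a chain of cache nodes; levels[0] is nearest the requester.

    Returns (data, depth): depth is the index of the level that hit,
    or len(levels) when the data had to come from storage.
    """
    for depth, cache in enumerate(levels):
        if key in cache:
            data = cache[key]
            break
    else:
        depth, data = len(levels), storage[key]
    for cache in levels[:depth]:  # fill every level that missed
        cache[key] = data
    return data, depth


near, far = {}, {}
storage = {"blk0": b"payload"}
print(hierarchical_read("blk0", [near, far], storage))  # (b'payload', 2)
print(hierarchical_read("blk0", [near, far], storage))  # (b'payload', 0)
```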

[0015] In the embodiment of FIG. 1, the cache simply resides on a calculation node. In another embodiment, consider a parallel computer such as the one shown in Japanese Patent Laid-Open No. 6-19856, in which multiple calculation nodes (clusters) are connected by a network and a process's data area can be assigned to the communication area of each calculation node. There, some calculation nodes can be made cache nodes by using the data area that serves as the cache as their communication area, so that cached data and cache management information can be communicated efficiently. In this embodiment, the cached data and cache management information no longer need to be copied into the communication area, which eliminates data-copy time and shortens processing time within the node. Furthermore, if the network supports simultaneous communication to multiple calculation nodes, the efficiency of communication between calculation nodes can also be raised. First, when the computer is started up, it is specified which calculation nodes become cache nodes, and at the same time all cache nodes are informed which calculation nodes have caches. Then, whenever the state of a cache changes, the other cache nodes are notified immediately using the simultaneous communication function. With this method, the operation completes with a single transmission no matter how many calculation nodes in the parallel computer have caches, so the efficiency of communication between calculation nodes improves.
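The point of the simultaneous-communication function is that a cache-state change costs one send operation, however many cache nodes there are. The following is a minimal Python sketch that assumes a hypothetical `Network.multicast` primitive delivering one message to many inboxes and counting it as a single transmission. The class names and message format are illustrative, and real hardware multicast is not modelled.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Network:
    """Hypothetical network offering a one-to-many send primitive."""
    inboxes: dict[int, list[tuple[int, Any]]] = field(default_factory=dict)
    transmissions: int = 0

    def multicast(self, src: int, dests: list[int], msg: Any) -> None:
        self.transmissions += 1  # one send operation regardless of len(dests)
        for d in dests:
            self.inboxes.setdefault(d, []).append((src, msg))


class CacheNodeGroup:
    """Cache nodes whose membership is announced to all of them at startup."""

    def __init__(self, net: Network, cache_nodes: list[int]) -> None:
        self.net = net
        self.members = list(cache_nodes)
        self.caches: dict[int, dict[str, Any]] = {n: {} for n in cache_nodes}

    def update(self, node: int, key: str, value: Any) -> None:
        """Change one cache and notify every other cache node in a single send."""
        self.caches[node][key] = value
        peers = [n for n in self.members if n != node]
        self.net.multicast(node, peers, ("update", key, value))

    def deliver(self) -> None:
        """Each peer applies updates only for data it also caches."""
        for node, inbox in self.net.inboxes.items():
            for _src, (_kind, key, value) in inbox:
                if key in self.caches[node]:
                    self.caches[node][key] = value
            inbox.clear()


net = Network()
group = CacheNodeGroup(net, [20, 21, 22])
group.caches[21]["blk0"] = b"old"
group.update(20, "blk0", b"new")
group.deliver()
print(net.transmissions, group.caches[21], group.caches[22])  # 1 {'blk0': b'new'} {}
```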

[0016] As another embodiment, in the embodiment of FIG. 1, a cache access indicator flag 70 and a cache invalidation permission flag 75 can be provided as flags (identifiers) in the cache of the cache node, as shown in FIG. 4. This adds a function that, when several calculation nodes request the same data, holds the cached data until the requests for that data from the other calculation nodes have completed. In FIG. 4, cache access indicator flag 70 consists of a pair: an access node indicator flag 71 and an access completion indicator flag 72. Each of these flags has one indicator bit per calculation node that may request data. Access node indicator flag 71 indicates the calculation nodes that will request the data in cache data 80, with "1" set in the bit of each such node. Access completion indicator flag 72 indicates the calculation nodes that have already made their data request, with "1" set in the bit of each such node. Cache invalidation permission flag 75 indicates whether the data in cache data 80 may be invalidated, and is set to "1" when it may.

[0017] Next, the processing procedure of this scheme is described with reference to the flowchart of FIG. 5.

[0018] First, each calculation node that requests data from a cache node sends a data request message carrying information on which calculation nodes will request the same data (600). When the cache node receives the first data request (605), it reads from the message the information on the calculation nodes that will issue requests for the same data and sets it in access node indicator flag 71 (610). That is, it sets "1" in the bits of the access node indicator flag that correspond to those calculation nodes. It then sends the data to the requesting calculation node (620) and sets "1" in the access completion indicator flag bit corresponding to that node (630). Likewise, when it receives a request for the same data from another calculation node (640), it sends the data to that node (650) and sets "1" in the corresponding access completion indicator flag bit (660). It then compares access node indicator flag 71 with access completion indicator flag 72 (670). If the flags do not match, that is, if some calculation nodes have not yet issued their request, the cache node waits for the next data request (680). If the flags match, requests have arrived from all the calculation nodes, so the cache node sets cache invalidation permission flag 75 to "1" (690), allowing cache data 80 to be invalidated.

[0019] The above describes the cache access indicator flag, cache invalidation permission flag and cache data for one set of calculation nodes requesting the same data. Needless to say, a separate cache access indicator flag, cache invalidation permission flag and cache data are provided for every other set of calculation nodes requesting the same data; for example, a cache access indicator flag and a cache invalidation permission flag may be provided for each tag of the cache data. According to this embodiment, when several calculation nodes request the same data, the data is not evicted from the cache until it has been sent to all the requesting calculation nodes. The cache node therefore never has to re-request that data from the input/output node, which reduces communication between the cache node and the input/output node and improves the efficiency of inter-node communication.
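The flag scheme of FIG. 4 and FIG. 5 maps naturally onto bitmasks, with one flag set per cache tag as suggested above. The following is a minimal Python sketch. It assumes the calculation nodes served by a cache node are numbered from 0, so that node i owns bit i, and it assumes a caller-supplied function stands in for the input/output node. The class and method names, including the `evict` check, are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EntryFlags:
    access_nodes: int = 0         # flag 71: bit i set = node i will request this data
    access_done: int = 0          # flag 72: bit i set = node i has been served
    may_invalidate: bool = False  # flag 75


class FlaggedCache:
    """Cache that pins shared data until every announced requester has it (sketch)."""

    def __init__(self, fetch_from_io: Callable[[str], Any]) -> None:
        self.fetch_from_io = fetch_from_io
        self.data: dict[str, Any] = {}          # cache data 80, keyed by tag
        self.flags: dict[str, EntryFlags] = {}  # one flag set per tag

    def request(self, node: int, tag: str, sharers_mask: int) -> Any:
        flags = self.flags.get(tag)
        if flags is None:                                  # 605: first request
            flags = self.flags[tag] = EntryFlags(access_nodes=sharers_mask)  # 610
        if tag not in self.data:
            self.data[tag] = self.fetch_from_io(tag)
        flags.access_done |= 1 << node                     # 630 / 660
        if flags.access_done == flags.access_nodes:        # 670: compare 71 with 72
            flags.may_invalidate = True                    # 690
        return self.data[tag]                              # 620 / 650: send data

    def evict(self, tag: str) -> bool:
        """Drop an entry only once flag 75 permits it."""
        flags = self.flags.get(tag)
        if flags is not None and not flags.may_invalidate:
            return False
        self.data.pop(tag, None)
        self.flags.pop(tag, None)
        return True


fetches: list[str] = []


def fake_io_read(tag: str) -> str:
    fetches.append(tag)
    return tag.upper()


cache = FlaggedCache(fake_io_read)
cache.request(0, "t", 0b11)  # nodes 0 and 1 announced as sharers
print(cache.evict("t"))      # False: node 1 has not been served yet
cache.request(1, "t", 0b11)
print(cache.evict("t"))      # True
print(fetches)               # ['t']
```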

[0020] In the embodiment above, the data request message carries information indicating which calculation nodes will request the same data. As another embodiment, prefetch instruction information can be attached instead. In this embodiment, when a calculation node sends a data request, it includes in the same data request message information about the data it will request next. When the cache node receives the data request and the requested data is not in its cache, it asks the input/output node for the requested data together with the prefetch data indicated in the message, so that the latter is prefetched. When the requested data is in the cache, the cache node sends it to the calculation node and, if the prefetch data is not in the cache, at the same time asks the input/output node for the prefetch data so that it is prefetched. According to this embodiment, the cache node can read from the input/output node, ahead of time, the data the calculation node will need for its next request, which reduces the processing wait time caused by cache misses. It also reduces the number of communications between the cache node and the input/output node, improving the performance of the parallel computer as a whole. The embodiment above attaches information about further data to a data request. Alternatively, prefetch information for the next data may be attached to a data write, information about the next data write may be attached to a data request, or information about the next data write may be attached to a data write.

【００２１】[0021]

[Effects of the Invention] As described above, the present invention prevents processing requests from many calculation nodes from concentrating on a single input/output node and shortens processing wait time. As a result, the performance of the parallel computer as a whole improves.

[Brief description of drawings]

FIG. 1 is a block diagram showing the connections among the calculation nodes, cache nodes, and input/output nodes in a parallel computer according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing procedure in the embodiment of FIG. 1.

FIG. 3 is a flowchart showing the processing procedure in the embodiment of FIG. 1 when multiple calculation nodes issue input/output processing requests simultaneously.

FIG. 4 is a block diagram of a cache, in the embodiment of FIG. 1, to which a cache access display flag and a cache invalidation permission flag have been added.

FIG. 5 is a flowchart showing the processing procedure in the embodiment of FIG. 4.

[Explanation of symbols]

10-15 Calculation nodes
20-22 Cache nodes
30 Input/output node
40 Secondary storage device
50-53 Caches
60-68 Inter-node network communication paths
70 Cache access display flag
71 Access node display flag
72 Access completion display flag
75 Cache invalidation permission flag
80 Cache data

Claims

1. A parallel computer comprising a plurality of calculation nodes, a plurality of input/output nodes, and a network interconnecting the calculation nodes and the input/output nodes, wherein each calculation node includes a processor, a memory holding programs and data used by the processor, and a device for communicating with other calculation nodes and input/output nodes via the network, and each input/output node includes a processor, a memory holding programs and data used by the processor, a secondary storage device holding data used by programs, and a device for communicating with other calculation nodes and input/output nodes via the network; characterized in that some of the calculation nodes are each provided with a cache and serve as cache nodes; in accordance with a designation made when the parallel computer is started, one or more cache nodes capable of communicating with one of the input/output nodes are set, and one or more calculation nodes capable of communicating with each cache node are set; and when a calculation node issues a data read or write processing request to a secondary storage device, the requested data is supplied to the calculation node by the cache node instead of by the input/output node having the secondary storage device.
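The startup designation in claim 1, which binds cache nodes to an input/output node and calculation nodes to a cache node, can be pictured as two routing tables. The sketch below assumes a round-robin assignment; the claim does not prescribe how the designation is made, and `build_routing` and `route` are hypothetical names. Node numbers follow the reference symbols (calculation nodes 10-15, cache nodes 20-22, input/output node 30).

```python
from __future__ import annotations


def build_routing(compute_nodes: list[int], cache_nodes: list[int],
                  io_nodes: list[int]) -> tuple[dict[int, int], dict[int, int]]:
    """Return (calculation -> cache, cache -> I/O) maps, assigned round-robin."""
    if not cache_nodes or not io_nodes:
        raise ValueError("need at least one cache node and one I/O node")
    cache_to_io = {c: io_nodes[i % len(io_nodes)] for i, c in enumerate(cache_nodes)}
    compute_to_cache = {n: cache_nodes[i % len(cache_nodes)]
                        for i, n in enumerate(compute_nodes)}
    return compute_to_cache, cache_to_io


def route(node: int, compute_to_cache: dict[int, int],
          cache_to_io: dict[int, int]) -> list[int]:
    """Nodes a request passes through on a cache miss: calculation, cache, I/O."""
    cache = compute_to_cache[node]
    return [node, cache, cache_to_io[cache]]
```

With six calculation nodes and three cache nodes, each cache node serves two calculation nodes, so the single input/output node receives traffic from three cache nodes rather than six calculation nodes, and only on cache misses.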

2. The parallel computer according to claim 1, wherein the calculation node adds, to the data request message it transmits, information indicating which calculation nodes will request the same data; and the cache node, in accordance with that additional information in the data request message received from the calculation node, sets an identifier indicating, for each calculation node communicating with the cache node, whether it will request the same data and an identifier indicating whether that request has been executed, supplies the data to every calculation node that requested it, and holds the data in the cache until the values of the two identifiers match.

3. The parallel computer according to claim 1, wherein the calculation node adds to the data request message it transmits, in addition to information about the current request, information about the next read or the next write; and when the data corresponding to that additional information in a data request message received from the calculation node does not exist in its cache, the cache node prefetches the data from the secondary storage device via the input/output node.

4. A parallel computer comprising a plurality of calculation nodes each having a communication area, a plurality of input/output nodes each having a communication area, and a network interconnecting the calculation nodes and the input/output nodes, each calculation node and input/output node having a transmission circuit that transmits, via the network to a communication destination node, a message containing the data to be transferred, the identifier of the communication area assigned to the destination user process that is to use the data, and information designating the position in the memory of the destination node at which the data is to be written, and a reception circuit that receives messages transferred via the network from a source node among the calculation nodes and input/output nodes, the parallel computer being capable of transmitting data from one source node to multiple destination nodes simultaneously; characterized in that some of the calculation nodes are each provided with a cache and serve as cache nodes, the cache being used as the communication area of the calculation node, and when the cache data in one cache is changed, the changed contents are transmitted simultaneously to the other cache nodes holding the same cache data.
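The update rule in claim 4, under which a change to cached data in one cache node is sent simultaneously to every other cache node holding the same data, resembles a write-update coherence scheme. The sketch below models the simultaneous transmission as a loop over the caches that hold the changed key; the dictionary layout and the name `write_update` are assumptions, not taken from the claim.

```python
from __future__ import annotations


def write_update(caches: dict[int, dict[str, bytes]], writer: int,
                 key: str, value: bytes) -> list[int]:
    """Apply a write at `writer` and propagate it to every other cache holding `key`.

    Returns the ids of the other cache nodes that received the update; in the
    claimed machine these would be the destinations of one simultaneous
    (multicast) transfer.
    """
    caches[writer][key] = value
    destinations = [node for node, cache in caches.items()
                    if node != writer and key in cache]
    for node in destinations:
        caches[node][key] = value  # stands in for the multicast delivery
    return destinations
```

Only caches that already hold the key are updated; a cache node without a copy is not sent the new data, which keeps the multicast limited to actual sharers.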