JP4117621B2

JP4117621B2 - Data batch transfer device

Info

Publication number: JP4117621B2
Application number: JP2004101887A
Authority: JP
Inventors: 克彦岡田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2004-03-31
Filing date: 2004-03-31
Publication date: 2008-07-16
Anticipated expiration: 2024-03-31
Also published as: JP2005285042A

Description

The present invention relates to a data batch transfer device. More particularly, it relates to a data batch transfer device for a computing system in which a plurality of computers (nodes) are connected and perform computation while exchanging data at high speed. The device enables a large-capacity transfer between nodes to be executed with a single instruction, transferring the data within a cluster in one batch by means of distance transfer.

In a large-scale computing system spanning a plurality of nodes, parallel processing is used to increase processing speed. A parallel program that executes across nodes divides mainly into a computation phase on each node and a transfer phase in which the nodes exchange data synchronously. Because computing performance is improving rapidly, reducing the time spent in the transfer phase is important.

In the field of scientific computing in particular (for example, various kinds of simulation), two types of transfer mainly occur:
(1) before computation on each node, building a large array on one node, dividing it, assigning the parts to the nodes, and transferring the data to each node, and at the end transferring the data in the reverse direction and merging it; and
(2) transfers in the transfer phases between computation phases (the nodes exchange data with one another, for example to add the data on the boundaries of adjacent computation regions).
It is therefore necessary to reduce the time these transfers take.

Prior art in this technical field is disclosed in various documents. One discloses a data processing device, and a data processing method for it, in which a plurality of clusters are interconnected by an inter-cluster network, each cluster comprising a plurality of CPUs (arithmetic processing units), a data transfer device, and a shared memory, each connected to a local area network (see, for example, Patent Document 1). Another discloses an information processing device and information processing system that improve transfer efficiency when an entire subarray within two-dimensional array data is accessed: regardless of the number of elements in the subarray, they make it unnecessary to issue as many data transfer instructions as the total number of transfer elements divided by the number of subarray elements (see, for example, Patent Document 2).

Patent Document 1: JP 2000-322392 A (page 4, FIG. 1)
Patent Document 2: JP H11-134310 A (pages 2-3, FIG. 2)

The prior art described above is explained with reference to FIG. 8. As shown at the upper left of FIG. 8, the prior-art data (or information) processing device 100 has a plurality of nodes, node 0 through node n, each containing a plurality of CPUs 110 (abbreviated CP in the figure), a memory 120 shared by those CPUs 110, and an RCU (inter-node transfer control unit) 130. The nodes are switched and selected by an inter-node switch 140, and inter-node transfers are carried out per RCU 130. The memory address image of the memory 120 as seen from the RCU 130 is shown at the lower left of FIG. 8.

Next, conventional distance transfer is described with reference to FIGS. 9 to 11.
FIG. 9 is a conceptual diagram of distance transfer, in which the address of each element is changed according to a rule during a memory-to-memory transfer. In the figure, BL is the main memory transfer start address in the local node, BR is the main memory transfer start address in the remote node, FL1 is the number of elements in the first distance transfer, FL2 is the number of blocks in the second distance transfer, and TL is the total number of transfer elements. DL1, DL2, and DL3 are the first, second, and third distances of the local node, and DR1, DR2, and DR3 are the first, second, and third distances of the remote node.

FIG. 10 illustrates the principle of distance transfer. Computation is performed on arrays stored at consecutive addresses; distance transfer takes an array whose elements are scattered at regular intervals (that is, with address jumps), rearranges it according to a different rule, and stores it at the transfer destination. The figure shows the state of the transfer source memory and of the memory in the transfer destination node. The start address of the array is shown as 0x80000.
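The address rule of distance transfer can be sketched in a few lines of Python. This is a minimal illustration, not the hardware's algorithm: the text does not spell out how the three distances nest, so the loop order used here (FL1 elements per block, FL2 blocks per group, groups separated by the third distance) is an assumption, as are the names `DistanceRule`, `element_addresses`, and `distance_transfer`. Memory is modeled as a plain dictionary from address to value.

```python
from dataclasses import dataclass


@dataclass
class DistanceRule:
    base: int  # transfer start address (BL on the local side, BR on the remote side)
    d1: int    # first distance: stride between elements within a block
    d2: int    # second distance: stride between blocks
    d3: int    # third distance: stride between groups of blocks


def element_addresses(rule, fl1, fl2, tl):
    """Return the address of each of the tl elements (assumed nesting order)."""
    addresses = []
    for i in range(tl):
        element = i % fl1            # position inside the current block
        block = (i // fl1) % fl2     # block inside the current group
        group = i // (fl1 * fl2)     # group index
        addresses.append(rule.base + element * rule.d1
                         + block * rule.d2 + group * rule.d3)
    return addresses


def distance_transfer(src_mem, dst_mem, local, remote, fl1, fl2, tl):
    """Copy tl elements, reading by the local rule and writing by the remote rule."""
    src = element_addresses(local, fl1, fl2, tl)
    dst = element_addresses(remote, fl1, fl2, tl)
    for s, d in zip(src, dst):
        dst_mem[d] = src_mem[s]
```

For example, with a local rule of `DistanceRule(0x80000, 8, 64, 512)`, a remote rule of `DistanceRule(0x1000, 8, 16, 32)`, `fl1 = 2`, `fl2 = 2`, and `tl = 8`, the eight scattered source elements are packed into consecutive 8-byte slots starting at 0x1000.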

FIG. 11 illustrates how distance transfer is used. The items of information in FIG. 9 constitute the transfer instruction information: the local-node items are the transfer source arrangement information, and the remote-node items are the transfer destination arrangement information. FIG. 10 shows the structure of the array and its layout in memory. To divide a large array and compute on each node, the data scattered through the array is gathered according to a fixed rule, transferred, and collected into a single array for computation. As FIG. 11 shows, distance transfer is used not only for this division and transfer (FIG. 11(A)) but also for reverse transfer and integration (FIG. 11(B)), compression and expansion (FIG. 11(C)), and transposition (reshaping) transfer (FIG. 11(D)).

In the conventional example of FIG. 8, both the CPU 110 and the RCU 130 can access only the memory 120 of their own node. Under this constraint, a transfer by a parallel program undergoes contention arbitration for each per-node transfer, so the waiting time for the target node is added to the transfer time. In particular, if a node that transfers to multiple nodes is made to wait by arbitration on its first transfer, its subsequent transfers wait as well (the so-called head blocking, or head-of-line blocking, phenomenon), and the contention arbitration time increases.

The present invention has been made in view of these problems of the prior art. Its main object is to provide a computing system, namely a data batch transfer device, that has a mechanism for batch transfer within a cluster and thereby reduces contention arbitration to at most one occurrence, even for parallel programs, and shortens the transfer time. An additional object is to provide a data batch transfer device that is realized under software control without changing the principle of distance transfer, so that programs can run efficiently and at high speed without any new, complicated control.

To solve the problems described above, the data batch transfer device according to the present invention adopts the following characteristic configuration.

(1) A data batch transfer device comprising n nodes (computers), where n is an integer of 2 or more, each node having a memory shared by a plurality of CPUs, wherein a CPU issues a transfer instruction by means of a distance instruction, the distance instruction being an instruction that directs a data array scattered at regular address intervals to be rearranged according to a different rule and transferred to a transfer destination, and that directs the data array to be rearranged and transferred using a memory capacity distance corresponding to the memory size of the memory, and wherein, in response to the transfer instruction, the data in the memories of the plurality of nodes is collectively stored (accumulated) into contiguous memory and transferred,
the device comprising an inter-node transfer control unit (RCU) connected in common to the memories of the plurality of nodes,
wherein the RCU comprises: a data store unit that stores data into the memory of each node; and an address conversion unit that holds each real address of the memory of each node as a global address, calculated by adding the product of the node number n and the memory capacity distance to the real address in the memory of the n-th node, that decomposes the address of the transfer target data, specified as a global address in the transfer instruction, into real addresses using the memory capacity distance, and that, by referring to an address conversion table storing the correspondence between each real address and the node memory to which it is to be sent, notifies the data store unit of the real addresses generated by the decomposition and of information indicating the destination memory, and
wherein the data store unit transfers the transfer target data, collectively stored on the basis of the global address, between the nodes on the basis of the real addresses generated by decomposing the global address specified by a single distance instruction and the information indicating the memory corresponding to each real address.
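The global-address arithmetic in this claim can be sketched as follows. The function names and the 1 TB value for the memory capacity distance are illustrative assumptions (1 TB is the node memory size assumed in the FIG. 3 example of this description); the claim itself fixes only the relationship global address = real address + node number × memory capacity distance.

```python
MEMORY_CAPACITY_DISTANCE = 1 << 40  # assumed: 1 TB of memory per node


def to_global(node, real_address, distance=MEMORY_CAPACITY_DISTANCE):
    """Global address = real address + node number * memory capacity distance."""
    if not 0 <= real_address < distance:
        raise ValueError("real address must fit inside one node's memory")
    return node * distance + real_address


def decompose(global_address, distance=MEMORY_CAPACITY_DISTANCE):
    """Split a global address back into (node number, real address)."""
    return divmod(global_address, distance)
```

Because every real address is smaller than the distance, the decomposition is unambiguous: the quotient identifies the node and the remainder is the address inside that node's memory.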

The data batch transfer device of the present invention provides the following notable practical effects. Because it includes address conversion means and transfers in one batch, the transfer contention arbitration that would otherwise occur for each node transfer, within a cluster or to a designated cluster, is reduced to a single arbitration for a transfer covering all nodes in the cluster. Moreover, because this is achieved without changing the software instruction interface, the full performance of the hardware can be drawn out without complicating software control of the transfer procedure.

The configuration and operation of a preferred embodiment of the data batch transfer device according to the present invention are described in detail below with reference to the accompanying drawings.

As will become clear from the description below, the present invention can be summarized in three points. First, whereas conventionally each node (computer) has its own inter-node transfer control unit (hereinafter, RCU), here a single RCU is provided for a plurality of nodes (this group is hereinafter called a cluster). The RCU has an address conversion mechanism for the nodes within the cluster that lets it access the memory areas of the individual nodes as consecutive addresses. In this specification, a "cluster" means a group of nodes in which the memory of each node is accessible from the CPUs within that node, but the CPUs of other nodes cannot refer to its data unless an inter-node transfer via the RCU brings the data into the memory of their own node. In other words, no node can directly refer to (or update) the memory of another node.

Second, when a transfer instruction (a single instruction) is issued from one CPU of one node, a data load mechanism converts the address specified by the CPU through the address conversion mechanism described above, loads the data from each memory, and takes it into the RCU as contiguous data. A data store mechanism then takes the data gathered in one place by the data load mechanism, treats it as contiguous data, converts the addresses according to the CPU's transfer instruction, and stores (accumulates) the data into the memories distributed across the nodes.

Third, the transfer instruction from the CPU is issued as a distance instruction. A distance instruction performs a batch inter-node transfer of array data scattered at regular intervals in memory by specifying its address jump rules, namely the arrangement rule of the transfer source array and the arrangement rule of the transfer destination array. As a result, transfers among all nodes in the cluster can be performed in one batch, while the software control remains that of a single inter-node distance transfer to which the address jump between nodes in the cluster has been added as a new distance.

FIG. 1 is a block diagram showing the basic configuration of a preferred embodiment of the data batch transfer device of the present invention. The data batch transfer device 10 shown in FIG. 1 comprises a plurality of nodes (node 0 to node n) each containing a plurality of CPUs 12, a memory 14 provided for each node and connected to its CPUs, and an RCU 20 connected to the memories 14 of all the nodes.

As shown on the right side of FIG. 1, the RCU 20 includes a transfer instruction information notification unit 21 (which handles transfer instructions from the CPU), a data load unit 22, a data store unit 23, an address conversion unit 24, and a data storage buffer 25. The transfer instruction information notification unit 21, the data load unit 22, and the data store unit 23 are interconnected via a bus. The address conversion unit 24 receives the output of the transfer instruction information notification unit 21 and outputs to the data load unit 22 and the data store unit 23. The data storage buffer 25 is connected between the data load unit 22 and the data store unit 23.

Thus the data batch transfer device 10 includes an RCU 20 connected to a plurality of nodes. The RCU 20 has an address conversion unit 24 and receives transfer instruction information from a CPU 12 in one of the nodes. This information is issued by a transfer instruction, that is, an instruction that performs a batch inter-node transfer of array data scattered at regular intervals in memory by specifying its address jump rules (the arrangement rule of the transfer source array and that of the transfer destination array). Inside the RCU 20, according to the transfer instruction information from the CPU 12 (which includes transfer source arrangement information and transfer destination arrangement information), the transfer source data is fetched from a plurality of nodes in parallel and then placed in the memory 14 of each node.

Next, the main functions of the units making up the data batch transfer device 10 are described. Based on the rule by which the data is scattered, such as the start address and the address jump width given in the transfer source information of the transfer instruction information from the CPU 12, the address conversion unit 24 decomposes the transfer into contiguous chunks of data. It then uses an address conversion table (see FIG. 5, described later) to determine uniquely which data is to be fetched from which address of which node, and reports the result. Likewise, based on the start address, address jump width, and other scattering rules given in the transfer destination information, it determines uniquely which data is to be written to which address of which node. In this way the address conversion unit 24 identifies the node actually to be accessed and its memory address (for each block) and passes this information to the data load unit 22.

The data load unit 22 accesses the specified memory address (start address) of the specified node and loads the data of each block in one operation. When loading, it keeps the order of the addresses converted by the address conversion unit 24 and directs that the data returned asynchronously from the nodes be stored in the data storage buffer 25 so that the order of the requested addresses is preserved. Specifically, it guarantees the order by, for example, keeping a sequence ID for ordering and using that ID as the write address in the data storage buffer 25. The data store unit 23 takes data out of the data storage buffer 25 and, following the transfer destination node and address that the address conversion unit 24 derived from the notified transfer destination arrangement information, stores the data of each block in one operation at the specified address (start address) of the specified node.
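The ordering scheme described here, in which a sequence ID doubles as the write address of the buffer, can be sketched as follows. The class and method names are hypothetical, and counting the received blocks to detect completion is a simplifying assumption rather than a description of the actual buffer hardware.

```python
class OrderedBuffer:
    """Reassembles blocks that arrive out of order, keyed by sequence ID."""

    def __init__(self, block_count):
        self._slots = [None] * block_count
        self._filled = [False] * block_count
        self._received = 0

    def write(self, sequence_id, data):
        """Store one returned block at the slot given by its sequence ID."""
        if not self._filled[sequence_id]:
            self._filled[sequence_id] = True
            self._received += 1
        self._slots[sequence_id] = data

    def is_complete(self):
        """True once every requested block has been written."""
        return self._received == len(self._slots)

    def read_all(self):
        """Return the blocks in request order; only valid when complete."""
        if not self.is_complete():
            raise RuntimeError("not all blocks have arrived yet")
        return list(self._slots)
```

Because each reply is written directly to its own slot, the nodes can respond in any order and no sorting step is needed before the data is stored at the destination.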

Thus the data batch transfer method and device of the present invention have an address conversion unit 24, a data load unit 22, and a data store unit 23 that process, cluster by cluster and with a single distance instruction, arrays scattered across a plurality of nodes. The contention arbitration that used to occur for each per-node transfer therefore occurs only once per cluster, which reduces the arbitration time. Because the transfer is based on the distance transfer instruction, batch transfer is achieved without complicating software control.

Next, FIG. 2 is a block diagram of a cluster configuration that bundles a total of four nodes, node 0 to node 3, as a specific example of FIG. 1. In this example, the RCU (inter-node transfer control unit) 20 is connected to the nodes (node 0 to node 3). Each node consists of a memory 14 and one or more CPUs 12. Through its connections to the nodes, the RCU 20 exchanges data with each memory 14, and each node sends information from the CPU 12 to the RCU 20 by way of the memory 14. The RCU 20 includes a transfer instruction information notification unit 21 (for instructions from the CPU), a data load unit 22, a data store unit 23, address conversion units 24a and 24b, data storage buffers 25a and 25b, address instruction information buffers 26a and 26b, an inter-cluster data sending unit 27, and an inter-cluster data receiving unit 28.

Transfer instruction information is received from a CPU 12 in one of the nodes. It is issued by a transfer instruction, that is, a batch transfer instruction that specifies the address jump rule of array data scattered at regular intervals in memory (see FIG. 3). The transfer instruction information notification unit 21 holds (queues) multiple transfer instructions and, starting from the oldest, passes the transfer source data arrangement information to the address conversion unit 24 and the transfer destination data arrangement information to the address instruction information buffer 26 as needed.

The address conversion unit 24 has an address conversion table (see FIG. 5) whose contents are loaded into it in advance and held there. It decomposes the transfer source data arrangement information notified by the transfer instruction information notification unit 21 into a node and an address for each chunk of data (called a block) and notifies the data load unit 22. The data load unit 22 loads the data of each block from the specified address in the memory 14 of each node and stores it in the data storage buffer 25 in the order in which the addresses were notified by the address conversion unit 24.

The inter-cluster data sending unit 27 performs contention arbitration for data transfer between clusters and controls the transfer. When a transfer is possible, it takes the information and data out of the address instruction information buffer 26 and the data storage buffer 25 and transfers them. For a transfer within a cluster, from the own cluster to the own cluster, it sends the data to the inter-cluster data receiving unit 28. The inter-cluster data receiving unit 28 stores the transfer destination data arrangement information and the data in the address instruction information buffer 26 and the data storage buffer 25. The address instruction information buffer 26 notifies the address conversion unit 24 of the transfer destination data arrangement information. The address conversion unit 24b has an address conversion table (see FIG. 5) whose contents are loaded in advance; it decomposes the transfer destination data arrangement information notified by the address instruction information buffer 26 into a node and an address for each block and notifies the data store unit 23. The data store unit 23 stores the data of each block at the specified address in the memory 14 of each node.

The configuration of the embodiment and the function of each unit have been described in detail above. The CPU 12, the memory 14, and the inter-cluster switch 30 in FIG. 2 are well known to those skilled in the art and are not directly related to the present invention, so their detailed configuration is omitted. In the embodiment described above, there may be a single cluster or several. The number of CPUs 12 is likewise not limited. The address conversion table (FIG. 5) is a specific example for a page size of 64 MB, a main memory of 1 TB, and four nodes per cluster, but these are only examples and the values are not limited to them. Page management of main memory is well known to those skilled in the art and is not directly related to the present invention, so its detailed configuration is omitted.

Next, the principle of the new distance transfer according to the present invention is described with reference to FIG. 3. The size of the memory in a node is defined as one more distance within conventional distance transfer, so that each node is treated as holding one part of a single large memory, and data is transferred in one batch at each specified distance (that is, with simultaneous access to each node). In FIG. 3, distance access is performed with the node-spanning distance defined as the in-node memory size (assumed to be 1 TB). In this example, data distributed over multiple nodes is merged onto one node. For simplicity, Asub and A are described as one-dimensional arrays.
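The idea of treating the node memories as slices of one large memory can be illustrated with a small simulation. Everything here is a sketch under assumptions: each node's memory is a dictionary, elements are 8 bytes wide, and the names `NODE_MEMORY_SIZE`, `read_global`, and `gather` are hypothetical. The only element taken from the text is that the node-spanning distance equals the in-node memory size (1 TB).

```python
NODE_MEMORY_SIZE = 1 << 40  # node-spanning distance, assumed 1 TB as in FIG. 3
ELEMENT_SIZE = 8            # assumed element width in bytes


def read_global(node_memories, global_address):
    """Read one element through the single large address space."""
    node, real_address = divmod(global_address, NODE_MEMORY_SIZE)
    return node_memories[node][real_address]


def gather(node_memories, base, elements_per_node):
    """Collect Asub from every node into one array A, in node order.

    The inner stride is the element size; the outer stride is the
    node-spanning distance, so a single rule covers all nodes.
    """
    result = []
    for node in range(len(node_memories)):
        for i in range(elements_per_node):
            address = base + i * ELEMENT_SIZE + node * NODE_MEMORY_SIZE
            result.append(read_global(node_memories, address))
    return result
```

With four node memories that each hold three elements starting at 0x80000, `gather(node_memories, 0x80000, 3)` returns the twelve elements as one contiguous list, which corresponds to merging the distributed Asub pieces into A on a single node.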

FIG. 5 illustrates the address conversion unit 24 in FIG. 2. As shown in FIG. 5(A), the address conversion unit 24 consists of a distance address decomposition circuit 241 of conventional design, containing a plurality of adders and the like, and an address conversion table 242 provided on its output side. FIG. 5(B) illustrates a specific example of the address conversion table 242.

Next, the operation of the system of FIG. 2 for the data transfer of FIG. 4 is described with reference to the timing charts of FIGS. 6(A) and 6(B). First, when the node system starts up, the address table is written into the address conversion unit 24. The values are set from the CPU 12 or the like by writing data to a part of the RCU 20, so the table is set up in advance of any actual transfer. Then, at timing (1), the CPU 12 issues a distance transfer instruction. This instruction sends the transfer instruction information to the memory 14, which passes it on to the RCU 20. At timing (2), the transfer instruction information arrives at the transfer instruction information notification unit 21 in the RCU 20. In FIG. 4, the array portions along the boundary between Asub_n and the neighboring subspaces of array A (Asub_n+1 and so on) can be transferred to the adjacent nodes in one batch with one to three instructions (three for a three-dimensional array).

The transfer instruction information notification unit 21 holds (queues) multiple transfer instructions and, starting from the oldest, passes the transfer source data arrangement information in the transfer instruction information to the address conversion unit 24 and the transfer destination data arrangement information to the address instruction information buffer 26 at timing (3). The address conversion unit 24 runs the address decomposition circuit for distance transfer to decompose the notified transfer source data arrangement information into an address for each chunk of data (block) (timings 4 to 7). It then converts each address through the address conversion table into an RCU 20 port number plus the memory address and data length of the memory access request to be issued. After the conversion, it notifies the data load unit 22 (timings 8 to 11).
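The table lookup that follows decomposition can be sketched as below. The table layout is an assumption: the text says only that the conversion yields an RCU port number, a memory address, and a data length, so indexing the table by 64 MB page (the page size given for the FIG. 5 example), assigning global pages to nodes in order, and using one port per node are illustrative choices. All names are hypothetical.

```python
PAGE_SIZE = 64 << 20           # 64 MB page, as in the example table
NODE_MEMORY_SIZE = 1 << 40     # 1 TB main memory per node
NODES_PER_CLUSTER = 4
PAGES_PER_NODE = NODE_MEMORY_SIZE // PAGE_SIZE


def build_table():
    """Map each global page number to (port number, real page base address)."""
    table = {}
    for page in range(PAGES_PER_NODE * NODES_PER_CLUSTER):
        port = page // PAGES_PER_NODE            # assumed: one port per node
        real_base = (page % PAGES_PER_NODE) * PAGE_SIZE
        table[page] = (port, real_base)
    return table


def convert(table, block_address, length):
    """Turn one block (global address, length) into a memory access request."""
    page, offset = divmod(block_address, PAGE_SIZE)
    if offset + length > PAGE_SIZE:
        raise ValueError("block crosses a page boundary; split it first")
    port, real_base = table[page]
    return port, real_base + offset, length
```

Building the table once at startup and only reading it during transfers mirrors the description, in which the table is written before any actual transfer takes place.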

The data load unit 22 issues a request for each block to the designated port (timings 12 to 15) and loads the data returned for each request, that is, the data at the designated address in each node's memory (timings 16 to 19). The data is stored in the data storage buffer 25a in the order in which the addresses were notified by the address conversion unit 24 (timing 20). The data storage buffer 25 determines whether all the data has arrived from the number of data items loaded.
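
The buffering behavior described above can be illustrated with a small sketch. It assumes that replies may come back in any order and that each reply carries the index of the request it answers. Both points are assumptions made for illustration, and the class name `AccumulationBuffer` is hypothetical.

```python
class AccumulationBuffer:
    """Collects loaded blocks and releases them in request order."""

    def __init__(self, expected_count):
        self.expected_count = expected_count
        self._slots = [None] * expected_count
        self._loaded = 0

    def store(self, index, data):
        """Place the reply for request number `index` into its slot."""
        if self._slots[index] is not None:
            raise ValueError(f"block {index} was already loaded")
        self._slots[index] = data
        self._loaded += 1

    def is_complete(self):
        """Completion is judged only by the number of loaded blocks."""
        return self._loaded == self.expected_count

    def payload(self):
        """Return the blocks concatenated in the order the addresses were notified."""
        if not self.is_complete():
            raise RuntimeError("not all blocks have been loaded")
        return b"".join(self._slots)
```

Indexing the slots by request number means the payload is already in notification order once the count is reached, so no sorting step is needed before the data is handed to the sending side.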

Next, when all the data is present in the data storage buffer 25a, a transfer-ready notification is issued to the inter-cluster data sending unit 27 (timing 21). The inter-cluster data sending unit 27 then arbitrates contention among inter-cluster data transfers and controls the data transfer (timing 22). When the transfer can proceed, it takes the information and the data out of the address instruction information buffer 26a and the data storage buffer 25a and transfers them (timing 23). For a transfer within the cluster, that is, from the own cluster to the own cluster, the data is sent to the inter-cluster data receiving unit 28 (timing 24).

The inter-cluster data receiving unit 28 stores the transfer destination data arrangement information and the data in the address instruction information buffer 26b and the data storage buffer 25b, respectively (timing 25). The address instruction information buffer 26b notifies the address conversion unit 24b of the transfer destination data arrangement information (timing 26). The address conversion unit 24b operates its address decomposition circuit for distance transfer to decompose the transfer destination data arrangement information received from the address instruction information buffer 26b into one address per block of data (timings 27 to 30). It then converts each address with the address conversion table into a port number of the RCU 20 and the memory address and data length of the memory access request to be issued, and notifies the data store unit 23 of these values. The data store unit 23 stores the data, block by block, at the designated addresses in the memory 14 of each node (timings 31 to 34). As a result, as shown in FIG. 4, the transfer data is transferred in a batch from a plurality of nodes to a plurality of nodes.
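
The store side can be sketched in the same spirit. The example below is a simplified model, not the disclosed hardware: node memories are plain byte arrays, the parameter names and the value of `MEMORY_CAPACITY_DISTANCE` are hypothetical, and a global address is taken to be the node number times the memory capacity distance plus the real address, as worded in the claim.

```python
MEMORY_CAPACITY_DISTANCE = 16  # hypothetical per-node address span, kept small for illustration


def scatter(payload, start, block_len, stride, count, memories):
    """Write consecutive blocks of `payload` to strided global addresses."""
    if len(payload) != block_len * count:
        raise ValueError("payload size does not match the destination layout")
    for i in range(count):
        # Decompose the global address into a node number and a real address.
        node, real_addr = divmod(start + i * stride, MEMORY_CAPACITY_DISTANCE)
        block = payload[i * block_len:(i + 1) * block_len]
        memories[node][real_addr:real_addr + block_len] = block


# Three node memories; one call places one block at the end of each of them.
memories = [bytearray(MEMORY_CAPACITY_DISTANCE) for _ in range(3)]
scatter(b"AABBCC", start=14, block_len=2, stride=16, count=3, memories=memories)
```

Here the stride equals the memory capacity distance, so each successive block lands at the same real address in the next node, which mirrors a contiguous payload being distributed to several nodes by a single instruction.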

Next, a second embodiment of the present invention will be described with reference to FIG. 7. Its basic configuration is as described above with reference to FIG. 2, but the notification of the distance instruction is further refined. In the embodiment described so far, data is basically transferred from a transfer source cluster to a transfer destination cluster (which may be the own cluster), and the transfer is started by a notification from a CPU of the transfer source cluster. In FIG. 7, the transfer instruction information notification unit 21 has request transfer means for forwarding a request from a CPU of another cluster, and request receiving means. When the request receiving means receives transfer instruction information, it passes that information to the transfer instruction information notification unit 21. The processing from the transfer instruction information notification unit 21 onward is the same as in the first embodiment described above.

Thus, in the second embodiment, a batch transfer within a cluster can also be requested of other clusters. In a system made up of a plurality of clusters, a single CPU in the single node that supervises the operation can therefore have all clusters perform their intra-cluster batch transfers in parallel, which further shortens the execution time.

FIG. 8 shows the data batch transfer device 10 according to the present invention and a conventional data batch transfer device 100 side by side, together with an image of the memory addresses as seen from the RCU. In the data batch transfer device 10 according to the present invention, shown on the right side of FIG. 8, one RCU 20 is shared by the memories 14 of a plurality of nodes, each of which includes a plurality of CPUs 12 and a memory 14, and this RCU 20 is connected to other clusters through the inter-cluster switch 30. Because the cluster is globally addressed, the memory addresses as seen from the RCU 20 allow data at scattered addresses to be gathered according to a fixed rule and transferred in a single batch transfer.

The configuration and operation of the preferred embodiments of the present invention have been described in detail above. It should be noted, however, that these embodiments are merely examples of the present invention and do not limit it in any way. Those skilled in the art will readily understand that various modifications and changes can be made to suit a specific application without departing from the gist of the present invention.

Block diagram showing the basic configuration of a first embodiment of the data batch transfer device according to the present invention.
Block diagram showing a specific example of the data batch transfer device shown in FIG. 1.
Conceptual diagram explaining the new distance transfer according to the present invention.
Explanatory diagram of the data transfer of adjacent regions in the transfer phase according to the present invention.
Block diagram showing the internal configuration of an example of the address conversion unit in FIG. 2.
Explanatory diagram of the address conversion table in FIG. 2.
Timing chart for explaining the operation of the present invention.
Timing chart for explaining the operation of the present invention.
Explanatory diagram of a second embodiment of the present invention.
Explanatory diagram comparing a conventional data batch transfer device (left side) with the present invention (right side).
Diagram explaining the specification of the conventional distance transfer instruction.
Explanatory diagram of the mechanism of distance transfer and computation when an array is divided.
Explanatory diagram of the main uses of distance transfer.

Explanation of symbols

10 Data batch transfer device
12 CPU
14 Memory
20 RCU (inter-node transfer control unit)
21 Transfer instruction information notification unit (from CPU)
22 Data load unit
23 Data store unit
24 Address conversion unit
25 Data storage buffer
26 Address instruction information buffer
27 Inter-cluster data sending unit
28 Inter-cluster data receiving unit
30 Inter-cluster switch

Claims

A data batch transfer device having N nodes (computers), each with a memory shared by a plurality of CPUs, in which a CPU issues a transfer instruction in the form of a distance instruction, that is, an instruction to rearrange a data array whose addresses are regularly scattered according to a rule and then transfer it to a transfer destination, the rearrangement and transfer being performed according to a memory capacity distance corresponding to the memory size of the memory, and in which data in the respective memories of the plurality of nodes is collectively stored (accumulated) in contiguous memory and transferred in accordance with the transfer instruction, the device comprising:
an inter-node transfer control unit (RCU) connected in common to the memories of the plurality of nodes,
wherein the RCU has: a data store unit that stores data in the memory of each node; and an address conversion unit that holds each real address of the memory of each node as a global address, calculated by adding the product of the node number n and the memory capacity distance to the real address in the memory of the n-th node, that decomposes the address of the transfer target data, specified as a global address in the transfer instruction, into real addresses by means of the memory capacity distance, and that refers to an address conversion table storing the correspondence between each real address and the node to which it is to be sent, so as to notify the data store unit of the real addresses generated by the decomposition and of information indicating the destination memory, and
wherein the data store unit transfers the transfer target data, collectively stored on the basis of the global address, between the nodes on the basis of the real addresses generated by decomposing the global address specified by a single distance instruction and of the information indicating the memory corresponding to those real addresses.