JP6665429B2

JP6665429B2 - Arithmetic processing device, information processing device, and control method for information processing device

Info

Publication number: JP6665429B2
Application number: JP2015125808A
Authority: JP
Inventors: 哲志中川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-06-23
Filing date: 2015-06-23
Publication date: 2020-03-13
Anticipated expiration: 2035-06-23
Also published as: JP2017010319A

Description

The present invention relates to an arithmetic processing device, an information processing device, and a control method for the information processing device.

An information processing device such as a PC (Personal Computer) or a server includes an arithmetic processing device, such as a CPU (Central Processing Unit), that performs arithmetic processing on data read from a memory. In such an information processing device, the latency from when the CPU, which is an endpoint, issues a fetch request (a read request to the memory) until the CPU receives the response data corresponding to that read request is one of the important factors in processing performance.

A multiprocessor system in which a plurality of CPUs are communicably connected to one another may also be used as the information processing device. In recent multiprocessor systems, as shown for example in FIG. 14, the CPUs and routers are connected to one another via a network using high-speed serial transmission. Employing high-speed serial transmission makes high communication throughput achievable.

JP 2010-124448 A
JP 2010-116228 A

In a network using high-speed serial transmission, transmission errors are checked by a CRC (Cyclic Redundancy Check). When a transmission error is detected, the end of the packet is rewritten to EDB (end bad). Therefore, the endpoint (CPU) cannot determine whether a packet is normal until it has received all of the data up to the end of the packet containing the response data.

Consequently, in the CPU that issued the fetch request, all of the data up to the end of the packet containing the response data for that fetch request is first stored in a reception buffer, and the data in the reception buffer is then sent to the CPU core in order from the head.

However, because the reception buffer is a FIFO (First In First Out), the processing target data requested by the CPU core (one unit of data; for example, 8 bytes) cannot be read from the reception buffer until the data immediately preceding that unit has been read. As a result, the communication latency seen from the CPU core increases, and processing performance may decrease.

In one aspect, an object of the invention disclosed in this specification is to reduce the read latency of processing target data.

The arithmetic processing device of the present case is connected to another arithmetic processing device and has a processing unit, a communication unit, a buffer, a writing unit, and a reading unit. The processing unit generates a read request for processing target data. The communication unit transmits the read request generated by the processing unit to the other arithmetic processing device, and receives from the other arithmetic processing device a data block that contains the processing target data corresponding to the transmitted read request. The buffer stores the data block. The writing unit sequentially writes, into the buffer, the plurality of unit data contained in the data block received by the communication unit. The reading unit preferentially reads from the buffer the processing target data, which is at least one of the plurality of unit data. Specifically, the reading unit refers to address information of the processing target data that the other arithmetic processing device has recorded in a response header attached to the data block, this address information being information for identifying the processing target data among the plurality of unit data. The reading unit reads the processing target data corresponding to the address information from the buffer and then sequentially reads the unit data other than the processing target data from the buffer.

According to one embodiment, the read latency of processing target data can be reduced.

FIG. 1 is a block diagram illustrating the configuration of an information processing device including arithmetic processing devices (CPUs) according to the first and second embodiments of the present invention.
FIG. 2 is a diagram illustrating packet routing when a fetch request is made from one CPU to another CPU in the information processing device of the first embodiment.
FIG. 3 is a diagram illustrating the format of the header of a fetch response packet in the first embodiment.
FIG. 4 is a block diagram illustrating the reception buffer included in the router of the CPU of the first embodiment and the configuration (writing unit and reading unit) relating to packet write/read processing for that reception buffer.
FIG. 5 is a block diagram illustrating in detail the configuration (reading unit) relating to the packet read processing shown in FIG. 4.
FIG. 6 is a flowchart illustrating the operation of the configuration (reading unit) relating to the packet read processing shown in FIG. 5.
FIG. 7 is a time chart illustrating the operation when the configuration (reading unit) relating to the packet read processing shown in FIGS. 4 and 5 reads the tenth data first in accordance with the flowchart shown in FIG. 6.
FIG. 8(A) is a diagram illustrating the format of the header of a fetch request packet in the second embodiment, and FIG. 8(B) is a diagram illustrating the format of the header of a fetch response packet in the second embodiment.
FIG. 9 is a diagram illustrating packet routing when a fetch request is made from one CPU to another CPU in the information processing device of the second embodiment.
FIG. 10 is a block diagram illustrating the reception buffer included in the router of the CPU of the second embodiment and the configuration (writing unit and reading unit) relating to packet write/read processing for that reception buffer.
FIG. 11 is a block diagram illustrating in detail the configuration (reading unit) relating to the packet read processing shown in FIG. 10.
FIG. 12 is a flowchart illustrating the operation of the configuration (reading unit) relating to the packet read processing shown in FIG. 11.
FIG. 13 is a time chart illustrating the operation when the configuration (reading unit) relating to the packet read processing shown in FIGS. 10 and 11 reads the 2nd, 6th, 10th, and 14th data first in accordance with the flowchart shown in FIG. 12.
FIG. 14 is a block diagram illustrating an example of the configuration of a multiprocessor system (information processing device).
FIG. 15 is a block diagram illustrating an example of the configuration of a multi-core processor (CPU, arithmetic processing device).
FIG. 16 is a diagram illustrating packet routing when a fetch request is made from one CPU to another CPU in the multiprocessor system shown in FIG. 14.
FIG. 17(A) is a diagram illustrating the format of a packet without data, and FIG. 17(B) is a diagram illustrating the format of a packet with data.
FIG. 18(A) is a diagram illustrating the format of the header of a fetch request packet, and FIG. 18(B) is a diagram illustrating the format of the header of a fetch response packet.
FIG. 19 is a block diagram illustrating the reception buffer included in the router of the CPU (multi-core processor) shown in FIG. 15 and the configuration relating to packet write/read processing for that reception buffer.
FIG. 20 is a flowchart illustrating the operation of the configuration relating to the packet write processing shown in FIG. 19.
FIG. 21 is a flowchart illustrating the operation of the configuration relating to the packet read processing shown in FIG. 19.
FIG. 22 is a time chart illustrating the operation in the shortest case in the configuration relating to the packet write/read processing shown in FIG. 19.
FIG. 23 is a time chart illustrating the operation when the end of a received packet is EDB in the configuration relating to the packet write processing shown in FIG. 19.

Embodiments of the arithmetic processing device, the information processing device, and the control method for the information processing device disclosed in the present application are described in detail below with reference to the drawings. The embodiments described below are merely examples, and there is no intention to exclude various modifications or applications of technology that are not explicitly described in them. That is, the embodiments can be implemented with various modifications without departing from their spirit. Each drawing is not meant to show that only the illustrated components are provided; other functions may be included. The embodiments can also be combined as appropriate to the extent that their processing contents do not contradict one another.

[1] Technology Related to the Present Embodiment
First, the technology related to the present embodiment is described with reference to FIGS. 14 to 23.
In recent multiprocessor systems (information processing devices), in order to achieve high communication throughput, a plurality of CPUs and routers are connected to one another via a network using high-speed serial transmission (see reference numeral 30 in FIG. 16), as shown in FIG. 14. FIG. 14 is a block diagram illustrating an example of the configuration of a multiprocessor system.

In recent years, multi-core processors containing a plurality of cores have also come into general use as CPUs. An example of the configuration of each CPU (multi-core processor, arithmetic processing device) 10 constituting the multiprocessor system shown in FIG. 14 is described here with reference to FIG. 15. FIG. 15 is a block diagram illustrating an example of the configuration of the CPU 10.

As shown in FIG. 15, in the CPU 10, a plurality of cores 11 are connected via a shared cache (tertiary cache) 12. In FIG. 15, N + 1 cores, core #0 to core #N (N is an integer of 1 or more), are provided as the cores 11. In FIG. 15, a primary cache (not shown) and a secondary cache (not shown) are built into each core 11 and the tertiary cache is used as the shared cache 12; alternatively, only the primary cache may be built into each core 11 and the secondary cache may be used as the shared cache 12.

In addition to the cores 11, a MAC (Memory Access Controller) 13 and a router 14 are connected to the shared cache 12. The MAC 13 is connected to a DIMM (Dual Inline Memory Module) 20 that functions as the main memory, and functions as a control unit that controls data exchange with the DIMM 20. The router 14 is connected to another CPU 10 or another router (for example, an LSI (Large Scale Integrated circuit) for routing), and functions as a communication unit that exchanges data with other CPUs 10 by means of packets.

Consider the case in which one of the cores 11 in such a CPU 10 requests a read (fetch) of memory data, that is, issues a fetch request. The core 11 specifies the address of the 8 bytes of data it needs (the processing target data) and first accesses the caches built into the core 11 (the primary and secondary caches). On a cache hit, the core 11 reads the 8 bytes of data from the built-in cache.

On a miss in the built-in cache memory, on the other hand, the core 11 accesses the shared cache 12 (the tertiary cache in FIG. 15), which is shared by the plurality of cores. If the access also misses in the shared cache 12, the main memory 20 is accessed. Because the cache line size of the shared cache 12 is, for example, 128 bytes, the MAC 13 is asked to read a 128-byte data block that contains the 8 bytes of processing target data requested by the core 11.
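The relationship between the 8-byte request and the 128-byte data block can be sketched as follows. This is a minimal illustration, assuming that data blocks are naturally aligned on 128-byte boundaries; the function names are hypothetical and do not come from the specification.

```python
CACHE_LINE_BYTES = 128  # example cache line size of the shared cache 12
UNIT_BYTES = 8          # size of the processing target data requested by a core

def block_base(pa: int) -> int:
    """Address of the first byte of the 128-byte block that contains pa.

    Assumes blocks are aligned on 128-byte boundaries (an assumption made
    for this sketch).
    """
    return pa & ~(CACHE_LINE_BYTES - 1)

def block_range(pa: int) -> tuple[int, int]:
    """First and last byte address of the block fetched for a request at pa."""
    base = block_base(pa)
    return base, base + CACHE_LINE_BYTES - 1

if __name__ == "__main__":
    first, last = block_range(0x12F8)
    print(hex(first), hex(last))
```

Under this assumption, a request for any 8-byte unit inside the same 128-byte range results in the same block being read.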

Next, the case in which a core 11 requests a read of memory data (for example, 8 bytes of processing target data) held by another CPU 10 (for example, CPU #1 in FIGS. 14 and 16) different from the CPU 10 to which the core 11 belongs (for example, CPU #0 in FIGS. 14 and 16) is described with reference to FIG. 16. FIG. 16 is a diagram illustrating packet routing when a fetch request is made from one CPU #0 to another CPU #1 in the multiprocessor system shown in FIG. 14.

In this case, as shown in FIG. 16, the core 11 in CPU #0 transmits a fetch request packet to the other CPU #1 via the router 14 and the network 30. On receiving the fetch request packet, CPU #1 reads the data block containing the requested memory data (the 8 bytes of processing target data) from the DIMM 20 of CPU #1 and sends the data block back as a fetch response packet. The size of the data block in this case is 128 bytes.

The fetch request packet and the fetch response packet are described here with reference to FIGS. 17(A) and 17(B). FIG. 17(A) is a diagram illustrating the format of a packet without data (fetch request packet), and FIG. 17(B) is a diagram illustrating the format of a packet with data (fetch response packet).

Packets are either packets without data, as shown in FIG. 17(A), or packets with data, as shown in FIG. 17(B). The fetch request packet transmitted from CPU #0 to CPU #1 is a packet without data, and the fetch response packet for that fetch request packet is a packet with data. The first cycle of a packet is a header (HD) carrying information for determining the contents and destination of the packet.

The headers of the fetch request packet and the fetch response packet are further described with reference to FIGS. 18(A) and 18(B). FIG. 18(A) is a diagram illustrating the format of the header of the fetch request packet, and FIG. 18(B) is a diagram illustrating the format of the header of the fetch response packet.

As shown in FIG. 18(A), the header of the fetch request packet contains information such as an Opcode (OPC) indicating the type of the packet, the Physical Address (PA or pa) of the memory data requested by the core 11, and a Request ID (RQID), which is an identifier for identifying the packet. As shown in FIG. 18(B), the header of the fetch response packet contains information such as the OPC and the RQID. RSV (reserve) denotes unused bits.

As shown in FIG. 17(B), the 128-byte data block containing the processing target data requested by the core 11 follows the header, divided into 8-byte units and transmitted over 16 cycles. In the fetch response packet, the 8-byte data of each cycle corresponds, in order from the head, to the data Data0, Data1, ..., DataF at physical addresses pa[6:3] = 0000, 0001, ..., 1111.
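The correspondence between pa[6:3] and the position of a unit within the data block can be written out as a short sketch. The function names are hypothetical; the bit positions are the ones stated above.

```python
def unit_index(pa: int) -> int:
    """Extract pa[6:3], the index (0 to 15) of the 8-byte unit within the block."""
    return (pa >> 3) & 0xF

def unit_label(pa: int) -> str:
    """Name of the unit in the notation Data0, Data1, ..., DataF."""
    return f"Data{unit_index(pa):X}"

def cycle_in_packet(pa: int) -> int:
    """Cycle of the fetch response packet carrying the unit.

    Cycle 0 is the header, so Data0 travels in cycle 1 and DataF in cycle 16.
    """
    return 1 + unit_index(pa)

if __name__ == "__main__":
    print(unit_label(0x1250), cycle_in_packet(0x1250))
```

Bits pa[2:0] select a byte inside the 8-byte unit and therefore play no part in choosing the unit.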

On receiving the fetch response packet from CPU #1, CPU #0 (the router 14) registers the data block attached to the fetch response packet in the shared cache 12 of CPU #0 and also sends it to the core 11 that made the fetch request.

Transmission errors between CPU #0 and CPU #1 are detected by checking, on the packet receiving side, the CRC attached to each packet. When a transmission error is detected, the packet in which the error was detected and all subsequent packets are discarded, and retransmission of the packets is requested from the transmitting CPU #1 by sending a NACK (Negative ACKnowledgement) using a DLLP (Data Link Layer Packet).
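The receive-side check can be illustrated with a small sketch. The specification does not state which CRC polynomial or frame layout is used, so CRC-32 from Python's zlib module and a 4-byte trailer are stand-ins chosen only for illustration, and the function names are hypothetical.

```python
import zlib

def attach_crc(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (stand-in for the link CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check_crc(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the trailer."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received

def receive_stream(frames: list[bytes]) -> tuple[list[bytes], bool]:
    """Accept frames until the first CRC failure.

    The failing frame and everything after it are discarded, and the second
    return value tells the caller that a NACK should be sent.
    """
    accepted = []
    for frame in frames:
        if not check_crc(frame):
            return accepted, True
        accepted.append(frame[:-4])
    return accepted, False

if __name__ == "__main__":
    good = attach_crc(b"fetch response")
    corrupted = bytes([good[0] ^ 0x01]) + good[1:]
    print(receive_stream([good, corrupted, good]))
```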

If a packet in which a transmission error has been detected cannot be discarded at a relay point (a router; see FIG. 14) or at the receiving CPU #0, the packet is sent on with its end set to EDB.

The router 14 in the receiving CPU #0 temporarily stores the fetch response packet received from the other CPU #1 in a buffer inside the router 14 (see reference numeral 141 in FIG. 19). This buffer is called the reception buffer. The reception buffer is a FIFO buffer. The reception buffer 141 and the packet write/read processing for the reception buffer 141 are described here with reference to FIGS. 19 to 23.

FIG. 19 is a block diagram illustrating the reception buffer 141 included in the router 14 of the CPU 10 shown in FIG. 15 and the configuration relating to packet write/read processing for the reception buffer 141. As shown in FIG. 19, a WDR (Write Data Register), an HDWP (Header Write Pointer), and a WP (Write Pointer) are provided on the write side of the reception buffer 141. An RDR (Read Data Register) and an RP (Read Pointer) are provided on the read side of the reception buffer 141.

The WDR is a register that temporarily holds the write target data (one unit of data; for example, 8 bytes) to be written to the reception buffer 141. The WP controls the writing of the write target packet and is a pointer that specifies the write destination address, in the reception buffer 141, of the write target data held in the WDR. The WP is set to 0 in the initial state and is incremented by 1 every cycle while the write target packet is being written to the reception buffer 141. The write target data held in the WDR is written to the address specified by the WP. The HDWP is a pointer that specifies the address of the header of the write target packet and is set to 0 in the initial state.

The RDR is a register that temporarily holds the read data (one unit of data; for example, 8 bytes) read from the reception buffer 141. The RP controls the reading of data from the reception buffer 141 and is a pointer that specifies the address of the data to be read from the reception buffer 141. The RP is set to 0 in the initial state and is incremented by 1 every cycle while data is being read from the reception buffer 141. The data at the address specified by the RP is read from the reception buffer 141, temporarily held in the RDR, and, as described above, registered in the shared cache 12 and transmitted to the core 11 that made the fetch request.

The operation of the configuration relating to the packet write processing shown in FIG. 19 (the router 14 in CPU #0) is described with reference to the flowchart shown in FIG. 20 (steps S101 to S107). During packet writing, the router 14 waits to receive a header (HD) or data (DT) (NO route of step S101). On receiving a header (HD) or data (DT) (YES route of step S101), it determines whether EDB has been received (step S102).

If EDB has not been received (NO route of step S102), the router 14 determines whether END, which indicates the last cycle of the packet, has been received (step S103). If END has been received (YES route of step S103), the router 14 sets the HDWP to WP + 1 as the value specifying the head address (header address) of the next write target packet (step S104). The router 14 then writes the data (or header) in the WDR to the entry of the reception buffer 141 specified by the WP (step S105), increments the WP by 1 (step S106), and returns to step S101.

If it is determined in step S102 that EDB has been received (YES route), the router 14 sets the WP to the value of the HDWP (step S107) and returns to step S101. As a result, the current write target packet (the packet whose end was set to EDB) is written to the reception buffer 141 again, in order from its head (header).
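The write-side steps S101 to S107 can be modelled in software as follows. This is a behavioural sketch, not the circuit: the class name, the buffer depth of 64 entries, and the boolean flags `end` and `edb` are assumptions made for illustration, and pointer wrap-around is left out.

```python
class WriteSide:
    """Software model of the write side of the reception buffer (FIG. 20)."""

    def __init__(self, depth: int = 64):
        self.buf = [None] * depth  # reception buffer entries (depth is assumed)
        self.wp = 0                # WP: next write address, 0 in the initial state
        self.hdwp = 0              # HDWP: header address of the packet being written

    def receive(self, word, end: bool = False, edb: bool = False) -> None:
        """Handle one received cycle (header or data unit)."""
        if edb:
            # S102 YES -> S107: rewind WP so the bad packet gets overwritten.
            self.wp = self.hdwp
            return
        if end:
            # S103 YES -> S104: the next packet's header will sit at WP + 1.
            self.hdwp = self.wp + 1
        self.buf[self.wp] = word   # S105: write the WDR contents at WP
        self.wp += 1               # S106: increment WP

if __name__ == "__main__":
    ws = WriteSide()
    ws.receive("HD")
    for k in range(16):
        ws.receive(f"DT{k:X}", end=(k == 15))
    print(ws.wp, ws.hdwp)
```

In this model the HDWP moves only when a packet completes with END, which is why a packet ending in EDB leaves no visible trace for the read side.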

Next, the operation of the configuration relating to the packet read processing shown in FIG. 19 (the router 14 in CPU #0) is described with reference to the flowchart shown in FIG. 21 (steps S111 to S113). The router 14 determines whether the value of the RP matches the value of the HDWP (step S111). If the two values match (YES route of step S111), the router 14 determines that the write target packet is still being written and returns to step S111.

If the value of the RP does not match the value of the HDWP (NO route of step S111), the router 14 determines that writing of the write target packet has completed and starts reading the packet from the reception buffer 141 (see timing T18 in FIG. 22). That is, the router 14 reads the entry (one unit of data) of the reception buffer 141 specified by the RP from the reception buffer 141 via the RDR (step S112). The router 14 then increments the RP by 1 (step S113) and returns to step S111.
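The read-side loop of steps S111 to S113 can be sketched in the same style. The function names are hypothetical, and the buffer is modelled as a plain list whose entries are never None.

```python
def read_step(buf: list, rp: int, hdwp: int):
    """One pass through S111 to S113.

    Returns (word, new_rp). word is None while RP equals HDWP, which the
    flowchart treats as "packet still being written".
    """
    if rp == hdwp:          # S111 YES: nothing complete to read yet
        return None, rp
    word = buf[rp]          # S112: read the entry addressed by RP via the RDR
    return word, rp + 1     # S113: increment RP

def drain(buf: list, rp: int, hdwp: int):
    """Repeat read_step until RP catches up with HDWP."""
    out = []
    while True:
        word, rp = read_step(buf, rp, hdwp)
        if word is None:
            return out, rp
        out.append(word)

if __name__ == "__main__":
    packet = ["HD"] + [f"DT{k:X}" for k in range(16)]
    print(drain(packet, 0, 17))
```

The words always leave in the order in which they were written, so a unit in the middle of the packet is delivered only after every entry ahead of it.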

FIG. 22 shows a time chart of the shortest case of fetch response packet write/read processing for the reception buffer 141. As shown in FIG. 22, in the initial state (see timing T0), WE (Write Enable) and RE (Read Enable) are set to the Low state, and the WP, HDWP, and RP are set to 0. Packet write processing is started by setting WE to the High state, and the header HD is first written to the reception buffer 141 via the WDR (see timing T1).

Thereafter, each time the WP is incremented by 1, the 16 data units DT0 to DTF are written one after another to the reception buffer 141 via the WDR (see timings T2 to T17; see the NO route of step S103 followed by steps S105 and S106 in FIG. 20).

When the data DTF and END, which indicates the last cycle of the packet, are received at timing T17, WE is set to the Low state and RE is set to the High state. Accordingly, at timing T18, the HDWP is set to WP + 1 = 17, the value indicating the address of the header of the next packet (YES route of step S103 followed by step S104 in FIG. 20). As a result, the HDWP value 17 and the RP value 1 no longer match (NO route of step S111 in FIG. 21), packet read processing is started, and the header HD is first read from the reception buffer 141 via the RDR (see timing T18).

Thereafter, each time the RP is incremented by 1, the 16 data units DT0 to DTF are read one after another from the reception buffer 141 via the RDR (see timings T19 to T34; see the NO route of step S111 followed by steps S112 and S113 in FIG. 21). This read processing continues until the value of the HDWP and the value of the RP match at timing T34 (that is, until YES is determined in step S111 of FIG. 21). In the example shown in FIG. 22, the HDWP and the RP both have the value 17 at timing T34.
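The timings quoted for the shortest case follow a simple pattern, which the sketch below writes out as arithmetic. The function names are hypothetical; the constants are the ones given for FIG. 22 (header written at T1, 16 data units, reading started one timing after END).

```python
UNITS = 16            # data units DT0 to DTF in one fetch response packet
HEADER_WRITE_T = 1    # the header HD is written at timing T1

def write_timing(k: int) -> int:
    """Timing at which data unit k (0 to 15) is written: DT0 at T2, DTF at T17."""
    return HEADER_WRITE_T + 1 + k

def read_start_timing() -> int:
    """Timing at which reading starts, one after the END cycle: T18."""
    return write_timing(UNITS - 1) + 1

def read_timing(k: int) -> int:
    """Timing at which data unit k is read: DT0 at T19, DTF at T34."""
    return read_start_timing() + 1 + k

if __name__ == "__main__":
    for k in (0, 9, 15):
        print(k, write_timing(k), read_timing(k))
```

For example, the tenth data unit (k = 9) is written at T11 but read at T28, so it spends 17 timings in the reception buffer even in this shortest case.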

As described above with reference to FIGS. 20 to 22, packet read processing starts only after the last unit of data has been written in the last cycle of the packet. This is because the packet must be discarded if its last cycle (its end) is EDB.

FIG. 23 shows a time chart for the case in which the end of the received fetch response packet is EDB. FIG. 23 illustrates the case in which, following the same procedure as in FIG. 22, EDB is detected at write timing T10 for the packet data DT8. In this case, at timing T11, the value of the HDWP (0 in FIG. 23) is stored as the value of the WP (see the YES route of step S102 followed by step S107 in FIG. 20).

これにより、受信バッファ１４１からの読出処理を行なうことなく、現在の書込対象パケット（末尾にＥＤＢを設定されたパケット）が、再度、先頭（ヘッダ）から順に受信バッファ１４１へ書き込まれる。その際、末尾にＥＤＢを設定されたパケットは、後続のパケットによって上書きされる。 As a result, the current packet to be written (the packet with the EDB set at the end) is again written to the reception buffer 141 in order from the beginning (header) without performing the reading process from the reception buffer 141. At that time, the packet with the EDB set at the end is overwritten by the subsequent packet.
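The write-side pointer handling described above can be sketched as a small model. This is only an illustration under stated assumptions: the class name `ReceiveBufferWriter`, the buffer size, and the string markers `"END"` and `"EDB"` are hypothetical, and the model advances WP immediately after each write rather than reproducing the cycle-accurate timing of FIGS. 22 and 23.

```python
class ReceiveBufferWriter:
    """Toy model of the related-art write side (WP and HDWP). Names are hypothetical."""

    def __init__(self, size=64):
        self.buf = [None] * size  # reception buffer, one entry per unit of data
        self.wp = 0               # WP: address of the next entry to write
        self.hdwp = 0             # HDWP: header address of the packet being written

    def write(self, unit, marker=None):
        """Handle one received cycle; marker is None, "END" or "EDB"."""
        if marker == "EDB":
            # Transmission error: rewind WP to the header address so that the
            # following packet overwrites the discarded one. HDWP is unchanged,
            # so the read side never sees the bad packet.
            self.wp = self.hdwp
            return
        self.buf[self.wp] = unit
        self.wp = (self.wp + 1) % len(self.buf)
        if marker == "END":
            # Packet complete: HDWP now points at the next header address,
            # which makes HDWP differ from RP and lets reading start.
            self.hdwp = self.wp


# A good packet: header plus 16 data units, END on the last one.
good = ReceiveBufferWriter()
good.write("HD")
for i in range(16):
    good.write(f"DT{i:X}", marker="END" if i == 15 else None)

# A bad packet: EDB arrives where DT8 would have been written.
bad = ReceiveBufferWriter()
bad.write("HD")
for i in range(8):
    bad.write(f"DT{i:X}")
bad.write(None, marker="EDB")
```

In this sketch `good.hdwp` ends at 17, matching the value WP + 1 = 17 given for FIG. 22, while `bad.wp` returns to 0, matching the rollback shown for FIG. 23.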

[2] Overview of the present embodiment
As described above, in the network 30 using high-speed serial transmission, transmission errors are checked by CRC. When a transmission error is detected, the tail of the packet is rewritten to EDB. The endpoint (CPU#0) therefore cannot determine whether a packet is valid until it has received all data up to the end of the packet containing the response data (processing target data).

Accordingly, in CPU#0, which issued the fetch request, all data up to the end of the packet containing the response data for that fetch request is first stored in the reception buffer 141, and only then is the data in the reception buffer 141 sent to the core 11 in order from the head.

However, because the reception buffer 141 is a FIFO, the processing target data requested by the core 11 (one unit of data; for example, 8 bytes) cannot be read from the reception buffer 141 until the data immediately preceding that unit has been read. Communication latency as seen from the core 11 therefore increases, and the processing performance of CPU#0, and of the information processing device (multiprocessor system) that includes CPU#0, may be degraded.

このため、高速シリアル伝送のネットワーク３０で接続されるマルチプロセッサシステムにおいて、コア１１が要求する処理対象データの読出レイテンシを短縮することが望まれている。 Therefore, in a multiprocessor system connected by the high-speed serial transmission network 30, it is desired to reduce the read latency of data to be processed requested by the core 11.

Therefore, in the present embodiments (the first and second embodiments), for example, of the 128-byte data block in a fetch response packet read out and transferred from another CPU 10, the 8 bytes of processing target data needed by the core 11 are read from the reception buffer 141 first and sent to the core 11. This shortens the communication latency seen from the core 11 (the read latency of the processing target data).

That is, in the information processing device of the first embodiment described later, the entire response packet for a fetch request is written to the reception buffer 141 of the endpoint, as in the related art described above. The endpoint is the receiving CPU 10A (see FIG. 1; first arithmetic processing device, CPU#0), which is connected to the high-speed serial transmission network 30 and receives the fetch response packet. However, as described later with reference to FIGS. 1 to 7, when the packet is read from the reception buffer 141 after being written, the 8 bytes of processing target data requested by the core 11 are read first, rather than in the order in which the data was written to the reception buffer 141.

In the information processing device of the first embodiment described later, so that the 8 bytes of processing target data requested by the core 11 can be read first, the fetch response packet includes a physical address (pa[6:3]) identifying the processing target data requested by the core 11 (see FIGS. 2 and 3). At the endpoint, HDRP (header read pointer), a Length register, and CycleCT (cycle counter), all described later, are added to the related-art packet read configuration (see RP and RDR in FIG. 19) (see the reading unit 143 in FIGS. 4 and 5). Controlling RP by means of HDRP, the Length register, and CycleCT makes it possible to read the 8 bytes of processing target data requested by the core 11 first (see FIGS. 5 to 7).

In this way, reading the processing target data requested by the core 11 of CPU 10A from the reception buffer 141 first shortens the read latency. For example, when the processing target data requested by the core 11 is the last unit in the packet, reading it first shortens the latency by [number of cycles in the data portion of the packet − 1]. Conversely, when the processing target data is the first unit in the packet, the latency is unchanged. That is, the number of cycles saved depends on the position of the processing target data within the packet. On average, therefore, the latency can be shortened by about [number of cycles in the data portion of the packet − 1] / 2.
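The average figure quoted above can be checked with a short calculation. The sketch below assumes that every position in the data portion is equally likely to hold the requested unit, which the text implies but does not state; the function names are hypothetical.

```python
def cycles_saved(index):
    """Cycles saved when the requested unit sits at 0-based position `index`
    in the data portion and is read first instead of in FIFO order."""
    # In FIFO order, `index` other data units would be read before it.
    return index


def average_cycles_saved(data_cycles):
    """Mean saving over all positions, assuming each is equally likely."""
    return sum(cycles_saved(i) for i in range(data_cycles)) / data_cycles


print(cycles_saved(15))          # last of 16 units: prints 15
print(average_cycles_saved(16))  # prints 7.5, i.e. (16 - 1) / 2
```

For a 128-byte data block sent as sixteen 8-byte units, the saving therefore ranges from 0 to 15 cycles and averages 7.5 cycles under this assumption.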

The second embodiment, described later with reference to FIG. 1 and FIGS. 8 to 13, addresses stride access, which arises in matrix calculations and the like, where a single packet with data contains multiple required units of data (processing target data). When handling arrays in matrix calculations and the like, the core 11 requests multiple units of data at addresses separated by a fixed interval shorter than the packet length, and those units are all contained in one packet.

In such a case, in the second embodiment, when a packet is read from the reception buffer 141, the required units of data are packed ahead of the other units (toward the head) and sent out, so that the required units are read before the other units. This greatly shortens the latency seen from the core 11 of CPU 10A.
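The reordering idea for stride access can be illustrated as follows. This is a rough sketch, not the mechanism of the second embodiment, whose details are given later in the description: the function name `stride_read_order`, its parameters, and the choice to send the remaining units in ascending order are all assumptions made for illustration.

```python
def stride_read_order(n_units, first, stride):
    """Return unit indices in send order: the strided (required) units first,
    then all remaining units in ascending order (an assumed ordering)."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    needed = list(range(first, n_units, stride))
    needed_set = set(needed)
    rest = [i for i in range(n_units) if i not in needed_set]
    return needed + rest


# 16 units per packet, core needs every 4th unit starting at unit 1.
order = stride_read_order(16, first=1, stride=4)
print(order[:4])  # prints [1, 5, 9, 13]
```

With this ordering the four required units leave the buffer in the first four data cycles, instead of the last one arriving in the fourteenth data cycle as it would in FIFO order.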

[3] Information processing device of the first embodiment
The configuration of an information processing device (multiprocessor system) 1 including arithmetic processing devices (CPUs) 10A and 10B as the first embodiment of the present invention is described with reference to FIG. 1. FIG. 1 is a block diagram showing that configuration.

図１に示すように、第１実施形態の情報処理装置１は、複数（図１で２個）のＣＰＵ１０Ａ，１０Ｂを有するマルチプロセッサシステムであり、例えばＰＣ，サーバである。ＣＰＵ１０Ａは、第１の演算処理装置に相当し、受信側ＣＰＵもしくはＣＰＵ＃０と表記する場合がある。また、ＣＰＵ１０Ｂは、第２の演算処理装置に相当し、送信側ＣＰＵもしくはＣＰＵ＃１と表記する場合がある。なお、第１実施形態の情報処理装置１には２個のＣＰＵが備えられているが、本発明は、これに限定されるものでなく、３個以上のＣＰＵが備えられてもよい。 As shown in FIG. 1, the information processing apparatus 1 according to the first embodiment is a multiprocessor system having a plurality of (two in FIG. 1) CPUs 10A and 10B, and is, for example, a PC or a server. The CPU 10A corresponds to a first arithmetic processing device, and may be referred to as a receiving CPU or CPU # 0 in some cases. The CPU 10B corresponds to a second arithmetic processing unit, and may be referred to as a transmitting CPU or a CPU # 1 in some cases. Although the information processing apparatus 1 according to the first embodiment includes two CPUs, the present invention is not limited to this, and may include three or more CPUs.

ＣＰＵ１０ＡとＣＰＵ１０Ｂとは、高速シリアル伝送によるネットワーク３０を介して相互に通信可能に接続される。 The CPU 10A and the CPU 10B are communicably connected to each other via a network 30 using high-speed serial transmission.

ＣＰＵ１０Ａは、図１５に示すＣＰＵ１０と同様、複数のコア１１Ａを内蔵したマルチコアプロセッサである。ＣＰＵ１０Ａにおいては、複数のコア１１が共有キャッシュ（三次キャッシュ）１２Ａを介して接続されている。図１では、コア１１Ａとして、Ｎ＋１個のコア＃０〜コア＃Ｎ（Ｎは１以上の整数）が備えられている。図１では、一次キャッシュ（不図示）および二次キャッシュ（不図示）が各コア１１に内蔵され、三次キャッシュが、共有キャッシュ１２Ａとして用いられているが、一次キャッシュを各コア１１Ａに内蔵し、二次キャッシュを共有キャッシュ１２Ａとして用いてもよい。 The CPU 10A is a multi-core processor including a plurality of cores 11A, like the CPU 10 shown in FIG. In the CPU 10A, a plurality of cores 11 are connected via a shared cache (tertiary cache) 12A. In FIG. 1, N + 1 cores # 0 to #N (N is an integer of 1 or more) are provided as the core 11A. In FIG. 1, a primary cache (not shown) and a secondary cache (not shown) are built in each core 11, and a tertiary cache is used as a shared cache 12A. However, a primary cache is built in each core 11A, A secondary cache may be used as the shared cache 12A.

共有キャッシュ１２Ａには、コア１１ＡのほかにＭＡＣ１３Ａおよびルータ１４Ａが接続されている。ＭＡＣ１３Ａは、メインメモリとして機能するＤＩＭＭ２０Ａを接続され、ＤＩＭＭ２０Ａとのデータのやり取りを制御する制御部として機能する。ルータ１４Ａは、他のＣＰＵ１０Ｂまたは他のルータ（例えばルーティング用のＬＳＩ；図１４参照）に接続され、パケットによって、他のＣＰＵ１０Ｂとのデータの通信を行なう第１の通信部として機能する。 The MAC 13A and the router 14A are connected to the shared cache 12A in addition to the core 11A. The MAC 13A is connected to the DIMM 20A functioning as a main memory, and functions as a control unit that controls data exchange with the DIMM 20A. The router 14A is connected to another CPU 10B or another router (for example, a routing LSI; see FIG. 14), and functions as a first communication unit that performs data communication with the other CPU 10B by a packet.

ＣＰＵ１０Ｂは、コア１１Ｂ，共有キャッシュ（三次キャッシュ）１２Ｂ，ＭＡＣ１３Ｂ，ルータ１４Ｂを有している。コア１１Ｂ，共有キャッシュ１２Ｂ，ＭＡＣ１３Ｂ，ルータ１４Ｂは、それぞれ、上述したＣＰＵ１０Ａのコア１１Ａ，共有キャッシュ１２Ａ，ＭＡＣ１３Ａ，ルータ１４Ａと同様である。ただし、ＭＡＣ１３Ｂは、メインメモリとして機能するＤＩＭＭ２０Ｂを接続され、ＤＩＭＭ２０Ｂとのデータのやり取りを制御する制御部として機能する。ルータ１４Ｂは、他のＣＰＵ１０Ａまたは他のルータ（例えばルーティング用のＬＳＩ；図１４参照）に接続され、パケットによって、他のＣＰＵ１０Ａとのデータの通信を行なう第２の通信部として機能する。 The CPU 10B has a core 11B, a shared cache (tertiary cache) 12B, a MAC 13B, and a router 14B. The core 11B, the shared cache 12B, the MAC 13B, and the router 14B are the same as the above-described core 11A, shared cache 12A, MAC 13A, and router 14A of the CPU 10A, respectively. However, the MAC 13B is connected to the DIMM 20B functioning as a main memory, and functions as a control unit that controls data exchange with the DIMM 20B. The router 14B is connected to another CPU 10A or another router (for example, a routing LSI; see FIG. 14), and functions as a second communication unit that performs data communication with the other CPU 10A by a packet.

ＣＰＵ１０Ａにおける複数のコア（処理部）１１Ａのうちの少なくとも一つは、必要な処理対象データ（例えば８バイトの単位データ）の読出要求を生成する。以下では、読出要求をフェッチ要求という場合がある。 At least one of the plurality of cores (processing units) 11A in the CPU 10A generates a read request for necessary processing target data (for example, 8-byte unit data). Hereinafter, the read request may be referred to as a fetch request.

ＣＰＵ１０Ａにおけるルータ（第１の通信部）１４Ａは、一のコア１１Ａが生成したフェッチ要求を、フェッチ要求パケット（図２，図１７（Ａ）参照）によってＣＰＵ１０Ｂへ送信する。また、ルータ（第１の通信部）１４Ａは、フェッチ要求に対応する処理対象データを含むデータブロックを添付されたフェッチ応答パケット（図２，図１７（Ｂ）参照）を、ＣＰＵ１０Ｂから受信する。 The router (first communication unit) 14A in the CPU 10A transmits a fetch request generated by one core 11A to the CPU 10B by a fetch request packet (see FIGS. 2 and 17A). Further, the router (first communication unit) 14A receives from the CPU 10B a fetch response packet (see FIGS. 2 and 17B) to which a data block including data to be processed corresponding to the fetch request is attached.

一方、ＣＰＵ１０Ｂにおけるルータ（第２の通信部）１４Ｂは、フェッチ要求パケット（図２，図１７（Ａ）参照）によって、ＣＰＵ１０Ａからフェッチ要求を受信する。また、ルータ（第２の通信部）１４Ｂは、フェッチ要求に対応する処理対象データを含むデータブロックを添付されたフェッチ応答パケット（図２，図１７（Ｂ）参照）を、ＣＰＵ１０Ａへ送信する。 On the other hand, the router (second communication unit) 14B in the CPU 10B receives a fetch request from the CPU 10A by a fetch request packet (see FIGS. 2 and 17A). Further, the router (second communication unit) 14B transmits to the CPU 10A a fetch response packet (see FIGS. 2 and 17B) to which a data block including processing target data corresponding to the fetch request is attached.

In particular, in the first embodiment, the router (second communication unit) 14B of CPU 10B attaches to the data block a response header that records the address information pa[6:3] of the processing target data, extracted from the address information (see PA in FIG. 18(A)) contained in the fetch request (the header of the fetch request packet). Here, the response header is, for example, the header of the fetch response packet described later with reference to FIG. 3. The router (second communication unit) 14B then transmits the data block with the response header attached to CPU 10A as a fetch response packet (see FIG. 2 and FIG. 17(B)).

The router (first communication unit) 14A of CPU 10A has a reception buffer 141, a writing unit 142, and a reading unit 143. In the first embodiment, the reception buffer 141, the writing unit 142, and the reading unit 143 are provided inside the router 14A of CPU 10A, but they need only be provided somewhere within CPU 10A.

受信バッファ（バッファ）１４１は、データブロックを添付されたデータ付きパケットを単位データ（例えば８バイト）で保存する。 The reception buffer (buffer) 141 stores a packet with data to which a data block is attached as unit data (for example, 8 bytes).

書込部１４２は、ルータ１４Ａによって受信されたパケット（ヘッダおよびデータブロック）に含まれる複数の単位データを、受信バッファ１４１に順次書き込む。第１実施形態の書込部１４２は、図４を参照しながら後述するごとく、図１９に示した関連技術と同様に構成されている。 The writing unit 142 sequentially writes a plurality of unit data included in the packet (header and data block) received by the router 14A into the reception buffer 141. The writing unit 142 of the first embodiment has the same configuration as the related art shown in FIG. 19, as described later with reference to FIG.

The reading unit 143 preferentially reads from the reception buffer 141 the processing target data, which is at least one of the units of data contained in the packet. In particular, the reading unit 143 of the first embodiment refers to the address information pa[6:3] of the processing target data recorded in the response header attached to the data block. The reading unit 143 first reads the processing target data corresponding to that address information pa[6:3] from the reception buffer 141, and then reads the remaining units of data from the reception buffer 141 in sequence. The reading unit 143 of the first embodiment is configured as described later with reference to FIGS. 4 and 5, and operates as described later with reference to FIGS. 6 and 7.

なお、第１実施形態では、ＣＰＵ１０Ａが、第１の通信部としての機能や、受信バッファ１４１，書込部１４２，読出部１４３としての機能を有し、ＣＰＵ１０Ｂが、第２の通信部としての機能を有する場合について説明している。しかし、複数のＣＰＵ１０Ａ，１０Ｂのそれぞれが、第１および第２の通信部としての機能と、受信バッファ１４１，書込部１４２，読出部１４３としての機能とを有していてもよい。 In the first embodiment, the CPU 10A has a function as a first communication unit and a function as a reception buffer 141, a writing unit 142, and a reading unit 143, and the CPU 10B has a function as a second communication unit. The case of having a function is described. However, each of the plurality of CPUs 10A and 10B may have a function as the first and second communication units and a function as the reception buffer 141, the writing unit 142, and the reading unit 143.

Here, with reference to FIG. 2, packet routing is described for the case where, in the information processing device 1 shown in FIG. 1, the core 11A of one CPU 10A issues a fetch request for memory data of another CPU 10B.

In this case, as shown in FIG. 2, a fetch request packet (a packet without data) for reading the required processing target data is issued and transmitted by CPU 10A and routed via the network 30 to CPU 10B, which has the memory 20B holding that data. CPU 10B, having received the fetch request packet at the router 14B, reads a data block containing the requested processing target data from the memory 20B, generates a fetch response packet (a packet with data), and transmits it to CPU 10A.

FIG. 3 shows the format of the header (response header) of the fetch response packet generated by CPU 10B. As shown in FIG. 3, in the first embodiment the header of the fetch response packet carries a 4-bit physical address pa[6:3] that identifies the unit of data targeted by the fetch request (the processing target data). The 4-bit physical address pa[6:3] indicates which of the units of data in the data block of the fetch response packet (for example, sixteen 8-byte units) was requested by the core 11A on the CPU 10A side.

When CPU 10B generates the fetch response packet, the 4-bit physical address pa[6:3], which identifies the unit of data targeted by the fetch request, is extracted from the physical address PA carried in the header of the fetch request packet (see FIG. 18(A)). Including the extracted physical address pa[6:3] in the header described above with reference to FIG. 18(B) yields a response header in the format shown in FIG. 3. As shown in FIG. 2, the fetch response packet with this response header is routed and transmitted to CPU 10A, the issuer of the fetch request.
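Extracting pa[6:3] amounts to a shift and a mask. The sketch below assumes the sizes used as examples in the text (a 128-byte data block divided into sixteen 8-byte units); the function name `unit_index` and the sample address are hypothetical.

```python
UNIT_BYTES = 8     # size of one unit of data (example value from the text)
BLOCK_BYTES = 128  # size of the data block in a fetch response packet


def unit_index(pa):
    """Return pa[6:3]: which of the 16 units in the 128-byte block is requested."""
    # Bits 2:0 address a byte within the 8-byte unit, so drop them with >> 3,
    # then keep the next four bits (6:3) with the mask 0xF.
    return (pa >> 3) & 0xF


pa = 0x1050                 # hypothetical physical address
print(unit_index(pa))       # prints 10, i.e. unit DTA
```

The same value is obtained as `(pa % BLOCK_BYTES) // UNIT_BYTES`, which makes the role of the four bits explicit: they give the position of the requested unit inside its block.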

The fetch response packet received on the CPU 10A side is first written temporarily, one unit of data at a time, into the reception buffer 141 provided in the router 14A by the writing unit 142 (see FIGS. 1 and 4). After that, by controlling the value of RP (the address of the unit of data to be read) in the reading unit 143, the processing target data requested by the core 11A is read preferentially from the reception buffer 141 out of the packet written there. FIG. 4 is a block diagram showing the reception buffer 141 included in the router 14A of CPU 10A shown in FIG. 1, together with the configuration for packet write/read processing on that buffer (the writing unit 142 and the reading unit 143).

図４に示すように、第１実施形態の書込部１４２は、図１９に示した関連技術の受信バッファ１４１に対するパケット書込処理に係る構成と同様のＷＤＲ，ＨＤＷＰおよびＷＰを有する。また、第１実施形態の読出部１４３は、図１９に示した関連技術の受信バッファ１４１に対するパケット書込処理に係る構成と同様のＲＤＲおよびＲＰに加え、ＨＤＲＰ（Header Read Pointer；ヘッダ読出ポインタ），LengthレジスタおよびCycleＣＴ（Cycle Counter；サイクルカウンタ）を有する。 As illustrated in FIG. 4, the writing unit 142 according to the first embodiment has the same WDR, HDWP, and WP as the configuration related to the packet writing process on the reception buffer 141 according to the related art illustrated in FIG. Further, the reading unit 143 of the first embodiment includes an HDRP (Header Read Pointer) in addition to the same RDR and RP as the configuration related to the packet writing process to the reception buffer 141 of the related art shown in FIG. , Length register and CycleCT (Cycle Counter).

ＲＤＲは、受信バッファ１４１から読み出された読出データ（一単位データ；例えば８バイトデータ）を一時的に保存するレジスタである。ＲＰは、受信バッファ１４１からのデータの読出を制御するもので、受信バッファ１４１から読み出すデータのアドレスを指定するポインタである。ＲＰは、初期状態では０を設定され、受信バッファ１４１からのデータ読出時に１サイクル毎に１ずつインクリメントされる。ＲＰによって指定されるアドレスのデータは、受信バッファ１４１から読み出され、ＲＤＲに一時的に保存され、前述したように共有キャッシュ１２Ａに登録されるとともにフェッチ要求を行なったコア１１へ送信される。 The RDR is a register for temporarily storing read data (one unit data; for example, 8 byte data) read from the reception buffer 141. The RP controls reading of data from the reception buffer 141, and is a pointer that specifies an address of data to be read from the reception buffer 141. RP is set to 0 in an initial state, and is incremented by one every cycle when data is read from the reception buffer 141. The data at the address specified by the RP is read from the reception buffer 141, temporarily stored in the RDR, registered in the shared cache 12A and transmitted to the core 11 that has issued the fetch request as described above.

HDRP is a pointer indicating the address of the header of the packet being read from the reception buffer 141, and is set to 0 in the initial state. The Length register holds the data length (packet length) Length of the packet being read from the reception buffer 141. When the header is read from the reception buffer 141, the Length generation unit 143a (see FIG. 5) generates Length from the Opcode (packet type) in that header and sets it in the Length register. CycleCT is a counter indicating which unit of data in the packet is currently being read from the reception buffer 141; it is incremented by 1 every cycle, that is, each time one unit of data is read.

Next, the configuration for the packet read processing shown in FIG. 4, that is, the reading unit 143, is described in more detail with reference to FIG. 5. FIG. 5 is a block diagram showing the detailed configuration of the reading unit 143. The reading unit 143 shown in FIG. 5 operates as described later with reference to FIGS. 6 and 7. As shown in FIG. 5, the reading unit 143 of the first embodiment includes a Length generation unit 143a, +1 adders 143b and 143d, adders 143c, 143e, and 143f, and a selector 143g.

Length生成部１４３ａは、図４を参照しながら前述したように、ヘッダを受信バッファ１４１から読み出した際に、当該ヘッダにおけるOpcodeから、受信バッファ１４１から読出中のパケットのデータ長Lengthを生成し、Lengthレジスタに設定する。 As described above with reference to FIG. 4, when reading the header from the reception buffer 141, the Length generation unit 143a generates the data length Length of the packet being read from the reception buffer 141 from the Opcode in the header, Set in the Length register.

The +1 adder 143b adds 1 to the value indicated by HDRP (hereinafter simply HDRP).

The adder 143c adds the data length Length in the Length register to the value HDRP + 1 from the +1 adder 143b, and sets the result, HDRP + Length + 1, in HDRP (see step S24 in FIG. 6). The adder 143c operates when the value indicated by CycleCT (hereinafter simply CycleCT) reaches the data length Length and CycleCT is reset (initialized) (see step S22 in FIG. 6 and timing T34 in FIG. 7).

The +1 adder 143d adds 1 to the value indicated by RP (hereinafter simply RP).

The adder 143e adds the physical address pa[6:3], which identifies the processing target data requested by the core 11A, to the value RP + 1 from the +1 adder 143d, and sets the result, RP + pa[6:3] + 1, in RP (see step S16 in FIG. 6). The physical address pa[6:3] is read from the header of the packet being read via RD-BUS (read bus). The adder 143e operates when the header is read from a fetch response packet (YES route of step S15 in FIG. 6; see timing T18 in FIG. 7). At that timing, the selector 143g switches so that the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (see (1) in FIGS. 5 to 7).

加算器１４３ｆは、Lengthレジスタにおけるデータ長Lengthと、１加算器１４３ｂからの値ＨＤＲＰ＋１とを加算し、得られた値ＨＤＲＰ＋Length＋１をＲＰに設定する（図６のステップＳ２３参照）。加算器１４３ｆの動作タイミングは、CycleＣＴがデータ長LengthになってCycleＣＴをリセット（初期化）するタイミング（図６のステップＳ２２；図７のタイミングＴ３４参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｆで得られた値ＨＤＲＰ＋Length＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(4)参照）。 The adder 143f adds the data length Length in the Length register and the value HDRP + 1 from the one adder 143b, and sets the obtained value HDRP + Length + 1 to RP (see step S23 in FIG. 6). The operation timing of the adder 143f is a timing at which CycleCT becomes the data length Length and resets (initializes) CycleCT (Step S22 in FIG. 6; see timing T34 in FIG. 7). At this timing, the selector 143g performs a switching operation to set the value HDRP + Length + 1 obtained by the adder 143f to RP (see (4) in FIGS. 5 to 7).

また、セレクタ１４３ｇは、１加算器１４３ｄで得られた値ＲＰ＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(2)参照）。当該切替動作を行なうタイミングは、ヘッダのOpcodeがフェッチ応答でない場合（図６のステップＳ１５のＮＯルート参照）、もしくは、ＲＰがＨＤＲＰ＋Lengthと一致しないタイミング（図６のステップＳ１９のＮＯルート；図７のタイミングＴ１９〜Ｔ２３，Ｔ２５〜Ｔ３３参照）である。 The selector 143g performs a switching operation to set the value RP + 1 obtained by the one adder 143d to RP (see (2) in FIGS. 5 to 7). The timing for performing the switching operation is when the Opcode of the header is not a fetch response (see the NO route in step S15 in FIG. 6), or when the RP does not match HDRP + Length (NO route in step S19 in FIG. 6; FIG. 7). Timings T19 to T23 and T25 to T33).

さらに、セレクタ１４３ｇは、１加算器１４３ｂで得られた値ＨＤＲＰ＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(3)参照）。当該切替動作を行なうタイミングは、ＲＰがＨＤＲＰ＋Lengthと一致するタイミング（図６のステップＳ１９のＹＥＳルート；図７のタイミングＴ２４参照）である。 Further, the selector 143g performs a switching operation so that the value HDRP + 1 obtained by the one adder 143b is set to RP (see (3) in FIGS. 5 to 7). The timing at which the switching operation is performed is a timing at which RP matches HDRP + Length (YES route in step S19 in FIG. 6; see timing T24 in FIG. 7).
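The four selector cases (1) to (4) described above determine the order in which RP visits the entries of one fetch response packet. The following sketch traces that order in software. It is an illustration only: the function name `read_order` is hypothetical, buffer wrap-around is not modelled, and the check of CycleCT before the check of RP follows the order of steps S18 and S19 in FIG. 6.

```python
def read_order(hdrp, length, pa_6_3):
    """Trace the RP values for one fetch response packet.

    hdrp   : address of the packet header in the reception buffer
    length : number of data units in the packet (Length register)
    pa_6_3 : index of the requested unit, taken from the response header
    Returns (list of addresses read, address of the next packet's header).
    """
    order = [hdrp]                # the header is read first, with RP == HDRP
    rp = hdrp + pa_6_3 + 1        # selector (1): jump to the requested unit
    cycle_ct = 1                  # CycleCT after the header cycle
    while True:
        order.append(rp)          # read the entry that RP points at
        if cycle_ct == length:    # all data units of the packet have been read
            next_header = hdrp + length + 1  # selector (4), also the new HDRP
            return order, next_header
        if rp == hdrp + length:   # last data entry of the packet reached
            rp = hdrp + 1         # selector (3): wrap to the first data unit
        else:
            rp += 1               # selector (2): plain increment
        cycle_ct += 1


# Example matching FIG. 7: HDRP = 0, Length = 16, requested unit DTA.
order, next_header = read_order(0, 16, 0xA)
print(order)        # header, then addresses 11..16, then 1..10
print(next_header)  # prints 17
```

In this trace the requested unit at address 11 is read immediately after the header, the wrap from address 16 back to address 1 happens once, and both RP and HDRP end at 17 for the next packet.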

The functions of the writing unit 142 and the reading unit 143 described above may be implemented in hardware, built into CPU 10A with logic gates and the like, or in software, built into CPU 10A by executing a program.

次に、図５〜図７を参照しながら、上述のごとく構成された書込部１４２および読出部１４３の動作について説明する。図６は、図５に示すパケット読出処理に係る構成（読出部１４３）の動作を説明するフローチャートである。図７は、図４および図５に示すパケット読出処理に係る構成（読出部１４３）が図６に示すフローチャートに従って１０番目のデータ（ＤＴＡ）を最初に読み出す場合の動作を示すタイムチャートである。 Next, operations of the writing unit 142 and the reading unit 143 configured as described above will be described with reference to FIGS. FIG. 6 is a flowchart illustrating the operation of the configuration (reading unit 143) related to the packet reading process shown in FIG. FIG. 7 is a time chart showing an operation when the configuration (reading unit 143) related to the packet reading process shown in FIGS. 4 and 5 first reads the tenth data (DTA) according to the flowchart shown in FIG.

The packet write operation by the writing unit 142 of the first embodiment is the same as the related-art operation described above with reference to FIG. 20, so its description is omitted. In contrast, the packet read operation by the reading unit 143 of the first embodiment differs from the related-art operation described above with reference to FIG. 21.

ここで、図６に示すフローチャート（ステップＳ１１〜Ｓ２４）に従って、図５および図７を参照しながら、第１実施形態の読出部１４３によるパケット読出動作について説明する。 Here, the packet reading operation by the reading unit 143 according to the first embodiment will be described with reference to FIGS. 5 and 7 according to the flowchart (steps S11 to S24) shown in FIG.

ルータ１４Ａは、ＲＰの値とＨＤＷＰの値とが一致しているか否かを判断する（ステップＳ１１）。ＲＰの値とＨＤＷＰの値とが一致している場合（ステップＳ１１のＹＥＳルート）、ルータ１４Ａは、書込対象パケットの書込中であると判断し、ステップＳ１１の処理に戻る。 The router 14A determines whether the value of RP matches the value of HDWP (step S11). When the value of the RP matches the value of the HDWP (YES route in step S11), the router 14A determines that the packet to be written is being written, and returns to the process in step S11.

ＲＰの値とＨＤＷＰの値とが一致しない場合（ステップＳ１１のＮＯルート）、ルータ１４Ａは、書込対象パケットの書込を完了したと判断し、受信バッファ１４１からのパケット読出を開始する（図７のタイミングＴ１８参照）。つまり、ＲＰによって指定される、受信バッファ１４１のエントリ（一単位データ）が、受信バッファ１４１からＲＤＲ経由で読み出される（ステップＳ１２）。そして、読み出されたエントリがヘッダ（ＨＤ）であるか否かが判断される（ステップＳ１３）。 If the value of RP does not match the value of HDWP (NO route in step S11), the router 14A determines that the writing of the packet to be written has been completed, and starts reading the packet from the reception buffer 141 (FIG. 7, timing T18). That is, the entry (one unit data) of the reception buffer 141 specified by the RP is read from the reception buffer 141 via the RDR (step S12). Then, it is determined whether the read entry is a header (HD) (step S13).

パケット読出の開始時には、まずパケットのヘッダが読み出されるため、読み出されたエントリはヘッダであると判断される（ステップＳ１３のＹＥＳルート）。この場合、読み出されたヘッダのOpcodeが参照され、当該Opcode（パケット種）に基づき、受信バッファ１４１から読出中のパケットのデータ長Lengthが生成され、生成されたデータ長Lengthが読出部１４３のLengthレジスタに設定される（ステップＳ１４）。 At the start of packet reading, the header of the packet is first read, so that the read entry is determined to be the header (YES route in step S13). In this case, the read header Opcode is referred to, the data length Length of the packet being read from the reception buffer 141 is generated based on the Opcode (packet type), and the generated data length Length is read by the reading unit 143. This is set in the Length register (step S14).

この後、当該Opcodeがフェッチ応答であるか否かを判断する（ステップＳ１５）。フェッチ応答である場合（ステップＳ１５のＹＥＳルート）、セレクタ１４３ｇは図５の(1)を選択するように切替動作を行なう。これにより、加算器１４３ｅで得られた値ＲＰ＋ｐａ[6:3]＋１がＲＰに設定され（ステップＳ１６）、CycleＣＴが１インクリメントされる（ステップＳ１７）。そして、ＲＰに設定された値（アドレス）で指定されるエントリ（データＤＴＡ）が読み出される（ステップＳ１１のＮＯルートからステップＳ１２）。つまり、パケットのヘッダを読み出した際に、ヘッダの物理アドレスｐａ[6:3]に基づき、コア１１Ａが要求しているデータに対応する受信バッファ１４１のアドレスＲＰ＋ｐａ[6:3]＋１がＲＰに設定される。図７に示す例では、１０番目のデータＤＴＡに対応するアドレスを示す１１が、ＲＰにセットされる。 Thereafter, it is determined whether or not the Opcode is a fetch response (step S15). If it is a fetch response (YES route in step S15), the selector 143g performs a switching operation to select (1) in FIG. 5. As a result, the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (step S16), and CycleCT is incremented by 1 (step S17). Then, the entry (data DTA) specified by the value (address) set in RP is read (from the NO route of step S11 to step S12). That is, when the header of the packet is read, the address RP + pa[6:3] + 1 of the reception buffer 141 corresponding to the data requested by the core 11A is set in RP, based on the physical address pa[6:3] in the header. In the example shown in FIG. 7, the value 11, which is the address corresponding to the tenth data DTA, is set in RP.

この後、次のサイクルで、ヘッダの次に読み出されるエントリはデータであり、この場合（ステップＳ１３のＮＯルート）、CycleＣＴの値がデータ長（パケット長）Lengthに到達したか否かが判断される（ステップＳ１８）。つまり、読出対象パケットの全てのデータが読み出されたか否かが判断される。図７では、Length＝１６の例が示されている。 Thereafter, in the next cycle, the entry to be read next to the header is data. In this case (NO route in step S13), it is determined whether the value of CycleCT has reached the data length (packet length) Length. (Step S18). That is, it is determined whether or not all the data of the read target packet has been read. FIG. 7 shows an example in which Length = 16.

CycleＣＴ＝Lengthでない場合（ステップＳ１８のＮＯルート）、ＲＰの値が値ＨＤＲＰ＋Length（図７では値１６）に到達したか否かが判断される（ステップＳ１９）。ＲＰ＝ＨＤＲＰ＋Lengthでない場合（ステップＳ１９のＮＯルート）、もしくは、ヘッダのOpcodeがフェッチ応答でない場合（ステップＳ１５のＮＯルート）、セレクタ１４３ｇは図５の(2)を選択するように切替動作を行なう。これにより、ＲＰの値が値ＨＤＲＰ＋Lengthに到達するまで、一単位データを読み出す都度（図７のタイミングＴ１９〜Ｔ２３参照）、ＲＰの値が１インクリメントされ（ステップＳ２０）、CycleＣＴが１インクリメントされた後（ステップＳ１７）、処理はステップＳ１１に戻る。 If CycleCT is not Length (NO route in step S18), it is determined whether the value of RP has reached the value HDRP + Length (value 16 in FIG. 7) (step S19). When RP is not HDRP + Length (NO route in step S19), or when the Opcode of the header is not a fetch response (NO route in step S15), the selector 143g performs a switching operation to select (2) in FIG. Thus, each time one unit of data is read (see timings T19 to T23 in FIG. 7) until the value of RP reaches the value HDRP + Length, the value of RP is incremented by 1 (step S20), and after CycleCT is incremented by 1 (Step S17), the process returns to step S11.

ＲＰの値が値ＨＤＲＰ＋Lengthに到達すると（ステップＳ１９のＹＥＳルート）、セレクタ１４３ｇは図５の(3)を選択するように切替動作を行なう。これにより、１加算器１４３ｂで得られた値ＨＤＲＰ＋１がＲＰに設定され（ステップＳ２１）、CycleＣＴが１インクリメントされる（ステップＳ１７）。例えば図７のタイミングＴ２４では、ＨＤＲＰ＝０であるため、ＲＰには１が設定される。この後、処理はステップＳ１１に戻る。 When the value of RP reaches the value HDRP + Length (YES route in step S19), the selector 143g performs a switching operation to select (3) in FIG. As a result, the value HDRP + 1 obtained by the one adder 143b is set to RP (step S21), and CycleCT is incremented by one (step S17). For example, at timing T24 in FIG. 7, since HDRP = 0, 1 is set to RP. Thereafter, the process returns to step S11.

ＲＰに値ＨＤＲＰ＋１を設定した後の各サイクル（図７のタイミングＴ２５〜Ｔ３３参照）では、CycleＣＴの値がデータ長Lengthに到達するまで、つまり読出対象パケットの全てのデータが読み出されるまで、セレクタ１４３ｇは図５の(2)を選択するように切替動作を行なう。これにより、CycleＣＴの値がデータ長Lengthに到達するまで、一単位データを読み出す都度、ＲＰの値が１インクリメントされ（ステップＳ２０）、CycleＣＴが１インクリメントされた後（ステップＳ１７）、処理はステップＳ１１に戻る。 In each cycle after the value HDRP + 1 is set in RP (see timings T25 to T33 in FIG. 7), the selector 143g performs a switching operation to select (2) in FIG. 5 until the value of CycleCT reaches the data length Length, that is, until all the data of the read target packet has been read. Thus, until the value of CycleCT reaches the data length Length, each time one unit of data is read, the value of RP is incremented by 1 (step S20), CycleCT is incremented by 1 (step S17), and the process returns to step S11.

CycleＣＴの値がデータ長Lengthに到達すると（ステップＳ１８のＹＥＳルート；図７のタイミングＴ３４参照）、CycleＣＴの値が０にリセットされる（ステップＳ２２）。そして、セレクタ１４３ｇは図５の(4)を選択するように切替動作を行なう。これにより、加算器１４３ｆで得られた値ＨＤＲＰ＋Length＋１（図７では値１７）が、次に読み出すべきデータのアドレスとしてＲＰに設定される（ステップＳ２３）。また、加算器１４３ｃで得られた値ＨＤＲＰ＋Length＋１（図７では値１７）が、次に読み出すべきパケットのヘッダのアドレスとしてＨＤＲＰに設定される（ステップＳ２４）。この後、処理はステップＳ１１に戻る。 When the value of CycleCT reaches the data length Length (YES route in step S18; see timing T34 in FIG. 7), the value of CycleCT is reset to 0 (step S22). Then, the selector 143g performs a switching operation so as to select (4) in FIG. Thus, the value HDRP + Length + 1 (value 17 in FIG. 7) obtained by the adder 143f is set in RP as the address of the data to be read next (step S23). The value HDRP + Length + 1 (value 17 in FIG. 7) obtained by the adder 143c is set in HDRP as the address of the header of the packet to be read next (step S24). Thereafter, the process returns to step S11.

以上の動作により、第１実施形態では、データＤＴＡを、図２２に示した関連技術の場合よりも１０サイクル早く読み出すことができ、レイテンシが短縮される。 According to the above operation, in the first embodiment, the data DTA can be read ten cycles earlier than in the case of the related art shown in FIG. 22, and the latency is reduced.
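The read sequence of steps S11 to S24 can be traced with a small software model. The following is only a sketch under stated assumptions: the function name and its arguments are hypothetical, the write-in-progress check of step S11 is omitted, and the hardware registers RP, HDRP, Length, and CycleCT are modeled as plain integers.

```python
def first_embodiment_read_order(pa_6_3, length=16, hdrp=0):
    """Return (addresses read in order, header address of the next packet).

    pa_6_3 : 4-bit value pa[6:3] taken from the response header
    length : number of data units in the packet (Length register)
    hdrp   : receive-buffer address of this packet's header (HDRP)
    """
    order = [hdrp]                   # S12: the header entry is read first
    rp = hdrp + pa_6_3 + 1           # S16: selector input (1), RP + pa[6:3] + 1
    cycle_ct = 1                     # S17
    while True:
        order.append(rp)             # S12: read the entry addressed by RP
        if cycle_ct == length:       # S18 YES: all data units have been read
            break
        if rp == hdrp + length:      # S19 YES: last entry reached, wrap around
            rp = hdrp + 1            # S21: selector input (3)
        else:
            rp += 1                  # S20: selector input (2)
        cycle_ct += 1                # S17
    next_header = hdrp + length + 1  # S23/S24: value loaded into RP and HDRP
    return order, next_header


order, next_header = first_embodiment_read_order(pa_6_3=10)
print(order)        # the requested unit (address 11) follows the header directly
print(next_header)
```

With pa[6:3] = 10 and Length = 16 as in FIG. 7, the model reads address 11 (DTA) in the cycle right after the header, continues to address 16, wraps to address 1, and finishes at address 10.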

また、上述した動作では、物理アドレスｐａ[6:3]によって、読み出すデータの順番が一意に決まる。したがって、パケットを受け取ったコア１１Ａは、受信バッファ１４１におけるパケットから最初に読み出したデータ以外のデータが、どの物理アドレスのデータであるかを容易に判断することができる。 In the above-described operation, the order in which the data is read is uniquely determined by the physical address pa[6:3]. Therefore, the core 11A that has received the packet can easily determine which physical address each piece of data corresponds to, including the data other than the data read first from the packet in the reception buffer 141.
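Because the read order wraps around the data block, the receiving side can recover the position of every delivered unit from pa[6:3] alone. The sketch below illustrates that mapping for the first embodiment; the function names and the constants are hypothetical, and a 16-unit block of 8-byte units is assumed as in the example of the text.

```python
UNIT_BYTES = 8         # size of one unit of data (assumed from the text's example)
UNITS_PER_BLOCK = 16   # units in one 128-byte data block


def unit_index_for_arrival(k, pa_6_3, units=UNITS_PER_BLOCK):
    """Index (0..units-1) inside the data block of the k-th delivered unit.

    k = 0 is the first data unit delivered after the header, which is the
    unit the core requested; later units follow in wrap-around order.
    """
    return (pa_6_3 + k) % units


def byte_offset_for_arrival(k, pa_6_3):
    """Byte offset of the k-th delivered unit from the start of the block."""
    return unit_index_for_arrival(k, pa_6_3) * UNIT_BYTES


print([unit_index_for_arrival(k, 10) for k in range(16)])
```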

〔４〕第２実施形態の情報処理装置 [4] Information Processing Apparatus of Second Embodiment
次に、図１および図８〜図１３を参照しながら、本発明の第２実施形態としての情報処理装置（マルチプロセッサシステム）１′について説明する。ここで説明する第２実施形態は、複数の処理対象データが、一つのデータブロック内において所定間隔Intervalをあけて存在する場合に対応する技術である。つまり、第２実施形態は、前述したように、行列計算等で発生するストライドアクセスにおいて、１個のデータ付きパケット内に、必要な単位データ（処理対象データ）が複数存在する場合に対応する技術である。例えば、図１３に示す例では、１６個の８バイトデータＤＴ０〜ＤＴＦを含むパケット内に、所定間隔Interval＝４をあけて４個の処理対象データＤＴ２，ＤＴ６，ＤＴＡ，ＤＴＥが存在する場合について説明する。 Next, an information processing apparatus (multiprocessor system) 1′ according to a second embodiment of the present invention will be described with reference to FIGS. 1 and 8 to 13. The second embodiment described here is a technique for the case where a plurality of processing target data exist at a predetermined interval Interval within one data block. That is, as described above, the second embodiment addresses the case where, in a stride access arising in matrix calculation or the like, a plurality of necessary unit data (processing target data) exist in one packet with data. For example, the example shown in FIG. 13 describes a case where four processing target data DT2, DT6, DTA, and DTE exist at a predetermined interval Interval = 4 in a packet including 16 pieces of 8-byte data DT0 to DTF.

図１に示すように、第２実施形態の情報処理装置１′も、第１実施形態の情報処理装置１と同様、複数（図１で２個）のＣＰＵ１０Ａ，１０Ｂを有している。第２実施形態においても、ＣＰＵ１０ＡとＣＰＵ１０Ｂとは、高速シリアル伝送によるネットワーク３０を介して相互に通信可能に接続される。 As shown in FIG. 1, the information processing apparatus 1 'of the second embodiment also has a plurality (two in FIG. 1) of CPUs 10A and 10B, similarly to the information processing apparatus 1 of the first embodiment. Also in the second embodiment, the CPU 10A and the CPU 10B are communicably connected to each other via the network 30 using high-speed serial transmission.

情報処理装置１′におけるＣＰＵ１０Ａ，１０Ｂも、図１〜図７を参照しながら前述した情報処理装置１におけるＣＰＵ１０Ａ，１０Ｂと同様に構成されている。ただし、第２実施形態の情報処理装置１′では、以下に説明するように、ＣＰＵ１０Ａのルータ１４Ａの第１の通信部としての機能、および、ＣＰＵ１０Ｂのルータ１４Ｂの第２の通信部としての機能に若干の変更が加えられる。また、第２実施形態の情報処理装置１′では、以下に説明するように、ＣＰＵ１０Ａ（ルータ１４Ａ）における読出部１４３が、読出部１４３′（図１，図１０，図１１参照）に変更されている。 The CPUs 10A and 10B in the information processing apparatus 1′ are configured in the same manner as the CPUs 10A and 10B in the information processing apparatus 1 described above with reference to FIGS. 1 to 7. However, in the information processing apparatus 1′ of the second embodiment, as described below, slight changes are made to the function of the router 14A of the CPU 10A as the first communication unit and to the function of the router 14B of the CPU 10B as the second communication unit. In addition, in the information processing apparatus 1′ of the second embodiment, as described below, the reading unit 143 in the CPU 10A (router 14A) is replaced with a reading unit 143′ (see FIGS. 1, 10, and 11).

第２実施形態のＣＰＵ１０Ａにおけるルータ（第１の通信部）１４Ａも、第１実施形態と同様、一のコア１１Ａが生成したフェッチ要求を、フェッチ要求パケット（図９，図１７（Ａ）参照）によってＣＰＵ１０Ｂへ送信する。また、ルータ（第１の通信部）１４Ａは、フェッチ要求に対応する処理対象データを含むデータブロックを添付されたフェッチ応答パケット（図９，図１７（Ｂ）参照）を、ＣＰＵ１０Ｂから受信する。 As in the first embodiment, the router (first communication unit) 14A in the CPU 10A of the second embodiment transmits a fetch request generated by one core 11A to the CPU 10B in a fetch request packet (see FIGS. 9 and 17(A)). The router (first communication unit) 14A also receives from the CPU 10B a fetch response packet (see FIGS. 9 and 17(B)) to which a data block including the processing target data corresponding to the fetch request is attached.

ただし、第２実施形態のＣＰＵ１０Ａにおけるルータ（第１の通信部）１４ＡからＣＰＵ１０Ｂへ送信される、フェッチ要求パケットのヘッダは、図８（Ａ）に示すようなフォーマットを有する。つまり、フェッチ要求パケットのヘッダには、図１８（Ａ）を参照しながら上述したＯＰＣ，ＲＱＩＤ，物理アドレスＰＡのほかに、上記所定間隔に関する情報（ここでは上記所定間隔を示す値Interval）が含まれる。 However, the header of the fetch request packet transmitted from the router (first communication unit) 14A in the CPU 10A of the second embodiment to the CPU 10B has the format shown in FIG. 8(A). That is, in addition to the OPC, RQID, and physical address PA described above with reference to FIG. 18(A), the header of the fetch request packet includes information on the predetermined interval (here, a value Interval indicating the predetermined interval).

また、第２実施形態のＣＰＵ１０Ｂにおけるルータ（第２の通信部）１４Ｂは、図８（Ｂ）に示すようなフォーマットを有する応答ヘッダをＣＰＵ１０Ａへ送信されるフェッチ応答パケット（ＤＩＭＭ２０Ｂ等から読み出されたデータブロック）に付す。図８（Ｂ）に示すように、当該応答ヘッダには、フェッチ要求パケットのヘッダに含まれるアドレス情報（図８（Ａ）のＰＡ参照）から取り出された先頭の処理対象データのアドレス情報ｐａ[6:3]と、所定間隔Intervalとが記録される。そして、ルータ（第２の通信部）１４Ｂは、応答ヘッダを付したデータブロックを、フェッチ応答パケット（図９，図１７（Ｂ）参照）としてＣＰＵ１０Ａへ送信する。 The router (second communication unit) 14B in the CPU 10B of the second embodiment attaches a response header having the format shown in FIG. 8(B) to the fetch response packet (the data block read from the DIMM 20B or the like) to be transmitted to the CPU 10A. As shown in FIG. 8(B), the response header records the address information pa[6:3] of the first processing target data, extracted from the address information included in the header of the fetch request packet (see PA in FIG. 8(A)), and the predetermined interval Interval. The router (second communication unit) 14B then transmits the data block with the response header attached to the CPU 10A as a fetch response packet (see FIGS. 9 and 17(B)).

さらに、第２実施形態のＣＰＵ１０Ａにおけるルータ（第１の通信部）１４Ａは、受信バッファ１４１，書込部１４２，読出部１４３′を有する。第２実施形態において、受信バッファ１４１，書込部１４２，読出部１４３′は、ＣＰＵ１０Ａのルータ１４Ａ内に備えられているが、ＣＰＵ１０Ａ内に備えられていればよい。受信バッファ１４１および書込部１４２は、図１および図４を参照しながら上述した第１実施形態の受信バッファ１４１および書込部１４２と同様に構成されているので、その説明は省略する。 Further, the router (first communication unit) 14A in the CPU 10A of the second embodiment has a reception buffer 141, a writing unit 142, and a reading unit 143′. In the second embodiment, the reception buffer 141, the writing unit 142, and the reading unit 143′ are provided in the router 14A of the CPU 10A, but they need only be provided somewhere in the CPU 10A. The reception buffer 141 and the writing unit 142 are configured in the same manner as the reception buffer 141 and the writing unit 142 of the first embodiment described above with reference to FIGS. 1 and 4, so their description is omitted.

読出部１４３′は、データブロック（フェッチ応答パケット）に付された応答ヘッダに記録された、先頭の処理対象データの物理アドレスｐａ[6:3]と所定間隔Intervalとを参照する。そして、読出部１４３′は、参照した物理アドレスｐａ[6:3]と所定間隔Intervalとに基づき、まず、コア１１Ａの要求する複数の処理対象データを受信バッファ１４１から読み出した後、当該複数の処理対象データ以外の単位データを受信バッファ１４１から順次読み出す。第２実施形態における読出部１４３′は、図１０および図１１を参照しながら後述するごとく構成され、図１２および図１３を参照しながら後述するごとく動作する。 The reading unit 143′ refers to the physical address pa[6:3] of the first processing target data and the predetermined interval Interval recorded in the response header attached to the data block (fetch response packet). Based on the referenced physical address pa[6:3] and predetermined interval Interval, the reading unit 143′ first reads the plurality of processing target data requested by the core 11A from the reception buffer 141, and then sequentially reads the unit data other than the processing target data from the reception buffer 141. The reading unit 143′ in the second embodiment is configured as described later with reference to FIGS. 10 and 11, and operates as described later with reference to FIGS. 12 and 13.

なお、第２実施形態では、ＣＰＵ１０Ａが、第１の通信部としての機能や、受信バッファ１４１，書込部１４２，読出部１４３′としての機能を有し、ＣＰＵ１０Ｂが、第２の通信部としての機能を有する場合について説明している。しかし、複数のＣＰＵ１０Ａ，１０Ｂのそれぞれが、第１および第２の通信部としての機能と、受信バッファ１４１，書込部１４２，読出部１４３′としての機能とを有していてもよい。 The second embodiment is described for the case where the CPU 10A has the function of the first communication unit and the functions of the reception buffer 141, the writing unit 142, and the reading unit 143′, and the CPU 10B has the function of the second communication unit. However, each of the plurality of CPUs 10A and 10B may have the functions of both the first and second communication units as well as the functions of the reception buffer 141, the writing unit 142, and the reading unit 143′.

ここで、図９を参照しながら、第２実施形態の情報処理装置１′において、一ＣＰＵ１０Ａのコア１１Ａから他ＣＰＵ１０Ｂのメモリデータに対する、ストライドアクセスに係るフェッチ要求を発行する場合のパケットルーティングについて説明する。 Here, with reference to FIG. 9, packet routing in the information processing apparatus 1′ of the second embodiment will be described for the case where the core 11A of one CPU 10A issues a fetch request for stride access to memory data of another CPU 10B.

この場合、図９に示すように、必要な処理対象データを読み出すためのフェッチ要求パケット（データ無しパケット）が、ＣＰＵ１０Ａから発行・送信され、ネットワーク３０経由で、当該処理対象データを保持するメモリ２０Ｂを有するＣＰＵ１０Ｂにルーティングされる。このとき、フェッチ要求パケットのヘッダには、図８（Ａ）に示すように、少なくとも、処理対象データのアドレス情報ＰＡと、ストライドアクセスに係る所定間隔Intervalとが含まれる。 In this case, as shown in FIG. 9, a fetch request packet (a packet without data) for reading the necessary processing target data is issued and transmitted from the CPU 10A and is routed via the network 30 to the CPU 10B, which has the memory 20B holding the processing target data. At this time, as shown in FIG. 8(A), the header of the fetch request packet includes at least the address information PA of the processing target data and the predetermined interval Interval for the stride access.

一方、ルータ１４Ｂによってフェッチ要求パケットを受信したＣＰＵ１０Ｂは、要求された処理対象データを含むデータブロックをメモリ２０Ｂから読み出し、フェッチ応答パケット（データ付きパケット）を生成し、当該フェッチ応答パケットをＣＰＵ１０Ａに送信する。このとき、図８（Ｂ）に示すように、ＣＰＵ１０Ｂで生成されるフェッチ応答パケットのヘッダ（応答ヘッダ）には、ストライドアクセス対象の先頭の単位データ（処理対象データ）を特定可能な４ビットの物理アドレスｐａ[6:3]が載せられている。また、図８（Ｂ）に示すように、当該ヘッダには、ストライドアクセスに係る所定間隔Intervalも載せられている。 On the other hand, the CPU 10B, having received the fetch request packet through the router 14B, reads the data block including the requested processing target data from the memory 20B, generates a fetch response packet (a packet with data), and transmits the fetch response packet to the CPU 10A. At this time, as shown in FIG. 8(B), the header (response header) of the fetch response packet generated by the CPU 10B carries the 4-bit physical address pa[6:3], which identifies the first unit data (processing target data) of the stride access. As also shown in FIG. 8(B), the header carries the predetermined interval Interval for the stride access.

ＣＰＵ１０Ｂがフェッチ応答パケットを生成する際、フェッチ要求パケットのヘッダに載っている物理アドレスＰＡ（図８（Ａ）参照）から、フェッチ要求対象の先頭単位データを特定可能な４ビットの物理アドレスｐａ[6:3]が取り出される。また、同ヘッダからストライドアクセスに係る所定間隔Intervalが取り出される。取り出された物理アドレスｐａ[6:3]および所定間隔Intervalを応答ヘッダに含ませることにより、図８（Ｂ）に示すようなフォーマットの応答ヘッダが生成される。このように生成された応答ヘッダを有するフェッチ応答パケットは、図９に示すように、フェッチ要求の発行元であるＣＰＵ１０Ａにルーティングされ送信される。 When the CPU 10B generates the fetch response packet, the 4-bit physical address pa[6:3], which identifies the first unit data targeted by the fetch request, is extracted from the physical address PA carried in the header of the fetch request packet (see FIG. 8(A)). The predetermined interval Interval for the stride access is also extracted from the same header. By including the extracted physical address pa[6:3] and predetermined interval Interval in the response header, a response header having the format shown in FIG. 8(B) is generated. The fetch response packet having the response header generated in this way is routed and transmitted to the CPU 10A that issued the fetch request, as shown in FIG. 9.
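The header handling on the CPU 10B side amounts to a bit-field extraction and a copy. The sketch below is a minimal illustration, not the format of FIG. 8: the dictionary keys, the opcode string, and the function names are hypothetical, and fields that the response header may carry besides pa[6:3] and Interval are left out.

```python
def extract_pa_6_3(physical_address):
    """Return bits 6..3 of the physical address.

    With 8-byte units, bits 2..0 address a byte inside a unit and
    bits 6..3 select one of the 16 units inside a 128-byte block.
    """
    return (physical_address >> 3) & 0xF


def build_response_header(request_header):
    """Build a response header from a fetch request header.

    request_header is assumed to be a dict with the keys "pa"
    (full physical address PA) and "interval" (stride Interval).
    """
    return {
        "opcode": "FETCH_RESPONSE",                      # hypothetical encoding
        "pa_6_3": extract_pa_6_3(request_header["pa"]),  # first target unit
        "interval": request_header["interval"],          # copied unchanged
    }


header = build_response_header({"pa": 0x12345010, "interval": 4})
print(header["pa_6_3"], header["interval"])
```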

上述の通り、第２実施形態では、フェッチ要求パケットのヘッダおよびフェッチ応答パケットのヘッダの両方に、コア１１Ａがメモリ２０Ｂにストライドアクセスする際のアドレス間隔を示すIntervalが追加されている。ここで、例えば、データブロック（１２８バイトデータ）に含まれる単位データ（８バイトデータ）の数が１６である場合、所定間隔Intervalの値は、０＜Interval＜１６の範囲の整数である。これは、Interval＝０の場合、同一の単位データを選択することになる一方、Interval＝１６の場合、パケットに含まれるデータブロック（１６個の単位データ）の範囲を超えることになるからである。 As described above, in the second embodiment, the Interval indicating the address interval when the core 11A performs the stride access to the memory 20B is added to both the header of the fetch request packet and the header of the fetch response packet. Here, for example, when the number of unit data (8-byte data) included in the data block (128-byte data) is 16, the value of the predetermined interval Interval is an integer in the range of 0 <Interval <16. This is because when Interval = 0, the same unit data is selected, while when Interval = 16, the range exceeds the data block (16 unit data) included in the packet. .
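The range condition on Interval and the resulting set of processing target data can be checked with a few lines of code. This is a sketch only: the function names are hypothetical, and a block of 16 unit data is assumed as in the example above.

```python
UNITS_PER_BLOCK = 16  # unit data per data block in the text's example


def is_valid_interval(interval, units=UNITS_PER_BLOCK):
    """True when 0 < Interval < units, the range stated in the text."""
    return isinstance(interval, int) and 0 < interval < units


def stride_targets(first_index, interval, units=UNITS_PER_BLOCK):
    """Indices of the processing target data inside one data block."""
    if not is_valid_interval(interval, units):
        raise ValueError("Interval must satisfy 0 < Interval < %d" % units)
    # Start at the first requested unit and step by Interval while
    # staying inside the block.
    return list(range(first_index, units, interval))


print(stride_targets(2, 4))
```

For the example of FIG. 13 (first target DT2, Interval = 4), the targets are the units with indices 2, 6, 10, and 14, that is, DT2, DT6, DTA, and DTE.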

ＣＰＵ１０Ａ側で受信されたフェッチ応答パケットは、まず、ルータ１４Ａに設けられた受信バッファ１４１に、書込部１４２（図１，図１０参照）によって、一旦、単位データ毎に書き込まれる。この後、読出部１４３′におけるＲＰの値（読み出すべき単位データのアドレス）を制御することで、受信バッファ１４１に書き込まれたパケットのうち、コア１１Ａが要求する処理対象データが受信バッファ１４１から優先的に読み出される。 The fetch response packet received by the CPU 10A is first written into the reception buffer 141 provided in the router 14A by the writing unit 142 (see FIGS. 1 and 10) once for each unit data. Thereafter, by controlling the value of the RP (the address of the unit data to be read) in the reading unit 143 ′, of the packets written in the reception buffer 141, the processing target data requested by the core 11 A has priority over the reception buffer 141. Is read out.

特に、第２実施形態における読出部１４３′は、応答ヘッダに記録された、先頭の処理対象データの物理アドレスｐａ[6:3]と所定間隔Intervalとに基づき、コア１１Ａの要求する複数の処理対象データを受信バッファ１４１から読み出した後、それ以外のデータを受信バッファ１４１から順次読み出す。 In particular, the reading unit 143′ in the second embodiment reads the plurality of processing target data requested by the core 11A from the reception buffer 141, based on the physical address pa[6:3] of the first processing target data and the predetermined interval Interval recorded in the response header, and then sequentially reads the remaining data from the reception buffer 141.

以下、上述のような機能を実現する第２実施形態の構成について、図１０および図１１を参照しながら説明する。図１０は、第２実施形態のＣＰＵ１０Ａにおけるルータ１４Ａに含まれる受信バッファ１４１および当該受信バッファ１４１に対するパケット書込／読出処理に係る構成（書込部１４２および読出部１４３′）を示すブロック図である。図１１は、図１０に示すパケット読出処理に係る構成（読出部１４３′）を詳細に示すブロック図である。 Hereinafter, the configuration of the second embodiment that realizes the above-described functions will be described with reference to FIGS. 10 and 11. FIG. 10 is a block diagram showing the reception buffer 141 included in the router 14A in the CPU 10A of the second embodiment, together with the configuration for packet write/read processing on the reception buffer 141 (the writing unit 142 and the reading unit 143′). FIG. 11 is a block diagram showing in detail the configuration for the packet read processing shown in FIG. 10 (the reading unit 143′).

図１０に示すように、第２実施形態における書込部１４２は、図４に示した書込部１４２と同様のＷＤＲ，ＨＤＷＰおよびＷＰを有する。 As illustrated in FIG. 10, the writing unit 142 according to the second embodiment has the same WDR, HDWP, and WP as the writing unit 142 illustrated in FIG.

また、第２実施形態の読出部１４３′は、図４に示した読出部１４３と同様のＲＤＲ，ＲＰ，ＨＤＲＰ，LengthレジスタおよびCycleＣＴに加え、Intervalレジスタを有する。ＷＤＲ，ＨＤＷＰ，ＷＰ，ＲＤＲ，ＲＰ，ＨＤＲＰ，LengthレジスタおよびCycleＣＴについては、既述のものと同様であるので、その説明は省略する。 The read unit 143 'of the second embodiment has an Interval register in addition to the same RDR, RP, HDRP, Length register and CycleCT as the read unit 143 shown in FIG. The WDR, HDWP, WP, RDR, RP, HDRP, Length register, and CycleCT are the same as those described above, and a description thereof will be omitted.

第２実施形態で追加されるIntervalレジスタには、フェッチ応答パケットのヘッダ（応答ヘッダ）を受信バッファ１４１から読み出した際に、当該ヘッダに記録された所定間隔Intervalが設定される。なお、パケット種がフェッチ応答パケット以外のパケットについては、Intervalの値として１が設定される。 When the header (response header) of a fetch response packet is read from the reception buffer 141, the predetermined interval Interval recorded in that header is set in the Interval register added in the second embodiment. For packets whose packet type is anything other than a fetch response packet, 1 is set as the Interval value.

ついで、図１１を参照しながら、図１０に示すパケット読出処理に係る構成、つまり読出部１４３′について、より詳細に説明する。図１１に示すように、第２実施形態の読出部１４３′は、第１実施形態の読出部１４３と同様のLength生成部１４３ａと１加算器１４３ｂ，１４３ｄと加算器１４３ｃ，１４３ｅ，１４３ｆとセレクタ１４３ｇとに加え、加算器１４３ｈおよび演算器１４３ｉを有する。 Next, the configuration relating to the packet reading process shown in FIG. 10, that is, the reading unit 143 'will be described in more detail with reference to FIG. As shown in FIG. 11, the reading unit 143 'of the second embodiment includes a length generating unit 143a, one adders 143b and 143d, and adders 143c, 143e and 143f, and a selector similar to the reading unit 143 of the first embodiment. 143g and an adder 143h and a calculator 143i.

Length生成部１４３ａは、第１実施形態と同様、ヘッダを受信バッファ１４１から読み出した際に、当該ヘッダにおけるOpcodeから、受信バッファ１４１から読出中のパケットのデータ長Lengthを生成し、Lengthレジスタに設定する。 As in the first embodiment, when reading the header from the reception buffer 141, the Length generation unit 143a generates the data length of the packet being read from the reception buffer 141 from the Opcode in the header and sets the data length in the Length register. I do.

１加算器（＋１）１４３ｂは、第１実施形態と同様、ＨＤＲＰの示す値に１を加算する。 The 1 adder (+1) 143b adds 1 to the value indicated by HDRP, as in the first embodiment.

加算器１４３ｃは、第１実施形態と同様、Lengthレジスタにおけるデータ長Lengthと、１加算器１４３ｂからの値ＨＤＲＰ＋１とを加算し、得られた値ＨＤＲＰ＋Length＋１をＨＤＲＰに設定する（図１２のステップＳ４５参照）。加算器１４３ｃの動作タイミングは、CycleＣＴの示す値がデータ長LengthになってCycleＣＴをリセット（初期化）するタイミング（図１２のステップＳ４３および図１３のタイミングＴ３４参照）である。 As in the first embodiment, the adder 143c adds the data length Length in the Length register and the value HDRP + 1 from the one adder 143b, and sets the obtained value HDRP + Length + 1 to HDRP (see step S45 in FIG. 12). ). The operation timing of the adder 143c is the timing at which the value indicated by CycleCT becomes the data length Length and resets (initializes) CycleCT (see step S43 in FIG. 12 and timing T34 in FIG. 13).

１加算器（＋１）１４３ｄは、第１実施形態と同様、ＲＰの示す値に１を加算する。 The 1 adder (+1) 143d adds 1 to the value indicated by RP, as in the first embodiment.

加算器１４３ｅは、第１実施形態と同様、コア１１Ａが要求する先頭の処理対象データを特定する物理アドレスｐａ[6:3]と、１加算器１４３ｄからの値ＲＰ＋１とを加算し、得られた値ＲＰ＋ｐａ[6:3]＋１をＲＰに設定する（図１２のステップＳ３７参照）。物理アドレスｐａ[6:3]は、読出中のパケットのヘッダからＲＤ−ＢＵＳを介して読み出される。加算器１４３ｅの動作タイミングは、フェッチ応答パケットからヘッダを読み出すタイミング（図１２のステップＳ３６のＹＥＳルート；図１３のタイミングＴ１８参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｅで得られた値ＲＰ＋ｐａ[6:3]＋１をＲＰに設定するように切替動作を行なう（図１１〜図１３の(1)参照）。 As in the first embodiment, the adder 143e adds the physical address pa [6: 3] specifying the first processing target data requested by the core 11A and the value RP + 1 from the one adder 143d, and obtains the result. The value RP + pa [6: 3] +1 is set to RP (see step S37 in FIG. 12). The physical address pa [6: 3] is read from the header of the packet being read via the RD-BUS. The operation timing of the adder 143e is the timing of reading the header from the fetch response packet (YES route in step S36 in FIG. 12; see timing T18 in FIG. 13). At this timing, the selector 143g performs a switching operation so as to set the value RP + pa [6: 3] +1 obtained by the adder 143e to RP (see (1) in FIGS. 11 to 13).

加算器１４３ｆは、第１実施形態と同様、Lengthレジスタにおけるデータ長Lengthと、１加算器１４３ｂからの値ＨＤＲＰ＋１とを加算し、得られた値ＨＤＲＰ＋Length＋１をＲＰに設定する（図１２のステップＳ４４参照）。加算器１４３ｆの動作タイミングは、CycleＣＴがデータ長LengthになってCycleＣＴをリセット（初期化）するタイミング（図１２のステップＳ４３；図１３のタイミングＴ３４参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｆで得られた値ＨＤＲＰ＋Length＋１をＲＰに設定するように切替動作を行なう（図１１〜図１３の(4)参照）。 As in the first embodiment, the adder 143f adds the data length Length in the Length register and the value HDRP + 1 from the one adder 143b, and sets the obtained value HDRP + Length + 1 to RP (see step S44 in FIG. 12). ). The operation timing of the adder 143f is the timing at which the CycleCT becomes the data length Length and resets (initializes) the CycleCT (step S43 in FIG. 12; see timing T34 in FIG. 13). At this timing, the selector 143g performs a switching operation so as to set the value HDRP + Length + 1 obtained by the adder 143f to RP (see (4) in FIGS. 11 to 13).

第２実施形態で追加された加算器１４３ｈは、Intervalレジスタにおける所定間隔Intervalと、ＲＰの値とを加算し、得られた値ＲＰ＋IntervalをＲＰに設定する（図１２のステップＳ４１参照）。加算器１４３ｈの動作タイミングは、ヘッダのOpcodeがフェッチ応答でない場合（図１２のステップＳ３６のＮＯルート参照）、もしくは、ＲＰ＋IntervalがＨＤＲＰ＋Length以下であるタイミング（図１２のステップＳ４０のＮＯルート；図１３のタイミングＴ１９〜Ｔ２１，Ｔ２３〜Ｔ２５，Ｔ２７〜Ｔ２９，Ｔ３１〜Ｔ３３参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｈで得られた値ＲＰ＋IntervalをＲＰに設定するように切替動作を行なう（図１１〜図１３の(2)参照）。 The adder 143h added in the second embodiment adds the predetermined interval Interval in the Interval register to the value of RP, and sets the obtained value RP + Interval in RP (see step S41 in FIG. 12). The adder 143h operates when the Opcode of the header is not a fetch response (see the NO route of step S36 in FIG. 12), or at the timings when RP + Interval is equal to or less than HDRP + Length (NO route of step S40 in FIG. 12; see timings T19 to T21, T23 to T25, T27 to T29, and T31 to T33 in FIG. 13). At these timings, the selector 143g performs a switching operation so that the value RP + Interval obtained by the adder 143h is set in RP (see (2) in FIGS. 11 to 13).

また、第２実施形態で追加された演算器１４３ｉは、Intervalレジスタにおける所定間隔Intervalと、１加算器１４３ｂからの値ＨＤＲＰ＋１と、ＲＰの値とに基づき、値ＨＤＲＰ＋［（ＲＰ−ＨＤＲＰ＋１）％Interval］を算出し、当該値をＲＰに設定する（図１２のステップＳ４２参照）。演算器１４３ｉの動作タイミングは、ＲＰ＋IntervalがＨＤＲＰ＋Lengthを超えるタイミング（図１２のステップＳ４０のＹＥＳルート；図１３のタイミングＴ２２，Ｔ２６，Ｔ３０参照）である。当該タイミングで、セレクタ１４３ｇは、演算器１４３ｉで得られた値をＲＰに設定するように切替動作を行なう（図１１〜図１３の(3)参照）。なお、上記値における％は、剰余を与える演算に用いられる記号で、剰余＝被除数％除数と規定される。例えば、１６％４＝４、１７％４＝１、１４％４＝２となる。 The arithmetic unit 143i added in the second embodiment calculates the value HDRP + [(RP − HDRP + 1) % Interval] based on the predetermined interval Interval in the Interval register, the value HDRP + 1 from the 1-adder 143b, and the value of RP, and sets the calculated value in RP (see step S42 in FIG. 12). The arithmetic unit 143i operates at the timings when RP + Interval exceeds HDRP + Length (YES route of step S40 in FIG. 12; see timings T22, T26, and T30 in FIG. 13). At these timings, the selector 143g performs a switching operation so that the value obtained by the arithmetic unit 143i is set in RP (see (3) in FIGS. 11 to 13). Note that % in the above expression is the symbol for the operation that gives a remainder, defined as remainder = dividend % divisor. For example, 16 % 4 = 4, 17 % 4 = 1, and 14 % 4 = 2.
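The values quoted for the % operation differ from the ordinary integer remainder in one case: 16 divided by 4 leaves 0, whereas the text gives 4. The sketch below reproduces the quoted values under the assumption that a zero remainder is replaced by the divisor, so that the result always falls in the range 1 to Interval and never points at the header entry. That rule is an inference from the three quoted examples, not something the text states, and the function names are hypothetical.

```python
def wrap_remainder(dividend, divisor):
    """Remainder in the range 1..divisor (assumed behaviour of '%' in the text)."""
    r = dividend % divisor           # ordinary remainder, range 0..divisor-1
    return divisor if r == 0 else r  # assumption: map 0 to the divisor


def wrapped_rp(rp, hdrp, interval):
    """Value loaded into RP by the arithmetic unit 143i (step S42)."""
    return hdrp + wrap_remainder(rp - hdrp + 1, interval)


# The three wrap points of FIG. 13 (HDRP = 0, Interval = 4).
for rp in (15, 16, 13):
    print(rp, "->", wrapped_rp(rp, hdrp=0, interval=4))
```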

なお、上述した書込部１４２および読出部１４３′としての機能は、論理ゲート等によってハードウエア的にＣＰＵ１０Ａに組み込まれて実現されてもよいし、プログラムを実行することでソフトウエア的にＣＰＵ１０Ａに組み込まれて実現されてもよい。 The functions of the writing unit 142 and the reading unit 143 'described above may be implemented by being incorporated in the CPU 10A in hardware by a logic gate or the like, or may be implemented in software by executing a program. It may be realized by being incorporated.

次に、図１１〜図１３を参照しながら、上述のごとく構成された書込部１４２および読出部１４３′の動作について説明する。図１２は、図１１に示すパケット読出処理に係る構成（読出部１４３′）の動作を説明するフローチャートである。図１３は、図１０および図１１に示すパケット読出処理に係る構成（読出部１４３′）が図１２に示すフローチャートに従って２，６，１０，１４番目のデータ（ＤＴ２，ＤＴ６，ＤＴＡ，ＤＴＥ）を先に読み出す場合の動作を示すタイムチャートである。 Next, the operations of the writing unit 142 and the reading unit 143′ configured as described above will be described with reference to FIGS. 11 to 13. FIG. 12 is a flowchart explaining the operation of the configuration for the packet read processing shown in FIG. 11 (the reading unit 143′). FIG. 13 is a time chart showing the operation when the configuration for the packet read processing shown in FIGS. 10 and 11 (the reading unit 143′) reads the second, sixth, tenth, and fourteenth data (DT2, DT6, DTA, DTE) first, in accordance with the flowchart shown in FIG. 12.

第２実施形態の書込部１４２によるパケット書込動作は、図２０を参照しながら前述した関連技術の動作と同様であるので、その説明は省略する。一方、第２実施形態の読出部１４３′によるパケット読出動作は、図６を参照しながら前述した第１実施形態の読出部１４３の動作と部分的に異なっている。 The packet writing operation by the writing unit 142 of the second embodiment is the same as the operation of the related art described above with reference to FIG. 20, so its description is omitted. On the other hand, the packet reading operation by the reading unit 143′ of the second embodiment differs in part from the operation of the reading unit 143 of the first embodiment described above with reference to FIG. 6.

図１２に示すフローチャート（ステップＳ３１〜Ｓ４５）に従って、図１１および図１３を参照しながら第２実施形態の読出部１４３′によるパケット読出動作について説明する。 The packet reading operation by the reading unit 143 'of the second embodiment will be described with reference to FIGS. 11 and 13 according to the flowchart (steps S31 to S45) shown in FIG.

ルータ１４Ａは、ＲＰの値とＨＤＷＰの値とが一致しているか否かを判断する（ステップＳ３１）。ＲＰの値とＨＤＷＰの値とが一致している場合（ステップＳ３１のＹＥＳルート）、ルータ１４Ａは、書込対象パケットの書込中であると判断し、ステップＳ３１の処理に戻る。 The router 14A determines whether the value of RP matches the value of HDWP (step S31). If the value of RP matches the value of HDWP (YES route in step S31), the router 14A determines that the packet to be written is still being written, and returns to the process in step S31.

ＲＰの値とＨＤＷＰの値とが一致しない場合（ステップＳ３１のＮＯルート）、ルータ１４Ａは、書込対象パケットの書込を完了したと判断し、受信バッファ１４１からのパケット読出を開始する（図１３のタイミングＴ１８参照）。つまり、ＲＰによって指定される、受信バッファ１４１のエントリ（一単位データ）が、受信バッファ１４１からＲＤＲ経由で読み出される（ステップＳ３２）。そして、読み出されたエントリがヘッダ（ＨＤ）であるか否かが判断される（ステップＳ３３）。 If the value of RP does not match the value of HDWP (NO route in step S31), the router 14A determines that the writing of the packet to be written has been completed, and starts reading the packet from the reception buffer 141 (FIG. 13 timing T18). That is, the entry (one unit data) of the reception buffer 141 specified by the RP is read from the reception buffer 141 via the RDR (step S32). Then, it is determined whether the read entry is a header (HD) (step S33).

パケット読出の開始時には、まずパケットのヘッダが読み出されるため、読み出されたエントリはヘッダであると判断される（ステップＳ３３のＹＥＳルート）。この場合、読み出されたヘッダの所定間隔Intervalが参照され、ヘッダから当該Intervalの値が、読出部１４３′のIntervalレジスタに設定される（ステップＳ３４）。また、読み出されたヘッダのOpcodeが参照され、当該Opcode（パケット種）に基づき、受信バッファ１４１から読出中のパケットのデータ長Lengthが生成され、生成されたデータ長Lengthが読出部１４３′のLengthレジスタに設定される（ステップＳ３５）。 At the start of packet reading, the header of the packet is read first, so the read entry is determined to be a header (YES route in step S33). In this case, the predetermined interval Interval in the read header is referred to, and the value of Interval from the header is set in the Interval register of the reading unit 143′ (step S34). In addition, the Opcode of the read header is referred to, the data length Length of the packet being read from the reception buffer 141 is generated based on the Opcode (packet type), and the generated data length Length is set in the Length register of the reading unit 143′ (step S35).

この後、当該Opcodeがフェッチ応答であるか否かを判断する（ステップＳ３６）。フェッチ応答である場合（ステップＳ３６のＹＥＳルート）、セレクタ１４３ｇは図１１の(1)を選択するように切替動作を行なう。これにより、加算器１４３ｅで得られた値ＲＰ＋ｐａ[6:3]＋１がＲＰに設定され（ステップＳ３７）、CycleＣＴが１インクリメントされる（ステップＳ３８）。そして、ＲＰに設定された値（アドレス）で指定されるエントリ（データＤＴ２）が読み出される（ステップＳ３１のＮＯルートからステップＳ３２）。つまり、パケットのヘッダを読み出した際に、ヘッダの物理アドレスｐａ[6:3]に基づき、コア１１Ａが要求しているデータ（先頭単位データ）に対応する受信バッファ１４１のアドレスＲＰ＋ｐａ[6:3]＋１がＲＰに設定される。図１３に示す例では、２番目のデータＤＴ２に対応するアドレスを示す３が、ＲＰにセットされる。 Thereafter, it is determined whether or not the Opcode is a fetch response (step S36). If it is a fetch response (YES route in step S36), the selector 143g performs a switching operation to select (1) in FIG. 11. As a result, the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (step S37), and CycleCT is incremented by 1 (step S38). Then, the entry (data DT2) specified by the value (address) set in RP is read (from the NO route of step S31 to step S32). That is, when the header of the packet is read, the address RP + pa[6:3] + 1 of the reception buffer 141 corresponding to the data requested by the core 11A (the first unit data) is set in RP, based on the physical address pa[6:3] in the header. In the example shown in FIG. 13, the value 3, which is the address corresponding to the second data DT2, is set in RP.

この後、次のサイクルで、ヘッダの次に読み出されるエントリはデータであり、この場合（ステップＳ３３のＮＯルート）、CycleＣＴの値がデータ長（パケット長）Lengthに到達したか否かが判断される（ステップＳ３９）。つまり、読出対象パケットの全てのデータが読み出されたか否かが判断される。図１３では、Length＝１６の例が示されている。 Thereafter, in the next cycle, the entry read out after the header is data. In this case (NO route in step S33), it is determined whether the value of CycleCT has reached the data length (packet length) Length. (Step S39). That is, it is determined whether or not all the data of the read target packet has been read. FIG. 13 shows an example in which Length = 16.

If CycleCT has not reached Length (NO route of step S39), it is determined whether the value RP + Interval exceeds the value HDRP + Length (16 in FIG. 13) (step S40). If RP + Interval is equal to or less than HDRP + Length (NO route of step S40), or if the Opcode of the header is not a fetch response (NO route of step S36), the selector 143g switches to select input (2) in FIG. 11.

Consequently, until RP + Interval exceeds HDRP + Length, each time one unit of data is read, the predetermined interval value Interval (Interval = 4 in FIG. 13) is added to RP (step S41), CycleCT is incremented by 1 (step S38), and the process returns to step S31. Step S41 is executed at timings T19 to T21, T23 to T25, T27 to T29, and T31 to T33 in FIG. 13.

When RP + Interval exceeds HDRP + Length (YES route of step S40), the selector 143g switches to select input (3) in FIG. 11. As a result, the value HDRP + [(RP − HDRP + 1) % Interval] obtained by the arithmetic unit 143i is set in RP (step S42), and CycleCT is incremented by 1 (step S38).

For example, at timing T22 in FIG. 13, HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(15 − 0 + 1) % 4] = 16 % 4, and 4 is set in RP. Under the usual definition of the remainder, 16 % 4 evaluates to 0; the value 4 given here corresponds to treating a zero remainder as Interval, which keeps RP from returning to the header address HDRP. At timing T26 in FIG. 13, HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(16 − 0 + 1) % 4] = 17 % 4 = 1, so 1 is set in RP. Similarly, at timing T30 in FIG. 13, HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(13 − 0 + 1) % 4] = 14 % 4 = 2, so 2 is set in RP. The process then returns to step S31.
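
The three wrap-around values above can be reproduced with a minimal sketch of step S42. It rests on one assumption inferred from the T22 example rather than stated in the text: a zero remainder is replaced by Interval, so that RP lands on a data entry and not on the header. The function name is hypothetical.

```python
def wrap_read_pointer(rp: int, hdrp: int, interval: int) -> int:
    """Step S42: HDRP + [(RP - HDRP + 1) % Interval].

    Assumption: a remainder of 0 is treated as Interval, which matches the
    value RP = 4 that the FIG. 13 example gives at timing T22.
    """
    rem = (rp - hdrp + 1) % interval
    return hdrp + (rem if rem else interval)


for timing, rp in (("T22", 15), ("T26", 16), ("T30", 13)):
    print(timing, wrap_read_pointer(rp, hdrp=0, interval=4))
# T22 4
# T26 1
# T30 2
```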

When the value of CycleCT then reaches the data length Length (YES route of step S39; see timing T34 in FIG. 13), CycleCT is reset to 0 (step S43). The selector 143g switches to select input (4) in FIG. 11. As a result, the value HDRP + Length + 1 obtained by the adder 143f (17 in FIG. 13) is set in RP as the address of the data to be read next (step S44). In addition, the value HDRP + Length + 1 obtained by the adder 143c (17 in FIG. 7) is set in HDRP as the address of the header of the packet to be read next (step S45). The process then returns to step S31.
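
Steps S33 to S45 can be summarized as a small software model of the read pointer for one packet. This is only a sketch under stated assumptions, not the hardware of FIG. 11: the function name and arguments are hypothetical, the buffer is modeled as plain integer addresses, and a zero remainder in step S42 is treated as Interval, as the T22 example suggests.

```python
def read_packet(hdrp, pa_index, interval, length, fetch_response=True):
    """Return (buffer addresses in read order, next HDRP) for one packet.

    hdrp     -- address of the header entry
    pa_index -- value of pa[6:3] taken from the header
    interval -- Interval register value (S34)
    length   -- Length register value (S35), the number of data entries
    """
    rp = hdrp
    cycle_ct = 0
    order = [rp]  # S32: the header entry is read first

    # S36: fetch response -> S37 (selector input 1), otherwise selector input 2
    rp = rp + pa_index + 1 if fetch_response else rp + interval
    cycle_ct += 1  # S38

    while True:
        order.append(rp)  # S32: read one data entry
        if cycle_ct == length:  # S39 YES: whole packet has been read
            break
        if rp + interval > hdrp + length:  # S40 YES -> S42 (selector input 3)
            rem = (rp - hdrp + 1) % interval
            rp = hdrp + (rem if rem else interval)  # assumed zero-remainder rule
        else:  # S40 NO -> S41 (selector input 2)
            rp += interval
        cycle_ct += 1  # S38

    next_hdrp = hdrp + length + 1  # S44/S45: next packet header (selector input 4)
    return order, next_hdrp


order, next_hdrp = read_packet(hdrp=0, pa_index=2, interval=4, length=16)
print(order)      # [0, 3, 7, 11, 15, 4, 8, 12, 16, 1, 5, 9, 13, 2, 6, 10, 14]
print(next_hdrp)  # 17
```

With the FIG. 13 parameters, addresses 3, 7, 11, and 15 (DT2, DT6, DTA, DTE) are read directly after the header, and every data entry from 1 to 16 is read exactly once before HDRP advances to 17.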

Through the above operation, in the second embodiment, the data DT2, DT6, DTA, and DTE are read before the other data, and the latency is reduced.

In the above operation, the order in which data is read is uniquely determined by the physical address pa[6:3] of the leading processing target data and the predetermined interval Interval. Therefore, the core 11A that has received the packet can easily determine which physical address each piece of data corresponds to, including the data other than the data group read first from the packet in the reception buffer 141.
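
The following sketch shows how a receiver could rebuild that order from pa[6:3] and Interval alone. It makes several assumptions not stated in the text: unit data are 8 bytes wide (because bit 3 is the lowest bit of pa[6:3]), a data block holds 16 unit data, and the wrap rule of step S42 is expressed in zero-based unit indices, where it reduces to (u + 1) % Interval once the header offset is removed and a zero remainder in S42 is treated as Interval. All names and the sample address are hypothetical.

```python
def unit_order(first_index, interval, n_units=16):
    """Zero-based unit indices (DT0 = 0, DT1 = 1, ...) in delivery order."""
    order = []
    u = first_index
    for _ in range(n_units):
        order.append(u)
        if u + interval < n_units:
            u += interval           # same stride group (step S41)
        else:
            u = (u + 1) % interval  # start of the next stride group (step S42)
    return order


def delivered_addresses(pa, interval, n_units=16, unit_bytes=8):
    """Physical address of each unit in the order the core would receive it."""
    block_bytes = n_units * unit_bytes
    block_base = pa - (pa % block_bytes)
    first_index = (pa // unit_bytes) % n_units  # pa[6:3] for the default sizes
    return [block_base + u * unit_bytes
            for u in unit_order(first_index, interval, n_units)]


print(unit_order(2, 4))
# [2, 6, 10, 14, 3, 7, 11, 15, 0, 4, 8, 12, 1, 5, 9, 13]
print([hex(a) for a in delivered_addresses(0x1010, 4)[:4]])
# ['0x1010', '0x1030', '0x1050', '0x1070']
```

Because the sequence depends only on the two header fields, no per-unit address tag is needed in this model.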

Note that the first embodiment described above with reference to FIGS. 1 to 7 corresponds to the second embodiment described above with reference to FIGS. 8 to 13 in the case where the value of the predetermined interval Interval is 1.

[5] Others
Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments and can be implemented with various modifications and changes without departing from the spirit of the present invention.

[6] Appendices
The following appendices are further disclosed with respect to the embodiments, including each of the embodiments described above.

(Appendix 1)
An arithmetic processing device connected to another arithmetic processing device, comprising:
a processing unit that generates a read request for processing target data;
a communication unit that transmits the read request generated by the processing unit to the other arithmetic processing device and receives, from the other arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

(Appendix 2)
The arithmetic processing device according to Appendix 1, wherein the reading unit
refers to address information of the processing target data recorded by the other arithmetic processing device in a response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and then
sequentially reads unit data other than the processing target data from the buffer.

(Appendix 3)
The arithmetic processing device according to Appendix 1, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the communication unit
transmits the read request including information on the predetermined interval to the other arithmetic processing device, and
the reading unit
refers to address information of the leading processing target data and the information on the predetermined interval, both recorded by the other arithmetic processing device in a response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.

(Appendix 4)
An information processing device comprising:
a first arithmetic processing device; and
a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device includes:
a processing unit that generates a read request for processing target data;
a first communication unit that transmits the read request generated by the processing unit to the second arithmetic processing device and receives, from the second arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the first communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, and
the second arithmetic processing device includes
a second communication unit that receives the read request from the first arithmetic processing device and transmits, to the first arithmetic processing device, the data block including the processing target data corresponding to the received read request.

(Appendix 5)
The information processing device according to Appendix 4, wherein the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header recording address information of the processing target data extracted from address information included in the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 6)
The information processing device according to Appendix 5, wherein the reading unit in the first arithmetic processing device
refers to the address information of the processing target data recorded in the response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and then
sequentially reads unit data other than the processing target data from the buffer.

(Appendix 7)
The information processing device according to Appendix 4, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first communication unit in the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header recording address information of the leading processing target data extracted from address information included in the read request and the information on the predetermined interval extracted from the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 8)
The information processing device according to Appendix 7, wherein the reading unit in the first arithmetic processing device
refers to the address information of the leading processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.

(Appendix 9)
A control method for an information processing device having a first arithmetic processing device and a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device
transmits a read request for processing target data to the second arithmetic processing device,
the second arithmetic processing device,
upon receiving the read request from the first arithmetic processing device, transmits a data block including the processing target data corresponding to the received read request to the first arithmetic processing device, and
the first arithmetic processing device
receives, from the second arithmetic processing device, the data block including the processing target data corresponding to the read request,
sequentially writes a plurality of unit data included in the received data block into a buffer, and
preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

(Appendix 10)
The control method for an information processing device according to Appendix 9, wherein the second arithmetic processing device
attaches, to the data block, a response header recording address information of the processing target data extracted from address information included in the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 11)
The control method for an information processing device according to Appendix 10, wherein the first arithmetic processing device
refers to the address information of the processing target data recorded in the response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and then
sequentially reads unit data other than the processing target data from the buffer.

(Appendix 12)
The control method for an information processing device according to Appendix 9, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second arithmetic processing device
attaches, to the data block, a response header recording address information of the leading processing target data extracted from address information included in the read request and the information on the predetermined interval extracted from the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 13)
The control method for an information processing device according to Appendix 12, wherein the first arithmetic processing device
refers to the address information of the leading processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.

1, 1′ Information processing device (multiprocessor system)
10, 10A, 10B CPU (arithmetic processing device, multi-core processor)
11, 11A, 11B Core (processing unit)
12, 12A, 12B Shared cache (tertiary cache)
13, 13A, 13B MAC
14 Router
14A Router (first communication unit)
14B Router (second communication unit)
141 Reception buffer (buffer)
142 Writing unit
143, 143′ Reading unit
143a Length generation unit
143b, 143d +1 adder
143c, 143e, 143f, 143h Adder
143g Selector
143i Arithmetic unit
20, 20A, 20B DIMM (main memory)
30 Network

Claims

An arithmetic processing device connected to another arithmetic processing device, comprising:
a processing unit that generates a read request for processing target data;
a communication unit that transmits the read request generated by the processing unit to the other arithmetic processing device and receives, from the other arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, wherein
the reading unit
refers to address information of the processing target data recorded by the other arithmetic processing device in a response header attached to the data block, the address information being information that identifies the processing target data among the plurality of unit data,
reads the processing target data corresponding to the address information from the buffer, and then
sequentially reads unit data other than the processing target data from the buffer.

An arithmetic processing device connected to another arithmetic processing device, comprising:
a processing unit that generates a read request for processing target data;
a communication unit that transmits the read request generated by the processing unit to the other arithmetic processing device and receives, from the other arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, wherein,
when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the communication unit
transmits the read request including information on the predetermined interval to the other arithmetic processing device, and
the reading unit
refers to address information of the leading processing target data and to the predetermined interval, both recorded by the other arithmetic processing device in a response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.

An information processing device comprising:
a first arithmetic processing device; and
a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device includes:
a processing unit that generates a read request for processing target data;
a first communication unit that transmits the read request generated by the processing unit to the second arithmetic processing device and receives, from the second arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the first communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data,
the second arithmetic processing device includes
a second communication unit that receives the read request from the first arithmetic processing device and transmits, to the first arithmetic processing device, the data block including the processing target data corresponding to the received read request, and
the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header recording address information of the processing target data extracted from address information included in the read request, the address information being information that identifies the processing target data among the plurality of unit data, and
transmits the data block with the response header attached to the first arithmetic processing device.

An information processing device comprising:
a first arithmetic processing device; and
a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device includes:
a processing unit that generates a read request for processing target data;
a first communication unit that transmits the read request generated by the processing unit to the second arithmetic processing device and receives, from the second arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, into the buffer, a plurality of unit data included in the data block received by the first communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data,
the second arithmetic processing device includes
a second communication unit that receives the read request from the first arithmetic processing device and transmits, to the first arithmetic processing device, the data block including the processing target data corresponding to the received read request, and,
when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first communication unit in the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header recording address information of the leading processing target data extracted from address information included in the read request, the address information being information that identifies the leading processing target data among the plurality of unit data, together with the information on the predetermined interval extracted from the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

The information processing device according to claim 4, wherein the reading unit in the first arithmetic processing device
refers to the address information of the leading processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.

A control method for an information processing device having a first arithmetic processing device and a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device
transmits a read request for processing target data to the second arithmetic processing device,
the second arithmetic processing device,
upon receiving the read request from the first arithmetic processing device, attaches, to a data block including the processing target data corresponding to the received read request, a response header recording address information of the processing target data extracted from address information included in the read request, the address information being information that identifies the processing target data, and
transmits the data block with the response header attached to the first arithmetic processing device, and
the first arithmetic processing device
receives, from the second arithmetic processing device, the data block including the processing target data corresponding to the read request,
sequentially writes a plurality of unit data included in the received data block into a buffer, and
preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

A control method for an information processing device having a first arithmetic processing device and a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device
transmits a read request for processing target data to the second arithmetic processing device,
the second arithmetic processing device,
upon receiving the read request from the first arithmetic processing device, transmits a data block including the processing target data corresponding to the received read request to the first arithmetic processing device,
the first arithmetic processing device
receives, from the second arithmetic processing device, the data block including the processing target data corresponding to the read request,
sequentially writes a plurality of unit data included in the received data block into a buffer, and
preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, and,
when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second arithmetic processing device
attaches, to the data block, a response header recording address information of the leading processing target data extracted from address information included in the read request, the address information being information that identifies the leading processing target data among the plurality of unit data, together with the information on the predetermined interval extracted from the read request, and
transmits the data block with the response header attached to the first arithmetic processing device.

The control method for an information processing device according to claim 7, wherein the first arithmetic processing device
refers to the address information of the leading processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and then
sequentially reads unit data other than the plurality of processing target data from the buffer.