JP2017010319A

JP2017010319A - Arithmetic processing device, information processing device, and information processing device control method

Info

Publication number: JP2017010319A
Application number: JP2015125808A
Authority: JP
Inventors: 哲志中川; Tetsushi Nakagawa
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2015-06-23
Filing date: 2015-06-23
Publication date: 2017-01-12
Anticipated expiration: 2035-06-23
Also published as: JP6665429B2

Abstract

PROBLEM TO BE SOLVED: To shorten the read latency for data to be processed. SOLUTION: An arithmetic processing device 10A connected to another arithmetic processing device 10B includes: a processing unit 11A that generates a read request for data to be processed; a communication unit 14A that transmits the read request to the other arithmetic processing device 10B and receives, from the other arithmetic processing device 10B, a data block including the data to be processed that corresponds to the transmitted read request; a buffer 141 that stores the data block; a writing unit 142 that sequentially writes, to the buffer 141, multiple pieces of unit data included in the data block received by the communication unit 14A; and a reading unit 143 that preferentially reads, from the buffer 141, the data to be processed, which is at least one of the multiple pieces of unit data. SELECTED DRAWING: Figure 1

Description

The present application relates to an arithmetic processing device, an information processing device, and a control method for the information processing device.

An information processing device such as a PC (Personal Computer) or a server includes an arithmetic processing device, such as a CPU (Central Processing Unit), that performs arithmetic processing on data read from memory. In such an information processing device, the latency from when a CPU, acting as an endpoint, issues a fetch request (a read request to memory) until the CPU receives the response data corresponding to that read request is one of the important factors in processing performance.

A multiprocessor system, in which a plurality of CPUs are connected so that they can communicate with one another, may be used as the information processing device. In recent multiprocessor systems, as shown in FIG. 14 for example, a plurality of CPUs and routers are connected to one another via a network based on high-speed serial transmission. Adopting high-speed serial transmission makes high communication throughput achievable.

JP 2010-124448 A
JP 2010-116228 A

In a network based on high-speed serial transmission, transmission errors are checked by a CRC (Cyclic Redundancy Check). When a transmission error is detected, the end of the packet is rewritten to EDB (end bad). Consequently, the endpoint (CPU) cannot determine whether a packet is normal until it has received all of the data up to the end of the packet containing the response data.

Therefore, in the CPU that issued the fetch request, all of the data up to the end of the packet containing the response data for that fetch request is first stored in a reception buffer, and the data in the reception buffer is then sent to the CPU core in order from the head.

However, because the reception buffer is a FIFO (First In First Out), reading the data to be processed that the CPU core has requested (one unit of data, for example 8 bytes) from the reception buffer has to wait until the data immediately preceding that unit has been read. As a result, the communication latency seen from the CPU core increases, and processing performance may deteriorate.

In one aspect, an object of the invention disclosed in this specification is to shorten the read latency of data to be processed.

The arithmetic processing device of the present application is connected to another arithmetic processing device and includes a processing unit, a communication unit, a buffer, a writing unit, and a reading unit. The processing unit generates a read request for data to be processed. The communication unit transmits the read request generated by the processing unit to the other arithmetic processing device, and receives from the other arithmetic processing device a data block including the data to be processed that corresponds to the transmitted read request. The buffer stores the data block. The writing unit sequentially writes, to the buffer, a plurality of pieces of unit data included in the data block received by the communication unit. The reading unit preferentially reads, from the buffer, the data to be processed, which is at least one of the plurality of pieces of unit data.

According to one embodiment, the read latency of data to be processed can be shortened.

FIG. 1 is a block diagram illustrating the configuration of an information processing device including arithmetic processing devices (CPUs) according to the first and second embodiments of the present invention.
FIG. 2 is a diagram explaining packet routing when a fetch request is made from one CPU to another CPU in the information processing device of the first embodiment.
FIG. 3 is a diagram illustrating the format of the header of a fetch response packet in the first embodiment.
FIG. 4 is a block diagram illustrating a reception buffer included in the router of the CPU of the first embodiment and the configuration (writing unit and reading unit) related to packet write/read processing for that reception buffer.
FIG. 5 is a block diagram illustrating in detail the configuration (reading unit) related to the packet read processing shown in FIG. 4.
FIG. 6 is a flowchart explaining the operation of the configuration (reading unit) related to the packet read processing shown in FIG. 5.
FIG. 7 is a time chart illustrating the operation when the configuration (reading unit) related to the packet read processing shown in FIGS. 4 and 5 reads the tenth data first in accordance with the flowchart shown in FIG. 6.
FIG. 8(A) is a diagram illustrating the format of the header of a fetch request packet in the second embodiment, and FIG. 8(B) is a diagram illustrating the format of the header of a fetch response packet in the second embodiment.
FIG. 9 is a diagram explaining packet routing when a fetch request is made from one CPU to another CPU in the information processing device of the second embodiment.
FIG. 10 is a block diagram illustrating a reception buffer included in the router of the CPU of the second embodiment and the configuration (writing unit and reading unit) related to packet write/read processing for that reception buffer.
FIG. 11 is a block diagram illustrating in detail the configuration (reading unit) related to the packet read processing shown in FIG. 10.
FIG. 12 is a flowchart explaining the operation of the configuration (reading unit) related to the packet read processing shown in FIG. 11.
FIG. 13 is a time chart illustrating the operation when the configuration (reading unit) related to the packet read processing shown in FIGS. 10 and 11 reads the 2nd, 6th, 10th, and 14th data first in accordance with the flowchart shown in FIG. 12.
FIG. 14 is a block diagram illustrating an example of the configuration of a multiprocessor system (information processing device).
FIG. 15 is a block diagram illustrating an example of the configuration of a multi-core processor (CPU, arithmetic processing device).
FIG. 16 is a diagram explaining packet routing when a fetch request is made from one CPU to another CPU in the multiprocessor system shown in FIG. 14.
FIG. 17(A) is a diagram illustrating the format of a packet without data, and FIG. 17(B) is a diagram illustrating the format of a packet with data.
FIG. 18(A) is a diagram illustrating the format of the header of a fetch request packet, and FIG. 18(B) is a diagram illustrating the format of the header of a fetch response packet.
FIG. 19 is a block diagram illustrating a reception buffer included in the router of the CPU (multi-core processor) shown in FIG. 15 and the configuration related to packet write/read processing for that reception buffer.
FIG. 20 is a flowchart explaining the operation of the configuration related to the packet write processing shown in FIG. 19.
FIG. 21 is a flowchart explaining the operation of the configuration related to the packet read processing shown in FIG. 19.
FIG. 22 is a time chart illustrating the operation in the shortest case in the configuration related to the packet write/read processing shown in FIG. 19.
FIG. 23 is a time chart illustrating the operation when the end of a received packet is EDB in the configuration related to the packet write processing shown in FIG. 19.

Embodiments of the arithmetic processing device, the information processing device, and the control method for the information processing device disclosed in the present application are described in detail below with reference to the drawings. The embodiments described below are merely examples, and there is no intention to exclude various modifications or applications of techniques that are not explicitly described in the embodiments. That is, the embodiments can be implemented with various modifications without departing from their spirit. The drawings are not intended to show that only the illustrated components are provided; other functions may be included. The embodiments can also be combined as appropriate as long as their processing does not conflict.

[1] Technology Related to the Present Embodiment
First, technology related to the present embodiment will be described with reference to FIGS. 14 to 23.
In recent multiprocessor systems (information processing devices), a plurality of CPUs and routers are connected to one another via a network based on high-speed serial transmission (see reference numeral 30 in FIG. 16), as shown in FIG. 14, in order to achieve high communication throughput. FIG. 14 is a block diagram illustrating an example of the configuration of a multiprocessor system.

In recent years, multi-core processors incorporating a plurality of cores have come into general use as CPUs. An example of the configuration of each CPU (multi-core processor, arithmetic processing device) 10 constituting the multiprocessor system shown in FIG. 14 will now be described with reference to FIG. 15. FIG. 15 is a block diagram illustrating an example of the configuration of the CPU 10.

As shown in FIG. 15, in the CPU 10 a plurality of cores 11 are connected via a shared cache (tertiary cache) 12. FIG. 15 shows N+1 cores, core #0 to core #N (N is an integer of 1 or more), as the cores 11. In FIG. 15, a primary cache (not shown) and a secondary cache (not shown) are built into each core 11 and a tertiary cache is used as the shared cache 12; alternatively, only a primary cache may be built into each core 11 and a secondary cache may be used as the shared cache 12.

In addition to the cores 11, a MAC (Memory Access Controller) 13 and a router 14 are connected to the shared cache 12. The MAC 13 is connected to a DIMM (Dual Inline Memory Module) 20 that functions as main memory, and functions as a control unit that controls data exchange with the DIMM 20. The router 14 is connected to another CPU 10 or to another router (for example, an LSI (Large Scale Integrated circuit) for routing), and functions as a communication unit that exchanges data with the other CPU 10 by means of packets.

Consider the case in which one of the cores 11 in such a CPU 10 requests a read (fetch) of memory data, that is, issues a fetch request. The core 11 specifies the address of the 8 bytes of data it needs (the data to be processed) and first accesses the caches built into the core 11 (primary cache, secondary cache). If this results in a cache hit, the core 11 reads the 8 bytes of data from the built-in cache.

If, on the other hand, the core 11 misses in its built-in caches, it accesses the shared cache 12 (the tertiary cache in FIG. 15), which is shared by the plurality of cores. If the access also misses in the shared cache 12, the main memory 20 is accessed. Because the cache line size of the shared cache 12 is, for example, 128 bytes, the MAC 13 is requested to read a 128-byte data block that includes the 8 bytes of data to be processed requested by the core 11.

Next, the case in which a core 11 requests a read of memory data (for example, 8 bytes of data to be processed) held by another CPU 10 (for example, CPU #1 in FIGS. 14 and 16) different from the CPU 10 to which the core 11 belongs (for example, CPU #0 in FIGS. 14 and 16) will be described with reference to FIG. 16. FIG. 16 is a diagram explaining packet routing when a fetch request is made from one CPU #0 to another CPU #1 in the multiprocessor system shown in FIG. 14.

In this case, as shown in FIG. 16, the core 11 in CPU #0 transmits a fetch request packet to the other CPU #1 via the router 14 and the network 30. On receiving the fetch request packet, CPU #1 reads the data block that includes the requested memory data (8 bytes of data to be processed) from the DIMM 20 of CPU #1, and returns the data block as a fetch response packet. The size of the data block in this case is 128 bytes.

The fetch request packet and the fetch response packet will now be described with reference to FIGS. 17(A) and 17(B). FIG. 17(A) is a diagram illustrating the format of a packet without data (fetch request packet), and FIG. 17(B) is a diagram illustrating the format of a packet with data (fetch response packet).

Packets are either packets without data, as shown in FIG. 17(A), or packets with data, as shown in FIG. 17(B). The fetch request packet transmitted from CPU #0 to CPU #1 is a packet without data, and the fetch response packet sent in reply to it is a packet with data. The first cycle of a packet is a header (HD) carrying information used to determine the content and destination of the packet.

The headers of the fetch request packet and the fetch response packet will be described further with reference to FIGS. 18(A) and 18(B). FIG. 18(A) is a diagram illustrating the format of the header of the fetch request packet, and FIG. 18(B) is a diagram illustrating the format of the header of the fetch response packet.

As shown in FIG. 18(A), the header of the fetch request packet contains information such as an Opcode (OPC) indicating the packet type, the Physical Address (PA or pa) of the memory data requested by the core 11, and a Request ID (RQID), which is an identifier for identifying the packet. As shown in FIG. 18(B), the header of the fetch response packet contains information such as the OPC and the RQID. RSV (reserve) denotes unused bits.

As shown in FIG. 17(B), the 128-byte data block that includes the data to be processed requested by the core 11 follows the header, divided into 8-byte pieces of data, and is transmitted over 16 cycles. In the fetch response packet, the 8 bytes of data in each cycle correspond, in order from the head, to the data Data0, Data1, ..., DataF at physical addresses pa[6:3] = 0000, 0001, ..., 1111.
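The mapping between a physical address and its position in the 16-cycle response can be sketched as follows. This is a minimal illustration rather than anything taken from the specification: the function names `unit_index`, `block_base`, and `unit_name` are hypothetical, and only the 8-byte unit size, the 128-byte block size, and the use of bits pa[6:3] come from the text.

```python
UNIT_BYTES = 8      # one unit of data per cycle (from the text)
BLOCK_BYTES = 128   # data block / cache line size (from the text)


def unit_index(pa: int) -> int:
    """Return pa[6:3], the position (0..15) of the 8-byte unit within its block."""
    return (pa >> 3) & 0xF


def block_base(pa: int) -> int:
    """Return the address of the 128-byte block that contains pa."""
    return pa & ~(BLOCK_BYTES - 1)


def unit_name(pa: int) -> str:
    """Return the label used in the text for that unit: Data0 .. DataF."""
    return f"Data{unit_index(pa):X}"
```

For example, an address whose bits pa[6:3] are 1001 falls in the tenth cycle of data, labelled Data9, of the block starting at `block_base(pa)`.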

On receiving the fetch response packet from CPU #1, CPU #0 (router 14) registers the data block attached to the fetch response packet in the shared cache 12 of CPU #0 and also sends it to the core 11 that made the fetch request.

Transmission errors between CPU #0 and CPU #1 are detected by checking, on the packet receiving side, the CRC attached to the packet. When a transmission error is detected, the packet in which the error was detected and all subsequent packets are discarded, and a NACK (Negative ACKnowledgement) is sent using a DLLP (Data Link Layer Packet) to request that the transmitting CPU #1 retransmit the packets.
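The receiver-side check can be illustrated with a short sketch. The specification does not state the CRC polynomial, its width, or where it sits in the packet, so everything here is an assumption made for illustration: CRC-32 from Python's standard `zlib` module stands in for the link's CRC, the 4-byte trailer layout is invented, and the helper names `attach_crc` and `crc_ok` are hypothetical.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Sender side: append a CRC computed over the payload (assumed 4-byte trailer)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(packet: bytes) -> bool:
    """Receiver side: recompute the CRC and compare it with the received trailer."""
    payload, received = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received
```

A receiver for which `crc_ok` returns false would discard the packet and request retransmission, as described above.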

If a packet in which a transmission error has been detected cannot be discarded at a relay point (router; see FIG. 14) or at the receiving CPU #0, the packet is sent on with its end set to EDB.

The router 14 in the receiving CPU #0 first stores the fetch response packet received from the other CPU #1 in a buffer within the router 14 (see reference numeral 141 in FIG. 19). This buffer is called the reception buffer. The reception buffer is a FIFO buffer. The reception buffer 141 and the packet write/read processing for it will now be described with reference to FIGS. 19 to 23.

FIG. 19 is a block diagram illustrating the reception buffer 141 included in the router 14 of the CPU 10 shown in FIG. 15 and the configuration related to packet write/read processing for the reception buffer 141. As shown in FIG. 19, the write side of the reception buffer 141 is provided with a WDR (Write Data Register), an HDWP (Header Write Pointer), and a WP (Write Pointer). The read side of the reception buffer 141 is provided with an RDR (Read Data Register) and an RP (Read Pointer).

The WDR is a register that temporarily holds the data to be written to the reception buffer 141 (one unit of data, for example 8 bytes). The WP controls the writing of the packet being written; it is a pointer that specifies the address in the reception buffer 141 to which the data held in the WDR is written. The WP is set to 0 in the initial state and is incremented by 1 every cycle while the packet is being written to the reception buffer 141. The data held in the WDR is written to the address specified by the WP. The HDWP is a pointer that specifies the address of the header of the packet being written, and is set to 0 in the initial state.

The RDR is a register that temporarily holds the data read from the reception buffer 141 (one unit of data, for example 8 bytes). The RP controls the reading of data from the reception buffer 141; it is a pointer that specifies the address of the data to be read from the reception buffer 141. The RP is set to 0 in the initial state and is incremented by 1 every cycle while data is being read from the reception buffer 141. The data at the address specified by the RP is read from the reception buffer 141, held temporarily in the RDR, and, as described above, registered in the shared cache 12 and transmitted to the core 11 that made the fetch request.

The operation of the configuration related to the packet write processing shown in FIG. 19 (the router 14 in CPU #0) will be described with reference to the flowchart shown in FIG. 20 (steps S101 to S107). During packet writing, the router 14 waits to receive a header (HD) or data (DT) (NO route of step S101). On receiving a header (HD) or data (DT) (YES route of step S101), it determines whether EDB has been received (step S102).

If EDB has not been received (NO route of step S102), the router 14 determines whether END, which indicates the final cycle of the packet, has been received (step S103). If END has been received (YES route of step S103), the router 14 sets the HDWP to WP+1, the value that specifies the head address (header address) of the next packet to be written (step S104). The router 14 then writes the data (or header) held in the WDR to the entry of the reception buffer 141 specified by the WP (step S105), increments the WP by 1 (step S106), and returns to step S101.

If it is determined in step S102 that EDB has been received (YES route), the router 14 sets the WP to the value of the HDWP (step S107) and returns to step S101. As a result, the packet currently being written (the packet whose end was set to EDB) is written to the reception buffer 141 again, in order from its head (header).
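The write-side flow of steps S101 to S107 can be sketched in software as follows. This is an illustrative model under stated assumptions, not the circuit itself: the class name `WriteSide`, the `receive` method, and its `end`/`edb` flags are hypothetical, the buffer depth of 64 entries is arbitrary, and pointer wrap-around is omitted because the text does not describe it.

```python
class WriteSide:
    """Software model of the WP/HDWP handling in FIG. 20 (names are illustrative)."""

    def __init__(self, depth: int = 64):
        self.buf = [None] * depth  # reception buffer 141 (depth is an assumption)
        self.wp = 0                # WP: next write address, initially 0
        self.hdwp = 0              # HDWP: header address of the packet being written

    def receive(self, word, end: bool = False, edb: bool = False) -> None:
        """Handle one received cycle, i.e. the YES route of step S101."""
        if edb:                      # S102 YES: the packet ended with EDB
            self.wp = self.hdwp      # S107: rewind so the packet is overwritten
            return
        if end:                      # S103 YES: final cycle of a good packet
            self.hdwp = self.wp + 1  # S104: header address of the next packet
        self.buf[self.wp] = word     # S105: write the WDR contents at WP
        self.wp += 1                 # S106
```

Feeding a header followed by DT0 to DTF, with `end=True` on DTF, leaves both WP and HDWP at 17. Feeding `edb=True` partway through a packet rewinds WP to the header address, so the next packet received overwrites the discarded one.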

Next, the operation of the configuration related to the packet read processing shown in FIG. 19 (the router 14 in CPU #0) will be described with reference to the flowchart shown in FIG. 21 (steps S111 to S113). The router 14 determines whether the value of the RP matches the value of the HDWP (step S111). If the two values match (YES route of step S111), the router 14 determines that the packet is still being written and returns to step S111.

If the value of the RP does not match the value of the HDWP (NO route of step S111), the router 14 determines that writing of the packet has been completed and starts reading the packet from the reception buffer 141 (see timing T18 in FIG. 22). That is, the router 14 reads the entry (one unit of data) of the reception buffer 141 specified by the RP from the reception buffer 141 via the RDR (step S112). The router 14 then increments the RP by 1 (step S113) and returns to step S111.
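The read-side loop of steps S111 to S113 can be sketched in the same spirit. The function name `read_available` is hypothetical, and the sketch collapses the one-entry-per-cycle behavior of the hardware into a simple loop; it is meant only to show how the comparison between RP and HDWP gates the read.

```python
def read_available(buf, rp: int, hdwp: int):
    """Read entries in FIFO order while RP differs from HDWP.

    Returns the entries read and the updated RP. When RP equals HDWP
    (S111 YES route) nothing is read, because the packet is still being written.
    """
    out = []
    while rp != hdwp:        # S111 NO route: a complete packet is available
        out.append(buf[rp])  # S112: read the entry at RP (via the RDR)
        rp += 1              # S113
    return out, rp
```

With a 17-entry packet stored at addresses 0 to 16 and HDWP set to 17, the call returns the header followed by DT0 to DTF in order and leaves RP at 17.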

FIG. 22 shows a time chart of the shortest case of write/read processing of a fetch response packet for the reception buffer 141. As shown in FIG. 22, in the initial state (see timing T0), WE (Write Enable) and RE (Read Enable) are set to the Low state, and WP, HDWP, and RP are set to 0. Packet write processing starts when WE is set to the High state, and the header HD is first written to the reception buffer 141 via the WDR (see timing T1).

Thereafter, each time the WP is incremented by 1, the 16 pieces of data DT0 to DTF are written sequentially to the reception buffer 141 via the WDR (see timings T2 to T17; see steps S105 and S106 via the NO route of step S103 in FIG. 20).

When the data DTF and END, which indicates the final cycle of the packet, are received at timing T17, WE is set to the Low state and RE is set to the High state. Accordingly, at timing T18, the HDWP is set to WP+1 = 17, the value indicating the header address of the next packet (step S104 via the YES route of step S103 in FIG. 20). As a result, the HDWP value of 17 and the RP value of 1 no longer match (NO route of step S111 in FIG. 21), packet read processing starts, and the header HD is first read from the reception buffer 141 via the RDR (see timing T18).

Thereafter, each time the RP is incremented by 1, the 16 pieces of data DT0 to DTF are read sequentially from the reception buffer 141 via the RDR (see timings T19 to T34; see steps S112 and S113 via the NO route of step S111 in FIG. 21). This read processing continues until the value of the HDWP and the value of the RP match at timing T34 (that is, until YES is determined in step S111 of FIG. 21). In the example shown in FIG. 22, the HDWP and RP values match at 17 at timing T34.
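The timings quoted for the shortest case in FIG. 22 can be reproduced with a small calculation. The function name `shortest_case_timeline` is hypothetical, and the sketch assumes, as in the description above, one entry written or read per timing with reading starting in the cycle after the last write.

```python
def shortest_case_timeline(units: int = 16):
    """Return (write_at, read_at): the timing index at which each entry is handled."""
    items = ["HD"] + [f"DT{i:X}" for i in range(units)]
    # The header is written at T1 and DT0..DTF at T2..T17.
    write_at = {item: 1 + i for i, item in enumerate(items)}
    # Reading starts only after the final cycle (END) has been written.
    first_read = write_at[items[-1]] + 1
    # The header is read at T18 and DT0..DTF at T19..T34.
    read_at = {item: first_read + i for i, item in enumerate(items)}
    return write_at, read_at
```

Under these assumptions every entry leaves the buffer 17 timings after it was written, regardless of which unit the core actually asked for.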

As described above with reference to FIGS. 20 to 22, packet read processing starts only after the last unit of data has been written in the final cycle of the packet. This is because the packet has to be discarded if its final cycle (end) is EDB.

FIG. 23 shows a time chart for the case in which the end of the received fetch response packet is EDB. FIG. 23 illustrates a case in which, following the same procedure as in FIG. 22, EDB is detected at timing T10, the write timing of packet data DT8. In this case, at timing T11, the WP is set to the value of the HDWP (0 in FIG. 23) (see step S107 via the YES route of step S102 in FIG. 20).

これにより、受信バッファ１４１からの読出処理を行なうことなく、現在の書込対象パケット（末尾にＥＤＢを設定されたパケット）が、再度、先頭（ヘッダ）から順に受信バッファ１４１へ書き込まれる。その際、末尾にＥＤＢを設定されたパケットは、後続のパケットによって上書きされる。 Thus, the current write target packet (packet with EDB set at the end) is written again to the reception buffer 141 in order from the head (header) without performing the reading process from the reception buffer 141. At that time, a packet with EDB at the end is overwritten by a subsequent packet.
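The related-art write-side behavior described above (HDWP advancing on END, WP rewinding on EDB) can be sketched as follows. This is a minimal model under stated assumptions, not the circuit of FIG. 19: the class name, the `end`/`edb` flags, and the fixed buffer size without wrap-around are all hypothetical simplifications.

```python
class ReceiveBufferWriter:
    """Toy model of the related-art write side (WP, HDWP) of reception buffer 141."""

    def __init__(self, size=64):
        self.buf = [None] * size  # one entry per unit data (e.g. 8 bytes)
        self.wp = 0               # WP: address of the next entry to write
        self.hdwp = 0             # HDWP: header address of the packet being written

    def write_cycle(self, unit, end=False, edb=False):
        """Process one receive cycle; `end`/`edb` mark the packet's last cycle."""
        if edb:
            # Bad packet: rewind WP to the header address so that the next
            # packet overwrites it. HDWP is unchanged, so no read is triggered.
            self.wp = self.hdwp
            return
        self.buf[self.wp] = unit
        if end:
            # Good packet: HDWP becomes WP + 1, the next packet's header address.
            # HDWP then differs from RP, which is what lets reading start.
            self.hdwp = self.wp + 1
        self.wp += 1


writer = ReceiveBufferWriter()
writer.write_cycle("HD")
for i in range(16):
    writer.write_cycle(f"DT{i:X}", end=(i == 15))
print(writer.hdwp, writer.wp)  # 17 17, matching WP + 1 = 17 in FIG. 22
```

With these assumptions, a packet that ends in EDB leaves HDWP where it was, so the read side (which compares RP with HDWP) never sees the discarded packet.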

[2] Outline of the Embodiment
As described above, in the network 30 based on high-speed serial transmission, transmission errors are checked by CRC. When a transmission error is detected, the end of the packet is rewritten to EDB. The endpoint (CPU #0) therefore cannot determine whether a packet is normal until it has received all data up to the end of the packet containing the response data (processing target data).

Therefore, in CPU #0, which issued the fetch request, all data up to the end of the packet containing the response data for that fetch request is first stored in the reception buffer 141, and only then is the data in the reception buffer 141 sent to the core 11 in order from the head.

However, because the reception buffer 141 is a FIFO, reading the processing target data requested by the core 11 (one unit of data; for example, 8 bytes) from the reception buffer 141 must wait until the data immediately preceding that unit data has been read. As seen from the core 11, this increases communication latency and may degrade the processing performance of CPU #0 and of the information processing apparatus (multiprocessor system) that includes CPU #0.

このため、高速シリアル伝送のネットワーク３０で接続されるマルチプロセッサシステムにおいて、コア１１が要求する処理対象データの読出レイテンシを短縮することが望まれている。 For this reason, in a multiprocessor system connected by a high-speed serial transmission network 30, it is desired to shorten the read latency of processing target data requested by the core 11.

Accordingly, in the present embodiments (the first and second embodiments), for example, of the 128-byte data block in a fetch response packet read and transferred from another CPU 10, the 8 bytes of processing target data that the core 11 needs are read from the reception buffer 141 first and sent to the core 11. This shortens the communication latency seen from the core 11 (the read latency of the processing target data).

That is, in the information processing apparatus of the first embodiment described later, the entire response packet for a fetch request is written to the reception buffer 141 of the endpoint, as in the related art described above. The endpoint is the receiving CPU 10A (see FIG. 1; first arithmetic processing device, CPU #0), which is connected to the high-speed serial transmission network 30 and receives the fetch response packet. In the first embodiment, however, as described later with reference to FIGS. 1 to 7, once the packet has been written, it is not read from the reception buffer 141 in the order in which it was written; instead, the 8 bytes of processing target data requested by the core 11 are read first.

To read the 8 bytes of processing target data requested by the core 11 first, the fetch response packet in the first embodiment described later includes a physical address (pa[6:3]) that identifies the processing target data requested by the core 11 (see FIGS. 2 and 3). At the endpoint, HDRP (header read pointer), a Length register, and CycleCT (cycle counter), all described later, are added to the related-art configuration for packet read processing (see RP and RDR in FIG. 19; see the reading unit 143 in FIGS. 4 and 5). Controlling RP by means of HDRP, the Length register, and CycleCT makes it possible to read the 8 bytes of processing target data requested by the core 11 first (see FIGS. 5 to 7).

In this way, read latency is shortened by reading from the reception buffer 141 starting with the processing target data requested by the core 11 of the CPU 10A. For example, if the processing target data requested by the core 11 is at the tail of the packet, reading it first shortens the latency by [number of cycles in the data portion of the packet − 1]. Conversely, if the processing target data is at the head of the packet, the latency is unchanged. In other words, the number of cycles saved depends on the position of the processing target data within the packet. On average, therefore, the latency can be shortened by about [number of cycles in the data portion of the packet − 1] / 2.
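The average figure can be checked with a short calculation. As an assumption not stated in the text, suppose the requested unit data is equally likely to sit at any of the N positions in the data portion, and that a unit at 0-based position k saves k cycles when read first. The symbols N, k, and the mean saving are introduced here only for illustration.

```latex
\[
\bar{S} \;=\; \frac{1}{N}\sum_{k=0}^{N-1} k \;=\; \frac{N-1}{2},
\qquad
N = 16 \;\Rightarrow\; \bar{S} = 7.5 \text{ cycles}.
\]
```

For the 128-byte data block of sixteen 8-byte units used as the example in this description, the saving under this assumption ranges from 0 cycles (head) to 15 cycles (tail).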

The second embodiment, described later with reference to FIG. 1 and FIGS. 8 to 13, addresses the case where a single data-carrying packet contains multiple required unit data (processing target data), as happens with stride access in matrix calculations and the like. When handling arrays in matrix calculations, the core 11 requests multiple unit data at addresses separated by a fixed interval shorter than the packet length, and those unit data are contained in one packet.

In such a case, in the second embodiment, when a packet is read from the reception buffer 141, the required unit data are packed ahead of the other unit data (toward the head) when sent out, so that they are read before the other unit data. This greatly shortens the latency seen from the core 11 of the CPU 10A.
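The resulting read order for stride access can be illustrated with a small sketch. It shows only the ordering effect (required unit data first, the rest afterwards); the actual mechanism of the second embodiment is described with reference to FIGS. 8 to 13 and is not modeled here. The function name, the ascending order among the required units, and the example stride are assumptions.

```python
def pack_needed_first(block, needed):
    """Return the unit data of `block` with the indices in `needed` moved to the front.

    Hypothetical helper: required units keep ascending index order, followed by
    all remaining units in their original order.
    """
    wanted = set(needed)
    front = [block[i] for i in sorted(wanted)]
    rest = [unit for i, unit in enumerate(block) if i not in wanted]
    return front + rest


block = [f"DT{i:X}" for i in range(16)]  # sixteen 8-byte unit data
stride_hits = range(2, 16, 4)            # assumed stride: units 2, 6, 10, 14
print(pack_needed_first(block, stride_hits)[:5])
# ['DT2', 'DT6', 'DTA', 'DTE', 'DT0']
```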

[3] Information Processing Apparatus of the First Embodiment
The configuration of the information processing apparatus (multiprocessor system) 1, which includes arithmetic processing devices (CPUs) 10A and 10B as the first embodiment of the present invention, is described with reference to FIG. 1. FIG. 1 is a block diagram showing that configuration.

図１に示すように、第１実施形態の情報処理装置１は、複数（図１で２個）のＣＰＵ１０Ａ，１０Ｂを有するマルチプロセッサシステムであり、例えばＰＣ，サーバである。ＣＰＵ１０Ａは、第１の演算処理装置に相当し、受信側ＣＰＵもしくはＣＰＵ＃０と表記する場合がある。また、ＣＰＵ１０Ｂは、第２の演算処理装置に相当し、送信側ＣＰＵもしくはＣＰＵ＃１と表記する場合がある。なお、第１実施形態の情報処理装置１には２個のＣＰＵが備えられているが、本発明は、これに限定されるものでなく、３個以上のＣＰＵが備えられてもよい。 As shown in FIG. 1, the information processing apparatus 1 according to the first embodiment is a multiprocessor system having a plurality (two in FIG. 1) of CPUs 10A and 10B, such as a PC and a server. The CPU 10A corresponds to the first arithmetic processing unit, and may be referred to as a receiving CPU or CPU # 0. Further, the CPU 10B corresponds to a second arithmetic processing unit and may be referred to as a transmitting CPU or CPU # 1. Although the information processing apparatus 1 according to the first embodiment includes two CPUs, the present invention is not limited to this, and may include three or more CPUs.

ＣＰＵ１０ＡとＣＰＵ１０Ｂとは、高速シリアル伝送によるネットワーク３０を介して相互に通信可能に接続される。 The CPU 10A and the CPU 10B are connected to be communicable with each other via a network 30 based on high-speed serial transmission.

ＣＰＵ１０Ａは、図１５に示すＣＰＵ１０と同様、複数のコア１１Ａを内蔵したマルチコアプロセッサである。ＣＰＵ１０Ａにおいては、複数のコア１１が共有キャッシュ（三次キャッシュ）１２Ａを介して接続されている。図１では、コア１１Ａとして、Ｎ＋１個のコア＃０〜コア＃Ｎ（Ｎは１以上の整数）が備えられている。図１では、一次キャッシュ（不図示）および二次キャッシュ（不図示）が各コア１１に内蔵され、三次キャッシュが、共有キャッシュ１２Ａとして用いられているが、一次キャッシュを各コア１１Ａに内蔵し、二次キャッシュを共有キャッシュ１２Ａとして用いてもよい。 The CPU 10A is a multi-core processor incorporating a plurality of cores 11A, as with the CPU 10 shown in FIG. In the CPU 10A, a plurality of cores 11 are connected via a shared cache (tertiary cache) 12A. In FIG. 1, as the core 11A, N + 1 cores # 0 to #N (N is an integer equal to or greater than 1) are provided. In FIG. 1, a primary cache (not shown) and a secondary cache (not shown) are built in each core 11, and a tertiary cache is used as the shared cache 12A, but a primary cache is built in each core 11A. A secondary cache may be used as the shared cache 12A.

共有キャッシュ１２Ａには、コア１１ＡのほかにＭＡＣ１３Ａおよびルータ１４Ａが接続されている。ＭＡＣ１３Ａは、メインメモリとして機能するＤＩＭＭ２０Ａを接続され、ＤＩＭＭ２０Ａとのデータのやり取りを制御する制御部として機能する。ルータ１４Ａは、他のＣＰＵ１０Ｂまたは他のルータ（例えばルーティング用のＬＳＩ；図１４参照）に接続され、パケットによって、他のＣＰＵ１０Ｂとのデータの通信を行なう第１の通信部として機能する。 In addition to the core 11A, a MAC 13A and a router 14A are connected to the shared cache 12A. The MAC 13A is connected to a DIMM 20A that functions as a main memory, and functions as a control unit that controls data exchange with the DIMM 20A. The router 14A is connected to another CPU 10B or another router (for example, an LSI for routing; see FIG. 14), and functions as a first communication unit that performs data communication with the other CPU 10B by a packet.

ＣＰＵ１０Ｂは、コア１１Ｂ，共有キャッシュ（三次キャッシュ）１２Ｂ，ＭＡＣ１３Ｂ，ルータ１４Ｂを有している。コア１１Ｂ，共有キャッシュ１２Ｂ，ＭＡＣ１３Ｂ，ルータ１４Ｂは、それぞれ、上述したＣＰＵ１０Ａのコア１１Ａ，共有キャッシュ１２Ａ，ＭＡＣ１３Ａ，ルータ１４Ａと同様である。ただし、ＭＡＣ１３Ｂは、メインメモリとして機能するＤＩＭＭ２０Ｂを接続され、ＤＩＭＭ２０Ｂとのデータのやり取りを制御する制御部として機能する。ルータ１４Ｂは、他のＣＰＵ１０Ａまたは他のルータ（例えばルーティング用のＬＳＩ；図１４参照）に接続され、パケットによって、他のＣＰＵ１０Ａとのデータの通信を行なう第２の通信部として機能する。 The CPU 10B includes a core 11B, a shared cache (tertiary cache) 12B, a MAC 13B, and a router 14B. The core 11B, shared cache 12B, MAC 13B, and router 14B are the same as the core 11A, shared cache 12A, MAC 13A, and router 14A of the CPU 10A described above, respectively. However, the MAC 13B is connected to a DIMM 20B that functions as a main memory, and functions as a control unit that controls exchange of data with the DIMM 20B. The router 14B is connected to another CPU 10A or another router (for example, a routing LSI; see FIG. 14), and functions as a second communication unit that performs data communication with the other CPU 10A by a packet.

ＣＰＵ１０Ａにおける複数のコア（処理部）１１Ａのうちの少なくとも一つは、必要な処理対象データ（例えば８バイトの単位データ）の読出要求を生成する。以下では、読出要求をフェッチ要求という場合がある。 At least one of the plurality of cores (processing units) 11A in the CPU 10A generates a read request for necessary processing target data (for example, 8-byte unit data). Hereinafter, the read request may be referred to as a fetch request.

ＣＰＵ１０Ａにおけるルータ（第１の通信部）１４Ａは、一のコア１１Ａが生成したフェッチ要求を、フェッチ要求パケット（図２，図１７（Ａ）参照）によってＣＰＵ１０Ｂへ送信する。また、ルータ（第１の通信部）１４Ａは、フェッチ要求に対応する処理対象データを含むデータブロックを添付されたフェッチ応答パケット（図２，図１７（Ｂ）参照）を、ＣＰＵ１０Ｂから受信する。 The router (first communication unit) 14A in the CPU 10A transmits a fetch request generated by one core 11A to the CPU 10B by a fetch request packet (see FIGS. 2 and 17A). Further, the router (first communication unit) 14A receives from the CPU 10B a fetch response packet (see FIGS. 2 and 17B) attached with a data block including data to be processed corresponding to the fetch request.

一方、ＣＰＵ１０Ｂにおけるルータ（第２の通信部）１４Ｂは、フェッチ要求パケット（図２，図１７（Ａ）参照）によって、ＣＰＵ１０Ａからフェッチ要求を受信する。また、ルータ（第２の通信部）１４Ｂは、フェッチ要求に対応する処理対象データを含むデータブロックを添付されたフェッチ応答パケット（図２，図１７（Ｂ）参照）を、ＣＰＵ１０Ａへ送信する。 On the other hand, the router (second communication unit) 14B in the CPU 10B receives the fetch request from the CPU 10A by a fetch request packet (see FIGS. 2 and 17A). Further, the router (second communication unit) 14B transmits a fetch response packet (see FIGS. 2 and 17B) to which the data block including the processing target data corresponding to the fetch request is attached to the CPU 10A.

In particular, in the first embodiment, the router (second communication unit) 14B of the CPU 10B attaches to the data block a response header that records the address information pa[6:3] of the processing target data, extracted from the address information contained in the fetch request (the header of the fetch request packet; see PA in FIG. 18(A)). Here, the response header is, for example, the header of the fetch response packet described later with reference to FIG. 3. The router (second communication unit) 14B then transmits the data block with the response header attached to the CPU 10A as a fetch response packet (see FIG. 2 and FIG. 17(B)).

The router (first communication unit) 14A of the CPU 10A has a reception buffer 141, a writing unit 142, and a reading unit 143. In the first embodiment, the reception buffer 141, the writing unit 142, and the reading unit 143 are provided in the router 14A of the CPU 10A, but they need only be provided somewhere within the CPU 10A.

受信バッファ（バッファ）１４１は、データブロックを添付されたデータ付きパケットを単位データ（例えば８バイト）で保存する。 The reception buffer (buffer) 141 stores a packet with data attached with a data block as unit data (for example, 8 bytes).

書込部１４２は、ルータ１４Ａによって受信されたパケット（ヘッダおよびデータブロック）に含まれる複数の単位データを、受信バッファ１４１に順次書き込む。第１実施形態の書込部１４２は、図４を参照しながら後述するごとく、図１９に示した関連技術と同様に構成されている。 The writing unit 142 sequentially writes a plurality of unit data included in the packet (header and data block) received by the router 14A in the reception buffer 141. As will be described later with reference to FIG. 4, the writing unit 142 of the first embodiment is configured in the same manner as the related technique shown in FIG. 19.

The reading unit 143 preferentially reads from the reception buffer 141 the processing target data, which is at least one of the multiple unit data contained in the packet. In particular, the reading unit 143 of the first embodiment refers to the address information pa[6:3] of the processing target data recorded in the response header attached to the data block. The reading unit 143 first reads the processing target data corresponding to that address information pa[6:3] from the reception buffer 141, and then sequentially reads the remaining unit data from the reception buffer 141. The reading unit 143 of the first embodiment is configured as described later with reference to FIGS. 4 and 5 and operates as described later with reference to FIGS. 6 and 7.

なお、第１実施形態では、ＣＰＵ１０Ａが、第１の通信部としての機能や、受信バッファ１４１，書込部１４２，読出部１４３としての機能を有し、ＣＰＵ１０Ｂが、第２の通信部としての機能を有する場合について説明している。しかし、複数のＣＰＵ１０Ａ，１０Ｂのそれぞれが、第１および第２の通信部としての機能と、受信バッファ１４１，書込部１４２，読出部１４３としての機能とを有していてもよい。 In the first embodiment, the CPU 10A has a function as a first communication unit and functions as a reception buffer 141, a writing unit 142, and a reading unit 143, and the CPU 10B serves as a second communication unit. The case of having a function is described. However, each of the plurality of CPUs 10 A and 10 B may have a function as the first and second communication units and a function as the reception buffer 141, the writing unit 142, and the reading unit 143.

ここで、図２を参照しながら、図１に示す情報処理装置１において、一ＣＰＵ１０Ａのコア１１Ａから他ＣＰＵ１０Ｂのメモリデータに対するフェッチ要求を発行する場合のパケットルーティングについて説明する。 Here, referring to FIG. 2, in the information processing apparatus 1 shown in FIG. 1, packet routing in the case where a fetch request for memory data of another CPU 10B is issued from the core 11A of one CPU 10A will be described.

この場合、図２に示すように、必要な処理対象データを読み出すためのフェッチ要求パケット（データ無しパケット）が、ＣＰＵ１０Ａから発行・送信され、ネットワーク３０経由で、当該処理対象データを保持するメモリ２０Ｂを有するＣＰＵ１０Ｂにルーティングされる。ルータ１４Ｂによってフェッチ要求パケットを受信したＣＰＵ１０Ｂは、要求された処理対象データを含むデータブロックをメモリ２０Ｂから読み出し、フェッチ応答パケット（データ付きパケット）を生成し、当該フェッチ応答パケットをＣＰＵ１０Ａに送信する。 In this case, as shown in FIG. 2, a fetch request packet (packet without data) for reading out necessary processing target data is issued / transmitted from the CPU 10A, and the memory 20B holds the processing target data via the network 30. Is routed to the CPU 10B. The CPU 10B that has received the fetch request packet by the router 14B reads the data block including the requested processing target data from the memory 20B, generates a fetch response packet (packet with data), and transmits the fetch response packet to the CPU 10A.

FIG. 3 shows the format of the header (response header) of the fetch response packet generated by the CPU 10B. As shown in FIG. 3, in the first embodiment the header of the fetch response packet carries a 4-bit physical address pa[6:3] that identifies the unit data targeted by the fetch request (processing target data). The 4-bit physical address pa[6:3] indicates which of the multiple unit data in the data block of the fetch response packet (for example, sixteen 8-byte data items) is requested by the core 11A on the CPU 10A side.

When the CPU 10B generates the fetch response packet, the 4-bit physical address pa[6:3] that identifies the unit data targeted by the fetch request is extracted from the physical address PA carried in the header of the fetch request packet (see FIG. 18(A)). Including the extracted physical address pa[6:3] in the header described above with reference to FIG. 18(B) yields a response header in the format shown in FIG. 3. The fetch response packet with this response header is routed and transmitted to the CPU 10A, the issuer of the fetch request, as shown in FIG. 2.
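The extraction of pa[6:3] amounts to taking bits 6 through 3 of the physical address. The sketch below assumes 8-byte unit data and a 128-byte data block, as in the examples in this description; the function name and the sample addresses are hypothetical.

```python
UNIT_BYTES = 8        # one unit data (example value from the text)
UNITS_PER_BLOCK = 16  # 128-byte data block / 8-byte unit data


def pa_6_3(physical_address: int) -> int:
    """Return bits [6:3] of the address: the index of the requested unit in its block."""
    # Shifting by 3 drops the byte offset within an 8-byte unit;
    # masking with 0xF keeps the 4 bits that select one of 16 units.
    return (physical_address >> 3) & 0xF


# Hypothetical address: block base 0x1000, requested unit index 10 (DTA).
request_pa = 0x1000 + 10 * UNIT_BYTES
print(pa_6_3(request_pa))  # 10
```

Because only four bits are carried in the response header, the receiving side can locate the requested unit data without knowing the full physical address.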

A fetch response packet received on the CPU 10A side is first written, unit data by unit data, to the reception buffer 141 provided in the router 14A by the writing unit 142 (see FIGS. 1 and 4). Then, by controlling the RP value (the address of the unit data to be read) in the reading unit 143, the processing target data requested by the core 11A is read preferentially from the packet written in the reception buffer 141. FIG. 4 is a block diagram showing the reception buffer 141 included in the router 14A of the CPU 10A shown in FIG. 1, together with the configuration for packet write/read processing on the reception buffer 141 (the writing unit 142 and the reading unit 143).

As shown in FIG. 4, the writing unit 142 of the first embodiment has WDR, HDWP, and WP, like the related-art configuration for packet write processing on the reception buffer 141 shown in FIG. 19. The reading unit 143 of the first embodiment has RDR and RP, like the related-art configuration for packet read processing on the reception buffer 141 shown in FIG. 19, and additionally has HDRP (Header Read Pointer), a Length register, and CycleCT (Cycle Counter).

ＲＤＲは、受信バッファ１４１から読み出された読出データ（一単位データ；例えば８バイトデータ）を一時的に保存するレジスタである。ＲＰは、受信バッファ１４１からのデータの読出を制御するもので、受信バッファ１４１から読み出すデータのアドレスを指定するポインタである。ＲＰは、初期状態では０を設定され、受信バッファ１４１からのデータ読出時に１サイクル毎に１ずつインクリメントされる。ＲＰによって指定されるアドレスのデータは、受信バッファ１４１から読み出され、ＲＤＲに一時的に保存され、前述したように共有キャッシュ１２Ａに登録されるとともにフェッチ要求を行なったコア１１へ送信される。 The RDR is a register that temporarily stores read data (one unit data; for example, 8-byte data) read from the reception buffer 141. The RP controls reading of data from the reception buffer 141 and is a pointer for designating an address of data read from the reception buffer 141. RP is set to 0 in the initial state, and is incremented by 1 every cycle when data is read from the reception buffer 141. Data at the address specified by the RP is read from the reception buffer 141, temporarily stored in the RDR, registered in the shared cache 12A as described above, and transmitted to the core 11 that has made the fetch request.

ＨＤＲＰは、受信バッファ１４１から読出中のパケットのヘッダのアドレスを示すポインタであり、初期状態では０を設定される。Lengthレジスタは、受信バッファ１４１から読出中のパケットのデータ長（パケット長）Lengthを設定される。Lengthレジスタに設定されるデータ長Lengthは、ヘッダを受信バッファ１４１から読み出した際に、Length生成部１４３ａ（図５参照）によって当該ヘッダにおけるOpcode（パケット種）から生成され設定される。CycleＣＴは、受信バッファ１４１から読出中のパケットの単位データが、何個目の単位データであるかを示すカウンタであり、一サイクル毎につまり一単位データを読み出す都度、１ずつインクリメントされる。 HDRP is a pointer indicating the address of the header of the packet being read from the reception buffer 141, and is set to 0 in the initial state. In the Length register, the data length (packet length) Length of the packet being read from the reception buffer 141 is set. The data length Length set in the Length register is generated and set from the Opcode (packet type) in the header by the Length generator 143a (see FIG. 5) when the header is read from the reception buffer 141. CycleCT is a counter indicating what unit data the unit data of the packet being read from the reception buffer 141 is, and is incremented by 1 every cycle, that is, whenever one unit data is read.

ついで、図５を参照しながら、図４に示すパケット読出処理に係る構成、つまり読出部１４３について、より詳細に説明する。図５は、読出部１４３の詳細構成を示すブロック図である。図５に示す読出部１４３は、図６および図７を参照しながら後述するように動作する。図５に示すように、第１実施形態の読出部１４３は、Length生成部１４３ａと１加算器１４３ｂ，１４３ｄと加算器１４３ｃ，１４３ｅ，１４３ｆとセレクタ１４３ｇとを含む。 Next, the configuration related to the packet reading process shown in FIG. 4, that is, the reading unit 143 will be described in more detail with reference to FIG. FIG. 5 is a block diagram illustrating a detailed configuration of the reading unit 143. The reading unit 143 shown in FIG. 5 operates as described later with reference to FIGS. 6 and 7. As shown in FIG. 5, the reading unit 143 of the first embodiment includes a length generation unit 143a, 1 adders 143b and 143d, adders 143c, 143e and 143f, and a selector 143g.

Length生成部１４３ａは、図４を参照しながら前述したように、ヘッダを受信バッファ１４１から読み出した際に、当該ヘッダにおけるOpcodeから、受信バッファ１４１から読出中のパケットのデータ長Lengthを生成し、Lengthレジスタに設定する。 As described above with reference to FIG. 4, the Length generation unit 143 a generates the data length Length of the packet being read from the reception buffer 141 from the Opcode in the header when reading the header from the reception buffer 141. Set in the Length register.

１加算器（＋１）１４３ｂは、ＨＤＲＰの示す値（以下、ＨＤＲＰと表記）に１を加算する。 The 1 adder (+1) 143b adds 1 to a value indicated by HDRP (hereinafter referred to as HDRP).

The adder 143c adds the data length Length in the Length register to the value HDRP + 1 from the 1-adder 143b and sets the result, HDRP + Length + 1, in HDRP (see step S24 in FIG. 6). The adder 143c operates at the timing when the value indicated by CycleCT (hereinafter, CycleCT) reaches the data length Length and CycleCT is reset (initialized) (see step S22 in FIG. 6 and timing T34 in FIG. 7).

１加算器（＋１）１４３ｄは、ＲＰの示す値（以下、ＲＰと表記）に１を加算する。 The 1 adder (+1) 143d adds 1 to the value indicated by RP (hereinafter referred to as RP).

加算器１４３ｅは、コア１１Ａが要求する処理対象データを特定する物理アドレスｐａ[6:3]と、１加算器１４３ｄからの値ＲＰ＋１とを加算し、得られた値ＲＰ＋ｐａ[6:3]＋１をＲＰに設定する（図６のステップＳ１６参照）。物理アドレスｐａ[6:3]は、読出中のパケットのヘッダからＲＤ−ＢＵＳ（読出バス）を介して読み出される。加算器１４３ｅの動作タイミングは、フェッチ応答パケットからヘッダを読み出すタイミング（図６のステップＳ１５のＹＥＳルート；図７のタイミングＴ１８参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｅで得られた値ＲＰ＋ｐａ[6:3]＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(1)参照）。 The adder 143e adds the physical address pa [6: 3] specifying the processing target data requested by the core 11A and the value RP + 1 from the 1 adder 143d, and obtains the obtained value RP + pa [6: 3] +1. Is set to RP (see step S16 in FIG. 6). The physical address pa [6: 3] is read from the header of the packet being read through the RD-BUS (read bus). The operation timing of the adder 143e is the timing for reading the header from the fetch response packet (YES route in step S15 in FIG. 6; see timing T18 in FIG. 7). At this timing, the selector 143g performs a switching operation so that the value RP + pa [6: 3] +1 obtained by the adder 143e is set to RP (see (1) in FIGS. 5 to 7).

加算器１４３ｆは、Lengthレジスタにおけるデータ長Lengthと、１加算器１４３ｂからの値ＨＤＲＰ＋１とを加算し、得られた値ＨＤＲＰ＋Length＋１をＲＰに設定する（図６のステップＳ２３参照）。加算器１４３ｆの動作タイミングは、CycleＣＴがデータ長LengthになってCycleＣＴをリセット（初期化）するタイミング（図６のステップＳ２２；図７のタイミングＴ３４参照）である。当該タイミングで、セレクタ１４３ｇは、加算器１４３ｆで得られた値ＨＤＲＰ＋Length＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(4)参照）。 The adder 143f adds the data length Length in the Length register and the value HDRP + 1 from the 1 adder 143b, and sets the obtained value HDRP + Length + 1 to RP (see step S23 in FIG. 6). The operation timing of the adder 143f is the timing at which CycleCT becomes the data length Length and resets (initializes) CycleCT (step S22 in FIG. 6; see timing T34 in FIG. 7). At this timing, the selector 143g performs a switching operation so that the value HDRP + Length + 1 obtained by the adder 143f is set to RP (see (4) in FIGS. 5 to 7).

また、セレクタ１４３ｇは、１加算器１４３ｄで得られた値ＲＰ＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(2)参照）。当該切替動作を行なうタイミングは、ヘッダのOpcodeがフェッチ応答でない場合（図６のステップＳ１５のＮＯルート参照）、もしくは、ＲＰがＨＤＲＰ＋Lengthと一致しないタイミング（図６のステップＳ１９のＮＯルート；図７のタイミングＴ１９〜Ｔ２３，Ｔ２５〜Ｔ３３参照）である。 The selector 143g performs a switching operation so as to set the value RP + 1 obtained by the 1 adder 143d to RP (see (2) in FIGS. 5 to 7). The timing for performing the switching operation is when the Opcode of the header is not a fetch response (see the NO route in step S15 in FIG. 6), or when the RP does not match HDRP + Length (the NO route in step S19 in FIG. 6; FIG. 7). Timing T19 to T23, T25 to T33).

さらに、セレクタ１４３ｇは、１加算器１４３ｂで得られた値ＨＤＲＰ＋１をＲＰに設定するように切替動作を行なう（図５〜図７の(3)参照）。当該切替動作を行なうタイミングは、ＲＰがＨＤＲＰ＋Lengthと一致するタイミング（図６のステップＳ１９のＹＥＳルート；図７のタイミングＴ２４参照）である。 Further, the selector 143g performs a switching operation so as to set the value HDRP + 1 obtained by the 1 adder 143b to RP (see (3) in FIGS. 5 to 7). The timing for performing the switching operation is the timing at which RP matches HDRP + Length (YES route in step S19 in FIG. 6; see timing T24 in FIG. 7).
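The four selector inputs (1) to (4) described above determine the order in which one fetch response packet leaves the reception buffer 141. The following is a minimal software sketch of that RP update rule, not the circuit of FIG. 5. It assumes that Length is passed in directly instead of being generated from the Opcode, that the header is modeled as a dictionary with a hypothetical key `"pa"`, that CycleCT is also incremented in the cycle where RP wraps to HDRP + 1, and that only fetch response packets are handled.

```python
def read_packet(buf, hdrp, length):
    """Return (entries in read order, next HDRP) for one fetch response packet.

    buf[hdrp] is the header; buf[hdrp + 1 .. hdrp + length] are the unit data.
    """
    order = []
    rp = hdrp
    header = buf[rp]
    order.append(header)            # the header is read first
    rp = rp + header["pa"] + 1      # input (1): RP + pa[6:3] + 1
    cycle_ct = 1                    # CycleCT after the header cycle

    while True:
        order.append(buf[rp])
        if cycle_ct == length:      # every unit data has been read
            rp = hdrp + length + 1  # input (4): RP moves to the next header
            hdrp = hdrp + length + 1
            return order, hdrp
        if rp == hdrp + length:     # last entry of the packet reached
            rp = hdrp + 1           # input (3): wrap to the first unit data
        else:
            rp = rp + 1             # input (2): RP + 1
        cycle_ct += 1


buf = [{"pa": 10}] + [f"DT{i:X}" for i in range(16)]
order, next_hdrp = read_packet(buf, hdrp=0, length=16)
print(order[1:], next_hdrp)
# ['DTA', 'DTB', 'DTC', 'DTD', 'DTE', 'DTF', 'DT0', 'DT1', 'DT2', 'DT3',
#  'DT4', 'DT5', 'DT6', 'DT7', 'DT8', 'DT9'] 17
```

Under these assumptions the requested data DTA is read in the first data cycle, the wrap occurs after DTF, and both RP and HDRP end at 17, which agrees with the example values given for FIG. 7.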

なお、上述した書込部１４２および読出部１４３としての機能は、論理ゲート等によってハードウエア的にＣＰＵ１０Ａに組み込まれて実現されてもよいし、プログラムを実行することでソフトウエア的にＣＰＵ１０Ａに組み込まれて実現されてもよい。 Note that the functions as the writing unit 142 and the reading unit 143 described above may be realized by being incorporated into the CPU 10A in hardware by a logic gate or the like, or incorporated into the CPU 10A as software by executing a program. May be realized.

次に、図５〜図７を参照しながら、上述のごとく構成された書込部１４２および読出部１４３の動作について説明する。図６は、図５に示すパケット読出処理に係る構成（読出部１４３）の動作を説明するフローチャートである。図７は、図４および図５に示すパケット読出処理に係る構成（読出部１４３）が図６に示すフローチャートに従って１０番目のデータ（ＤＴＡ）を最初に読み出す場合の動作を示すタイムチャートである。 Next, the operations of the writing unit 142 and the reading unit 143 configured as described above will be described with reference to FIGS. FIG. 6 is a flowchart for explaining the operation of the configuration (reading unit 143) related to the packet reading process shown in FIG. FIG. 7 is a time chart showing an operation when the configuration (reading unit 143) related to the packet reading process shown in FIGS. 4 and 5 first reads the 10th data (DTA) according to the flowchart shown in FIG.

The packet write operation by the writing unit 142 of the first embodiment is the same as the related-art operation described above with reference to FIG. 20, so its description is omitted. In contrast, the packet read operation by the reading unit 143 of the first embodiment differs from the related-art operation described above with reference to FIG. 21.

The packet read operation by the reading unit 143 of the first embodiment is described below, following the flowchart shown in FIG. 6 (steps S11 to S24) and with reference to FIGS. 5 and 7.

ルータ１４Ａは、ＲＰの値とＨＤＷＰの値とが一致しているか否かを判断する（ステップＳ１１）。ＲＰの値とＨＤＷＰの値とが一致している場合（ステップＳ１１のＹＥＳルート）、ルータ１４Ａは、書込対象パケットの書込中であると判断し、ステップＳ１１の処理に戻る。 The router 14A determines whether or not the RP value matches the HDWP value (step S11). If the RP value matches the HDWP value (YES route in step S11), the router 14A determines that the write target packet is being written, and returns to the process in step S11.

ＲＰの値とＨＤＷＰの値とが一致しない場合（ステップＳ１１のＮＯルート）、ルータ１４Ａは、書込対象パケットの書込を完了したと判断し、受信バッファ１４１からのパケット読出を開始する（図７のタイミングＴ１８参照）。つまり、ＲＰによって指定される、受信バッファ１４１のエントリ（一単位データ）が、受信バッファ１４１からＲＤＲ経由で読み出される（ステップＳ１２）。そして、読み出されたエントリがヘッダ（ＨＤ）であるか否かが判断される（ステップＳ１３）。 If the RP value and the HDWP value do not match (NO route in step S11), the router 14A determines that the writing of the write target packet has been completed, and starts reading the packet from the reception buffer 141 (FIG. 7 timing T18). That is, the entry (one unit data) of the reception buffer 141 designated by RP is read from the reception buffer 141 via RDR (step S12). Then, it is determined whether or not the read entry is a header (HD) (step S13).

パケット読出の開始時には、まずパケットのヘッダが読み出されるため、読み出されたエントリはヘッダであると判断される（ステップＳ１３のＹＥＳルート）。この場合、読み出されたヘッダのOpcodeが参照され、当該Opcode（パケット種）に基づき、受信バッファ１４１から読出中のパケットのデータ長Lengthが生成され、生成されたデータ長Lengthが読出部１４３のLengthレジスタに設定される（ステップＳ１４）。 At the start of packet reading, the header of the packet is read first, so that the read entry is determined to be a header (YES route in step S13). In this case, the Opcode of the read header is referred to, the data length Length of the packet being read from the reception buffer 141 is generated based on the Opcode (packet type), and the generated data length Length is generated by the reading unit 143. It is set in the Length register (step S14).

Next, it is determined whether the Opcode indicates a fetch response (step S15). If it does (YES route of step S15), the selector 143g switches to select input (1) in FIG. 5. As a result, the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (step S16), and CycleCT is incremented by 1 (step S17). The entry (data DTA) designated by the value (address) set in RP is then read (NO route of step S11 to step S12). That is, when the header of the packet is read, the address RP + pa[6:3] + 1 of the reception buffer 141 that corresponds to the data requested by the core 11A is set in RP based on the physical address pa[6:3] in the header. In the example of FIG. 7, the value 11, which is the address corresponding to the tenth data DTA, is set in RP.
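
The address calculation of step S16 can be sketched as follows. This is only an illustration in Python under the assumption that RP still holds the header address HDRP when the header is read; the function name `first_target_address` is hypothetical and does not appear in the original.

```python
def first_target_address(hdrp: int, pa_6_3: int) -> int:
    """Reception-buffer address of the unit data requested by the core.

    hdrp   : buffer address of the packet header (RP equals HDRP when the
             header is read; this is an assumption of the sketch)
    pa_6_3 : 4-bit field pa[6:3] taken from the response header (0..15)
    """
    if not 0 <= pa_6_3 <= 0xF:
        raise ValueError("pa[6:3] is a 4-bit field")
    # +1 skips the header entry, which occupies the address hdrp itself.
    return hdrp + pa_6_3 + 1


# Example of FIG. 7: header at address 0, requested data DTA (pa[6:3] = 0xA).
target = first_target_address(0, 0xA)
```

With the header at address 0 and pa[6:3] = 0xA, the result is 11, matching the value set in RP in the example of FIG. 7.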

In the next cycle, the entry read after the header is data. In this case (NO route of step S13), it is determined whether the value of CycleCT has reached the data length (packet length) Length (step S18), that is, whether all data of the read target packet has been read. FIG. 7 shows an example in which Length = 16.

If CycleCT is not equal to Length (NO route of step S18), it is determined whether the value of RP has reached HDRP + Length (16 in FIG. 7) (step S19). If RP is not equal to HDRP + Length (NO route of step S19), or if the Opcode of the header is not a fetch response (NO route of step S15), the selector 143g switches to select input (2) in FIG. 5. As a result, until RP reaches HDRP + Length, each time one unit of data is read (see timings T19 to T23 in FIG. 7), RP is incremented by 1 (step S20), CycleCT is incremented by 1 (step S17), and the process returns to step S11.

When RP reaches HDRP + Length (YES route of step S19), the selector 143g switches to select input (3) in FIG. 5. As a result, the value HDRP + 1 obtained by the 1-adder 143b is set in RP (step S21), and CycleCT is incremented by 1 (step S17). For example, at timing T24 in FIG. 7, HDRP = 0, so 1 is set in RP. The process then returns to step S11.

In each cycle after HDRP + 1 has been set in RP (see timings T25 to T33 in FIG. 7), the selector 143g selects input (2) in FIG. 5 until CycleCT reaches the data length Length, that is, until all data of the read target packet has been read. During this period, each time one unit of data is read, RP is incremented by 1 (step S20), CycleCT is incremented by 1 (step S17), and the process returns to step S11.

When CycleCT reaches the data length Length (YES route of step S18; see timing T34 in FIG. 7), CycleCT is reset to 0 (step S22). The selector 143g then switches to select input (4) in FIG. 5. As a result, the value HDRP + Length + 1 (17 in FIG. 7) obtained by the adder 143f is set in RP as the address of the data to be read next (step S23). In addition, the value HDRP + Length + 1 (17 in FIG. 7) obtained by the adder 143c is set in HDRP as the address of the header of the packet to be read next (step S24). The process then returns to step S11.

Through the above operation, the first embodiment can read the data DTA 10 cycles earlier than the related art shown in FIG. 22, which shortens the latency.

In the operation described above, the order in which data is read is uniquely determined by the physical address pa[6:3]. Therefore, the core 11A that receives the packet can easily determine which physical address each unit of data read after the first one corresponds to.
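
As a rough illustration of how pa[6:3] fixes the read order, the flow of steps S11 to S24 can be simulated for a single fetch response packet. This is a minimal sketch in Python, not the hardware implementation; the function name `read_order_first_embodiment` is hypothetical, and the wait on HDWP in step S11 is omitted on the assumption that the packet has already been written in full.

```python
def read_order_first_embodiment(hdrp: int, pa_6_3: int, length: int) -> list[int]:
    """Buffer addresses in the order they are read, header first.

    Assumes a fetch response packet that is already fully written, so the
    RP == HDWP wait of step S11 never triggers.
    """
    order = [hdrp]              # S12/S13: the header entry is read first
    rp = hdrp + pa_6_3 + 1      # S16: jump to the requested unit data
    cycle_ct = 1                # S17
    while True:
        order.append(rp)        # S12: read the entry designated by RP
        if cycle_ct == length:  # S18 YES: whole packet read (S22 to S24 follow)
            break
        if rp == hdrp + length: # S19 YES: last entry reached, wrap around
            rp = hdrp + 1       # S21
        else:
            rp += 1             # S20
        cycle_ct += 1           # S17
    return order


# Example of FIG. 7: HDRP = 0, pa[6:3] = 0xA (DTA), Length = 16.
order = read_order_first_embodiment(0, 0xA, 16)
```

For the FIG. 7 parameters the data addresses come out as 11 to 16 followed by 1 to 10, so DTA (address 11) is the first data entry read rather than the eleventh, which agrees with the 10-cycle improvement stated above.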

[4] Information Processing Apparatus of the Second Embodiment
Next, an information processing apparatus (multiprocessor system) 1′ as a second embodiment of the present invention is described with reference to FIG. 1 and FIGS. 8 to 13. The second embodiment addresses the case where a plurality of pieces of processing target data exist within one data block, spaced at a predetermined interval Interval. In other words, as mentioned earlier, the second embodiment handles stride access, as occurs in matrix calculations and the like, where a single packet with data contains a plurality of required unit data (processing target data). The example shown in FIG. 13 describes the case where a packet containing sixteen 8-byte data DT0 to DTF holds four pieces of processing target data DT2, DT6, DTA, and DTE spaced at the predetermined interval Interval = 4.

As shown in FIG. 1, the information processing apparatus 1′ of the second embodiment, like the information processing apparatus 1 of the first embodiment, has a plurality of (two in FIG. 1) CPUs 10A and 10B. In the second embodiment as well, the CPU 10A and the CPU 10B are connected so as to communicate with each other via the network 30 using high-speed serial transmission.

The CPUs 10A and 10B in the information processing apparatus 1′ are configured in the same way as the CPUs 10A and 10B in the information processing apparatus 1 described above with reference to FIGS. 1 to 7. In the information processing apparatus 1′ of the second embodiment, however, the function of the router 14A of the CPU 10A as the first communication unit and the function of the router 14B of the CPU 10B as the second communication unit are slightly modified, as described below. In addition, the reading unit 143 in the CPU 10A (router 14A) is replaced with a reading unit 143′ (see FIGS. 1, 10, and 11), as described below.

As in the first embodiment, the router (first communication unit) 14A in the CPU 10A of the second embodiment transmits a fetch request generated by one core 11A to the CPU 10B as a fetch request packet (see FIGS. 9 and 17(A)). The router (first communication unit) 14A also receives from the CPU 10B a fetch response packet (see FIGS. 9 and 17(B)) to which a data block containing the processing target data corresponding to the fetch request is attached.

However, the header of the fetch request packet transmitted from the router (first communication unit) 14A in the CPU 10A of the second embodiment to the CPU 10B has the format shown in FIG. 8(A). That is, in addition to the OPC, RQID, and physical address PA described above with reference to FIG. 18(A), the header of the fetch request packet contains information on the predetermined interval (here, the value Interval indicating the predetermined interval).

The router (second communication unit) 14B in the CPU 10B of the second embodiment attaches a response header having the format shown in FIG. 8(B) to the fetch response packet (the data block read from the DIMM 20B or the like) to be transmitted to the CPU 10A. As shown in FIG. 8(B), the response header records the address information pa[6:3] of the first piece of processing target data, extracted from the address information contained in the header of the fetch request packet (see PA in FIG. 8(A)), together with the predetermined interval Interval. The router (second communication unit) 14B then transmits the data block with the response header attached to the CPU 10A as a fetch response packet (see FIGS. 9 and 17(B)).

Furthermore, the router (first communication unit) 14A in the CPU 10A of the second embodiment has a reception buffer 141, a writing unit 142, and a reading unit 143′. In the second embodiment, the reception buffer 141, the writing unit 142, and the reading unit 143′ are provided in the router 14A of the CPU 10A, but they need only be provided somewhere within the CPU 10A. The reception buffer 141 and the writing unit 142 are configured in the same way as those of the first embodiment described above with reference to FIGS. 1 and 4, so their description is omitted.

The reading unit 143′ refers to the physical address pa[6:3] of the first piece of processing target data and the predetermined interval Interval recorded in the response header attached to the data block (fetch response packet). Based on these two values, the reading unit 143′ first reads the plurality of pieces of processing target data requested by the core 11A from the reception buffer 141, and then sequentially reads the remaining unit data from the reception buffer 141. The reading unit 143′ of the second embodiment is configured as described later with reference to FIGS. 10 and 11 and operates as described later with reference to FIGS. 12 and 13.

The second embodiment is described for the case where the CPU 10A has the functions of the first communication unit, the reception buffer 141, the writing unit 142, and the reading unit 143′, and the CPU 10B has the function of the second communication unit. However, each of the CPUs 10A and 10B may have the functions of both the first and second communication units as well as the functions of the reception buffer 141, the writing unit 142, and the reading unit 143′.

Here, with reference to FIG. 9, packet routing in the information processing apparatus 1′ of the second embodiment is described for the case where the core 11A of one CPU 10A issues a stride-access fetch request for memory data of the other CPU 10B.

In this case, as shown in FIG. 9, a fetch request packet (a packet without data) for reading the required processing target data is issued and transmitted from the CPU 10A and is routed via the network 30 to the CPU 10B, which has the memory 20B holding that processing target data. At this time, as shown in FIG. 8(A), the header of the fetch request packet contains at least the address information PA of the processing target data and the predetermined interval Interval of the stride access.

The CPU 10B, having received the fetch request packet at the router 14B, reads the data block containing the requested processing target data from the memory 20B, generates a fetch response packet (a packet with data), and transmits the fetch response packet to the CPU 10A. At this time, as shown in FIG. 8(B), the header (response header) of the fetch response packet generated by the CPU 10B carries the 4-bit physical address pa[6:3], which identifies the first unit data (processing target data) of the stride access. As also shown in FIG. 8(B), the header carries the predetermined interval Interval of the stride access.

When the CPU 10B generates the fetch response packet, the 4-bit physical address pa[6:3], which identifies the first unit data targeted by the fetch request, is extracted from the physical address PA in the header of the fetch request packet (see FIG. 8(A)). The predetermined interval Interval of the stride access is extracted from the same header. Including the extracted pa[6:3] and Interval in the response header produces a response header in the format shown in FIG. 8(B). The fetch response packet with this response header is routed and transmitted to the CPU 10A, the issuer of the fetch request, as shown in FIG. 9.
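
The extraction of pa[6:3] amounts to taking bits 6 to 3 of the physical address, which select one of the sixteen 8-byte units within a 128-byte data block. The following Python sketch illustrates this; the function name `make_response_fields` and the dictionary layout are hypothetical, and the actual bit positions of the fields in the header of FIG. 8(B) are not reproduced here.

```python
def make_response_fields(pa: int, interval: int) -> dict[str, int]:
    """Fields copied into the response header from a fetch request header.

    pa       : physical address PA from the fetch request header
    interval : stride interval from the same header
    The returned dictionary is only an illustrative container, not the
    header format of FIG. 8(B).
    """
    pa_6_3 = (pa >> 3) & 0xF  # bits 6..3: index of the 8-byte unit in the block
    return {"pa_6_3": pa_6_3, "interval": interval}


# An address whose offset within the 128-byte block is 0x10 selects unit 2 (DT2).
fields = make_response_fields(0x1010, 4)
```

Bits 2 to 0 address a byte within the 8-byte unit and bits 7 and above select the data block, so neither affects which unit is read first.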

As described above, in the second embodiment, Interval, which indicates the address interval used when the core 11A makes a stride access to the memory 20B, is added to both the header of the fetch request packet and the header of the fetch response packet. For example, when a data block (128 bytes) contains 16 unit data (8 bytes each), the value of Interval is an integer in the range 0 < Interval < 16. This is because Interval = 0 would select the same unit data repeatedly, while Interval = 16 would step beyond the range of the data block (16 unit data) contained in the packet.
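
The range condition on Interval, and the set of unit data a stride access touches within one data block, can be expressed as a short Python sketch. The function names `is_valid_interval` and `stride_targets` are hypothetical and serve only to illustrate the constraint stated above.

```python
def is_valid_interval(interval: int, units_per_block: int = 16) -> bool:
    """True when 0 < Interval < number of unit data in the data block."""
    return 0 < interval < units_per_block


def stride_targets(first_unit: int, interval: int, units_per_block: int = 16) -> list[int]:
    """Indices of the unit data hit by a stride access inside one block."""
    if not is_valid_interval(interval, units_per_block):
        raise ValueError("Interval must satisfy 0 < Interval < units_per_block")
    return list(range(first_unit, units_per_block, interval))


# Example of FIG. 13: first target DT2, Interval = 4.
targets = stride_targets(2, 4)
```

For the FIG. 13 parameters the indices are 2, 6, 10, and 14, that is, DT2, DT6, DTA, and DTE.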

The fetch response packet received on the CPU 10A side is first written, one unit of data at a time, into the reception buffer 141 provided in the router 14A by the writing unit 142 (see FIGS. 1 and 10). After that, by controlling the value of RP (the address of the unit data to be read) in the reading unit 143′, the processing target data requested by the core 11A is read preferentially from the packet written in the reception buffer 141.

In particular, the reading unit 143′ of the second embodiment reads the plurality of pieces of processing target data requested by the core 11A from the reception buffer 141 based on the physical address pa[6:3] of the first piece of processing target data and the predetermined interval Interval recorded in the response header, and then sequentially reads the remaining data from the reception buffer 141.

The configuration of the second embodiment that realizes the functions described above is explained below with reference to FIGS. 10 and 11. FIG. 10 is a block diagram showing the reception buffer 141 included in the router 14A of the CPU 10A of the second embodiment, together with the configuration for packet write and read processing on the reception buffer 141 (the writing unit 142 and the reading unit 143′). FIG. 11 is a block diagram showing in detail the configuration for the packet read processing shown in FIG. 10 (the reading unit 143′).

As shown in FIG. 10, the writing unit 142 of the second embodiment has the same WDR, HDWP, and WP as the writing unit 142 shown in FIG. 4.

The reading unit 143′ of the second embodiment has an Interval register in addition to the same RDR, RP, HDRP, Length register, and CycleCT as the reading unit 143 shown in FIG. 4. The WDR, HDWP, WP, RDR, RP, HDRP, Length register, and CycleCT are the same as those already described, so their description is omitted.

The Interval register added in the second embodiment is loaded with the predetermined interval Interval recorded in the header (response header) of a fetch response packet when that header is read from the reception buffer 141. For packets whose type is other than fetch response, 1 is set as the value of Interval.

Next, the configuration for the packet read processing shown in FIG. 10, that is, the reading unit 143′, is described in more detail with reference to FIG. 11. As shown in FIG. 11, the reading unit 143′ of the second embodiment has an adder 143h and an arithmetic unit 143i in addition to the same Length generation unit 143a, 1-adders 143b and 143d, adders 143c, 143e, and 143f, and selector 143g as the reading unit 143 of the first embodiment.

As in the first embodiment, when a header is read from the reception buffer 141, the Length generation unit 143a generates the data length Length of the packet being read from the reception buffer 141 from the Opcode in that header and sets it in the Length register.

As in the first embodiment, the 1-adder (+1) 143b adds 1 to the value indicated by HDRP.

As in the first embodiment, the adder 143c adds the data length Length in the Length register to the value HDRP + 1 from the 1-adder 143b and sets the resulting value HDRP + Length + 1 in HDRP (see step S45 in FIG. 12). The adder 143c operates at the timing when the value of CycleCT becomes equal to Length and CycleCT is reset (initialized) (see step S43 in FIG. 12 and timing T34 in FIG. 13).

As in the first embodiment, the 1-adder (+1) 143d adds 1 to the value indicated by RP.

As in the first embodiment, the adder 143e adds the physical address pa[6:3], which identifies the first piece of processing target data requested by the core 11A, to the value RP + 1 from the 1-adder 143d and sets the resulting value RP + pa[6:3] + 1 in RP (see step S37 in FIG. 12). The physical address pa[6:3] is read from the header of the packet being read via the RD-BUS. The adder 143e operates at the timing when the header is read from a fetch response packet (YES route of step S36 in FIG. 12; see timing T18 in FIG. 13). At that timing, the selector 143g switches so that the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (see (1) in FIGS. 11 to 13).

As in the first embodiment, the adder 143f adds the data length Length in the Length register to the value HDRP + 1 from the 1-adder 143b and sets the resulting value HDRP + Length + 1 in RP (see step S44 in FIG. 12). The adder 143f operates at the timing when CycleCT becomes equal to Length and CycleCT is reset (initialized) (step S43 in FIG. 12; see timing T34 in FIG. 13). At that timing, the selector 143g switches so that the value HDRP + Length + 1 obtained by the adder 143f is set in RP (see (4) in FIGS. 11 to 13).

The adder 143h added in the second embodiment adds the predetermined interval Interval in the Interval register to the value of RP and sets the resulting value RP + Interval in RP (see step S41 in FIG. 12). The adder 143h operates when the Opcode of the header is not a fetch response (see NO route of step S36 in FIG. 12), or when RP + Interval is less than or equal to HDRP + Length (NO route of step S40 in FIG. 12; see timings T19 to T21, T23 to T25, T27 to T29, and T31 to T33 in FIG. 13). At that timing, the selector 143g switches so that the value RP + Interval obtained by the adder 143h is set in RP (see (2) in FIGS. 11 to 13).

The arithmetic unit 143i added in the second embodiment calculates the value HDRP + [(RP − HDRP + 1) % Interval] from the predetermined interval Interval in the Interval register, the value HDRP + 1 from the 1-adder 143b, and the value of RP, and sets that value in RP (see step S42 in FIG. 12). The arithmetic unit 143i operates when RP + Interval exceeds HDRP + Length (YES route of step S40 in FIG. 12; see timings T22, T26, and T30 in FIG. 13). At that timing, the selector 143g switches so that the value obtained by the arithmetic unit 143i is set in RP (see (3) in FIGS. 11 to 13). Here, % denotes the operation that gives a remainder, defined as remainder = dividend % divisor. In the examples used here, 16 % 4 = 4, 17 % 4 = 1, and 14 % 4 = 2. Note that the first of these differs from the ordinary remainder, which would be 0: where the dividend is an exact multiple of the divisor, the value of the divisor is used instead, which keeps RP from pointing back at the header entry at HDRP.
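
The wrap-around value computed by the arithmetic unit 143i can be sketched in Python as follows. The sketch rests on one assumption drawn from the example 16 % 4 = 4 above: a remainder of 0 is replaced by the divisor. The function names `patent_remainder` and `wrap_rp` are hypothetical.

```python
def patent_remainder(dividend: int, divisor: int) -> int:
    """Remainder as used in the examples above: 16 % 4 gives 4, not 0.

    Assumption of this sketch: an ordinary remainder of 0 is replaced by
    the divisor, so the result always lies in 1..divisor.
    """
    r = dividend % divisor
    return r if r != 0 else divisor


def wrap_rp(rp: int, hdrp: int, interval: int) -> int:
    """Value HDRP + [(RP - HDRP + 1) % Interval] set in RP at step S42."""
    return hdrp + patent_remainder(rp - hdrp + 1, interval)


# Values at timings T22, T26, and T30 of FIG. 13 (HDRP = 0, Interval = 4).
wrapped = [wrap_rp(15, 0, 4), wrap_rp(16, 0, 4), wrap_rp(13, 0, 4)]
```

With HDRP = 0 and Interval = 4, RP values of 15, 16, and 13 wrap to 4, 1, and 2 respectively, the same values given for timings T22, T26, and T30.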

The functions of the writing unit 142 and the reading unit 143′ described above may be implemented in hardware in the CPU 10A using logic gates and the like, or may be implemented in software in the CPU 10A by executing a program.

Next, the operation of the writing unit 142 and the reading unit 143′ configured as described above is explained with reference to FIGS. 11 to 13. FIG. 12 is a flowchart explaining the operation of the configuration for the packet read processing shown in FIG. 11 (the reading unit 143′). FIG. 13 is a time chart showing the operation when the configuration for the packet read processing shown in FIGS. 10 and 11 (the reading unit 143′) reads the 2nd, 6th, 10th, and 14th data (DT2, DT6, DTA, and DTE) first, following the flowchart of FIG. 12.

The packet write operation performed by the writing unit 142 of the second embodiment is the same as the operation of the related art described above with reference to FIG. 20, so its description is omitted. The packet read operation performed by the reading unit 143′ of the second embodiment, on the other hand, differs in part from the operation of the reading unit 143 of the first embodiment described above with reference to FIG. 6.

The packet read operation performed by the reading unit 143′ of the second embodiment is described following the flowchart of FIG. 12 (steps S31 to S45), with reference to FIGS. 11 and 13.

The router 14A determines whether the value of RP matches the value of HDWP (step S31). If the two values match (YES route of step S31), the router 14A determines that the write target packet is still being written and returns to step S31.

If the value of RP does not match the value of HDWP (NO route of step S31), the router 14A determines that writing of the write target packet has completed and starts reading the packet from the reception buffer 141 (see timing T18 in FIG. 13). That is, the entry (one unit of data) of the reception buffer 141 designated by RP is read from the reception buffer 141 via RDR (step S32). It is then determined whether the read entry is a header (HD) (step S33).

When packet reading starts, the header of the packet is read first, so the read entry is determined to be a header (YES route of step S33). In this case, the predetermined interval Interval in the read header is referenced, and its value is set in the Interval register of the reading unit 143′ (step S34). In addition, the Opcode of the read header is referenced, the data length Length of the packet being read from the reception buffer 141 is generated from that Opcode (packet type), and the generated Length is set in the Length register of the reading unit 143′ (step S35).

Next, it is determined whether the Opcode indicates a fetch response (step S36). If it does (YES route of step S36), the selector 143g switches to select input (1) in FIG. 11. As a result, the value RP + pa[6:3] + 1 obtained by the adder 143e is set in RP (step S37), and CycleCT is incremented by 1 (step S38). The entry (data DT2) designated by the value (address) set in RP is then read (NO route of step S31 to step S32). That is, when the header of the packet is read, the address RP + pa[6:3] + 1 of the reception buffer 141 that corresponds to the data requested by the core 11A (the first unit data) is set in RP based on the physical address pa[6:3] in the header. In the example of FIG. 13, the value 3, which is the address corresponding to the second data DT2, is set in RP.

In the next cycle, the entry read after the header is data. In this case (NO route of step S33), it is determined whether the value of CycleCT has reached the data length (packet length) Length (step S39), that is, whether all data of the read target packet has been read. FIG. 13 shows an example in which Length = 16.

If CycleCT has not reached Length (NO route of step S39), it is determined whether or not the value RP + Interval exceeds the value HDRP + Length (16 in FIG. 13) (step S40). If RP + Interval is equal to or less than HDRP + Length (NO route of step S40), or if the Opcode in the header is not a fetch response (NO route of step S36), the selector 143g switches so as to select (2) in FIG. 11.

Thus, until the value RP + Interval exceeds the value HDRP + Length, the value Interval (Interval = 4 in FIG. 13) is added to RP each time one unit of data is read (step S41), CycleCT is incremented by 1 (step S38), and the process returns to step S31. Step S41 is executed at timings T19 to T21, T23 to T25, T27 to T29, and T31 to T33 in FIG. 13.

When the value RP + Interval exceeds the value HDRP + Length (YES route of step S40), the selector 143g switches so as to select (3) in FIG. 11. The value HDRP + [(RP − HDRP + 1) % Interval] obtained by the arithmetic unit 143i is thereby set in RP (step S42), and CycleCT is incremented by 1 (step S38).

For example, at timing T22 in FIG. 13, the value is HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(15 − 0 + 1) % 4], which the example treats as 4 (the remainder 0 of 16 % 4 apparently being replaced by Interval = 4), so 4 is set in RP. At timing T26 in FIG. 13, HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(16 − 0 + 1) % 4] = 17 % 4 = 1, so 1 is set in RP. Similarly, at timing T30 in FIG. 13, HDRP + [(RP − HDRP + 1) % Interval] = 0 + [(13 − 0 + 1) % 4] = 14 % 4 = 2, so 2 is set in RP. Thereafter, the process returns to step S31.
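The wrap-around of step S42 can be checked numerically with a short sketch. The function name `wrap_read_pointer` is hypothetical. Because an ordinary modulo gives 16 % 4 = 0 while the example sets RP to 4 at timing T22, the sketch assumes that a remainder of 0 is replaced by Interval. That rule is an assumption chosen to reproduce the three values given for FIG. 13; the description does not confirm it.

```python
def wrap_read_pointer(rp: int, hdrp: int, interval: int) -> int:
    """Step S42: move RP back toward the start of the packet.

    Computes HDRP + [(RP - HDRP + 1) % Interval]. A remainder of 0 is
    replaced by Interval (assumption of this sketch) so that RP never
    lands on the header entry at HDRP.
    """
    remainder = (rp - hdrp + 1) % interval
    if remainder == 0:
        remainder = interval  # assumption, see the lead-in
    return hdrp + remainder


# Timings T22, T26 and T30 of FIG. 13 (HDRP = 0, Interval = 4)
for timing, rp in (("T22", 15), ("T26", 16), ("T30", 13)):
    print(timing, wrap_read_pointer(rp, hdrp=0, interval=4))
```

Under that assumption the sketch yields 4, 1, and 2 for the three timings, matching the values set in RP in the example.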

Thereafter, when the value of CycleCT reaches the data length Length (YES route of step S39; see timing T34 in FIG. 13), CycleCT is reset to 0 (step S43). The selector 143g then switches so as to select (4) in FIG. 11. The value HDRP + Length + 1 obtained by the adder 143f (17 in FIG. 13) is thereby set in RP as the address of the data to be read next (step S44). In addition, the value HDRP + Length + 1 obtained by the adder 143c (17 in FIG. 7) is set in HDRP as the address of the header of the packet to be read next (step S45). Thereafter, the process returns to step S31.

With the above operation, in the second embodiment, the data DT2, DT6, DTA, and DTE are read ahead of the other data, which shortens the latency.

In the operation described above, the order in which data is read is uniquely determined by the physical address pa[6:3] of the first processing target data and the predetermined interval Interval. The core 11A that receives the packet can therefore easily determine which physical address corresponds to each piece of data other than the data group read first from the packet in the reception buffer 141.

The first embodiment described above with reference to FIGS. 1 to 7 corresponds to the case where the value of the predetermined interval Interval in the second embodiment, described above with reference to FIGS. 8 to 13, is 1.
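The read order that results from steps S31 to S45 for one fetch-response packet can be reproduced with a small simulation. It is a sketch under stated assumptions rather than the implementation of the reading unit 143′: the function names `read_order` and `unit_names` are hypothetical, the unit data DT0 to DTF are assumed to occupy buffer addresses HDRP + 1 to HDRP + 16, and a remainder of 0 in step S42 is assumed to be replaced by Interval (an ordinary modulo would return the header address).

```python
def read_order(pa_6_3: int, interval: int, length: int = 16, hdrp: int = 0) -> list[int]:
    """Return the buffer addresses of one packet in the order they are read."""
    rp = hdrp + pa_6_3 + 1          # S37: jump to the requested unit data
    cycle_ct = 1                    # S38: incremented once after the header
    addresses = []
    while True:
        addresses.append(rp)        # S32: read the entry addressed by RP
        if cycle_ct == length:      # S39 YES: every unit data has been read
            break
        if rp + interval > hdrp + length:            # S40 YES: would overrun
            remainder = (rp - hdrp + 1) % interval
            rp = hdrp + (remainder or interval)      # S42 (0 -> Interval assumed)
        else:
            rp += interval          # S41: next unit data at the interval
        cycle_ct += 1               # S38
    return addresses


def unit_names(addresses: list[int], hdrp: int = 0) -> list[str]:
    """Label each address as DT0..DTF (assumed layout: DTn at HDRP + 1 + n)."""
    return [f"DT{addr - hdrp - 1:X}" for addr in addresses]


# FIG. 13 example: pa[6:3] = 2, Interval = 4, Length = 16
print(unit_names(read_order(2, 4)))
# Interval = 1, the case said to correspond to the first embodiment
print(unit_names(read_order(2, 1)))
```

With Interval = 4 the first four entries are DT2, DT6, DTA, and DTE, and every unit data is read exactly once. With Interval = 1 the same sketch reads DT2 through DTF and then DT0 and DT1, that is, it starts at the requested unit data and wraps around.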

[5] Others
Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments and can be carried out with various modifications and changes without departing from the spirit of the present invention.

[6] Supplementary Notes
The following appendices are further disclosed with respect to the embodiments, including each of the embodiments described above.

(Appendix 1)
An arithmetic processing device connected to another arithmetic processing device, comprising:
a processing unit that generates a read request for processing target data;
a communication unit that transmits the read request generated by the processing unit to the other arithmetic processing device and receives, from the other arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, to the buffer, a plurality of unit data included in the data block received by the communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

(Appendix 2)
The arithmetic processing device according to appendix 1, wherein the reading unit
refers to address information of the processing target data recorded by the other arithmetic processing device in a response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and
then sequentially reads unit data other than the processing target data from the buffer.

(Appendix 3)
The arithmetic processing device according to appendix 1, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the communication unit
transmits the read request including information on the predetermined interval to the other arithmetic processing device, and
the reading unit
refers to address information of the first of the processing target data and the information on the predetermined interval, both recorded by the other arithmetic processing device in a response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and
then sequentially reads unit data other than the plurality of processing target data from the buffer.

(Appendix 4)
An information processing device comprising:
a first arithmetic processing device; and
a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device includes:
a processing unit that generates a read request for processing target data;
a first communication unit that transmits the read request generated by the processing unit to the second arithmetic processing device and receives, from the second arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, to the buffer, a plurality of unit data included in the data block received by the first communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, and
the second arithmetic processing device includes
a second communication unit that receives the read request from the first arithmetic processing device and transmits, to the first arithmetic processing device, the data block including the processing target data corresponding to the received read request.

(Appendix 5)
The information processing device according to appendix 4, wherein the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header in which the address information of the processing target data, extracted from address information included in the read request, is recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 6)
The information processing device according to appendix 5, wherein the reading unit in the first arithmetic processing device
refers to the address information of the processing target data recorded in the response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and
then sequentially reads unit data other than the processing target data from the buffer.

(Appendix 7)
The information processing device according to appendix 4, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first communication unit in the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header in which address information of the first of the processing target data, extracted from address information included in the read request, and the information on the predetermined interval, extracted from the read request, are recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 8)
The information processing device according to appendix 7, wherein the reading unit in the first arithmetic processing device
refers to the address information of the first of the processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and
then sequentially reads unit data other than the plurality of processing target data from the buffer.

(Appendix 9)
A control method for an information processing device that includes a first arithmetic processing device and a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device
transmits a read request for processing target data to the second arithmetic processing device,
the second arithmetic processing device,
upon receiving the read request from the first arithmetic processing device, transmits a data block including the processing target data corresponding to the received read request to the first arithmetic processing device, and
the first arithmetic processing device
receives the data block including the processing target data corresponding to the read request from the second arithmetic processing device,
sequentially writes a plurality of unit data included in the received data block to a buffer, and
preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

(Appendix 10)
The control method for an information processing device according to appendix 9, wherein the second arithmetic processing device
attaches, to the data block, a response header in which the address information of the processing target data, extracted from address information included in the read request, is recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 11)
The control method for an information processing device according to appendix 10, wherein the first arithmetic processing device
refers to the address information of the processing target data recorded in the response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and
then sequentially reads unit data other than the processing target data from the buffer.

(Appendix 12)
The control method for an information processing device according to appendix 9, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second arithmetic processing device
attaches, to the data block, a response header in which address information of the first of the processing target data, extracted from address information included in the read request, and the information on the predetermined interval, extracted from the read request, are recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

(Appendix 13)
The control method for an information processing device according to appendix 12, wherein the first arithmetic processing device
refers to the address information of the first of the processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and
then sequentially reads unit data other than the plurality of processing target data from the buffer.

1, 1′ Information processing device (multiprocessor system)
10, 10A, 10B CPU (arithmetic processing device, multi-core processor)
11, 11A, 11B Core (processing unit)
12, 12A, 12B Shared cache (tertiary cache)
13, 13A, 13B MAC
14 Router
14A Router (first communication unit)
14B Router (second communication unit)
141 Reception buffer (buffer)
142 Writing unit
143, 143′ Reading unit
143a Length generation unit
143b, 143d Adder (+1)
143c, 143e, 143f, 143h Adder
143g Selector
143i Arithmetic unit
20, 20A, 20B DIMM (main memory)
30 Network

Claims

An arithmetic processing device connected to another arithmetic processing device, comprising:
a processing unit that generates a read request for processing target data;
a communication unit that transmits the read request generated by the processing unit to the other arithmetic processing device and receives, from the other arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, to the buffer, a plurality of unit data included in the data block received by the communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.

The arithmetic processing device according to claim 1, wherein the reading unit
refers to address information of the processing target data recorded by the other arithmetic processing device in a response header attached to the data block,
reads the processing target data corresponding to the address information from the buffer, and
then sequentially reads unit data other than the processing target data from the buffer.

The arithmetic processing device according to claim 1, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the communication unit
transmits the read request including information on the predetermined interval to the other arithmetic processing device, and
the reading unit
refers to address information of the first of the processing target data and the information on the predetermined interval, both recorded by the other arithmetic processing device in a response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and
then sequentially reads unit data other than the plurality of processing target data from the buffer.

An information processing device comprising:
a first arithmetic processing device; and
a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device includes:
a processing unit that generates a read request for processing target data;
a first communication unit that transmits the read request generated by the processing unit to the second arithmetic processing device and receives, from the second arithmetic processing device, a data block including the processing target data corresponding to the transmitted read request;
a buffer that stores the data block;
a writing unit that sequentially writes, to the buffer, a plurality of unit data included in the data block received by the first communication unit; and
a reading unit that preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data, and
the second arithmetic processing device includes
a second communication unit that receives the read request from the first arithmetic processing device and transmits, to the first arithmetic processing device, the data block including the processing target data corresponding to the received read request.

The information processing device according to claim 4, wherein the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header in which the address information of the processing target data, extracted from address information included in the read request, is recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

The information processing device according to claim 4, wherein, when a plurality of the processing target data exist at a predetermined interval among the plurality of unit data,
the first communication unit in the first arithmetic processing device
transmits the read request including information on the predetermined interval to the second arithmetic processing device, and
the second communication unit in the second arithmetic processing device
attaches, to the data block, a response header in which address information of the first of the processing target data, extracted from address information included in the read request, and the information on the predetermined interval, extracted from the read request, are recorded, and
transmits the data block with the response header attached to the first arithmetic processing device.

The information processing device according to claim 6, wherein the reading unit in the first arithmetic processing device
refers to the address information of the first of the processing target data and the information on the predetermined interval recorded in the response header attached to the data block,
reads the plurality of processing target data from the buffer based on the address information and the predetermined interval, and
then sequentially reads unit data other than the plurality of processing target data from the buffer.

A control method for an information processing device that includes a first arithmetic processing device and a second arithmetic processing device connected to the first arithmetic processing device, wherein
the first arithmetic processing device
transmits a read request for processing target data to the second arithmetic processing device,
the second arithmetic processing device,
upon receiving the read request from the first arithmetic processing device, transmits a data block including the processing target data corresponding to the received read request to the first arithmetic processing device, and
the first arithmetic processing device
receives the data block including the processing target data corresponding to the read request from the second arithmetic processing device,
sequentially writes a plurality of unit data included in the received data block to a buffer, and
preferentially reads, from the buffer, the processing target data, which is at least one of the plurality of unit data.