JP5527340B2 - Vector processing apparatus and vector processing method

Info

Publication number: JP5527340B2
Application number: JP2012040261A
Authority: JP
Inventors: 健加納
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-02-27
Filing date: 2012-02-27
Publication date: 2014-06-18
Anticipated expiration: 2032-02-27
Also published as: JP2013175115A

Description

The present invention relates to a vector processing device and a vector processing method, and more particularly to a vector processing device and a vector processing method that access a plurality of array elements arranged at regular intervals.

Non-Patent Document 1 describes stride access in a vector computer. This vector computer provides a vector load instruction for performing stride access. When a row or column of two-dimensional matrix data in main memory cannot be accessed through contiguous addresses, this vector load instruction accesses a plurality of elements whose memory addresses are separated by a fixed distance (the stride).

Patent Document 1 discloses, for a vector computer, a vector load method that reads both the real part and the imaginary part of complex numbers from a main storage unit interleaved in units of words, and vector-loads the read data into two load buffers and two vector registers. Patent Document 2 discloses a vector computer provided with a cache memory.

Patent Document 1: Japanese Patent No. 3786182
Patent Document 2: JP 2010-86496 A

Non-Patent Document 1: John Hennessy and David Patterson, "Computer Architecture", pp. 364-366, published December 25, 1992

In recent years, the SDRAM that constitutes main memory has come to use techniques that read a plurality of words with a single instruction in order to increase speed. As a result, it has become difficult to interleave the main storage in units of one word, and it has become necessary to interleave the main storage in units of the plurality of words that a single instruction can access. With such a scheme, however, when a stride access (an access that is not to contiguous addresses) is performed, only one of the plurality of words read from main memory by a single instruction is actually used.

In other words, because the number of words that a single instruction reads from main memory has increased, it is difficult to adopt a configuration that reads only one word from main memory. Consequently, even for a stride access that needs only one word, a plurality of words (a block) is read from main memory. Of the words read, only one is loaded into the register and the others go unused. Stride access therefore includes memory accesses that are not actually needed, which increases the power consumed by memory access.

To address this problem, Patent Document 2 discloses a technique in which a cache memory is provided, the plurality of words read by a preceding stride access are stored in the cache memory in advance, and a subsequent stride access uses the data in the cache memory. However, because the capacity of the cache memory is small compared with that of main memory, not all of the required data can be stored in the cache memory. Moreover, even if the data has once been stored in the cache memory, it may be evicted by other memory accesses, so that the data is no longer present in the cache memory at the time of the subsequent stride access.

In other words, even if a block containing a word to be accessed by a subsequent stride access was stored in the cache memory during the previous stride access, that block may already have been evicted before the subsequent stride access is executed. Thus, even with a cache memory, the same block may have to be read from main memory again by the subsequent stride access. It is conceivable to provide a cache memory large enough to guarantee that the required blocks are not evicted, so that subsequent vector load instructions hit in the cache, but this increases the capacity of the cache memory.

The present invention has been made to solve these problems, and a first object of the present invention is to provide a vector processing device and a vector processing method capable of reducing power consumption.

A vector processing device according to the present invention includes: a main storage device interleaved so that each memory read unit is stored in a different storage means; a load buffer that temporarily stores data read from the main storage device; load buffer control means that controls the issuance of data read requests to the main storage device and the writing of data read from the main storage device into the load buffer; and a vector register that stores data transferred from the load buffer. The elements of a two-dimensional array on the main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension in which memory addresses are not contiguous occupy the same position within the memory read unit. When the load buffer control means receives a vector load instruction that accesses the array along the dimension in which memory addresses are not contiguous, it controls the writing of the data read from the main storage device into the load buffer based on identification information of the request and on write position information, in the load buffer, of the data read from the main storage device by that request.

A vector processing method of a vector processing device according to the present invention includes: a step in which the elements of a two-dimensional array on a main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension in which memory addresses are not contiguous occupy the same position within the memory read unit; and a step of, when a vector load instruction that accesses the array along the dimension in which memory addresses are not contiguous is received, controlling the writing of data read from the main storage device into a load buffer based on identification information of a data read request to the main storage device and on write position information, in the load buffer, of the data read from the main storage device by that request.

According to the present invention, a vector processing device and a vector processing method capable of reducing power consumption can be provided.

FIG. 1 is a diagram showing the configuration of the vector processing device according to Embodiment 1.
FIG. 2 is a diagram for explaining the operation of the vector processing device according to Embodiment 1.
FIG. 3 is a diagram for explaining the configuration of the array according to Embodiment 1.
FIG. 4 is a diagram for explaining the addresses of the elements of the array according to Embodiment 1.
FIG. 5 is a diagram for explaining the operation of the vector processing device according to Embodiment 1.
FIG. 6 is a diagram for explaining the essential part of the vector processing device according to the present invention.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In the drawings, identical or corresponding elements are denoted by the same reference numerals, and redundant description is omitted where appropriate for clarity.
<Embodiment 1 of the Invention>
[Description of configuration]
FIG. 1 shows a configuration example of the vector processing device according to the present embodiment. The vector processing device includes a vector computing unit 1, a vector register 2, a computing unit crossbar 3, a load buffer 4, a load buffer control unit 5, a main memory crossbar 6, and a main memory 7 serving as the main storage device.

The vector computing unit 1 is a group of computing units that perform vector operations using the data stored in the vector register 2. The vector computing unit 1 normally includes a plurality of types of computing units. Operation results are stored in the vector register 2.

The vector register 2 is a group of registers that store the vector data used for vector operations. The vector register 2 normally includes a plurality of registers. Each register can store a plurality of data items, up to the maximum vector length. In the present embodiment, the maximum vector length is taken to be 16, so each register can store 16 data items (that is, each register in the vector register 2 has 16 elements).

The computing unit crossbar 3 is a network that interconnects the vector computing unit 1 and the vector register 2 with the load buffer 4.

The load buffer 4 is a group of buffers for temporarily storing data read from the main memory 7. In the present embodiment, the load buffer 4 includes a plurality of load buffers 4-1 to 4-4. Normally, one of the load buffers 4-1 to 4-4 is used for one vector load instruction. The capacity of each of the load buffers 4-1 to 4-4 is the same as the capacity of one register of the vector register 2; in the present embodiment, each load buffer can store 16 data items. The load buffer 4 is used because data read from the main memory 7 is not necessarily returned in the order in which the read requests were issued. The number of buffers constituting the load buffer 4 is not limited to four; it need only be equal to or greater than the number of the read unit of the main memory.

The load buffer control unit 5 issues main memory read requests to the main memory 7 and controls the writing of data read from the main memory 7 into the load buffer 4. In the present embodiment, the load buffer control unit 5 includes a load buffer table 8.

The load buffer table 8 holds information on which position of which of the load buffers 4-1 to 4-4 the data returned from the main memory 7 is to be written to. Upon receiving a vector load instruction, the load buffer control unit 5 issues main memory read requests to the main storage units 7-1 to 7-4 of the main memory 7. When issuing a request to one of the main storage units 7-1 to 7-4, it stores in the load buffer table 8 the identification information of the request (hereinafter, the request ID) in association with information indicating the write position, in the load buffers 4-1 to 4-4, of the data that the target main storage unit will return. Then, based on the request ID attached to the data returned from the main storage units 7-1 to 7-4 and on the ordinal position of that data among the data returned for the request by the main storage unit, the load buffer control unit 5 refers to the load buffer table 8 and determines which load buffer 4, and which position within it, the returned data is to be written to.

The main memory crossbar 6 is a network that interconnects the main storage units 7-1 to 7-4 and the load buffer 4.

The main memory 7 is interleaved across different storage units for each memory read unit. In the present embodiment, the memory read unit is 4 and the main memory 7 is divided into four main storage units 7-1 to 7-4. That is, the main memory 7 includes four main storage units 7-1 to 7-4, and a single memory read returns four data items. The main memory 7 is configured using, for example, SDRAM (synchronous dynamic random access memory). The memory read unit is determined by the configuration of the main memory 7; it is not limited to four and may be any value of two or more. Likewise, provided that the memory read unit is plural, the number of divisions of the main memory 7 is not limited to four and may be any number of one or more.

The main storage unit 7-1 stores the data at memory addresses 0 to 3, 16 to 19, 32 to 35, and so on. The main storage unit 7-2 stores the data at memory addresses 4 to 7, 20 to 23, 36 to 39, and so on. The main storage unit 7-3 stores the data at memory addresses 8 to 11, 24 to 27, 40 to 43, and so on. The main storage unit 7-4 stores the data at memory addresses 12 to 15, 28 to 31, 44 to 47, and so on.
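
The interleaving described above can be sketched as follows. This is a minimal illustration that assumes word-granularity addresses as used in the text; the names READ_UNIT, NUM_UNITS, unit_of, and offset_in_read_unit are hypothetical and do not appear in the patent.

```python
READ_UNIT = 4   # words returned by one memory read in the embodiment
NUM_UNITS = 4   # main storage units 7-1 to 7-4

def unit_of(address):
    """Return k such that main storage unit 7-k holds the given word address."""
    return (address // READ_UNIT) % NUM_UNITS + 1

def offset_in_read_unit(address):
    """Return the position of the word inside its read unit (0 to READ_UNIT - 1)."""
    return address % READ_UNIT

# Print the unit for the first address of several read units.
for start in (0, 4, 8, 12, 16, 20, 44):
    print("address %d -> main storage unit 7-%d" % (start, unit_of(start)))
```

With these definitions, addresses 0 to 3, 16 to 19, and 32 to 35 all map to unit 7-1, matching the layout listed in the text.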

The main storage units 7-1 to 7-4 operate independently of one another. Each receives main memory read requests and sends the data it reads to the main memory crossbar 6. In the present embodiment, up to four data items per clock are returned to the load buffer 4 and written into it. Each of the main storage units 7-1 to 7-4 also attaches the ID of the received request to the returned data before sending it to the main memory crossbar 6.

[Description of operation]
An example of the operation of the vector processing device is described below with reference to FIGS. 2 to 5.
First, with reference to FIG. 2, a method of processing a vector load that accesses data at contiguous addresses in the vector processing device shown in FIG. 1 is described.

FIG. 2 shows the data read from the main storage units 7-1 to 7-4 at each point in time and the contents stored in the load buffer 4 when the reads from the main storage units 7-1 to 7-4 are complete. For example, at time T, the contents of memory addresses 0, 4, 8, and 12 are read from the main storage units 7-1 to 7-4, respectively. When the reads are complete, the contents of memory addresses 0 to 15 are stored in the load buffer 4.

The vector load instruction is an instruction that transfers data from main memory to the vector register 2. In the present embodiment, the vector load instruction has first to fourth operands.
The first operand specifies the number of registers of the vector register 2 in which data is to be stored.
The second operand specifies the first register of the vector register 2 in which data is to be stored.
The third operand specifies the stride value.
The fourth operand specifies the address in main memory at which reading starts.

Using this vector load instruction, the vector processing device reads data based on the start address in main memory (fourth operand) and the stride value (third operand), and stores the data into the elements of the vector register 2, beginning with the first register (second operand) and continuing for the specified number of registers (first operand).

For example, assume that a vector load instruction is executed with 1 specified as the first operand, the 0th register of the vector register 2 as the second operand, 1 (8 bytes) as the third operand, and address 0 as the fourth operand. Assume also that the vector length when the instruction is issued is 16, so that the instruction reads 16 data items. That is, the vector load instruction stores 16 elements whose addresses are spaced 1 apart into one load buffer 4.
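
The address sequence implied by these operands can be sketched as below. This is only an illustration: the function name vector_load_addresses is hypothetical, and the rule that destination register j starts at the start address plus j is an assumption inferred from the four-register stride example of this embodiment, not a rule stated for the general case.

```python
def vector_load_addresses(num_registers, start_address, stride, vector_length=16):
    """Word addresses read by one vector load, listed per destination register.

    Assumption: destination register j, counted from 0, starts at
    start_address + j, and consecutive elements are `stride` words apart.
    """
    return [
        [start_address + j + stride * i for i in range(vector_length)]
        for j in range(num_registers)
    ]

# The contiguous example: one register, stride 1, start address 0.
contiguous = vector_load_addresses(num_registers=1, start_address=0, stride=1)
print(contiguous[0])
```

For the contiguous example the single register receives addresses 0 through 15, which is the content FIG. 2 is described as placing in the load buffer.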

First, the load buffer control unit 5, having received the vector load instruction, reserves one load buffer 4. The load buffer control unit 5 then sends one main memory read request to each of the main storage units 7-1 to 7-4. Each main memory read request carries the start address for the corresponding main storage unit.

The main storage unit 7-1 sequentially reads the contents of memory addresses 0 to 3 and sends the read data to the main memory crossbar 6. The main storage unit 7-2 sequentially reads the contents of memory addresses 4 to 7 and sends the read data to the main memory crossbar 6. The main storage unit 7-3 sequentially reads the contents of memory addresses 8 to 11 and sends the read data to the main memory crossbar 6. The main storage unit 7-4 sequentially reads the contents of memory addresses 12 to 15 and sends the read data to the main memory crossbar 6. The main memory crossbar 6 returns the data sent from the main storage units 7-1 to 7-4 to the load buffer 4.

Based on the request ID attached to the returned data and on the ordinal position of the returned word (that is, which word of the returned data it is), the load buffer control unit 5 looks up the load buffer table 8, determines which load buffer 4 and which position within it the data returned from the main memory 7 is to be written to, and writes the data into the load buffer 4.
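
A software model of this lookup might look like the following sketch. It is not the patented circuit: it assumes that each request fetches one read unit of four words, uses the block number as a stand-in request ID, and models out-of-order return with a seeded shuffle. The names build_table and load are hypothetical.

```python
import random

READ_UNIT = 4  # words returned per request (assumed: one request = one read unit)

def build_table(addresses_per_buffer):
    """Map (request ID, ordinal of returned word) -> (load buffer, slot).

    Assumption: the request ID is simply the block number, address // READ_UNIT.
    """
    table = {}
    for buf, addresses in enumerate(addresses_per_buffer):
        for slot, addr in enumerate(addresses):
            table[(addr // READ_UNIT, addr % READ_UNIT)] = (buf, slot)
    return table

def load(memory, addresses_per_buffer, seed=0):
    """Fill the load buffers from blocks that come back in arbitrary order."""
    table = build_table(addresses_per_buffer)
    buffers = [[None] * len(a) for a in addresses_per_buffer]
    request_ids = sorted({rid for rid, _ in table})
    random.Random(seed).shuffle(request_ids)       # returns need not follow issue order
    for rid in request_ids:
        block = memory[rid * READ_UNIT:(rid + 1) * READ_UNIT]
        for k, word in enumerate(block):            # k-th word returned for this request
            target = table.get((rid, k))
            if target is not None:                  # words nobody asked for are dropped
                buf, slot = target
                buffers[buf][slot] = word
    return buffers

memory = list(range(64))                            # the word at address a holds value a
buffers = load(memory, [list(range(16))])
print(buffers[0])
```

Because every returned word carries enough information to find its slot, the buffer ends up in address order regardless of the order in which the blocks arrive.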

When the load buffer control unit 5 determines that all of the data to be read from the main memory 7 (16 data items) has been written into the load buffer 4, it transfers the data to the 0th register of the vector register 2 specified by the second operand. The vector load instruction completes when the transfer is finished.

Next, with reference to FIGS. 3 and 4, the preprocessing performed before execution of a vector load instruction that accesses columns of a two-dimensional array in the main memory 7 is described. FIG. 3 shows the two-dimensional array used in the present embodiment, with the data in the main memory 7 to be accessed laid out in the form of a two-dimensional array. FIG. 3 illustrates a two-dimensional array of 16 rows and 17 columns. Data within the same row of the two-dimensional array have contiguous memory addresses in the main memory 7, whereas data within the same column do not. Accessing the data of one column is therefore an access to a plurality of array elements arranged at a fixed interval (the stride), that is, a stride access.

In the present embodiment, before a stride-access vector load instruction is generated, the positions of the elements of the two-dimensional array are adjusted in advance so that all elements located in the same column occupy the same position within the read unit of the main memory.

The two-dimensional array is adjusted by a compiler. Before generating the vector load instruction, the compiler determines the stride based on the memory read unit of the main memory 7 and the configuration of the two-dimensional array. The compiler adjusts the two-dimensional array according to the determined stride and places the adjusted array in the main storage units 7-1 to 7-4. Specifically, in the two-dimensional array shown in FIG. 3, for example, the memory read unit is 4 and the array has 17 columns, so the compiler selects 20, a value greater than 17 that is a multiple of 4, as the stride.

The compiler may also determine the stride based on the memory read unit, the configuration of the two-dimensional array, and the number of main storage units 7-1 to 7-4. Specifically, in the two-dimensional array shown in FIG. 3, for example, the memory read unit is 4, the array has 17 columns, and there are four main storage units 7-1 to 7-4, so the compiler selects 20, a value greater than 17 that is not a multiple of 16, as the stride. Here, 16 is the product of the memory read unit and the number of main storage units 7-1 to 7-4.

The compiler adjusts the two-dimensional array according to the stride obtained in this way. For the two-dimensional array shown in FIG. 3, three columns are added so that the array has 20 columns (range 301, shown hatched in the figure). With this added padding, all the data in one row of the two-dimensional array can be handled in whole read units, and all elements located in the same column are aligned to the same position within the memory read unit.
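
The stride selection and padding can be sketched as a small helper. This is a hedged illustration, not the compiler's actual procedure: the function name choose_stride is hypothetical, the text gives only the 17-column example, and reading "greater than" as strict and picking the smallest qualifying value are both assumptions.

```python
def choose_stride(num_columns, read_unit, num_units=None):
    """Pick a padded row length (stride) for a row-major two-dimensional array.

    Returns the smallest multiple of read_unit that is greater than
    num_columns. If num_units is given (and larger than 1), multiples of
    read_unit * num_units are skipped, following the optional rule in the text.
    """
    stride = (num_columns // read_unit + 1) * read_unit
    if num_units is not None and num_units > 1:
        while stride % (read_unit * num_units) == 0:
            stride += read_unit
    return stride

stride = choose_stride(num_columns=17, read_unit=4, num_units=4)
padding_columns = stride - 17
print(stride, padding_columns)
```

For the 16-by-17 array this yields a stride of 20 and three padding columns, the values used in the embodiment. Skipping multiples of 16 keeps successive rows from all landing in the same main storage unit.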

The relationship between each element of the padded two-dimensional array and its storage address in the main memory 7 is described with reference to FIGS. 4 and 1. FIG. 4 shows the addresses in the main storage units 7-1 to 7-4 at which the elements of the first to fourth columns are stored when the two-dimensional array shown in FIG. 3 is padded and mapped onto the main storage units 7-1 to 7-4. The storage start memory address in the main storage units 7-1 to 7-4 is 0. FIG. 4 shows only the first to fourth columns extracted from the two-dimensional array shown in FIG. 3.

In FIG. 4, each element of each column is mapped to an address in the main storage units 7-1 to 7-4 shown in FIG. 1. For example, the first-row elements of the four columns in FIG. 4 are mapped to memory addresses 0, 1, 2, and 3 of the main storage unit 7-1 in FIG. 1; that is, they are all stored in the main storage unit 7-1. Similarly, the four elements of each of the other rows of the four columns in FIG. 4 are stored together in one main storage unit, those of the second, third, and fourth rows in the main storage units 7-2, 7-3, and 7-4, respectively. In other words, the data in the first to fourth columns of the same row of the padded two-dimensional array is placed in the main memory 7 of FIG. 1 in exactly one memory read unit.

Also, for example, within the first column of the two-dimensional array, the first-row element is at memory address 0, the first address of a read unit in the main storage unit 7-1; the second-row element is at memory address 20, the first address of a read unit in the main storage unit 7-2; the third-row element is at memory address 40, the first address of a read unit in the main storage unit 7-3; and the fourth-row element is at memory address 60, the first address of a read unit in the main storage unit 7-4. That is, for each column of the two-dimensional array, all elements located in the same column occupy the same position within the memory read unit.
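
The alignment property can be checked numerically with a short sketch. It assumes a row-major layout starting at address 0 with 0-based row and column indices; the helper names address, positions_in_read_unit, and storage_unit are hypothetical.

```python
READ_UNIT = 4
NUM_UNITS = 4
ROWS = 16

def address(row, col, stride):
    """Word address of element (row, col), both 0-based, in a row-major array."""
    return row * stride + col

def positions_in_read_unit(col, stride):
    """Set of positions within the read unit occupied by the elements of one column."""
    return {address(row, col, stride) % READ_UNIT for row in range(ROWS)}

def storage_unit(addr):
    """Return k such that main storage unit 7-k holds the given word address."""
    return (addr // READ_UNIT) % NUM_UNITS + 1

# Compare the padded stride (20) with the unpadded row length (17).
for col in range(4):
    print(col, positions_in_read_unit(col, 20), positions_in_read_unit(col, 17))
```

With the padded stride of 20, every column maps to a single position within the read unit. With the original row length of 17, the elements of one column are spread over all four positions, so a read unit could not be shared cleanly between neighboring columns.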

A method of processing a vector load that performs stride access in the vector processing device shown in FIG. 1 is described with reference to FIG. 5.
FIG. 5 is a diagram for explaining the vector load processing when, of the two-dimensional array data shown in FIG. 3, the elements of the first to fourth columns shown in FIG. 4 are vector-loaded. Because the data is accessed in the column direction of FIG. 3, this vector load is a stride access. FIG. 5 shows the data read from the main storage units 7-1 to 7-4 at each point in time and the contents stored in the load buffer 4 (load buffers 4-1 to 4-4) when the reads from the main storage units 7-1 to 7-4 are complete.
For example, at time T, the contents of memory addresses 0, 20, 40, and 60 are read from the main storage units 7-1 to 7-4, respectively. When the reads are complete, the contents of the stride-accessed addresses are stored in the load buffers 4-1 to 4-4.

FIG. 5 also assumes that the vector load instruction to be executed specifies 4 as the first operand, the 0th vector register 2 as the second operand, 20 (160 bytes) as the third operand, and address 0 as the fourth operand. The vector length at the time the vector load instruction is issued is 16, and 16 data items are read by the vector load instruction. That is, the vector load instruction stores 16 elements, whose addresses are 20 apart, in each of load buffers 4-1 to 4-4.

First, on receiving the vector load instruction, load buffer control unit 5 reserves four load buffers 4 (load buffers 4-1 to 4-4). It then sends four main memory read requests to each of main storage units 7-1 to 7-4. Each main memory read request carries the start address within the corresponding main storage unit 7-1 to 7-4.

Main storage unit 7-1 sequentially reads the contents of memory addresses 0 to 3, 80 to 83, 160 to 163, and 240 to 243, and sends the read data to main memory crossbar 6. Main storage unit 7-2 sequentially reads the contents of memory addresses 20 to 23, 100 to 103, 180 to 183, and 260 to 263, and sends the read data to main memory crossbar 6. Main storage unit 7-3 sequentially reads the contents of memory addresses 40 to 43, 120 to 123, 200 to 203, and 280 to 283, and sends the read data to main memory crossbar 6. Main storage unit 7-4 sequentially reads the contents of memory addresses 60 to 63, 140 to 143, 220 to 223, and 300 to 303, and sends the read data to main memory crossbar 6. Main memory crossbar 6 returns the data sent sequentially from main storage units 7-1 to 7-4 to load buffers 4-1 to 4-4.
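The per-unit address ranges listed above can be reproduced with a short sketch. It assumes the same 4-word read unit and round-robin interleaving across four main storage units as in the example; `bank_read_ranges` and its parameters are hypothetical names, not part of the described device.

```python
WORDS_PER_UNIT = 4  # assumed read unit, in words
NUM_BANKS = 4       # main storage units 7-1 to 7-4


def bank_read_ranges(num_columns, stride, start, vector_length):
    """Group the address ranges touched by a strided vector load by bank.

    Returns one list per bank; each list holds (first_address, last_address)
    tuples in issue order.
    """
    ranges = [[] for _ in range(NUM_BANKS)]
    for i in range(vector_length):
        first = start + i * stride
        last = first + num_columns - 1
        # This sketch only handles the aligned case described in the text,
        # where the elements of one row fit in a single read unit.
        assert first // WORDS_PER_UNIT == last // WORDS_PER_UNIT
        bank = (first // WORDS_PER_UNIT) % NUM_BANKS
        ranges[bank].append((first, last))
    return ranges


# Operands from the example: 4 columns, stride 20, start address 0, vector length 16.
schedule = bank_read_ranges(num_columns=4, stride=20, start=0, vector_length=16)
for bank, reads in enumerate(schedule, start=1):
    print(f"7-{bank}:", reads)
```

Under these assumptions each main storage unit receives exactly four read requests, and the first entry printed is `7-1: [(0, 3), (80, 83), (160, 163), (240, 243)]`.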

Based on the request ID attached to the data and on the ordinal position of the returned word within the returned data, load buffer control unit 5 looks up load buffer table 8, determines which load buffer 4 and which location in it the data returned from main memory should be written to, and writes the data to load buffer 4.
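A minimal sketch of this lookup follows. The layout of load buffer table 8 is not given in the text, so the class below is a hypothetical model: it assumes the table maps a request ID to a write position, and that word k of a returned read unit belongs to load buffer k.

```python
class LoadBufferTable:
    """Hypothetical model of load buffer table 8: request ID -> write position."""

    def __init__(self):
        self._position_by_request = {}

    def register(self, request_id, position):
        self._position_by_request[request_id] = position

    def lookup(self, request_id, word_index):
        # Assumption: word k of a read unit belongs to column k, hence to
        # load buffer k; the request ID selects the slot inside that buffer.
        return word_index, self._position_by_request[request_id]


NUM_COLUMNS, VECTOR_LENGTH = 4, 16
load_buffers = [[None] * VECTOR_LENGTH for _ in range(NUM_COLUMNS)]
table = LoadBufferTable()

# One request per row; here the request ID is simply the row number (assumed).
for row in range(VECTOR_LENGTH):
    table.register(request_id=row, position=row)


def on_data_returned(request_id, words):
    """Write each returned word to the buffer and slot given by the table."""
    for word_index, value in enumerate(words):
        buffer_no, position = table.lookup(request_id, word_index)
        load_buffers[buffer_no][position] = value


# In this sketch the placement does not depend on the order of the replies.
on_data_returned(5, [100, 101, 102, 103])
on_data_returned(0, [0, 1, 2, 3])
```

After these two calls, slot 5 of the four buffers holds 100 to 103 and slot 0 holds 0 to 3.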

When load buffer control unit 5 determines that all the data to be read from main memory 7 (64 data items) has been written to load buffers 4-1 to 4-4, it transfers the data to the 0th through 3rd vector registers 2 designated by the second operand, and completes the vector load instruction once all the data has been transferred.

As described above, according to the present embodiment, a single vector load instruction that performs stride access loads only the required data in memory read units, which prevents unnecessary accesses to data that is not actually needed. The number of memory reads during stride access can therefore be kept to the necessary minimum, which reduces the power consumed by memory access.
Furthermore, according to the present embodiment, the cache memory need not be used for stride access, so the capacity of the cache memory can be reduced.

The following briefly describes what happens when the two-dimensional array shown in FIG. 3 is not padded.
Without padding, for example, among the elements of the two-dimensional array in FIG. 3, the element in the 17th column of the first row is mapped to main storage unit 7-1 of FIG. 1 together with the three elements in the first through third columns of the second row. Likewise, the element in the fourth column of the second row is mapped to main storage unit 7-2 of FIG. 1 together with the three elements in the fifth through seventh columns of the second row. That is, the three elements in the first through third columns of the second row and the element in the fourth column of the second row, all of which are targets of the stride access, are mapped to different ones of main storage units 7-1 to 7-4.
Now consider a stride access to the elements in the first through fourth columns of the two-dimensional array of FIG. 3 under this mapping. Because the target elements (the four elements in the first through fourth columns of the second row) are split between main storage unit 7-1 and main storage unit 7-2, it is necessary, for example, to execute one vector load instruction to read the three elements held in main storage unit 7-1 and another vector load instruction to read the one element held in main storage unit 7-2. When the vector load instruction is executed to read the three elements held in main storage unit 7-1, however, only three elements are needed and the remaining one element is unnecessary. Similarly, when the vector load instruction is executed to read the one element held in main storage unit 7-2, only one element is needed and the remaining three elements are unnecessary. In other words, because the four target elements do not all fit exactly within one main memory read unit, some of the words read from memory (in the latter case, all but one) are never loaded into a register. As a result, the stride access includes accesses that are not actually needed, which increases the power consumed by memory access.
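The cost of omitting the padding can be estimated by counting the read units that the access touches. The sketch below assumes a 4-word read unit, 16 rows (the vector length of the example), an unpadded row length of 17 words, and a padded row length of 20 words; `read_units_touched` is a hypothetical helper.

```python
WORDS_PER_UNIT = 4  # assumed read unit, in words


def read_units_touched(stride, num_columns=4, rows=16, start=0):
    """Count the distinct read units holding the first num_columns elements of each row."""
    units = set()
    for r in range(rows):
        for c in range(num_columns):
            units.add((start + r * stride + c) // WORDS_PER_UNIT)
    return len(units)


padded = read_units_touched(stride=20)    # rows padded to 20 words
unpadded = read_units_touched(stride=17)  # original 17-word rows
print(padded, unpadded)  # 16 28
```

With padding, each row needs one read unit (16 in total). Without it, only every fourth row is aligned and the other rows straddle two read units, giving 28 reads for the same 64 elements under these assumptions.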

<Other embodiments>
The first embodiment showed an example in which load buffer control unit 5 looks up load buffer table 8, based on the request ID attached to the returned data and on the ordinal position of the returned word within the returned data, to determine which load buffer 4 and which location in it to write to. The present invention, however, is not limited to this.

For example, the following configuration may be adopted instead. First, load buffer control unit 5 attaches, to each main memory read request, the number of the load buffer that will store the read data and the write position in that load buffer. Main storage units 7-1 to 7-4 attach these two pieces of information to the read data and return them to load buffer control unit 5. Load buffer control unit 5 then writes to load buffer 4 based on the attached information. That is, load buffer control unit 5 controls the writing of data read from main memory 7 to load buffer 4 based on the identification information of the request and on the information indicating the write position in load buffer 4 of the data read from main memory 7 by that request. With this configuration, load buffer control unit 5 can achieve the effects of the present invention without having load buffer table 8.
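This table-free variant can be sketched as follows. The message formats are not specified in the text, so `ReadRequest`, `ReadReply`, and both functions are hypothetical; the sketch also assumes that the attached buffer number names the load buffer for the first word of a read unit and that the following words go to the consecutive buffers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    address: int
    buffer_no: int  # load buffer that receives the first word (assumed meaning)
    position: int   # write position inside the load buffer


@dataclass(frozen=True)
class ReadReply:
    buffer_no: int
    position: int
    words: tuple


def main_storage_read(memory, request, words_per_unit=4):
    """Hypothetical main storage unit: echo the attached tags back with the data."""
    data = tuple(memory[request.address + k] for k in range(words_per_unit))
    return ReadReply(request.buffer_no, request.position, data)


def write_reply(load_buffers, reply):
    """Controller side: place the words using only the echoed tags, with no table."""
    for k, value in enumerate(reply.words):
        load_buffers[reply.buffer_no + k][reply.position] = value


memory = [a * 10 for a in range(320)]              # dummy contents
load_buffers = [[None] * 16 for _ in range(4)]
reply = main_storage_read(memory, ReadRequest(address=20, buffer_no=0, position=1))
write_reply(load_buffers, reply)
```

The trade-off in this sketch is wider request and reply messages in exchange for removing the lookup state from the controller.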

The outline of the present invention will now be described again with reference to FIG. 6. FIG. 6 is a block diagram showing only the essential parts of the vector processing device according to the present invention.
As shown in the figure, the vector processing device includes: main memory 7, interleaved so that each memory read unit is stored in a different one of storage units 7-1 to 7-4; load buffer 4, which temporarily stores data read from main memory 7; load buffer control unit 5, which controls the issuance of data read requests to main memory 7 and the writing of data read from main memory 7 to load buffer 4; and vector register 2, which stores data transferred from load buffer 4.

Here, the elements of the two-dimensional array are stored in main memory 7 with their positions adjusted in advance so that all elements along the dimension whose memory addresses are not contiguous have the same position within the memory read unit. When load buffer control unit 5 receives a vector load instruction that accesses the array along the dimension whose memory addresses are not contiguous, it controls the writing of data read from main memory 7 to load buffer 4 based on the identification information of the request and on the information indicating the write position in load buffer 4 of the data read from main memory 7 by that request.
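One way to perform this advance position adjustment is to pad each row up to a multiple of the read unit, as in the 17-to-20 example of the first embodiment. The sketch below assumes a 4-word read unit and an array whose base address is aligned to a read-unit boundary; the helper names are hypothetical.

```python
def padded_row_length(row_length, words_per_unit):
    """Smallest multiple of words_per_unit that is >= row_length."""
    return -(-row_length // words_per_unit) * words_per_unit


def pad_rows(rows, words_per_unit, fill=0):
    """Append filler elements so that every row starts on a read-unit boundary."""
    width = padded_row_length(len(rows[0]), words_per_unit)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def column_offsets(num_rows, row_length, col, words_per_unit):
    """Set of in-unit offsets taken by one column over all rows."""
    return {(r * row_length + col) % words_per_unit for r in range(num_rows)}


print(padded_row_length(17, 4))      # 20
print(column_offsets(16, 17, 0, 4))  # unpadded: {0, 1, 2, 3}
print(column_offsets(16, 20, 0, 4))  # padded: {0}
```

After padding, every element of a given column sits at the same offset within its read unit, which is the condition stated above.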

The present invention is not limited to the embodiments described above; various modifications are of course possible without departing from the gist of the invention already described.

Part or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.

(Supplementary note 1)
A vector processing device comprising:
a main storage device interleaved so that each memory read unit is stored in a different storage means;
a load buffer that temporarily stores data read from the main storage device;
load buffer control means for controlling issuance of data read requests to the main storage device and writing of data read from the main storage device to the load buffer; and
a vector register that stores data transferred from the load buffer,
wherein the elements of a two-dimensional array on the main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension whose memory addresses are not contiguous have the same position within the memory read unit, and
wherein, upon receiving a vector load instruction that performs access along the dimension whose memory addresses are not contiguous, the load buffer control means controls the writing of data read from the main storage device to the load buffer based on identification information of the request and on write position information, in the load buffer, of the data read from the main storage device by that request.

(Supplementary note 2)
The vector processing device according to supplementary note 1, wherein the positions of the elements of the two-dimensional array are adjusted in advance by adding elements in advance so that the number of elements along the dimension whose memory addresses are not contiguous is an integer multiple of the memory read unit.

(Supplementary note 3)
The vector processing device according to supplementary note 1 or 2, wherein the load buffer control means has a load buffer table that holds, in association with each other, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer by referring to the load buffer table.

(Supplementary note 4)
The vector processing device according to supplementary note 1 or 2, wherein the load buffer control means sends, to the main storage device, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer based on this information as attached to the data read from and returned by the main storage device.

(Supplementary note 5)
The vector processing device according to any one of supplementary notes 1 to 4, wherein the main storage device is configured using SDRAM.

(Supplementary note 6)
A vector processing method of a vector processing device, comprising:
a step in which the elements of a two-dimensional array on a main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension whose memory addresses are not contiguous have the same position within the memory read unit; and
a step of, upon receiving a vector load instruction that performs access along the dimension whose memory addresses are not contiguous, controlling the writing of data read from the main storage device to a load buffer based on identification information of a data read request to the main storage device and on write position information, in the load buffer, of the data read from the main storage device by that request.

(Supplementary note 7)
The vector processing method of a vector processing device according to supplementary note 6, wherein, in the step in which the positions of the elements of the two-dimensional array are adjusted in advance, elements are added in advance so that the number of elements along the dimension whose memory addresses are not contiguous is an integer multiple of the memory read unit.

(Supplementary note 8)
The vector processing method of a vector processing device according to supplementary note 6 or 7, wherein the step of controlling the writing to the load buffer holds, in association with each other, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer by referring to the held information.

(Supplementary note 9)
The vector processing method of a vector processing device according to supplementary note 6 or 7, wherein the step of controlling the writing to the load buffer sends, to the main storage device, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer based on this information as attached to the data read from and returned by the main storage device.

(Supplementary note 10)
The vector processing method of a vector processing device according to any one of supplementary notes 6 to 9, wherein the main storage device is configured using SDRAM.

1 Vector operation unit
2 Vector register
3 Operation unit crossbar
4 Load buffer
4-1 to 4-4 Load buffers
5 Load buffer control unit
6 Main memory crossbar
7 Main memory
7-1 to 7-4 Main storage units
8 Load buffer table
301 Padded range

Claims

A vector processing device comprising:
a main storage device interleaved so that each memory read unit is stored in a different storage means;
a load buffer that temporarily stores data read from the main storage device;
load buffer control means for controlling issuance of data read requests to the main storage device and writing of data read from the main storage device to the load buffer; and
a vector register that stores data transferred from the load buffer,
wherein the elements of a two-dimensional array on the main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension whose memory addresses are not contiguous have the same position within the memory read unit, and
wherein, upon receiving a vector load instruction that performs access along the dimension whose memory addresses are not contiguous, the load buffer control means controls the writing of data read from the main storage device to the load buffer based on identification information of the request and on write position information, in the load buffer, of the data read from the main storage device by that request.

The vector processing device according to claim 1, wherein the positions of the elements of the two-dimensional array are adjusted in advance by adding elements in advance so that the number of elements along the dimension whose memory addresses are not contiguous is an integer multiple of the memory read unit.

The vector processing device according to claim 1, wherein the load buffer control means has a load buffer table that holds, in association with each other, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer by referring to the load buffer table.

The vector processing device according to claim 1, wherein the load buffer control means sends, to the main storage device, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer based on this information as attached to the data read from and returned by the main storage device.

The vector processing device according to claim 1, wherein the main storage device is configured using SDRAM.

A vector processing method of a vector processing device, comprising:
a step in which the elements of a two-dimensional array on a main storage device are stored in the main storage device with their positions adjusted in advance so that all elements along the dimension whose memory addresses are not contiguous have the same position within the memory read unit; and
a step of, upon receiving a vector load instruction that performs access along the dimension whose memory addresses are not contiguous, controlling the writing of data read from the main storage device to a load buffer based on identification information of a data read request to the main storage device and on write position information, in the load buffer, of the data read from the main storage device by that request.

The vector processing method of a vector processing device according to claim 6, wherein, in the step in which the positions of the elements of the two-dimensional array are adjusted in advance, elements are added in advance so that the number of elements along the dimension whose memory addresses are not contiguous is an integer multiple of the memory read unit.

The vector processing method of a vector processing device according to claim 6 or 7, wherein the step of controlling the writing to the load buffer holds, in association with each other, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer by referring to the held information.

The vector processing method of a vector processing device according to claim 6 or 7, wherein the step of controlling the writing to the load buffer sends, to the main storage device, identification information of a request to the main storage device and information indicating the write position in the load buffer of the data read from the main storage device by that request, and controls the writing of data read from the main storage device to the load buffer based on this information as attached to the data read from and returned by the main storage device.

The vector processing method of a vector processing device according to any one of claims 6 to 9, wherein the main storage device is configured using SDRAM.