JP6413605B2 - Vector arithmetic device, control method and program, and vector processing device

Info

Publication number: JP6413605B2
Application number: JP2014211277A
Authority: JP
Inventors: 泰洋西垣
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-10-16
Filing date: 2014-10-16
Publication date: 2018-10-31
Anticipated expiration: 2034-10-16
Also published as: JP2016081259A

Description

The present invention relates to a vector arithmetic device including a load buffer.

A vector processing device includes a main storage device and a vector arithmetic device. The vector arithmetic device has a vector register that holds vector data loaded from the main storage device and intermediate results of vector operations, and a vector arithmetic unit that operates on the vector data held in the vector register.

Access to the main storage device is slow compared with the speed of vector operations. To speed up the loading of vector data into the vector register, the vector arithmetic device of Patent Document 1 includes a load buffer, placed between the main storage device and the vector register, that temporarily stores vector data.

Meanwhile, Patent Documents 2 and 3 disclose prefetching as a memory management technique.

Patent Document 1: Japanese Patent Laid-Open No. 2-101576
Patent Document 2: Japanese Patent No. 4532931
Patent Document 3: JP 2002-297379 A

A vector arithmetic device that has a load buffer for storing vector data, as in Patent Document 1, holds a vector load instruction when the load buffer is exhausted and issues the instruction only after a load buffer has been secured. If the issued vector load instruction is then sent to the cache memory and results in a cache miss, the vector data must be transferred from the main storage device to the cache memory. This lengthens the execution time of the vector load instruction in the vector arithmetic device and degrades device performance.

Patent Documents 2 and 3 disclose techniques for applying prefetching to a scalar arithmetic device, but give no specific disclosure of how to apply it to a vector arithmetic device.

There is therefore a need for a concrete technique that, in a vector arithmetic device having a load buffer, prevents the execution time of a vector load instruction from growing when the load buffer is exhausted.

An object of the present invention is to provide a technique capable of shortening the execution time of a vector load instruction in a vector arithmetic device having a load buffer.

The vector arithmetic device of the present invention includes a vector processing unit having a load buffer, a cache unit that temporarily holds vector data, and a memory access processing unit that notifies the cache unit of a vector load instruction. When the load buffer cannot be used, the memory access processing unit suspends notification of the vector load instruction, and generates and issues a prefetch instruction corresponding to the suspended vector load instruction. In response to the prefetch instruction, the cache unit reads the vector data from the main storage device and places it in the cache unit.

The control method of the present invention is a control method for a vector arithmetic device that includes a vector processing unit having a load buffer and a cache unit that temporarily holds vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction. In this method, when the load buffer cannot be used, notification of the vector load instruction is suspended, a prefetch instruction corresponding to the suspended vector load instruction is generated and issued, and, in response to the prefetch instruction, the vector data is read from the main storage device and placed in the cache unit.

The control program of the present invention is a control program for a vector arithmetic device that includes a vector processing unit having a load buffer and a cache unit that temporarily holds vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction. When the load buffer cannot be used, the program causes the vector arithmetic device to suspend notification of the vector load instruction, to generate and issue a prefetch instruction corresponding to the suspended vector load instruction, and, in response to the prefetch instruction, to read the vector data from the main storage device and place it in the cache unit.

The vector processing device of the present invention includes a main storage device and the vector arithmetic device described above.

The present invention can shorten the execution time of a vector load instruction in a vector arithmetic device including a load buffer.

A block diagram showing the configuration of the vector arithmetic device according to the first embodiment of the present invention.
A flowchart showing the operation of the memory access processing unit 2 of the vector arithmetic device 1 according to the first embodiment.
A block diagram showing the configuration of the vector processing device according to the second embodiment of the present invention.
A sequence diagram showing the operation of the vector processing device according to the second embodiment of the present invention.
A block diagram showing the configuration of the vector processing device according to the third embodiment of the present invention.

(First embodiment)
A vector arithmetic device according to a first embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the vector arithmetic device according to the first embodiment. As shown in FIG. 1, the vector arithmetic device 1 includes a vector processing unit 5, which has a load buffer 6 for storing load data and executes vector operations on vector data, and a cache unit 4, which temporarily holds vector data from a main storage device (not shown). The vector arithmetic device 1 further includes a processor network unit 3, which transfers vector data from the main storage device to the vector processing unit 5 in response to a vector load instruction, and a memory access processing unit 2, which notifies the processor network unit 3 of the vector load instruction. When the load buffer 6 cannot be used, the memory access processing unit 2 suspends notification of the vector load instruction, generates a prefetch instruction corresponding to the suspended vector load instruction, and notifies the cache unit 4 of the prefetch instruction. In response to the prefetch instruction, the cache unit 4 reads the vector data from the main storage device and places (writes) it in the cache unit 4.

The components of the vector arithmetic device, and of the vector processing device that contains it, are described in more detail in the second embodiment.

Next, the operation of the vector arithmetic device according to the first embodiment will be described with reference to the drawings.

FIG. 2 is a flowchart showing the operation of the memory access processing unit of the vector arithmetic device according to the first embodiment. As shown in FIG. 2, after accepting a vector load instruction, the memory access processing unit 2 of the vector arithmetic device 1 determines whether the load buffer 6 of the vector processing unit 5 can be used (A1).

When the load buffer 6 of the vector processing unit 5 can be used (Yes in A1), the memory access processing unit 2 notifies the processor network unit 3 (including the cache unit 4) of the vector load instruction (A5).

When the load buffer 6 of the vector processing unit 5 cannot be used (No in A1), the memory access processing unit 2 suspends notification of the vector load instruction to the processor network unit 3 (including the cache unit 4) (A2), which holds back the transfer of vector data to the vector processing unit 5 by that instruction. The memory access processing unit 2 then generates a prefetch instruction corresponding to the suspended vector load instruction and notifies the processor network unit 3 (including the cache unit 4) of the generated prefetch instruction (A3).

The memory access processing unit 2 then determines whether the load buffer 6 can be used (A4). When the load buffer 6 can be used (Yes in A4), the memory access processing unit 2 notifies the processor network unit 3 (including the cache unit 4) of the vector load instruction (A5). In response to the prefetch instruction, the processor network unit 3 (including the cache unit 4) reads the vector data from the main storage device and places it in the cache unit 4.
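Steps A1 to A5 can be illustrated with a minimal Python sketch. It is not part of the disclosure: the class names `Cache` and `MemoryAccessUnit`, the counter of free load buffer areas, and the event-style `buffer_released` method (standing in for the check at A4) are all assumptions made for illustration, and the cache is modeled as a plain set of addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy stand-in for cache unit 4: a set of cached addresses (assumption)."""
    lines: set = field(default_factory=set)

    def prefetch(self, addresses):
        # A prefetch only fills the cache; nothing goes to the vector processing unit.
        self.lines.update(addresses)

    def load(self, addresses):
        # Count the misses, then fill the cache as a normal load would.
        misses = [a for a in addresses if a not in self.lines]
        self.lines.update(addresses)
        return len(misses)


class MemoryAccessUnit:
    """Hypothetical model of steps A1 to A5 for one vector load instruction."""

    def __init__(self, cache, free_buffers):
        self.cache = cache
        self.free_buffers = free_buffers  # number of free load buffer areas
        self.pending = None               # suspended vector load, if any

    def accept(self, addresses):
        if self.free_buffers > 0:          # A1: is the load buffer usable?
            return self._issue(addresses)  # A5: notify the vector load
        self.pending = addresses           # A2: suspend the vector load
        self.cache.prefetch(addresses)     # A3: prefetch the same addresses
        return None

    def buffer_released(self):
        self.free_buffers += 1
        if self.pending is not None:       # A4: the load buffer is usable now
            addresses, self.pending = self.pending, None
            return self._issue(addresses)  # A5: notify the suspended load
        return None

    def _issue(self, addresses):
        self.free_buffers -= 1
        return self.cache.load(addresses)  # number of cache misses seen by the load


cache = Cache()
unit = MemoryAccessUnit(cache, free_buffers=0)
suspended = unit.accept([0x100, 0x108, 0x110])  # buffer exhausted: the load is held
misses_after_release = unit.buffer_released()   # the held load is issued on release
print(suspended, misses_after_release)
```

In this toy model the suspended load sees no cache misses once it is finally issued, because the prefetch generated at A3 has already brought the same addresses into the cache.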

As described above, in the vector arithmetic device 1 according to the first embodiment, when the memory access processing unit 2 receives a vector load instruction and the load buffer 6 cannot be used, it generates a prefetch instruction corresponding to the suspended vector load instruction and notifies the processor network unit 3 of the generated prefetch instruction. In response to the prefetch instruction, the processor network unit 3 (including the cache unit 4) places the vector data corresponding to the vector load instruction from the main storage device into the cache unit 4 of the processor network unit 3. When the load buffer 6 later becomes usable and the memory access processing unit 2 issues the vector load instruction, the vector data hits in the cache unit 4, so the processor network unit 3 (including the cache unit 4) can transfer the corresponding vector data to the vector processing unit 5 promptly. In other words, the execution time of the vector load instruction can be shortened.

Because the first embodiment uses a prefetch instruction that corresponds to the suspended vector load instruction, it does not need a history holding mechanism or an address prediction mechanism such as those described in the prior art documents. And because the prefetch is not based on prediction, no unnecessary data is cached as a result of a misprediction.

(Second embodiment)
Next, a vector arithmetic device and a vector processing device according to a second embodiment of the present invention will be described with reference to the drawings. FIG. 3 is a block diagram showing the configuration of the vector processing device 100 according to the second embodiment. As shown in FIG. 3, the vector processing device 100 of the second embodiment includes a vector arithmetic device 10 and a main storage device 70.

(Vector processing device 100)
The vector arithmetic device 10 and the main storage device 70 are connected to each other via signal lines 101 and 102. The signal line 102 is used to read vector data stored in the main storage device 70 into the vector arithmetic device 10, or to write vector data generated by the vector arithmetic device 10 to the main storage device 70.

The vector arithmetic device 10 and the main storage device 70, which make up the vector processing device 100, are described in detail below.

(Vector arithmetic device 10)
The vector arithmetic device 10 includes an instruction control unit 20, a memory access processing unit 30, a processor network unit 40, a vector control unit 50, and a vector processing unit 60.

(Instruction control unit 20)
The instruction control unit 20 is connected to the main storage device 70 by the signal line 101, to the memory access processing unit 30 by the signal line 103, and to the vector control unit 50 by the signal line 104.

The instruction control unit 20 decodes instructions read from the main storage device 70 through the signal line 101. If the decoded instruction is a scalar instruction, the instruction control unit 20 executes the processing for that scalar instruction. If the decoded instruction is a vector instruction, the instruction control unit 20 outputs it to the vector control unit 50 via the signal line 104. If the vector instruction is a vector load instruction, the instruction control unit 20 also outputs it via the signal line 103 to the memory access processing unit 30 described later. A vector load instruction contains information specifying the addresses in the main storage device 70 of the vector data to be loaded (for example, a start address and the interval between vector data elements), the number of elements of the vector data, and the vector register number assigned to the vector register into which the vector data is loaded.
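As an illustration of the fields listed above, the following Python sketch models a vector load instruction and derives the address of each element from the start address and the interval. The class name `VectorLoadInstruction`, the field names, and the choice of bytes as the unit of the interval are assumptions; the specification does not define an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorLoadInstruction:
    """Hypothetical container for the fields a vector load instruction carries."""
    start_address: int    # address of element 0 in main storage
    stride: int           # interval between consecutive elements (bytes, assumed)
    num_elements: int     # number of vector elements to load
    vector_register: int  # destination vector register number

    def element_addresses(self):
        """Address of element i is start_address + i * stride."""
        return [self.start_address + i * self.stride
                for i in range(self.num_elements)]


vld = VectorLoadInstruction(start_address=0x1000, stride=8,
                            num_elements=4, vector_register=3)
print([hex(a) for a in vld.element_addresses()])
```

Because the instruction itself determines every address it will touch, a prefetch generated from it needs no address prediction.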

(Memory access processing unit 30)
The memory access processing unit 30 is connected to the instruction control unit 20 by the signal line 103, to the processor network unit 40 by the signal line 105, to the vector control unit 50 by the signal line 106, and to the vector processing unit 60 by the signal line 107. The memory access processing unit 30 controls access between the vector processing unit 60 and the main storage device 70 in accordance with instructions sent from the instruction control unit 20. The memory access processing unit 30 of the vector arithmetic device 10 according to the second embodiment corresponds to the memory access processing unit 2 of the vector arithmetic device 1 according to the first embodiment.

The memory access processing unit 30 decodes the vector load instruction sent from the instruction control unit 20 through the signal line 103 and manages the state of the processor network unit 40. It also sends the vector load instruction to the processor network unit 40 via the signal line 105. In addition, it controls the movement of data between the main storage device 70 and the processor network unit 40, and between the processor network unit 40 and the vector processing unit 60.

(Load buffer management unit 31)
The memory access processing unit 30 includes a load buffer management unit 31 that manages the free space of the load buffer 62 described later. For vector load instructions, the load buffer management unit 31 keeps track of which storage areas in the load buffer 62 of the vector processing unit 60 are free.

When the memory access processing unit 30 receives a vector load instruction from the instruction control unit 20 through the signal line 103, the load buffer management unit 31 allocates a free storage area of the load buffer 62 to the received vector load instruction and manages that area as in use. The load buffer management unit 31 then issues a load buffer number, which uniquely identifies the allocated storage area of the load buffer 62, to the processor network unit 40 through the signal line 105.

The load buffer management unit 31 also notifies the vector control unit 50, through the signal line 106, of vector load instruction information containing the load buffer number, the vector register number, and the number of elements. This vector load instruction information is passed on to the vector processing unit 60 via the vector control unit 50.

When the memory access processing unit 30 receives, through the signal line 107, a load buffer release notification that specifies a load buffer number from the vector processing unit 60 described later, the load buffer management unit 31 manages the storage area of the load buffer 62 with that load buffer number as free again.

The load buffer management unit 31 uses flag information to manage the storage areas of the load buffer 62. A load buffer number is fixedly assigned to each storage area of the load buffer 62, and one flag is provided for each load buffer number, in one-to-one correspondence. For a storage area of the load buffer 62 that is in use by a vector load instruction, the flag of the corresponding load buffer number is 1. For a storage area of the load buffer 62 that is released by a load buffer release notification sent from the vector processing unit 60, the flag of the corresponding load buffer number becomes 0; that is, the flag is reset. By searching the flag information of the load buffer management unit 31 in this way, the memory access processing unit 30 can determine whether a usable storage area of the load buffer 62 exists and identify the load buffer number to use.
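The flag scheme described above can be sketched in Python as follows. This is only an illustration: the class name `LoadBufferManager`, the method names, and the policy of picking the lowest free load buffer number are assumptions, since the text states only that the flags are searched.

```python
class LoadBufferManager:
    """One in-use flag per load buffer number (1 = in use, 0 = free)."""

    def __init__(self, num_buffers):
        self.flags = [0] * num_buffers

    def allocate(self):
        """Return a free load buffer number and mark it in use, or None if exhausted."""
        for number, flag in enumerate(self.flags):
            if flag == 0:
                self.flags[number] = 1
                return number
        return None

    def release(self, number):
        """Handle a load buffer release notification by resetting the flag."""
        self.flags[number] = 0


manager = LoadBufferManager(num_buffers=2)
first = manager.allocate()
second = manager.allocate()
third = manager.allocate()   # no free storage area is left
manager.release(first)
reused = manager.allocate()  # the released area becomes available again
print(first, second, third, reused)
```

A `None` result from `allocate` corresponds to the case in which the load buffer cannot be used and the vector load instruction has to be suspended.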

When the memory access processing unit 30 of the vector arithmetic device 10 according to the second embodiment accepts a vector load instruction, it searches the load buffer management unit 31. If the load buffer 62 has a usable storage area, the memory access processing unit 30 notifies the processor network unit 40, described later, that the instruction is a vector load instruction. It also notifies the processor network unit 40 of the information specifying the desired vector data in the main storage device 70, the number of elements, and the load buffer number, and notifies the vector control unit 50 of the load buffer number.

If the load buffer 62 has no usable storage area, the memory access processing unit 30 suspends notification of the vector load instruction and its accompanying information to the processor network unit 40 until a storage area is secured in the load buffer 62. It also suspends notification of the load buffer number to the vector control unit 50.

When the memory access processing unit 30 determines that the load buffer 62 has no usable storage area, it notifies the processor network unit 40, described later, of a prefetch instruction, together with the information specifying the desired vector data of the vector load instruction in the main storage device 70 and the number of elements.

After a storage area has been secured in the load buffer 62, the memory access processing unit 30 notifies the processor network unit 40 of the vector load instruction and its accompanying information, and notifies the vector control unit 50 of the load buffer number.

(Processor network unit 40)
The processor network unit 40 is connected to the main storage device 70 by the signal line 102, to the memory access processing unit 30 by the signal line 105, and to the vector processing unit 60 by the signal line 108. The processor network unit 40 includes a cache unit 41 that temporarily holds data from the main storage device 70. The processor network unit 40 of the vector arithmetic device 10 according to the second embodiment corresponds to the processor network unit 3 of the vector arithmetic device 1 according to the first embodiment.

The processor network unit 40 transfers vector data between the main storage device 70 and the vector processing unit 60 in response to the vector load instruction sent from the memory access processing unit 30.

The processor network unit 40 breaks the vector load instruction sent from the memory access processing unit 30 into per-element vector load instructions, adds information to each per-element instruction, and transfers it to the vector processing unit 60 described later. For a vector load instruction, the added information is the storage location of the desired vector data, the load buffer number, and the element number. For a prefetch instruction, the added information is the storage location of the desired vector data.
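The per-element decomposition can be sketched in Python as shown below. The function name `decompose`, the dictionary layout of a request, and the use of a start address plus a fixed interval to derive each storage location are assumptions made for illustration.

```python
def decompose(kind, start_address, stride, num_elements, load_buffer_number=None):
    """Split one vector-level instruction into per-element requests (hypothetical)."""
    requests = []
    for element_number in range(num_elements):
        request = {
            "kind": kind,
            "address": start_address + element_number * stride,
        }
        if kind == "load":
            # Only a vector load carries routing tags; a prefetch has no
            # load buffer area assigned to it.
            request["load_buffer_number"] = load_buffer_number
            request["element_number"] = element_number
        requests.append(request)
    return requests


loads = decompose("load", 0x2000, 16, 2, load_buffer_number=5)
prefetches = decompose("prefetch", 0x2000, 16, 2)
print(loads[1])
print(prefetches[1])
```

The prefetch requests cover the same addresses as the load requests but omit the load buffer number and element number, which matches the difference in added information described above.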

(Cache unit 41)
When the instruction is a vector load instruction, the cache unit 41 determines whether it holds the desired vector data of the main storage device 70 (cache hit/miss determination). If the cache unit 41 holds the desired vector data (cache hit), it transfers the vector data it holds to the vector processing unit 60 as load data. If the cache unit 41 does not hold the data (cache miss), it sends a load instruction to the main storage device 70 and receives the desired vector data from the main storage device 70. The cache unit 41 then stores the vector data in the cache unit 41 and transfers it to the vector processing unit 60 as load data. When transferring load data to the vector processing unit 60, the cache unit 41 attaches the load buffer number and the element number.
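The hit and miss handling for a vector load can be sketched in Python as follows. The class name `CacheUnit`, the dictionary standing in for the main storage device, and the per-address granularity (real caches work on cache lines) are simplifying assumptions.

```python
class CacheUnit:
    """Toy model of the vector load path through a cache (hypothetical)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value, stands in for main storage
        self.lines = {}                 # cached address -> value
        self.memory_reads = 0           # number of accesses to main storage

    def load(self, address, load_buffer_number, element_number):
        if address not in self.lines:   # cache miss: fetch and keep a copy
            self.memory_reads += 1
            self.lines[address] = self.main_memory[address]
        # Hit or miss, the data is forwarded together with its routing tags.
        return (load_buffer_number, element_number, self.lines[address])


cache_unit = CacheUnit(main_memory={0x40: 1.5, 0x48: 2.5})
first = cache_unit.load(0x40, load_buffer_number=0, element_number=0)  # miss
again = cache_unit.load(0x40, load_buffer_number=1, element_number=0)  # hit
print(first, again, cache_unit.memory_reads)
```

Only the first access reaches the modeled main storage; the second is served from the cached copy, which is the fast path a suspended vector load benefits from once its data is already in the cache.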

When the instruction from the memory access processing unit 30 is a prefetch instruction, the cache unit 41 likewise determines whether it holds the vector data of the main storage device 70 (cache hit/miss determination). If the cache unit 41 holds the data (cache hit), the prefetch instruction is complete. If the cache unit 41 does not hold the data (cache miss), it sends a cache fill instruction to the main storage device 70, receives the desired vector data from the main storage device 70, and stores it in the cache unit 41.

(Vector control unit 50)
The vector control unit 50 is connected to the instruction control unit 20 by the signal line 104, to the memory access processing unit 30 by the signal line 106, and to the vector processing unit 60 by the signal line 109.

The vector control unit 50 controls the vector instructions executed by the vector processing unit 60 in accordance with the vector instructions sent from the instruction control unit 20.

For a vector load instruction, after receiving the load buffer number from the memory access processing unit 30, the vector control unit 50 notifies the vector processing unit 60 that the instruction is a vector load instruction and, at the same time, notifies it of the load buffer number, the number of elements, and the destination vector register number.

(Vector processing unit 60)
The vector processing unit 60 is connected to the memory access processing unit 30 by the signal line 107, to the processor network unit 40 by the signal line 108, and to the vector control unit 50 by the signal lines 109 and 110.

The vector processing unit 60 executes vector operations on vector data. It includes a vector load management unit 61, a load buffer 62 that temporarily stores vector data read from the main storage device 70, a vector register 63 that stores vector data, and one or more vector arithmetic units (not shown) that perform vector operations on the vector data stored in the vector register 63. The vector processing unit 60 and the load buffer 62 of the vector arithmetic device 10 according to the second embodiment correspond to the vector processing unit 5 and the load buffer 6, respectively, of the vector arithmetic device 1 according to the first embodiment.

Vector data output from the vector arithmetic units and from the load buffer 62 is distributed to the vector register 63. Components related to stores, such as a store buffer for storing the results of vector operations in the main storage device 70, are omitted from the description and the drawings.

The vector processing unit 60 also receives a vector load instruction and a load buffer number from the memory access processing unit 30.
The memory access processing unit 30 receives the vector load instruction from the instruction control unit 20 through the signal line 103.
(Vector load management unit 61)
The vector load management unit 61 is connected to the memory access processing unit 30 through the signal line 107, to the processor network unit 40 through the signal line 108, and to the vector control unit 50 through the signal lines 109 and 110.

The vector load management unit 61 temporarily stores each element of the vector data sent through the signal line 108 in the storage area of the load buffer 62 identified by the load buffer number attached to that element. It then transfers the vector data stored in that storage area of the load buffer 62 to the vector register 63.

The vector load management unit 61 also receives a load buffer release notification from the load buffer management unit 31 and releases the storage area of the load buffer 62 named in the notification (that is, it resets the in-use flag).

The vector load management unit 61 manages whether the load data has been lined up in the load buffer 62. It does so on the basis of two inputs: the vector load instruction sent from the vector control unit 50, which includes the load buffer number, the number of elements, and the transfer destination vector register number; and the load data sent from the processor network unit 40, which carries a load buffer number and an element number. Once the load data is lined up and can be transferred to the vector register, the vector load management unit 61 reads the load data from the load buffer 62 and transfers it to the vector register 63.

After reading the load data from the load buffer 62 for transfer to the vector register 63, the vector load management unit 61 sends a load buffer release notification to the memory access processing unit 30 through the signal line 107.

The load buffer 62 has storage areas for load data. There are a plurality of such storage areas, and each is assigned a load buffer number so that the storage area used by each vector load instruction can be specified. The position at which load data is stored within a storage area of the load buffer 62 is determined by the load buffer number and the element number.
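The addressing described above can be illustrated with a minimal Python sketch. It is not the patented circuit: the class name `LoadBuffer`, its method names, and the sizes used are assumptions made for illustration, and "lined up" is modelled simply as "every expected element has arrived".

```python
class LoadBuffer:
    """Toy model of the load buffer 62 (class and method names are hypothetical)."""

    def __init__(self, num_areas, max_elements):
        # One storage area per load buffer number, one slot per element number.
        self.data = [[0] * max_elements for _ in range(num_areas)]
        self.filled = [[False] * max_elements for _ in range(num_areas)]
        self.expected = [0] * num_areas  # element count given by the vector load instruction

    def start_load(self, buf_no, num_elements):
        """Record the element count of the vector load that uses this area."""
        self.filled[buf_no] = [False] * len(self.filled[buf_no])
        self.expected[buf_no] = num_elements

    def store_element(self, buf_no, elem_no, value):
        """The storage position is fixed by (load buffer number, element number)."""
        self.data[buf_no][elem_no] = value
        self.filled[buf_no][elem_no] = True

    def is_complete(self, buf_no):
        """True once every expected element has arrived, in any order."""
        n = self.expected[buf_no]
        return n > 0 and all(self.filled[buf_no][:n])

    def read_out(self, buf_no):
        """Return the elements in element-number order, as sent to the vector register."""
        if not self.is_complete(buf_no):
            raise RuntimeError("load data not yet complete")
        return self.data[buf_no][: self.expected[buf_no]]


lb = LoadBuffer(num_areas=2, max_elements=4)
lb.start_load(buf_no=1, num_elements=3)
for elem_no, value in [(2, 30), (0, 10), (1, 20)]:  # elements arrive out of order
    lb.store_element(1, elem_no, value)
print(lb.read_out(1))  # [10, 20, 30]
```

Because each element carries its own load buffer number and element number, the order in which elements arrive does not matter in this model; only the completeness check gates the transfer.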

The vector register 63 is a register that holds the data used in vector operations. There are a plurality of vector registers 63, and each is assigned a vector register number so that the vector register used by each vector instruction can be specified. The vector arithmetic units in the vector processing unit 60 are not shown.
(Main storage device 70)
The main storage device 70 is connected to the instruction control unit 20 through a signal line 101 and to the processor network unit 40 through a signal line 102. The main storage device 70 transfers vector data in accordance with the vector loads sent from the processor network unit 40.

For a load instruction, the main storage device 70 reads the vector data on the basis of the information that specifies the desired vector data and transfers it to the processor network unit 40. For a cache fill instruction, it likewise reads the vector data on the basis of the information that specifies the desired vector data and transfers it to the processor network unit 40.

Next, the operation of the vector arithmetic device 10 of the second embodiment will be described with reference to the drawings. FIG. 4 is a sequence diagram showing the operation of the vector arithmetic device 10 according to the second embodiment. In the figure, solid lines indicate the operation of the second embodiment, and broken lines indicate the operation of a related comparative example.

First, the instruction control unit 20 decodes an instruction read from the main storage device 70 and, if it is a vector load instruction, outputs it to the memory access processing unit 30. The memory access processing unit 30 determines whether the load buffer 62 of the vector processing unit 60 can be used. If the load buffer 62 cannot be used, the memory access processing unit 30 holds the notification of the vector load instruction and notifies the processor network unit 40 of a prefetch instruction in its place.

In response to the prefetch instruction, the cache unit 41 of the processor network unit 40 determines whether the vector data corresponding to the vector load instruction is present in the cache unit 41. If the vector data is present, the prefetch instruction completes. If it is not present, the cache unit 41 places the vector data corresponding to the vector load instruction from the main storage device 70 into the cache unit 41.

When space becomes available in the load buffer 62, the vector load management unit 61 of the vector processing unit 60 sends a load buffer release notification to the memory access processing unit 30, and the memory access processing unit 30 notifies the processor network unit 40 of the vector load instruction whose notification it had been holding.

On receiving the vector load instruction, the cache unit 41 of the processor network unit 40 determines whether the vector data corresponding to the vector load instruction is present in the cache unit 41. Because the prefetch instruction has already placed the corresponding vector data in the cache unit 41, the cache unit 41 transfers the vector data to the vector processing unit 60.

Next, the operation of the comparative example shown in FIG. 4 will be described. As indicated by the broken lines in FIG. 4, when the load buffer cannot be used, the memory access processing unit of the comparative example does not notify the processor network unit of a prefetch instruction; it simply holds the notification of the vector load instruction until the load buffer is released. After the load buffer is released, the memory access processing unit of the comparative example notifies the processor network unit of the vector load instruction. If the vector data corresponding to the vector load instruction is not present in the cache unit of the processor network unit at that point (a cache miss), the processor network unit must load the vector data from the main storage device. In that case, the execution time of the vector load instruction in the comparative example becomes longer.

As a result, as shown in FIG. 4, the vector processing device of the second embodiment can shorten the execution time of the vector load instruction compared with the comparative example.
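The difference between the two sequences can be made concrete with a small timing model in Python. This is only a sketch: the function `load_done_time`, the latency values, and the assumption that the data initially misses in the cache are all illustrative and do not come from the specification.

```python
def load_done_time(buffer_free_at, mem_latency, cache_latency, use_prefetch):
    """Time at which a held vector load delivers its data (hypothetical model).

    The vector load arrives at time 0 while no load buffer is free, and the
    requested data is assumed to be absent from the cache at that moment.
    """
    if use_prefetch:
        # Embodiment: the prefetch goes out at time 0, so the fill from main
        # storage overlaps with the wait for a free load buffer.
        fill_done_at = mem_latency
        data_ready_at = max(buffer_free_at, fill_done_at)
        return data_ready_at + cache_latency
    # Comparative example: nothing happens until the buffer is released;
    # the load then misses and pays the main storage latency in full.
    return buffer_free_at + mem_latency + cache_latency


# Illustrative numbers only (arbitrary time units).
print(load_done_time(120, 100, 10, use_prefetch=True))   # 130
print(load_done_time(120, 100, 10, use_prefetch=False))  # 230
```

In this model the saving is the part of the main storage latency that overlaps with the wait for the load buffer, so it is largest when the buffer stays busy for at least as long as the cache fill takes.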

Note that, after the memory access processing unit 30 has issued the prefetch instruction but before the vector data has been placed in the cache unit 41, the load buffer 62 may be released and the vector load instruction held in the memory access processing unit 30 may be notified. In this case, when the same address in the main storage device 70 is to be accessed, the data of the access to that address made for the prefetch instruction is used. This speeds up access to the main storage device 70 and shortens the execution time of the subsequent vector load instruction.
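One way to picture this case is a cache that tracks accesses still in flight and lets a later request for the same address attach to the pending access instead of starting a second one. The specification does not say how the reuse is implemented, so the Python sketch below is an assumption throughout, including the class name `CacheModel` and the "hit" / "merged" / "miss" labels.

```python
class CacheModel:
    """Hypothetical cache that merges requests onto in-flight memory accesses."""

    def __init__(self):
        self.lines = {}           # address -> data already held in the cache
        self.in_flight = {}       # address -> requesters waiting on a pending access
        self.memory_accesses = 0  # number of accesses sent to main storage

    def request(self, address, requester):
        if address in self.lines:
            return "hit"
        if address in self.in_flight:
            # A prefetch to the same address is still outstanding: reuse it.
            self.in_flight[address].append(requester)
            return "merged"
        self.in_flight[address] = [requester]
        self.memory_accesses += 1
        return "miss"

    def fill(self, address, data):
        """Main storage returned the data; report who was waiting for it."""
        self.lines[address] = data
        return self.in_flight.pop(address, [])


cache = CacheModel()
print(cache.request(0x1000, "prefetch"))     # miss
print(cache.request(0x1000, "vector_load"))  # merged
print(cache.memory_accesses)                 # 1
print(cache.fill(0x1000, [1.0, 2.0]))        # ['prefetch', 'vector_load']
```

In this model the vector load does not hit the cache, but it still avoids a second access to main storage and completes as soon as the prefetch's data arrives.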

As described above, in the vector processing device 100 according to the second embodiment, when the memory access processing unit 30 accepts a vector load instruction and the load buffer 62 cannot be used, it holds the notification of the vector load instruction. The memory access processing unit 30 then generates a prefetch instruction corresponding to the vector load instruction and issues it. Further, on receiving a load buffer release notification, the memory access processing unit 30 issues the vector load instruction once a usable load buffer 62 has been secured.
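The hold, prefetch, and reissue behaviour summarized above can be sketched as a small state machine in Python. The class `MemoryAccessUnit`, its method names, and the tuple message format are hypothetical, and the free list stands in for the flag-based load buffer management of the embodiment.

```python
from collections import deque


class MemoryAccessUnit:
    """Hypothetical model of the memory access processing unit 30."""

    def __init__(self, num_buffers):
        self.free_buffers = deque(range(num_buffers))
        self.held = deque()  # vector loads whose notification is being held
        self.sent = []       # messages notified to the processor network unit

    def on_vector_load(self, address, num_elements):
        if self.free_buffers:
            buf_no = self.free_buffers.popleft()
            self.sent.append(("vector_load", address, num_elements, buf_no))
        else:
            # No usable load buffer: hold the load, send a prefetch for the same data.
            self.held.append((address, num_elements))
            self.sent.append(("prefetch", address, num_elements))

    def on_buffer_release(self, buf_no):
        if self.held:
            # Reissue the oldest held load using the buffer that was just released.
            address, num_elements = self.held.popleft()
            self.sent.append(("vector_load", address, num_elements, buf_no))
        else:
            self.free_buffers.append(buf_no)


mau = MemoryAccessUnit(num_buffers=1)
mau.on_vector_load(0x1000, 256)  # takes load buffer 0
mau.on_vector_load(0x2000, 256)  # no buffer free: held, prefetch sent instead
mau.on_buffer_release(0)         # held load is now issued with buffer 0
for message in mau.sent:
    print(message)
```

The prefetch is derived from the held load itself, so its address is exact rather than predicted.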

After accepting a prefetch instruction, the cache unit 41 determines whether there is a cache hit or a cache miss; on a cache miss, it accesses the main storage device 70 and holds the vector data in the cache unit 41. After accepting a vector load instruction, the cache unit 41 likewise determines whether there is a cache hit or a cache miss; on a cache hit, it sends the vector data from the cache to the load buffer 62. On a cache miss, the cache unit 41 accesses the main storage device 70, transfers the data, and sends it to the load buffer 62.

Thus, for a vector load instruction that was held because the load buffer 62 could not be used, if the prefetch instruction has managed to place the vector data in the cache unit 41 by the time the load buffer 62 becomes usable, the held vector load instruction hits the cache. The processor network unit 40 can therefore transfer the corresponding vector data to the vector processing unit 60 promptly. That is, the execution time of the vector load instruction can be shortened.

If, after the prefetch instruction, the held vector load instruction is notified before the vector data has been placed in the cache unit 41, the cache unit 41 may not register a cache hit. In that case, accessing the main storage device 70 by using the access data of the prefetch instruction speeds up the access to the main storage device 70 and shortens the execution time of the subsequent vector load instruction.

Furthermore, like the first embodiment, the second embodiment uses a prefetch instruction that corresponds to the held vector load instruction. It therefore needs neither the history holding mechanism nor the address prediction mechanism described in the prior art documents. In addition, because the prefetch is not based on prediction, there is no risk of a misprediction causing unnecessary data to be cached.

<Third Embodiment>
A vector arithmetic device and a vector processing device according to a third embodiment of the present invention will be described with reference to the drawings. FIG. 5 is a block diagram showing the configuration of the vector processing device according to the third embodiment. The vector processing device 100A according to the third embodiment differs from the vector processing device 100 according to the second embodiment in where the load buffer management unit is placed. That is, the vector processing device 100 according to the second embodiment includes the load buffer management unit 31 in the memory access processing unit 30, whereas the vector processing device 100A according to the third embodiment includes a load buffer management unit 64 in the vector processing unit 60A. In the description of the vector processing device 100A according to the third embodiment, components that are the same as those of the vector processing device 100 according to the second embodiment are given the same reference numerals, and their detailed description is omitted.

The vector arithmetic device 10A includes an instruction control unit 20, a memory access processing unit 30A, a processor network unit 40, a vector control unit 50, and a vector processing unit 60A.

The processor network unit 40 includes a cache unit 41 that holds vector data from the main storage device.

The vector processing unit 60A includes a vector load management unit 61, a load buffer 62, a vector register 63, and a load buffer management unit 64.

The memory access processing unit 30A controls access between the vector processing unit 60A and the main storage device 70 in accordance with instructions sent from the instruction control unit 20. For a vector load instruction, it sends a load buffer number request to the vector processing unit 60A through the signal line 107 and, after receiving a load buffer number from the vector processing unit 60A, notifies the processor network unit 40 of the vector load instruction. Along with this, it notifies the information that specifies the desired vector data in the main storage device 70, the number of elements, and the load buffer number. It also notifies the vector control unit 50 of the load buffer number.

If, after sending a load buffer number request to the vector processing unit 60A, the memory access processing unit 30A of the third embodiment receives a prefetch indication (described later) from the vector processing unit 60A, it holds the notification of the vector load instruction. The memory access processing unit 30A then notifies the processor network unit 40 of a prefetch instruction, together with the information that specifies the vector data in the main storage device 70 targeted by the vector load instruction and the number of elements.

The vector processing unit 60A includes the vector load management unit 61, the load buffer 62, the vector register 63, and the load buffer management unit 64, and searches the vector load management unit 61 in response to a load buffer request sent from the memory access processing unit 30A.

If the load buffer has a usable storage area, the vector processing unit 60A notifies the memory access processing unit 30A of the load buffer number through the signal line 107. If the load buffer has no usable storage area, the vector processing unit 60A sends a prefetch indication to the memory access processing unit 30A through the signal line 107. Once a load buffer has been secured, the vector processing unit 60A notifies the memory access processing unit 30A of the load buffer number.

The vector load management unit 61 manages whether the load data has been lined up in the load buffer 62, on the basis of the vector load instruction sent from the vector control unit 50, which includes the load buffer number, the number of elements, and the transfer destination vector register number, and the load data sent from the processor network unit 40, which carries a load buffer number and an element number.

The vector load management unit 61 also sends the load data received from the processor network unit 40 to the load buffer 62 and manages whether the load data is stored in the load buffer 62. Once the load data is lined up and can be transferred to the vector register 63, the vector load management unit 61 reads the load data from the load buffer 62 and transfers it to the vector register 63.

After reading the load data from the load buffer 62 for transfer to the vector register 63, the vector load management unit 61 sends a load buffer release notification to the load buffer management unit 64.

The load buffer management unit 64 has one flag for each load buffer number, in one-to-one correspondence. The load buffer management unit 64 sets to 1 the flag corresponding to the load buffer number used by a vector load instruction, and sets to 0 the flag corresponding to the load buffer number released by a load buffer release notification sent from the vector load management unit 61.

Accordingly, by searching the flags in the load buffer management unit 64 of the vector processing unit 60A, the memory access processing unit 30A determines whether a usable load buffer exists and identifies the load buffer number to use.
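The flag scheme described above amounts to a simple in-use table. The following Python sketch shows one possible reading; the class name `LoadBufferManager`, the lowest-number-first search order, and the use of `None` to stand for "no usable load buffer" (the case in which a prefetch indication would be sent) are assumptions.

```python
class LoadBufferManager:
    """Hypothetical model of the load buffer management unit 64."""

    def __init__(self, num_buffers):
        # One flag per load buffer number: 1 = in use, 0 = free.
        self.flags = [0] * num_buffers

    def allocate(self):
        """Search the flags; return a free load buffer number, or None if all are in use."""
        for buf_no, flag in enumerate(self.flags):
            if flag == 0:
                self.flags[buf_no] = 1
                return buf_no
        return None

    def release(self, buf_no):
        """Handle a load buffer release notification."""
        self.flags[buf_no] = 0


mgr = LoadBufferManager(num_buffers=2)
print(mgr.allocate())  # 0
print(mgr.allocate())  # 1
print(mgr.allocate())  # None
mgr.release(0)
print(mgr.allocate())  # 0
```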

As described above, the vector processing device 100A according to the third embodiment can shorten the execution time of the vector load instruction, as in the second embodiment.

That is, when the memory access processing unit 30A accepts a vector load instruction and the load buffer 62 cannot be used, it holds the vector load instruction. The memory access processing unit 30A then generates a prefetch instruction corresponding to the vector load instruction and issues it. Further, on receiving a load buffer release notification, the memory access processing unit 30A issues the vector load instruction once a usable load buffer has been secured.

After accepting a prefetch instruction, the cache unit 41 determines whether there is a cache hit or a cache miss; on a cache miss, it accesses the main storage device 70 and holds the vector data in the cache unit 41. After accepting a vector load instruction, the cache unit 41 likewise determines whether there is a cache hit or a cache miss; on a cache hit, it sends the data from the cache to the load buffer 62, and on a miss, it accesses the main storage device 70, transfers the data, and sends it to the load buffer 62.

Thus, a vector load instruction that was held because the load buffer 62 could not be used hits the cache once the load buffer 62 becomes usable, provided that the prefetch instruction has placed the vector data in the cache unit 41. The processor network unit 40 can therefore transfer the corresponding vector data to the vector processing unit 60A promptly.

If, after the prefetch instruction, the held vector load instruction is notified before the vector data has been placed in the cache unit 41, the cache unit 41 may not register a cache hit. In that case, accessing the main storage device 70 by using the access data of the prefetch instruction speeds up the access to the main storage device 70 and shortens the execution time of the subsequent vector load instruction. Furthermore, like the first embodiment, the third embodiment uses a prefetch instruction that corresponds to the held vector load instruction. It therefore needs neither the history holding mechanism nor the address prediction mechanism described in the prior art documents. In addition, because the prefetch is not based on prediction, there is no risk of a misprediction causing unnecessary data to be cached.

(Other)
Each function of the present invention can be realized in hardware by mounting circuit components, that is, LSI hardware components in which a program is embedded. The functions can also be realized in software by storing a program that provides them in a storage device (not shown), loading the program into the main storage unit 100, and executing it in the instruction control unit.
The means for realizing each unit of the vector processing device 100 or of the vector arithmetic devices 1 and 10 is not particularly limited. That is, the units may be realized by a single physically integrated device, or by two or more physically separate devices connected by wire or wirelessly.

The present invention has been described above with reference to the embodiments (and examples), but the present invention is not limited to those embodiments (and examples). Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

Some or all of the above embodiments can also be described as in the following appendices, but are not limited to them.

(Appendix 1)
A vector arithmetic device comprising:
a vector processing unit having a load buffer;
a cache unit that temporarily holds vector data; and
a memory access processing unit that notifies the cache unit of a vector load instruction,
wherein, when the load buffer cannot be used, the memory access processing unit holds the notification of the vector load instruction, and generates and notifies a prefetch instruction corresponding to the held vector load instruction, and
the cache unit reads vector data from a main storage device in response to the prefetch instruction and places the vector data in the cache unit.

(Appendix 2)
The vector arithmetic device according to appendix 1, wherein, after the load buffer is released, the memory access processing unit notifies the cache unit of the held vector load instruction.

(Appendix 3)
The vector arithmetic device according to appendix 1 or appendix 2, wherein the cache unit determines, in response to the prefetch instruction, whether corresponding vector data is present and, if the corresponding vector data is not present, reads it from the main storage device and places it in the cache unit.

(Appendix 4)
The vector arithmetic device according to appendix 3, wherein the cache unit determines, in response to the prefetch instruction, whether corresponding vector data is present and, if the corresponding vector data is present, completes the prefetch instruction.

(Appendix 5)
The vector arithmetic device according to any one of appendices 1 to 4, further comprising a load buffer management unit that manages the use state of the load buffer by flag information, wherein the memory access processing unit or the vector processing unit includes the load buffer management unit.

(Appendix 6)
The vector arithmetic device according to appendix 5, wherein, after accepting the vector load instruction, the memory access processing unit searches the load buffer management unit to determine whether the load buffer can be used.

(Appendix 7)
The vector arithmetic device according to appendix 5 or appendix 6, wherein
the vector processing unit includes a vector load management unit that manages whether load data is stored in the load buffer, and
the vector load management unit notifies the load buffer management unit of the release of the load buffer after the load buffer is released.

(Appendix 8)
A vector processing device comprising a main storage device and the vector arithmetic device according to any one of appendices 1 to 7.

(Appendix 9)
A control method for a vector arithmetic device that includes a vector processing unit having a load buffer and a cache unit that temporarily holds vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction, the method comprising:
when the load buffer cannot be used, holding the notification of the vector load instruction, and generating and notifying a prefetch instruction corresponding to the held vector load instruction; and
reading vector data from the main storage device in response to the prefetch instruction and placing the vector data in the cache unit.

(Appendix 10)
The control method for a vector arithmetic device according to appendix 9, wherein, after the load buffer is released, the held vector load instruction is notified to the cache unit.

(Appendix 11)
The control method for a vector arithmetic device according to appendix 9 or appendix 10, wherein it is determined, in response to the prefetch instruction, whether corresponding vector data is present and, if the corresponding vector data is not present, the vector data is read from the main storage device and placed in the cache unit.

(Appendix 12)
The control method for a vector arithmetic device according to appendix 11, wherein it is determined, in response to the prefetch instruction, whether corresponding vector data is present and, if the corresponding vector data is present, the prefetch instruction is completed.

（付記１３）
ロードバッファを有するベクトル処理部と、ベクトルデータを一時的に保持するキャッシュ部を備え、ベクトルロード命令に応じて、主記憶装置から前記ベクトル処理部に前記ベクトルデータを転送する、ベクトル演算装置の制御プログラムであって、
前記ベクトル演算装置に、
前記ロードバッファが使用できない場合、前記ベクトルロード命令の通知を保留し、前記保留されたベクトルロード命令に対応するプリフェッチ命令を生成して通知し、
前記プリフェッチ命令に応じてベクトルデータを前記主記憶装置から読み出して前記キャッシュ部に配置する、ことを実行させる、
ベクトル演算装置の制御プログラム。
(Appendix 13)
A control program for a vector arithmetic device that comprises a vector processing unit having a load buffer and a cache unit for temporarily holding vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction, the program causing the vector arithmetic device to execute:
when the load buffer is unavailable, suspending notification of the vector load instruction, and generating and notifying a prefetch instruction corresponding to the suspended vector load instruction; and
in response to the prefetch instruction, reading vector data from the main storage device and placing it in the cache unit.

（付記１４）
前記ロードバッファの解放後、前記保留されたベクトルロード命令を前記キャッシュ部へ通知する、付記１３に記載のベクトル演算装置の制御プログラム。
(Appendix 14)
The control program for a vector arithmetic device according to appendix 13, wherein, after the load buffer is released, the suspended vector load instruction is notified to the cache unit.

（付記１５）
前記プリフェッチ命令に応じて対応するベクトルデータが存在するか否か判定し、前記対応するベクトルデータが存在しない場合、前記主記憶装置から読み出して前記キャッシュ部に配置する、付記１３又は付記１４に記載のベクトル演算装置の制御プログラム。
(Appendix 15)
The control program for a vector arithmetic device according to appendix 13 or 14, wherein, in response to the prefetch instruction, it is determined whether the corresponding vector data exists, and when the corresponding vector data does not exist, the vector data is read from the main storage device and placed in the cache unit.

（付記１６）
前記プリフェッチ命令に応じて対応するベクトルデータが存在するか否か判定し、前記対応するベクトルデータが存在する場合、前記プリフェッチ命令を完了する、付記１５に記載のベクトル演算装置の制御プログラム。
(Appendix 16)
The control program for a vector arithmetic device according to appendix 15, wherein, in response to the prefetch instruction, it is determined whether the corresponding vector data exists, and the prefetch instruction is completed when the corresponding vector data exists.

（付記１７）
ロードデータを格納するロードバッファを有し、ベクトルデータに対するベクトル演算を実行するベクトル処理部と、
主記憶装置からの前記ベクトルデータを一時的に保持するキャッシュ部を有し、ベクトルロード命令に応じて、前記主記憶装置から前記ベクトル処理部に前記ベクトルデータを転送するプロセッサネットワーク部と、
前記ベクトルロード命令を前記プロセッサネットワーク部へ通知するメモリアクセス処理部と、を備え、
前記メモリアクセス処理部は、前記ロードバッファが使用できない場合、前記ベクトルロード命令の通知を保留し、前記保留したベクトルロード命令に対応するプリフェッチ命令を前記プロセッサネットワーク部に通知し、
前記プロセッサネットワーク部は、前記プリフェッチ命令に応じて対応するベクトルデータを、前記主記憶装置から読み出して前記キャッシュ部に配置する、ベクトル演算装置。
(Appendix 17)
A vector arithmetic device comprising:
a vector processing unit that has a load buffer for storing load data and executes vector operations on vector data;
a processor network unit that has a cache unit for temporarily holding the vector data from a main storage device and transfers the vector data from the main storage device to the vector processing unit in response to a vector load instruction; and
a memory access processing unit that notifies the processor network unit of the vector load instruction,
wherein the memory access processing unit, when the load buffer is unavailable, suspends notification of the vector load instruction and notifies the processor network unit of a prefetch instruction corresponding to the suspended vector load instruction, and
the processor network unit, in response to the prefetch instruction, reads the corresponding vector data from the main storage device and places it in the cache unit.

１ベクトル演算装置
２メモリアクセス処理部
３プロセッサネットワーク部
４キャッシュ部
５ベクトル処理部
６ロードバッファ
１０、１０Ａベクトル演算装置
２０命令制御部
３０、３０Ａメモリアクセス処理部
４０プロセッサネットワーク部
４１キャッシュ部
５０ベクトル制御部
６０ベクトル処理部
６１ベクトルロード管理部
６２ロードバッファ
６３ベクトルレジスタ
６４ロードバッファ管理部
１００、１００Ａベクトル処理装置
１０１信号線
１０２信号線
１０３信号線
１０４信号線
１０５信号線
１０６信号線
１０７信号線
１０８信号線
１０９信号線
DESCRIPTION OF SYMBOLS
1 Vector arithmetic device
2 Memory access processing unit
3 Processor network unit
4 Cache unit
5 Vector processing unit
6 Load buffer
10, 10A Vector arithmetic device
20 Instruction control unit
30, 30A Memory access processing unit
40 Processor network unit
41 Cache unit
50 Vector control unit
60 Vector processing unit
61 Vector load management unit
62 Load buffer
63 Vector register
64 Load buffer management unit
100, 100A Vector processing device
101 Signal line
102 Signal line
103 Signal line
104 Signal line
105 Signal line
106 Signal line
107 Signal line
108 Signal line
109 Signal line

Claims

1. A vector arithmetic device comprising:
a vector processing unit having a load buffer;
a cache unit that temporarily holds vector data; and
a memory access processing unit that notifies the cache unit of a vector load instruction,
wherein the memory access processing unit, when the load buffer is unavailable, suspends the notification of the vector load instruction, and generates and notifies a prefetch instruction corresponding to the suspended vector load instruction, and
the cache unit, in response to the prefetch instruction, reads vector data from a main storage device and places the vector data in the cache unit.

2. The vector arithmetic device according to claim 1, wherein the memory access processing unit notifies the cache unit of the suspended vector load instruction after the load buffer is released.

3. The vector arithmetic device according to claim 1 or 2, wherein the cache unit determines, in response to the prefetch instruction, whether the corresponding vector data exists and, when the corresponding vector data does not exist, reads the vector data from the main storage device and places it in the cache unit.

4. The vector arithmetic device according to claim 3, wherein the cache unit determines, in response to the prefetch instruction, whether the corresponding vector data exists and completes the prefetch instruction when the corresponding vector data exists.

5. The vector arithmetic device according to item 1, further comprising a load buffer management unit that manages the use state of the load buffer by flag information, wherein the memory access processing unit or the vector processing unit includes the load buffer management unit.

6. The vector arithmetic device according to claim 5, wherein, after receiving the vector load instruction, the memory access processing unit searches the load buffer management unit to determine whether the load buffer can be used.

7. The vector arithmetic device according to claim 5, wherein the vector processing unit includes a vector load management unit that manages whether load data is stored in the load buffer, and
the vector load management unit notifies the load buffer management unit of the release of the load buffer after the load buffer is released.
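For illustration only (not part of the claims), the flag-based management recited in the three claims above can be sketched in Python. The class name `LoadBufferManager`, the one-flag-per-entry layout, and the method names are hypothetical; the claims state only that the use state is managed by flag information, that it is searched on receipt of a vector load instruction, and that a release is notified.

```python
class LoadBufferManager:
    """Toy model: one in-use flag per load buffer entry."""

    def __init__(self, num_entries):
        self.in_use = [False] * num_entries  # flag information, one flag per entry (assumed layout)

    def find_free(self):
        # Search the flags; return the index of a usable entry, or None if all are in use.
        for index, flag in enumerate(self.in_use):
            if not flag:
                return index
        return None

    def acquire(self):
        # Mark the first usable entry as in use and return its index (None if unavailable).
        index = self.find_free()
        if index is not None:
            self.in_use[index] = True
        return index

    def notify_release(self, index):
        # Called when the vector load management unit reports that the entry was released.
        self.in_use[index] = False
```

In this model, a `None` result from `acquire` corresponds to the case where the load buffer cannot be used, and a later `notify_release` call makes that entry usable again.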

8. A vector processing device comprising a main storage device and the vector arithmetic device according to any one of claims 1 to 7.

9. A control method for a vector arithmetic device that comprises a vector processing unit having a load buffer and a cache unit for temporarily holding vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction, the method comprising:
when the load buffer is unavailable, suspending notification of the vector load instruction, and generating and notifying a prefetch instruction corresponding to the suspended vector load instruction; and
in response to the prefetch instruction, reading vector data from the main storage device and placing it in the cache unit.

10. A control program for a vector arithmetic device that comprises a vector processing unit having a load buffer and a cache unit for temporarily holding vector data, and that transfers the vector data from a main storage device to the vector processing unit in response to a vector load instruction, the program causing the vector arithmetic device to execute:
when the load buffer is unavailable, suspending notification of the vector load instruction, and generating and notifying a prefetch instruction corresponding to the suspended vector load instruction; and
in response to the prefetch instruction, reading vector data from the main storage device and placing it in the cache unit.