JPS60136874A

JPS60136874A - Vector processor

Info

Publication number: JPS60136874A
Application number: JP24394583A
Authority: JP
Inventors: Yasuhiko Hatakeyama; 畠山　靖彦
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-12-26
Filing date: 1983-12-26
Publication date: 1985-07-20
Also published as: JPH0414384B2

Abstract

PURPOSE: To execute vector fetch/store instructions at high speed by combining requests for a plurality of vector elements processed by a data transfer circuit into a single request and sending it to a storage controller. CONSTITUTION: The output of a pre-request generation circuit 27-1 is sent to a request and additional information generation circuit 27-4, and the generated request and fetch request additional information are sent to paths 15-1 and 15-2, respectively. In synchronization with these, the request address generated by a request address calculation circuit 27-3 is sent to a path 15-3. A VR write control circuit 27-14 receives the fetch request additional information and, when the request combines two 4-byte fetches, generates the advance signal twice and sends it to a path 18-1. The control circuit 27-14 then sends the mask values corresponding to the two advance signals to a path 18-2.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of application of the invention]

The present invention relates to a vector processor, and in particular to a vector processor suited to high-speed data transfer between a main storage (hereinafter MS) and a vector register unit (hereinafter VRU) containing a plurality of vector registers (hereinafter VR).

[Background of the invention]

FIG. 1 shows an example configuration of a vector processor. This example is used in common to describe both the prior art and the embodiment of the present invention.

In FIG. 1, MS 1 consists of eight independently operable storage units (hereinafter called ports), and each port in turn consists of four independently operable storage units (hereinafter called banks). The ports are connected to a storage controller (hereinafter SC) 4 by eight 8-byte-wide MS read/write paths 7 to 14. When an 8-byte vector fetch instruction is started, a data transfer circuit (hereinafter DTC) 3 sequentially calculates the MS address of each element of the vector data located in MS 1 and sends requests to SC 4 over one of the two fetch paths 15 and 16, for example path 15. According to its address, SC 4 forwards each received request to the corresponding port of MS 1 (say, port 0) over the MS read/write path for that port (say, MS read/write path 7 for port 0). Port 0 accesses 8 bytes of data at the in-port address sent from SC 4 and, after the bank busy time for an 8-byte fetch (say, 7 cycles), transfers the fetch data to SC 4 over MS read/write path 7. SC 4 transfers the received fetch data to DTC 3 over fetch path 15.

DTC 3 passes the received 8-byte data unchanged, in order, to VRU 2 over vector register (VR) write path 18. VRU 2 writes the received 8-byte data into the appropriate element of the VR whose number was specified when the instruction was started, which completes the fetch request processing for one element. Repeating this processing once per vector element (the number of elements is also called the vector length) completes the execution of one 8-byte vector fetch instruction.

An 8-byte vector fetch instruction can be executed in the same way using fetch path 16 and VR write path 19.

When an 8-byte vector store instruction is started, VRU 2 sends 8-byte data to DTC 3 over VR read path 20, in element-number order, from the VR whose number was specified when the instruction was started. DTC 3 sequentially calculates the MS addresses at which the vector data arriving from VRU 2 are to be stored and forwards each address, together with the store data received from VRU 2, to SC 4 over store path 17 as an 8-byte store request. As in fetch request processing, SC 4 forwards each received request, according to its address, to the corresponding port of MS 1 over the MS read/write path for that port (say, MS read/write path 7 for port 0). Port 0 writes the 8-byte data at the in-port address sent from SC 4, which completes the store request processing for one element. Repeating this processing once per vector element completes the execution of one 8-byte vector store instruction.

In DTC 3, addresses are calculated in units of 1 byte, and the request address sent to the SC is generated by discarding the lower 3 bits. For 4-byte vector fetch/store instructions, the most significant of the 3 discarded bits is attached to the request as additional information that specifies whether the request refers to the upper or the lower 4 bytes. For a 4-byte vector fetch instruction, DTC 3 uses this additional information to cut the upper or lower 4 bytes out of the 8-byte fetch data transferred from SC 4 and sends them to VRU 2. The bank busy time is therefore 7 cycles, the same as for an 8-byte vector fetch instruction.
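The address handling described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a half bit of 0 is assumed to select the upper 4 bytes, and the "upper" 4 bytes are assumed to be the first four bytes of the 8-byte word. The patent text does not fix either convention.

```python
from __future__ import annotations


def split_byte_address(byte_addr: int) -> tuple[int, int]:
    """Return (request_address, half_bit) for a byte address.

    The lower 3 bits are discarded to form the 8-byte request address.
    The most significant discarded bit (bit 2) is kept as the half bit.
    """
    request_address = byte_addr >> 3
    half_bit = (byte_addr >> 2) & 1
    return request_address, half_bit


def select_half(data8: bytes, half_bit: int) -> bytes:
    """Cut 4 bytes out of 8-byte fetch data.

    Assumption: half_bit 0 selects the upper (first) 4 bytes,
    half_bit 1 selects the lower (last) 4 bytes.
    """
    if len(data8) != 8:
        raise ValueError("fetch data must be 8 bytes")
    return data8[:4] if half_bit == 0 else data8[4:]
```

Because both halves share one request address, the MS always performs a full 8-byte access, which is why the bank busy time does not depend on the element size for fetches.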

For a 4-byte vector store instruction, the ECC code, which is provided per 8 bytes, must be regenerated. The 8-byte data at the request address is therefore read first; half of it, as selected by the additional information described above, is replaced with the 4-byte store data sent from VRU 2 via VR read path 20, DTC 3, store path 17, SC 4 and MS read/write path 7; and the resulting 8-byte data is written to the MS 1 area specified by the request address. Consequently, whereas the bank busy time for an 8-byte vector store instruction is 7 cycles, the bank busy time for a 4-byte vector store instruction is roughly twice as long, for example 15 cycles.
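The read, replace, write sequence for a 4-byte store can be sketched as below. This is a minimal model, assuming memory is a dictionary keyed by 8-byte request address and that a half bit of 0 denotes the upper (first) 4 bytes; `store_4_bytes` is a hypothetical name, and the ECC generation itself is not modeled.

```python
def store_4_bytes(memory: dict, request_address: int, half_bit: int, data4: bytes) -> None:
    """Model a 4-byte store as a read-modify-write of one 8-byte word."""
    if len(data4) != 4:
        raise ValueError("store data must be 4 bytes")
    # Step 1: read the existing 8-byte word (zeros if never written).
    old = memory.get(request_address, bytes(8))
    # Step 2: replace the half selected by the additional information.
    new = data4 + old[4:] if half_bit == 0 else old[:4] + data4
    # Step 3: write the full 8 bytes back; real hardware would compute
    # the ECC code over these 8 bytes at this point.
    memory[request_address] = new
```

The extra read in step 1 is what roughly doubles the bank busy time relative to an 8-byte store.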

In the conventional vector processor, when a masked 8-byte vector store instruction is executed, DTC 3 attaches the mask value to each request as additional information so that the MS store is suppressed for elements whose mask value is '0'.

MS 1 examines this additional information and suppresses the store operation. With this method, SC 4 still manages the bank as busy for 7 cycles on behalf of a store operation that is never performed.

Next, consider the case where the conventional vector processor executes a 4-byte vector fetch/store instruction with contiguous addresses. The execution of 4-byte vector fetch/store instructions in general has already been described. DTC 3 calculates the request addresses, and in most cases each address is obtained by successively adding the inter-element spacing (hereinafter VLR) to the start address of the vector (hereinafter VAR). When VLR is ±4, the elements of the vector are laid out contiguously in the MS.

Such a case is called a 4-byte vector fetch/store instruction with contiguous addresses.
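As a small illustration of the address sequence just described, the sketch below generates element addresses from VAR and VLR. The function names are hypothetical and the code only restates the arithmetic in the text.

```python
def element_addresses(var: int, vlr: int, count: int) -> list:
    """Byte address of each element: VAR, VAR + VLR, VAR + 2*VLR, ..."""
    return [var + i * vlr for i in range(count)]


def is_address_contiguous_4byte(vlr: int) -> bool:
    """A 4-byte vector is address-contiguous when VLR is +4 or -4."""
    return vlr in (4, -4)
```

With VLR = 4, consecutive elements alternate between the two halves of the same 8-byte word, which is the situation the following paragraphs analyze.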

For example, consider executing an instruction L A that fetches a 4-byte vector A = (A(0), A(1), ..., A(n)) for which the lower 8 bits of VAR are 0 and VLR = 4. A(0) and A(1) are stored in the first and second halves of the same 8 bytes in bank 0 of port 0 of MS 1, so they conflict for the bank, and the fetch request for A(1) is sent from SC 4 to MS 1 seven cycles after the fetch request for A(0). The fetch request for A(2) is sent in the cycle following the fetch request for A(1), and from then on a bank conflict occurs once every two elements. Even ignoring bank conflicts with other fetch/store instructions executing at the same time, the request throughput of this instruction averages 2 requests per 8 cycles, which is about 1/4 of the throughput of a 4-byte vector fetch instruction with non-contiguous addresses or of an 8-byte vector fetch instruction. This behavior is shown in the time chart of FIG. 3.

Similarly, when an instruction ST C is executed that stores to a 4-byte vector C = (C(0), C(1), ..., C(n)) for which the lower 8 bits of VAR are 0 and VLR = 4, request processing proceeds at 2 requests per 16 cycles, as shown in FIG. 4, and the throughput is about 1/8.
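The throughput figures for the contiguous fetch and store cases can be reproduced with a small timing model. The sketch below is an illustration only: it assumes at most one request is issued per cycle, that the 32 banks (8 ports of 4 banks) are interleaved on consecutive 8-byte words, and that a request to a busy bank simply waits. The interleaving scheme and the name `issue_cycles` are assumptions, not taken from the patent.

```python
def issue_cycles(element_addrs, busy_cycles, n_banks=32):
    """Return the cycle in which each element request is sent to the MS."""
    bank_free_at = {}   # bank number -> first cycle the bank is free again
    cycles = []
    next_cycle = 0      # earliest cycle the next request may be issued
    for addr in element_addrs:
        bank = (addr >> 3) % n_banks            # assumed interleaving
        cycle = max(next_cycle, bank_free_at.get(bank, 0))
        cycles.append(cycle)
        bank_free_at[bank] = cycle + busy_cycles
        next_cycle = cycle + 1
    return cycles


# Contiguous 4-byte fetch (bank busy 7 cycles): two requests every 8 cycles.
print(issue_cycles([0, 4, 8, 12], busy_cycles=7))    # [0, 7, 8, 15]
# Contiguous 4-byte store (bank busy 15 cycles): two requests every 16 cycles.
print(issue_cycles([0, 4, 8, 12], busy_cycles=15))   # [0, 15, 16, 31]
```

Under the same assumptions, 8-byte elements at stride 8 issue in consecutive cycles, which gives the factor of about 4 (fetch) and 8 (store) quoted in the text.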

FIG. 2 is a time chart for the case where two vector load instructions, loading 8-byte vectors A = (A(0), A(1), ..., A(n)) and B = (B(0), B(1), ..., B(n)) respectively, and one vector store instruction, storing an 8-byte vector C = (C(0), C(1), ..., C(n)), are executed simultaneously and the maximum throughput is achieved.

[Purpose of the invention]

An object of the present invention is to provide a vector processor that avoids memory bank conflicts in the main storage by processing a plurality of requests together, and can thereby execute vector fetch/store instructions at high speed.

[Summary of the invention]

The present invention is a vector processor having a main storage; a plurality of vector registers for storing vector data; a data transfer circuit that controls data transfer between the main storage and the vector registers; and a storage controller that forwards main storage access requests from the data transfer circuit to the main storage, transfers fetch data read from the main storage to the data transfer circuit for a fetch request, and transfers store data received from the data transfer circuit to the main storage for a store request. The invention is characterized in that requests for a plurality of vector elements processed by the data transfer circuit are combined into one request and sent to the storage controller; the storage controller and the main storage process that request as a single request; for a fetch request, the data transfer circuit divides the fetch data into the data for the individual original requests and transfers them sequentially to the vector register; and for a store request, the data transfer circuit attaches the plurality of store data transferred from the vector register to the one request and sends it to the storage controller, and the data are written to the main storage at one time.
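The combining step can be pictured with the following sketch, which groups consecutive 4-byte elements that fall in the same 8-byte word into one request. It is a simplified illustration: `coalesce` is a hypothetical name, and the rule that an element standing alone in its 8-byte word is sent as a single request is an assumption about edge cases the summary does not spell out.

```python
def coalesce(element_addrs):
    """Group consecutive elements sharing an 8-byte word.

    Returns a list of (request_address, [element indices]) entries,
    one entry per request actually sent to the storage controller.
    """
    requests = []
    for index, addr in enumerate(element_addrs):
        word = addr >> 3
        # Merge with the previous request only if it targets the same
        # 8-byte word and does not yet carry two elements.
        if requests and requests[-1][0] == word and len(requests[-1][1]) < 2:
            requests[-1][1].append(index)
        else:
            requests.append((word, [index]))
    return requests
```

For an aligned contiguous vector this halves the number of requests, and no two requests of the same instruction target the same bank back to back.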

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawings.

As noted in the description of the prior art, DTC 3 can process two vector fetch instructions and one vector store instruction simultaneously. FIG. 5 shows the internal structure of DTC 3 and the details of its paths to SC 4 and VRU 2. As shown in FIG. 5, DTC 3 consists of two fetch-only data transfer circuits (hereinafter FDTC) 27 and 28 and a store-only data transfer circuit (hereinafter SDTC) 29. Fetch path 15 between FDTC 27 and SC 4 consists of request path 15-1, fetch request additional information path 15-2 and fetch address path 15-3 from FDTC 27 to SC 4, and advance path 15-4, fetch request additional information path 15-5 and fetch data path 15-6 from SC 4 to FDTC 27. VR write path 18 from FDTC 27 to VRU 2 consists of advance path 18-1, write enable path 18-2 and data path 18-3. The same applies to FDTC 28, fetch path 16 and VR write path 19. Store path 17 between SDTC 29 and SC 4 consists of request path 17-1, store request additional information path 17-2, store address path 17-3 and store data path 17-4, and VR read path 20 between SDTC 29 and VRU 2 consists of valid path 20-1 and store data path 20-2.

FIG. 7 shows the internal structure of FDTC 27, and the operation of FDTC 27 is described below with reference to it.

The output of pre-request generation circuit 27-1 is sent to request and additional information generation circuit 27-4 and, at the same time, is input to mask read circuit 27-2 and request address calculation circuit 27-3.

Request and additional information generation circuit 27-4 uses mask value 27-9 and in-8-byte address 27-10 to generate the request and the fetch request additional information shown in FIG. 6 (1). The request is sent to request path 15-1 via F.F. 27-5. The fetch request additional information consists of the bits H, M, V0, V1, C0, C1 and I. H = 0 and H = 1 indicate an 8-byte and a 4-byte instruction, respectively. M = 0 and M = 1 indicate a single request and a combined request, respectively. V0 indicates that the first 4 bytes of the 8 bytes are valid, and V1 that the second 4 bytes are valid. C0 is the mask value corresponding to V0, and C1 is the mask value corresponding to V1. When M = 1, I indicates that VLR = -4. The fetch request additional information is sent to fetch request additional information path 15-2 via F.F. group 27-6. In synchronization with these, the request address generated by request address calculation circuit 27-3 is sent to address path 15-3 via F.F. group 27-7. FIG. 8 shows an example time chart of this operation, and FIG. 9 shows the corresponding MS bank busy time chart. FIG. 9 shows the case of a vector load instruction that loads an address-contiguous 4-byte vector A = (A(0), A(1), ..., A(n)).
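The fetch request additional information can be modeled as a small record. The sketch below builds it for a combined request of two 4-byte elements; the class and function names are hypothetical, and the assignment of element masks to C0 and C1 follows the rule that the first half of the 8-byte word holds element 0 when VLR = 4 and element 1 when VLR = -4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchRequestInfo:
    H: int   # 0 = 8-byte instruction, 1 = 4-byte instruction
    M: int   # 0 = single request, 1 = combined request
    V0: int  # first 4 bytes of the 8-byte word are valid
    V1: int  # second 4 bytes of the 8-byte word are valid
    C0: int  # mask value for the half flagged by V0
    C1: int  # mask value for the half flagged by V1
    I: int   # meaningful when M = 1: 1 means VLR = -4


def combined_fetch_info(vlr: int, mask_elem0: int, mask_elem1: int) -> FetchRequestInfo:
    """Additional information for one request covering two 4-byte elements."""
    if vlr == 4:
        c0, c1, i_bit = mask_elem0, mask_elem1, 0
    elif vlr == -4:
        # Descending addresses: element 0 sits in the second half.
        c0, c1, i_bit = mask_elem1, mask_elem0, 1
    else:
        raise ValueError("only VLR = +4 or -4 is combined in this sketch")
    return FetchRequestInfo(H=1, M=1, V0=1, V1=1, C0=c0, C1=c1, I=i_bit)
```

Carrying the masks and the direction bit with the request lets the receiving side regenerate per-element write enables without any further lookup.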

VR write control circuit 27-14 receives the fetch request additional information via path 15-5 and F.F. group 27-12 and, if the request combines two 4-byte fetches, generates the advance signal twice and sends it to path 18-1.

At the same time it sends the mask values corresponding to the two advances to path 18-2. The 8 bytes of fetch data sent via data path 15-6 and F.F. group 27-13 are cut out 4 bytes at a time by select circuit 27-16 under control signal 27-15 and sent to data path 18-3.

FIG. 10 shows the time chart of this operation for the case of FIG. 8. As FIG. 10 shows, when advance 27-11 is received for a request that combines element 0 and element 1, the circuit recognizes this from the H and M bits of fetch request additional information 27-12 and issues advance 18-1 to the VRU twice. The accompanying write enable 18-2 is generated from the C0 and C1 bits of fetch request additional information 27-12.

In this example VLR = 4, that is, the 4-byte data are contiguous in the MS in ascending address order, so C0 is the write enable for element 0 and C1 is the write enable for element 1.

When VLR = -4, that is, when the 4-byte data are contiguous in the MS in descending address order, the I bit of fetch request additional information 27-12 is '1', the C1 bit becomes the write enable for element 0, and the C0 bit becomes the write enable for element 1. The same applies to select circuit 27-16.

When the 4-byte data are contiguous in ascending address order, as in this example, the upper 4 bytes of the 8-byte fetch data 27-13 transferred from the MS are cut out first and sent to data path 18-3 as the data of element 0, and the lower 4 bytes of data path 18-3 are set to '0'. In the next cycle the lower 4 bytes of fetch data 27-13 are cut out and sent on the upper 4 bytes of data path 18-3 as the data of element 1, with the lower 4 bytes again set to '0'. Conversely, when VLR = -4, the lower 4 bytes of fetch data 27-13 are cut out as the data of element 0, and in the next cycle the upper 4 bytes are cut out as the data of element 1. In VRU 2, advance 18-1 updates the VR write pointer, and write enable 18-2 determines whether VR write data 18-3 is written.
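The splitting of one combined fetch into two VR writes can be sketched as follows. The function name is hypothetical; the sketch assumes each 4-byte element is placed on the upper 4 bytes of the 8-byte data path with the lower 4 bytes zeroed, and that the "upper" 4 bytes are the first four bytes of the fetch data.

```python
def vr_writes(data8: bytes, c0: int, c1: int, i_bit: int):
    """Return the two (write_enable, path_data) pairs of a combined fetch.

    One pair is produced per advance signal, in element order.
    i_bit = 0 models VLR = 4 (ascending), i_bit = 1 models VLR = -4.
    """
    if len(data8) != 8:
        raise ValueError("fetch data must be 8 bytes")
    upper, lower = data8[:4], data8[4:]
    if i_bit == 0:
        ordered = [(c0, upper), (c1, lower)]   # element 0 is the upper half
    else:
        ordered = [(c1, lower), (c0, upper)]   # element 0 is the lower half
    # Each element travels on the upper 4 bytes; the lower 4 bytes are zero.
    return [(enable, half + bytes(4)) for enable, half in ordered]
```

A write enable of 0 still consumes an advance, so the VR write pointer stays aligned with the element numbering even when an element is masked.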

For double-precision (8-byte) instructions, the H, M and I bits of fetch request additional information 27-12 are always '0', V0 = V1, and C0 = C1. The 8 bytes of fetch data 27-13 are sent unchanged to VRU 2 as the 8 bytes of VR write data 18-3.

Next, the operation of SDTC 29 is described with reference to FIG. 11. The valid signal transferred from VRU 2 over valid path 20-1 passes through F.F. 29-1 and is sent to request and additional information generation circuit 29-5 and, at the same time, is input to mask read circuit 29-3 and request address calculation circuit 29-4. Request and additional information generation circuit 29-5 uses mask value 29-12 and in-8-byte address 29-13 to generate the request and the store request additional information shown in FIG. 6 (2).

The store request additional information consists of V0 and V1. V0 indicates that the first 4 bytes of the 8 bytes are valid, and V1 that the second 4 bytes are valid. The request and the store request additional information V0, V1 are sent to request path 17-1 and store request additional information path 17-2 via F.F. 29-7 and F.F. group 29-8, respectively. In synchronization with these, the request address generated by request address calculation circuit 29-4 is sent to address path 17-3 via F.F. group 29-9. The 4 bytes of store data set in the upper 4 bytes of F.F. group 29-2 via store data path 20-2 are combined, two elements at a time, into F.F. group 29-10 and sent to store data path 17-4. FIG. 12 shows the time chart of this operation, and FIG. 13 shows the MS bank busy time chart. FIG. 13 shows the case of a vector store instruction that stores a 4-byte vector C = (C(0), C(1), ..., C(n)).
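The store-side combining can be sketched in the same spirit. The code below is an illustration under assumptions: `combine_store` is a hypothetical name, the V0 and V1 bits are assumed to be derived directly from the element masks, and a pair with both elements masked is assumed to generate no request at all. The text states only that the mask value is an input to the generation circuit and that suppressed stores are stopped at request generation.

```python
def combine_store(elem0: bytes, elem1: bytes, mask0: int, mask1: int, vlr: int = 4):
    """Combine two 4-byte store elements into one 8-byte store request.

    Returns (V0, V1, data8), or None when both elements are masked off.
    """
    if len(elem0) != 4 or len(elem1) != 4:
        raise ValueError("store elements must be 4 bytes each")
    if vlr == 4:
        first, second, v0, v1 = elem0, elem1, mask0, mask1
    elif vlr == -4:
        # Descending addresses: element 1 occupies the first half.
        first, second, v0, v1 = elem1, elem0, mask1, mask0
    else:
        raise ValueError("only VLR = +4 or -4 is combined in this sketch")
    if not (v0 or v1):
        return None   # nothing to store, so no bank is made busy
    return v0, v1, first + second
```

When both halves are valid the MS can write the full 8 bytes in one access, which avoids the read-modify-write penalty of separate 4-byte stores.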

[Effect of the invention]

According to the present invention, vector fetch and store instructions, including masked ones, can be processed several elements at a time, and the store processing of elements that are to be suppressed by the mask value can be suppressed from the moment the request is generated. This avoids bank conflicts between elements within a single vector fetch/store instruction, reduces bank conflicts between multiple vector fetch/store instructions, and greatly improves the processing speed of these instructions.

[Brief explanation of drawings]

FIG. 1 shows an example configuration of a vector processor. FIG. 2 is a conceptual diagram of bank busy time in ordinary vector fetch/store instruction processing. FIGS. 3 and 4 are bank busy time charts for 4-byte address-contiguous vector fetch and store instruction processing. FIG. 5 is a block diagram of the data transfer circuit in an embodiment of the present invention. FIGS. 6 (1) and (2) illustrate the fetch and store request additional information. FIG. 7 shows the internal structure of the fetch data transfer circuit. FIGS. 8 and 10 are time charts of an example operation of the fetch data transfer circuit. FIG. 9 is the MS bank busy time chart corresponding to FIGS. 8 and 10. FIG. 11 shows the internal structure of the store data transfer circuit. FIGS. 12 and 13 are, respectively, a time chart of an example operation of the store data transfer circuit and the MS bank busy time chart.

1: MS; 2: VRU; 3: DTC; 4: SC; 5, 6: ALU; 7 to 14: MS read/write paths; 15, 16: fetch paths; 17: store path; 18, 19: VR write paths; 20: VR read path; 21 to 26: ALU data paths; 27, 28: fetch data transfer circuits; 29: store data transfer circuit; 27-1: pre-request generation circuit; 27-2, 29-3: mask read circuits; 27-3, 29-4: request address calculation circuits; 27-14: VR write control circuit; 27-4, 29-5: request and additional information generation circuits; 27-16, 29-6: select circuits.

Claims

[Claims]

(1) A vector processor having a main storage; a plurality of vector registers for storing vector data; a data transfer circuit that controls data transfer between the main storage and the vector registers; and a storage controller that forwards main storage access requests from the data transfer circuit to the main storage, transfers fetch data read from the main storage to the data transfer circuit in response to a fetch request, and transfers store data received from the data transfer circuit to the main storage in response to a store request, characterized in that requests for a plurality of vector elements processed by the data transfer circuit are combined into one request and sent to the storage controller; the storage controller and the main storage process said request as a single request; in response to a fetch request, the data transfer circuit divides the fetch data into data corresponding to the plurality of requests and transfers them sequentially to the vector register; and in response to a store request, the data transfer circuit attaches the plurality of store data transferred from the vector register to the one request and sends it to the storage controller, and the data are written to the main storage at one time.