JPH0877141A

JPH0877141A - Vector data processor

Info

Publication number: JPH0877141A
Application number: JP20643494A
Authority: JP
Inventors: Takahiro Uchida; 尊博内田
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 1994-08-31
Filing date: 1994-08-31
Publication date: 1996-03-22
Anticipated expiration: 2018-04-21
Also published as: JP3398673B2

Abstract

PURPOSE: To improve throughput by raising the request-processing efficiency of a vector data processor. CONSTITUTION: The example shown is a vector data processor that supports the 4B odd-skip load/store instruction. It comprises an operation unit 1, a network unit 5, and a memory unit 9. The operation unit 1 includes an instruction recognition circuit 2, a network control circuit 3, and a vector operation circuit 4; the network unit 5 includes a contention arbitration circuit 6, a buffer circuit 7, and a crossbar circuit 8. When input ports must be held because port contention has occurred, all input ports are held for a normal instruction, but for a 4B odd-skip vector instruction only the input ports whose requests lost the contention and were not processed are held. Port contention for the 4B odd-skip vector instruction is thereby reduced, and throughput improves.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a vector data processing device, and more particularly to a vector processing device that performs request processing in response to vector load/store instructions.

[0002]

[Prior Art] In recent years, vector data processing devices in information processing systems have raised processing speed by continuously supplying a plurality of data items to the network unit at the same timing, so that large amounts of data can be transferred at high speed between the main memory and the registers or the operation unit.

[0003] FIG. 5 shows the configuration of an example of a conventional vector data processing device. This conventional example is a vector processing device that supports the 4B (N = 4) odd-skip load/store instruction. As shown in FIG. 5, it comprises an operation unit 1 including an instruction recognition circuit 2 and a vector operation circuit 4; a network unit 5 including a contention arbitration circuit 6, a buffer circuit 7 with built-in buffer receiving registers, and a crossbar circuit 8; and a memory unit 9.

[0004] In FIG. 5, the instruction recognition circuit 2 analyzes a given instruction code 100 and outputs a vector operation execution instruction 101, corresponding to the vector instruction recognized by that analysis, to the vector operation circuit 4. On receiving the vector operation execution instruction 101, the vector operation circuit 4 executes the specified vector operation. As noted above, this conventional example supports the 4B continuous load/store instruction and the 4B odd-skip load/store instruction, both of which are vector instructions. Accordingly, the vector operation circuit 4 has four output ports. When a vector instruction is executed in response to the vector operation execution instruction 101, requests 103, each formed as one element by pairing an address with the operation-result data, are issued in order from these four output ports and input to the input ports of the buffer circuit 7.

[0005] When the instruction recognition circuit 2 recognizes a 4B continuous store instruction, it outputs the corresponding vector operation execution instruction 101 to the vector operation circuit 4. On receiving this instruction, the vector operation circuit 4 determines each address from the base address and the distance contained in the instruction, and generates requests, each formed as one element by pairing the data to be stored with its address. Because the vector operation execution instruction 101 in this case corresponds to the 4B continuous store instruction, its distance is 4B (4B*1) and the request destinations are laid out contiguously. When the vector operation execution instruction 101 corresponds to a 4B odd-skip instruction, its distance is 4B*3 and the request destinations are laid out at odd-skip intervals. Because the vector operation circuit 4 in this conventional example has four output ports, it outputs the requests 103 in sets of four, one set after another, to the buffer circuit 7.
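The address calculation described in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions, not the circuit itself: the names `Request` and `make_requests`, the byte-addressed layout, and the use of plain integers as store data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

ELEMENT_BYTES = 4   # "4B" element size from the text
OUTPUT_PORTS = 4    # the vector operation circuit has four output ports


@dataclass(frozen=True)
class Request:
    """One element: an address paired with its store data (hypothetical model)."""
    address: int
    data: int


def make_requests(base: int, multiplier: int, data: List[int]) -> List[List[Request]]:
    """Build requests with distance 4B*multiplier and group them in sets of four."""
    distance = ELEMENT_BYTES * multiplier
    requests = [Request(base + i * distance, value) for i, value in enumerate(data)]
    return [requests[i:i + OUTPUT_PORTS] for i in range(0, len(requests), OUTPUT_PORTS)]


continuous = make_requests(base=0, multiplier=1, data=list(range(8)))
odd_skip = make_requests(base=0, multiplier=3, data=list(range(8)))
print([r.address for r in continuous[0]])  # [0, 4, 8, 12]
print([r.address for r in odd_skip[0]])    # [0, 12, 24, 36]
```

With a multiplier of 1 the addresses are contiguous; with a multiplier of 3 they step by 12 bytes, which is the 4B*3 case discussed in the text.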

[0006] The buffer circuit 7 is a buffer that temporarily holds the requests input from the vector operation circuit 4 in its buffer receiving registers while unprocessed requests remain. The input requests 103 pass through the buffer and are stored in the registers in order, four elements at a time (element 0, element 1, element 2, and element 3). In response to the input of the requests 103, the front-row request addresses 104 are output to the contention arbitration circuit 6. When the buffer receiving registers of the buffer circuit 7 become full, the buffer circuit outputs a hold signal 106 to the vector operation circuit 4, which temporarily stops the output of requests 103 from the vector operation circuit 4. The contention arbitration circuit 6, for its part, receives the front-row request addresses 104 from the buffer circuit 7 and checks the requests 103 for contention on the basis of those addresses. If there is contention between requests, the hold signal 105 output to the buffer circuit 7 is turned on; if there is none, it is turned off. The contention arbitration circuit 6 also outputs a select signal 107 to the crossbar circuit 8. On receiving the select signal 107, the crossbar circuit 8 selects the appropriate requests 108 from those input from the buffer receiving registers of the buffer circuit 7 and outputs them to the memory unit 9. In the memory unit 9, each request 108 input from the crossbar circuit 8 is separated into its address and its data, and the data is stored at the specified address. In this conventional example, for every four elements input, effectively only two elements are processed and output to the memory unit 9, which is 1/2 of the maximum throughput.
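The contention check on the front-row addresses can be modelled roughly as follows. Two things here are assumptions rather than statements from the text: that consecutive 8-byte words (2NB with N = 4) rotate over the four memory ports A to D, which is inferred from the port labels in the figure descriptions, and that the lowest-numbered element wins a contended port. The function names are hypothetical.

```python
from typing import List, Tuple

PORT_NAMES = "ABCD"
WORD_BYTES = 8  # assumed 2NB access unit with N = 4


def memory_port(address: int) -> int:
    """Assumed interleaving: consecutive 8-byte words rotate over ports A-D."""
    return (address // WORD_BYTES) % len(PORT_NAMES)


def arbitrate(front_addresses: List[int]) -> Tuple[List[bool], bool]:
    """Return (select, hold): which front-row requests pass, and hold signal 105.

    Assumed priority: the lowest-numbered element wins each memory port.
    """
    taken = set()
    select = []
    for address in front_addresses:
        port = memory_port(address)
        select.append(port not in taken)
        taken.add(port)
    hold = not all(select)  # conventional device: one hold for every input port
    return select, hold


select, hold = arbitrate([0, 4, 8, 12])  # 4B continuous store: ports A, A, B, B
print(select, hold)  # [True, False, True, False] True
```

Under these assumptions only two of the four front-row requests pass in a cycle with contention, which is consistent with the 1/2 throughput figure given for the conventional device.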

[0007] Next, the operation of this conventional example is described in more detail with reference to FIGS. 6(a) to 6(e) and FIGS. 7(a) to 7(e).

[0008] FIG. 6 illustrates request processing in this conventional example for a 4B odd-skip (4B*1 skip) vector instruction. A, B, C, and D in FIG. 6(a) denote the memory ports of the memory unit 9, and each memory port is divided into an upper half (U) and a lower half (L). The circled numbers shown under the memory ports indicate where each request arrives and the order in which the requests are issued for a 4B vector instruction with a distance of 4B. FIG. 6(b) shows the state of the buffers and registers at the first cycle (1T) when requests are issued in this order. In request processing for a normal vector instruction, the requests input to the buffers are output to the registers in that order. The number and letter in each box indicate the issue order of the request and its destination memory port, respectively. At 1T, the two circled requests 1A and 3B among the four registers are processed, but the two requests 2A and 4B are not. The hold signals for those two ports therefore turn on, these hold signals are ORed, and the hold signals for all input ports of the buffers and registers turn on (HoldAll processing). As a result, at 2T in FIG. 6(c), only the two remaining requests 2A and 4B among the four registers are processed. Because there is no contention among the requests at this point, the hold signal turns off. At 3T in FIG. 6(d), the hold having been released, requests are output from each buffer to the registers, and as a result of the contention check two requests are processed. Because two requests lost the contention and were not processed, the hold signals for all buffers and registers turn on again. At 4T, shown in FIG. 6(e), the two remaining requests are processed. From then on, the operations of 1T to 4T repeat. For this 4B odd-skip (4B*1 skip) vector instruction, four elements are input at each timing but only two are output, which is 1/2 of the maximum throughput.
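The four-cycle pattern described for FIG. 6 can be reproduced with a small simulation of HoldAll processing. It is a sketch under assumptions not stated in the text: destination ports are derived by rotating 8-byte words over ports A to D, the lowest request number wins a contended port, and the names `label` and `simulate_hold_all` are hypothetical.

```python
from collections import deque
from typing import List

PORT_NAMES = "ABCD"


def label(n: int, distance: int) -> str:
    """Label request n (1-based) with its assumed destination port, e.g. '3B'."""
    return f"{n}{PORT_NAMES[((n - 1) * distance // 8) % 4]}"


def simulate_hold_all(distance: int, count: int) -> List[List[str]]:
    """Return the requests processed in each cycle under HoldAll."""
    pending = deque(range(1, count + 1))
    registers: List[int] = []
    trace = []
    while pending or registers:
        if not registers:  # hold is off: load the next set of four
            registers = [pending.popleft() for _ in range(min(4, len(pending)))]
        busy, losers, done = set(), [], []
        for n in registers:  # assumed priority: lowest request number first
            port = label(n, distance)[-1]
            if port in busy:
                losers.append(n)
            else:
                busy.add(port)
                done.append(label(n, distance))
        registers = losers  # any loser holds all input ports
        trace.append(done)
    return trace


trace = simulate_hold_all(distance=4, count=8)
for cycle, done in enumerate(trace, start=1):
    print(f"{cycle}T: {done}")
```

For a distance of 4 bytes the model processes 1A and 3B at 1T, 2A and 4B at 2T, and then repeats the pattern with the next set of four, so eight requests take four cycles.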

[0009] FIG. 7 illustrates request processing in this conventional example when a 4B odd-skip (4B*3 skip) load/store instruction is issued. In FIG. 7(a), circled numbers indicate the order of the requests sent to the four memory ports A, B, C, and D when this instruction is issued. For a 4B odd-skip instruction issued from the operation unit 1, at 1T shown in FIG. 7(a), requests 1A, 2B, and 3D are processed, but request 4A loses the contention, so the hold signals for all buffers turn on (HoldAll processing). At 2T, shown in FIG. 7(c), no new requests arrive from the buffers, so only request 4A, the single request left in the registers, is processed. At 3T, shown in FIG. 7(d), new requests arrive from each buffer and are processed; request 8C loses the contention and is not processed, so the hold signals for all buffers turn on. At 4T, shown in FIG. 7(e), only the remaining request 8C is processed. The same pattern as 1T to 4T then repeats. Averaged over 1T to 4T, two requests are processed per timing, which means that the throughput of this conventional example is 1/2 of the maximum throughput of its configuration.
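The request labels quoted for FIG. 7 (1A, 2B, 3D, 4A and so on) are consistent with a simple address-to-port mapping, sketched below. The mapping is an assumption inferred from those labels, namely that consecutive 8-byte words rotate over ports A to D; the helper names are hypothetical.

```python
from typing import List

PORT_NAMES = "ABCD"
WORD_BYTES = 8  # assumed interleave unit (2NB with N = 4)


def destinations(multiplier: int, count: int) -> List[str]:
    """Label requests 1..count of a 4B*multiplier access with their assumed port."""
    labels = []
    for n in range(1, count + 1):
        address = (n - 1) * 4 * multiplier
        labels.append(f"{n}{PORT_NAMES[(address // WORD_BYTES) % 4]}")
    return labels


def losers(group: List[str]) -> List[str]:
    """Requests in one set of four that collide with an earlier request's port."""
    seen, lost = set(), []
    for item in group:
        if item[-1] in seen:
            lost.append(item)
        seen.add(item[-1])
    return lost


labels = destinations(multiplier=3, count=8)
print(labels[:4], losers(labels[:4]))  # ['1A', '2B', '3D', '4A'] ['4A']
print(labels[4:], losers(labels[4:]))  # ['5C', '6D', '7B', '8C'] ['8C']
```

Each set of four leaves one loser, so under HoldAll processing each set needs two cycles, which matches the average of two requests per timing stated in the text.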

[0010]

[Problems to be Solved by the Invention] In the conventional vector data processing device described above, the on/off state of the hold signal input from the contention arbitration circuit to the buffer circuit is controlled solely by a contention check based on the front-row request addresses received from the buffer circuit. As a result, throughput falls below its maximum, the main-memory access time of vector load and store instructions increases, and processing speed drops markedly.

[0011]

[Means for Solving the Problems] The vector data processing device of the present invention is a vector processing device comprising: at least one operation unit that performs vector operations; at least one main storage unit composed of memory modules that have a plurality of banks whose minimum request unit is 2NB and a plurality of ports capable of simultaneous parallel processing; and a network unit capable of performing a plurality of data transfers in 2NB units in parallel between the operation unit and the main storage unit.

The operation unit comprises: an instruction recognition circuit that analyzes a given instruction code, outputs a vector operation execution instruction corresponding to the recognized vector instruction, and also outputs the vector instruction itself; a vector operation circuit that receives the vector operation execution instruction, executes the specified vector operation, and outputs a request for each output port; and a network control circuit that receives the vector instruction, encodes it and holds it temporarily, and outputs a network control signal at a predetermined timing.

The network unit comprises: a buffer circuit that receives the request of each port output from the vector operation circuit and stores it temporarily in a register; a contention arbitration circuit that receives the network control signal, recognizes from it the request format of each port output from the vector operation circuit, detects contention among the requests in the registers from the address information of the requests sent from the registers in the buffer circuit, and, using the request format as the selection criterion, holds either the input ports corresponding to the elements that lost the contention or all input ports at the time of contention; and a crossbar circuit that receives the request of each port output from the buffer circuit, receives the select signal output from the contention arbitration circuit, selects the request of each port according to that select signal, and outputs it to the corresponding input port of the main storage unit.

[0012] The contention arbitration circuit may comprise: an arbiter that receives the address information for each port output from the registers of the buffer circuit, detects contention among the requests, outputs the select signal to the crossbar circuit, and outputs a hold signal for each port; an NB odd-skip instruction recognition circuit that receives the network control signal input from the network control circuit, recognizes from it the request format of each port output from the vector operation circuit, and turns on and outputs a switching control signal when the NB odd-skip instruction format is recognized; an OR circuit that receives the per-port hold signals output from the arbiter and outputs their logical sum; and a selector that receives the per-port hold signals output from the arbiter and the logical-sum output of the OR circuit and, under control of the switching control signal, selects the hold signal for each port and outputs it to the corresponding register in the buffer circuit.

[0013]

[Embodiment] Next, the present invention is described with reference to the drawings.

[0014] FIG. 1 is a block diagram showing an embodiment of the present invention. Like the conventional example described above, this embodiment is a vector data processing device that supports the 4B odd-skip load/store instruction, corresponding to the case N = 4. As shown in FIG. 1, it comprises an operation unit 1 including an instruction recognition circuit 2, a network control circuit 3, and a vector operation circuit 4; a network unit 5 including a contention arbitration circuit 6, a buffer circuit 7, and a crossbar circuit 8; and a memory unit 9.

[0015] In FIG. 1, the instruction recognition circuit 2 analyzes a given instruction code 100, outputs the vector operation execution instruction 101 corresponding to the recognized vector instruction to the vector operation circuit 4, and also inputs the vector instruction to the network control circuit 3. On receiving the vector operation execution instruction 101, the vector operation circuit 4 executes the specified vector operation. In this embodiment the 4B continuous load/store instruction and the 4B odd-skip load/store instruction are the vector instructions. Accordingly, the vector operation circuit 4 has four output ports corresponding to element 0, element 1, element 2, and element 3. When a vector instruction is executed in response to the vector operation execution instruction 101, four requests 103, one per element and each formed by pairing an address with the operation-result data, are issued in order from these four output ports and output to the buffer circuit 7. Meanwhile, the network control circuit 3 encodes the instruction input from the instruction recognition circuit 2 and holds it temporarily. At the timing at which the requests 103 are input from the vector operation circuit 4 to the buffer circuit 7, the code of the instruction that generated the requests 103 is output as the network control signal 102 and input to the contention arbitration circuit 6. The following describes the operation when the instruction recognition circuit 2 recognizes a 4B continuous store instruction as the vector instruction.

[0016] When the instruction recognition circuit 2 recognizes a 4B continuous store instruction, it outputs the corresponding vector operation execution instruction 101 to the vector operation circuit 4. On receiving this instruction, the vector operation circuit 4 determines each address from the base address and the distance contained in the instruction, and generates requests, each formed as one element by pairing that address with the data to be stored. Because the vector operation execution instruction 101 in this case corresponds to the 4B continuous store instruction, its distance is 4B (4B*1) and the request destinations are laid out contiguously. When the vector operation execution instruction 101 corresponds to a 4B odd-skip instruction, its distance is 4B*3 and the request destinations are laid out at odd-skip intervals. As described above, the vector operation circuit 4 has four output ports, so it outputs the requests 103 for the four elements, one per output port, as a set, one set after another, to the buffer circuit 7. In addition, at the timing at which the vector operation circuit 4 first outputs the requests 103, the network control circuit 3 outputs the network control signal 102, which carries the information of the 4-byte continuous store instruction, to the contention arbitration circuit 6.

[0017] The buffer circuit 7 is a buffer that temporarily stores the requests input from the vector operation circuit 4 in its buffer receiving registers while unprocessed requests remain. The input requests 103 pass through the buffer and are stored in the registers in order, four elements at a time (element 0, element 1, element 2, and element 3). In response to the input of the requests 103, the registers output the front-row request address 104 of each element to the contention arbitration circuit 6. When the registers of the buffer circuit 7 become full, the buffer circuit outputs a hold signal 106 to the vector operation circuit 4, which stops the output of requests 103 from the vector operation circuit 4. The contention arbitration circuit 6, for its part, checks the requests 103 for contention on the basis of the front-row request addresses 104 input from the buffer circuit 7. If there is contention, it outputs to the crossbar circuit 8 a select signal 107 that causes the requests for the individual elements to be output according to a predetermined priority. At the same time, referring to the network control signal 102 input from the network control circuit 3, it turns on and outputs the hold signals for the requests that were not output from the crossbar circuit 8 because of the contention, and the registers of the buffer circuit 7 that held the processed requests are cleared. If there is no contention between requests, the hold signal 105 to the buffer circuit 7 stays off. The important feature of the present invention here is that, in the contention arbitration circuit 6, the on/off state of the hold signal 105 output to the buffer circuit 7 is controlled not only by the contention check on the front-row request addresses 104 but also by the information on the instruction code that generated the requests 103, as recognized from the network control signal 102 from the network control circuit 3. On receiving the select signal 107 from the contention arbitration circuit 6, the crossbar circuit 8 selects the appropriate requests 108 from those input from the registers of the buffer circuit 7 for the four elements (element 0, element 1, element 2, and element 3) and outputs them to the memory unit 9. In the memory unit 9, each request 108 input from the crossbar circuit 8 is separated into its address and its data, and the data is stored at the specified address for each element.

【００１８】図２は、上記の競合調停回路６の一実施例
の構成を示すブロック図である。図２に示されるよう
に、本実施例は、アービタ−１０と、論理和回路１１
と、セレクタ１２と、４Ｂ奇数飛び命令認識回路１３と
を備えて構成されている。図１に示されるバッファ回路
７より出力される、それぞれ要素０、要素１、要素２お
よび要素３に対応する四つの最前列のリクェストのアド
レス１０４は、アービター１０に入力される。アービタ
ー１０においては、最前列のリクェストのアドレス１０
４を基にして、リクェスト１０３の競合状態がチェック
され、競合状態が存在する場合には、所定の優先順位に
従って、各要素に対応するリクェストがメモリ部９に出
力されるように作用するセレクト信号１０７が出力され
て、クロスバ回路８に送出される。また、アービター１
０からは、各要素に対応するホールド信号の出力線が、
それぞれ論理和回路１１およびセレクタ１２に接続され
ており、アービター１０から出力されるホールド信号
は、論理和回路１１において論理和がとられ、その論理
和出力は、要部対応のホールド信号としてセレクタ１２
に入力される。また、アービター１０から出力される各
要部に対応するホールド信号は、直接セレクタ１２に入
力される。他方において、ネットワーク制御回路３より
入力されるネットワーク制御信号１０２が４Ｂ奇数飛び
命令認識回路１３に入力されており、ネットワーク制御
信号１０２を介して、ベクトル演算回路４より発行され
るリクェストの形式が４Ｂ奇数飛び命令に対応するもの
と認識される場合には、ネットワーク制御回路３より競
合調停回路６に入力されるネットワーク制御信号１０２
を介して、当該４Ｂ奇数飛び命令認識回路１３から出力
される選択制御信号１０９はオンとなる。セレクタ１２
においては、前記論理和出力および選択制御信号１０９
が共にオンの状態において、始めて各要部それぞれに対
応するホールド信号１０５が出力されて、バッファ回路
７に送出される。FIG. 2 is a block diagram showing the configuration of an embodiment of the contention arbitration circuit 6 described above. As shown in FIG. 2, this embodiment comprises an arbiter 10, an OR circuit 11, a selector 12, and a 4B odd skip instruction recognition circuit 13. The addresses 104 of the four front-row requests corresponding to element 0, element 1, element 2, and element 3, output from the buffer circuit 7 shown in FIG. 1, are input to the arbiter 10. Using the addresses 104 of the front-row requests, the arbiter 10 checks the contention state of the requests 103. If contention exists, it outputs a select signal 107, which causes the request for each element to be output to the memory unit 9 according to a predetermined priority order, and sends it to the crossbar circuit 8. The hold-signal output lines of the arbiter 10, one per element, are connected to both the OR circuit 11 and the selector 12. The OR circuit 11 takes the logical OR of the hold signals output from the arbiter 10, and the OR output is input to the selector 12 as a hold signal for each element. The per-element hold signals output from the arbiter 10 are also input directly to the selector 12. Meanwhile, the network control signal 102 from the network control circuit 3 is input to the 4B odd skip instruction recognition circuit 13. When the format of the request issued from the vector operation circuit 4 is recognized, via the network control signal 102 that the network control circuit 3 supplies to the contention arbitration circuit 6, as corresponding to a 4B odd skip instruction, the selection control signal 109 output from the 4B odd skip instruction recognition circuit 13 is turned on. In the selector 12, the hold signal 105 corresponding to each element is output and sent to the buffer circuit 7 only when the OR output and the selection control signal 109 are both on.
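The arbiter and selector behaviour can be sketched in software as follows. This is only an illustrative model, not the circuit of FIG. 2: the function names `arbitrate` and `select_hold` are hypothetical, the priority order (lowest element number wins) is an assumption because the text only says "a predetermined priority order", and the selector is modelled after the Abstract (hold every port for a normal instruction, hold only the losing ports for a 4B odd skip instruction).

```python
from typing import List, Optional, Sequence, Tuple


def arbitrate(dest: Sequence[Optional[str]]) -> Tuple[List[bool], List[bool]]:
    """Model of the arbiter: dest[i] is the memory port targeted by element i.

    Returns (grant, hold). At most one request per memory port is granted;
    the lowest element number wins (assumed priority order).
    """
    taken = set()
    grant: List[bool] = []
    hold: List[bool] = []
    for d in dest:
        if d is None:            # no request waiting on this input port
            grant.append(False)
            hold.append(False)
        elif d in taken:         # lost the contention for this memory port
            grant.append(False)
            hold.append(True)
        else:                    # first request for this memory port wins
            taken.add(d)
            grant.append(True)
            hold.append(False)
    return grant, hold


def select_hold(per_element_hold: Sequence[bool], odd_skip: bool) -> List[bool]:
    """Model of the selector: choose between Hold only and Hold all."""
    if odd_skip:
        # Hold only: hold just the input ports whose request lost.
        return list(per_element_hold)
    # Hold all: the OR of all hold signals is applied to every input port.
    any_hold = any(per_element_hold)
    return [any_hold] * len(per_element_hold)


if __name__ == "__main__":
    grant, hold = arbitrate(["A", "B", "D", "A"])
    print("grant    :", grant)
    print("hold only:", select_hold(hold, odd_skip=True))
    print("hold all :", select_hold(hold, odd_skip=False))
```

With the destinations A, B, D, A, the fourth element loses the contention for memory port A, so Hold only holds input port #3 alone, whereas Hold all holds all four input ports.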

【００１９】次に、図３（ａ）、（ｂ）、（ｃ）、
（ｄ）および（ｅ）と、図４（ａ）、（ｂ）、（ｃ）、
（ｄ）および（ｅ）を参照して、本実施例の動作を敷延
して説明する。Next, the operation of this embodiment is described in detail with reference to FIGS. 3(a), (b), (c), (d) and (e) and FIGS. 4(a), (b), (c), (d) and (e).

【００２０】図３は、本実施例における、４Ｂ奇数飛び
（４Ｂ＊１飛び）ベクトル命令時のリクェスト処理動作
説明図である。図３（ａ）におけるＡ、Ｂ、ＣおよびＤ
は、メモリ部９のメモリポートを示しており、それぞれ
のメモリポートは、アッパー（Ｕ）とローアー（Ｌ）と
に分けられている。これらのメモリポートの下に示され
る○で囲まれた数字は、ディスタンス“４Ｂ”の４Ｂベ
クトル命令時におけるリクェスト到着場所と、リクェス
ト発行順序とを示している。この順序でリクェストが発
行された時のバッファおよびレジスタの１Ｔ目の状態
が、図３（ｂ）に示される。通常のベクトル命令おける
リクェスト処理においては、従来の動作と同じである
が、命令認識回路２において４Ｂ奇数飛びベクトル命令
が認識されて、Ｈｏｌｄａｌｌ処理からＨｏｌｄｏ
ｎｌｙ処理に切替えられる。１Ｔ目においては、レジス
タに格納されている四つのリクェストについて競合チェ
ックされた結果、○で囲まれている二つのリクェストが
処理される。この際に、競合に敗れたリクェストがある
入力ポートに対応するホールド信号がセレクトされ、各
バッファおよびレジスタに対するホールド信号として出
力される。即ち、１Ｔ目において競合に敗れたリクェス
トのある入力ポートのバッファおよびレジスタに対する
ホールド信号のみがオンとなる（Ｈｏｌｄｏｎｌｙ処
理）。FIG. 3 is an explanatory diagram of the request processing operation for a 4B odd skip (4B*1 skip) vector instruction in this embodiment. A, B, C and D in FIG. 3(a) denote the memory ports of the memory unit 9, and each memory port is divided into an upper (U) and a lower (L) half. The circled numbers shown below the memory ports indicate where each request arrives and the order in which the requests are issued for a 4B vector instruction with distance "4B". FIG. 3(b) shows the state of the buffers and registers at 1T (the first cycle) when requests are issued in this order. Request processing for a normal vector instruction is the same as the conventional operation, but when the instruction recognition circuit 2 recognizes a 4B odd skip vector instruction, processing is switched from Hold all to Hold only. At 1T, the four requests stored in the registers are checked for contention and, as a result, the two circled requests are processed. At this time, the hold signals corresponding to the input ports whose requests lost the contention are selected and output as the hold signals for the respective buffers and registers. That is, only the hold signals for the buffers and registers of the input ports whose requests lost the contention at 1T are turned on (Hold only processing).

【００２１】次に、図３（ｃ）の２Ｔ目においては、ホ
ールド信号がオンとならなかった入力ポート＃０と入力
ポート＃２のバッファからレジスタにリクェストが送ら
れて、合計四つのリクェストが競合チェックされて処理
される。この際に、四つのリクェストは、それぞれ別の
メモリポートを目指しているために、競合が生じること
はなく、各入力ポートのバッファおよびレジスタに対す
るホールド信号もオフの状態のままである。次いで図３
（ｄ）の３Ｔ目においては、全入力ポートのバッファよ
りレジスタにリクェストが送られて競合チェックがかか
るが、四つのリクェストは、それぞれ別のメモリポート
を目指しているために競合は起きず、全てリクェスト処
理される。このことは図３（ｅ）の４Ｔ目以降において
も同様である。本実施例における図３と、従来例におけ
る図６との対比により明らかなように、本実施例による
方が従来例に対して２倍の効率でリクェスト処理を行う
ことができることがわかる。Next, at 2T in FIG. 3(c), requests are sent from the buffers to the registers for input port #0 and input port #2, whose hold signals were not turned on, so that a total of four requests are checked for contention and processed. Because the four requests target different memory ports, no contention occurs, and the hold signals for the buffers and registers of every input port remain off. Then, at 3T in FIG. 3(d), requests are sent from the buffers of all input ports to the registers and checked for contention. Again the four requests target different memory ports, so no contention occurs and all of them are processed. The same holds from 4T in FIG. 3(e) onward. As the comparison between FIG. 3 for this embodiment and FIG. 6 for the conventional example makes clear, this embodiment processes requests with twice the efficiency of the conventional example.
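The cycle-by-cycle behaviour described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the patented circuit. The function `simulate` and the constants are hypothetical names. Each memory port is assumed to serve 8 bytes (its U and L halves of 4 bytes each), the four memory ports are assumed to be interleaved on consecutive 8-byte blocks, element i is assumed to be issued to input port i mod 4, and the lowest input port number is assumed to win a contention.

```python
from collections import deque

PORT_WIDTH_BYTES = 8   # assumed: one memory port covers U and L, 4 bytes each
NUM_PORTS = 4          # memory ports A-D and input ports #0-#3


def memory_port(addr: int) -> str:
    """Assumed interleaving: consecutive 8-byte blocks go to A, B, C, D."""
    return "ABCD"[(addr // PORT_WIDTH_BYTES) % NUM_PORTS]


def simulate(stride_bytes: int, n_requests: int, hold_only: bool) -> int:
    """Return the number of cycles needed to process n_requests."""
    # Element i is issued to input port i % 4 (assumed round-robin issue).
    buffers = [deque() for _ in range(NUM_PORTS)]
    for i in range(n_requests):
        buffers[i % NUM_PORTS].append(memory_port(i * stride_bytes))

    regs = [None] * NUM_PORTS      # front-row request of each input port
    hold = [False] * NUM_PORTS
    cycles = 0
    while any(buffers) or any(r is not None for r in regs):
        # Input ports that are not held move their next request to the register.
        for p in range(NUM_PORTS):
            if not hold[p] and regs[p] is None and buffers[p]:
                regs[p] = buffers[p].popleft()
        # Contention check: one request per memory port, lowest port number wins.
        taken = set()
        lost = [False] * NUM_PORTS
        for p in range(NUM_PORTS):
            if regs[p] is None:
                continue
            if regs[p] in taken:
                lost[p] = True
            else:
                taken.add(regs[p])
                regs[p] = None     # request is sent on to the memory unit
        # Hold only holds just the losers; Hold all holds every input port.
        hold = lost if hold_only else [any(lost)] * NUM_PORTS
        cycles += 1
    return cycles


if __name__ == "__main__":
    for mode, flag in (("Hold all", False), ("Hold only", True)):
        print(mode, simulate(stride_bytes=4, n_requests=16, hold_only=flag))
```

Under these assumptions, 16 requests with a 4-byte stride take 8 cycles with Hold all and 5 cycles with Hold only. For longer request streams the ratio approaches the factor of two stated in the text, because Hold only suffers contention in the first cycle only.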

【００２２】図４（ａ）、（ｂ）、（ｃ）、（ｄ）およ
び（ｅ）は、本実施例における、４Ｂ奇数飛び（４Ｂ＊
３飛び）ロード／ストア命令発行時のリクェスト処理の
動作説明図である。図４（ａ）には、この命令が発行さ
れた時には、Ａ、Ｂ、ＣおよびＤの四つのメモリポート
に対して送られてくるリクェストの順番が、、、
、、、……にて示されている。演算部１において
は、４Ｂ奇数飛び命令時には、入力ポート＃０、入力ポ
ート＃１、入力ポート＃２、入力ポート＃３、入力ポー
ト＃０、……の順にリクェストが送出される。図４
（ｂ）、（ｃ）、（ｄ）および（ｅ）には、この順番で
送出した時の順番と行き先のメモリポートを示したリク
ェストが示されており、図４（ｂ）の１Ｔ目において
は、送られてきたリクェストが、バッファおよびレジス
タに格納された状態が示されている。図４（ｂ）におい
て、○で囲まれている１Ａ、２Ｂおよび３Ｄの三つのリ
クェストが処理されて、競合に敗れたリクェスト４Ａの
存在するバッファ＃３に対してのみホールド信号がオン
となる（ＨｏｌｄＯｎｌｙ処理）。このために、図４
（ｃ）に示ざれる２Ｔ目においては、ホールド信号がオ
フとなった三つのバッファからは次のリクェストがレジ
スタに送られ、競合チェックの結果、全てのリクェスト
が処理される。図７（ｄ）の３Ｔ目においては、四つの
バッファから次のリクェストがレジスタに送られてくる
が、競合が起きることなく全てのリクェストが処理され
る。図（ｅ）の４Ｔ目以降においても、競合は起こら
ず、四つのリクェストは全て処理されてゆく。このこと
は、本実施例の構成において最大のスループットであ
り、従来例における動作に比較して約２倍の効率でリク
ェストを処理することが可能となる。FIGS. 4(a), (b), (c), (d) and (e) are explanatory diagrams of request processing when a 4B odd skip (4B*3 skip) load/store instruction is issued in this embodiment. FIG. 4(a) shows, as a numbered sequence, the order in which requests are sent to the four memory ports A, B, C and D when this instruction is issued. For a 4B odd skip instruction, the arithmetic unit 1 sends out requests in the order input port #0, input port #1, input port #2, input port #3, input port #0, and so on. FIGS. 4(b), (c), (d) and (e) show the requests, each labelled with its issue order and destination memory port, when they are sent in this order. 1T in FIG. 4(b) shows the state in which the incoming requests have been stored in the buffers and registers. In FIG. 4(b), the three circled requests 1A, 2B and 3D are processed, and the hold signal is turned on only for buffer #3, which holds the request 4A that lost the contention (Hold only processing). Consequently, at 2T shown in FIG. 4(c), the next requests are sent to the registers from the three buffers whose hold signals are off, and the contention check results in all requests being processed. At 3T in FIG. 4(d), the next requests are sent from the four buffers to the registers, and all of them are processed without contention. From 4T in FIG. 4(e) onward as well, no contention occurs and all four requests are processed in every cycle. This is the maximum throughput of the configuration of this embodiment, and requests can be processed with about twice the efficiency of the conventional operation.
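The request labels quoted above (1A, 2B, 3D, 4A) can be reproduced from the stride with a short calculation. This is a sketch under assumptions that the text does not state explicitly: each memory port is taken to cover 8 bytes (U and L halves of 4 bytes each), the ports A to D are taken to be interleaved on consecutive 8-byte blocks, and the first request is taken to start at address 0. The function name `destinations` is hypothetical.

```python
from typing import List

PORT_WIDTH_BYTES = 8   # assumed: one memory port covers U and L, 4 bytes each
PORTS = "ABCD"         # memory ports of the memory unit


def destinations(stride_bytes: int, count: int) -> List[str]:
    """Label each request with its issue order and assumed destination port."""
    labels = []
    for i in range(count):
        addr = i * stride_bytes                      # first request at address 0
        port = PORTS[(addr // PORT_WIDTH_BYTES) % len(PORTS)]
        labels.append(f"{i + 1}{port}")
    return labels


if __name__ == "__main__":
    # 4B*3 skip: consecutive elements are 12 bytes apart.
    print(destinations(stride_bytes=12, count=8))
```

For a 12-byte stride the first four labels come out as 1A, 2B, 3D and 4A, which matches the requests named in the description of FIG. 4(b): requests 1 and 4 both target memory port A, so exactly one request loses in the first cycle.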

【００２３】[0023]

【発明の効果】以上説明したように、本発明は、４Ｂ奇
数飛びロード／ストア命令のように、リクェスト間のコ
ンシステンシを守らなくてもよい命令を検出して、Ｈｏ
ｌｄａｌｌ処理からＨｏｌｄｏｎｌｙ処理に切替える
ことにより、常にリクェスト間において競合が生じる４
Ｂ奇数飛びロード／ストア命令が発行された時点におい
ても、リクェスト処理効率の低下を防止することができ
るという効果がある。As described above, the present invention detects instructions, such as the 4B odd skip load/store instruction, for which consistency between requests need not be maintained, and switches from Hold all processing to Hold only processing. This has the effect of preventing a drop in request processing efficiency even when a 4B odd skip load/store instruction, which always causes contention between requests, is issued.

[Brief description of drawings]

【図１】本発明の一実施例の構成を示すブロック図であ
る。FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

【図２】本実施例における競合調停回路の一実施例を示
すブロック図である。FIG. 2 is a block diagram showing an embodiment of a contention arbitration circuit according to the present embodiment.

【図３】本実施例における４Ｂ連続ロード／ストア命令
時の動作説明図である。FIG. 3 is a diagram explaining the operation for a 4B continuous load/store instruction in this embodiment.

【図４】本実施例における４Ｂ奇数飛びロード／ストア
命令時の動作説明図である。FIG. 4 is a diagram explaining the operation for a 4B odd skip load/store instruction in this embodiment.

【図５】従来例の構成を示すブロック図である。FIG. 5 is a block diagram showing a configuration of a conventional example.

【図６】従来例における４Ｂ連続ロード／ストア命令時
の動作説明図である。FIG. 6 is a diagram explaining the operation for a 4B continuous load/store instruction in the conventional example.

【図７】従来例における４Ｂ奇数飛びロード／ストア命
令時の動作説明図である。FIG. 7 is a diagram explaining the operation for a 4B odd skip load/store instruction in the conventional example.

[Explanation of symbols]

１ 演算部　２ 命令認識回路　３ ネットワーク制御回路　４ ベクトル演算回路　５ ネットワーク部　６ 競合調停回路　７ バッファ回路　８ クロスバ回路　９ メモリ部　１０ アービター　１１ 論理和回路　１２ セレクタ　１３ ４Ｂ奇数飛び命令認識回路
1 arithmetic unit
2 instruction recognition circuit
3 network control circuit
4 vector operation circuit
5 network unit
6 contention arbitration circuit
7 buffer circuit
8 crossbar circuit
9 memory unit
10 arbiter
11 OR circuit
12 selector
13 4B odd skip instruction recognition circuit

Claims

[Claims]

1. A vector data processing device comprising: at least one arithmetic unit that performs vector operations; at least one main storage unit composed of memory modules having a plurality of banks, in which data of 2N bytes (hereinafter abbreviated as 2NB; N is an integer of 1 or more) is the minimum request unit, and having a plurality of ports capable of simultaneous parallel processing; and a network unit capable of transferring data in parallel in units of 2NB between the arithmetic unit and the main storage unit, wherein the arithmetic unit comprises: an instruction recognition circuit that analyzes a predetermined instruction code, outputs a vector operation execution instruction corresponding to the recognized vector instruction, and outputs the vector instruction; a vector operation circuit that executes a predetermined vector operation in response to the vector operation execution instruction and outputs a corresponding request to each output port; and a network control circuit that receives the vector instruction, encodes it, and outputs a network control signal at a predetermined timing, and wherein the network unit comprises: a buffer circuit that receives the request of each port output from the vector operation circuit and temporarily stores it in a register; a contention arbitration circuit that receives the network control signal, recognizes via the network control signal the format of the request of each port output from the vector operation circuit, detects the contention state of the requests in the register from the request address information sent from the register, and, using the request format as the selection control criterion, holds either the input ports corresponding to the elements that lost the contention or all the input ports when contention occurs; and a crossbar circuit that receives the request of each port, receives the select signal output from the contention arbitration circuit, selects the request of each port according to the select signal, and outputs it to the corresponding input port of the main storage unit.

2. The vector data processing device according to claim 1, wherein the contention arbitration circuit comprises: an arbiter that receives address information corresponding to each port output from a register of the buffer circuit, detects the contention state of the requests, outputs the select signal to the crossbar circuit, and outputs a hold signal corresponding to each port; an NB odd skip instruction recognition circuit that receives the network control signal from the network control circuit, recognizes via the network control signal the format of the request of each port output from the vector operation circuit, and turns on a switching control signal when an NB odd skip instruction format is recognized; an OR circuit that receives the hold signals corresponding to each port output from the arbiter and outputs their logical OR; and a selector that receives the hold signals corresponding to each port output from the arbiter and the OR output of the OR circuit and, under control of the switching control signal, selects the hold signal corresponding to a predetermined port and outputs it to the corresponding register in the buffer circuit.