JP5505963B2

JP5505963B2 - Vector processing apparatus and vector operation processing method

Info

Publication number: JP5505963B2
Application number: JP2009267385A
Authority: JP
Inventors: 秀之佐藤
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 2009-11-25
Filing date: 2009-11-25
Publication date: 2014-05-28
Anticipated expiration: 2029-11-25
Also published as: JP2011113183A

Description

The present invention relates to a vector processing apparatus and a vector operation processing method.

In recent years, various techniques have been disclosed for processing large amounts of vector data quickly and accurately.

In the vector processing apparatus described in Patent Document 1, a data buffer is placed between the main memory and a vector register connected to a plurality of arithmetic pipelines. The apparatus counts the number of data items read from the vector register into the data buffer and, based on that count, suspends the sending of access requests from the vector unit to the main memory. As a result, when the arithmetic pipelines must be stopped, only the minimum necessary access requests are suspended.

The vector processing apparatus described in Patent Document 2 provides separate arbiters for the even-numbered input ports and the odd-numbered input ports, each of which arbitrates contention for the output ports when a vector data memory access instruction is issued. Because the contention arbitration is divided in this way, a drop in arbitration throughput is suppressed.

The vector processing apparatus described in Patent Document 3 implements conversion instructions that compress and expand vector data. The apparatus has a data buffer whose storage area is smaller than the number of vector data elements. It writes into the data buffer vector data from which elements that need no operation between vector data, or whose operation result would be 0, have been excluded. This configuration allows the data buffer to be made smaller.

The vector processing apparatus described in Patent Document 4 speeds up the processing of vector instructions that require synchronization. The apparatus calculates the number of vector elements assigned to each vector pipeline operation unit and predicts the issue timing of vector instructions in advance. This removes the need to exchange synchronization control signals and thereby speeds up processing.

Patent Document 1: JP-A-6-168264
Patent Document 2: JP-A-9-6759
Patent Document 3: JP-A-2000-35958
Patent Document 4: JP-A-2000-222390

However, in the vector processing apparatuses described above, reading data from and writing data to the vector registers may become a performance problem. The details are explained below.

In general, a vector processing apparatus is composed of a plurality of vector processing units (vector pipe subunits). A vector pipe subunit can have a plurality of vector operation pipes that perform vector operation processing. The vector operation pipes can exchange vector data with one another (in the following description, an instruction that causes data to be transferred between vector operation pipes is referred to as an inter-vector-pipe operation instruction).

FIG. 8 shows the configuration of a vector processing apparatus related to one of the problems to be solved by the present invention. The vector processing apparatus includes an inter-pipe crossbar unit 10, a vector pipe unit 20, and signal lines 30. The vector pipe unit 20 is composed of a plurality of vector pipe subunits 200. Each vector pipe subunit 200 includes vector operation pipes 210.

In general, as the number of vector operation pipes 210 in a vector pipe subunit 200 increases, the number of signal lines connecting the vector pipe unit 20 and the inter-pipe crossbar unit 10 also increases. This puts pressure on wiring capacity and makes LSI design difficult. For this reason, as shown in FIG. 8, the vector operation pipes 210 in a vector pipe subunit 200 are controlled so that they read out vector data in an interleaved (alternating) manner. With interleaving, however, each pipe's read-out takes longer, so the processing tends not to finish early.

While data is being read from a vector operation pipe in a vector pipe subunit in response to an instruction, other processing on the resources associated with that instruction is prohibited. Because interleave control lengthens the read-out, subsequent processing that the vector operation pipe should perform is forced to wait, and overall processing may be delayed.

The present invention has been made to solve this problem. Its object is to provide a vector processing apparatus in which vector operation instructions are not delayed by inter-vector-pipe operation instructions.

One aspect of the vector processing apparatus according to the present invention comprises: a first processing unit including first and second vector operation pipes; a second processing unit; a data transfer circuit configured to transfer data between the first processing unit and the second processing unit; and a first data path that connects the first processing unit and the data transfer circuit and is used to supply output data of the first and second vector operation pipes to the data transfer circuit, or to supply input data from the data transfer circuit to the first and second vector operation pipes. The first processing unit includes a data buffer that is arranged between the first and second vector operation pipes and the first data path and that holds the output data or the input data. Each of the first and second vector operation pipes is configured to be able to access the data buffer regardless of whether the other vector operation pipe is accessing the data buffer and regardless of whether the first data path is in use.

According to the present invention, the problem that vector operation instructions in a vector processing apparatus are delayed by inter-vector-pipe operation instructions can be solved.

FIG. 1 is a block diagram of the vector processing apparatus according to Embodiment 1.
FIG. 2 is a block diagram of the vector processing apparatus according to Embodiment 1.
FIG. 3 is a block diagram showing the configuration of the vector pipe subunit according to Embodiment 1.
FIG. 4 is a block diagram showing the configuration of the vector register according to Embodiment 1.
FIG. 5 is a block diagram showing the configuration of the issue control unit according to Embodiment 1.
FIG. 6 is a timing chart showing the operation of the vector processing apparatus according to Embodiment 1 during instruction execution.
FIG. 7 is a timing chart showing the operation of the vector processing apparatus according to Embodiment 1 during instruction execution.
FIG. 8 is a block diagram showing a vector processing apparatus related to one of the problems to be solved by the present invention.
FIG. 9 is a timing chart showing the operation during instruction execution of a vector processing apparatus related to one of the problems to be solved by the present invention.

Embodiment 1
Embodiments of the present invention are described below with reference to the drawings. First, the basic configuration of the vector processing apparatus according to Embodiment 1 and an outline of its operation are described with reference to FIG. 1.

The vector pipe subunit 200 in the vector processing apparatus includes vector operation pipes 210, transmission data buffers 220, reception data buffers 230, and a vector data selection circuit 270. Data input to and output from a vector operation pipe 210 goes through a transmission data buffer 220 and a reception data buffer 230. The vector operation pipe 210 only outputs vector data to its corresponding transmission data buffer 220; all other processing is performed outside the vector operation pipe 210. Specifically, the vector data selection circuit 270 interleaves the data stored in the transmission data buffers 220 and outputs it.

With the configuration shown in FIG. 1, each vector operation pipe 210 performs only its own vector data input and output, and other processing (such as interleaving) is performed outside the vector operation pipe 210. This shortens the time the vector operation pipe 210 spends on vector data input and output, so subsequent processing in the vector operation pipe 210 can start earlier.
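The division of labor described above can be sketched as follows. This is a minimal Python model, not the patented circuit: the function name `interleave_send_buffers`, the use of `deque` objects to stand in for the transmission data buffers 220, and the one-element-per-buffer round-robin order are all assumptions made for illustration. The source states only that the vector data selection circuit 270 interleaves the buffered data.

```python
from collections import deque


def interleave_send_buffers(buffers):
    """Round-robin drain: take one element from each non-empty buffer per pass.

    Hypothetical stand-in for the vector data selection circuit 270.
    """
    out = []
    while any(buffers):          # an empty deque is falsy
        for buf in buffers:
            if buf:
                out.append(buf.popleft())
    return out


# Each pipe writes its whole vector into its own buffer without waiting
# for the other pipe or for the shared data path (assumed element names).
buf0 = deque(["p0e0", "p0e1", "p0e2"])
buf1 = deque(["p1e0", "p1e1", "p1e2"])

# Interleaving happens afterwards, outside the pipes.
merged = interleave_send_buffers([buf0, buf1])
```

In this model the pipes are finished as soon as their buffers are filled; the cost of sharing one data path is paid only by the drain loop.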

Next, the basic configuration of the vector processing apparatus according to Embodiment 1 is described with reference to FIG. 2. The vector processing apparatus includes an inter-pipe crossbar unit 10, a vector pipe unit 20, and signal lines 30. The inter-pipe crossbar unit 10 is a processing unit that transfers data between vector pipe subunits 200.

The vector pipe unit 20 is composed of a plurality of vector pipe subunits 200. Each vector pipe subunit 200 is a processing unit that performs vector operations and includes vector operation pipes 210, transmission data buffers 220, and reception data buffers 230. Each vector pipe subunit 200 has a plurality of vector operation pipes 210 (two in the figure). The detailed configuration of the vector pipe subunit 200 is described later.

The signal lines 30 connect the vector pipe subunits 200 and the inter-pipe crossbar unit 10. They include signal lines that carry output data from a vector pipe subunit 200 to the inter-pipe crossbar unit 10 and signal lines that carry output data from the inter-pipe crossbar unit 10 to a vector pipe subunit 200.

Next, the detailed configuration of the vector pipe subunit 200 is described with reference to FIG. 3. The vector pipe subunit 200 includes vector operation pipes 210, transmission data buffers 220, reception data buffers 230, and a vector data selection circuit 270.

The vector operation pipe 210 processes vector instructions. Processing a vector instruction means the sequence of reading a vector register, performing an operation on the values read, and writing the result to a register. When the vector operation pipe 210 outputs data to another vector operation pipe 210, it writes the data to be sent into the transmission data buffer 220.

The transmission data buffer 220 is a data buffer that temporarily stores output data when data is output to another vector operation pipe 210. One transmission data buffer 220 is provided for each vector operation pipe 210 in the vector pipe subunit 200. The reception data buffer 230 is a data buffer that temporarily stores data input from the inter-pipe crossbar unit 10. One reception data buffer 230 is likewise provided for each vector operation pipe 210 in the vector pipe subunit 200.

The vector data selection circuit 270 is a selection circuit that interleaves the data output from the transmission data buffers 220 and outputs it to the inter-pipe crossbar unit 10.

The vector operation pipe 210 includes an in-pipe crossbar 240, vector registers 250, and arithmetic units 260. The in-pipe crossbar 240 feeds the values output from the arithmetic units 260 into the vector registers 250. A vector register 250 temporarily stores the results of the arithmetic units 260, and each vector operation pipe 210 contains a plurality of them (in FIG. 3, each vector operation pipe 210 has eight vector registers 250). The arithmetic unit 260 is a processing unit that performs vector operations.

FIG. 4 shows the vector register 250 in detail. The vector register 250 has a vector register write path used for writing data to the vector register 250 and a vector register read path used for reading data from it. Each vector register 250 is composed of a plurality of registers, each identified by a vector register number. In FIG. 4, the vector register 250 is composed of the registers numbered VR0, VR8, VR16, VR24, VR32, VR40, VR48, and VR56. A different vector register 250 (for example, the one placed next to the vector register 250 shown in FIG. 4) is composed of registers with different vector register numbers (for example, VR1 and so on). Reads from registers in the same vector register 250 (for example, VR0 and VR8) use the same vector register read path and therefore conflict: one must wait for the other to finish. Similarly, writes to registers in the same vector register 250 (for example, VR0 and VR8) use the same vector register write path and therefore conflict. (In the following description, two registers are assumed to conflict when the difference between their vector register numbers is a multiple of 8.)
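The conflict rule stated above (register numbers that differ by a multiple of 8 share a read path and a write path) amounts to a modulo test. The following is a small illustrative sketch; the names `NUM_BANKS`, `bank_of`, and `conflicts` are hypothetical and do not appear in the source.

```python
# Assumption taken from FIG. 4: VR0, VR8, ..., VR56 sit in one vector
# register 250, so there are eight groups ("banks" in this sketch).
NUM_BANKS = 8


def bank_of(vr_number):
    """Index of the vector register 250 that holds register VR<vr_number>."""
    return vr_number % NUM_BANKS


def conflicts(vr_a, vr_b):
    """True when the two registers share a read path and a write path."""
    return bank_of(vr_a) == bank_of(vr_b)


same_bank = conflicts(0, 8)      # VR0 and VR8: difference is a multiple of 8
other_bank = conflicts(0, 1)     # VR0 and VR1: different vector registers 250
```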

Next, the configuration of the issue control unit 400, which issues vector data processing instructions to the vector processing apparatus, is described with reference to FIG. 5. The issue control unit 400 includes an instruction request A storage unit 401, an instruction request B storage unit 402, an issue check circuit A 403, an issue check circuit B 404, a GOA 405, a GOB 406, a busy flag storage unit 407, a vector register write path busy flag storage unit 408, a WS register 409, a vector register write path use inhibition flag storage unit 410, a VTB read inhibition flag storage unit 411, a PXB instruction secondary request storage unit 412, an issue check circuit C 413, and an RGO 414.

The instruction request A storage unit 401 temporarily stores a vector instruction that is about to be issued. The instruction request B storage unit 402 does the same.

The busy flag storage unit 407 stores busy flags that indicate whether each resource is in use. These busy flags do not cover the usage status of the vector register write path of the vector register 250. The vector register write path busy flag storage unit 408 stores the vector register write path busy flags, which indicate the usage status of the vector register write path in each vector register 250.

The issue check circuit A 403 checks the content of the instruction request stored in the instruction request A storage unit 401. It then queries the busy flag storage unit 407 for the state of the resources that the instruction request will use, and writes the result of checking whether those resources are in use to the GOA 405. If the resources are free, the issue check circuit A 403 notifies the GOA 405 of the instruction to be issued. For processing that includes a write to a vector register 250 (a VMV instruction, for example), the issue check circuit A 403 splits the instruction into a primary instruction and a secondary instruction and notifies the GOA 405 of the primary instruction, which covers everything other than the write to the vector register 250. For a VMV instruction, the primary instruction covers the processing that precedes the write to the vector register 250 (such as output from the vector register 250 to the inter-pipe crossbar unit 10), and the secondary instruction directs the write to the vector register 250. The issue check circuit A 403 stores the secondary instruction in the PXB instruction secondary request storage unit 412.
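The primary/secondary split described above can be sketched in Python. Everything here is a simplified assumption for illustration: the `Request` type, the dictionary layout of the split instructions, and the modelling of busy flags as a set of `("vr_read", n)` tuples are hypothetical, and the real circuit checks more resources than the single source register used in this sketch.

```python
from dataclasses import dataclass


@dataclass
class Request:
    opcode: str   # e.g. "VMV" or "VADD"
    dst: int      # destination vector register number
    src: int      # source vector register number


def split_for_issue(req, inter_pipe_opcodes=("VMV",)):
    """Return (primary, secondary); secondary is None for ordinary instructions."""
    if req.opcode in inter_pipe_opcodes:
        primary = {"op": req.opcode, "phase": "read_and_send", "src": req.src}
        secondary = {"op": req.opcode, "phase": "write_register", "dst": req.dst}
        return primary, secondary
    whole = {"op": req.opcode, "phase": "execute", "src": req.src, "dst": req.dst}
    return whole, None


def issue_check(req, busy, secondary_queue):
    """Issue the primary part if its resource is free; queue the secondary part."""
    primary, secondary = split_for_issue(req)
    resource = ("vr_read", req.src)
    if resource in busy:
        return None                        # resource in use: request stays pending
    busy.add(resource)                     # block later instructions on this resource
    if secondary is not None:
        secondary_queue.append(secondary)  # stands in for storage unit 412
    return primary


busy_flags = set()
pxb_secondary = []
issued_vmv = issue_check(Request("VMV", dst=1, src=0), busy_flags, pxb_secondary)
blocked = issue_check(Request("VADD", dst=2, src=0), busy_flags, pxb_secondary)
issued_vadd = issue_check(Request("VADD", dst=3, src=5), busy_flags, pxb_secondary)
```

The point of the split is visible in the queue: the write half of the VMV instruction waits in `pxb_secondary` while an unrelated VADD instruction is issued.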

The issue check circuit B 404 checks the content of the instruction request stored in the instruction request B storage unit 402. It then queries the busy flag storage unit 407 for the state of the resources that the instruction request will use, and writes the result of checking whether those resources are in use to the GOB 406. If the resources are free, the issue check circuit B 404 notifies the GOB 406 of the instruction to be issued. Like the issue check circuit A 403, the issue check circuit B 404 splits processing that includes a write to a vector register 250 (a VMV instruction, for example) and notifies the GOB 406 of the primary instruction.

The GOA 405 is a register to which the resource check result of the issue check circuit A 403 is written. The processing unit that manages the GOA 405 issues a vector instruction execution directive to all vector operation pipes 210 at the same time as the resource check result is written.

The GOB 406 is a register to which the resource check result of the issue check circuit B 404 is written. The processing unit that manages the GOB 406 issues a vector instruction execution directive to all vector operation pipes 210 at the same time as the resource check result is written.

The WS register 409 is a register that stores the VTB write start signal, which is a control signal. During an inter-vector-pipe operation instruction, the VTB write start signal indicates that vector data will be written from the inter-pipe crossbar unit 10 to the reception data buffer 230.

The vector register write path use inhibition flag storage unit 410 stores the vector register write path use inhibition flag, which indicates whether use of the vector register write path in the vector register 250 is inhibited. When the VTB write start signal is input, this flag is updated so that other instructions are prohibited from using the vector register write path.

The VTB read inhibition flag storage unit 411 stores the VTB read inhibition flag, which indicates whether reading of the reception data buffer 230 is inhibited. The VTB read inhibition flag is used to prevent the reception data buffer 230 from being read before the vector data input from the inter-pipe crossbar unit 10 has been stored in it.

When the instruction request stored in the instruction request A storage unit 401 or the instruction request B storage unit 402 is an inter-vector-pipe operation instruction, the PXB instruction secondary request storage unit 412 stores the instruction request (secondary instruction) needed to write the vector data to the vector register 250.

The issue check circuit C 413 operates when data is written from the reception data buffer 230 to a vector register 250. It checks the vector register write path busy flag corresponding to the instruction request stored in the PXB instruction secondary request storage unit 412, the VTB read inhibition flag, and the value of the WS register 409, and writes the check result to the RGO 414.

The RGO 414 is a register to which the check result of the issue check circuit C 413 is written. At the same time as that check result is written, the processing unit that manages the RGO 414 directs all vector operation pipes 210 to write data from the reception data buffer 230 to the vector register 250.

Next, the operation of the vector processing apparatus according to this embodiment when it executes the following two instructions is described (VMV: inter-pipe move instruction, VADD: vector add instruction, VR: vector register number, S: constant).
(1) VMV VRm ← VRn
(2) VADD VRo ← Sy + VRp

First, the general operation of the vector processing apparatus according to this embodiment when a VADD instruction is issued after a VMV instruction is described. The VMV instruction is stored in the instruction request A storage unit 401. The issue check circuit A 403 checks the usage status of the resources the VMV instruction uses. Specifically, it checks the busy flag storage unit 407, the vector register write path busy flag storage unit 408, and the vector register write path use inhibition flag storage unit 410.

If the resources are free, the issue check circuit A 403 sets the VMV instruction in the GOA 405 and, via the GOA 405, directs the vector pipe subunit 200 to start processing the VMV instruction (primary instruction issue). In other words, the issue check circuit A 403 issues the instruction (primary instruction) for the part of the VMV instruction that precedes the write to the vector register 250 (such as output from the vector register 250 to the inter-pipe crossbar unit 10). At the same time as processing of the VMV instruction starts, the busy flag values stored in the busy flag storage unit 407 are updated. This prevents subsequent instructions from being issued against the resources used by the VMV instruction. In addition, the information needed to write the vector data to the vector register 250 (the secondary instruction) is stored in the PXB instruction secondary request storage unit 412.

After the vector operation pipe 210 finishes executing the VMV instruction, the inter-pipe crossbar unit 10 inputs the VTB write start signal to the WS register 409. When the VTB write start signal is input to the WS register 409, the vector register write path use inhibition flag in the vector register write path use inhibition flag storage unit 410 is updated. Once the flag is updated, writes to the vector register 250 by any instruction other than this VMV instruction are inhibited. This prepares for writing the data input from the inter-pipe crossbar unit 10 into the destination vector register 250.

The inter-pipe crossbar unit 10 writes the vector data into the reception data buffer 230 that corresponds to the destination vector operation pipe 210. The timing of the write from the reception data buffer 230 to the vector operation pipe 210 is managed by the VTB read inhibition flag in the VTB read inhibition flag storage unit 411.

The issue check circuit C 413 checks whether the vector data input from the inter-pipe crossbar unit 10 to the reception data buffer 230 can be input to the vector operation pipe 210. In other words, it checks whether the resources needed for that input are free. The issue check circuit C 413 checks the vector register write path busy flag stored in the vector register write path busy flag storage unit 408, the VTB read inhibition flag stored in the VTB read inhibition flag storage unit 411, and the value held in the WS register 409. The vector register write path busy flag that is checked is the one for the vector register 250 associated with the instruction request stored in the PXB instruction secondary request storage unit 412.

If the resources are free, the issue check circuit C413 writes to the RGO 414 a write instruction from the reception data buffer 230 to the vector operation pipe 210. Together with the write to the RGO 414, the issue check circuit C413 issues the instruction to write the vector data from the reception data buffer 230 to the vector operation pipe 210 (secondary instruction issue). It also sets the vector register write path busy flag for the resource corresponding to the PXB secondary request stored in the PXB instruction secondary request storage unit 412. Setting the vector register write path busy flag inhibits execution of subsequent instructions that use the same resource. In addition, after the RGO 414 is set, the vector register write path use inhibition flag in the storage unit 410 and the value of the WS register 409 are reset.
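
The check performed by the issue check circuit C413 can be illustrated with a minimal Python sketch. The class and function names are hypothetical, each flag is modeled as a single boolean, and the flag polarity (issue is possible when a write start is pending, the VTB read inhibition flag is cleared, and the write path is not busy) is an assumption made for illustration, not a statement of the actual circuit.

```python
from dataclasses import dataclass


@dataclass
class WritePathState:
    """Flags consulted by issue check circuit C (illustrative names)."""
    write_path_busy: bool = False     # vector register write path busy flag (408)
    write_path_inhibit: bool = False  # write path use inhibition flag (410)
    vtb_read_inhibit: bool = False    # VTB read inhibition flag (411)
    ws_register: bool = False         # WS register (409): write start received


def try_secondary_issue(state: WritePathState) -> bool:
    """Issue the secondary (write-back) instruction if the resources are free.

    Assumed polarity: a write start must be pending, the reception data
    buffer must be readable, and the write path must not be busy.
    """
    can_issue = (
        state.ws_register
        and not state.vtb_read_inhibit
        and not state.write_path_busy
    )
    if not can_issue:
        return False
    # Secondary issue: occupy the write path, then clear the reservation.
    state.write_path_busy = True
    state.write_path_inhibit = False
    state.ws_register = False
    return True
```

In this sketch, a call made while the VTB read inhibition flag is still set returns `False`; once that flag is cleared, the call succeeds, sets the busy flag, and resets the inhibition flag and the WS register value.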

The subsequent VADD instruction is stored in the instruction request B storage unit 402. The issue check circuit B404 checks the status of the resources used by the VMV instruction. Specifically, the issue check circuit B404 checks the busy flag storage unit 407, the vector register write path busy flag storage unit 408, and the vector register write path use inhibition flag storage unit 410 to confirm the resource usage status.

If the resources are free, the issue check circuit B404 sets the VADD instruction in the GOB 406 and instructs the start of processing of the VADD instruction. Although the VADD instruction follows the VMV instruction, its execution starts at the time of issue if the resources it uses are free. At the same time as processing of the VADD instruction starts, the value of the busy flag stored in the busy flag storage unit 407 and the value of the vector register write path busy flag stored in the vector register write path busy flag storage unit 408 are updated. In general, a VADD instruction takes less processing time than a VMV instruction. Therefore, if the VADD instruction is issued immediately after the VMV instruction, without entering a wait state, the VADD instruction may finish before the VMV instruction. In other words, the VADD instruction completes by overtaking the VMV instruction.
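
The issue decision for the subsequent instruction can be sketched in the same way. This is a simplified model under stated assumptions: `PipeFlags` and `issue_check_b` are hypothetical names, and each of the three storage units (407, 408, 410) is reduced to one boolean covering only the resources the subsequent instruction needs.

```python
from dataclasses import dataclass


@dataclass
class PipeFlags:
    """Flags consulted by issue check circuit B (illustrative names)."""
    busy: bool = False                # busy flag (407)
    write_path_busy: bool = False     # vector register write path busy flag (408)
    write_path_inhibit: bool = False  # write path use inhibition flag (410)


def issue_check_b(flags: PipeFlags) -> bool:
    """Issue the subsequent instruction only if all three flags are clear."""
    if flags.busy or flags.write_path_busy or flags.write_path_inhibit:
        return False
    # On issue, mark the resources and the write path as in use.
    flags.busy = True
    flags.write_path_busy = True
    return True
```

Under this model, a subsequent instruction is held back while the write path is reserved by the inhibition flag, and is issued as soon as all three flags are clear.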

Next, the processing performed when the vector processing device according to the present embodiment executes a VMV instruction and a VADD instruction will be described with reference to FIG. 6. FIG. 6 is a timing chart for executing the following instructions. VR0 and VR8 contend for writes to the vector register 250. A machine cycle in the figure is synonymous with a clock cycle.
(1) VMV VR0 ← VR14
(2) VADD VR8 ← Sy + VR7

First, the issue control unit 400 performs the primary issue of the VMV instruction (S1). At this point, a busy flag that inhibits use of the resources related to the VMV instruction is set. Because the primary-issue instruction does not write to the vector register 250, the vector register write path busy flag is not set.

Thereafter, all the vector operation pipes 210 enter the processing state, and each vector operation pipe 210 reads vector data (S11). This read does not require interleaving and can therefore finish in a short time. The vector operation pipe 210 stores the read vector data in the transmission data buffer 220 (S12). The inter-pipe crossbar unit 10 reads the transmission data buffer 220 under the read control of the transmission data buffer 220. Because the vector pipe subunit 200 includes two vector operation pipes 210, the read control of the transmission data buffer 220 takes data interleaving into account. The inter-pipe crossbar unit 10 acquires vector data from each vector pipe subunit 200 and processes it (S14).

Meanwhile, execution of the VADD instruction starts after the primary issue of the VMV instruction is completed (S2). Even though VR0 and VR8 contend for the same vector register write path, the vector register write path busy flag is not set, so the VADD instruction is judged to be executable. Executing the VADD instruction involves reading vector data from the vector operation pipe 210, adding the vector data, and writing vector data to the vector operation pipe 210.

The VTB write start signal from the inter-pipe crossbar unit 10 is input to the WS register 409. At this time, the vector register write path use inhibition flag and the VTB read inhibition flag are set simultaneously. The inter-pipe crossbar unit 10 then writes vector data to the reception data buffer 230 (S15). When the data in the reception data buffer 230 becomes readable, the VTB read inhibition flag is reset (S16). The issue check circuit C413 checks the vector register write path busy flag, the VTB read inhibition flag, and the value of the WS register 409. Having confirmed that the secondary issue of the VMV instruction is possible, the issue check circuit C413 issues an instruction to write the vector data stored in the reception data buffer 230 to the vector register 250 (VMV secondary issue). When the VMV secondary issue is performed, the vector register write path use inhibition flag is reset. In addition, the vector register write path busy flag is set to inhibit use of the same vector register write path.

As described above, reading and writing vector data in the vector operation pipe 210 (VR Read, VR Write) complete as pure data reads and writes. Because each vector operation pipe 210 sends and receives data through the transmission data buffer 220 and the reception data buffer 230, the vector operation pipe 210 does not need to perform interleaving.

Also as described above, writes to the vector register 250 are controlled by the vector register write path busy flag. The issue control unit 400 according to the present embodiment issues the VMV instruction split into a primary instruction, which causes no write to the vector register 250, and a secondary instruction, which causes a write to the vector register 250. The vector register write path busy flag is set to in-use only when the secondary instruction is issued. As a result, the subsequent VADD instruction can finish while the preceding VMV instruction is not writing to the vector register 250, which achieves efficient vector operation processing.

Next, the processing performed when the vector processing device according to the present embodiment executes the following VMV instruction and VADD instruction will be described with reference to FIG. 7. FIG. 7 is a timing chart for executing the following instructions. VR6 and VR14 contend for reads from the vector register 250. Because there is no contention for writes to the vector register 250, FIG. 7 and the following description omit the write processing to the vector register 250.
(1) VMV VR0 ← VR14
(2)' VADD VR4 ← VR6 + VR7

First, the issue control unit 400 issues the primary instruction of the VMV instruction (S1). At this point, a busy flag that inhibits use of the resources related to the VMV instruction is set. Thereafter, all the vector operation pipes 210 enter the processing state, and each vector operation pipe 210 reads vector data (S11). This read does not require interleaving and can therefore finish in a short time. While the read is in progress, the vector register read path busy flag is set, and reads by other instructions are prohibited. The vector operation pipe 210 stores the read vector data in the transmission data buffer 220 (S12).

The inter-pipe crossbar unit 10 reads the transmission data buffer 220 under the read control of the transmission data buffer 220. Because the vector pipe subunit 200 includes two vector operation pipes 210, the read control of the transmission data buffer 220 requires data interleaving. The inter-pipe crossbar unit 10 acquires vector data from each vector pipe subunit 200 and processes it (S14).

Meanwhile, the VADD instruction contends with the preceding VMV instruction for the vector register read path (VR6 and VR14). The VADD instruction therefore becomes issuable after the vector register read path busy flag is reset. Because the vector data read of the VMV instruction (S11) does not require interleaving, it can finish early. As a result, the VADD instruction is issued soon after the VMV instruction is issued. The subsequent processing is the same as that shown in FIG. 6, and its description is omitted.

As described above, because the transmission data buffer 220 is provided, the vector data read (S11) does not require interleaving and can finish early. Therefore, even when instructions contend for reads from the vector register, the subsequent instruction becomes executable once the vector data read has finished.
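
The decoupling provided by the transmission data buffer can be illustrated with a small simulation. This is a sketch under assumed rates that do not come from the specification: each pipe pushes one element per cycle into its own buffer, and the crossbar side drains one element per cycle, alternating between the buffers. The function name `simulate` is hypothetical.

```python
from collections import deque


def simulate(elements_per_pipe: int, num_pipes: int = 2):
    """Return (cycle at which the pipes finish reading, cycle at which
    the crossbar side has drained every buffered element)."""
    buffers = [deque() for _ in range(num_pipes)]
    total = elements_per_pipe * num_pipes
    pipe_done_cycle = None
    produced = 0   # elements read so far by each pipe
    drained = 0    # elements taken by the crossbar side
    turn = 0       # buffer the crossbar side reads next
    cycle = 0
    while drained < total:
        cycle += 1
        # Every pipe reads one element and stores it in its own buffer.
        if produced < elements_per_pipe:
            for pipe_id, buf in enumerate(buffers):
                buf.append((pipe_id, produced))
            produced += 1
            if produced == elements_per_pipe:
                pipe_done_cycle = cycle
        # The crossbar side drains one element per cycle, interleaved.
        if buffers[turn]:
            buffers[turn].popleft()
            drained += 1
            turn = (turn + 1) % num_pipes
    return pipe_done_cycle, cycle
```

With 4 elements per pipe and 2 pipes, this model has the pipes finish their reads at cycle 4 while the interleaved drain continues until cycle 8, which is the interval in which a subsequent instruction could already use the read path.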

Next, for comparison with the vector processing device according to the present embodiment, FIG. 9 shows the operation when reads from and writes to the vector register 250 contend in the configuration of the vector processing device (FIG. 8) related to one of the problems to be solved by the present invention. In the vector processing device of FIG. 8, a read from the vector register involves interleaving and therefore takes considerable processing time ((1) in the figure). Consequently, when the vector register read paths contend, the start of execution of the VADD instruction is delayed in that device.

When the vector register write paths contend, the vector register write path busy flag is set from the time the VMV instruction is issued. The actual write to the vector register is performed at the time of the write from the inter-pipe crossbar unit 10 to the vector register (S13). However, writes to the vector register by other instructions are prohibited from the time the VMV instruction is issued. This is because the VMV instruction is issued as a single instruction, so the vector register write path busy flag remains unchanged for a long time.
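
The difference between issuing the VMV instruction as a single instruction and splitting it into primary and secondary instructions can be sketched as the window during which the write path busy flag is set. The function names and all cycle numbers below are hypothetical, and the model ignores the time the subsequent instruction itself occupies the write path.

```python
def write_path_busy_window(issue_cycle: int, writeback_start: int,
                           writeback_cycles: int, split_issue: bool) -> range:
    """Cycles during which the write path busy flag is set.

    Single issue: the flag is set from the issue of the VMV instruction.
    Split issue: the flag is set only from the secondary issue (write-back).
    """
    start = writeback_start if split_issue else issue_cycle
    return range(start, writeback_start + writeback_cycles)


def earliest_issue_cycle(ready_cycle: int, busy_window: range) -> int:
    """First cycle at or after ready_cycle at which the flag is clear."""
    return busy_window.stop if ready_cycle in busy_window else ready_cycle
```

For example, with an illustrative VMV issue at cycle 0 and a write-back lasting 8 cycles from cycle 20, a subsequent instruction that is ready at cycle 1 waits until cycle 28 under single issue but can be issued at cycle 1 under split issue.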

The effects of the vector processing device according to the present embodiment are summarized below. As shown in the examples of FIGS. 6 and 7, providing the transmission data buffer 220 and the reception data buffer 230 in the vector pipe subunit 200 allows reading and writing of vector data from the vector operation pipe 210 to finish early. This is because an output destination is provided for each vector operation pipe 210 so that processing in the vector operation pipe 210 can finish early. Processing that would otherwise also have taken the interleaving time is thereby shortened. Shortening the time for reading and writing vector data from the vector operation pipe 210 brings forward the time at which a subsequent instruction can start, which reduces processing time.

Further, as shown in FIG. 6, writes to the vector register 250 are controlled by the vector register write path busy flag. The issue control unit 400 according to the present embodiment issues the VMV instruction split into a primary instruction, which causes no write to the vector register 250, and a secondary instruction, which causes a write to the vector register 250. The vector register write path busy flag is set to in-use only when the secondary instruction is issued. As a result, the subsequent VADD instruction can finish while the preceding VMV instruction is not writing to the vector register 250, which achieves efficient vector operation processing.

Note that the present invention is not limited to the above embodiment and can be modified as appropriate without departing from its spirit.

10 Inter-pipe crossbar unit
20 Vector pipe unit
200 Vector pipe subunit
210 Vector operation pipe
220 Transmission data buffer
230 Reception data buffer
240 In-pipe crossbar
250 Vector register
260 Arithmetic unit
270 Vector data selection circuit
30 Signal line
400 Issue control unit
401 Instruction request A storage unit
402 Instruction request B storage unit
403 Issue check circuit A
404 Issue check circuit B
405 GOA
406 GOB
407 Busy flag storage unit
408 Vector register write path busy flag storage unit
409 WS register
410 Vector register write path use inhibition flag storage unit
411 VTB read inhibition flag storage unit
412 PXB instruction secondary request storage unit
413 Issue check circuit C
414 RGO

Claims

A vector operation processing apparatus comprising:
a first processing unit including first and second vector operation pipes;
a second processing unit;
a data transfer circuit configured to transfer data between the first processing unit and the second processing unit; and
a first data path that connects the first processing unit and the data transfer circuit and is used to supply output data of the first and second vector operation pipes to the data transfer circuit, or to supply input data from the data transfer circuit to the first and second vector operation pipes,
wherein the first processing unit includes a data buffer disposed between the first and second vector operation pipes and the first data path and holding the output data or the input data,
each of the first and second vector operation pipes is configured to be able to access the data buffer regardless of whether the other vector operation pipe is accessing the data buffer and regardless of whether the first data path is in use, and
the data buffer includes a first data buffer arranged between the first vector operation pipe and the first data path, and a second data buffer arranged between the second vector operation pipe and the first data path.

The vector operation processing apparatus according to claim 1, wherein
the first data buffer is configured to hold output data of the first vector operation pipe,
the second data buffer is configured to hold output data of the second vector operation pipe, and
the first processing unit further includes a selection circuit that selectively supplies the data held in the first data buffer or the second data buffer to the first data path.

The vector operation processing apparatus according to claim 1, wherein
the first processing unit further includes an issue control unit that issues instructions to the first vector operation pipe,
the first vector operation pipe includes a first register that stores data used for operations in that operation pipe and operation result data,
the first data buffer is configured to hold input data to the first vector operation pipe, and
the issue control unit issues a subsequent instruction to the first vector operation pipe, regardless of whether the first vector operation pipe is executing a preceding instruction that causes data to be written from the data transfer circuit to the first register via the first data buffer, on condition that neither the data transfer from the data transfer circuit to the first data buffer nor the data transfer from the first data buffer to the first register caused by execution of the preceding instruction is being performed.

The vector operation processing apparatus according to claim 3, further comprising:
a first flag indicating whether a write path for writing data to the first register is in use; and
a second flag indicating whether the first data path for transferring data from the data transfer circuit to the first data buffer is in use,
wherein the issue control unit determines, by referring to the first and second flags, that neither the data transfer from the data transfer circuit to the first data buffer nor the data transfer from the first data buffer to the first register is being performed.

The vector operation processing apparatus according to claim 4, wherein, when the first flag is not set and the data transfer from the data transfer circuit to the first data buffer has been started, the issue control unit starts the data transfer from the data buffer to the first register and sets the first flag.

The vector operation processing apparatus according to claim 4 or 5, wherein the second flag is set in response to a control signal that is supplied from the data transfer circuit to the first processing unit and indicates the start of the data transfer from the data transfer circuit to the first data buffer.

The vector operation processing apparatus according to any one of claims 1 to 6, wherein the data buffer makes the output of the output data by the first and second vector operation pipes independent of the transfer of the output data to the first data path.

The vector operation processing apparatus according to any one of claims 1 to 6, wherein the data buffer separates the supply of the input data from the first data path to the data buffer from the supply of the input data from the data buffer to the first and second vector operation pipes.

The vector operation processing apparatus according to any one of claims 1 to 8, wherein the data transfer circuit is a crossbar switch.

A vector operation processing method performed in a vector processing apparatus comprising:
a first processing unit including first and second vector operation pipes;
a second processing unit;
a data transfer circuit configured to transfer data between the first processing unit and the second processing unit; and
a first data path that connects the first processing unit and the data transfer circuit and is used to supply output data of the first and second vector operation pipes to the data transfer circuit, or to supply input data from the data transfer circuit to the first and second vector operation pipes,
wherein the first processing unit includes a data buffer disposed between the first and second vector operation pipes and the first data path and holding the output data or the input data, and
the data buffer includes a first data buffer arranged between the first vector operation pipe and the first data path, and a second data buffer arranged between the second vector operation pipe and the first data path,
the method comprising:
accessing the data buffer, by each of the first and second vector operation pipes, regardless of whether the other vector operation pipe is accessing the data buffer and regardless of whether the first data path is in use.