JP4347352B2

JP4347352B2 - Vector data processing device

Info

Publication number: JP4347352B2
Application number: JP2007037404A
Authority: JP
Inventors: 師人中込; 俊彦中村
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 2007-02-19
Filing date: 2007-02-19
Publication date: 2009-10-21
Anticipated expiration: 2027-02-19
Also published as: JP2008204028A

Description

The present invention relates to a vector data processing apparatus used in a computer.

Vector data processing apparatuses used in computers are known. FIG. 1 shows the configuration of a conventional vector data processing apparatus. The conventional apparatus includes crossbar input registers 101-1 and 101-2, a crossbar 102, crossbar output registers 103-1 and 103-2, write data registers 104-1 and 104-2, vector registers 105-1 and 105-2, read data registers 106-1 and 106-2, arithmetic input registers 107-1 and 107-2, an arithmetic unit 108, and an arithmetic output register 109. These components operate in response to a clock.

The crossbar input registers 101-1 and 101-2 are connected to the crossbar 102. The crossbar input register 101-2 is also connected to other resources. The crossbar 102 is connected to the crossbar output registers 103-1 and 103-2. The crossbar output registers 103-1 and 103-2 are connected to the write data registers 104-1 and 104-2, respectively. The write data registers 104-1 and 104-2 are connected to the vector registers 105-1 and 105-2, respectively. The vector registers 105-1 and 105-2 are connected to the read data registers 106-1 and 106-2, respectively. The read data registers 106-1 and 106-2 are connected to the arithmetic input registers 107-1 and 107-2, respectively. The arithmetic input registers 107-1 and 107-2 are connected to the arithmetic unit 108. The arithmetic unit 108 is connected to the arithmetic output register 109. The arithmetic output register 109 is connected to the crossbar input register 101-1.

The vector register 105-1 stores n vector data elements A = {a1, a2, a3, ..., an}, numbered from the first to the n-th (n is an integer of 3 or more).

The vector register 105-1 is implemented, for example, as a RAM. It stores the n vector data elements A = {a1, a2, a3, ..., an} at the respective element positions according to a write address WA1 and, according to a read address RA1, outputs them to the arithmetic unit 108 via the read data register 106-1 and the arithmetic input register 107-1.

The vector register 105-2 stores n vector data elements B = {b1, b2, b3, ..., bn}, numbered from the first to the n-th.

The vector register 105-2 is implemented, for example, as a RAM. It stores the n vector data elements B = {b1, b2, b3, ..., bn} at the respective element positions according to a write address WA2 and, according to a read address RA2, outputs them to the arithmetic unit 108 via the read data register 106-2 and the arithmetic input register 107-2.

The arithmetic unit 108 has a pipelined configuration. Although several machine cycles are required from the input of vector data to the output of a result, the unit accepts different vector data A = {a1, a2, a3, ..., an} and B = {b1, b2, b3, ..., bn} every machine cycle and, after the first several machine cycles, generates and outputs an operation result (result vector data) C = {c1, c2, c3, ..., cn} every machine cycle. That is, it includes a plurality of arithmetic sections so that different operations can be executed in parallel.

The arithmetic unit 108 receives the output of the arithmetic input register 107-1 (the n vector data elements {a1, a2, a3, ..., an}) as the first operand and the output of the arithmetic input register 107-2 (the n vector data elements {b1, b2, b3, ..., bn}) as the second operand. It operates sequentially on each of the n vector data elements {a1, a2, a3, ..., an} and the n vector data elements {b1, b2, b3, ..., bn} to generate n operation results {c1, c2, c3, ..., cn}, and outputs them sequentially to the arithmetic output register 109.

Assume here that an instruction specifies that the n operation results {c1, c2, c3, ..., cn} be stored in the vector register 105-2. In this case, the arithmetic output register 109 outputs the n operation results {c1, c2, c3, ..., cn} to the crossbar 102 via the crossbar input register 101-1.

The crossbar 102 routes the n operation results {c1, c2, c3, ..., cn} so that they are stored in the vector register 105-2, which is the one of the vector registers 105-1 and 105-2 specified by the instruction, and stores them sequentially in the vector register 105-2 via the crossbar output register 103-2 and the write data register 104-2.

Thus, the conventional vector data processing apparatus provides a plurality of vector registers 105-1 and 105-2, each able to store (hold) a large amount of vector data at once, and provides a plurality of arithmetic sections in the arithmetic unit 108 as arithmetic resources so that different operations can be executed in parallel. To store the output of the arithmetic unit 108 in the vector register specified by the instruction (vector register 105-2), the apparatus has required a crossbar 102 that serves both the plurality of vector registers 105-1 and 105-2 and the plurality of arithmetic sections and routes each item of vector data.

However, the conventional vector data processing apparatus has the following problems.

The first problem is as follows. When the number of vector elements (the number of vector data items) processed at one time is large, the time required for routing in the crossbar 102 is rarely a problem. When the number of vector elements to be processed is small, however, the routing time in the crossbar 102 becomes overhead that degrades performance.
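A back-of-the-envelope model of this first problem is sketched below. It assumes a fully pipelined path in which the first element needs `fixed_latency` cycles to reach its destination register and every later element follows one cycle behind. The function names and the latency figure of 8 cycles used in the usage note are illustrative assumptions, not values taken from this specification.

```python
def total_cycles(n_elements: int, fixed_latency: int) -> int:
    """Cycles until the last result is stored, for a fully pipelined path.

    fixed_latency is the number of cycles the first element needs to travel
    the whole path (an assumed figure); each further element completes one
    cycle after its predecessor.
    """
    if n_elements < 1:
        raise ValueError("n_elements must be at least 1")
    return fixed_latency + (n_elements - 1)


def overhead_fraction(n_elements: int, fixed_latency: int) -> float:
    """Share of the total time that is fixed latency rather than streaming.

    An ideal path with no latency would need n_elements cycles, so the
    extra cost is fixed_latency - 1 cycles regardless of vector length.
    """
    return (fixed_latency - 1) / total_cycles(n_elements, fixed_latency)
```

With an assumed latency of 8 cycles, a 4-element vector takes 11 cycles, 7 of which are fixed latency, while a 256-element vector takes 263 cycles with the same 7 cycles of fixed latency. The fixed part therefore dominates only for short vectors, which is the situation the paragraph above describes.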

The second problem is as follows. The vector registers 105-1 and 105-2 store many elements. Consequently, if write ports or read ports are added while keeping the conventional large-capacity RAM, the area of the RAM itself increases. As a result, the crossbar 102 also grows accordingly, and the routing time increases further.

It is therefore desirable to process large amounts of vector data as is done at present while processing small amounts of vector data at higher speed.

Techniques related to vector data are introduced below.

Japanese Patent Laid-Open No. 9-282308 (Patent Document 1) describes a "vector instruction control method". This method drives and controls a vector unit built into an LSI (Large Scale Integration) chip that makes up a vector information processing apparatus. It is characterized in that the LSI chip containing the vector unit includes a vector unit that executes vector instructions of short vector length, a scalar unit that executes scalar instructions, and an arithmetic unit that is included in, and shared by, both the vector unit and the scalar unit.

Japanese Patent Laid-Open No. 9-198374 (Patent Document 2) describes a "vector processing device". The device includes a plurality of vector arithmetic processing units, each having a plurality of vector registers, at least one vector arithmetic unit, and at least one data transfer circuit, and processes a single vector instruction by dividing it among the vector arithmetic processing units. The device is characterized by including determination means for determining whether a plurality of different vector instruction sequences can be processed in parallel, dividing the plurality of vector arithmetic processing units into a plurality of groups, assigning one of the different vector instruction sequences to each group, and processing the different vector instruction sequences in parallel.

Japanese Patent Laid-Open No. 2001-273277 (Patent Document 3) describes an "arithmetic processing system". The system processes operations that use data vectors, each containing a plurality of data elements. It includes a vector data file containing a plurality of storage elements for storing the data elements of data vectors, and a pointer array coupled to the vector data file by a bus. The pointer array contains a plurality of entries, each identifying at least one storage element in the vector data file for storing at least one data element of a data vector. For at least one specific entry of the pointer array, the at least one storage element identified by that entry has an arbitrary starting address in the vector data file.

Japanese Patent Laid-Open No. 5-233279 (Patent Document 4) describes an "information processing device". The device executes a first type of program, which uses a plurality of groups of scalar registers defined in correspondence with a plurality of register windows, and a second type of program, which uses a plurality of vector registers. The device includes: a predetermined number of registers for realizing the plurality of groups of scalar registers and the plurality of vector registers, including a plurality of registers each used as one of the scalar registers of the plurality of groups and also each used as one of a plurality of groups of element registers that respectively constitute the plurality of vector registers; means for holding the register window number specified by the first-type program being executed; a first determination circuit, connected to the holding means, which, during execution of the first-type program and in response to an instruction issued by that program that uses at least one register, determines which one of the predetermined number of registers is to be used as the scalar register having the register number specified by the instruction in the register window having the held register window number; a second determination circuit which, during execution of the second-type program and in response to an instruction issued by that program that uses at least one register, determines which group of the predetermined number of registers is to be used as the group of element registers constituting the vector register having the register number specified by the instruction; and an access circuit, connected to the first and second determination circuits, which accesses the one register determined by the first determination circuit and the group of registers determined by the second determination circuit.

Japanese Patent Laid-Open No. 1-191265 (Patent Document 5) describes a "vector operation instruction activation method". In this method, the vector unit starts operating based on activation information from the scalar unit. The method is characterized in that the activation time is sent from the scalar unit to the vector unit through the pipeline before the execution stage of the vector operation instruction; the vector unit, based on the activation time, completes decoding of the operation code and reading of the first element from the vector register before the execution stage of the vector operation instruction; and the vector operation is started when the start command sent from the scalar unit at the execution stage of the vector instruction is received.

Patent Document 1: JP-A-9-282308
Patent Document 2: JP-A-9-198374
Patent Document 3: JP-A-2001-273277
Patent Document 4: JP-A-5-233279
Patent Document 5: JP-A-1-191265

An object of the present invention is to provide a vector data processing apparatus that processes large amounts of vector data as is done at present and can process small amounts of vector data at higher speed.

In the following, means for solving the problems are described using, in parentheses, the reference numerals used in the best mode and embodiments for carrying out the invention. These reference numerals are added to clarify the correspondence between the claims and the description of the best mode and embodiments, and must not be used to interpret the technical scope of the invention described in the claims.

The vector data processing apparatus of the present invention includes:
a register file (11) having a first young-number (that is, lowest-numbered) storage section (addresses "0" and "1") that stores the first through j-th first vector data ({a1, a2}) among n first vector data ({a1, a2, a3, ..., an}) numbered from the first to the n-th (n is an integer of 3 or more; j is an integer satisfying j < (n - j)), and a second young-number storage section (addresses "2" and "3") that stores the first through j-th second vector data ({b1, b2}) among n second vector data ({b1, b2, b3, ..., bn}) numbered from the first to the n-th;
a first vector register (5-1) that stores the (n - j) first vector data ({a3, ..., an}) other than the j first vector data ({a1, a2});
a second vector register (5-2) that stores the (n - j) second vector data ({b3, ..., bn}) other than the j second vector data ({b1, b2});
an arithmetic unit (8) that operates sequentially on each of the n first vector data ({a1, a2, a3, ..., an}) and the n second vector data ({b1, b2, b3, ..., bn}) to generate n operation results ({c1, c2, c3, ..., cn});
a young-number write data register (10-2) that sequentially stores, in the register file (11), the first through j-th operation results ({c1, c2}) among the n operation results ({c1, c2, c3, ..., cn}); and
a crossbar (2) that sequentially stores the (n - j) operation results ({c3, ..., cn}) other than the j operation results ({c1, c2}) in the designated vector register (5-2) of the first and second vector registers (5-1, 5-2).
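The storage split stated in this configuration can be illustrated with a short sketch. It divides an n-element vector into the j low-numbered elements, which go to the register file, and the remaining (n - j) elements, which go to the vector register. The function name `split_vector` is hypothetical, and the requirement j >= 1 is an assumption; the text only states n >= 3 and j < (n - j).

```python
def split_vector(elements, j):
    """Split a vector into (register-file part, vector-register part).

    The first j elements are the low-numbered ("young-number") part; the
    remaining n - j elements are held by the vector register.
    """
    n = len(elements)
    if n < 3:
        raise ValueError("n must be an integer of 3 or more")
    # j >= 1 is assumed here; the stated condition is only j < (n - j).
    if not (1 <= j < n - j):
        raise ValueError("j must satisfy 1 <= j < n - j")
    return list(elements[:j]), list(elements[j:])
```

For example, `split_vector(["a1", "a2", "a3", "a4", "a5"], 2)` yields `["a1", "a2"]` for the register file and `["a3", "a4", "a5"]` for the vector register, matching the j = 2 case used in the reference numerals above.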

The vector data processing apparatus of the present invention further includes:
a first selection circuit (12-1) that sequentially outputs to the arithmetic unit (8) the j first vector data ({a1, a2}) stored in the first young-number storage section (addresses "0" and "1") of the register file (11), and thereafter sequentially outputs to the arithmetic unit (8) the (n - j) first vector data ({a3, ..., an}) stored in the first vector register (5-1); and
a second selection circuit (12-2) that sequentially outputs to the arithmetic unit (8) the j second vector data ({b1, b2}) stored in the second young-number storage section (addresses "2" and "3") of the register file (11), and thereafter sequentially outputs to the arithmetic unit (8) the (n - j) second vector data ({b3, ..., bn}) stored in the second vector register (5-2).

The vector data processing apparatus of the present invention further includes:
a first arithmetic input register (7-1) connected between the first selection circuit (12-1) and the arithmetic unit (8);
a second arithmetic input register (7-2) connected between the second selection circuit (12-2) and the arithmetic unit (8); and
an arithmetic output register (9) connected between the arithmetic unit (8) and the crossbar (2).

The vector data processing apparatus of the present invention further includes:
first and second crossbar input registers (1-1, 1-2) connected between the arithmetic output register (9) and the crossbar (2) in correspondence with the first and second vector registers (5-1, 5-2);
first and second crossbar output registers (3-1, 3-2) connected between the crossbar (2) and the first and second vector registers (5-1, 5-2); and
first and second write data registers (4-1, 4-2) connected between the first and second crossbar output registers (3-1, 3-2) and the first and second vector registers (5-1, 5-2).

In the vector data processing apparatus of the present invention, the young-number (lowest-numbered) elements of the operation result (operation results {c1, c2}) do not pass through the crossbar (2) used for routing. The results can therefore be settled in the result storage registers (register file (11), vector register (5-2)) in a short time.
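The result-side routing described in this paragraph can be sketched as a simple classification of each result by its element number. The function name `route_results` and the path labels `"bypass"` and `"crossbar"` are hypothetical names introduced for illustration; the default j = 2 follows the {c1, c2} example in the text.

```python
def route_results(results, j=2):
    """Label each result with the path it takes to its storage location.

    Results 1..j go straight to the register file without passing through
    the crossbar ("bypass"); results j+1..n are routed through the crossbar
    to the designated vector register ("crossbar").
    """
    routed = []
    for number, value in enumerate(results, start=1):
        path = "bypass" if number <= j else "crossbar"
        routed.append((value, path))
    return routed
```

A short vector of length at most j would, under this sketch, never touch the crossbar at all, which is the case in which the saving is largest.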

Furthermore, in the present invention, a small-capacity multi-port register file (11) is provided for the young-number elements (vector data {a1, a2}, {b1, b2}), and the vector registers (5-1, 5-2) for the large-capacity portion (vector data {a3 to an}, {b3 to bn}) are built from RAMs with a small number of ports. Combining the two allows the vector data processing apparatus to be built with only a small increase in hardware.

Thus, the vector data processing apparatus of the present invention processes large amounts of vector data as is done at present and can process small amounts of vector data at higher speed.

The vector data processing apparatus of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 2 shows the configuration of the vector data processing apparatus of the present invention. The apparatus includes crossbar input registers 1-1 and 1-2, a crossbar 2, crossbar output registers 3-1 and 3-2, write data registers 4-1 and 4-2, vector registers 5-1 and 5-2, read data registers 6-1 and 6-2, arithmetic input registers 7-1 and 7-2, an arithmetic unit 8, an arithmetic output register 9, young-number write data registers 10-1 and 10-2 (hereinafter, write data registers 10-1 and 10-2), a register file 11, and selection circuits 12-1 and 12-2. These components operate in response to a clock.

The crossbar input registers 1-1 and 1-2 are connected to the crossbar 2. The crossbar input register 1-2 is also connected to other resources (for example, a main storage device). The crossbar 2 is connected to the crossbar output registers 3-1 and 3-2. The crossbar output registers 3-1 and 3-2 are connected to the write data registers 4-1 and 4-2, respectively. The write data registers 4-1 and 4-2 are connected to the vector registers 5-1 and 5-2, respectively. The vector registers 5-1 and 5-2 are connected to the read data registers 6-1 and 6-2, respectively. The read data registers 6-1 and 6-2 are connected to the selection circuits 12-1 and 12-2, respectively.
The write data registers 10-1 and 10-2 are connected to the register file 11. The write data register 10-1 is also connected to other resources. The register file 11 is connected to the selection circuits 12-1 and 12-2.
The selection circuits 12-1 and 12-2 are connected to the arithmetic input registers 7-1 and 7-2, respectively. The arithmetic input registers 7-1 and 7-2 are connected to the arithmetic unit 8. The arithmetic unit 8 is connected to the arithmetic output register 9. The arithmetic output register 9 is connected to the crossbar input register 1-1 and the write data register 10-2.
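The connections listed above can be written down as a directed graph to compare the two write-back paths from the arithmetic output register 9. This is a sketch only: the edge directions are an assumption inferred from the described data flow (the text says only "connected"), the node labels reuse the reference numerals as strings, and a hop count is not the same thing as a cycle count. `CONNECTIONS` and `hops` are hypothetical names.

```python
from collections import deque

# Assumed data-flow direction for each stated connection in FIG. 2.
CONNECTIONS = {
    "1-1": ["2"], "1-2": ["2"],
    "2": ["3-1", "3-2"],
    "3-1": ["4-1"], "3-2": ["4-2"],
    "4-1": ["5-1"], "4-2": ["5-2"],
    "5-1": ["6-1"], "5-2": ["6-2"],
    "6-1": ["12-1"], "6-2": ["12-2"],
    "10-1": ["11"], "10-2": ["11"],
    "11": ["12-1", "12-2"],
    "12-1": ["7-1"], "12-2": ["7-2"],
    "7-1": ["8"], "7-2": ["8"],
    "8": ["9"],
    "9": ["1-1", "10-2"],
}


def hops(source, target):
    """Breadth-first search: fewest connections from source to target."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for nxt in CONNECTIONS.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None  # target not reachable from source
```

Under these assumptions, `hops("9", "11")` is 2 (via the write data register 10-2), while `hops("9", "5-2")` is 5 (via the crossbar input register, crossbar, crossbar output register, and write data register), which reflects why the low-numbered results can be stored sooner.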

The register file 11 has a first young-number storage section and a second young-number storage section.
The first young-number storage section stores the first through j-th first vector data (j is an integer satisfying j < (n - j)) among n first vector data A = {a1, a2, a3, ..., an} numbered from the first to the n-th (n is an integer of 3 or more).
The second young-number storage section stores the first through j-th second vector data among n second vector data B = {b1, b2, b3, ..., bn} numbered from the first to the n-th.
Here, j is 2, the j first vector data are the vector data {a1, a2}, and the j second vector data are the vector data {b1, b2}. In this case, the first young-number storage section is addresses "0" and "1" of the register file 11, and the second young-number storage section is addresses "2" and "3" of the register file 11.

The register file 11 has a 2-read, 2-write configuration.
For writing, the register file 11 stores the small amount of vector data {a1, a2} at its addresses "0" and "1" by writing the register data of the write data register 10-1 or the write data register 10-2 according to the write address W1 or the write address W2. Likewise, it stores the small amount of vector data {b1, b2} at its addresses "2" and "3" by writing the register data of the write data register 10-1 or the write data register 10-2 according to the write address W1 or the write address W2. Here, the vector data in the write data register 10-1 is controlled by the write address W1, and the vector data in the write data register 10-2 is controlled by the write address W2.
For reading, the register file 11 sequentially reads the vector data stored at addresses "0" and "1" according to the read address R1 and supplies it to the selection circuit 12-1, and sequentially reads the vector data stored at addresses "2" and "3" according to the read address R2 and supplies it to the selection circuit 12-2.
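A toy behavioural model of the 2-read, 2-write register file for the j = 2 case is sketched below. The class name `RegisterFile`, the `cycle` method, and the rule that reads see the values held before the same cycle's writes are all assumptions made for illustration; the text specifies only the port counts and the address assignment.

```python
class RegisterFile:
    """Four-entry register file with two write ports and two read ports."""

    FIRST_YOUNG = (0, 1)   # addresses holding a1, a2
    SECOND_YOUNG = (2, 3)  # addresses holding b1, b2

    def __init__(self):
        self.cells = [None] * 4

    def cycle(self, writes=(), reads=()):
        """Perform one clock cycle.

        writes: iterable of (address, value), at most two (ports W1, W2).
        reads:  iterable of addresses, at most two (ports R1, R2).
        Returns the values read, in the order requested.
        """
        writes = list(writes)
        reads = list(reads)
        if len(writes) > 2 or len(reads) > 2:
            raise ValueError("only two write ports and two read ports exist")
        if len({address for address, _ in writes}) != len(writes):
            raise ValueError("both write ports target the same address")
        # Assumption: reads return the contents from before this cycle's writes.
        values = [self.cells[address] for address in reads]
        for address, value in writes:
            self.cells[address] = value
        return values
```

With two ports of each kind, a1 and b1 can be written in one cycle and a2 and b2 in the next, and the two operand streams can then be read out side by side, one element of each per cycle.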

The vector register 5-1 stores the (n-2) vector data {a3, ..., an} other than the two vector data {a1, a2}.

The vector register 5-1 is implemented as a RAM with a 1-read, 1-write configuration.
For writing, the vector register 5-1 sequentially stores the (n-2) vector data {a3, a4, ..., an}, held as the vector data of the write data register 4-1, at addresses "0" to "n-2" according to the write address WA1.
For reading, the vector register 5-1 reads the large amount of vector data {a3, a4, ..., an} stored at addresses "0" to "n-2" according to the read address RA1 and supplies it to the read data register 6-1.

The vector register 5-2 stores the (n-2) vector data {b3, ..., bn} other than the two vector data {b1, b2}.

The vector register 5-2 is implemented as a RAM with a 1-read, 1-write configuration.
For writing, the vector register 5-2 sequentially stores the (n-2) vector data {b3, b4, ..., bn}, held as the vector data of the write data register 4-2, at addresses "0" to "n-2" according to the write address WA2.
For reading, the vector register 5-2 reads the large amount of vector data {b3, b4, ..., bn} stored at addresses "0" to "n-2" according to the read address RA2 and supplies it to the read data register 6-2.

The selection circuit 12-1 preferentially selects the output of the register file 11 between the output of the first young-number storage section (addresses "0" and "1") of the register file 11 and the output of the read data register 6-1. That is, the selection circuit 12-1 sequentially outputs the two vector data {a1, a2} stored in the first young-number storage section (addresses "0" and "1") of the register file 11 to the arithmetic unit 8 via the arithmetic input register 7-1, and thereafter sequentially outputs the (n-2) vector data {a3, ..., an} held in the read data register 6-1 to the arithmetic unit 8 via the arithmetic input register 7-1.

Of the output of the second low-number storage section (addresses “2” and “3”) of the register file 11 and the output of the read data register 6-2, the selection circuit 12-2 preferentially selects the output of the register file 11. That is, the selection circuit 12-2 first outputs the two vector data {b1, b2} stored in the second low-number storage section (addresses “2” and “3”) of the register file 11 sequentially to the arithmetic unit 8 via the operation input register 7-2, and then outputs the (n-2) vector data {b3, ..., bn} stored in the read data register 6-2 sequentially to the arithmetic unit 8.
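The ordering produced by the selection circuits can be illustrated with a minimal Python sketch. The function name `select_operands` and the element labels are assumptions made for illustration only; the sketch models the order of the elements, not the clocking of the hardware.

```python
from itertools import chain


def select_operands(register_file_part, read_data_part):
    """Return the operand order seen by one operation input register.

    The register-file elements have priority and come first; the
    read-data-register elements follow.
    """
    return list(chain(register_file_part, read_data_part))


# Hypothetical element labels for a short vector (n = 5).
first_operand = select_operands(["a1", "a2"], ["a3", "a4", "a5"])
second_operand = select_operands(["b1", "b2"], ["b3", "b4", "b5"])
print(first_operand)
print(second_operand)
```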

In this embodiment, the arithmetic unit 8 has a pipeline configuration and requires three machine cycles from the input to the output of vector data. It accepts a different element of the vector data A = {a1, a2, a3, ..., an} and B = {b1, b2, b3, ..., bn} in every machine cycle and, once the first three machine cycles have elapsed, generates and outputs one element of the operation result (result vector data) C = {c1, c2, c3, ..., cn} in every machine cycle. That is, it includes a plurality of operation sections so that different operations can be executed in parallel.
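The behaviour of a three-stage pipeline can be sketched as follows. This is an illustrative model, not the circuit of the embodiment: the source does not name the operation, so addition is assumed here, and the cycle numbers are relative to the first pair being fed in cycle 1.

```python
from collections import deque


def run_pipeline(pairs, op, latency=3):
    """Feed one operand pair per cycle and return {output_cycle: result}.

    A pair fed in cycle t appears at the output in cycle t + latency.
    """
    feed = deque(pairs)
    stages = deque([None] * latency)  # one slot per pipeline stage
    outputs = {}
    cycle = 0
    while feed or any(slot is not None for slot in stages):
        cycle += 1
        pair = feed.popleft() if feed else None
        stages.appendleft(None if pair is None else op(*pair))
        finished = stages.pop()  # value leaving the last stage
        if finished is not None:
            outputs[cycle] = finished
    return outputs


# Hypothetical operand values; addition stands in for the unspecified operation.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [10, 20, 30, 40, 50, 60, 70, 80, 90]
outputs = run_pipeline(zip(a, b), lambda x, y: x + y)
for cycle, value in sorted(outputs.items()):
    print(cycle, value)
```

After the initial latency, one result leaves the pipeline in every cycle, which is the property the paragraph above relies on.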

The arithmetic unit 8 receives the output of the operation input register 7-1 (the n vector data {a1, a2, a3, ..., an}) as the first operand and the output of the operation input register 7-2 (the n vector data {b1, b2, b3, ..., bn}) as the second operand. The arithmetic unit 8 sequentially operates on each pair of the n vector data {a1, a2, a3, ..., an} and the n vector data {b1, b2, b3, ..., bn} to generate n operation results {c1, c2, c3, ..., cn}, and sequentially outputs them to the operation output register 9.

Here, assume that an instruction specifies that, of the n operation results {c1, c2, c3, ..., cn}, the first two operation results {c1, c2} are to be stored in the register file 11 and the remaining (n-2) operation results {c3, ..., cn} are to be stored in the vector register 5-2. In this case, the operation output register 9 stores the two operation results {c1, c2} in the register file 11 via the write data register 10-2, and outputs the (n-2) operation results {c3, ..., cn} to the crossbar 2 via the crossbar input register 1-1.

The crossbar 2 routes the (n-2) operation results {c3, ..., cn} so that they are stored in the vector register 5-2, which is the one of the vector registers 5-1 and 5-2 specified by the instruction, and sequentially stores the (n-2) operation results {c3, ..., cn} in the vector register 5-2 via the crossbar output register 3-2 and the write data register 4-2.
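The two result paths described above can be summarized in a short Python sketch. The function `result_path` and its parameter `j` (the number of low-numbered results, 2 in this description) are hypothetical names introduced for illustration; the destination is fixed to the vector register 5-2 as in the example instruction.

```python
def result_path(k, j=2):
    """Return (destination, intermediate registers) for result c_k (1-based)."""
    if k < 1:
        raise ValueError("k is 1-based")
    if k <= j:
        # Low-numbered results bypass the crossbar.
        return "register file 11", ["write data register 10-2"]
    # Remaining results are routed through the crossbar.
    return "vector register 5-2", [
        "crossbar input register 1-1",
        "crossbar 2",
        "crossbar output register 3-2",
        "write data register 4-2",
    ]


for k in (1, 2, 3, 9):
    destination, via = result_path(k)
    print(f"c{k} -> {destination} via {', '.join(via)}")
```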

In the vector data processing apparatus of the present invention, the low-numbered elements of the operation result (operation results {c1, c2}) do not pass through the crossbar 2 used for routing, so they can be settled in the result storage registers (register file 11, vector register 5-2) in a short time.
In addition, the present invention provides a small-capacity multiport register file 11 for the low-numbered elements (vector data {a1, a2}, {b1, b2}) and combines it with RAMs having few ports for the large-capacity portion (vector data {a3 to an}, {b3 to bn}) to form the vector registers 5-1 and 5-2, so the vector data processing apparatus can be built with only a small increase in hardware.
This is described in detail below.

FIG. 3 is a timing chart showing the operation of the vector data processing apparatus of the present invention.

In this embodiment, a case is described in which an operation is performed on the nine-element vector data A = {a1, a2, a3, ..., a9} and B = {b1, b2, b3, ..., b9} to calculate nine operation results C = {c1, c2, c3, ..., c9}. Prior to the operation, the vector data are assumed to be stored as follows. For the vector data A, the vector data a1 and a2 are stored at addresses “0” and “1” of the register file 11, respectively, and the seven vector data a3 to a9 are stored at addresses “0” to “6” of the vector register 5-1, respectively. For the vector data B, the vector data b1 and b2 are stored at addresses “2” and “3” of the register file 11, respectively, and the seven vector data b3 to b9 are stored at addresses “0” to “6” of the vector register 5-2. It is also assumed that an instruction specifies that the operation results c1 and c2 are to be stored in the register file 11 and that the operation results c3 to c9 are to be stored in the vector register 5-2.
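The initial storage layout assumed in this example can be written out as a small Python sketch. The dictionary names are illustrative and the element labels are strings standing in for the actual data.

```python
n, j = 9, 2  # vector length and number of low-numbered elements in this example

A = [f"a{i}" for i in range(1, n + 1)]
B = [f"b{i}" for i in range(1, n + 1)]

# Register file 11: addresses 0..1 hold a1, a2 and addresses 2..3 hold b1, b2.
register_file = {addr: A[addr] for addr in range(j)}
register_file.update({j + addr: B[addr] for addr in range(j)})

# Vector registers 5-1 and 5-2: addresses 0..6 hold elements 3..9.
vector_register_5_1 = {addr: A[j + addr] for addr in range(n - j)}
vector_register_5_2 = {addr: B[j + addr] for addr in range(n - j)}

print(register_file)
print(vector_register_5_1)
print(vector_register_5_2)
```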

(In clock cycle “1”)
First, the register file 11 sets address “0” in the address register R1 corresponding to its first read port and address “2” in the address register R2 corresponding to its second read port. In accordance with the read addresses R1 and R2, the register file 11 simultaneously reads the vector data a1 and b1 stored at its addresses “0” and “2” and outputs them to the selection circuits 12-1 and 12-2, respectively.
The selection circuits 12-1 and 12-2 select the first and second read ports of the register file 11 (vector data a1 and b1), respectively, and store the vector data a1 and b1 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “2”.

(In clock cycle “2”)
The arithmetic unit 8 receives the output of the operation input register 7-1, whose value is settled at this time (that is, the vector data a1), as the first operand and the output of the operation input register 7-2 (that is, the vector data b1) as the second operand, and starts the operation. As described above, the arithmetic unit 8 of this embodiment has a pipeline configuration that divides the operation into three stages, and three clock cycles are required until the operation result is settled in the operation output register 9. The arithmetic unit 8 therefore operates on the vector data a1 and b1 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c1 in the operation output register 9 in clock cycle “5”, three clock cycles later.
The register file 11 sets address “1” in the address register R1 corresponding to its first read port and address “3” in the address register R2 corresponding to its second read port. In accordance with the read addresses R1 and R2, the register file 11 simultaneously reads the vector data a2 and b2 stored at its addresses “1” and “3” and outputs them to the selection circuits 12-1 and 12-2, respectively.
The selection circuits 12-1 and 12-2 select the first and second read ports of the register file 11 (vector data a2 and b2), respectively, and store the vector data a2 and b2 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “3”.
Furthermore, the vector register 5-1 sets address “0” in the address register RA1 that controls its reads, and the vector register 5-2 sets address “0” in the address register RA2 that controls its reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a3 and b3 stored at their addresses “0” and output them to the read data registers 6-1 and 6-2, respectively, in the next clock cycle, clock cycle “3”.

(In clock cycle “3”)
The arithmetic unit 8 operates on the vector data a2 and b2 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c2 in the operation output register 9 in clock cycle “6”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a3 and b3), respectively, and store the vector data a3 and b3 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “4”.
The vector registers 5-1 and 5-2 each set address “1” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a4 and b4 stored at their addresses “1” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “4”.
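The input-side timing of clock cycles “1” to “3” follows a regular pattern that continues for the remaining elements. The Python sketch below tabulates it under the assumptions of this example (n = 9, two low-numbered elements); the function `input_schedule` and its field names are hypothetical.

```python
def input_schedule(n=9, j=2):
    """List, per element k, where it is read and when it reaches the input registers."""
    rows = []
    for k in range(1, n + 1):
        if k <= j:
            # Read directly from the register file in cycle k.
            row = {"element": k, "source": "register file 11",
                   "address_set_cycle": k}
        else:
            # Address set in cycle k-1, read data register loaded in cycle k.
            row = {"element": k, "source": "vector registers 5-1, 5-2",
                   "address": k - j - 1, "address_set_cycle": k - 1}
        row["input_register_cycle"] = k + 1  # operation input registers 7-1, 7-2
        rows.append(row)
    return rows


schedule = input_schedule()
for row in schedule:
    print(row)
```

Because the vector register reads start in clock cycle “2”, while the register file is still supplying a2 and b2, the operand stream reaches the operation input registers without a gap.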

(In clock cycle “4”)
The arithmetic unit 8 operates on the vector data a3 and b3 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c3 in the operation output register 9 in clock cycle “7”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a4 and b4), respectively, and store the vector data a4 and b4 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “5”.
The vector registers 5-1 and 5-2 each set address “2” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a5 and b5 stored at their addresses “2” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “5”.

(In clock cycle “5”)
The arithmetic unit 8 operates on the vector data a4 and b4 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c4 in the operation output register 9 in clock cycle “8”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a5 and b5), respectively, and store the vector data a5 and b5 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “6”.
The vector registers 5-1 and 5-2 each set address “3” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a6 and b6 stored at their addresses “3” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “6”.
The operation output register 9 stores the operation result c1 that it holds in the write data register 10-2 in the next clock cycle, clock cycle “6”.

(In clock cycle “6”)
The arithmetic unit 8 operates on the vector data a5 and b5 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c5 in the operation output register 9 in clock cycle “9”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a6 and b6), respectively, and store the vector data a6 and b6 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “7”.
The vector registers 5-1 and 5-2 each set address “4” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a7 and b7 stored at their addresses “4” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “7”.
The operation output register 9 stores the operation result c2 that it holds in the write data register 10-2 in the next clock cycle, clock cycle “7”.
The register file 11 sets address “3” in the write address register W2 that controls its writes. In accordance with the write address W2, the register file 11 stores the operation result c1 held in the write data register 10-2 at its address “3” in the next clock cycle, clock cycle “7”.

(In clock cycle “7”)
The arithmetic unit 8 operates on the vector data a6 and b6 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c6 in the operation output register 9 in clock cycle “10”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a7 and b7), respectively, and store the vector data a7 and b7 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “8”.
The vector registers 5-1 and 5-2 each set address “5” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a8 and b8 stored at their addresses “5” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “8”.
The operation output register 9 stores the operation result c3 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “8”.
The register file 11 sets address “4” in the write address register W2 that controls its writes. In accordance with the write address W2, the register file 11 stores the operation result c2 held in the write data register 10-2 at its address “4” in the next clock cycle, clock cycle “8”.

(In clock cycle “8”)
The arithmetic unit 8 operates on the vector data a7 and b7 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c7 in the operation output register 9 in clock cycle “11”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a8 and b8), respectively, and store the vector data a8 and b8 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “9”.
The vector registers 5-1 and 5-2 each set address “6” in the address registers RA1 and RA2 that control their reads. In accordance with the read addresses RA1 and RA2, the vector registers 5-1 and 5-2 read the vector data a9 and b9 stored at their addresses “6” and output them to the read data registers 6-1 and 6-2 in the next clock cycle, clock cycle “9”.
The crossbar 2 routes the operation result c3 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “9”.
The operation output register 9 stores the operation result c4 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “9”.

(In clock cycle “9”)
The arithmetic unit 8 operates on the vector data a8 and b8 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c8 in the operation output register 9 in clock cycle “12”, three clock cycles later.
The selection circuits 12-1 and 12-2 select the read data registers 6-1 and 6-2 (vector data a9 and b9), respectively, and store the vector data a9 and b9 in the operation input registers 7-1 and 7-2 in the next clock cycle, clock cycle “10”.
The crossbar output register 3-2 stores the operation result c3 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “10”.
The crossbar 2 routes the operation result c4 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “10”.
The operation output register 9 stores the operation result c5 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “10”.

(In clock cycle “10”)
The arithmetic unit 8 operates on the vector data a9 and b9 stored in the operation input registers 7-1 and 7-2, respectively, and stores the operation result c9 in the operation output register 9 in clock cycle “13”, three clock cycles later.
The vector register 5-2 sets address “0” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c3 held in the write data register 4-2 at its address “0” in the next clock cycle, clock cycle “11”.
The crossbar output register 3-2 stores the operation result c4 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “11”.
The crossbar 2 routes the operation result c5 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “11”.
The operation output register 9 stores the operation result c6 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “11”.

(In clock cycle “11”)
The vector register 5-2 sets address “1” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c4 held in the write data register 4-2 at its address “1” in the next clock cycle, clock cycle “12”.
The crossbar output register 3-2 stores the operation result c5 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “12”.
The crossbar 2 routes the operation result c6 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “12”.
The operation output register 9 stores the operation result c7 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “12”.

(In clock cycle “12”)
The vector register 5-2 sets address “2” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c5 held in the write data register 4-2 at its address “2” in the next clock cycle, clock cycle “13”.
The crossbar output register 3-2 stores the operation result c6 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “13”.
The crossbar 2 routes the operation result c7 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “13”.
The operation output register 9 stores the operation result c8 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “13”.

(In clock cycle “13”)
The vector register 5-2 sets address “3” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c6 held in the write data register 4-2 at its address “3” in the next clock cycle, clock cycle “14”.
The crossbar output register 3-2 stores the operation result c7 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “14”.
The crossbar 2 routes the operation result c8 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “14”.
The operation output register 9 stores the operation result c9 that it holds in the crossbar input register 1-1 in the next clock cycle, clock cycle “14”.

(In clock cycle “14”)
The vector register 5-2 sets address “4” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c7 held in the write data register 4-2 at its address “4” in the next clock cycle, clock cycle “15”.
The crossbar output register 3-2 stores the operation result c8 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “15”.
The crossbar 2 routes the operation result c9 stored in the crossbar input register 1-1 and stores it in the crossbar output register 3-2 in the next clock cycle, clock cycle “15”.

(In clock cycle “15”)
The vector register 5-2 sets address “5” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c8 held in the write data register 4-2 at its address “5” in the next clock cycle, clock cycle “16”.
The crossbar output register 3-2 stores the operation result c9 that it holds in the write data register 4-2 in the next clock cycle, clock cycle “16”.

(In clock cycle “16”)
The vector register 5-2 sets address “6” in the address register WA2 that controls its writes. In accordance with the write address WA2, the vector register 5-2 stores the operation result c9 held in the write data register 4-2 at its address “6” in the next clock cycle, clock cycle “17”.
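The output-side timing of the walkthrough can be reproduced with a short Python sketch. It assumes one clock cycle per register stage and the three-cycle arithmetic latency stated for this embodiment, and it indexes the results by the instruction's designation (c1 and c2 to the register file 11, c3 to c9 through the crossbar 2 to the vector register 5-2). The function name `result_timeline` is hypothetical.

```python
def result_timeline(k, j=2, latency=3):
    """Return {register: clock cycle} for result c_k (1-based)."""
    operands_ready = k + 1             # a_k, b_k held in operation input registers
    out = operands_ready + latency     # c_k settled in operation output register 9
    if k <= j:
        stages = ["write data register 10-2", "register file 11"]
    else:
        stages = ["crossbar input register 1-1", "crossbar output register 3-2",
                  "write data register 4-2", "vector register 5-2"]
    timeline = {"operation output register 9": out}
    for offset, stage in enumerate(stages, start=1):
        timeline[stage] = out + offset  # one clock cycle per register stage
    return timeline


for k in range(1, 10):
    print(f"c{k}", result_timeline(k))
```

Under these assumptions c1 is stored in clock cycle “7” and c9 in clock cycle “17”, matching the first and last stores of the walkthrough.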

As described above, the vector data processing apparatus of the present invention provides the following effects.

First, the first effect is described. In the vector data processing apparatus of the present invention, the low-numbered elements of the operation result (operation results {c1, c2}) do not pass through the crossbar 2 used for routing. They can therefore be settled in the result storage registers (register file 11, vector register 5-2) in a short time.
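As a rough illustration of the first effect, the sketch below counts the register stages that follow the operation output register 9 on each path. It assumes one clock cycle per stage, as in the timing chart description; the constant and function names are hypothetical.

```python
# Register stages after operation output register 9 (assumed one clock cycle each).
BYPASS_STAGES = ["write data register 10-2", "register file 11"]
CROSSBAR_STAGES = [
    "crossbar input register 1-1",
    "crossbar output register 3-2",
    "write data register 4-2",
    "vector register 5-2",
]


def cycles_after_output(stages):
    """Clock cycles from the operation output register to the final store."""
    return len(stages)


saved = cycles_after_output(CROSSBAR_STAGES) - cycles_after_output(BYPASS_STAGES)
print(f"bypass: {cycles_after_output(BYPASS_STAGES)} cycles, "
      f"crossbar: {cycles_after_output(CROSSBAR_STAGES)} cycles, saved: {saved}")
```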

Next, the second effect is described. According to the present invention, a small-capacity multiport register file 11 is provided for the low-numbered elements (vector data {a1, a2}, {b1, b2}), and it is combined with RAMs having few ports for the large-capacity portion (vector data {a3 to a9}, {b3 to b9}) to form the vector registers 5-1 and 5-2. The vector data processing apparatus can therefore be built with only a small increase in hardware.

FIG. 1 shows the configuration of a conventional vector data processing apparatus.
FIG. 2 shows the configuration of the vector data processing apparatus of the present invention.
FIG. 3 is a timing chart showing the operation of the vector data processing apparatus of the present invention.

Explanation of symbols

1-1, 1-2 Crossbar input register,
2 Crossbar,
3-1, 3-2 Crossbar output register,
4-1, 4-2 Write data register,
5-1, 5-2 Vector register,
6-1, 6-2 Read data register,
7-1, 7-2 Operation input register,
8 Arithmetic unit,
9 Operation output register,
10-1, 10-2 Write data register (write data register for low-numbered elements),
11 Register file,
12-1, 12-2 Selection circuit,
RA1, RA2 Read address,
WA1, WA2 Write address,
R1, R2 Read address,
W1, W2 Write address,
101-1, 101-2 Crossbar input register,
102 Crossbar,
103-1, 103-2 Crossbar output register,
104-1, 104-2 Write data register,
105-1, 105-2 Vector register,
106-1, 106-2 Read data register,
107-1, 107-2 Operation input register,
108 Arithmetic unit,
109 Operation output register

Claims

A vector data processing apparatus comprising:
a register file having a first young-number storage unit that stores j first vector data, from the 1st to the jth (j is an integer satisfying j < (n−j)), out of n first vector data from the 1st to the nth (n is an integer of 3 or more), and a second young-number storage unit that stores j second vector data, from the 1st to the jth, out of n second vector data from the 1st to the nth;
a first vector register that stores the (n−j) first vector data other than the j first vector data;
a second vector register that stores the (n−j) second vector data other than the j second vector data;
an arithmetic unit that sequentially performs an operation on each of the n first vector data and the n second vector data to generate n operation results;
a young-number write data register that sequentially stores the j operation results, from the 1st to the jth of the n operation results, in the register file; and
a crossbar that sequentially stores the (n−j) operation results other than the j operation results in a designated one of the first and second vector registers.

The vector data processing apparatus according to claim 1, further comprising:
a first selection circuit that sequentially outputs the j first vector data stored in the first young-number storage unit of the register file to the arithmetic unit, and then sequentially outputs the (n−j) first vector data stored in the first vector register to the arithmetic unit; and
a second selection circuit that sequentially outputs the j second vector data stored in the second young-number storage unit of the register file to the arithmetic unit, and then sequentially outputs the (n−j) second vector data stored in the second vector register to the arithmetic unit.

The vector data processing apparatus according to claim 2, further comprising:
a first arithmetic input register connected between the first selection circuit and the arithmetic unit;
a second arithmetic input register connected between the second selection circuit and the arithmetic unit; and
an arithmetic output register connected between the arithmetic unit and the crossbar.

The vector data processing apparatus according to claim 3, further comprising:
first and second crossbar input registers connected between the arithmetic output register and the crossbar, in correspondence with the first and second vector registers;
first and second crossbar output registers connected between the crossbar and the first and second vector registers; and
first and second write data registers connected between the first and second crossbar output registers and the first and second vector registers.