JP2003337693A

JP2003337693A - Fast processor

Info

Publication number: JP2003337693A
Application number: JP2002143337A
Authority: JP
Inventors: So Yamauchi; 宗山内
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-05-17
Filing date: 2002-05-17
Publication date: 2003-11-28

Abstract

PROBLEM TO BE SOLVED: To provide a fast processor with high processing performance that is not affected by latency.

SOLUTION: An accumulator is constituted by a pipeline-type adder 202, a shift register 204, and a shift-step number control section 206. The output of the pipeline-type adder 202 is connected to the shift register 204, and the output of the shift register 204 is connected to an input of the pipeline-type adder 202. The number of shift steps of the shift register 204 is controlled by the shift-step number control section 206.

COPYRIGHT: (C)2004,JPO

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a high-speed arithmetic unit. More specifically, it relates to the arithmetic units of computers and other electronic devices, and to a high-speed arithmetic unit designed to make such arithmetic units faster and more efficient.

[0002]

2. Description of the Related Art

Conventionally, in the arithmetic units of computers and other electronic devices, the logic circuits that make up the arithmetic circuit are finely divided by registers to form a pipeline. This achieves a high operating clock frequency, that is, high arithmetic throughput.

[0003] FIG. 3 is a block diagram showing an example of a conventional pipelined arithmetic circuit. The multiplier 101 multiplies the data input from the input line 102 and the input line 103 and outputs the result to the output line 104. The adder 105 adds the data output from the output line 104 and the output line 107 and outputs the result to the output line 106. The register 108 takes the data output from the output line 106 as its input and outputs data to the output line 107.

[0004] The operation of the pipelined arithmetic circuit shown in FIG. 3 will now be described. Consider computing the inner product of two vectors A and B. Assume that the interval from the moment the data on the output line 104 and the output line 107 enters the adder 105 until the addition result appears on the output line 107 completes in one clock. In that case, the elements of the two vectors A and B are fed to the input line 102 and the input line 103, respectively, one pair per clock. In the end, the inner product of the two vectors A and B is output on the output line 107.
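The FIG. 3 datapath under the one-clock assumption can be sketched in Python as follows. This is only an illustrative model, not circuitry from the patent: the function name and the use of a plain variable for the register 108 are assumptions.

```python
def inner_product_single_cycle(a, b):
    """Model FIG. 3 with a one-clock adder: one element pair enters per clock."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    register_108 = 0  # value fed back to the adder on output line 107
    for x, y in zip(a, b):
        product = x * y  # multiplier 101 drives output line 104
        # adder 105 drives output line 106, which is latched by register 108
        register_108 = product + register_108
    return register_108
```

Each loop iteration stands for one clock, which is only valid while the adder finishes within that clock.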

[0005] Japanese Unexamined Patent Publication No. 9-325953 addresses the problem of "providing a processor capable of processing multiply-accumulate operations at high speed" and discloses the following technique: "a processor comprising: an instruction register that stores an instruction; a register file that has a plurality of registers and from which data is read in parallel from two or more of the plurality of registers according to the instruction stored in the instruction register; an arithmetic unit that performs a predetermined operation, according to the instruction stored in the instruction register, on the data read in parallel from the registers of the register file; a cumulative register file that has M registers (M being an integer of 2 or more) and from which data is read from one or more of the M registers according to the instruction stored in the instruction register; and an adder that adds, according to the instruction stored in the instruction register, the data resulting from the operation of the arithmetic unit and the data read from the registers of the cumulative register file, wherein the data resulting from the addition by the adder is stored in a register of the cumulative register file according to the instruction stored in the instruction register."

[0006] Further, Japanese Unexamined Patent Publication No. 10-214261 addresses the problem of "improving the efficiency of accumulation processing, shortening the processing time, and increasing the speed of an accumulating parallel arithmetic processing device" and discloses the following technique: "an accumulating parallel arithmetic processing device in which a first continuous automatic address generator; a source data memory and a coefficient data memory; a first register that stores the vector data of the source data memory and the coefficient data memory at the address output from the first continuous automatic address generator; a pipeline arithmetic unit that operates on the vector data stored in the first register; a second register that stores the operation result of the pipeline arithmetic unit; an accumulator that accumulates the operation results stored in the second register; a third register that stores the accumulation result of the accumulator; a second continuous automatic address generator; and a destination data memory that stores the accumulation result held in the third register at the address output from the second continuous automatic address generator are arranged as a pipeline, wherein the reading of the vector data, the pipeline operation, the accumulation, and the data transfer are performed in parallel for each tap, and the initialization of the accumulator and the resetting of the addresses of the first and second continuous automatic address generators are performed in parallel every predetermined number of taps."

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

The prior art has the problems described below.

[0008] In the pipelined arithmetic circuit shown in FIG. 3, the latency of the adder 105 greatly affects multiply-accumulate performance. Latency means the time from when a request for data is issued until that data is returned. Saying that latency is a problem therefore means that the time until the data is returned is long and processing is slow.

[0009] That is, because the adder 105 accumulates its results in the register 108, it cannot perform the next addition until its current result has been output on the output line 107. Therefore, if m clocks (m > 1) are needed from the moment the data on the output line 104 and the output line 107 enters the adder 105 until the addition result appears on the output line 107, and the vectors A and B each have n elements, computing the inner product of A and B takes m * (n - 1) clocks. In other words, in operations such as accumulation, the output of the arithmetic unit is fed back to its input, so the latency of the arithmetic unit becomes an evident problem.
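To put numbers on this cost, the following Python sketch simply evaluates the m * (n - 1) clock count stated above and compares it with the one-clock case. The function name is hypothetical and the example values are arbitrary.

```python
def serial_accumulate_clocks(n, m):
    """Clocks for an n-element inner product when each dependent addition takes m clocks.

    Uses the m * (n - 1) figure given in the text; nothing else is modeled.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    return m * (n - 1)


# Arbitrary example: 1000 elements, adder latency of 4 clocks versus 1 clock.
slow = serial_accumulate_clocks(1000, 4)
fast = serial_accumulate_clocks(1000, 1)
slowdown = slow / fast  # equals m when n > 1
```

Under this count the slowdown relative to a one-clock adder is exactly the latency m, which is the effect the invention sets out to remove.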

[0010] The "processor and data processing device" of Japanese Unexamined Patent Publication No. 9-325953 provides the following effect: "when operations that include a summation are performed separately a plurality of times, a register of the cumulative register file can, for example, be assigned to each separately performed operation, the arithmetic unit can perform a partial computation of each operation, and the result can be accumulated (cumulatively added) using the assigned register and the adder. That is, instead of carrying out the summation all at once for each operation, the summations can be carried out in parallel by accumulating in the order of the operations and repeating this. As a result, this processor can compute efficiently without the idle time that arose in the prior art, and when the number of terms in the summation is small, it can complete the computation in less time than the prior art." Nevertheless, the latency problem remained.

[0011] The "accumulating parallel arithmetic processing device and method" of Japanese Unexamined Patent Publication No. 10-214261 provides "the advantageous effect that, by incorporating what was conventionally separate processing into a single pipeline as parallel processing, the processing time can be shortened when performing accumulation such as filter operations." However, the latency problem likewise remained.

[0012] The present invention has been made in view of the above problems of the prior art, and its object is to provide an arithmetic circuit with high processing performance that is free of the latency problem.

[0013]

MEANS FOR SOLVING THE PROBLEMS

The high-speed arithmetic unit of the first invention of the present application, which solves the above problems, comprises: arithmetic means that performs arithmetic processing on the operation result belonging to the same set as one set of input data selected from a plurality of sets of input data; storage means that holds the operation results of the arithmetic means classified by set; and operation result selection means that retrieves from the storage means and outputs the operation result belonging to a designated set.

[0014] Accordingly, the high-speed arithmetic unit of the first invention makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0015] In the high-speed arithmetic unit of the second invention of the present application, which solves the above problems, the arithmetic means of the first invention is an arithmetic circuit that performs one or more of the logical operations NOT, AND, OR, and exclusive OR.

[0016] Accordingly, the high-speed arithmetic unit of the second invention makes it possible to perform one or more of the logical operations NOT, AND, OR, and exclusive OR at high speed.

[0017] In the high-speed arithmetic unit of the third invention of the present application, which solves the above problems, the arithmetic means of the first invention is an arithmetic circuit that performs the four arithmetic operations on integers.

[0018] Accordingly, the high-speed arithmetic unit of the third invention makes it possible to perform the four arithmetic operations on integers at high speed.

[0019] In the high-speed arithmetic unit of the fourth invention of the present application, which solves the above problems, the arithmetic means of the first invention is an arithmetic circuit that performs the four arithmetic operations on floating-point numbers.

[0020] Accordingly, the high-speed arithmetic unit of the fourth invention makes it possible to perform the four arithmetic operations on floating-point numbers at high speed.

[0021] In the high-speed arithmetic unit of the fifth invention of the present application, which solves the above problems, the arithmetic means of the first invention is built by combining a plurality of arithmetic circuits that perform logical operations and the four arithmetic operations.

[0022] Accordingly, the high-speed arithmetic unit of the fifth invention makes high-speed operation possible even when a plurality of logical operations and arithmetic operations are combined.

[0023] In the high-speed arithmetic unit of the sixth invention of the present application, which solves the above problems, the arithmetic means of any of the first to fifth inventions is pipelined.

[0024] Accordingly, with the high-speed arithmetic unit of the sixth invention, instructions of a CPU or the like can be divided into multiple steps and processed in assembly-line fashion, and processing of the next instruction can begin before processing of the current instruction has finished. This makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0025] The high-speed arithmetic unit of the seventh invention of the present application, which solves the above problems, is the high-speed arithmetic unit of any of the first to sixth inventions configured so that the number of sets equals the total number of clocks obtained by adding: the time the arithmetic means requires for the operation; the time required to input the one set of input data; the time for the operation result to be stored in the storage means; and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.

[0026] Accordingly, the high-speed arithmetic unit of the seventh invention keeps the utilization of the arithmetic unit high and makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0027] The high-speed arithmetic unit of the eighth invention of the present application, which solves the above problems, is the high-speed arithmetic unit of any of the first to sixth inventions configured so that the number of sets equals a natural-number multiple of the total number of clocks obtained by adding: the time the arithmetic means requires for the operation; the time required to input the one set of input data; the time for the operation result to be stored in the storage means; and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.
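The set-count conditions of the seventh and eighth inventions can be written as a small check. The Python sketch below is only an illustration: the function names and the four delay parameters (all in clocks) are assumptions introduced here, not terms from the patent.

```python
def loop_clocks(op_clocks, input_clocks, store_clocks, select_clocks):
    """Total clocks around the feedback loop, as the sum of the four listed delays."""
    return op_clocks + input_clocks + store_clocks + select_clocks


def set_count_multiple(num_sets, op_clocks, input_clocks, store_clocks, select_clocks):
    """Return j if num_sets == j * (total loop clocks) for a natural number j, else None.

    j == 1 corresponds to the seventh invention; any natural number j
    corresponds to the eighth invention.
    """
    total = loop_clocks(op_clocks, input_clocks, store_clocks, select_clocks)
    if total <= 0 or num_sets <= 0:
        raise ValueError("clock totals and set counts must be positive")
    quotient, remainder = divmod(num_sets, total)
    return quotient if remainder == 0 else None
```

For example, with delays of 4, 1, 2, and 1 clocks the loop takes 8 clocks, so 8 sets satisfy the seventh invention's condition and 16 sets satisfy the eighth's, while 10 sets satisfy neither.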

[0028] Accordingly, the high-speed arithmetic unit of the eighth invention keeps the utilization of the arithmetic unit high and makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0029] In the high-speed arithmetic unit of the ninth invention of the present application, which solves the above problems, the storage means of the first, seventh, or eighth invention uses a first-in first-out (FIFO) storage element.

[0030] Accordingly, with the high-speed arithmetic unit of the ninth invention, data stored in the storage means, such as operation results, can be processed in the order in which it was stored. This makes it possible to configure the unit so that the number of sets equals the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.

[0031] In the high-speed arithmetic unit of the tenth invention of the present application, which solves the above problems, the storage means of the first, seventh, or eighth invention uses a shift register.

[0032] Accordingly, with the high-speed arithmetic unit of the tenth invention, the stored information can be shifted in response to a given signal. This makes it possible to configure the unit so that the number of sets equals the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.

[0033]

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention will be described. The high-speed arithmetic unit of the present invention is pipelined and can process a plurality of sets of operations at the same time.

[0034] The specific configuration is as follows. The high-speed arithmetic unit of the present invention comprises: arithmetic means that performs arithmetic processing on the operation result belonging to the same set as one set of input data selected from a plurality of sets of input data; storage means that holds the operation results of the arithmetic means classified by set; and operation result selection means that retrieves from the storage means and outputs the operation result belonging to a designated set. This makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0035] The arithmetic means is an arithmetic circuit that performs one or more of the logical operations NOT, AND, OR, and exclusive OR. Furthermore, the arithmetic means may be an arithmetic circuit that performs the four arithmetic operations on integers or on floating-point numbers, or it may be built by combining a plurality of arithmetic circuits that perform logical operations and the four arithmetic operations. As a result, one or more of the logical operations NOT, AND, OR, and exclusive OR can be performed at high speed, the four arithmetic operations on floating-point numbers can be performed at high speed, the four arithmetic operations on integers can be performed at high speed, and high-speed operation is possible even when a plurality of logical operations and arithmetic operations are combined.

[0036] The arithmetic means is pipelined. As a result, instructions of a CPU or the like can be divided into multiple steps and processed in assembly-line fashion, and processing of the next instruction can begin before processing of the current instruction has finished. This makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0037] The high-speed arithmetic unit is configured so that the number of sets equals the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means. This keeps the utilization of the arithmetic unit high and makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0038] Alternatively, the unit may be configured so that the number of sets equals a natural-number multiple of the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means. This configuration also keeps the utilization of the arithmetic unit high and makes it possible to realize an arithmetic circuit with high processing performance that is free of the latency problem.

[0039] The storage means uses a first-in first-out (FIFO) storage element. Data stored in the storage means, such as operation results, can therefore be processed in the order in which it was stored. This makes it possible to configure the unit so that the number of sets equals the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.

[0040] The storage means may also be built from a shift register. The stored information can then be shifted in response to a given signal. This makes it possible to configure the unit so that the number of sets equals the total number of clocks obtained by adding the time the arithmetic means requires for the operation, the time required to input the one set of input data, the time for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means.

[0041] With the configuration described above, the high-speed arithmetic unit of the present invention performs arithmetic processing on a plurality of sets of input data. In operations of the type that feed the operation result back to the input, even when the operation takes multiple clocks, the next set of input data can be fed into the arithmetic unit before the operation on one set of input data has finished. Therefore, even for operations that take multiple clocks, the utilization of the arithmetic unit stays high and high processing performance can be obtained.

[0042]

EXAMPLE

FIG. 1 is a block diagram of one embodiment of the high-speed arithmetic unit of the present invention, shown for the case of building an accumulator. The accumulator consists of a pipelined adder 202, a shift register 204, and a shift stage number control unit 206. The output of the pipelined adder 202 is connected to the shift register 204, and the output of the shift register 204 is connected to one input of the pipelined adder 202. The shift stage number control unit 206 is arranged so that it can control the number of shift stages of the shift register 204.

[0043] The arithmetic means described above, which performs arithmetic processing on the operation result belonging to the same set as one set of input data selected from the plurality of sets of input data, is here the pipelined adder 202. The storage means, which holds the operation results of the arithmetic means classified by set, is here the shift register 204. The operation result selection means, which retrieves from the storage means and outputs the operation result belonging to a designated set, is here the shift stage number control unit 206.

[0044] The pipelined adder 202 has two inputs: the input line 201, on which the data to be accumulated arrives, and the output line 205, on which the accumulated intermediate result is fed back. Its output side is connected to the output line 203, which carries the addition result.

[0045] The shift register 204 has two inputs: the output line 203 and the control line 207 driven by the shift stage number control unit 206. Its output side is connected to the output line 205.

[0046] The shift stage number control unit 206 has the stage number input line 208 as its input, and its output side is connected to the control line 207.

[0047] Next, the operation of the circuit of FIG. 1 will be described with reference to FIG. 1. The data to be accumulated is input from the input line 201. Assume that the data to be accumulated comprises k sets. This corresponds, for example, to summing each of k vectors to obtain k scalar values as the result. In this case, the number of stages of the shift register 204 is controlled so that the time from when data enters on the input line 201 until the operation result is output on the output line 205 is k clocks.

[0048] For example, when the latency of the pipeline-type adder 202 is m clocks, and the delays of the output line 203 and the output line 205 are sufficiently shorter than one clock period, the number of stages of the shift register 204 is set to (k - m). However, when the number of sets k is smaller than m, the pipeline-type arithmetic unit 202 itself inherently cannot deliver its performance, so this case is basically excluded from processing. With this configuration, the problem can also be avoided by computing another data set at the same time so that k > m.

[0049] Setting the number of stages of the shift register 204 to (k - m), according to the number of data sets k and the latency m of the pipeline-type arithmetic unit 202, is the role of the shift stage number control unit 206. Based on the value input from the stage number input line 208 and the latency m of the pipeline-type adder 202, it sets, via the control line 207, the number of stages by which the shift register 204 shifts.
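As a minimal illustration of the setting described above, the following Python sketch computes the stage count that the shift stage number control unit 206 would apply. The function name `shift_stage_count` and the choice to raise an error when k < m are assumptions made for this example; the specification only states that such a case is basically not processed.

```python
def shift_stage_count(k: int, m: int) -> int:
    """Return the number of shift register stages, k - m.

    k: number of data sets (value on the stage number input line 208)
    m: latency of the pipeline-type adder 202, in clocks
    Hypothetical helper; raising on k < m is an assumption of this sketch.
    """
    if k < 1 or m < 1:
        raise ValueError("k and m must be positive")
    if k < m:
        # The text excludes this case unless another data set is merged in.
        raise ValueError("k < m: combine with another data set first")
    return k - m
```

For the FIG. 2 example (k = 4, m = 2) this gives 2 stages.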

[0050] The shift stage number control unit 206 also has the function of outputting a control signal on the control line 207 so that the output of the shift register 204 is fixed to zero during the first k clocks of the accumulation. In addition, because the result of the final addition of the accumulation does not need to be stored in the shift register 204, the shift stage number control unit 206 can also output a control signal on the control line 207 so that the shift register 204 bypasses the output of the output line 203 directly to the output line 205.

[0051] FIG. 2 is a timing diagram showing the operation of the circuit of FIG. 1. FIG. 2 takes as an example the computation of the sum of each row of a matrix 301. The matrix 301 has 4 rows and 4 columns. There are therefore four sets of processing, each summing four values (that is, k = 4). If the latency m of the pipeline-type arithmetic unit 202 is 2, the number of stages by which the shift register 204 shifts is k - m = 2. The timing diagram 302 shows the timing of the input line 201, the output line 203, and the output line 205 in that case.

[0052] Because the rows of the matrix 301 are multiplexed for processing, data is fed to the input line 201 in the following order: the first-column data of each row during the first four clocks, the second-column data of each row during the next four clocks, and so on. Because the number of shift stages is controlled to match the number of rows in this case, that is, the number of sets k, each time as many values as there are rows have been fed to the input line 201, the intermediate results accumulated so far appear on the output line 205.
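The multiplexed operation described above can be sketched as a clock-level simulation in Python. This is an illustrative model under stated assumptions, not the circuit itself: the function name `interleaved_row_sums` is hypothetical, the adder and the shift register are each modeled as a simple delay line, wire delays are ignored, and the optional bypass of the final results is not modeled.

```python
from collections import deque

def interleaved_row_sums(rows, m):
    """Sum each row of `rows` with one adder of latency m (clock-level model)."""
    k = len(rows)        # number of sets (rows)
    n = len(rows[0])     # vector length (columns)
    if m < 1 or k < m:
        raise ValueError("this sketch requires 1 <= m <= k")

    adder = deque([0] * m)        # pipeline stages inside adder 202
    shift = deque([0] * (k - m))  # shift register 204 with k - m stages
    sums = []

    for t in range(k * n + m):    # feed k*n inputs, then drain the adder
        line_203 = adder.popleft()            # result of the input fed m clocks ago
        if shift:
            line_205 = shift.popleft()        # result of the input fed k clocks ago
            shift.append(line_203)
        else:
            line_205 = line_203               # k == m: no extra stages needed
        feedback = 0 if t < k else line_205   # fixed to zero for the first k clocks

        if t < k * n:
            x = rows[t % k][t // k]           # one element per row, column by column
            adder.append(x + feedback)
        else:
            adder.append(0)                   # no more input; keep the pipeline moving

        if t - m >= (n - 1) * k:              # line 203 now carries a final row sum
            sums.append(line_203)
    return sums
```

With a 4-by-4 matrix and m = 2, as in FIG. 2, the model feeds one element of each row per group of four clocks, and each partial sum returns to the adder exactly when the next element of the same row arrives.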

[0053] In this way, the pipeline-type arithmetic unit 202 can keep computing without idling. As a result, the accumulation of k sets of vectors completes after (k * n + k - 1) clocks, where n is the vector length. Without the present invention, it is necessary to wait (m + 1) clocks for each addition, requiring (m + 1) * n * k clocks in total.
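The two clock counts stated above can be compared numerically. The following Python sketch simply evaluates the formulas as given in the text; the function names are hypothetical and chosen for this example only.

```python
def clocks_interleaved(k: int, n: int) -> int:
    """Clock count with interleaving, as stated in the text: k*n + k - 1."""
    return k * n + k - 1

def clocks_without_interleaving(k: int, n: int, m: int) -> int:
    """Clock count without the invention, as stated in the text: (m + 1)*n*k."""
    return (m + 1) * n * k

# FIG. 2 example: k = 4 sets, vector length n = 4, adder latency m = 2
print(clocks_interleaved(4, 4))              # 19
print(clocks_without_interleaving(4, 4, 2))  # 48
```

Note that the interleaved count does not depend on m, whereas the count without interleaving grows in proportion to (m + 1).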

[0054] In this embodiment of the present invention, k sets of accumulation are processed in parallel. Therefore, even when the latency of the pipeline-type adder 202 spans multiple clocks, so that k clocks elapse from when data enters on the input line 201 until the operation result appears on the output line 205, additions for other sets can be multiplexed and executed in the meantime, and the utilization of the pipeline-type adder 202 can be kept high.

[0055] That is, when multiple sets of product-sum operations are required, as in matrix-matrix multiplication, applying the invention as the accumulator that forms the product-sum makes it possible to pipeline the multiple sets of product-sums efficiently.

[0056]

[Effects of the Invention] The high-speed arithmetic unit of the present invention comprises arithmetic means for performing arithmetic processing on the operation result belonging to the same set as one set of input data selected from a plurality of sets of input data, storage means for classifying and holding the operation results of the arithmetic means by set, and operation result selection means for retrieving the operation result belonging to a designated set from the storage means and outputting it. As a result, even when the pipelined arithmetic unit requires multiple clocks per operation and its data is fed back, the utilization of the arithmetic unit is kept high and high processing performance is obtained. This makes it possible to realize an arithmetic circuit with high processing performance that does not suffer from the effects of latency.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment (accumulator) of the high-speed arithmetic unit of the present invention.

FIG. 2 is a timing diagram of the accumulator of FIG. 1.

FIG. 3 is a block diagram of a conventional pipeline-type arithmetic circuit.

[Explanation of symbols]

101 multiplier
102 input line
103 input line
104 output line
105 adder
106 output line
107 output line
108 register
201 input line
202 pipeline-type adder
203 output line
204 shift register
205 output line
206 shift stage number control unit
207 control line
208 stage number input line
301 matrix
302 timing diagram

Claims

[Claims]

1. A high-speed arithmetic unit comprising: arithmetic means for performing arithmetic processing on an operation result belonging to the same set as one set of input data selected from a plurality of sets of input data; storage means for classifying and holding the operation results of the arithmetic means by set; and operation result selection means for retrieving an operation result belonging to a designated set from the storage means and outputting it.

2. The high-speed arithmetic unit according to claim 1, wherein the arithmetic means is an arithmetic circuit that performs one or more logical operations among logical NOT, logical AND, logical OR, and exclusive OR.

3. The high-speed arithmetic unit according to claim 1, wherein the arithmetic means is an arithmetic circuit for performing four arithmetic operations of integers.

4. The high-speed arithmetic unit according to claim 1, wherein said arithmetic means is an arithmetic circuit for performing four arithmetic operations of floating point.

5. The high-speed arithmetic unit according to claim 1, wherein the arithmetic means is configured by combining a plurality of arithmetic circuits for performing logical operations and four arithmetic operations.

6. The high-speed arithmetic unit according to claim 1, wherein the arithmetic means is pipelined.

7. The high-speed arithmetic unit according to claim 1, wherein the number of clocks obtained by summing the time required for the arithmetic means to perform an operation, the time required for the operation result to be stored in the storage means, and the time required for the operation result selection means to select the operation result from the storage means and for that result to reach the input of the arithmetic means is equal to the number of sets.

8. A time required for the calculation means to perform a calculation and a time required for inputting the set of input data,
A natural number of clocks, which is the sum of the time required for the calculation result to be stored in the storage means and the time required for the calculation result selection means to select the calculation result from the storage means and reach the input of the calculation means. 7. The number of sets is the same as the number of sets.
The described high-speed computing unit.

9. The high-speed arithmetic unit according to claim 1, wherein said memory means comprises a first-in first-out memory device.

10. The high-speed arithmetic unit according to claim 1, wherein said storage means comprises a shift register.