JPH02235174A

JPH02235174A - Bus matrix

Info

Publication number: JPH02235174A
Application number: JP3229690A
Authority: JP
Inventors: Leslie D Kohn
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 1989-02-10
Filing date: 1990-02-13
Publication date: 1990-09-18
Also published as: DE4001232A1; DE4001232C2

Abstract

PURPOSE: To execute dual operations and improve processor performance by providing a multiplier that multiplies its operand inputs, an adder that adds its operand inputs, and data path controllers connected to those operand inputs. CONSTITUTION: A controller 23, consisting of a multiplexer, selects whether operand 1 (OP1) of the multiplier 24 is KR from register 22, KI from register 21, or SRC1 from line 20. A controller 25 selects whether OP2 is SRC2 from line 26 or the final result of the adder 32 from line 34. Controllers 31 and 33 select the required inputs for OP1 and OP2 of the adder 32. The result of the adder 32 is delivered on line 34 to one of the 32 floating-point registers. This yields a bus matrix with which dual operations are executed in the floating-point unit of the processor.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of the Invention]

The present invention relates to semiconductor microprocessors, and more particularly to bus matrices that perform dual arithmetic operations.

[Background of the Invention]

The present invention relates to a bus matrix circuit that forms part of the floating-point unit of a microprocessor. The microprocessor used in connection with the present invention is the Intel 860™ microprocessor, also referred to as the N10™ processor. (Intel is a registered trademark of Intel Corporation.) The N10 processor is a 32/64-bit IEEE-compatible floating-point processor, a 32-bit RISC integer processor, and a 64-bit 3D graphics processor. As a numeric processor optimized for both vector and scalar operations, it is the industry's first integrated high-performance vector processor: a single chip containing more than one million transistors and delivering about half the performance of the Cray-1. The N10 processor uses pipelined floating-point units to achieve very high execution speeds.

As discussed below, the present invention provides a bus matrix that is fully optimized for the floating-point hardware of the N10 processor. This bus matrix supports simultaneous (dual) operation of the multiplier unit and the adder unit. These dual operations support the most commonly used software algorithms, such as sum of products, DAXPY, and FFT.

In general, microprocessor instructions specify that the source operands and the destination come from a set of floating-point registers. In most devices, this register set supplies two source operands and one destination operand. This three-operand arrangement is sufficient for a simple addition or multiplication. A dual operation, however, such as performing an addition and a multiplication simultaneously, requires three more operands (six in all). Because a floating-point register file that handles six operands is not very efficient, conventional microprocessors typically perform dual operations sequentially: they first perform the addition and then the multiplication, or vice versa.

An alternative to sequential operation is to perform the multiplication and the addition in parallel. A widely adopted form of this approach is the multiply-accumulate operation. In a multiply-accumulate operation, the multiplier obtains its two source operands from the floating-point register file, and one operand input of the adder receives the multiplier's result output.

The adder's other source operand input is connected to the adder's own result output, forming a kind of feedback arrangement. The computation performed in this way is essentially the accumulation of a sum of products. Multiply-accumulate, however, has the drawback that it can perform only simple operations such as a sum of products, because the interconnections are usually fixed, hard-wired structures. Supporting a variety of algorithms requires a substantially more general device that can handle a wider range of operations, and it is advantageous for such a device to have a bus matrix that can support complex algorithms very efficiently. As described below, the present invention enables a wide range of parallel operations, or algorithms, to be executed efficiently. This capability increases the performance of the microprocessor described here compared with conventional processors.
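The fixed multiply-accumulate wiring described above can be modeled in a few lines. This is a behavioral sketch only, under the assumption that pipeline latency can be ignored; the function name `multiply_accumulate` is hypothetical. It shows why the structure is limited: the adder's second input is always its own previous result, so only a running sum of products can be expressed.

```python
def multiply_accumulate(a, b):
    """Sum of products using fixed multiply-accumulate wiring (behavioral sketch).

    The multiplier always reads two register-file sources; one adder input is
    hard-wired to the multiplier output and the other to the adder's own
    previous result (the feedback path). Pipeline latency is ignored.
    """
    if len(a) != len(b):
        raise ValueError("operand vectors must have the same length")
    acc = 0.0  # adder feedback value, initially zero
    for x, y in zip(a, b):
        product = x * y      # multiplier: two register-file source operands
        acc = acc + product  # adder: feedback input plus multiplier output
    return acc


print(multiply_accumulate([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

An operation such as k*V1 + V2, where the addend is a fresh register operand on every step, cannot be written with this wiring because the adder never sees a register-file operand directly.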

[Overview of the invention]

A bus matrix for performing dual arithmetic operations in a microprocessor capable of executing floating-point operations is described. The microprocessor supplies first and second floating-point source instruction operands and a floating-point destination register. A multiplier multiplies first and second operands to produce a first result. An adder adds third and fourth operands to produce a second result. The invention also includes a register device that stores the real and imaginary parts of a constant used in the inner-loop computations of a given algorithm, and that temporarily stores the first result produced by the multiplier. A data path controller selects one of a plurality of operands and connects it to each operand input of the multiplier and of the adder, so as to implement a given algorithm in parallel. This feature of the invention allows multiplication and addition to be performed in parallel, so that the microprocessor can support a wide variety of algorithms.

Finally, the plurality of operands (the first result, the second result, the first and second source operands, the constants, and the temporarily stored first result) are interconnected in the bus matrix so that they connect to the inputs of the data path controllers. For a given algorithm, the data path controllers determine which particular operand is connected to the appropriate input of the multiplier or the adder. For example, the embodiment of the invention shows 16 different data paths supporting 16 different software instructions, or algorithms.

Embodiments of the present invention are described below with reference to the accompanying drawings.

[Embodiment]

A floating-point bus matrix used to perform parallel arithmetic operations is described. In the following description, specific details such as particular data paths are set forth to aid understanding of the invention; it will be apparent to those skilled in the art that the invention is not limited to these details. Detailed descriptions of well-known structures and circuits, such as adders and multipliers, are omitted so as not to obscure the invention.

In many modern microprocessor architectures, the floating-point unit uses parallelism to increase the rate at which operations can be delivered to it. One kind of parallelism is called pipelining. A pipelined architecture treats each operation as a series of more primitive operations (called "stages") that can be executed in parallel. Consider, for example, the floating-point adder unit of the processor. Let A denote an adder operation and A1, A2, A3 its stages. The stages are defined so that stage A(i+1) of one adder instruction can execute in parallel with stage Ai of the next adder instruction, and each Ai executes in exactly one clock. The pipelines of the processor's multiplier and vector-integer unit can be described in the same way, except that they have different numbers of stages.

FIG. 1 illustrates the three-stage pipelining found in the floating-point adder (and in the floating-point multiplier when single-precision input operands are used) of a processor incorporating the present invention. Each column of the figure shows one of the three stages of the pipeline.
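The stage occupancy of FIG. 1 can be tabulated with a short sketch. It assumes one new instruction of the same type issues every clock, and, following the text, that the pipeline advances only when a pipelined instruction issues (so it is not drained here); the function name is hypothetical. Row c corresponds to clock c, and column s to stage A(s+1).

```python
def pipeline_occupancy(num_instructions, num_stages=3):
    """Return, per clock, which instruction index occupies each stage.

    Instruction i enters stage A1 at clock i and moves one stage per clock,
    so at clock c stage s holds instruction c - s (None if no instruction
    has reached that stage yet).
    """
    rows = []
    for clock in range(num_instructions):
        row = []
        for stage in range(num_stages):
            instr = clock - stage
            row.append(instr if instr >= 0 else None)
        rows.append(row)
    return rows


for clock, row in enumerate(pipeline_occupancy(4)):
    print(clock, row)
```

From the third clock onward, all three stages are busy with three consecutive instructions, which is the overlap that pipelining is meant to provide.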

Each stage holds an intermediate result, together with status information about that result (when such information is supplied to the first stage by software). In the figure, the instruction stream consists of a series of consecutive floating-point instructions, all of one type (that is, all adder instructions or all single-precision multiplier instructions).

The temporal order of the instructions is denoted i, i+1, i+2, and so on. The rows of the figure represent the state of the unit in successive clock cycles. Each time a pipelined operation is performed, the status of the last stage becomes available in a status register (in the N10 processor, for example, it is available in the floating-point status register "fsr"). The result of the last stage of the pipeline is stored in rdest, the pipeline advances by one stage, and the input operands src1 and src2 are transferred to the first stage of the pipeline.

The mnemonics src1, src2, and rdest each denote one of the 32 floating-point registers of the N10 processor.

In the N10 processor, pipelines have from one to three stages. A pipelined operation on a three-stage pipeline stores the result of the operation issued three operations earlier; on a two-stage pipeline, the result of the operation issued two operations earlier; and on a one-stage pipeline, the result of the previous operation. The N10 processor has floating-point pipelines for the multiplier, the adder, and the vector-integer unit. The adder pipeline has three stages. The multiplier pipeline has two or three stages, depending on the precision of its source operands (three for single-precision operands, as noted for FIG. 1). The vector-integer unit has one stage for all precisions.

The load pipeline has three stages for all precisions.

FIG. 2 shows an embodiment of the invention. The floating-point bus matrix of FIG. 2 includes a multiplier unit 24 and an adder unit 32. The internal construction of units 24 and 32 is well known in the art and is not described here; briefly, they consist of conventional digital multipliers and adders. This embodiment uses the multiplier unit disclosed in two U.S. patent applications assigned to the applicant of the present invention, entitled "4-2 Adder Cell for Parallel Multiplication" and "Sticky Bit Predictor for Floating-Point Multiplication." The adder unit of this embodiment is disclosed in two U.S. patent applications assigned to the applicant of the present invention, entitled "Prenormalization for a Floating-Point Adder" and "Rounding Logic for a Floating-Point Adder."

As shown, the bus matrix also includes three special registers: a KR register 22, a KI register 21, and a T register 30 (KI denotes the imaginary part of a constant, KR the real part of a constant, and T a temporary). These registers can store values from one dual-operation instruction and supply them as inputs to subsequent dual-operation instructions. The constant registers 22 and 21 can be used to store the real and imaginary parts, respectively, of the operand src1; these values can then be supplied to the multiplier pipeline in place of src1. The T (temporary) register 30 can store the result of the last stage of the multiplier pipeline and later supply that value to the adder pipeline in place of src1.
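One reason to hold a constant's real and imaginary parts in separate registers is that multiplying a complex value by a complex constant reduces to real multiplies in which the constant is always one factor. The sketch below shows that decomposition in plain Python; it is an illustration of the arithmetic (of the kind an FFT butterfly performs with a twiddle factor), assuming `kr` and `ki` play the roles of KR and KI, and is not the N10 instruction sequence. The function name is hypothetical.

```python
def complex_scale(kr, ki, xr, xi):
    """Multiply x = xr + i*xi by the constant k = kr + i*ki using real ops only.

    Each of the four products has kr or ki as one factor, which is the role
    the KR and KI registers play as multiplier operand 1.
    """
    real = kr * xr - ki * xi
    imag = kr * xi + ki * xr
    return real, imag


# Cross-check against Python's built-in complex arithmetic.
k, x = complex(0.5, -2.0), complex(3.0, 4.0)
assert complex(*complex_scale(k.real, k.imag, x.real, x.imag)) == k * x
print(complex_scale(0.5, -2.0, 3.0, 4.0))  # (9.5, -4.0)
```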

FIG. 2 also shows data path controllers 23, 25, 31, and 33, which are used to select the operand inputs of the multiplier unit and the adder unit. Each controller (represented in FIG. 2 by a single horizontal line) typically comprises a switching device such as a multiplexer or a controllable bus. This embodiment uses conventional multiplexers well known in the art.

In operation, one operand is selected from several candidates (indicated by the arrows pointing toward the horizontal line that represents a data path controller) and connected to either the multiplier or the adder. For example, depending on which algorithm is being supported, data path controller 23 supplies the imaginary value of the constant stored in KI, the real value of the constant stored in KR, or the source operand src1 to the first operand input of multiplier unit 24.

In this embodiment, multiplexers 23, 25, 31, and 33 are controlled by a 4-bit data path control field (DPC) in the opcode. The DPC specifies the loading of the special registers and the operand selections.

FIG. 2 shows the complete bus interconnection matrix used to implement all of the algorithms supported by this embodiment of the invention. Operand 1 of multiplier unit 24 is selected to be KR supplied from register 22, KI from register 21, or src1 supplied along line 20; which of these values becomes multiplier operand 1 (op1) is determined by the particular encoding of the DPC. Similarly, multiplier operand 2 (op2) is either src2, supplied from line 26, or the result of the last stage of the adder pipeline, appearing on line 34; controller 25 determines which of these two values becomes operand 2. Operand 1 of the adder is src1, connected to line 20, the temporary result value stored in T register 30, or the result of the last stage of the adder pipeline along line 34; controller 31 selects the appropriate data path for the operand 1 input of adder unit 32. Finally, operand 2 of adder 32 is selected to be src2 from line 26, the result of the last stage of the multiplier pipeline on line 27, or the result of the last stage of the adder pipeline supplied on line 34. The controller (multiplexer device) 33 is directed by the DPC to select which input operand becomes operand 2 of adder unit 32. The result supplied by adder unit 32 along line 34 is the rdest value, which is delivered to one of the processor's 32 floating-point registers.
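The four operand-source sets listed above can be written down directly. The sketch below uses hypothetical labels (`adder_last` for the adder's last-stage result on line 34, `mult_last` for the multiplier's last-stage result on line 27) and counts the raw combinations the wiring could express. A 4-bit DPC can name only 2^4 = 16 of them; which 16 are used is given by Table I, so the sketch does not guess an encoding, and loading of the special registers is not modeled.

```python
from itertools import product

# Operand sources wired to each data path controller (multiplexer) in FIG. 2.
MUX_SOURCES = {
    "mult_op1": ("KR", "KI", "src1"),                  # controller 23
    "mult_op2": ("src2", "adder_last"),                # controller 25
    "adder_op1": ("src1", "T", "adder_last"),          # controller 31
    "adder_op2": ("src2", "mult_last", "adder_last"),  # controller 33
}


def select(mux, choice, values):
    """Return the operand a multiplexer passes through for a given choice."""
    if choice not in MUX_SOURCES[mux]:
        raise ValueError(f"{choice!r} is not wired to {mux}")
    return values[choice]


# Every combination the wiring could express, before the DPC restricts it.
all_paths = list(product(*MUX_SOURCES.values()))
print(len(all_paths))  # 54
```

For instance, the matrix-inversion path of FIG. 3 (KR and src2 into the multiplier, src1 and the multiplier result into the adder) is the tuple `("KR", "src2", "src1", "mult_last")` in this list.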

Table I shows how the various DPC encodings select different data paths and thereby support different algorithms. Each DPC value has its own unique set of mnemonics associated with it. PFAM and PFSM are the mnemonics of the dual-operation instructions "pipelined floating-point add and multiply" and "pipelined floating-point subtract and multiply." The actual data paths implemented for the dual-operation instructions designated by the mnemonics in Table I are shown in FIGS. 3 to 18.

Table I  DPC Encoding

* When K-load is set, KR is loaded if multiplier operand 1 is KR, and KI is loaded if multiplier operand 1 is KI.
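The K-load rule can be stated as a small function. This is a sketch under two assumptions: the value loaded is src1 (the description says the constant registers store the real and imaginary parts of operand src1), and the function name and argument names are hypothetical. It only decides which constant register changes; when the new value reaches the multiplier relative to the current operation is not addressed here.

```python
def k_load(mult_op1, k_load_bit, src1, kr, ki):
    """Return the (KR, KI) pair after applying the K-load rule.

    If K-load is set, the constant register selected as multiplier operand 1
    is loaded (assumed here to be loaded with src1); otherwise both keep
    their values. If operand 1 is src1, neither constant register changes.
    """
    if k_load_bit:
        if mult_op1 == "KR":
            kr = src1
        elif mult_op1 == "KI":
            ki = src1
    return kr, ki


print(k_load("KI", True, 5.0, 1.0, 2.0))  # (1.0, 5.0)
```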

For example, consider a programmer who wants to perform matrix inversion. In the present invention, this is done using the software instruction r2p1, whose actual data path is shown in FIG. 3.

When matrix inversion is performed, the inner loop of the algorithm is expressed by the following relationship:

$$k V_1 + V_2 \rightarrow V_2$$

where k is a real constant and V1 and V2 are vector elements. Performing matrix inversion involves multiplying each element of one vector by a constant and adding the product to the corresponding element of a second vector, with the result stored back into the storage location of the second vector. To support this instruction, in FIG. 3 the KR register is connected directly to the op1 input of the multiplier unit, and the other multiplier input (op2) is connected to the floating-point instruction operand src2. The output of the multiplier is connected to the op2 input of the adder, and the adder's op1 input is connected to the src1 instruction operand of the floating-point unit. The src1 and src2 operands therefore supply the vector elements of the above expression: src2 is the element multiplied by k (V1), and src1 is the element added to the product (V2). The result from the adder is placed in the rdest register and becomes the new value of V2.
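The r2p1 data path just described can be traced element by element. This is a minimal sketch, assuming pipeline latency can be ignored so each result is written back before the next element is processed; the function name `r2p1_update` is hypothetical. Each step reads src2 from V1 and src1 from V2, exactly as the FIG. 3 connections route them.

```python
def r2p1_update(kr, v1, v2):
    """Compute v2[j] <- kr * v1[j] + v2[j] following the r2p1 data path.

    Multiplier: op1 = KR (the constant k), op2 = src2 (element of v1).
    Adder:      op1 = src1 (element of v2), op2 = multiplier result.
    Pipeline latency is ignored; v2 is updated in place and returned.
    """
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    for j in range(len(v2)):
        src1, src2 = v2[j], v1[j]
        mult_result = kr * src2
        v2[j] = src1 + mult_result  # rdest: new value of the V2 element
    return v2


print(r2p1_update(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```

In a Gaussian-elimination step, for example, k would be the negated ratio of the entry being eliminated to the pivot, V1 the pivot row, and V2 the row being updated.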

Matrix inversion is a good example of a dual operation involving an inner-loop constant: it can be performed with the bus matrix of the present invention, but it cannot easily be performed with a conventional multiply-accumulate operation. The multiply-accumulate approach is generally less desirable because it is less accurate, harder to program, and slower to produce results. (Note that the conventional multiply-accumulate operation is provided by the m12apm software instruction supported by the bus matrix of the present invention.) The present invention is not limited to the embodiment shown here, and it will be apparent to those skilled in the art that various modifications can be made without departing from the spirit of the invention.

For example, although a variety of supportable algorithms are shown here, other matrix connections may be used to support other algorithms.

As described above, the present invention provides an improved bus matrix for executing dual-operation instructions in the floating-point unit of a microprocessor.

[Brief Description of the Drawings]

FIG. 1 shows the pipeline architecture, with its three pipeline stages, of a processor related to the floating-point bus matrix of the present invention.
FIG. 2 shows an embodiment of the bus matrix of the present invention.
FIGS. 3 to 18 show the actual data paths selected for the pairs of mnemonics listed in Table I, each mnemonic representing a particular software instruction:
FIG. 3: r2p1, r2s1
FIG. 4: r2pt, r2st
FIG. 5: r2ap1, r2as1
FIG. 6: r2apt, r2ast
FIG. 7: i2p1, i2s1
FIG. 8: i2pt, i2st
FIG. 9: i2ap1, i2as1
FIG. 10: i2apt, i2ast
FIG. 11: rat1p2, rat1s2
FIG. 12: m12apm, m12asm
FIG. 13: ra1p2, ra1s2
FIG. 14: m12ttpa, m12ttsa
FIG. 15: iat1p2, iat1s2
FIG. 16: m12tpm, m12tsm
FIG. 17: ia1p2, ia1s2
FIG. 18: m12tpa, m12tsa

Reference numerals: 21, KI register; 22, KR register; 23, 25, 31, 33, data path controllers; 24, multiplier unit; 30, T register; 32, adder unit.

Claims


(1) A dual arithmetic bus matrix comprising: a multiplier having first and second operand inputs and an output providing a first result; an adder having third and fourth operand inputs and an output providing a second result; and a data path controller that selects one of a plurality of operands, including the first result, the second result, and first and second source operands, and connects it to each operand input of the multiplier and of the adder so as to provide a predetermined algorithm.

(2) A processor that executes dual arithmetic instructions, the processor having a floating-point unit and supplying first and second floating-point instruction operands, comprising: a multiplier unit that multiplies first and second operands to produce a first result; an adder unit that adds third and fourth operands to produce a second result; a register device that stores a constant and temporarily stores the first result; and a multiplexer device, connected to the multiplier unit so that one of a plurality of operands becomes the first operand and another of the plurality of operands becomes the second operand, and connected to the adder unit so that one of the plurality of operands becomes the third operand and one of the plurality of operands becomes the fourth operand, so as to supply a predetermined dual-operation algorithm.

(3) In a floating-point section of a processor having a floating-point multiplier and a floating-point adder and supplying first and second floating-point instruction operands, a bus matrix for executing dual instructions, comprising: a register device that stores constants and the result of the last stage of the multiplier; and a data path device, associated with each operand input of the multiplier and of the adder, that connects one of a plurality of operands (the first and second instruction operands, the constants, the result of the last stage of the multiplier, the result of the current stage of the multiplier, and the result of the current stage of the adder) to the associated operand input, so as to provide a software algorithm by performing multiplication and addition operations simultaneously.