JP6684713B2 - Method and microprocessor for performing a fused multiply-accumulate operation - Google Patents
Method and microprocessor for performing a fused multiply-accumulate operation
- Publication number
- JP6684713B2 (application JP2016538834A)
- Authority
- JP
- Japan
- Prior art keywords
- sum
- product
- unrounded
- result
- rounding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/487—Multiplying; Dividing
- G06F7/4876—Multiplying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/499—Denomination or exception handling, e.g. rounding or overflow
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/499—Denomination or exception handling, e.g. rounding or overflow
- G06F7/49905—Exception handling
- G06F7/4991—Overflow or underflow
- G06F7/49915—Mantissa overflow or underflow in handling floating-point numbers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/499—Denomination or exception handling, e.g. rounding or overflow
- G06F7/49942—Significance control
- G06F7/49947—Rounding
- G06F7/49957—Implementation of IEEE-754 Standard
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/22—Microcontrol or microprogram arrangements
- G06F9/223—Execution means for microinstructions irrespective of the microinstruction function, e.g. decoding of microinstructions and nanoinstructions; timing of microinstructions; programmable logic arrays; delays and fan-out problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
- G06F9/30014—Arithmetic instructions with variable precision
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/3017—Runtime instruction translation, e.g. macros
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30185—Instruction operation extension or modification according to one or more bits in the instruction, e.g. prefix, sub-opcode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/485—Adding; Subtracting
Description
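This application claims the benefit of U.S. Provisional Patent Application No. 62/020,246, filed July 2, 2014 and entitled "Non-Atomic Split-Path Fused Multiply-Accumulate with Rounding cache", and of U.S. Provisional Patent Application No. 62/173,808, filed June 10, 2015 and entitled "Non-Atomic Temporally-Split Fused Multiply-Accumulate Apparatus and Operation Using a Calculation Control Indicator Cache and Providing a Split-Path Heuristic for Performing a Fused FMA Operation and Generating a Standard Format Intermediate Result", both of which are incorporated herein by reference.
... and further receives an add/subtract accumulate operator indicator OS. In another implementation, the accumulator alignment and injection logic 220 selectively additively inverts CM when the add/subtract accumulate operator indicator OS indicates that the microinstruction received by the modified multiplier 45 is a multiply-subtract microinstruction.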
Hokenek, Montoye, Cook, "Second-Generation RISC Floating Point with Multiply-Add Fused", IEEE Journal Of Solid-State Circuits, Vol 25, No 5, Oct 1990.
Lang, Bruguera, "Floating-Point Multiply-Add-Fused with Reduced Latency", IEEE Trans On Computers, Vol 53, No 8, Aug 2004.
Bruguera, Lang, "Floating-Point Fused Multiply-Add: Reduced Latency for Floating-Point Addition", Pub TBD - Exact Title Important.
Vangal, Hoskote, Borkar, Alvanpour, "A 6.2-GFlops Floating-Point Multiply-Accumulator With Conditional Normalization", IEEE Jour. Of Solid-State Circuits, Vol 41, No 10, Oct 2006.
Galal, Horowitz, "Energy-Efficient Floating-Point Unit Design", IEEE Trans On Computers Vol 60, No 7, July 2011.
Srinivasan, Bhudiya, Ramanarayanan, Babu, Jacob, Mathew, Krishnamurthy, Erraguntla, "Split-path Fused Floating Point Multiply Accumulate (FPMAC)", 2013 Symp on Computer Arithmetic (paper).
Srinivasan, Bhudiya, Ramanarayanan, Babu, Jacob, Mathew, Krishnamurthy, Erraguntla, "Split-path Fused Floating Point Multiply Accumulate (FPMAC)", 2014 Symp on Computer Arithmetic, Austin TX, (slides from www.arithsymposium.org).
Srinivasan, Bhudiya, Ramanarayanan, Babu, Jacob, Mathew, Krishnamurthy, Erraguntla, United States Patent 8,577,948 (B2), Nov 5, 2013.
Quach, Flynn, "Suggestions For Implementing A Fast IEEE Multiply-Add-Fused Instruction", (Stanford) Technical Report CSL-TR-91-483 July, 1991.
Seidel, "Multiple Path IEEE Floating-Point Fused Multiply-Add", IEEE 2004.
Huang, Shen, Dai, Wang, "A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design", Pub TBD, Nat'l University of Defense Tech, China (after) 2006.
Paidimarri, Cevrero, Brisk, Ienne, "FPGA Implementation of a Single-Precision Floating-Point Multiply-Accumulator with Single-Cycle Accumulation", Pub TBD.
Henry, Elliott, Parks, "X87 Fused Multiply-Add Instruction", United States Patent 7,917,568 (B2), Mar 29, 2011.
Walaa Abd El Aziz Ibrahim, "Binary Floating Point Fused Multiply Add Unit", Thesis Submitted to Cairo University, Giza, Egypt, 2012 (retr from Google).
Quinell, "Floating-Point Fused Multiply-Add Architectures", Dissertation Presented to Univ Texas at Austin, May 2007, (retr from Google).
Author Unknown, "AMD Athlon Processor Floating Point Capability", AMD White Paper Aug 28, 2000.
Cornea, Harrison, Tang, "Intel Itanium Floating-Point Architecture", Pub TBD.
Gerwig, Wetter, Schwarz, Haess, Krygowski, Fleischer, Kroener, "The IBM eServer z990 floating-point unit", IBM Jour Res & Dev Vol 48 No 3/4 May, July 2004.
Wait, "IBM PowerPC 440 FPU with complex-arithmetic extensions", IBM Jour Res & Dev Vol 49 No 2/3 March, May 2005.
Chatterjee, Bachega, et al, "Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L", IBM Jour Res & Dev, Vol 49 No 2/3 March, May 2005.
Claims (14)
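- A method for performing a fused multiply-accumulate operation of the form ±A*B±C in a microprocessor, wherein A, B, and C are input operands and no rounding occurs before C is accumulated to the product of A and B, the method comprising:
dividing the fused multiply-accumulate operation into first and second multiply-accumulate sub-operations;
in the first multiply-accumulate sub-operation, selecting whether (i) to accumulate the partial products of A and B with C or (ii) to accumulate only the partial products of A and B, and generating an unrounded non-redundant sum from the result of the accumulation in case (i) or (ii);
generating an unrounded non-redundant intermediate result vector from a plurality of most significant bits (MSBs) of the unrounded non-redundant sum;
generating one or more rounding indicators from a plurality of least significant bits (LSBs) excluded from the unrounded non-redundant sum;
if the first multiply-accumulate sub-operation generated the unrounded non-redundant sum without accumulating C, accumulating C with the unrounded non-redundant intermediate result vector in the second multiply-accumulate sub-operation; and
generating a final rounded result of the fused multiply-accumulate operation by using the rounding indicators, based on the unrounded non-redundant sum obtained in case (i) or on the non-redundant sum obtained in the second multiply-accumulate sub-operation in case (ii).
- The method of claim 1, further comprising, between the first and second multiply-accumulate sub-operations, storing the unrounded non-redundant sum in a memory and/or forwarding the unrounded non-redundant sum from a first instruction execution unit to a second instruction execution unit.
- The method of claim 1 or 2, further comprising storing a plurality of calculation control indicators in a memory and/or forwarding a plurality of calculation control indicators from a first instruction execution unit to a second instruction execution unit.
- The method of claim 2 or 3, wherein the memory is external to the first and second instruction execution units and comprises a result store that stores the unrounded non-redundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators generated incident to the partial products of A and B to indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed.
- The method of claim 3 or 4, wherein the calculation control indicators are for generating an arithmetically correct rounded result from the unrounded non-redundant sum.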
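- A microprocessor operable to perform a fused multiply-accumulate operation of the form ±A*B±C, wherein A, B, and C are input operands and no rounding occurs before C is accumulated to the product of A and B, the microprocessor comprising:
two or more instruction execution units configured to perform first and second multiply-accumulate sub-operations of the fused multiply-accumulate operation;
wherein, in the first multiply-accumulate sub-operation, a selection is made between (i) accumulating the partial products of A and B with C and (ii) accumulating only the partial products of A and B, and an unrounded non-redundant sum is generated from the result of the accumulation in case (i) or (ii);
an unrounded non-redundant intermediate result vector is generated from a plurality of MSBs of the unrounded non-redundant sum;
one or more rounding indicators are generated from a plurality of LSBs excluded from the unrounded non-redundant sum;
if the first multiply-accumulate sub-operation generated the unrounded non-redundant sum without accumulating C, C is accumulated with the unrounded non-redundant intermediate result vector in the second multiply-accumulate sub-operation; and
a final rounded result of the fused multiply-accumulate operation is generated by using the rounding indicators, based on the unrounded non-redundant sum obtained in case (i) or on the non-redundant sum obtained in the second multiply-accumulate sub-operation in case (ii).
- The microprocessor of claim 6, further comprising a memory, external to the two or more instruction execution units, for storing the unrounded non-redundant sum generated by the first multiply-accumulate sub-operation, wherein the memory is configured to store the unrounded non-redundant sum for an indefinite period until the second multiply-accumulate sub-operation is in execution, thereby enabling the two or more instruction execution units to perform other operations unrelated to the fused multiply-accumulate operation between the first multiply-accumulate sub-operation and the second multiply-accumulate sub-operation.
- The microprocessor of claim 7, wherein the memory comprises a result store that stores the unrounded non-redundant sum and a calculation control indicator store, distinct from the result store, that stores a plurality of calculation control indicators generated incident to the product of A and B to indicate how subsequent calculations in the second multiply-accumulate sub-operation should proceed.
- The microprocessor of claim 7 or 8, wherein the two or more instruction execution units comprise a multiplier configured to perform the first multiply-accumulate sub-operation and an adder configured to perform the second multiply-accumulate sub-operation.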
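- A method for performing a fused multiply-accumulate operation of the form ±A*B±C in a microprocessor, wherein A, B, and C are input operands, the method comprising:
dispatching, to a first execution unit of the microprocessor, a first instruction for calculating at least the product of A and B and generating an unrounded non-redundant intermediate result vector, wherein, in the first execution unit, a selection is made whether (i) to accumulate the partial products of A and B with C or (ii) to accumulate only the partial products of A and B, an unrounded non-redundant sum is generated from the result of the accumulation in case (i) or (ii), the unrounded non-redundant intermediate result vector is generated from a plurality of MSBs of the unrounded non-redundant sum, and one or more rounding indicators are generated from a plurality of LSBs excluded from the unrounded non-redundant sum;
dispatching, to a second execution unit of the microprocessor, a second instruction for receiving the unrounded non-redundant sum obtained in case (i) and the unrounded non-redundant intermediate result vector obtained in case (ii) and for generating a final rounded result of ±A*B±C by using the rounding indicators; and
storing the final rounded result of ±A*B±C.
- The method of claim 10, further comprising forwarding the unrounded non-redundant intermediate result vector from the first execution unit to the second execution unit, and/or storing the unrounded result of the calculation in a shared memory that is shared among a plurality of execution units.
- The method of claim 10 or 11, further comprising:
generating, by the first execution unit, one or more calculation control indicators, generated incident to the product of A and B, that indicate how subsequent calculations in the second execution unit should proceed, wherein the first execution unit generates the one or more calculation control indicators incident to the calculation of at least the product of A and B and the generation of the unrounded non-redundant intermediate result vector; and
receiving, by the second execution unit, the one or more calculation control indicators and using the unrounded result and the calculation control indicators to generate the final rounded result.
- A method for performing a fused multiply-accumulate operation of the form ±A*B±C in a microprocessor, wherein A, B, and C are input operands, the method comprising:
dispatching, to a first execution unit of the microprocessor, a first instruction for calculating at least the product of A and B and generating an unrounded non-redundant intermediate result vector, wherein, in the first execution unit, a selection is made whether (i) to accumulate the partial products of A and B with C or (ii) to accumulate only the partial products of A and B, an unrounded non-redundant sum is generated from the result of the accumulation in case (i) or (ii), the unrounded non-redundant intermediate result vector is generated from a plurality of MSBs of the unrounded non-redundant sum, and one or more rounding indicators are generated from a plurality of LSBs excluded from the unrounded non-redundant sum;
generating calculation control indicators, generated incident to the product of A and B, that indicate how subsequent calculations of the fused multiply-accumulate operation should proceed; and
dispatching, to a second execution unit of the microprocessor, a second instruction that receives the unrounded non-redundant sum obtained in case (i) and the unrounded non-redundant intermediate result vector obtained in case (ii) together with the calculation control indicators, and generating a final rounded result of ±A*B±C in accordance with the calculation control indicators and the rounding indicators.
- The method of claim 13, wherein the calculation control indicators include an indication of whether the first execution unit accumulated C to the product of A and B.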
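The two-step flow recited in the claims can be illustrated with a small fixed-point model in Python. This is only a sketch under stated assumptions, not the patented hardware: operands are non-negative integers rather than floating-point values, the constant `K` (number of excluded LSBs), the `Intermediate` record, and the function names are all hypothetical, and the rule used to choose between case (i) and case (ii) is an invented simplification (the claims do not state the selection criterion). Rounding is round-to-nearest-even.

```python
from dataclasses import dataclass

K = 4  # assumed number of LSBs excluded from the intermediate result vector


@dataclass
class Intermediate:
    vector: int          # MSBs of the unrounded, non-redundant sum
    round_bit: int       # rounding indicator: most significant excluded LSB
    sticky: int          # rounding indicator: OR of the remaining excluded LSBs
    c_accumulated: bool  # calculation control indicator: was C already added?


def first_sub_op(a: int, b: int, c: int) -> Intermediate:
    """Multiplier-side step: accumulate A*B, optionally together with C."""
    low_mask = (1 << K) - 1
    # Hypothetical selection rule: defer C to the second step only when C has
    # no bits below the cut, so adding it later cannot disturb the LSBs.
    accumulate_c = (c & low_mask) != 0
    total = a * b + (c if accumulate_c else 0)  # unrounded non-redundant sum
    low = total & low_mask                      # LSBs excluded from the vector
    return Intermediate(
        vector=total >> K,
        round_bit=(low >> (K - 1)) & 1,
        sticky=int((low & ((1 << (K - 1)) - 1)) != 0),
        c_accumulated=accumulate_c,
    )


def second_sub_op(inter: Intermediate, c: int) -> int:
    """Adder-side step: add C if still pending, then round exactly once."""
    vector = inter.vector
    if not inter.c_accumulated:
        vector += c >> K  # exact here, because C has no bits below the cut
    # Round to nearest, ties to even, using only the stored indicators.
    round_up = bool(inter.round_bit and (inter.sticky or (vector & 1)))
    return vector + (1 if round_up else 0)


def split_fma(a: int, b: int, c: int) -> int:
    return second_sub_op(first_sub_op(a, b, c), c)


def reference_fma(a: int, b: int, c: int) -> int:
    """Exact a*b + c, scaled by 2**-K and rounded once (nearest even)."""
    q, r = divmod(a * b + c, 1 << K)
    half = 1 << (K - 1)
    if r > half or (r == half and (q & 1)):
        q += 1
    return q
```

In this model `split_fma(a, b, c)` agrees with `reference_fma(a, b, c)`, which rounds the exact value of `a*b + c` a single time. The point of the sketch is that the second step needs only the MSB vector, the rounding indicators, and the control indicator to reach the same result, so the two steps can be separated in time or run on different execution units.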
Applications Claiming Priority (19)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462020246P | 2014-07-02 | 2014-07-02 | |
US62/020,246 | 2014-07-02 | ||
US201562173808P | 2015-06-10 | 2015-06-10 | |
US62/173,808 | 2015-06-10 | ||
US14/748,924 US10019229B2 (en) | 2014-07-02 | 2015-06-24 | Calculation control indicator cache |
US14/748,817 | 2015-06-24 | ||
US14/749,002 US9798519B2 (en) | 2014-07-02 | 2015-06-24 | Standard format intermediate result |
US14/749,050 | 2015-06-24 | ||
US14/748,870 | 2015-06-24 | ||
US14/749,002 | 2015-06-24 | ||
US14/749,088 | 2015-06-24 | ||
US14/748,817 US9778907B2 (en) | 2014-07-02 | 2015-06-24 | Non-atomic split-path fused multiply-accumulate |
PCT/US2015/037508 WO2016003740A1 (en) | 2014-07-02 | 2015-06-24 | Split-path fused multiply-accumulate operation using first and second sub-operations |
US14/748,870 US9778908B2 (en) | 2014-07-02 | 2015-06-24 | Temporally split fused multiply-accumulate operation |
US14/749,088 US9891887B2 (en) | 2014-07-02 | 2015-06-24 | Subdivision of a fused compound arithmetic operation |
US14/748,956 | 2015-06-24 | ||
US14/748,956 US10019230B2 (en) | 2014-07-02 | 2015-06-24 | Calculation control indicator cache |
US14/749,050 US9891886B2 (en) | 2014-07-02 | 2015-06-24 | Split-path heuristic for performing a fused FMA operation |
US14/748,924 | 2015-06-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
JP2016535360A JP2016535360A (ja) | 2016-11-10 |
JP6684713B2 true JP6684713B2 (ja) | 2020-04-22 |
Family
ID=53502534
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2016538834A Active JP6684713B2 (ja) | 2014-07-02 | 2015-06-24 | Method and microprocessor for performing a fused multiply-accumulate operation |
JP2015227713A Active JP6207574B2 (ja) | 2014-07-02 | 2015-11-20 | Calculation control indicator cache |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP2015227713A Active JP6207574B2 (ja) | 2014-07-02 | 2015-11-20 | Calculation control indicator cache |
Country Status (6)
Country | Link |
---|---|
US (7) | US9798519B2 (ja) |
EP (2) | EP2963538B1 (ja) |
JP (2) | JP6684713B2 (ja) |
CN (7) | CN106406810B (ja) |
TW (7) | TWI650652B (ja) |
WO (1) | WO2016003740A1 (ja) |
Families Citing this family (96)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9513906B2 (en) * | 2013-01-23 | 2016-12-06 | International Business Machines Corporation | Vector checksum instruction |
US9606803B2 (en) | 2013-07-15 | 2017-03-28 | Texas Instruments Incorporated | Highly integrated scalable, flexible DSP megamodule architecture |
US11106462B2 (en) * | 2019-05-24 | 2021-08-31 | Texas Instruments Incorporated | Method and apparatus for vector sorting |
US20150065928A1 (en) * | 2013-08-30 | 2015-03-05 | ISOS Solutions, LLC | Apparatus for Reducing the Appearance and Effects of Scars |
US11432990B2 (en) | 2013-08-30 | 2022-09-06 | ISOS Solutions, LLC | Textured apparatus with therapeutic material incorporated therein and methods of manufacturing same |
US11061672B2 (en) | 2015-10-02 | 2021-07-13 | Via Alliance Semiconductor Co., Ltd. | Chained split execution of fused compound arithmetic operations |
US10671347B2 (en) * | 2016-01-28 | 2020-06-02 | International Business Machines Corporation | Stochastic rounding floating-point multiply instruction using entropy from a register |
US10489152B2 (en) | 2016-01-28 | 2019-11-26 | International Business Machines Corporation | Stochastic rounding floating-point add instruction using entropy from a register |
US10282169B2 (en) | 2016-04-06 | 2019-05-07 | Apple Inc. | Floating-point multiply-add with down-conversion |
US10275243B2 (en) | 2016-07-02 | 2019-04-30 | Intel Corporation | Interruptible and restartable matrix multiplication instructions, processors, methods, and systems |
GB2553783B (en) | 2016-09-13 | 2020-11-04 | Advanced Risc Mach Ltd | Vector multiply-add instruction |
US10241757B2 (en) | 2016-09-30 | 2019-03-26 | International Business Machines Corporation | Decimal shift and divide instruction |
US10127015B2 (en) | 2016-09-30 | 2018-11-13 | International Business Machines Corporation | Decimal multiply and shift instruction |
US10078512B2 (en) | 2016-10-03 | 2018-09-18 | Via Alliance Semiconductor Co., Ltd. | Processing denormal numbers in FMA hardware |
US20180121168A1 (en) * | 2016-10-27 | 2018-05-03 | Altera Corporation | Denormalization in multi-precision floating-point arithmetic circuitry |
CN109710559A (zh) * | 2016-11-03 | 2019-05-03 | 北京中科寒武纪科技有限公司 | SLAM operation apparatus and method |
US10140092B2 (en) | 2016-11-04 | 2018-11-27 | Samsung Electronics Co., Ltd. | Closepath fast incremented sum in a three-path fused multiply-add design |
US10216479B2 (en) * | 2016-12-06 | 2019-02-26 | Arm Limited | Apparatus and method for performing arithmetic operations to accumulate floating-point numbers |
US10515302B2 (en) * | 2016-12-08 | 2019-12-24 | Via Alliance Semiconductor Co., Ltd. | Neural network unit with mixed data and weight size computation capability |
KR102649318B1 (ko) * | 2016-12-29 | 2024-03-20 | 삼성전자주식회사 | Memory device including a status circuit and operating method thereof |
US10303438B2 (en) * | 2017-01-16 | 2019-05-28 | International Business Machines Corporation | Fused-multiply-add floating-point operations on 128 bit wide operands |
US10452288B2 (en) * | 2017-01-19 | 2019-10-22 | International Business Machines Corporation | Identifying processor attributes based on detecting a guarded storage event |
GB2560159B (en) * | 2017-02-23 | 2019-12-25 | Advanced Risc Mach Ltd | Widening arithmetic in a data processing apparatus |
EP4053695A1 (en) | 2017-03-20 | 2022-09-07 | INTEL Corporation | Systems, methods, and apparatuses for dot production operations |
US10489877B2 (en) | 2017-04-24 | 2019-11-26 | Intel Corporation | Compute optimization mechanism |
US10055383B1 (en) * | 2017-04-28 | 2018-08-21 | Hewlett Packard Enterprise Development Lp | Matrix circuits |
US10338919B2 (en) * | 2017-05-08 | 2019-07-02 | Nvidia Corporation | Generalized acceleration of matrix multiply accumulate operations |
DE102018110607A1 (de) | 2017-05-08 | 2018-11-08 | Nvidia Corporation | Generalized acceleration of matrix multiply-accumulate operations |
CN107315710B (zh) | 2017-06-27 | 2020-09-11 | 上海兆芯集成电路有限公司 | Method and apparatus for computing full-precision and partial-precision values |
CN107291420B (zh) | 2017-06-27 | 2020-06-05 | 上海兆芯集成电路有限公司 | Apparatus integrating arithmetic and logic processing |
WO2019009870A1 (en) | 2017-07-01 | 2019-01-10 | Intel Corporation | SAVE BACKGROUND TO VARIABLE BACKUP STATUS SIZE |
US10235135B2 (en) | 2017-07-17 | 2019-03-19 | International Business Machines Corporation | Normalization of a product on a datapath |
US10387147B2 (en) | 2017-08-02 | 2019-08-20 | International Business Machines Corporation | Managing an issue queue for fused instructions and paired instructions in a microprocessor |
CN107833176A (zh) | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | Information processing method and related products |
CN109783055B (zh) * | 2017-11-10 | 2021-02-12 | 瑞昱半导体股份有限公司 | Floating-point arithmetic circuit and method |
US10481869B1 (en) * | 2017-11-10 | 2019-11-19 | Apple Inc. | Multi-path fused multiply-add with power control |
US11809869B2 (en) | 2017-12-29 | 2023-11-07 | Intel Corporation | Systems and methods to store a tile register pair to memory |
US11093247B2 (en) | 2017-12-29 | 2021-08-17 | Intel Corporation | Systems and methods to load a tile register pair |
US11023235B2 (en) | 2017-12-29 | 2021-06-01 | Intel Corporation | Systems and methods to zero a tile register pair |
US11816483B2 (en) | 2017-12-29 | 2023-11-14 | Intel Corporation | Systems, methods, and apparatuses for matrix operations |
US11669326B2 (en) | 2017-12-29 | 2023-06-06 | Intel Corporation | Systems, methods, and apparatuses for dot product operations |
US11789729B2 (en) | 2017-12-29 | 2023-10-17 | Intel Corporation | Systems and methods for computing dot products of nibbles in two tile operands |
JP6863907B2 (ja) * | 2018-01-05 | 2021-04-21 | 日本電信電話株式会社 | Arithmetic circuit |
CN108364065B (zh) * | 2018-01-19 | 2020-09-11 | 上海兆芯集成电路有限公司 | Microprocessor using Booth multiplication |
CN108416431B (zh) * | 2018-01-19 | 2021-06-01 | 上海兆芯集成电路有限公司 | Neural network microprocessor and macro-instruction processing method |
CN108363559B (zh) * | 2018-02-13 | 2022-09-27 | 北京旷视科技有限公司 | Multiplication processing method, device, and computer-readable medium for neural networks |
CN110276447A (zh) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | Computing device and method |
US10664287B2 (en) | 2018-03-30 | 2020-05-26 | Intel Corporation | Systems and methods for implementing chained tile operations |
DE102018209901A1 (de) * | 2018-06-19 | 2019-12-19 | Robert Bosch Gmbh | Computing unit, method, and computer program for multiplying at least two multiplicands |
US11093579B2 (en) | 2018-09-05 | 2021-08-17 | Intel Corporation | FP16-S7E8 mixed precision for deep learning and other algorithms |
US10970076B2 (en) | 2018-09-14 | 2021-04-06 | Intel Corporation | Systems and methods for performing instructions specifying ternary tile logic operations |
US11579883B2 (en) | 2018-09-14 | 2023-02-14 | Intel Corporation | Systems and methods for performing horizontal tile operations |
US10990396B2 (en) | 2018-09-27 | 2021-04-27 | Intel Corporation | Systems for performing instructions to quickly convert and use tiles as 1D vectors |
US10719323B2 (en) | 2018-09-27 | 2020-07-21 | Intel Corporation | Systems and methods for performing matrix compress and decompress instructions |
US10866786B2 (en) | 2018-09-27 | 2020-12-15 | Intel Corporation | Systems and methods for performing instructions to transpose rectangular tiles |
US10896043B2 (en) | 2018-09-28 | 2021-01-19 | Intel Corporation | Systems for performing instructions for fast element unpacking into 2-dimensional registers |
US10963256B2 (en) | 2018-09-28 | 2021-03-30 | Intel Corporation | Systems and methods for performing instructions to transform matrices into row-interleaved format |
US10929143B2 (en) | 2018-09-28 | 2021-02-23 | Intel Corporation | Method and apparatus for efficient matrix alignment in a systolic array |
US10963246B2 (en) | 2018-11-09 | 2021-03-30 | Intel Corporation | Systems and methods for performing 16-bit floating-point matrix dot product instructions |
CN111221496B (zh) * | 2018-11-26 | 2023-06-13 | 北京华航无线电测量研究所 | Method for implementing floating-point data accumulation using an FPGA |
CN111260069B (zh) * | 2018-11-30 | 2022-12-09 | 上海寒武纪信息科技有限公司 | Data processing device, method, chip, and electronic equipment |
US10929503B2 (en) | 2018-12-21 | 2021-02-23 | Intel Corporation | Apparatus and method for a masked multiply instruction to support neural network pruning operations |
US11886875B2 (en) | 2018-12-26 | 2024-01-30 | Intel Corporation | Systems and methods for performing nibble-sized operations on matrix elements |
US11294671B2 (en) | 2018-12-26 | 2022-04-05 | Intel Corporation | Systems and methods for performing duplicate detection instructions on 2D data |
US20200210517A1 (en) | 2018-12-27 | 2020-07-02 | Intel Corporation | Systems and methods to accelerate multiplication of sparse matrices |
US10942985B2 (en) | 2018-12-29 | 2021-03-09 | Intel Corporation | Apparatuses, methods, and systems for fast fourier transform configuration and computation instructions |
US10922077B2 (en) | 2018-12-29 | 2021-02-16 | Intel Corporation | Apparatuses, methods, and systems for stencil configuration and computation instructions |
US11016731B2 (en) | 2019-03-29 | 2021-05-25 | Intel Corporation | Using Fuzzy-Jbit location of floating-point multiply-accumulate results |
US11269630B2 (en) | 2019-03-29 | 2022-03-08 | Intel Corporation | Interleaved pipeline of floating-point adders |
US10990397B2 (en) | 2019-03-30 | 2021-04-27 | Intel Corporation | Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator |
US11175891B2 (en) | 2019-03-30 | 2021-11-16 | Intel Corporation | Systems and methods to perform floating-point addition with selected rounding |
CN111814093A (zh) * | 2019-04-12 | 2020-10-23 | 杭州中天微系统有限公司 | Processing method and processing device for multiply-accumulate instructions |
US11403097B2 (en) | 2019-06-26 | 2022-08-02 | Intel Corporation | Systems and methods to skip inconsequential matrix operations |
US11334647B2 (en) | 2019-06-29 | 2022-05-17 | Intel Corporation | Apparatuses, methods, and systems for enhanced matrix multiplier architecture |
US10825512B1 (en) | 2019-08-27 | 2020-11-03 | Nxp Usa, Inc. | Memory reads of weight values |
US11829729B2 (en) | 2019-09-05 | 2023-11-28 | Micron Technology, Inc. | Spatiotemporal fused-multiply-add, and related systems, methods and devices |
US11934824B2 (en) | 2019-09-05 | 2024-03-19 | Micron Technology, Inc. | Methods for performing processing-in-memory operations, and related memory devices and systems |
US11693657B2 (en) * | 2019-09-05 | 2023-07-04 | Micron Technology, Inc. | Methods for performing fused-multiply-add operations on serially allocated data within a processing-in-memory capable memory device, and related memory devices and systems |
US11288220B2 (en) * | 2019-10-18 | 2022-03-29 | Achronix Semiconductor Corporation | Cascade communications between FPGA tiles |
US11119772B2 (en) | 2019-12-06 | 2021-09-14 | International Business Machines Corporation | Check pointing of accumulator register results in a microprocessor |
US11714875B2 (en) | 2019-12-28 | 2023-08-01 | Intel Corporation | Apparatuses, methods, and systems for instructions of a matrix operations accelerator |
CN113126954B (zh) | 2019-12-31 | 2024-04-09 | 华为技术有限公司 | Method, apparatus, and arithmetic logic unit for floating-point multiplication |
US11182159B2 (en) | 2020-02-26 | 2021-11-23 | Google Llc | Vector reductions using shared scratchpad memory |
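CN113391788B (zh) * | 2020-03-11 | 2024-01-26 | 芯立嘉集成电路(杭州)有限公司 | In-memory arithmetic processor and in-memory arithmetic processing method |
CN113721886A (zh) * | 2020-05-25 | 2021-11-30 | 瑞昱半导体股份有限公司 | Logarithm calculation method and logarithm calculation circuit |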
WO2021250689A1 (en) * | 2020-06-12 | 2021-12-16 | Gulzar Singh | Novel hardware accelerator circuit for bit-level operations in a microcontroller |
US11537861B2 (en) | 2020-06-23 | 2022-12-27 | Micron Technology, Inc. | Methods of performing processing-in-memory operations, and related devices and systems |
TWI746126B (zh) * | 2020-08-25 | 2021-11-11 | 創鑫智慧股份有限公司 | Matrix multiplication device and operating method thereof |
US11941395B2 (en) | 2020-09-26 | 2024-03-26 | Intel Corporation | Apparatuses, methods, and systems for instructions for 16-bit floating-point matrix dot product instructions |
US11029920B1 (en) * | 2020-10-21 | 2021-06-08 | Chariot Technologies Lab, Inc. | Execution of a conditional statement by an arithmetic and/or bitwise unit |
WO2022150058A1 (en) * | 2021-01-07 | 2022-07-14 | Groq, Inc. | Numerical precision in digital multiplier circuitry |
US11663004B2 (en) | 2021-02-26 | 2023-05-30 | International Business Machines Corporation | Vector convert hexadecimal floating point to scaled decimal instruction |
US11360769B1 (en) | 2021-02-26 | 2022-06-14 | International Business Machines Corporation | Decimal scale and convert and split to hexadecimal floating point instruction |
US11625244B2 (en) * | 2021-06-22 | 2023-04-11 | Intel Corporation | Native support for execution of get exponent, get mantissa, and scale instructions within a graphics processing unit via reuse of fused multiply-add execution unit hardware logic |
US20230129750A1 (en) | 2021-10-27 | 2023-04-27 | International Business Machines Corporation | Performing a floating-point multiply-add operation in a computer implemented environment |
CN117149099B (zh) * | 2023-10-31 | 2024-03-12 | 江苏华鲲振宇智能科技有限责任公司 | Server system with separated computing and storage, and control method |
Family Cites Families (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB1527289A (en) | 1976-08-17 | 1978-10-04 | Int Computers Ltd | Data processing systems |
US4974198A (en) | 1986-07-16 | 1990-11-27 | Nec Corporation | Vector processing system utilizing firm ware control to prevent delays during processing operations |
JPH01119861A (ja) * | 1987-11-02 | 1989-05-11 | Sharp Corp | LSI for digital signal processing |
JPH04177527A (ja) * | 1990-11-09 | 1992-06-24 | Hitachi Ltd | Arithmetic processing circuit |
US5375078A (en) | 1992-12-15 | 1994-12-20 | International Business Machines Corporation | Arithmetic unit for performing XY+B operation |
US5347481A (en) | 1993-02-01 | 1994-09-13 | Hal Computer Systems, Inc. | Method and apparatus for multiplying denormalized binary floating point numbers without additional delay |
DE69519449T2 (de) | 1994-05-05 | 2001-06-21 | Conexant Systems Inc | Space vector data path |
JP2987308B2 (ja) * | 1995-04-28 | 1999-12-06 | 松下電器産業株式会社 | Information processing device |
GB9513115D0 (en) | 1995-06-28 | 1995-08-30 | Biochemie Sa | Organic compounds |
US5867413A (en) * | 1995-10-17 | 1999-02-02 | Hitachi Micro Systems, Inc. | Fast method of floating-point multiplication and accumulation |
US5880983A (en) | 1996-03-25 | 1999-03-09 | International Business Machines Corporation | Floating point split multiply/add system which has infinite precision |
JPH09325953A (ja) * | 1996-06-06 | 1997-12-16 | Hitachi Ltd | Processor and data processing device |
JP3790307B2 (ja) * | 1996-10-16 | 2006-06-28 | 株式会社ルネサステクノロジ | Data processor and data processing system |
KR100291383B1 (ko) * | 1996-11-18 | 2001-09-17 | 윤종용 | Module calculation apparatus and method supporting instructions for digital signal processing |
US5880984A (en) | 1997-01-13 | 1999-03-09 | International Business Machines Corporation | Method and apparatus for performing high-precision multiply-add calculations using independent multiply and add instruments |
US6233672B1 (en) * | 1997-03-06 | 2001-05-15 | Advanced Micro Devices, Inc. | Piping rounding mode bits with floating point instructions to eliminate serialization |
US6094668A (en) | 1997-10-23 | 2000-07-25 | Advanced Micro Devices, Inc. | Floating point arithmetic unit including an efficient close data path |
US6611856B1 (en) * | 1999-12-23 | 2003-08-26 | Intel Corporation | Processing multiply-accumulate operations in a single cycle |
US20040098439A1 (en) | 2000-02-22 | 2004-05-20 | Bass Stephen L. | Apparatus and method for sharing overflow/underflow compare hardware in a floating-point multiply-accumulate (FMAC) or floating-point adder (FADD) unit |
US7117372B1 (en) | 2000-11-28 | 2006-10-03 | Xilinx, Inc. | Programmable logic device with decryption and structure for preventing design relocation |
US6779013B2 (en) | 2001-06-04 | 2004-08-17 | Intel Corporation | Floating point overflow and sign detection |
US7080111B2 (en) * | 2001-06-04 | 2006-07-18 | Intel Corporation | Floating point multiply accumulator |
US6947962B2 (en) * | 2002-01-24 | 2005-09-20 | Intel Corporation | Overflow prediction algorithm and logic for high speed arithmetic units |
WO2003100602A2 (en) * | 2002-05-24 | 2003-12-04 | Koninklijke Philips Electronics N.V. | A scalar/vector processor |
US7689641B2 (en) | 2003-06-30 | 2010-03-30 | Intel Corporation | SIMD integer multiply high with round and shift |
GB2411975B (en) | 2003-12-09 | 2006-10-04 | Advanced Risc Mach Ltd | Data processing apparatus and method for performing arithmetic operations in SIMD data processing |
US7433911B2 (en) | 2004-12-21 | 2008-10-07 | Arm Limited | Data processing apparatus and method for performing floating point addition |
US7401107B2 (en) | 2004-12-22 | 2008-07-15 | Arm Limited | Data processing apparatus and method for converting a fixed point number to a floating point number |
US7730117B2 (en) * | 2005-02-09 | 2010-06-01 | International Business Machines Corporation | System and method for a floating point unit with feedback prior to normalization and rounding |
US7461117B2 (en) | 2005-02-11 | 2008-12-02 | International Business Machines Corporation | Floating point unit with fused multiply add and method for calculating a result with a floating point unit |
US20070038693A1 (en) | 2005-08-10 | 2007-02-15 | Christian Jacobi | Method and Processor for Performing a Floating-Point Instruction Within a Processor |
JP4956950B2 (ja) * | 2005-09-29 | 2012-06-20 | ソニー株式会社 | Reflective screen |
WO2007094047A2 (ja) | 2006-02-14 | 2007-08-23 | Fujitsu Ltd | Arithmetic device and arithmetic method |
US7912887B2 (en) * | 2006-05-10 | 2011-03-22 | Qualcomm Incorporated | Mode-based multiply-add recoding for denormal operands |
US8429384B2 (en) | 2006-07-11 | 2013-04-23 | Harman International Industries, Incorporated | Interleaved hardware multithreading processor architecture |
US9223751B2 (en) * | 2006-09-22 | 2015-12-29 | Intel Corporation | Performing rounding operations responsive to an instruction |
US8321849B2 (en) * | 2007-01-26 | 2012-11-27 | Nvidia Corporation | Virtual architecture and instruction set for parallel thread computing |
US8443029B2 (en) * | 2007-03-01 | 2013-05-14 | International Business Machines Corporation | Round for reround mode in a decimal floating point instruction |
US8078660B2 (en) | 2007-04-10 | 2011-12-13 | The Board Of Regents, University Of Texas System | Bridge fused multiply-adder circuit |
US7917568B2 (en) | 2007-04-10 | 2011-03-29 | Via Technologies, Inc. | X87 fused multiply-add instruction |
US8046399B1 (en) | 2008-01-25 | 2011-10-25 | Oracle America, Inc. | Fused multiply-add rounding and unfused multiply-add rounding in a single multiply-add module |
US20090248769A1 (en) | 2008-03-26 | 2009-10-01 | Teck-Kuen Chua | Multiply and accumulate digital filter operations |
US8046400B2 (en) * | 2008-04-10 | 2011-10-25 | Via Technologies, Inc. | Apparatus and method for optimizing the performance of x87 floating point addition instructions in a microprocessor |
US9507656B2 (en) | 2009-04-16 | 2016-11-29 | Oracle America, Inc. | Mechanism for handling unfused multiply-accumulate accrued exception bits in a processor |
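JP5491071B2 (ja) * | 2009-05-20 | 2014-05-14 | エヌイーシーコンピュータテクノ株式会社 | Instruction fusion arithmetic device and instruction fusion arithmetic method |
CN101930354B (zh) * | 2009-07-28 | 2014-03-12 | 威盛电子股份有限公司 | Microprocessor and method for executing instructions thereof |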
US8386755B2 (en) * | 2009-07-28 | 2013-02-26 | Via Technologies, Inc. | Non-atomic scheduling of micro-operations to perform round instruction |
US8990282B2 (en) | 2009-09-21 | 2015-03-24 | Arm Limited | Apparatus and method for performing fused multiply add floating point operation |
CN101825998B (zh) * | 2010-01-22 | 2012-09-05 | 龙芯中科技术有限公司 | Processing method for vector complex multiplication and corresponding apparatus |
US8577948B2 (en) * | 2010-09-20 | 2013-11-05 | Intel Corporation | Split path multiply accumulate unit |
US8914430B2 (en) * | 2010-09-24 | 2014-12-16 | Intel Corporation | Multiply add functional unit capable of executing scale, round, GETEXP, round, GETMANT, reduce, range and class instructions |
CN101986264B (zh) * | 2010-11-25 | 2013-07-31 | 中国人民解放军国防科学技术大学 | Multifunction floating-point multiply-add unit for SIMD vector microprocessors |
US8965945B2 (en) * | 2011-02-17 | 2015-02-24 | Arm Limited | Apparatus and method for performing floating point addition |
US8671129B2 (en) | 2011-03-08 | 2014-03-11 | Oracle International Corporation | System and method of bypassing unrounded results in a multiply-add pipeline unit |
US9213523B2 (en) * | 2012-06-29 | 2015-12-15 | Intel Corporation | Double rounded combined floating-point multiply and add |
US8892619B2 (en) | 2012-07-24 | 2014-11-18 | The Board Of Trustees Of The Leland Stanford Junior University | Floating-point multiply-add unit using cascade design |
US9152382B2 (en) * | 2012-10-31 | 2015-10-06 | Intel Corporation | Reducing power consumption in a fused multiply-add (FMA) unit responsive to input data values |
US11061672B2 (en) | 2015-10-02 | 2021-07-13 | Via Alliance Semiconductor Co., Ltd. | Chained split execution of fused compound arithmetic operations |
-
2015
- 2015-06-24 US US14/749,002 patent/US9798519B2/en active Active
- 2015-06-24 WO PCT/US2015/037508 patent/WO2016003740A1/en active Application Filing
- 2015-06-24 CN CN201610722812.6A patent/CN106406810B/zh active Active
- 2015-06-24 CN CN201610722858.8A patent/CN106325810B/zh active Active
- 2015-06-24 CN CN201610722859.2A patent/CN106293610B/zh active Active
- 2015-06-24 US US14/748,817 patent/US9778907B2/en active Active
- 2015-06-24 US US14/748,870 patent/US9778908B2/en active Active
- 2015-06-24 CN CN201580003388.3A patent/CN105849690B/zh active Active
- 2015-06-24 US US14/748,924 patent/US10019229B2/en active Active
- 2015-06-24 US US14/749,088 patent/US9891887B2/en active Active
- 2015-06-24 CN CN201610726133.6A patent/CN106325811B/zh active Active
- 2015-06-24 JP JP2016538834A patent/JP6684713B2/ja active Active
- 2015-06-24 US US14/748,956 patent/US10019230B2/en active Active
- 2015-06-24 CN CN201610726893.7A patent/CN106126189B/zh active Active
- 2015-06-24 US US14/749,050 patent/US9891886B2/en active Active
- 2015-06-24 CN CN201610726151.4A patent/CN106339202B/zh active Active
- 2015-07-01 EP EP15174801.9A patent/EP2963538B1/en active Active
- 2015-07-01 EP EP15174805.0A patent/EP2963539B1/en active Active
- 2015-07-02 TW TW104121548A patent/TWI650652B/zh active
- 2015-07-02 TW TW104121545A patent/TWI605384B/zh active
- 2015-07-02 TW TW104121552A patent/TWI601019B/zh active
- 2015-07-02 TW TW104121551A patent/TWI625671B/zh active
- 2015-07-02 TW TW104121550A patent/TWI608410B/zh active
- 2015-07-02 TW TW104121547A patent/TWI634437B/zh active
- 2015-07-02 TW TW104121546A patent/TWI638312B/zh active
- 2015-11-20 JP JP2015227713A patent/JP6207574B2/ja active Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6684713B2 (ja) | | Method and microprocessor for performing fused multiply-add operations |
CN107077417B (zh) | | Validity registration |
Muller et al. | | Hardware implementation of floating-point arithmetic |
Bruintjes | | Design of a fused multiply-add floating-point and integer datapath |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
2016-02-24 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2016-02-24 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
2017-02-28 | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007 |
2017-03-14 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
2017-06-08 | A601 | Written request for extension of time | Free format text: JAPANESE INTERMEDIATE CODE: A601 |
2017-07-13 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2018-01-16 | A131 | Notification of reasons for refusal | Free format text: JAPANESE INTERMEDIATE CODE: A131 |
2018-04-13 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2018-10-09 | A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02 |
2019-02-06 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2019-02-14 | A911 | Transfer to examiner for re-examination before appeal (zenchi) | Free format text: JAPANESE INTERMEDIATE CODE: A911 |
2019-03-29 | A912 | Re-examination (zenchi) completed and case transferred to appeal board | Free format text: JAPANESE INTERMEDIATE CODE: A912 |
2020-01-09 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
2020-03-30 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
 | R150 | Certificate of patent or registration of utility model | Ref document number: 6684713; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |
 | R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |
 | R250 | Receipt of annual fees | Free format text: JAPANESE INTERMEDIATE CODE: R250 |