USRE40883E1 - Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision - Google Patents
- Publication number
- USRE40883E1
- Authority
- US
- United States
- Prior art keywords
- value
- register file
- odd
- extended
- reconfigurable
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30105—Register structure
- G06F9/30112—Register structure comprising data of variable length
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
Definitions
- The present invention relates generally to improvements in processing and, more particularly, to advantageous techniques for providing a scalable building-block register file that, in a first application, provides a low-cost, lower-capacity register file and, in a second application, provides a higher-capacity register file with dynamic reconfiguration support for flexible data type operations.
- The present invention also relates to advantageous techniques for providing a dynamically reconfigurable register file of variable width that supports different levels of data precision when executing algorithms whose data types have varying precision requirements, and that supports multiple parallel operations on lower-precision data in 32-bit and 64-bit forms.
- When executing algorithms, it is desirable to have a register file that can be organized to better support the varying data types and formats that occur dynamically in a programming application. For example, a wide register file for high-precision operations may be required in one part of an application, while single and multiple parallel operations on lower-precision data may be required in a different part of the same application.
- This desire is offset by the hardware cost of implementing a wider register file or of implementing additional read and write ports.
- The problem, therefore, is how to achieve a dynamically configurable register file with extended precision at reduced hardware cost without affecting general capabilities, including performance.
- The present invention advantageously addresses these problems while achieving a variety of advantages, as addressed in further detail below.
- Two single-width register files, each with the same number of registers, are used in combination to provide a single register model in which each file individually needs fewer read and write ports than a single register file of twice the capacity would require. Because of the reduced size of the register files and the reduced number of read and write ports, higher-performance implementations can be achieved than with a single register file of equivalent combined data width and number of read and write ports.
- In one arrangement, the architecture designates one reduced register file to contain the even register addresses and the other to contain the odd register addresses.
- Alternatively, the architecture designates one register file configured as two banks of registers, in which the even and odd registers are selected by means of the read/write port address lines.
- An additional register set of at least one register can be dynamically associated with any register in the register file to flexibly provide extended-precision data width to any selected register.
- Double-width accesses are constrained to work only on even-odd register pairs, thereby treating the two separate register files as a single addressable file of twice the width of an individual register.
- Either the even or the odd register file is designated as containing the upper half of the bits in a double-width access.
- Double-width accesses may occur on reads, on writes, or on both, depending on the operation to be performed. In this way, the access width of the register file is doubled without adding costly read/write ports or more bits per register, and the number of read and write ports required per half is reduced.
- The double-width register file achieved by this invention also provides single-width accesses, giving a simpler programming model for single-width data types. Additionally, since both halves have the same number of read and write ports, single-width accesses across the full even-plus-odd register address space are possible.
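The even/odd split described above can be sketched as a small behavioral model. This is a minimal sketch, not the patented circuit: the class name `SplitRegisterFile`, the mapping of Rr to index r // 2 in the half selected by the LSB of r, and the choice of the odd half as the upper 32 bits of a pair are all illustrative assumptions. The usage lines replay the single-width add from the text (R5 ← R1 + R3, all in the odd half) and one double-width pair access.

```python
MASK32 = (1 << 32) - 1


class SplitRegisterFile:
    """Hypothetical model of two single-width halves presented as one register file.

    Assumed mapping (illustrative only): Rr lives in the even half when r is even
    and in the odd half when r is odd, at index r // 2. In a double-width access
    the odd half supplies the upper 32 bits, i.e. the pair reads as Rodd || Reven.
    """

    def __init__(self, regs_per_half: int = 16) -> None:
        self.even = [0] * regs_per_half
        self.odd = [0] * regs_per_half

    def _half(self, r: int) -> list:
        return self.odd if r & 1 else self.even

    def read32(self, r: int) -> int:
        return self._half(r)[r >> 1]

    def write32(self, r: int, value: int) -> None:
        # Only the write port of the half that owns Rr is enabled.
        self._half(r)[r >> 1] = value & MASK32

    def read64(self, r: int) -> int:
        if r & 1:
            raise ValueError("double-width accesses use even-odd pairs only")
        k = r >> 1
        return (self.odd[k] << 32) | self.even[k]

    def write64(self, r: int, value: int) -> None:
        if r & 1:
            raise ValueError("double-width accesses use even-odd pairs only")
        k = r >> 1
        self.odd[k] = (value >> 32) & MASK32  # upper half goes to the odd file
        self.even[k] = value & MASK32         # lower half goes to the even file


# Single-width add Rt <- Rx + Ry with Rx = R1, Ry = R3, Rt = R5:
# both operands come from the odd half and only the odd half is written.
rf = SplitRegisterFile()
rf.write32(1, 7)
rf.write32(3, 5)
rf.write32(5, rf.read32(1) + rf.read32(3))
print(rf.read32(5))   # 12
print(any(rf.even))   # False: the even half was never touched

# Double-width access to the pair R6:R7.
rf.write64(6, 0x1122334455667788)
print(hex(rf.read32(7)), hex(rf.read32(6)))  # 0x11223344 0x55667788
```

Each half keeps one write port, and a 64-bit write simply enables both of them in the same cycle.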
- FIG. 1A illustrates a first prior art register file arrangement
- FIG. 1B illustrates a second prior art register file arrangement
- FIG. 1C illustrates a first reconfigurable register file in accordance with the present invention
- FIGS. 1D1 and 1D2 illustrate an exemplary add instruction for use in conjunction with a reconfigurable register file
- FIG. 2 illustrates a ManArray indirect very long instruction word (iVLIW) processor in conjunction with a reconfigurable register file in accordance with the present invention
- FIG. 3A illustrates two x/2 extended precision registers used with the reconfigurable register file for extended precision
- FIG. 3B illustrates four x/4 extended precision registers used with the reconfigurable register file for extended precision
- FIGS. 3C1 and 3C2 illustrate an exemplary MPYXA instruction for use with a reconfigurable register file
- FIG. 4 illustrates two x/4 extended precision registers used with a building block register file that is a subset of the reconfigurable register file.
- Provisional Application Ser. No. 60/068,021 entitled “Methods and Apparatus for Scalable Instruction Set Architecture” filed Dec. 18, 1997
- Provisional Application Ser. No. 60/071,248 entitled “Methods and Apparatus to Dynamically Expand the Instruction Pipeline of a Very Long Instruction Word Processor” filed Jan. 12, 1998
- Provisional Application Ser. No. 60/072,915 entitled “Methods and Apparatus to Support Conditional Execution in a VLIW-Based Array Processor with Subword Execution” filed Jan. 28, 1998
- Provisional Application Ser. No. 60/088,148 entitled “Methods and Apparatus for ManArray PE-PE Switch Control” filed Jun. 5, 1998
- Provisional Application Ser. No. 60/092,148 entitled “Methods and Apparatus for Dynamic Instruction Controlled Reconfigurable Register File with Extended Precision” filed Jul. 9, 1998
- FIG. 1A depicts a first prior art register file arrangement 100 (Prior Art 1) consisting of “n” registers R0 . . . R(n−1) 110 with four read data output ports, Rx0 112, Rx1 114, Rx2 116, and Rx3 118, each x bits wide.
- In addition, there are two write ports, Rt0 124 and Rt1 126, each x bits wide.
- A total of six x-bit ports are thus required to provide double-width accesses.
- The data bit width “x” is typically 8, 16, 32, or 64 bits, though other sizes such as 9 or 18 bits are also used.
- The register file read data output ports connect to an execution unit, for example a Multiply Accumulate Unit 120 consisting of a multiplier 121 and an accumulator 123.
- The “∥” symbol indicates a concatenation of input or output bus widths due to the granularity of the read and write ports of the register file 110.
- Other execution unit types include Arithmetic Logic Units, specialized functional units, etc., as dictated by a particular processor architecture.
- FIG. 1B depicts a second prior art register file arrangement 200 (Prior Art 2) consisting of “n” registers R0 . . . R(n−1) 210 with three 2x-bit wide read ports, Rx0 212, Rx1 214, and Rx2 216.
- In addition, there is a single 2x-bit wide write port, Rt 224.
- A total of three 2x-bit width ports are required to provide double-width accesses.
- The data bit width “2x” is typically 16, 32, 64, or 128 bits, though other sizes such as 18 or 36 bits are also used.
- The register file read data output ports typically connect to an execution unit, for example a Multiply Accumulate Unit 220 consisting of a multiplier 221 and an accumulator 223.
- In this arrangement, all n registers must be 2x bits wide, doubling the size of the register file, which in general does not provide a performance advantage significant enough to justify the added expense.
- FIG. 1C depicts a reconfigurable register file and execution unit 300 in accordance with the present invention.
- The reconfigurable register file consists of a first portion or file 330 and a second portion or file 340, each containing three single x-bit width read access ports and a single x-bit write port.
- File 330 consists of read ports Ryo 332, Rxo 334, and Rso 336, and write port Rto 326.
- File 340 consists of read ports Rye 342, Rxe 344, and Rse 346, and write port Rte 324.
- Multiplexers 301 and 302 allow single-width accesses 352 and 354, respectively, from either half of the composite register file.
- The MAU execution unit 320 consists of a multiplier 321 and an accumulator 323.
- Each register file portion 330 and 340 requires only three x-bit read ports and a single x-bit write port.
- Only a small number of ports per register file portion is needed to achieve x-bit width granularity of storage. Consequently, this design more readily supports a VLIW architecture containing multiple execution units.
- The critical path in the register file is reduced, thereby improving its read and write access performance. It is noted that an n/4 register arrangement is also a feasible approach for low-cost applications.
- Because the present invention does not sacrifice granularity of accesses, single-width and double-width accesses are both optimally supported without increasing the register file size. This is important since all applications contain a control portion, which typically executes sequentially with little or no parallelism, and an algorithm execution portion, which typically contains operations that can be executed in parallel.
- The parallel code portion can be operated upon by packed data operations and VLIW operations, while the sequential control portion usually requires support for single-width data type operations. It is consequently of great importance to efficiently support the data types of both the sequential and the parallel code.
- A reconfigurable register file provides this support.
- With this register file design integrated into the instruction set architecture, single-width and double-width instructions can be mixed on a cycle-by-cycle basis.
- The present invention allows the reconfigurable register file to be treated as a 32×32-bit register file in one cycle and a 16×64-bit register file in the very next cycle.
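The cycle-by-cycle reinterpretation can be illustrated by reading the same two 16×32-bit halves through two views. This is a hedged sketch: the helper names `as_32x32` and `as_16x64` are hypothetical, and placing the odd half in the upper 32 bits of each pair is an assumption. Nothing is copied between the two "cycles"; only the decode of the register fields changes.

```python
def as_32x32(even: list, odd: list) -> list:
    """Single-width view: R0..R31, taking R2k from the even half and R2k+1 from the odd half."""
    regs = []
    for e, o in zip(even, odd):
        regs.extend((e, o))
    return regs


def as_16x64(even: list, odd: list) -> list:
    """Double-width view: 16 even-odd pairs, odd half assumed to hold the upper 32 bits."""
    return [(o << 32) | e for e, o in zip(even, odd)]


# Sixteen 32-bit words per half; the values simply encode their register numbers.
even = [2 * k for k in range(16)]
odd = [2 * k + 1 for k in range(16)]

# "Cycle 1": a single-width instruction reads R6 and R7 as two separate registers.
print(as_32x32(even, odd)[6], as_32x32(even, odd)[7])  # 6 7
# "Cycle 2": a double-width instruction reads the same storage as pair 3 (R6:R7).
print(hex(as_16x64(even, odd)[3]))                      # 0x700000006
```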
- In the ManArray iVLIW processor of FIG. 2, there are 8 read ports and 4 write ports for each half of the reconfigurable register file 200. These ports support single-width 32-bit accesses and, in combination, double-width 64-bit accesses for any of the 5 execution units. Address and control logic are not shown in FIG. 2 to improve clarity of illustration. It will be recognized that registers having different numbers of bits (p), and different numbers of read ports (q) and write ports (r), may be employed.
- An exemplary instruction that takes advantage of this configuration of the register file is the 32-bit multiply-accumulate.
- The operation performed by this instruction is Rto∥Rte ← (Rx*Ry) + Rto∥Rte, where Rx and Ry are 32-bit quantities and Rto∥Rte is a 64-bit quantity.
- In a traditional non-split 32-bit wide register file implementation, it would take 1(Rx) + 1(Ry) + 2(Rto∥Rte) = 4 32-bit read ports and 2 (Rto∥Rte ←) 32-bit write ports to accommodate this instruction.
- However, using the two register file blocks described above, this same function can be implemented with 3 read ports and 1 write port per block by using even/odd pairs for the 64-bit quantities.
- The mux on the input to the functional unit is controlled to select the proper register file.
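The port arithmetic above can be checked exhaustively with a short counting sketch. It assumes a 32-register file split by register-number parity (the even/odd arrangement described earlier); `half` and `mac_port_usage` are hypothetical helper names. Summed over both halves the instruction still needs 4 reads and 2 writes, but no single half ever sees more than 3 reads or 1 write, whatever Rx, Ry, and the even target pair are.

```python
from collections import Counter


def half(r: int) -> str:
    """Which half of the split register file holds Rr (even/odd split assumed)."""
    return "odd" if r & 1 else "even"


def mac_port_usage(rx: int, ry: int, rt: int) -> tuple:
    """Per-half port demand of Rto||Rte <- (Rx*Ry) + Rto||Rte for an even target rt."""
    if rt & 1:
        raise ValueError("the 64-bit accumulator must be an even-odd pair")
    reads = Counter([half(rx), half(ry), "odd", "even"])  # Rx, Ry, Rto, Rte
    writes = Counter(["odd", "even"])                     # Rto, Rte
    return reads, writes


# Exhaustively check every operand/target combination of a 32-register file.
worst_reads = worst_writes = 0
for rx in range(32):
    for ry in range(32):
        for rt in range(0, 32, 2):
            reads, writes = mac_port_usage(rx, ry, rt)
            worst_reads = max(worst_reads, max(reads.values()))
            worst_writes = max(worst_writes, max(writes.values()))

print(worst_reads, worst_writes)  # 3 1
```

The worst case arises when Rx and Ry share a parity: both operands plus one half of the accumulator come from the same block.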
- Another example is the add instruction executing on the ALU, which performs the function Rt ← Rx + Ry, where Rx, Ry, and Rt are 32-bit quantities. If Rx is R1, Ry is R3, and Rt is R5, then the mux on the lower 32-bit inputs selects the odd register file for both inputs. Since the ALU has two read ports on the odd register file, this operation is accomplished without any problems. The 32-bit write to R5 is also easily accomplished by enabling the write only for the odd register file. Any combination of even or odd registers can be selected without restrictions.
Extended Precision
- FIG. 3A illustrates a system 500 employing two (x/2)-bit registers 553 and 555, labeled XH1 and XH0, which are used to extend the precision of the accumulation operation that occurs in the extended accumulator unit 523.
- The Multiply with Extended Accumulate operation is defined in FIGS. 3C1 and 3C2, which define the MPYXA instruction.
- The apparatus of FIG. 3A is adapted for an 80-bit extended accumulate operation in which a 32×32-bit multiply is carried out by multiplier 521, producing a 64-bit result that is extended to 80 bits in the accumulate operation of extended accumulator 523.
- This can be seen in FIG. 3A where, depending upon the least significant bit (LSB) of the target register field in the MPYXA instruction (bit 17 of FIG. 3C1), one of two extended precision registers, XH1 553 or XH0 555, is selected via multiplexer 563.
- The LSB of the Register Target field allows the extended precision register to be used arbitrarily with any pair of registers in the register file.
- The inputs of multiplexer 563 are the (x/2)-bit extended precision input operands XH0 552 and XH1 554.
- Multiplexer 563 selects XH0 552 when its input control line 556 is “0”.
- Multiplexer 563 selects XH1 554 when its input control line 556 is “1”.
- The output of multiplexer 563 is signal line 564, which is (x/2) bits wide and is an input to the extended accumulator 523.
- The extended output 566 is a partial sum-of-products value that is stored in the extended precision registers in preparation for the next multiply-accumulate operation.
- The output 566 is written to either XH1 553 or XH0 555 under control of a Write (Wr) signal 562.
- The pipeline-stored LSB of the Rte field 551 is used to control the Wr signal via a logical AND-type function, in which Wr 562 is passed on to a register depending on the state of the LSB.
- The AND gates 557 and 559 control this function, where the LSB input to AND 559 is an inverted version 561 of whatever bit appears on line 556.
- The outputs of the AND gates, 558 and 560, control the writing of the output extended precision data 566 to their extended precision registers.
- The extended precision registers XH1 553 and XH0 555 are part of the special purpose or miscellaneous registers used in the processor and consequently are loadable and readable by the programmer.
- The read and write buses that accomplish this task for the programmer are not shown in FIG. 3A for reasons of clarity.
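The FIG. 3A datapath can be approximated by a behavioral model with x = 32, so XH0 and XH1 are 16 bits each and the accumulator is XH ∥ Rto ∥ Rte = 80 bits. This is a sketch under stated assumptions: the class and method names are hypothetical, the selector bit is passed in directly rather than decoded from bit 17, and the arithmetic is taken as unsigned, wrapping modulo 2^80, because the text above does not specify signedness or saturation. The two `if` tests mirror the AND gates that route Wr to only one extension register.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
MASK80 = (1 << 80) - 1


class ExtendedMac:
    """Hypothetical behavioral model of FIG. 3A with x = 32.

    The 80-bit accumulator is XH (16 bits) || Rto || Rte (64 bits). Unsigned
    arithmetic wrapping modulo 2**80 is an assumption of this sketch.
    """

    def __init__(self) -> None:
        self.xh = [0, 0]  # xh[0] = XH0, xh[1] = XH1, each x/2 = 16 bits

    def mpyxa(self, acc64: int, rx: int, ry: int, lsb: int, wr: int = 1) -> int:
        ext = self.xh[lsb]  # multiplexer 563, steered by the target-field LSB
        acc80 = (ext << 64) | (acc64 & MASK64)
        acc80 = (acc80 + (rx & MASK32) * (ry & MASK32)) & MASK80
        # AND gates: Wr reaches XH1 only when LSB = 1, and XH0 only when LSB = 0.
        if wr & lsb:
            self.xh[1] = acc80 >> 64
        if wr & (lsb ^ 1):
            self.xh[0] = acc80 >> 64
        return acc80 & MASK64  # new Rto || Rte


mac = ExtendedMac()
acc = 0
for _ in range(1000):
    acc = mac.mpyxa(acc, 0xFFFFFFFF, 0xFFFFFFFF, lsb=0)

print(mac.xh)  # [999, 0]: 999 carries out of the 64-bit pair were kept in XH0
```

A plain 64-bit accumulator would have wrapped 999 times in this loop; the 16 extension bits preserve the full sum.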
- FIG. 3B depicts a quad extended precision apparatus 600 supporting the MPYXA multiply with extended accumulate instruction of FIG. 3C1, which shows dual 40-bit accumulation 702 and double-width 80-bit accumulation 701.
- Four (x/4)-bit registers are provided as partitions of two (x/2)-bit registers, 653 and 655, labeled XB3 and XB2 in register 653 and XB1 and XB0 in register 655.
- The four (x/4)-bit registers are used to extend the precision of the accumulation operations that occur in the extended accumulator units 621 and 625.
- The Multiply with Extended Accumulate operation is defined in FIG. 3C1, which defines the MPYXA instruction for dual 40-bit extended accumulates 702.
- The apparatus of FIG. 3B supports the dual 40-bit extended accumulate operation, in which two 16×16-bit multiplies, 619 and 623, each produce a 32-bit result that is extended to 40 bits in the accumulate operations performed by accumulators 621 and 625, respectively.
- This operation can be seen in FIG. 3B where, depending upon the LSB of the target register field in the MPYXA instruction (bit 17 of FIG. 3C1), one of two pairs of extended precision registers, XB3 and XB2 653 or XB1 and XB0 655, is selected via multiplexers 663 and 665.
- The LSB of the Register Target field allows the extended precision register to be used arbitrarily with any pair of registers in the register file. This powerful but simple feature allows a programmer to use any pair of registers for an extended precision operation without any mode control or specialized accumulator hardware added to the architecture.
- The inputs of multiplexers 663 and 665 are the (x/4)-bit extended precision input operands XB0 622 and XB2 626 for multiplexer 663, and XB1 624 and XB3 628 for multiplexer 665.
- Multiplexer 663 selects XB0 622 when its input control line 630 is “0”.
- Multiplexer 665 selects XB1 624 when its input control line 630 is “0”.
- Multiplexer 663 selects XB2 626 when its input control line 630 is “1”.
- Multiplexer 665 selects XB3 628 when its input control line 630 is “1”.
- The output 670 of multiplexer 663 is (x/4) bits wide and serves as an input to the extended accumulator 621.
- The extended output 636 is a partial sum-of-products value that is stored in the extended precision registers in preparation for the next multiply-accumulate operation.
- The output 672 of multiplexer 665 is (x/4) bits wide and serves as an input to the extended accumulator 625.
- The extended output 638 is a partial sum-of-products value that is stored in the extended precision registers in preparation for the next multiply-accumulate operation.
- The output 636 is written to either XB2 or XB0, and the output 638 is written to either XB3 or XB1, all under control of a Write (Wr) signal 648.
- The pipeline-stored LSB of the Rte field 651 is used to control the Wr signal via a logical AND-type function, in which Wr 648 is passed on to the registers depending on the state of the LSB.
- The AND gates 657 and 659 control this function, where the LSB input to AND 659 is an inverted version 661 of 630.
- The outputs of the AND gates, 632 and 634, control the writing of the output extended precision data 636 and 638 to their extended precision registers.
- The partitioned extended precision registers 653 and 655 are part of the special purpose or miscellaneous registers used in the processor and consequently are loadable and readable by the programmer.
- The read and write buses that accomplish this task for the programmer are not shown in FIG. 3B for reasons of clarity.
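The dual 40-bit mode of FIG. 3B can be sketched the same way, with x = 32 so each XB register holds 8 extension bits. The lane assignment is an assumption made only for illustration: lane 0 multiplies the low 16-bit halves of Rx and Ry and accumulates into Rte (multiplier 619, accumulator 621, mux 663 choosing XB0 or XB2), while lane 1 uses the high halves and Rto (623, 625, mux 665 choosing XB1 or XB3). The class name is hypothetical and the arithmetic is assumed unsigned, wrapping modulo 2^40 per lane.

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK40 = (1 << 40) - 1


class DualExtendedMac:
    """Hypothetical behavioral model of FIG. 3B with x = 32 (8-bit XB registers).

    Assumed for illustration: lane 0 uses the low 16-bit halves of Rx and Ry and
    accumulates into Rte; lane 1 uses the high halves and Rto. Arithmetic is
    unsigned and wraps modulo 2**40 per lane.
    """

    def __init__(self) -> None:
        self.xb = [0, 0, 0, 0]  # XB0, XB1, XB2, XB3

    def mpyxa_dual(self, acc64: int, rx: int, ry: int, lsb: int, wr: int = 1) -> int:
        result = 0
        for lane in range(2):
            a = (rx >> (16 * lane)) & MASK16
            b = (ry >> (16 * lane)) & MASK16
            # Mux 663 picks XB0/XB2 for lane 0; mux 665 picks XB1/XB3 for lane 1.
            sel = lane + 2 * lsb
            acc_lane = (acc64 >> (32 * lane)) & MASK32
            acc40 = (((self.xb[sel] << 32) | acc_lane) + a * b) & MASK40
            if wr:  # gated Wr: only the register pair chosen by the LSB is updated
                self.xb[sel] = acc40 >> 32
            result |= (acc40 & MASK32) << (32 * lane)
        return result  # new Rto || Rte


mac = DualExtendedMac()
acc = 0
rx = (0xFFFF << 16) | 0x0003  # lane 1 operand 0xFFFF, lane 0 operand 3
ry = (0xFFFF << 16) | 0x0002  # lane 1 operand 0xFFFF, lane 0 operand 2
for _ in range(200):
    acc = mac.mpyxa_dual(acc, rx, ry, lsb=1)

print(mac.xb)        # [0, 0, 0, 199]
print(acc & MASK32)  # 1200
```

With LSB = 1 only XB2 and XB3 change; XB0 and XB1 remain free to serve another register pair, which is how one 32-bit extension register supports up to four 40-bit accumulations.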
- The present approach allows dual 40-bit accumulations for dual 16×16 multiply-accumulates, as specified in the MPYXA instruction of FIGS. 3C1 and 3C2 and in the exemplary apparatus shown in FIG. 3B. For 32×32 multiply-accumulate operations, 80 bits of precision are available for the accumulation.
- The extended precision concept can be further extended to support quad 20-bit accumulations where x is 16 bits and there are 4 extended precision bits.
- The concept can be generalized further by using more than one x-bit extended precision register and basing the selection of the extended precision register portions on more than the single LSB of the instruction Rte field. Since a single 32-bit extended precision register supports up to two 80-bit extended accumulate operations and up to four 40-bit extended accumulate operations, further extensions, though feasible, presently appear to be of limited practical use.
- A processor can be designed using a subset of the ManArray architecture based upon a single 16×32 register file, i.e., one of the building blocks of a reconfigurable register file. Dual 8×32 register files can also be used to create a reconfigurable 16×32 register file.
- An important aspect is that a low-cost register file design point can be reached by subsetting the ManArray architecture, which allows future growth into higher-performance processors that remain code compatible with the lower-cost subset design.
- An exemplary apparatus 700 implementing this use of the extended precision concept with a single register file design is shown in FIG. 4.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Executing Machine-Instructions (AREA)
Claims (23)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/827,697 USRE40883E1 (en) | 1998-07-09 | 2004-04-19 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US9214898P | 1998-07-09 | 1998-07-09 | |
US09/169,255 US6343356B1 (en) | 1998-10-09 | 1998-10-09 | Methods and apparatus for dynamic instruction controlled reconfiguration register file with extended precision |
US09/796,037 US6430677B2 (en) | 1998-07-09 | 2001-02-28 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
US10/827,697 USRE40883E1 (en) | 1998-07-09 | 2004-04-19 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/796,037 Reissue US6430677B2 (en) | 1998-07-09 | 2001-02-28 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE40883E1 true USRE40883E1 (en) | 2009-08-25 |
Family
ID=22614859
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/169,255 Expired - Lifetime US6343356B1 (en) | 1998-07-09 | 1998-10-09 | Methods and apparatus for dynamic instruction controlled reconfiguration register file with extended precision |
US09/796,037 Ceased US6430677B2 (en) | 1998-07-09 | 2001-02-28 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
US10/827,697 Expired - Lifetime USRE40883E1 (en) | 1998-07-09 | 2004-04-19 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
Family Applications Before (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/169,255 Expired - Lifetime US6343356B1 (en) | 1998-07-09 | 1998-10-09 | Methods and apparatus for dynamic instruction controlled reconfiguration register file with extended precision |
US09/796,037 Ceased US6430677B2 (en) | 1998-07-09 | 2001-02-28 | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision |
Country Status (1)
Country | Link |
---|---|
US (3) | US6343356B1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080191882A1 (en) * | 2007-02-14 | 2008-08-14 | Nec (China) Co., Ltd. | Radio frequency identification system and method |
US20080284570A1 (en) * | 2005-04-25 | 2008-11-20 | Seung Hyup Ryoo | Reader Control System |
US20140237218A1 (en) * | 2011-12-19 | 2014-08-21 | Vinodh Gopal | Simd integer multiply-accumulate instruction for multi-precision arithmetic |
Families Citing this family (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6405273B1 (en) * | 1998-11-13 | 2002-06-11 | Infineon Technologies North America Corp. | Data processing device with memory coupling unit |
WO2001067234A2 (en) * | 2000-03-08 | 2001-09-13 | Sun Microsystems, Inc. | Vliw computer processing architecture having a scalable number of register files |
US7111156B1 (en) * | 2000-04-21 | 2006-09-19 | Ati Technologies, Inc. | Method and apparatus for multi-thread accumulation buffering in a computation engine |
EP1311945A1 (en) * | 2000-08-22 | 2003-05-21 | Jean-Paul Theis | A configurable register file with multi-range shift register support |
US7072929B2 (en) * | 2000-11-01 | 2006-07-04 | Pts Corporation | Methods and apparatus for efficient complex long multiplication and covariance matrix implementation |
GB2382886B (en) * | 2001-10-31 | 2006-03-15 | Alphamosaic Ltd | Vector processing system |
GB2382673B (en) * | 2001-10-31 | 2005-10-26 | Alphamosaic Ltd | A vector processing system |
AU2003215860A1 (en) * | 2002-04-10 | 2003-10-20 | Koninklijke Philips Electronics N.V. | Data processing system |
US7660841B2 (en) * | 2004-02-20 | 2010-02-09 | Altera Corporation | Flexible accumulator in digital signal processing circuitry |
US7650374B1 (en) * | 2004-03-02 | 2010-01-19 | Sun Microsystems, Inc. | Hybrid multi-precision multiplication |
CN1333356C (en) * | 2004-07-23 | 2007-08-22 | 中国人民解放军国防科学技术大学 | Write serialization and resource duplication combined multi-port register file design method |
TW200625097A (en) * | 2004-11-17 | 2006-07-16 | Sandbridge Technologies Inc | Data file storing multiple date types with controlled data access |
US8561037B2 (en) * | 2007-08-29 | 2013-10-15 | Convey Computer | Compiler for generating an executable comprising instructions for a plurality of different instruction sets |
US9710384B2 (en) | 2008-01-04 | 2017-07-18 | Micron Technology, Inc. | Microprocessor architecture having alternative memory access paths |
US9015399B2 (en) | 2007-08-20 | 2015-04-21 | Convey Computer | Multiple data channel memory module architecture |
US8095735B2 (en) | 2008-08-05 | 2012-01-10 | Convey Computer | Memory interleave for heterogeneous computing |
US8583897B2 (en) * | 2009-02-02 | 2013-11-12 | Arm Limited | Register file with circuitry for setting register entries to a predetermined value |
US8423745B1 (en) | 2009-11-16 | 2013-04-16 | Convey Computer | Systems and methods for mapping a neighborhood of data to general registers of a processing element |
US20110153995A1 (en) * | 2009-12-18 | 2011-06-23 | Electronics And Telecommunications Research Institute | Arithmetic apparatus including multiplication and accumulation, and dsp structure and filtering method using the same |
US8914619B2 (en) | 2010-06-22 | 2014-12-16 | International Business Machines Corporation | High-word facility for extending the number of general purpose registers available to instructions |
US20110320765A1 (en) * | 2010-06-28 | 2011-12-29 | International Business Machines Corporation | Variable width vector instruction processor |
CN101930355B (en) * | 2010-08-24 | 2013-07-24 | 中国航天科技集团公司第九研究院第七七一研究所 | Register circuit realizing grouping addressing and read write control method for register files |
CN101930356B (en) * | 2010-08-24 | 2013-03-20 | 中国航天科技集团公司第九研究院第七七一研究所 | Method for group addressing and read-write controlling of register file for floating-point coprocessor |
US10430190B2 (en) | 2012-06-07 | 2019-10-01 | Micron Technology, Inc. | Systems and methods for selectively controlling multithreaded execution of executable code segments |
US9652233B2 (en) * | 2013-08-20 | 2017-05-16 | Apple Inc. | Hint values for use with an operand cache |
US9459869B2 (en) * | 2013-08-20 | 2016-10-04 | Apple Inc. | Intelligent caching for an operand cache |
CN103984524B (en) * | 2014-05-15 | 2016-07-06 | No. 771 Research Institute of the Ninth Academy, China Aerospace Science and Technology Corporation | A kind of three port floating-point register towards risc processor |
GB2522290B (en) * | 2014-07-14 | 2015-12-09 | Imagination Tech Ltd | Running a 32-bit operating system on a 64-bit machine |
US20170371662A1 (en) * | 2016-06-23 | 2017-12-28 | Intel Corporation | Extension of register files for local processing of data in computing environments |
US11709681B2 (en) | 2017-12-11 | 2023-07-25 | Advanced Micro Devices, Inc. | Differential pipeline delays in a coprocessor |
US11567554B2 (en) * | 2017-12-11 | 2023-01-31 | Advanced Micro Devices, Inc. | Clock mesh-based power conservation in a coprocessor based on in-flight instruction characteristics |
US11243905B1 (en) * | 2020-07-28 | 2022-02-08 | Shenzhen GOODIX Technology Co., Ltd. | RISC processor having specialized data path for specialized registers |
CN114008603B (en) * | 2020-07-28 | 2024-09-13 | Shenzhen GOODIX Technology Co., Ltd. | Data path block circuit and method of using the same |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4302818A (en) | 1979-07-10 | 1981-11-24 | Texas Instruments Incorporated | Micro-vector processor |
US4713749A (en) * | 1985-02-12 | 1987-12-15 | Texas Instruments Incorporated | Microprocessor with repeat instruction |
US4774688A (en) * | 1984-11-14 | 1988-09-27 | International Business Machines Corporation | Data processing system for determining min/max in a single operation cycle as a result of a single instruction |
US5072418A (en) * | 1989-05-04 | 1991-12-10 | Texas Instruments Incorporated | Series maxium/minimum function computing devices, systems and methods |
US5644780A (en) | 1995-06-02 | 1997-07-01 | International Business Machines Corporation | Multiple port high speed register file with interleaved write ports for use with very long instruction word (vlin) and n-way superscaler processors |
US5903919A (en) | 1997-10-07 | 1999-05-11 | Motorola, Inc. | Method and apparatus for selecting a register bank |
US6044448A (en) | 1997-12-16 | 2000-03-28 | S3 Incorporated | Processor having multiple datapath instances |
US6078941A (en) * | 1996-11-18 | 2000-06-20 | Samsung Electronics Co., Ltd. | Computational structure having multiple stages wherein each stage includes a pair of adders and a multiplexing circuit capable of operating in parallel |
US6134648A (en) | 1996-03-15 | 2000-10-17 | Micron Technology, Inc. | Method and apparatus for performing an operation mulitiple times in response to a single instruction |
US6223255B1 (en) * | 1995-02-03 | 2001-04-24 | Lucent Technologies | Microprocessor with an instruction level reconfigurable n-way cache |
Application Events
- 1998-10-09: US09/169,255, patent US6343356B1 (not active; Expired - Lifetime)
- 2001-02-28: US09/796,037, patent US6430677B2 (not active; Ceased)
- 2004-04-19: US10/827,697, patent USRE40883E1 (not active; Expired - Lifetime)
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8378790B2 (en) | 2005-04-25 | 2013-02-19 | Lg Electronics Inc. | Reader control system |
US20080290993A1 (en) * | 2005-04-25 | 2008-11-27 | Seung Hyup Ryoo | Reader Control System |
US8482389B2 (en) | 2005-04-25 | 2013-07-09 | Lg Electronics Inc. | Reader control system |
US8508343B2 (en) | 2005-04-25 | 2013-08-13 | Lg Electronics Inc. | Reader control system |
US20090051493A1 (en) * | 2005-04-25 | 2009-02-26 | Kongsberg Automotive As | Reader control system |
US20090219143A1 (en) * | 2005-04-25 | 2009-09-03 | Seung Hyup Ryoo | Reader control system |
US20110072318A1 (en) * | 2005-04-25 | 2011-03-24 | Seung Hyup Ryoo | Reader control system |
US20110068907A1 (en) * | 2005-04-25 | 2011-03-24 | Seung Hyup Ryoo | Reader control system |
US20110156882A1 (en) * | 2005-04-25 | 2011-06-30 | Seung Hyup Ryoo | Reader control system |
US20110156881A1 (en) * | 2005-04-25 | 2011-06-30 | Seung Hyup Ryoo | Reader control system |
US8115604B2 (en) | 2005-04-25 | 2012-02-14 | Lg Electronics Inc. | Reader control system |
US8115595B2 (en) * | 2005-04-25 | 2012-02-14 | Lg Electronics Inc. | Reader control system |
US9679172B2 (en) | 2005-04-25 | 2017-06-13 | Lg Electronics Inc. | Reader control system |
US20080284570A1 (en) * | 2005-04-25 | 2008-11-20 | Seung Hyup Ryoo | Reader Control System |
US20080316019A1 (en) * | 2005-04-25 | 2008-12-25 | Seung Hyup Ryoo | Reader Control System |
US8598989B2 (en) | 2005-04-25 | 2013-12-03 | Lg Electronics Inc. | Reader control system |
US8604913B2 (en) | 2005-04-25 | 2013-12-10 | Lg Electronics Inc. | Reader control system |
US8624712B2 (en) | 2005-04-25 | 2014-01-07 | Lg Electronics Inc. | Reader control system |
US8653948B2 (en) | 2005-04-25 | 2014-02-18 | Lg Electronics Inc. | Reader control system |
US8665066B2 (en) | 2005-04-25 | 2014-03-04 | Lg Electronics Inc. | Reader control system |
US8698604B2 (en) | 2005-04-25 | 2014-04-15 | Lg Electronics Inc. | Reader control system |
US8749355B2 (en) | 2005-04-25 | 2014-06-10 | Lg Electronics Inc. | Reader control system |
US9672395B2 (en) | 2005-04-25 | 2017-06-06 | Lg Electronics Inc. | Reader control system |
US20080191882A1 (en) * | 2007-02-14 | 2008-08-14 | Nec (China) Co., Ltd. | Radio frequency identification system and method |
US9235414B2 (en) * | 2011-12-19 | 2016-01-12 | Intel Corporation | SIMD integer multiply-accumulate instruction for multi-precision arithmetic |
US20140237218A1 (en) * | 2011-12-19 | 2014-08-21 | Vinodh Gopal | Simd integer multiply-accumulate instruction for multi-precision arithmetic |
Also Published As
Publication number | Publication date |
---|---|
US6343356B1 (en) | 2002-01-29 |
US20010011342A1 (en) | 2001-08-02 |
US6430677B2 (en) | 2002-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE40883E1 (en) | Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision | |
US8069337B2 (en) | Methods and apparatus for dynamic instruction controlled reconfigurable register file | |
US6061779A (en) | Digital signal processor having data alignment buffer for performing unaligned data accesses | |
US4814976A (en) | RISC computer with unaligned reference handling and method for the same | |
US4654781A (en) | Byte addressable memory for variable length instructions and data | |
US8918445B2 (en) | Circuit which performs split precision, signed/unsigned, fixed and floating point, real and complex multiplication | |
US5287532A (en) | Processor elements having multi-byte structure shift register for shifting data either byte wise or bit wise with single-bit output formed at bit positions thereof spaced by one byte | |
US7127588B2 (en) | Apparatus and method for an improved performance VLIW processor | |
US7308559B2 (en) | Digital signal processor with cascaded SIMD organization | |
US8250348B2 (en) | Methods and apparatus for dynamically switching processor mode | |
US20020116567A1 (en) | Efficient I-cache structure to support instructions crossing line boundaries | |
US20070239970A1 (en) | Apparatus For Cooperative Sharing Of Operand Access Port Of A Banked Register File | |
US5513363A (en) | Scalable register file organization for a computer architecture having multiple functional units or a large register file | |
US7051186B2 (en) | Selective bypassing of a multi-port register file | |
US20200394038A1 (en) | Look up table with data element promotion | |
CN108139911B (en) | Conditional execution specification of instructions using conditional expansion slots in the same execution packet of a VLIW processor | |
WO1997032249A1 (en) | System for performing arithmetic operations with single or double precision | |
WO2001098893A1 (en) | Generation of memory addresses utilizing scheme registers | |
US6915411B2 (en) | SIMD processor with concurrent operation of vector pointer datapath and vector computation datapath | |
US7340591B1 (en) | Providing parallel operand functions using register file and extra path storage | |
WO2000068783A2 (en) | Digital signal processor computation core | |
US7107302B1 (en) | Finite impulse response filter algorithm for implementation on digital signal processor having dual execution units | |
GB2108737A (en) | Byte addressable memory for variable length instructions and data | |
US7246218B2 (en) | Systems for increasing register addressing space in instruction-width limited processors | |
CN115373744A (en) | RISC-V based encoding method for expanding VM instruction set |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: ALTERA CORPORATION, CALIFORNIA. ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: PTS CORPORATION; REEL/FRAME: 018184/0423. Effective date: 20060824 |
FPAY | Fee payment | Year of fee payment: 8 |
REFU | Refund | REFUND - SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: R1554); REFUND - PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: R1551). Entity status of patent owner: LARGE ENTITY |
FPAY | Fee payment | Year of fee payment: 4 |
SULP | Surcharge for late payment | |
FPAY | Fee payment | Year of fee payment: 12 |
REMI | Maintenance fee reminder mailed | |