EP1709527A1 - Microprocessor instruction to enable access of a virtual buffer in circular fashion - Google Patents
- Publication number
- EP1709527A1 (also listed as EP05789271A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- register
- index
- zero
- address
- instruction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
- G06F9/30181—Instruction operation extension or modification
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
- G06F9/3552—Indexed addressing using wraparound, e.g. modulo or circular addressing
Definitions
- the present invention relates in general to microprocessor architecture, and more particularly to a single microprocessor instruction that enables access to a virtual buffer in a memory associated with the microprocessor in circular fashion using address index values and one or more general purpose registers.
- Circular buffers are commonly used in many Digital Signal Processing (DSP) filters and other similar algorithms and applications.
- DSP Digital Signal Processing
- the most common DSP operation is the implementation of a filter function which achieves in the digital domain what an analog filter would achieve in the analog domain. Since digital values are discrete, the filter operation attempts to emulate the operation of the analog filter using a method in which a number "n" of input values are used at each computation step. Although discontinuity in signal values does not occur in the analog domain, such discontinuity can occur in discrete digital computations. In an attempt to mitigate the negative effects of the discontinuity in signal values, each filter summation uses one new sample value and n-1 old sample values.
- This type of filter operation is best done using a circular buffer, which simply writes the new value at the current position in the buffer and reuses all the old values without copying them to a new buffer. This type of filter computation is so common that substantially all digital signal processors provide hardware support for circular buffers. Otherwise, the n-1 old values would almost always have to be copied on each outer loop of the filter summation, an overhead that would significantly reduce efficiency and performance.
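As a rough illustration of why circular buffers suit this workload, the following C sketch keeps an n-tap delay line in a circular buffer. It is a minimal example, not the patent's FIR code (FIGS. 7 through 9C are not reproduced here): the `fir_state` type, the `fir_step` function and the 4-tap size are hypothetical choices made for the sketch.

```c
#include <stddef.h>

#define NTAPS 4 /* illustrative tap count, chosen only for this sketch */

/* Delay line kept as a circular buffer; 'pos' indexes the newest sample. */
typedef struct {
    double samples[NTAPS];
    size_t pos;
} fir_state;

/* Insert one new sample and return y[t] = sum_k coeffs[k] * x[t-k].
   Exactly one array slot is written per call; the NTAPS-1 older
   samples are reused in place rather than shifted or copied. */
double fir_step(fir_state *s, const double coeffs[NTAPS], double x)
{
    /* Move the write position down one slot, wrapping at zero. */
    s->pos = (s->pos == 0) ? NTAPS - 1 : s->pos - 1;
    s->samples[s->pos] = x;

    double acc = 0.0;
    size_t idx = s->pos; /* newest sample x[t] */
    for (size_t k = 0; k < NTAPS; ++k) {
        acc += coeffs[k] * s->samples[idx];
        idx = (idx + 1 == NTAPS) ? 0 : idx + 1; /* next older sample */
    }
    return acc;
}
```

Each call costs one store plus NTAPS multiply-accumulates; the only bookkeeping is the wrap of `pos` and `idx`, which is exactly the work a hardware circular buffer (or a single wrap instruction) takes off the inner loop.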
- in conventional hardware circular buffer implementations, the buffer size was often limited to a power of two so that the buffer size could be represented as an exponent.
- the "exponent of two" buffer size representation often resulted in significant waste of memory resources. For example, if a buffer size of 10 kilobytes (KB) was required, the buffer size had to be 16 KB since the next smaller buffer size of 8 KB was not adequate.
- conventional configurations including the arithmetic solutions required stricter limits on the location of the buffer, such as alignment with the data size of the memory.
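The arithmetic of the 10 KB example, and the reason power-of-two sizes were attractive in the first place, can be sketched in C. The helper names below are hypothetical: rounding up shows the 6 KB of waste, and the mask shows how a power-of-two size turns the wrap-around into a single bitwise AND.

```c
#include <stdint.h>

/* Smallest power of two >= n; valid for 0 < n <= 0x80000000. */
uint32_t round_up_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* With a power-of-two size, the wrap-around is a single bitwise AND. */
uint32_t wrap_pow2(uint32_t index, uint32_t pow2_size)
{
    return index & (pow2_size - 1u);
}
```

For a required size of 10 * 1024 bytes, `round_up_pow2` returns 16 * 1024, leaving 6 * 1024 bytes unused.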
- a processor is configured to enable access of a virtual buffer in circular fashion using at least one register and logic which manipulates indexes to enable addressing of the elements in the buffer.
- the processor includes at least one register which stores an address index, a last element offset and a decrement, and logic which executes a circular buffer instruction.
- the logic compares the address index to zero, modifies the address index to the last element offset if the address index is zero, and modifies the address index by the decrement if the address index is not zero.
- the logic replaces the address index with the last element offset, or otherwise adds the last element offset to the address index, when the address index is zero, or subtracts the decrement from the address index if the address index is not zero.
- a base address points to a first or base element of the circular buffer located in memory.
- the address index when added to the base address, provides a pointer to specific elements in the circular buffer.
- the last element offset is also an index which, when added to the base address, provides a pointer to the last element at the "top" of the circular buffer.
- the decrement corresponds with the size of each element, so that modifying the address index by the decrement enables addressing of the sequential elements of the buffer.
- the use of relative indexes eliminates complicated arithmetic computations. Rather than performing circular or modular address arithmetic operations to calculate buffer element addresses, the address index is simply compared to zero. When the address index reaches zero, it is modified with or otherwise replaced by the last element offset to wrap or roll around to the top element of the circular buffer.
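A minimal C sketch contrasting the two ways of stepping an index downward through a buffer of ten 4-byte elements (a 40-byte buffer, deliberately not a power of two). It assumes indexes are byte offsets that are multiples of the decrement, and that the buffer size equals the last element offset plus one decrement; the function names are hypothetical. Both functions produce the same sequence, but the second needs only a compare with zero and a subtraction.

```c
#include <stdint.h>

/* Conventional approach: explicit modular arithmetic on the index.
   Assumes size == last + dec and addr is a multiple of dec. */
uint32_t next_index_modulo(uint32_t addr, uint32_t dec, uint32_t size)
{
    return (addr + size - dec) % size;
}

/* Relative-index approach: compare with zero, roll around to the
   last element offset, otherwise subtract the decrement. */
uint32_t next_index_compare(uint32_t addr, uint32_t dec, uint32_t last)
{
    return (addr == 0) ? last : addr - dec;
}
```

The compare-based form avoids a division entirely, and nothing in it depends on the buffer size being a power of two.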
- the address index, last element offset and decrement may be stored in a single register or multiple registers, such as the general purpose registers (GPRs) of the processor.
- GPRs general purpose registers
- a first GPR stores the address index and at least one other GPR stores the last element offset and the decrement.
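- the logic, while executing the circular buffer instruction, retrieves the last element offset and the decrement from at least one second GPR, determines whether the first GPR is zero, loads the first GPR with the last element offset if the first GPR is zero, and subtracts the decrement from the first GPR if the first GPR is not zero.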
- the logic determines whether the first GPR is zero, loads a third GPR with the last element offset if the first GPR is zero, and subtracts the decrement from the third GPR if the first GPR is not zero.
- the instruction itself identifies the one or more registers used in the instruction.
- the last element offset and decrement are stored in a first register and the address index is stored in a second register.
- the circular buffer instruction identifies a first register storing the address index, a second register storing the last element offset and the decrement, and a third register providing a destination for a result of modifying the address index.
- a microprocessor system includes a microprocessor and a memory.
- the microprocessor includes at least one register and an execution unit that executes program instructions.
- the memory stores a buffer and the instructions which enable access to the buffer in circular fashion, where the instructions include at least one first instruction and a modular subtraction instruction.
- At least one first instruction causes the execution unit to load at least one register with an address index to enable addressing of elements of the buffer, an offset index to enable addressing of a last element in the buffer, and a decrement value indicative of the size of the elements in the buffer.
- the modular subtraction instruction causes the execution unit to determine whether the address index is zero, to load a register with the offset index if the address index is zero, and to reduce the address index by the decrement value if the address index is not zero.
- the address index, the decrement value and the offset index may be stored in a single register or multiple registers.
- one or more of the registers are selected from the general purpose registers (GPRs) of the microprocessor.
- a first register stores the address index and a second register stores the offset index and the decrement value.
- the modular subtraction instruction causes the execution unit to determine whether the first register holds a zero value, to load the first register with the offset index if the first register holds a zero value, and to subtract the decrement value from the first register if it does not hold a zero value.
- the modular subtraction instruction causes the execution unit to determine whether the first register holds a zero value, to load a third register with the offset index if the first register holds a zero value, and to subtract the decrement value from the first register and store the result into the third register if the first register does not hold a zero value.
- the modular subtraction instruction includes at least one field identifying the registers used while executing the instruction.
- the modular subtraction instruction includes a first field identifying a source register for storing the address index, a second field identifying a target register for storing the offset index and the decrement value, and a third field identifying a destination register for storing a result of the modular subtraction instruction.
- the execution unit of the microprocessor may further employ a base pointer to locate the buffer in the memory.
- the execution unit adds the address index to the base pointer to address the elements of the buffer.
- a modular subtraction instruction for execution on a microprocessor having at least one general purpose register includes opcode bits for designating the modular subtraction instruction, and operand bits for designating at least one general purpose register storing an offset index, a decrement value, and an address index.
- the address index is modified by the decrement value if the address index is not zero and is modified by the offset index if the address index is zero.
- the opcode bits include a first opcode field denoting an extended instruction set, a function field specifying a subclass of instructions, and a second opcode field specifying the modular subtraction instruction.
- the operand bits include a first field identifying a source register for storing the address index, a second field identifying a target register storing the offset index and the decrement value, and a third field identifying a destination register.
- the source register is decremented by the decrement value and the result is stored in the destination register if the source register is not zero, or the offset index is stored in the destination register if the source register is zero.
- the first and third fields may identify the same register as the source and destination registers.
- a method of enabling access to a buffer in memory of a processing system in circular fashion with a single instruction includes loading a roll-around index, an address index and a decrement value into at least one register and executing a buffer instruction. Executing the buffer instruction further includes determining whether the address index is zero, modifying the address index by the decrement value if the address index is not zero, and updating the address index with the roll-around index if the address index is zero.
- the method may include loading the roll-around index and the decrement value into a first register, and initializing a second register with an initial address index.
- the initializing may include clearing the second register or loading the second register with the roll-around index.
- the method may include any one or more of determining whether a register holds a zero value, subtracting the decrement value from the contents of the register, and loading the roll-around index value into the register.
- the method may include comparing the address index to zero.
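The effect of the two initialization choices can be traced with a short C sketch. It assumes a 4-element buffer of 4-byte elements (roll-around index 12, decrement 4); `buffer_step` and `trace_indexes` are hypothetical names that model one execution of the buffer instruction and a run of repeated executions.

```c
#include <stddef.h>
#include <stdint.h>

/* One execution of the buffer instruction: roll around at zero, else decrement. */
static uint32_t buffer_step(uint32_t addr, uint32_t roll, uint32_t dec)
{
    return (addr == 0) ? roll : addr - dec;
}

/* out[0] is the initial index; out[i] is the index after i executions. */
void trace_indexes(uint32_t init, uint32_t roll, uint32_t dec,
                   uint32_t *out, size_t count)
{
    uint32_t addr = init;
    for (size_t i = 0; i < count; ++i) {
        out[i] = addr;
        addr = buffer_step(addr, roll, dec);
    }
}
```

Starting from a cleared register yields 0, 12, 8, 4, 0, 12, while starting from the roll-around index yields 12, 8, 4, 0, 12, 8. Both visit every element; they differ only in where the cycle begins.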
- FIG. 1 is a simplified block diagram of a microprocessor system implemented according to an exemplary embodiment of the present invention including a microprocessor configured to implement and access elements of a virtual circular buffer with a single instruction;
- FIGs 2A and 2B are more detailed block diagrams of exemplary embodiments of the circular buffer of FIG. 1;
- FIG. 3 is a simplified block diagram illustrating multiple circular buffers implemented in virtual memory of the program which is mapped to the physical memory of the microprocessor of FIG. 1;
- FIG. 4 is a block diagram of an exemplary configuration of the registers for use with the MODSUB instruction of FIG. 1;
- FIG. 5 is a block diagram illustrating an exemplary instruction encoding of the MODSUB instruction of FIG. 1 for the MIPS32® or MIPS64® architectures including the MIPS® DSP ASE;
- FIGs 6A and 6B are flowchart diagrams illustrating a process of initiating and executing the MODSUB instruction of FIG. 1 including operation performed by the microprocessor according to an exemplary embodiment of the present invention
- FIG. 6A is a flowchart diagram illustrating high-level user and program functions using the MODSUB instruction
- FIG. 6B is a flowchart diagram illustrating the internal processor functionality during each execution of the MODSUB instruction of FIG. 6A;
- FIG. 7 shows an exemplary 40-tap block FIR filter written in the C programming language without the MODSUB instruction;
- FIG. 8 shows an exemplary version of the same 40-tap block FIR filter written in the C programming language and optimized with the MODSUB instruction
- FIGs 9A, 9B and 9C collectively show the same 40-tap block FIR filter written using assembly code for the MIPS32® architecture and hand-tuned to achieve optimal performance for the MIPS32® 24K microprocessor architecture and without using the MODSUB instruction.
- the inventors of the present application have recognized the need to enable implementation of circular buffers in program memory of a microprocessor system with maximum flexibility and minimal constraints. They have therefore developed a single microprocessor instruction that enables an implementation of a virtual circular buffer anywhere in memory using general purpose registers without the conventional constraints on the number of buffers or the size of each buffer, as will be further described below with respect to FIGS. 1-9C.
- FIG. 1 is a simplified block diagram of a microprocessor system 100 including a microprocessor 101 implemented according to an exemplary embodiment of the present invention.
- the microprocessor 101 is configured to implement and access elements of a virtual circular buffer 113 with a single instruction 111, which is referred to herein as the modular subtraction (MODSUB) instruction 111.
- MODSUB modular subtraction
- the microprocessor 101 is coupled to one or more input/output (I/O) devices 102 and to a memory 103, which stores the circular buffer 113 and a program 104 containing one or more instructions including the MODSUB instruction 111.
- the microprocessor 101 includes a memory controller (MC) 105 for interfacing the memory 103 and at least one execution unit 107 for performing functions and computations indicated by the program instructions.
- the microprocessor 101 includes one or more registers 109 for storing and manipulating data values and variables as controlled by instructions. Any type of register is contemplated, such as including one or more general purpose registers (GPRs) or the like.
- the microprocessor 101 conforms substantially to a microprocessor architecture from MIPS Technologies, Inc., such as either of the MIPS32® or MIPS64® architectures, in which the selected architecture may further be extended by a Digital Signal Processor (DSP) Application-Specific Extension (ASE).
- DSP Digital Signal Processor
- ASE Application-Specific Extension
- the DSP ASE is an extension of the basic MIPS® microprocessor core and is integrated therewith, and is thus incorporated on the same core integrated circuit (IC) or chip at core synthesis.
- the DSP ASE extension to the core enables the same core to perform extended DSP functions rather than requiring a separate coprocessor.
- the MODSUB instruction 111 is a DSP ASE instruction synthesized into the same core of the microprocessor 101 and included within the core instruction set. It is appreciated, however, that the present invention is not limited to MIPS® microprocessor architectures or extensions, and may be used by other processors or processing logic and the like, in which it is desired to implement one or more circular buffers.
- the MODSUB instruction 111 may be implemented as part of the core instruction set, or may be implemented separately as part of a coprocessor. All such configurations are possible and contemplated without falling outside the scope of the present invention.
- the microprocessor system 100 may be implemented as a computer system, including but not limited to a personal computer, workstation computer, server computer, notebook computer, personal digital assistant, file server, print server, enterprise server, and the like.
- the microprocessor system 100 may also include an embedded system, including but not limited to a set-top box, intelligent peripheral device, automobile embedded system, embedded system in an appliance, mass storage controller, and the like.
- the I/O devices 102 include devices and components for receiving data as input for provision to the microprocessor 101 for processing, including but not limited to user input.
- the I/O devices 102 also comprise devices for receiving from the microprocessor 101 results of the processing and for outputting the results, including but not limited to user output.
- the I/O devices 102 may include, but are not limited to direct memory access controllers, timers, clocks, interrupt controllers, serial port controllers, parallel port controllers, USB port controllers, IEEE 1394 controllers, SCSI controllers, Fibre Channel controllers, floppy disk controllers, hard disk controllers, graphics controllers, display devices, keyboards, mice, scanners, plotters, printers, floppy disk drives, hard disk drives, optical storage devices, tape drives, digital cameras, and the like, or any combination thereof.
- the memory 103 includes any suitable storage medium memory for storing program instructions and data to be processed by the microprocessor 101, including but not limited to, dynamic random access memory (DRAM), static random access memory (SRAM), synchronous DRAM (SDRAM), double-data rate SDRAM (DDR-SDRAM), Rambus DRAM (RDRAM), read-only memory (ROM), programmable read only memory (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), FLASH memory, and the like, or any combination thereof.
- DRAM dynamic random access memory
- SRAM static random access memory
- SDRAM synchronous DRAM
- DDR-SDRAM double-data rate SDRAM
- RDRAM Rambus DRAM
- ROM read-only memory
- PROM programmable read only memory
- EPROM erasable PROM
- EEPROM electrically erasable PROM
- the memory 103 stores the virtual circular buffer 113 and the program 104, which includes the MODSUB instruction 111.
- the MC 105 may include any combination of a memory-management unit (MMU) (not shown), a translation lookaside buffer (TLB) (not shown), a fixed mapping translation (FMT) (not shown), etc., as known to those skilled in the art.
- MMU memory-management unit
- TLB translation lookaside buffer
- FMT fixed mapping translation
- the MODSUB instruction 111 is fetched by the MC 105 of the microprocessor 101 and forwarded for execution by the execution unit 107 to generate and use the circular buffer 113 as further described below.
- the execution unit 107 may include any combination of an arithmetic/logic unit (ALU) (not shown), a multiply/divide unit (MDU) (not shown) and similar type functional units as known to those skilled in the art.
- ALU arithmetic/logic unit
- MDU multiply/divide unit
- the MODSUB instruction 111 is forwarded to and executed by an ALU within the execution unit 107.
- FIG. 2A is a more detailed block diagram of an exemplary embodiment of the circular buffer 113.
- the circular buffer 113 is located between a lower address (LA) and an upper address (UA).
- a bottom pointer (BP) is set equal to LA and an address (ADDR) index is added to BP, such as by the execution unit 107, to generate a current pointer (CP).
- the CP is used to address the buffer elements in the circular buffer 113 in order to store (write) data into or retrieve (read) data from the circular buffer 113.
- there is a number "N" of elements in the circular buffer 113, numbered from a first element E0, pointed to by BP, to a top element EN.
- Each element E0-EN in the circular buffer 113 has an equal size, and the last or top element EN is addressed when CP is set equal to a last-buffer-element pointer (LBEP), which is equivalent to BP plus a last-buffer-element (LBE) index.
- the LBE index is a last element offset, or a roll-around or wrap-around index, such that CP effectively loops in the circular buffer 113 by decrementing ADDR to zero and then rolling back to LBE.
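The FIG. 2A addressing, CP = BP + ADDR with ADDR stepping down and rolling around at zero, can be sketched in C on an ordinary array. The element type `int16_t` (so DEC is 2 bytes), the five-element buffer and the function names are assumptions for illustration; BP is simply the array's address, so the buffer can sit anywhere in memory.

```c
#include <stddef.h>
#include <stdint.h>

/* CP = BP + ADDR, where ADDR is a byte offset from the bottom pointer. */
static int16_t *current_pointer(int16_t *bp, uint32_t addr)
{
    return (int16_t *)((unsigned char *)bp + addr);
}

/* Read 'count' elements starting at byte index 'addr', stepping down by
   one element each time and rolling from zero back to 'lbe'. */
void read_circular(int16_t *bp, uint32_t addr, uint32_t lbe,
                   int16_t *out, size_t count)
{
    const uint32_t dec = (uint32_t)sizeof(int16_t); /* DEC = element size */
    for (size_t i = 0; i < count; ++i) {
        out[i] = *current_pointer(bp, addr);
        addr = (addr == 0) ? lbe : addr - dec;
    }
}
```

With five 2-byte elements the LBE index is 8, so starting at ADDR = 8 reads E4, E3, E2, E1, E0 and then wraps to E4.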
- FIG. 2B is a more detailed block diagram of another exemplary embodiment of the circular buffer 113 with two different addresses or pointers IN and OUT.
- rather than a single ADDR value for a single element pointer CP, a first address value ADDR1 is used to define an input pointer IN and a second address value ADDR2 is used to define an output pointer OUT.
- the use of an input pointer and an output pointer is common for circular buffers, such as when implementing a first-in, first-out (FIFO) queue in which elements are written into the buffer using the IN pointer and retrieved from the buffer using the OUT pointer.
- FIFO first-in, first-out
- the multiple pointer embodiments are achieved using a first register to store ADDR1 for the IN pointer and a second register to store ADDR2 for the OUT pointer.
- the MODSUB instruction 111 is programmed to provide the appropriate register depending upon the operation being performed and the buffer element being accessed.
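A C sketch of the FIG. 2B arrangement: two address indexes, one for IN and one for OUT, share a single last element offset and decrement and are advanced by the same roll-around rule. The three-element capacity, the `fifo` type and function names, and the occupancy counter used to detect full and empty states are assumptions of this sketch; the text above does not say how full and empty conditions are detected.

```c
#include <stdint.h>

#define FIFO_ELEMS 3u                              /* illustrative capacity   */
#define FIFO_DEC   ((uint32_t)sizeof(int32_t))     /* element size in bytes   */
#define FIFO_LBE   ((FIFO_ELEMS - 1u) * FIFO_DEC)  /* last element offset     */

typedef struct {
    int32_t  data[FIFO_ELEMS];
    uint32_t in;     /* ADDR1: byte index used by the IN pointer  */
    uint32_t out;    /* ADDR2: byte index used by the OUT pointer */
    uint32_t count;  /* occupancy counter (added by this sketch)  */
} fifo;

/* Shared update rule for both indexes: roll around at zero, else decrement. */
static uint32_t step_index(uint32_t addr)
{
    return (addr == 0) ? FIFO_LBE : addr - FIFO_DEC;
}

void fifo_init(fifo *f)
{
    f->in = FIFO_LBE;
    f->out = FIFO_LBE;
    f->count = 0;
}

/* Returns 1 on success, 0 if the FIFO is full. */
int fifo_put(fifo *f, int32_t v)
{
    if (f->count == FIFO_ELEMS)
        return 0;
    f->data[f->in / FIFO_DEC] = v; /* same element as BP + ADDR1 in bytes */
    f->in = step_index(f->in);
    f->count++;
    return 1;
}

/* Returns 1 on success, 0 if the FIFO is empty. */
int fifo_get(fifo *f, int32_t *v)
{
    if (f->count == 0)
        return 0;
    *v = f->data[f->out / FIFO_DEC]; /* same element as BP + ADDR2 in bytes */
    f->out = step_index(f->out);
    f->count--;
    return 1;
}
```

Because both indexes use the same LBE and DEC, one packed register could serve both, with only the two address registers differing between reads and writes.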
- FIG. 3 is a simplified block diagram illustrating multiple circular buffers implemented in virtual memory of the program which is mapped to the physical memory of the microprocessor 101.
- the microprocessor 101 is capable of physically specifying a range of addresses, shown as processor physical memory 301.
- the program code being executed by the microprocessor 101 such as, for example, from the program 104, is mapped within the processor physical memory 301 in a space, shown using a dotted area, referred to as program virtual memory 303.
- This mapping can be achieved using a standard address translation mechanism, such as the TLB or using predetermined fixed mapping via an MMU.
- Three "virtual" circular buffers 305 (Circular Buffer A, shown in a space with lines slanting to the right), 307 (Circular Buffer B, shown in a space with lines slanting to the left) and 309 (Circular Buffer C, shown in a space with cross-hatched lines) are shown located within the program virtual memory 303, each implemented and configured in a similar manner as the circular buffer 113.
- the memory 103 typically defines a significantly smaller addressable space than the processor physical memory 301 or even the program virtual memory 303.
- the memory 103 includes consecutive addressable locations mapped into the processor physical memory 301 of the microprocessor 101.
- the MC 105 performs the mapping and address translation functions between the microprocessor 101 and the memory 103 to enable proper execution of programs, such as the program 104, and generation of virtual circular buffers, such as the virtual circular buffers 305, 307 and 309 within the program memory.
- Such mapping and address translation schemes are known and are not discussed further herein.
- any number of circular buffers may be defined and the number of circular buffers simultaneously used is limited only by the number of registers employed.
- FIG. 4 is a block diagram of an exemplary configuration of the registers 109 used with the MODSUB instruction 111.
- a target register "rt” stores the LBE index in a first field 401 and a decrement (DEC) value in a second field 403.
- the DEC value is equivalent to the size of each element in the circular buffer 113.
- a source register “rs” stores the ADDR index and a destination register “rd” stores a destination (DEST) value.
- the particular sizes of the registers, the register fields, the indexes and the values are a matter of design choice, determined as appropriate for the particular configuration (or microprocessor core) being implemented.
- the registers rt, rs and rd are each 32-bit general purpose registers with bit zero "0" at the right-most position and bit 31 at the left-most position.
- the field 403 storing the DEC value is an 8-bit field including bits 0 to 7 (or 7:0) of the rt register, referred to as rt[7:0].
- the field 401 storing the LBE index is a 16-bit field including bits 8 to 23 of the rt register, or rt[23:8]. The remaining or upper portion of the rt register (bits rt[31:24]) is not used or otherwise ignored.
- the ADDR index is up to a 32-bit value stored within the rs register, or rs[31:0].
- the DEST value is up to a 32-bit value stored within the rd register, or rd[31:0].
- the ADDR index and the DEST value are limited to 16 bits for practical purposes, although larger values are contemplated.
- the particular sizes and locations of the indexes and values may be modified depending upon the specific implementation. For example, the relative sizes may be the same or doubled for a 64-bit configuration. Also, the LBE index and DEC value could be stored in two different registers rather than in different fields of the same register.
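Under the 32-bit layout above, the rt operand could be composed in C as follows. This is a sketch: `pack_rt` and `last_element_offset` are hypothetical helpers, and the second assumes the first element sits at offset 0 from BP, so the last of `count` equally sized elements is at (count - 1) * DEC.

```c
#include <stdint.h>

/* Compose the rt operand of FIG. 4: LBE index in rt[23:8], DEC in rt[7:0].
   rt[31:24] is left at zero since the instruction ignores those bits. */
uint32_t pack_rt(uint16_t lbe, uint8_t dec)
{
    return ((uint32_t)lbe << 8) | (uint32_t)dec;
}

/* Byte offset of the last element when 'count' (>= 1) equally sized
   elements start at BP. */
uint32_t last_element_offset(uint32_t count, uint32_t elem_size)
{
    return (count - 1u) * elem_size;
}
```

For ten 4-byte elements the LBE index is 36 (0x24), so rt holds 0x00002404.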
- FIG. 5 is a block diagram illustrating an exemplary instruction encoding 501 of the MODSUB instruction 111 for the MIPS32® or MIPS64® architectures including the MIPS® DSP ASE.
- the present invention is not limited to the particular architecture illustrated or its instruction encoding, format or specific operation; similar or comparable instruction encodings, formats and operations are envisioned for different microprocessor architectures.
- the left-most 6 bits 31:26 define an opcode field containing a special SPEC3 major opcode that allows further sub-decoding by extending the opcode mapping.
- the extended instruction set for the DSP ASE is decoded according to predetermined opcode mapping.
- the execution unit 107 decodes the last 6 bits 5:0 defining a function field specifying a subclass of instructions denoted herein as OP2.
- the function field may specify a DSP ASE instruction
- the OP2 opcode specifies another subset of instructions defined in an operation (op) field located at bit locations 10:6, specifying the MODSUB instruction 111, which completes the opcode encoding.
- the 5 bits 25:21 define a source register field identifying one of the registers 109 as the source register rs. In this case, 5 bits are used to identify one of a total 32 GPRs as the rs register.
- the next 5 bits 20:16 define a target register field identifying one of the registers 109 as the target register rt.
- the next 5 bits 15:11 define a destination register field identifying one of the registers 109 as the destination register rd.
- the source and destination register fields may define the same register as both the source and destination.
- a pointer BP is set equal to the lower address LA of the circular buffer 113.
- the size of the circular buffer 113, or SIZE, when added to BP, addresses the upper address UA at the top of the circular buffer 113.
- the DEC value essentially defines the size of each buffer element.
- the LBE index is an offset address that is added to BP to form the pointer LBEP, which points to the top element (or last data value) in the circular buffer 113, such as element EN shown in FIG. 1.
- the ADDR index is added to the base address BP by the microprocessor 101 to obtain a corresponding pointer, such as the current pointer CP or an input pointer IN or an output pointer OUT, etc., where each pointer enables access to a corresponding element in the circular buffer 113.
- the ADDR index is decremented by DEC to point to the next data value and then the result is added to BP by the execution unit 107 to obtain the absolute virtual address of that buffer element.
- when ADDR has been decremented to zero, the next update rolls or wraps it back to the LBE index to enable access to the last buffer element EN at the top of the circular buffer 113.
- An exemplary instruction format of the MODSUB instruction 111 employing the instruction encoding 501 and the exemplary configuration of the registers 109 is as follows:
- the illustrated instruction format includes three operation lines performed by the execution unit 107 of the microprocessor 101 when executing the MODSUB instruction 111.
- the LBE index is retrieved from the field 201 of the rt register. Note that the value in the rt register is shown as being retrieved, right-shifted 8 bits (denoted by ">>"), and bitwise ANDed (denoted by "&") with the 16-bit hexadecimal value "0xffff".
- the DEC value is retrieved from the field 203 of the rt register and bitwise ANDed with the 8-bit hexadecimal value "0xff".
- the third operation line illustrates the MODSUB instruction operation using the retrieved operands.
- the DEST value is provided in the rd register
- the ADDR index is provided in the rs register
- the LBE index and the DEC value are specified in the rt register.
- the illustrated operation description shows a modular subtraction performed on the ADDR index using the specified DEC value and the LBE index as the modular roll-around value.
- the ADDR value (or register rs) is checked for a zero. If ADDR is zero, then it has reached the bottom of the circular buffer 113 and it is rolled back to point to the top element in the buffer by resetting it to LBE.
- if ADDR is zero, the 16 bits rt[23:8] of the rt register (the LBE index) are loaded into the rightmost bits of the rd register, rd[15:0], and the upper bits of the rd register, rd[31:16], are set to all zeros. If the ADDR value is not zero, the ADDR value is decremented by DEC (ADDR - DEC) and stored in the rd register as the DEST value. If the rd register is the same register as rs, the rs register is updated with the new ADDR value in either case.
- the source and destination registers may be defined as the same register or different registers. If a separate destination register rd is defined, the ADDR index is either decremented by the DEC value or replaced with the LBE index, and the result is placed in the rd register, leaving the rs register unmodified. After execution of the MODSUB instruction 111, the contents of the rd register may be copied into the rs register to update the rs register to point to the next element in the circular buffer 113.
- a separate rd register may be defined to keep the rs register temporarily unmodified if for any reason it is desired to conveniently address the prior buffer element again using the rs register while addressing the next element using the rd register.
- the MODSUB instruction 111 assumes that the buffer size (and therefore the LBE index) is a multiple of the DEC value, so that the value zero (0) is eventually reached when the MODSUB instruction 111 is called repeatedly. If this is not the case, the MODSUB instruction 111 will not wrap around to the last element in the buffer, which could cause memory corruption and a memory fault in the application using the MODSUB instruction 111. If an indexed load word instruction used in conjunction with the MODSUB instruction 111 uses an index value that is not a multiple of the buffer element size, an address error exception occurs. An address error exception also occurs if a negative index pointer generates an invalid address value.
- a safe programming practice would be to check for a negative index value after the call to the MODSUB instruction 111 during code development (in a #ifdef ERROR_CHECK, for example), and to optionally not compile this error checking code in the final production code.
- the typical size of the data operand is either 2 bytes or 4 bytes, which is also the specified DEC value. It is noted that the buffer start value, or the BP pointer, need only be aligned to the natural width of the data element in the circular buffer 113.
- FIGs 6A and 6B are flowchart diagrams illustrating a process of initiating and executing the MODSUB instruction 111, including operations performed by the microprocessor 101 according to an exemplary embodiment of the present invention. One or more of the blocks may be re-ordered without modifying the basic modular subtract function.
- FIG. 6A is a flowchart diagram illustrating high-level user and program functions using the MODSUB instruction.
- the LBE index and DEC value are defined and stored into one or more registers.
- the size of the circular buffer, or SIZE, need not be explicitly defined since it is implicitly defined by the LBE index and the DEC value.
- the number of registers used depends upon the relative size of the registers and the LBE index and DEC values as previously described.
- the location of the lower address LA of the circular buffer 113 is determined and BP is set equal to LA to point to the beginning of the circular buffer 113.
- the register holding the ADDR index is initialized, such as by being reset or otherwise defined with an initial value.
- the rs register is cleared (all 0s) or loaded with the LBE index, or preset, if desired, to point to any element within the circular buffer 113. Also, if multiple ADDR indexes are desired for a given circular buffer, each index or its corresponding buffer is initialized.
- FIG. 6B is a flowchart diagram illustrating the internal processor functionality during each execution of the MODSUB instruction at block 607 of FIG. 6A.
- the microprocessor 101 determines whether ADDR is zero, and then modifies ADDR as shown in either block 615 or block 617. In particular, if ADDR is not zero, operation proceeds to block 615 in which the microprocessor 101 decrements ADDR by DEC, such as performed by a subtraction operation. An addition operation is contemplated in an alternative embodiment (e.g., increment).
- if ADDR is zero, as determined at block 613, operation proceeds instead to block 617, in which the microprocessor 101 sets ADDR equal to LBE (or otherwise replaces ADDR with LBE).
- the MODSUB instruction 111 is complete after ADDR is updated at block 615 or block 617, and operation "returns" or otherwise proceeds to block 609.
- the maximum size of the circular buffer 113 defined by the LBE index is 64 KB. It is appreciated, however, that any number of bits may be employed to define the buffer size, so that any practicable size may be defined.
- the size of the circular buffer 113 is not restricted to a power of two (2), although for proper operation in most configurations it is divisible by two.
- the MODSUB instruction 111 may be used to create a circular buffer with 5K half-word (2 byte) elements for a buffer size of 10 KB.
- SIZE is 10 KB
- DEC is 2
- LBE is set to 10 KB - 2.
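The numbers in this example can be checked as follows, assuming 1 KB = 1024 bytes and the 32-bit register layout described earlier (LBE index in rt[23:8], DEC in rt[7:0]):

```latex
\begin{aligned}
\mathrm{SIZE} &= 10\ \text{KB} = 10 \times 1024 = 10240\ \text{bytes} \\
N &= \mathrm{SIZE} / \mathrm{DEC} = 10240 / 2 = 5120\ \text{elements} \\
\mathrm{LBE} &= \mathrm{SIZE} - \mathrm{DEC} = 10238 = \mathtt{0x27FE} \le \mathtt{0xFFFF} \\
rt &= \mathrm{LBE} \cdot 2^{8} + \mathrm{DEC} = \mathtt{0x0027FE02}
\end{aligned}
```

Starting from ADDR = LBE, the index reaches zero after LBE / DEC = 5119 decrements, and the next MODSUB wraps it back to LBE, so the index cycles through all 5120 element offsets. Because 10240 is not a power of two, a simple bit-mask wrap would not work for this buffer.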
- the circular buffer 113 may be located anywhere in memory 103 as long as the starting address is aligned to the natural width of the data elements in the buffer (e.g., where the data width in bytes is 1, 2, 4, 8, etc.).
- special registers are typically defined so that the total number of circular buffers that can be implemented is limited or otherwise predetermined.
- a circular buffer instruction implemented according to an embodiment of the present invention does not require special registers, so that GPRs may be used to implement each circular buffer.
- the total number of circular buffers defined is theoretically unlimited using the MODSUB instruction 111, and the number of circular buffers used simultaneously is limited only by the total number of registers defined for the particular microprocessor.
- FIG. 7 shows an exemplary 40-tap block FIR filter written in the C programming language without the MODSUB instruction. This program was compiled with compiler optimization using the MIPS32® instruction set and the resulting code required 620 cycles per element (cycles/element).
- FIG. 8 shows an exemplary version of the same 40-tap block FIR filter written in the C programming language and optimized with the MODSUB instruction.
- FIGs 9A, 9B and 9C collectively show the same 40-tap block FIR filter written using assembly code for the MIPS32® architecture and hand-tuned to achieve optimal performance for the MIPS32® 24K microprocessor architecture and without using the MODSUB instruction. This version required 401 cycles/element.
- the C code for both versions was compiled for the MIPS32® microprocessor employing the DSP ASE, the first without the MODSUB instruction and the second with the MODSUB instruction.
- the version without the MODSUB instruction required 256 cycles/element during execution, which was superior to both the C version and the handwritten assembly versions based on the MIPS32® instruction set without DSP ASE.
- the compiled version of the C code employing the DSP ASE with the MODSUB instruction used only 214 cycles/element during execution, which provides a substantial improvement over the other versions previously described.
- the version compiled for DSP ASE with the MODSUB instruction reduced the number of cycles/element by more than 15% compared to similar code compiled for DSP ASE without the MODSUB instruction, and reduced the number of cycles/element by almost half as compared to the hand-optimized version of assembly code shown in FIGs 9A - 9C. Reducing the number of cycles per element significantly enhances performance of the microprocessor 101 when performing DSP functions. The performance increase is multiplied by the number of elements of each buffer and further multiplied by the number of buffers employed.
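The relative reductions follow directly from the reported cycle counts (620, 401, 256 and 214 cycles/element):

```latex
\begin{aligned}
\frac{256 - 214}{256} &\approx 0.164 && \text{(vs. DSP ASE without MODSUB)} \\
\frac{401 - 214}{401} &\approx 0.466 && \text{(vs. hand-tuned MIPS32 assembly)} \\
\frac{620 - 214}{620} &\approx 0.655 && \text{(vs. compiled MIPS32 C code)}
\end{aligned}
```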
- the MODSUB instruction simplifies and reduces the source code, and further optimizes performance by significantly reducing the number of cycles/element required to perform the same filter function as compared to conventional configurations without the MODSUB instruction.
- Programming resources are optimized and code performance is maximized employing a MODSUB instruction implemented according to an embodiment of the present invention.
- a single register is sufficient if the total number of bits for the ADDR, DEC and LBE values is less than or equal to the register size.
- the size of at least one of the values can be reduced to accommodate all three.
- the ADDR index may be stored in the upper double-word of the same register holding LBE and DEC in the lower double-word.
- the sizes of each of the values may also be increased, such as, for example, doubled in the 64-bit case as compared to the 32-bit case illustrated.
- multiple ADDR values may be employed for the same circular buffer, such as an input address IN and a separate output address OUT. Additional registers are used if multiple pointers are defined for the same circular buffer. If multiple addresses or pointers are defined for a common circular buffer, the source register field for each MODSUB instruction is programmed with the appropriate value to access the corresponding register and address type for each call. Although only one circular buffer is described in detail, any number of circular buffers may be defined, and the number simultaneously used is limited only by the total number of registers or GPRs.
- the MODSUB instruction has been defined as a DSP extension to the basic set of microprocessor instructions, but could be incorporated as part of the basic instruction set if desired. Incorporation into the primary instruction set architecture (ISA) may be valuable in some configurations since the use of circular buffers is not limited to DSP functions.
- the DSP ASE is synthesized into the same core as the primary microprocessor, but could also be implemented as a coprocessor instruction in a microprocessor system utilizing a coprocessor.
- although the present invention and its benefits, features and advantages have been described in detail, other embodiments are encompassed by the invention.
- the invention can be embodied in software (e.g., computer readable code, program code, instructions and/or data) disposed, for example, in a computer usable (e.g., readable) medium.
- this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, etc.), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available programs, databases, and/or circuit (i.e., schematic) capture tools.
- Such software can be disposed in any known computer usable (e.g., readable) medium including semiconductor memory, magnetic disk, optical disc (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable transmission medium (e.g., carrier wave or any other medium including digital, optical, or analog-based medium).
- the software can be transmitted over communication networks including the Internet and intranets.
- the invention can be embodied in software (e.g., in HDL as part of a semiconductor intellectual property core, such as a microprocessor core, or as a system-level design, such as a System on Chip or SOC) and transformed to hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and software.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/956,498 US7873810B2 (en) | 2004-10-01 | 2004-10-01 | Microprocessor instruction using address index values to enable access of a virtual buffer in circular fashion |
PCT/US2005/028773 WO2006073512A1 (en) | 2005-08-11 | 2005-08-11 | Microprocessor instruction to enable access of a virtual buffer in circular fashion |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1709527A1 true EP1709527A1 (en) | 2006-10-11 |
Family
ID=35636767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05789271A Ceased EP1709527A1 (en) | 2004-10-01 | 2005-08-11 | Microprocessor instruction to enable access of a virtual buffer in circular fashion |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1709527A1 (en) |
GB (1) | GB0607209D0 (en) |
WO (1) | WO2006073512A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10817802B2 (en) * | 2016-05-07 | 2020-10-27 | Intel Corporation | Apparatus for hardware accelerated machine learning |
CN114911526B (en) * | 2022-06-01 | 2024-08-27 | 中国人民解放军国防科技大学 | Brain-like processor based on brain-like instruction set and application method thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0992880A1 (en) * | 1998-10-06 | 2000-04-12 | Texas Instruments Inc. | Circular buffer management |
EP1039370B1 (en) * | 1999-03-19 | 2005-10-26 | Freescale Semiconductor, Inc. | Modulo address generator and a method for implementing modulo addressing |
US6782447B2 (en) | 1999-12-17 | 2004-08-24 | Koninklijke Philips Electronics N.V. | Circular address register |
US20030061464A1 (en) * | 2001-06-01 | 2003-03-27 | Catherwood Michael I. | Digital signal controller instruction set and architecture |
2005
- 2005-08-11: EP05789271A (EP1709527A1), not active, ceased
- 2005-08-11: PCT/US2005/028773 (WO2006073512A1), active application filing

2006
- 2006-04-10: GBGB0607209.4A (GB0607209D0), not active, ceased
Non-Patent Citations (5)
Title |
---|
"TMS320C3x User's Guide - Excerpts: front page and pages 1-2 ~ 1-4; 2-9 ~ 2-10; 6-21 ~ 6-25.", July 1997, TEXAS INSTRUMENTS * |
ANON: "Orthogonal Data Flow for High-Concurrency in a Digital Signal Processor", IBM-TDB, vol. 38, no. 4, April 1995 (1995-04-01), Armonk, NY, US, pages 383 - 387, XP000516188 * |
FURBER, S.B.: "VLSI RISC ARCHITECTURE AND ORGANIZATION", 1989, MARCEL DEKKER, NY, US * |
Retrieved from the Internet <URL:http://www.oopweb.com/Assembly/Documents/ArtOfAssembly/Volume/Chapter_4/CH04-1.html#HEADING1-12> * |
See also references of WO2006073512A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2006073512A1 (en) | 2006-07-13 |
GB0607209D0 (en) | 2006-05-17 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20060410 |
AK | Designated contracting states | Kind code of ref document: A1; designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
AX | Request for extension of the European patent | Extension state: AL BA HR MK YU |
REG | Reference to a national code | Ref country code: HK; ref legal event code: DE; ref document number: 1097929; country of ref document: HK |
RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB |
DAX | Request for extension of the European patent (deleted) | |
RBV | Designated contracting states (corrected) | Designated state(s): DE FR GB |
17Q | First examination report despatched | Effective date: 20080201 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED |
18R | Application refused | Effective date: 20100325 |
REG | Reference to a national code | Ref country code: HK; ref legal event code: WD; ref document number: 1097929; country of ref document: HK |