US6782470B1 - Operand queues for streaming data: A processor register file extension

Info

Publication number
US6782470B1
US6782470B1 (application US09/706,899)
Authority
US
United States
Prior art keywords
queue
register
operand
registers
register file
Prior art date
Legal status
Expired - Lifetime, expires
Application number
US09/706,899
Inventor
Stefan G. Berg
Michael S. Grow
Weiyun Sun
Donglok Kim
Yongmin Kim
Current Assignee
University of Washington
Original Assignee
University of Washington
Priority date
Filing date
Publication date
Application filed by University of Washington
Priority to US09/706,899
Assigned to University of Washington. Assignors: BERG, STEFAN G.; GROW, MICHAEL S.; SUN, WEIYUN; KIM, DONGLOK; KIM, YONGMIN
Application granted
Publication of US6782470B1
Adjusted expiration
Status: Expired - Lifetime

Classifications

    All classifications fall under G06F9 (GPHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING; arrangements for program control, e.g. control units, executing machine instructions):

    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134 Register stacks; shift registers
    • G06F9/30141 Implementation provisions of register files, e.g. ports
    • G06F9/3824 Operand accessing
    • G06F9/383 Operand prefetching
    • G06F9/3828 Bypassing or forwarding of data results with global bypass, e.g. between pipelines, between clusters


Abstract

The register file of a processor includes embedded operand queues. The configuration of the register file into registers and operand queues is defined dynamically by a computer program. The programmer determines the trade-off between the number and size of the operand queue(s) and the number of registers used for the program, and partitions a portion of the registers into one or more operand queues. A given queue occupies a consecutive set of registers, although multiple queues need not occupy consecutive registers. An additional address bit distinguishes operand queue addresses from register addresses. Queue state logic tracks status information for each queue, including a head pointer, tail pointer, start address, end address and number of vacancies. The program sets the location and depth of a given operand queue within the register file.

Description

BACKGROUND OF THE INVENTION
This invention relates to processor architecture and image processing applications, and more particularly to the register file(s) in a mediaprocessor.
Built-in parallelism in superscalar and Very Long Instruction Word (‘VLIW’) architectures allows mediaprocessors, such as the Philips Trimedia processor and the Hitachi/Equator Technologies MAP, to perform multiple operations per clock cycle. The multimedia data processed by these mediaprocessors typically arrive as streams of data. A stream is a sequence of data with predictable addresses, an attribute that makes a stream a good candidate for cache prefetching.
Processing these streams in a cache-based system, however, is inefficient for two main reasons. First, many data streams have little temporal locality; a video data stream, for example, is often never used again, which makes placing it in a data cache wasteful. Second, many streams have a non-unit stride (the stride being the distance between successive stream elements), which results in transferring data to the cache that is never referenced.
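The non-unit-stride inefficiency can be sketched numerically. The sketch below is illustrative only; the cache line size, element size, and stride values are assumptions, not figures from the patent.

```python
# Sketch: fraction of each fetched cache line actually referenced by a
# strided stream. The 64-byte line size and the element sizes below are
# illustrative assumptions, not values from the patent.

def useful_fraction(elem_size: int, stride_elems: int) -> float:
    """Fraction of fetched bytes referenced by the stream."""
    stride_bytes = elem_size * stride_elems
    if stride_bytes <= elem_size:
        return 1.0  # unit stride: every byte is eventually referenced
    # One element is used out of every stride_bytes brought in.
    return min(1.0, elem_size / stride_bytes)

# Unit stride: the whole cache line is useful.
assert useful_fraction(4, 1) == 1.0
# A stride of 8 four-byte elements: only 4 of every 32 bytes fetched
# are ever referenced, so 87.5% of the traffic is wasted.
print(useful_fraction(4, 8))  # 0.125
```

With a stride of eight 4-byte elements, only an eighth of the transferred data is used, which is why caching such streams wastes both capacity and bandwidth.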
One method of improving cache performance is to place prefetched data into stream buffers instead of the cache. One kind of a stream buffer is a FIFO prefetch buffer, which holds consecutive cache blocks. If a memory address produces a miss in the cache but a hit in the stream buffer, the data are moved from the stream buffer into the cache instead of having to go out to the external memory. Since many algorithms use multiple streams at a time, multi-way stream buffers have been developed. Multiway stream buffers are a group of stream buffers in parallel, which allow multiple streams to be prefetched concurrently. FIG. 1 shows a typical architecture with stream buffers.
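The lookup path described above (check the cache, then the stream buffer, then external memory) can be modeled behaviorally. This is a minimal sketch under assumed simplifications: block addresses are plain integers, the cache never evicts, and the prefetch depth is an arbitrary choice; none of these details come from the patent.

```python
# Minimal behavioral model of a cache backed by a FIFO stream buffer.
# All names and parameters here are illustrative assumptions.
from collections import deque

class StreamBufferSystem:
    def __init__(self, depth: int = 4):
        self.cache = set()            # resident block addresses
        self.stream_buffer = deque()  # prefetched consecutive blocks
        self.depth = depth

    def _prefetch_after(self, block: int) -> None:
        """On a full miss, prefetch the consecutive following blocks."""
        self.stream_buffer = deque(range(block + 1, block + 1 + self.depth))

    def access(self, block: int) -> str:
        if block in self.cache:
            return "cache hit"
        if self.stream_buffer and self.stream_buffer[0] == block:
            # Miss in the cache but hit in the stream buffer: move the
            # block into the cache instead of going to external memory.
            self.cache.add(self.stream_buffer.popleft())
            return "stream buffer hit"
        # Miss everywhere: fetch from external memory, prefetch ahead.
        self.cache.add(block)
        self._prefetch_after(block)
        return "external memory"

mem = StreamBufferSystem()
print(mem.access(100))  # external memory (prefetches blocks 101..104)
print(mem.access(101))  # stream buffer hit
print(mem.access(101))  # cache hit
```

A multi-way stream buffer would simply hold several such FIFOs in parallel, one per concurrent stream.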
The stream buffers discussed above require a large additional silicon area for storing the streamed data. In addition, the storage area of unused queues is wasted. Accordingly, there is a need for a more efficient and effective architecture for handling streams of data.
SUMMARY OF THE INVENTION
According to the invention, a processor includes a register file with a dynamically configurable operand queue extension. The register file is configured by a user's application program into registers and operand queues. The program designer determines how the register file is to be configured. Specifically, the programmer determines the trade-off between the number and size of the operand queue(s) versus the number of registers to be available to the program.
According to one aspect of the invention, all or a portion of the register file is allocatable into registers, and a portion of the register file is allocatable into zero or more operand queues. In one embodiment an additional address bit is used for each register address to define whether it is functioning as a register or part of an operand queue.
According to another aspect of the invention, the application program sets the locations and depth of each operand queue within the register file. A given queue occupies a consecutive set of registers, although multiple queues need not occupy consecutive registers.
According to another aspect of the invention, queue state logic maintains operand queue status information, such as a head pointer, tail pointer, start address, end address and number of vacancies for a given operand queue.
According to an advantage of this invention, by implementing the operand queues as a configuration of registers in the register file, the size of each queue can be optimized for efficient use of silicon. The wasted area of conventional stream buffers is avoided. The number of queues and the depth of each queue can vary (e.g., one function may need three queues each with a depth of ten, while another function may need five queues each with a depth of six). Furthermore, when no queues are needed, no silicon sits unused, since the operand queue memory can be used for general-purpose registers.
These and other aspects and advantages of the invention will be better understood by reference to the following detailed description taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of a conventional memory hierarchy including a stream buffer;
FIG. 2 is a block diagram of a processor having a register file with an operand queue extension according to an embodiment of this invention;
FIG. 3 is a diagram of a register file 14;
FIG. 4 is a diagram of an address for accessing the register file 14;
FIG. 5 is a diagram of an exemplary configuration of the register file into multiple registers and operand queues; and
FIG. 6 is a diagram of another exemplary configuration of the register file into multiple registers and operand queues.
DESCRIPTION OF SPECIFIC EMBODIMENTS
Referring to FIG. 2, a processor 10 includes one or more functional units 12, one or more register files 14, on-chip memory 16, a memory controller 18, one or more operand queue decoders 20, and queue state logic 21. The processor 10 typically is part of a computing system which includes external (e.g., off-chip) memory 22. External memory 22 is accessed through an input/output port 23 of the processor. The processor 10 performs computing tasks on data stored in the register file 14.
A typical processor instruction includes an instruction operand and one or more source operands. The source operands include data that is loaded from on-chip memory 16 or external memory 22 into the register file 14. The data is accessed from registers or operand queues of the register file 14. Referring to FIG. 3, the register file 14 is formed by a plurality of general purpose registers 24. The register file 14 is configured dynamically to provide a plurality of registers 24 and zero or more operand queues 26. The processor 10 performs a computation on the source operands and stores a result in a destination operand (a register of the register file 14). The result then is moved into on-chip memory 16 and output from the processor 10 (e.g., to external memory 22, to a peripheral device or some other output destination). In various embodiments the processor 10 is based on a single instruction multiple data (‘SIMD’) architecture, a VLIW architecture, a RISC architecture, or a combination of such architectures.
The register file 14 is formed by a plurality of general purpose registers 24. Each register 24 is addressable and accessible by one or more functional units 12. The register file includes a plurality of access ports. There is an access port for moving data into or out of the register file from on-chip memory 16 and external memory 22. In addition, for each functional unit 12, there are at least two access ports—one for a source operand and one for a destination operand. In a preferred embodiment the functional units 12 use two source operands and one destination operand. In such an embodiment there are three access ports to the register file per functional unit 12—one for each source operand and one for the destination operand.
The register file 14 includes an operand queue extension. Specifically, the registers 24 are dynamically configured to serve as general purpose registers or as part of an operand queue 26. The operand queue 26 is implemented by including an operand queue decoder 20 in the register address lines between the functional unit and the register file. In addition, queue state logic 21 is coupled to the operand queue decoders 20. For each access port of the register file 14, there is an operand queue decoder 20. For N access ports there are N operand queue decoders (along with N sets of address lines 23, N sets of data lines 27, N communication paths 25 to the queue state logic, and N control lines 29).
Each register 24 is configured as a general purpose register or part of an operand queue using a bit in its address line. For example, a 64-bit register typically has a 6-bit address. Instead, 7 bits are used. When a prescribed bit (e.g., the seventh bit) is a prescribed value (e.g., zero), the remaining six bits 30 (see FIG. 4) correspond to a register address, and identify which one of the 64 registers is addressed. When the prescribed bit, referred to herein as a configuration bit 32, indicates that the address is for an operand queue 26, the address is used to access the queue state logic 21 and determine the status information for the addressed operand queue 26. The queue state logic 21 stores the status information for each operand queue 26. The status information includes a head pointer, a tail pointer, a start address, an end address, and a number of vacancies for each operand queue. The size of the queue state logic 21 depends on the number of operand queues which are allowed to be defined at a given time. The queue state logic 21 is coupled to every operand queue decoder 20.
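The 7-bit addressing scheme above can be sketched in a few lines. This is an illustrative model only: the patent does not fix which bit position serves as the configuration bit, so placing it as the most significant of the seven bits is an assumption.

```python
# Sketch of the 7-bit register-file address described above: six bits
# select one of 64 registers, and a seventh configuration bit marks
# the address as an operand-queue reference. Treating the seventh
# (most significant) bit as the configuration bit is an assumption.

CONFIG_BIT = 1 << 6   # configuration bit 32 in the patent's FIG. 4
INDEX_MASK = 0x3F     # remaining six bits 30: register/queue index

def decode(addr: int) -> tuple[str, int]:
    """Classify a 7-bit address as a register or an operand queue."""
    kind = "queue" if addr & CONFIG_BIT else "register"
    return (kind, addr & INDEX_MASK)

assert decode(0b0001010) == ("register", 10)  # plain register r10
assert decode(0b1000011) == ("queue", 3)      # operand queue 3
```

When the decoded kind is "queue", the index would be presented to the queue state logic 21 to look up the queue's current head or tail pointer rather than naming a fixed register.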
The operand queue decoder 20 receives the address value from a corresponding functional unit along address lines 23. If the value is a general purpose register address, then the corresponding register 24 is accessed. If instead the value refers to an operand queue 26, then the decoder 20 inquires with the queue state logic 21 along lines 25 for the current head or tail pointer of the particular queue (e.g., depending on whether a read or write instruction is being processed by the corresponding functional unit 12). Such pointer is the address for one of the registers 24 which is part of the operand queue 26. Such register then is accessed.
FIGS. 5-6 show exemplary configurations of the register file 14. These configurations are for illustrative purposes only. The actual configurations are defined by an application program designer as desired. Such configurations may change dynamically from program to program and within a given program. Each application computer program dynamically determines the configuration of the register file 14. Any given operand queue 26 spanning multiple registers in length uses consecutively addressed registers. However, different operand queues can be separated by registers which serve as general purpose registers and not part of an operand queue, as shown in FIG. 6. The size of any given operand queue is independent of the size selected for other operand queues and may vary from queue to queue.
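A partitioning like the ones in FIGS. 5-6 can be expressed as a small configuration check. The specific queue ranges below are made-up examples, not taken from the patent's figures; the only rule enforced is the one stated above, that each queue occupies a consecutive, non-overlapping set of registers.

```python
# Illustrative partitioning of a 64-register file into general purpose
# registers and operand queues, echoing FIG. 6, where queues are
# separated by general purpose registers. The queue names and ranges
# are made-up examples.

NUM_REGS = 64

def make_partition(queues: dict) -> list:
    """queues maps name -> (start, depth); returns per-register roles."""
    roles = ["gpr"] * NUM_REGS
    for name, (start, depth) in queues.items():
        # A given queue must occupy a consecutive set of registers,
        # and queues may not overlap one another.
        for r in range(start, start + depth):
            assert roles[r] == "gpr", f"register {r} already allocated"
            roles[r] = name
    return roles

# Two queues of different depths, separated by general purpose registers.
roles = make_partition({"q0": (8, 10), "q1": (32, 6)})
print(roles[8], roles[20], roles[33])  # q0 gpr q1
```

Note that the two queues have different depths, reflecting the point that each queue's size is chosen independently.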
The data stored in these queues 26 are mapped over the existing register file 14. While this approach may appear to restrict the number of registers 24 available to the programmer when the queues 26 are used, it provides a more flexible approach because the programmer can balance between the size of the operand queues 26 and the number of general purpose registers 24 available to the program. In addition, the existing datapath can be reused without the need for multiplexers to choose between a register file and queues. The registers can be arbitrarily partitioned among a plurality of operand queues and general purpose registers. The only restriction is that an individual queue occupy a consecutive set of registers.
In one embodiment the location and depth of a queue 26 within the register file 14 are set with a single instruction. When one of the multiple operand queues 26 is used as a source operand, the value is read from the head of the queue. Read accesses are either a POP operation (i.e., fetch the first (oldest) element of a queue, removing that element from the queue in the process) or a TOP operation (i.e., fetch the first (oldest) element of the queue while leaving that element in the queue). If no value is available, the accessing functional unit 12 stalls and waits for the arrival of a new value. In one embodiment destination operands are written to the tail of an operand queue. If the queue is full, the functional unit stalls until the queue has space for the new value. Data is read from or written to the register file 14 by a functional unit 12 along data lines 27.
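The POP, TOP, and tail-write semantics above can be summarized in a short behavioral sketch. The class and method names are assumptions for illustration, and stalls are modeled as exceptions for simplicity, whereas the hardware described above would simply wait.

```python
# Behavioral sketch of operand queue accesses: POP removes the oldest
# element, TOP reads it without removing it, and writes go to the tail.
# Names are illustrative; stalls are modeled as exceptions, while the
# real functional unit would stall and wait.
from collections import deque

class OperandQueue:
    def __init__(self, depth: int):
        self.depth = depth
        self.data = deque()

    def pop(self):
        """Fetch the first (oldest) element, removing it from the queue."""
        if not self.data:
            raise RuntimeError("stall: queue empty")
        return self.data.popleft()

    def top(self):
        """Fetch the first (oldest) element, leaving it in the queue."""
        if not self.data:
            raise RuntimeError("stall: queue empty")
        return self.data[0]

    def write(self, value):
        """Write a destination operand to the tail of the queue."""
        if len(self.data) >= self.depth:
            raise RuntimeError("stall: queue full")
        self.data.append(value)

q = OperandQueue(depth=4)
for v in (10, 20, 30):
    q.write(v)
assert q.top() == 10  # TOP leaves the element in place
assert q.pop() == 10  # POP removes it
assert q.top() == 20
```

In the streaming use described next, the memory controller would call `write` on one side of the queue while the functional unit issues `pop` or `top` on the other.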
Although the same queue can be used as both source and destination operands, it is more common for one side of the queue to be accessed by the memory controller for transfer of on-chip or external memory data. If one of multiple queues 26 is used as a source operand, the memory controller 18 streams the data into the queue that is subsequently read out by the functional unit 12. If the queue 26 is used as a destination operand, then the functional unit 12 streams a set of results into the queue while the memory controller 18 reads these results from the queue and stores them in memory 16/22.
Although the data could come directly from the external memory 22, data transfer between the on-chip memory 16 and the operand queues 26 is more practical, since such a data transfer incurs fewer CPU stall cycles.
Meritorious and Advantageous Effects
By implementing the operand queues as a configuration of registers in the register file, the size of each queue can be optimized for efficient use of silicon. The wasted area of conventional stream buffers is avoided. The number of queues and the depth of each queue can vary (e.g., one function may need three queues each with a depth of ten, while another function may need five queues each with a depth of six). Furthermore, when no queues are needed, no silicon sits unused, since the operand queue memory can be used for general-purpose registers.
Although a preferred embodiment of the invention has been illustrated and described, various alternatives, modifications and equivalents may be used. Therefore, the foregoing description should not be taken as limiting the scope of the invention, which is defined by the appended claims.

Claims (21)

What is claimed is:
1. A processor comprising:
a functional processing unit which executes processor instructions that specify at least one of a source operand and a destination operand;
a register file directly accessible by the functional processing unit which stores the source operand and destination operand specified by said processor instructions, the register file formed by a plurality of registers, wherein at least two registers of the plurality of registers are dynamically configurable to function at one time as general purpose registers and at another time as an operand queue, the operand queue directly accessible by the functional processing unit.
2. The processor of claim 1, further comprising:
a plurality of address lines for accessing the register, said plurality of address lines carrying a first plurality of address bits for addressing each one register of the plurality of registers in the register file and carrying at least one second address bit which identifies whether the register addressed by the first plurality of address bits serves as a general purpose register or part of the operand queue.
3. The processor of claim 2, further comprising:
an operand queue decoder coupled to the functional unit and the register file which decodes said plurality of address lines to select said addressed register for access as a general purpose register or part of said operand queue.
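Claims 2 and 3 describe an extra address bit that distinguishes general-purpose access from queue access. A minimal decode sketch follows; the 64-entry register file and the exact bit layout are assumptions for illustration, not specified by the claims.

```c
#include <stdint.h>

/* Assume a 64-entry register file: address bits [5:0] select a register,
 * and bit 6 indicates operand-queue access (illustrative layout). */
#define REG_INDEX_BITS 6
#define QUEUE_FLAG_BIT (1u << REG_INDEX_BITS)

typedef struct {
    unsigned reg_index;  /* register selected by the first plurality of address bits */
    int as_queue;        /* nonzero if the access is to the operand queue */
} DecodedOperand;

/* Decode an operand address into a register index and an access mode,
 * mirroring the role of the operand queue decoder in claim 3. */
DecodedOperand decode_operand(uint32_t addr) {
    DecodedOperand d;
    d.reg_index = addr & (QUEUE_FLAG_BIT - 1);   /* low bits: register number */
    d.as_queue = (addr & QUEUE_FLAG_BIT) != 0;   /* second address bit: GPR vs. queue */
    return d;
}
```

With this layout, address 0x12 names general-purpose register 18, while address 0x43 names register 3 accessed as part of the operand queue.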
4. The processor of claim 3, further comprising:
a data input/output port for inputting data to the processor and outputting data from the processor; and
a memory controller which controls input of data through said data input/output port into said register file and output of data from said register file through said input/output port.
5. The processor of claim 4, further comprising:
on-chip memory, wherein the memory controller controls input of data through said data input/output port into said on-chip memory and output of data from said on-chip memory through said input/output port, and wherein the memory controller controls movement of data between the register file and said on-chip memory.
6. The processor of claim 3, further comprising queue state logic for storing operand queue status information, the status information comprising a queue head pointer, a queue tail pointer, a queue start address, a queue end address, and a number of vacancies in the operand queue.
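The queue state enumerated in claim 6 (head pointer, tail pointer, start address, end address, and vacancy count) is exactly what a circular buffer over a register range needs. The sketch below models that state in software; the 16-register backing array and function names are illustrative assumptions.

```c
#include <stdint.h>

/* Queue state per claim 6: head and tail pointers, start/end addresses
 * bounding the queue's registers, and a vacancy count (illustrative model). */
typedef struct {
    uint32_t regs[16];   /* backing registers allocated to the queue */
    int start, end;      /* queue occupies regs[start..end-1] */
    int head, tail;      /* dequeue from head, enqueue at tail */
    int vacancies;       /* free slots remaining */
} OperandQueue;

void queue_init(OperandQueue *q, int start, int end) {
    q->start = start;
    q->end = end;
    q->head = q->tail = start;
    q->vacancies = end - start;
}

/* Append a value at the tail; returns 0 if the queue is full. */
int enqueue(OperandQueue *q, uint32_t v) {
    if (q->vacancies == 0) return 0;
    q->regs[q->tail] = v;
    q->tail = (q->tail + 1 == q->end) ? q->start : q->tail + 1;  /* wrap */
    q->vacancies--;
    return 1;
}

/* Remove the value at the head; returns 0 if the queue is empty. */
int dequeue(OperandQueue *q, uint32_t *v) {
    if (q->vacancies == q->end - q->start) return 0;
    *v = q->regs[q->head];
    q->head = (q->head + 1 == q->end) ? q->start : q->head + 1;  /* wrap */
    q->vacancies++;
    return 1;
}
```

Tracking vacancies directly (rather than comparing head and tail) disambiguates the full and empty cases, which coincide in pointer terms for a circular buffer.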
7. The processor of claim 6, in which the function of each one register of the plurality of registers is independently configurable to function as either one of a general purpose register or as part of an operand queue.
8. The processor of claim 1, in which the function of each one register of the plurality of registers is independently configurable to function as either one of a general purpose register or as part of an operand queue.
9. The processor of claim 1, wherein the register file is capable of being allocated into a plurality of operand queues, and wherein each one register of the plurality of registers is dynamically configurable to function at a given time as either one of a general purpose register or as part of one of the plurality of operand queues.
10. A method for dynamically configuring a register file of a processor, comprising the steps of:
processing a first instruction which allocates a first plurality of registers of the register file to serve as an operand queue, the operand queue having a queue head pointer, a queue tail pointer, a queue start address, and a queue end address;
processing a second instruction which reallocates at least one of the first plurality of registers into a general purpose register.
11. The method of claim 10, further comprising the step of:
processing a third instruction which reallocates at least one register of the first plurality of registers which previously functioned as a general purpose register to function as part of the operand queue.
12. The method of claim 10, in which the operand queue is a first operand queue, and further comprising the steps of:
processing a third instruction which allocates a second plurality of registers of the register file to serve as a second operand queue, the second operand queue having a second queue head pointer, a second queue tail pointer, a second queue start address, and a second queue end address;
decoding a fourth instruction to identify an addressed register of the register file and to identify whether the addressed register is configured to serve as a general purpose register or as part of one of the operand queues.
13. The method of claim 12, wherein the step of processing the first instruction comprises allocating the first plurality of registers of the register file to serve as the first operand queue, and the step of processing the third instruction comprises allocating the second plurality of registers of the register file to serve as the second operand queue, wherein the first plurality of registers and the second plurality of registers together occupy a continuous address space.
14. The method of claim 12, wherein the step of processing the first instruction comprises allocating the first plurality of registers of the register file to serve as the first operand queue, wherein the first plurality of registers occupy a first continuous address space, and wherein the step of processing the third instruction comprises allocating the second plurality of registers of the register file to serve as the second operand queue, wherein the second plurality of registers occupy a second continuous address space, the second continuous address space being discontinuous with the first continuous address space.
15. The method of claim 10, further comprising the step of:
moving data between a memory source external to the processor and the plurality of registers configured as the operand queue.
16. The method of claim 10, further comprising the step of:
directly accessing data from the plurality of registers configured as the operand queue by a functional processing unit of the processor.
17. A method for dynamically configuring a register file of a processor, comprising the steps of:
accessing a first register of a plurality of registers in the register file as a general purpose register during a portion of a computer program executed by the processor; and
dynamically reallocating the first register to be part of an operand queue in response to a processed instruction.
18. The method of claim 17, wherein the step of dynamically reallocating comprises decoding a register address within a program instruction to identify the function of the first register as being part of the operand queue, the program instruction also specifying an address of a data operand to be stored in the operand queue.
19. The method of claim 17, further comprising the steps of:
accessing a second register of the plurality of registers in the register file as part of the operand queue;
dynamically reallocating the second register to function as a general purpose register not part of the operand queue; and
accessing the second register as the general purpose register.
20. A method for dynamically configuring a register file of a processor, comprising the steps of:
accessing a register of a plurality of registers in the register file as part of an operand queue;
dynamically reallocating the register to function as a general purpose register not part of the operand queue; and
accessing the register as a general purpose register.
21. The method of claim 20, wherein the step of dynamically reallocating comprises decoding a register address within a program instruction to identify the function of the register as being part of the operand queue, the program instruction also specifying an address of a data operand to be stored in the operand queue.

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/706,899 US6782470B1 (en) 2000-11-06 2000-11-06 Operand queues for streaming data: A processor register file extension


Publications (1)

Publication Number Publication Date
US6782470B1 true US6782470B1 (en) 2004-08-24

Family

ID=32869892

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/706,899 Expired - Lifetime US6782470B1 (en) 2000-11-06 2000-11-06 Operand queues for streaming data: A processor register file extension

Country Status (1)

Country Link
US (1) US6782470B1 (en)



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6192384B1 (en) * 1998-09-14 2001-02-20 The Board Of Trustees Of The Leland Stanford Junior University System and method for performing compound vector operations

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Basoglu et al., "Single-Chip Processor for Media Applications: The MAP1000," Intl. Journal of Imaging Systems and Technology, vol. 10, pp. 96-106, 1999.
Berg et al., "Critical Review of Programmable Mediaprocessor Architectures," SPIE Proceedings, vol. 3655, pp. 147-156, 1999.
McKee et al., "Smarter Memory: Improving Bandwidth for Streamed References," IEEE Computer, vol. 31, no. 7, pp. 54-63, 1998.
N. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," Intl. Symposium on Computer Architecture, pp. 364-373, May 1990.
Palacharla et al., "Evaluating Stream Buffers as a Secondary Cache Replacement," Intl. Symposium on Computer Architecture, pp. 24-33, Apr. 1994.
Rathman et al., "Processing the New World of Interactive Media," IEEE Signal Processing, vol. 15, no. 2, pp. 108-117, 1998.

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6898692B1 (en) * 1999-06-28 2005-05-24 Clearspeed Technology Plc Method and apparatus for SIMD processing using multiple queues
US7206857B1 (en) * 2002-05-10 2007-04-17 Altera Corporation Method and apparatus for a network processor having an architecture that supports burst writes and/or reads
US7320037B1 (en) 2002-05-10 2008-01-15 Altera Corporation Method and apparatus for packet segmentation, enqueuing and queue servicing for multiple network processor architecture
US7339943B1 (en) 2002-05-10 2008-03-04 Altera Corporation Apparatus and method for queuing flow management between input, intermediate and output queues
US7606248B1 (en) 2002-05-10 2009-10-20 Altera Corporation Method and apparatus for using multiple network processors to achieve higher performance networking applications
US7336669B1 (en) 2002-05-20 2008-02-26 Altera Corporation Mechanism for distributing statistics across multiple elements
US7593334B1 (en) 2002-05-20 2009-09-22 Altera Corporation Method of policing network traffic
US20080072010A1 (en) * 2006-09-18 2008-03-20 Freescale Semiconductor, Inc. Data processor and methods thereof
US7788471B2 (en) * 2006-09-18 2010-08-31 Freescale Semiconductor, Inc. Data processor and methods thereof
EP2335149A1 (en) * 2008-09-08 2011-06-22 Bridgeco, Inc. Very long instruction word processor with multiple data queues
US11231933B2 (en) * 2014-04-17 2022-01-25 Texas Instruments Incorporated Processor with variable pre-fetch threshold
US11861367B2 (en) 2014-04-17 2024-01-02 Texas Instruments Incorporated Processor with variable pre-fetch threshold

Similar Documents

Publication Publication Date Title
US10901913B2 (en) Two address translations from a single table look-aside buffer read
US10203958B2 (en) Streaming engine with stream metadata saving for context switching
US11693660B2 (en) Data processing apparatus having streaming engine with read and read/advance operand coding
US6988181B2 (en) VLIW computer processing architecture having a scalable number of register files
US6321318B1 (en) User-configurable on-chip program memory system
US11119779B2 (en) Dual data streams sharing dual level two cache access ports to maximize bandwidth utilization
US11080047B2 (en) Register file structures combining vector and scalar data with global and local accesses
US11113062B2 (en) Inserting predefined pad values into a stream of vectors
JPH10187533A (en) Cache system, processor, and method for operating processor
US11210097B2 (en) Stream reference register with double vector and dual single vector operating modes
US11360536B2 (en) Controlling the number of powered vector lanes via a register field
US20230385063A1 (en) Streaming engine with early exit from loop levels supporting early exit loops and irregular loops
US6782470B1 (en) Operand queues for streaming data: A processor register file extension
US20050182915A1 (en) Chip multiprocessor for media applications
Wei et al. A near-memory processor for vector, streaming and bit manipulation workloads
US7080234B2 (en) VLIW computer processing architecture having the problem counter stored in a register file register
JP4384828B2 (en) Coprocessor device and method for facilitating data transfer
US6957319B1 (en) Integrated circuit with multiple microcode ROMs
WO2020237231A1 (en) Inserting predefined pad values into a stream of vectors
US20230004391A1 (en) Streaming engine with stream metadata saving for context switching
US20230065512A1 (en) Pseudo-First In, First Out (FIFO) Tag Line Replacement
US11113208B2 (en) Pseudo-first in, first out (FIFO) tag line replacement

Legal Events

Date Code Title Description
AS Assignment

Owner name: WASHINGTON, UNIVERSITY OF, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BERG, STEFAN G.;GROW, MICHAEL S.;SUN, WEIYUN;AND OTHERS;REEL/FRAME:011531/0670;SIGNING DATES FROM 20000906 TO 20000919

STCF Information on status: patent grant

Free format text: PATENTED CASE

REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment
REMI Maintenance fee reminder mailed
FPAY Fee payment

Year of fee payment: 8

SULP Surcharge for late payment

Year of fee payment: 7

FPAY Fee payment

Year of fee payment: 12