US20070074004A1 - Systems and methods for selectively decoupling a parallel extended instruction pipeline - Google Patents
Systems and methods for selectively decoupling a parallel extended instruction pipeline
- Publication number
- US20070074004A1 (application US11/528,434)
- Authority
- US
- United States
- Prior art keywords
- pipeline
- instruction
- queue
- instructions
- parallel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30018—Bit or string instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
- G06F9/3875—Pipelining a single stage, e.g. superpipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline, look ahead using a slave processor, e.g. coprocessor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/43—Hardware specially adapted for motion estimation or compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/523—Motion estimation or motion compensation with sub-pixel accuracy
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
- H04N19/82—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation involving filtering within a prediction loop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Definitions
- The invention relates generally to embedded microprocessor architecture and more specifically to systems and methods for selectively decoupling an extended instruction pipeline from a main pipeline in a microprocessor-based system.
- Processor extension logic is utilized to extend a microprocessor's capability. Typically, this logic is in parallel to and accessible by the main processor pipeline. It is often used to perform specific, repetitive, computationally intensive functions thereby freeing up the main processor pipeline.
- the parallel instruction pipeline containing the extension logic is capable of fetching and executing its own instructions and hence maximizing concurrency.
- control and synchronization between the two pipelines becomes difficult when programming a processor having such a decoupled architecture.
- There is therefore a need for a parallel pipeline architecture that can fully exploit the advantages of parallelism without suffering from the design complexity of loosely or completely decoupled pipelines.
- the microprocessor architecture comprises a first processor instruction pipeline, comprising a front end portion and a rear portion, a second processor instruction pipeline, comprising a front end portion and a rear portion, and an instruction queue coupling the first and second instruction pipeline between their respective front end and rear portions.
- Another embodiment of the invention provides a method of dynamically decoupling a parallel extended processor pipeline from a main processor pipeline.
- the method according to this embodiment comprises sending an instruction from the main processor pipeline to the parallel extended processor pipeline instructing the parallel extended processor pipeline to operate autonomously, operating the parallel extended processor pipeline autonomously, storing subsequent instructions from the main processor pipeline to the parallel extended processor pipeline in an instruction queue, executing an instruction with the parallel extended processor pipeline to cease autonomous execution, and thereafter executing instructions supplied by the main processor pipeline in the queue.
- Still a further embodiment of the invention provides a method of performing dynamically controlled parallel instruction processing in a microprocessor.
- the method comprises fetching and executing instructions with a main processor pipeline, and sending instructions from the main processor pipeline to a parallel extended processor pipeline via an instruction queue coupling the two pipelines. If the instruction is one to be executed by the parallel extended pipeline, that instruction is executed with the parallel extended pipeline. Otherwise, if the instruction instructs the parallel extended pipeline to begin autonomous execution, the parallel extended pipeline thereafter fetches and executes instructions autonomously, independent of the main pipeline's instruction fetches, and instructions from the main pipeline for the parallel extended pipeline are stored in the instruction queue until autonomous processing has ceased.
- FIG. 1 is a functional block diagram illustrating a microprocessor-based system including a main processor core and a SIMD media accelerator according to at least one embodiment of the invention.
- FIG. 2 is a block diagram illustrating a conventional multistage microprocessor pipeline having a pair of parallel data paths.
- FIG. 3 is a block diagram illustrating another conventional multiprocessor design having a pair of parallel processor pipelines.
- FIG. 4 is a block diagram illustrating a dynamically decoupleable multi-stage microprocessor pipeline according to at least one embodiment of the invention.
- FIG. 5 is a flow chart detailing the steps of a method for sending instructions for operating a main processor pipeline and an extended processor pipeline according to at least one embodiment of the invention.
- FIG. 6 is a flow chart detailing the steps of a method for dynamically decoupling an extended processor pipeline from a main pipeline according to at least one embodiment of the invention.
- FIG. 1 is a functional block diagram illustrating a microprocessor-based system 5 including a main processor core 10 and a SIMD media accelerator 50 according to at least one embodiment of the invention.
- the diagram illustrates a microprocessor 5 comprising a standard single instruction single data (SISD) processor core 10 having a multistage instruction pipeline 12 and a SIMD media engine 50 .
- the processor core 10 may be a processor core such as the ARC 700 embedded processor core available from ARC International of Elstree, United Kingdom, and as described in provisional patent application No. 60/572,238 filed May 19, 2004 entitled “Microprocessor Architecture”, which is hereby incorporated by reference in its entirety.
- the processor core may be a different processor core.
- a single instruction issued by the processor pipeline 12 may cause up to 16 16-bit elements to be operated on in parallel through the use of the 128-bit data path 55 in the media engine 50 .
- the SIMD engine 50 utilizes closely coupled memory units.
- the SIMD data memory 52 (SDM) is a 128-bit wide data memory that provides low latency access to and from the 128-bit vector register file 51 . The SDM contents are transferable to and from system main memory via a DMA unit 54 thereby freeing up the processor core 10 and the SIMD core 50 .
- SIMD code memory 56 allows the SIMD unit to fetch instructions from a localized code memory, allowing the SIMD pipeline to dynamically decouple from the processor core 10 resulting in truly parallel operation between the processor core and SIMD media engine as will be discussed in greater detail in the context of FIGS. 4-6 .
- the microprocessor architecture will permit the processor-based system 5 to operate in both closely coupled and decoupled modes of operation.
- In the closely coupled mode, the SIMD program code fetch is handled exclusively by the main processor core 10 .
- In the decoupled mode, the SIMD pipeline 50 executes code fetched from a local memory 56 independently of the processor core 10 .
- the processor core 10 may therefore instruct the SIMD pipeline 50 to execute autonomously in this decoupled mode, for example, to perform media tasks such as audio processing, entropy encoding/decoding, discrete cosine transforms (DCTs) and inverse DCTs, motion compensation and de-block filtering.
- Referring now to FIG. 2 , a block diagram illustrating a conventional multistage microprocessor pipeline having a pair of parallel data paths is depicted.
- data paths required to support different instructions typically have a different number of stages.
- Data paths supporting specialized extension instructions for performing digital signal processing or other complex but repetitive functions may be used only some of the time during processor execution and remain idle otherwise. Thus, whether or not these instructions are currently needed will affect the number of effective stages in the processor pipeline.
- pipeline stages F 1 to F 4 at the front end 100 of the processor pipeline are responsible for functions such as instruction fetch, decode and issue. These pipeline stages are used to handle all instructions issued by the microprocessor.
- the pipeline splits into parallel data paths 110 and 115 incorporating stages E 1 -E 3 and D 1 -D 4 respectively.
- These parallel sub-paths represent pipeline stages used to support different instructions/data operations.
- stages E 1 -E 3 may be the primary/default processor pipeline
- stages D 1 -D 4 comprise the extended pipeline designed for processing specific instructions.
- This type of architecture can be characterized as coupled or tightly coupled to the extent that regardless of whether instructions are destined for default pipeline stages E 1 -E 3 or extended pipeline D 1 -D 4 , they all must pass through stages F 1 -F 4 , until a decision is made as to which portion of the pipeline will perform the remaining processing steps.
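The tightly coupled arrangement of FIG. 2 can be sketched in a few lines of Python. This is only an illustrative model, not anything defined in the specification: the function name `stage_path` and the flag `is_extension` are assumptions, and the stage labels are simply those used in the figure description. It shows that every instruction traverses the shared front end before being steered to one of the two data paths, so the effective pipeline length depends on which path an instruction takes.

```python
# Stage labels taken from the description of FIG. 2.
FRONT_END = ["F1", "F2", "F3", "F4"]       # shared fetch/decode/issue stages
DEFAULT_PATH = ["E1", "E2", "E3"]          # primary/default data path
EXTENDED_PATH = ["D1", "D2", "D3", "D4"]   # extended data path


def stage_path(is_extension):
    """Return the ordered stages one instruction passes through.

    `is_extension` is a hypothetical flag standing in for the decision,
    made at the end of the front end, about which data path handles
    the remaining processing steps.
    """
    return FRONT_END + (EXTENDED_PATH if is_extension else DEFAULT_PATH)


if __name__ == "__main__":
    print(stage_path(False))  # default instruction
    print(stage_path(True))   # extension instruction
```

In this model a default instruction sees seven stages and an extension instruction sees eight, which is one way to read the remark that the number of effective stages depends on whether extension instructions are in use.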
- the processor pipeline of FIG. 2 achieves the advantage that instructions can be freely intermixed, irrespective of whether the instructions are executed by the data path in sub-paths E 1 -E 3 or D 1 -D 4 .
- all instructions appear as a single thread of program execution.
- This type of pipeline architecture also has the advantage of greatly simplified program design and debugging, thereby reducing the time to market in product development. It is admittedly a highly flexible architecture.
- a limitation of this architecture is that the sequential nature of instruction execution significantly limits the exploitable parallelism between the data paths that could otherwise be used to improve overall performance. This negatively affects performance relative to other parallel pipeline architectures.
- FIG. 3 is a block diagram illustrating another conventional multiprocessor architecture having a pair of parallel instruction pipelines.
- the processor pipeline of FIG. 3 contains a front end 120 comprised of stages F 1 -F 4 and a rear portion 125 comprised of stages E 1 -E 3 .
- the processor also contains a parallel data path having a front end 135 comprised of front end stages G 1 -G 2 and rear portion 140 comprised of stages D 1 -D 4 .
- this architecture contains truly parallel pipelines to the extent that the front portions 120 and 135 can each fetch instructions separately.
- This type of parallel architecture may be characterized as loosely coupled or decoupled because the application specific extension data path G 1 -G 2 and D 1 -D 4 is autonomous and can execute instructions in parallel to the main pipeline consisting of F 1 -F 4 and E 1 -E 3 .
- This arrangement enhances exploitable parallelism over the architecture depicted in FIG. 2 .
- mechanisms are required to synchronize their operations, as represented by dashed line 130 .
- These mechanisms are typically implemented using specific instructions and bus structures which are often not a natural part of a program and are inserted as after-thoughts to “fix” the disconnect between the main pipeline and the extended pipeline. As a consequence, the resulting program utilizing both instruction pipelines becomes difficult to design and optimize.
- Referring now to FIG. 4 , a block diagram illustrating a dynamically decoupleable multi-stage microprocessor pipeline according to at least one embodiment of the invention is provided.
- the pipeline architecture according to this embodiment ameliorates at least some and preferably most or all of the above-noted limitations of conventional parallel pipeline architectures.
- This exemplary pipeline depicted in FIG. 4 consists of a front end portion 145 comprising stages F 1 -F 4 , a rear portion 150 comprising stages E 1 -E 3 , and a parallel extendible pipeline having a front portion 160 comprising stages G 1 -G 2 and a rear portion 165 comprising stages D 1 -D 4 .
- instructions can be issued from the CPU to the extendible pipeline D 1 to D 4 .
- a queue 155 is added between the two pipelines. The queue serves to delay execution of instructions issued by the front end portion 145 of the main pipeline if the extension pipeline is not ready. A tradeoff can be made during system design to decide on how many entries should be in the queue 155 to ensure that the extension pipeline is sufficiently decoupled from the main pipeline.
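The queue-depth tradeoff can be explored with a toy cycle model. The sketch below is an assumption-laden illustration rather than a description of the actual hardware: it supposes the main front end tries to issue one extension instruction per cycle, that the extension pipeline needs a fixed number of cycles per instruction, and that the front end stalls whenever the queue is full. The function name `stall_cycles` and its parameters are hypothetical.

```python
from collections import deque


def stall_cycles(n_instructions, queue_depth, ext_cycles_per_instr):
    """Count cycles in which the main front end stalls on a full queue.

    Assumed model: a burst of `n_instructions` extension instructions,
    one issue attempt per cycle, and an extension pipeline that occupies
    `ext_cycles_per_instr` cycles per instruction.
    """
    queue = deque()   # stands in for queue 155
    issued = 0
    stalls = 0
    busy = 0          # cycles left on the instruction being executed

    while issued < n_instructions:
        # Extension pipeline: start the next queued instruction when idle.
        if busy == 0 and queue:
            queue.popleft()
            busy = ext_cycles_per_instr
        if busy > 0:
            busy -= 1

        # Main front end: issue if the queue has room, otherwise stall.
        if len(queue) < queue_depth:
            queue.append(issued)
            issued += 1
        else:
            stalls += 1
    return stalls


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, stall_cycles(8, depth, 2))
```

Under these assumptions a deeper queue absorbs longer bursts before the main pipeline has to wait, at the cost of more storage, which is the tradeoff the paragraph above leaves to the system designer.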
- the main pipeline can issue a Sequence Run (vrun) instruction to instruct the extension pipeline to use its own front end 160 , G 1 to G 2 in the diagram, to execute instruction sequences stored in a record memory 156 , causing the extension pipeline to fetch and execute instructions autonomously.
- the main pipeline can keep issuing extension instructions that accumulate in the queue 155 until the extension pipeline executes a Sequence Record End (vendrec) instruction. After the vendrec instruction is executed, the extension pipeline resumes executing instructions issued to the queue 155 .
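The coupled/decoupled switching described above can be modelled as a small state machine. The following Python sketch is illustrative only: the class `ExtensionPipeline`, its methods, the sequence name `seq0`, and the mnemonics `vld`, `vmul`, `vadd` and `vst` are all hypothetical; only `vrun` and `vendrec` come from the text. It assumes every recorded sequence ends with `vendrec`.

```python
from collections import deque


class ExtensionPipeline:
    """Toy model of an extension pipeline fed by a queue (cf. queue 155)."""

    def __init__(self, record_memory):
        self.record_memory = record_memory  # sequence name -> instruction list
        self.queue = deque()                # instructions issued by the main pipeline
        self.sequence = None                # active sequence, or None when coupled
        self.pc = 0                         # fetch index used by the pipeline's own front end
        self.trace = []                     # instructions in the order executed

    def issue(self, instr):
        """Main front end issues an extension instruction into the queue."""
        self.queue.append(instr)

    def step(self):
        """Execute at most one instruction; return False if idle."""
        if self.sequence is not None:       # decoupled: fetch from record memory
            instr = self.sequence[self.pc]
            self.pc += 1
            if instr == "vendrec":
                self.sequence = None        # re-couple: go back to the queue
            else:
                self.trace.append(instr)
            return True
        if not self.queue:
            return False
        instr = self.queue.popleft()
        if instr.startswith("vrun "):       # switch to autonomous fetch
            self.sequence = self.record_memory[instr.split()[1]]
            self.pc = 0
        else:
            self.trace.append(instr)
        return True


def demo():
    ext = ExtensionPipeline({"seq0": ["vmul", "vadd", "vendrec"]})
    ext.issue("vld")
    ext.issue("vrun seq0")
    ext.step()            # executes vld
    ext.step()            # executes vrun: pipeline is now decoupled
    ext.issue("vst")      # main pipeline keeps issuing; vst waits in the queue
    while ext.step():
        pass
    return ext.trace


if __name__ == "__main__":
    print(demo())
```

In this model the instruction issued while the sequence is running is held in the queue and executes only after the sequence reaches `vendrec`, matching the ordering described above.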
- the pipeline depicted in FIG. 4 is designed to switch between being coupled, that is, executing instructions for the main pipeline front end 145 , and being decoupled, that is, during autonomous runtime of the extended pipeline.
- the instructions vrun and structuric which dynamically switch the pipeline between the coupling states, can be designed to be light weight, executing in, for example, a single cycle.
- These instructions can then be seen as parallel analogs of the conventional call and return instructions. That is, when instructing the extension pipeline to fetch and execute instructions autonomously, the main processor pipeline is issuing a parallel function call that runs concurrently with its own thread of instruction execution to maximize speedup of the application. The two threads of instruction execution eventually join back into one after the extension pipeline executes the motifc instruction which is the last instruction of the program thread autonomously executed by the extension pipeline.
- another advantage of this architecture is that during debugging, such as, for example, instruction stepping, the two parallel threads can be forced to be serialized such that the CPU front portion 145 will not issue any instruction after issuing vrun to the extension pipeline until the latter fetches and executes the motifc instruction.
- this will give the programmer the view of a single program thread that has the same functional behavior of the parallel program when executed normally and hence will greatly simplify the task of debugging.
- processor pipeline containing a parallel extendible pipeline that can be dynamically coupled and decoupled is the ability to use two separate clock domains. In low power applications, it is often necessary to run specific parts of the integrated circuit at varying clock frequencies, in order to reduce and/or minimize power consumption.
- the front end portion 145 of the main pipeline can utilize an operating clock frequency different from that of the parallel pipeline 165 of stages D 1 -D 4 with the primary clock partitioning occurring naturally at the queue 155 labeled as Q in the FIG. 4 .
- step 200 a flow chart of an exemplary method for sending instructions from a main processor pipeline to an extended processor pipeline according to at least one embodiment of the invention is depicted. Operation of the method begins in step 200 and proceeds to step 205 , where an instruction is fetched by the main processor pipeline.
- step 210 because the instruction is determined to be one for processing by the parallel extended pipeline, the instruction is passed from the main pipeline to the parallel extended pipeline via an instruction queue coupling the two pipelines.
- the parallel extended pipeline is currently processing instructions from the queue, that instruction will be processed in turn by the parallel extended pipeline as specified in step 220 . Otherwise, the instruction will remain in the queue until the parallel extended pipeline has ceased its autonomous operation.
- step 225 while the instruction is either sitting in the queue or being processed by the parallel pipeline, the main pipeline is able to continue processing instructions.
- the queue provides a mechanism for the main pipeline to offload instructions to the parallel extended pipeline without stalling the main pipeline. Operation of the method stops in step 230 .
- this Figure is a flow chart of an exemplary method for dynamically decoupling an extended processor pipeline from a main pipeline according to at least one embodiment of the invention. Operation of the method begins in step 300 and proceeds to step 305 where the main processor pipeline sends a run instruction to the parallel extended pipeline via the instruction queue coupling the pipelines.
- the parallel pipeline retrieves the run instruction from the queue. As noted above, this may occur instantly or after the parallel pipeline has retrieved and processed other instructions in front of the run instruction in the queue. In various embodiments, this run instruction will specify a location in a record memory accessible by the parallel extended pipeline of a starting location of a sequence of recorded instructions.
- the parallel extended pipeline begins executing the series of recorded instructions, that is, it begins autonomous operation. In various embodiments this comprises fetching and executing its own instructions independent of the main pipeline's instruction stack. Also, in various embodiments, the parallel extended pipeline may operate at another clock frequency that the main pipeline, such as, for example, a fractional percentage (i.e., 1 ⁇ 2, 1 ⁇ 4, etc.). Concurrent to the parallel extended pipeline's autonomous execution, the main processor pipeline can continue sending instructions to the parallel extended pipeline as depicted in step 320 . Then, in step 325 , after the parallel pipeline has processed an end instruction recorded at the end of the sequence of recorded instructions, autonomous operation of that pipeline ceases. In step 330 , the parallel pipeline returns to the queue to process any queued instructions received from the main pipeline. In step 335 , the parallel extended pipeline continues processing instructions issued by the main pipeline that appear in the queue until an instruction to begin autonomous operation is received.
- a main processor pipeline is extended through a dynamically coupled parallel SIMD instruction pipeline.
- the main processor pipeline may issue instructions to the extended pipeline through an instruction queue that effectively decouples the extended pipeline.
- the extended SIMD pipeline is also able to run prerecorded macros that are stored in a local SIMD instruction memory so that a single macro instruction sent to the SIMD pipeline via the queue allows many pre-determined instructions to be executed as discussed in commonly assigned U.S.
- main pipeline One consideration of using an instruction queue to decouple the extended SIMD pipeline from the processor core (main pipeline) is that it becomes possible for the processor core to issue too many instructions causing the queue to become full. When the main processor pipeline can no longer issue instructions to the queue, the pipeline will have to stall until the queue frees up a slot for the instruction that caused the pipeline to stall. Pipeline stalls have a negative effect on overall system performance. In this case in particular, a pipeline stall means that the processor core will stop being able to operate in parallel, therefore negating the gains derived from the dynamically decoupled extended parallel SIMD pipeline.
- the SIMD pipeline queue uses condition codes to notify the processor pipeline of the condition of the queue.
- the SIMD queue sets a condition code of QF for queue nearly full whenever there are less than a predetermined number of empty slots remaining in the queue. In various embodiments, this number may be 16. However, in various embodiments, the number may be different than 16.
- the SIMD queue sets a condition code of QNF as the opposite of QF when more than the predetermined number of slots remain available.
- condition codes rather than using several instructions to load these status values and test the value before branching on the test result, two conditional branch instructions using these condition codes directly test for such conditions, thereby reducing the number of instructions required to perform this task.
- these instructions will only branch when the condition code used is set.
- these instructions may have the mnemonic “BQF” for branch when queue is nearly full and “BQNF” for branch when queue is not nearly full.
- Such condition codes make the queue full status an integral part of the main processor programming model and make it possible to make frequent light-weight intelligent decisions by software to maximize overall performance.
- condition codes are maintained by the queue itself based on the queue's status.
- the instruction to check the condition code are branch instructions that are specified to check the particular condition codes.
- checking of the condition code is done by placing condition code checking branch instructions where necessary, such as before issuing any instructions to the extended pipeline.
- condition codes provide an easy mechanism for preventing main pipeline stalls caused by trying to issue instructions to a full queue.
- These two conditional branch instructions allow the main processor pipeline to regularly check the status of the queue before issuing more instructions into the extended SIMD pipeline queue.
- The main processor core can use these instructions to avoid stalling when the queue is full or nearly full, branching instead to another task that does not involve the SIMD engine until the queue condition changes. These instructions therefore give the processor an effective and relatively low-overhead means of scheduling workload on the available resources while preventing main pipeline stalls.
Abstract
Systems and methods for selectively decoupling a parallel extended processor pipeline. A main processor pipeline and a parallel extended pipeline are coupled via an instruction queue. The main pipeline can instruct the parallel pipeline to execute instructions directly or to begin fetching and executing its own instructions autonomously. During autonomous operation of the parallel pipeline, instructions from the main pipeline accumulate in the instruction queue. The parallel pipeline can return to main-pipeline-controlled execution through a single instruction. A lightweight mechanism, in the form of a condition code visible to the main processor, allows software to decide at run time, based on the queue status, whether further instructions should be issued to the parallel extended pipeline, so as to maximize overall performance.
Description
- This application claims priority to U.S. Provisional Patent Application No. 60/721,108 titled “SIMD Architecture and Associated Systems and Methods,” filed Sep. 28, 2005, the disclosure of which is hereby incorporated by reference in its entirety.
- The invention relates generally to embedded microprocessor architecture and more specifically to systems and methods for selectively decoupling an extended instruction pipeline from a main pipeline in a microprocessor-based system.
- Processor extension logic is utilized to extend a microprocessor's capability. Typically, this logic is in parallel to and accessible by the main processor pipeline. It is often used to perform specific, repetitive, computationally intensive functions thereby freeing up the main processor pipeline.
- In conventional microprocessors, there are essentially two types of parallel pipeline architectures: tightly coupled and loosely coupled, or decoupled. In the former, instructions are fetched and executed serially in the main processor pipeline. If the instruction is an instruction to be processed by the extension logic, the instruction is sent to that logic. Because every instruction originates from the main pipeline the two pipelines are said to be tightly coupled. This limits the degree of concurrency exploitable between the pipelines.
- In the second architecture, the parallel instruction pipeline containing the extension logic is capable of fetching and executing its own instructions and hence maximizing concurrency. However, control and synchronization between the two pipelines becomes difficult when programming a processor having such a decoupled architecture. Thus, there exists a need for a parallel pipeline architecture that can fully exploit the advantages of parallelism without suffering from the design complexity of loosely or completely decoupled pipelines.
- Accordingly, at least one embodiment of the invention provides a microprocessor architecture. The microprocessor architecture according to this embodiment comprises a first processor instruction pipeline, comprising a front end portion and a rear portion, a second processor instruction pipeline, comprising a front end portion and a rear portion, and an instruction queue coupling the first and second instruction pipelines between their respective front end and rear portions.
- Another embodiment of the invention provides a method of dynamically decoupling a parallel extended processor pipeline from a main processor pipeline. The method according to this embodiment comprises sending an instruction from the main processor pipeline to the parallel extended processor pipeline instructing the parallel extended processor pipeline to operate autonomously, operating the parallel extended processor pipeline autonomously, storing subsequent instructions from the main processor pipeline to the parallel extended processor pipeline in an instruction queue, executing an instruction with the parallel extended processor pipeline to cease autonomous execution, and thereafter executing instructions supplied by the main processor pipeline in the queue.
- Still a further embodiment of the invention provides a method of performing dynamically controlled parallel instruction processing in a microprocessor. The method according to this embodiment comprises fetching and executing instructions with a main processor pipeline, sending instructions from the main processor pipeline to a parallel extended processor pipeline via an instruction queue coupling the two pipelines, and if the instruction is an instruction to be executed by the parallel extended pipeline, executing that instruction with the parallel extended pipeline, otherwise if the instruction is an instruction instructing the parallel extended pipeline to begin autonomous execution, thereafter fetching and executing instructions autonomously with the parallel extended pipeline independent of the main pipeline's instruction fetches, and storing instructions from the main pipeline for the parallel extended pipeline in the instruction queue until autonomous processing has ceased.
- These and other embodiments and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
- In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.
- FIG. 1 is a functional block diagram illustrating a microprocessor-based system including a main processor core and a SIMD media accelerator according to at least one embodiment of the invention;
- FIG. 2 is a block diagram illustrating a conventional multistage microprocessor pipeline having a pair of parallel data paths;
- FIG. 3 is a block diagram illustrating another conventional multiprocessor design having a pair of parallel processor pipelines;
- FIG. 4 is a block diagram illustrating a dynamically decoupleable multi-stage microprocessor pipeline according to at least one embodiment of the invention;
- FIG. 5 is a flow chart detailing the steps of a method for sending instructions for operating a main processor pipeline and an extended processor pipeline according to at least one embodiment of the invention; and
- FIG. 6 is a flow chart detailing the steps of a method for dynamically decoupling an extended processor pipeline from a main pipeline according to at least one embodiment of the invention.
- The following description is intended to convey a thorough understanding of the embodiments described by providing a number of specific embodiments and details involving microprocessor architecture and systems and methods for selectively decoupling an extended instruction pipeline from a main instruction pipeline. It should be appreciated, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.
- Referring now to FIG. 1, a functional block diagram illustrating a microprocessor-based system 5 including a main processor core 10 and a SIMD media accelerator 50 according to at least one embodiment of the invention is depicted. The diagram illustrates a microprocessor 5 comprising a standard single instruction single data (SISD) processor core 10 having a multistage instruction pipeline 12 and a SIMD media engine 50. In various embodiments, the processor core 10 may be a processor core such as the ARC 700 embedded processor core available from ARC International of Elstree, United Kingdom, and as described in provisional patent application No. 60/572,238 filed May 19, 2004 entitled “Microprocessor Architecture,” which is hereby incorporated by reference in its entirety. Alternatively, in various embodiments, the processor core may be a different processor core.
- In various embodiments, a single instruction issued by the processor pipeline 12 may cause up to 16 16-bit elements to be operated on in parallel through the use of the 128-bit data path 55 in the media engine 50. In various embodiments, the SIMD engine 50 utilizes closely coupled memory units. In various embodiments, the SIMD data memory 52 (SDM) is a 128-bit wide data memory that provides low latency access to and from the 128-bit vector register file 51. The SDM contents are transferable to and from system main memory via a DMA unit 54, thereby freeing up the processor core 10 and the SIMD core 50. In various embodiments, a SIMD code memory 56 (SCM) allows the SIMD unit to fetch instructions from a localized code memory, allowing the SIMD pipeline to dynamically decouple from the processor core 10, resulting in truly parallel operation between the processor core and SIMD media engine, as will be discussed in greater detail in the context of FIGS. 4-6.
- Therefore, in various embodiments, the microprocessor architecture will permit the processor-based system 5 to operate in both closely coupled and decoupled modes of operation. In the closely coupled mode of operation, the SIMD program code fetch is exclusively handled by the main processor core 10. In the decoupled mode of operation, the SIMD pipeline 50 executes code fetched from a local memory 56 independent of the processor core 10. The processor core 10 may therefore instruct the SIMD pipeline 50 to execute autonomously in this decoupled mode, for example, to perform media tasks such as audio processing, entropy encoding/decoding, discrete cosine transforms (DCTs) and inverse DCTs, motion compensation and de-block filtering.
- Referring now to FIG. 2, a block diagram illustrating a conventional multistage microprocessor pipeline having a pair of parallel data paths is depicted. In a microprocessor employing a variable-length pipeline, data paths required to support different instructions typically have a different number of stages. Data paths supporting specialized extension instructions for performing digital signal processing or other complex but repetitive functions may be used only some of the time during processor execution and remain idle otherwise. Thus, whether or not these instructions are currently needed will affect the number of effective stages in the processor pipeline.
- Extending a general-purpose microprocessor with application-specific extension instructions can often add significant length to the instruction pipeline. In the pipeline of FIG. 2, pipeline stages F1 to F4 at the front end 100 of the processor pipeline are responsible for functions such as instruction fetch, decode and issue. These pipeline stages are used to handle all instructions issued by the microprocessor. After these stages, the pipeline splits into parallel data paths.
- By using the single pipeline front end to fetch and issue all instructions, the processor pipeline of FIG. 2 achieves the advantage that instructions can be freely intermixed, irrespective of whether the instructions are executed by the data path in sub-paths E1-E3 or D1-D4. Thus, all instructions appear as a single thread of program execution. This type of pipeline architecture also has the advantage of greatly simplified program design and debugging, thereby reducing the time to market in product development. It is admittedly a highly flexible architecture. However, a limitation of this architecture is that the sequential nature of instruction execution significantly limits the exploitable parallelism between the data paths that could otherwise be used to improve overall performance. This negatively affects performance relative to other parallel pipeline architectures.
- FIG. 3 is a block diagram illustrating another conventional multiprocessor architecture having a pair of parallel instruction pipelines. The processor pipeline of FIG. 3 contains a front end 120 comprised of stages F1-F4 and a rear portion 125 comprised of stages E1-E3. However, the processor also contains a parallel data path having a front end 135 comprised of front end stages G1-G2 and a rear portion 140 comprised of stages D1-D4. Unlike the architecture of FIG. 2, this architecture contains truly parallel pipelines to the extent that the front portions 120 and 135 can each fetch instructions separately. This type of parallel architecture may be characterized as loosely coupled or decoupled because the application-specific extension data path G1-G2 and D1-D4 is autonomous and can execute instructions in parallel to the main pipeline consisting of F1-F4 and E1-E3. This arrangement enhances exploitable parallelism over the architecture depicted in FIG. 2. However, as the two parallel pipelines become independent, mechanisms are required to synchronize their operations, as represented by dashed line 130. These mechanisms, typically implemented using specific instructions and bus structures, are often not a natural part of a program and are inserted as afterthoughts to “fix” the disconnect between the main pipeline and the extended pipeline. As a consequence, the resulting program utilizing both instruction pipelines becomes difficult to design and optimize.
- Referring now to FIG. 4, a block diagram illustrating a dynamically decoupleable multi-stage microprocessor pipeline according to at least one embodiment of the invention is provided. The pipeline architecture according to this embodiment ameliorates at least some and preferably most or all of the above-noted limitations of conventional parallel pipeline architectures. The exemplary pipeline depicted in FIG. 4 consists of a front end portion 145 comprising stages F1-F4, a rear portion 150 comprising stages E1-E3, and a parallel extendible pipeline having a front portion 160 comprising stages G1-G2 and a rear portion 165 comprising stages D1-D4. In the pipeline depicted in FIG. 4, instructions can be issued from the CPU to the extendible pipeline D1 to D4. To decouple the extendible pipeline D1 to D4 from the front portion 145 of the main pipeline F1 to F4, a queue 155 is added between the two pipelines. The queue serves to delay execution of instructions issued by the front end portion 145 of the main pipeline if the extension pipeline is not ready. A tradeoff can be made during system design to decide how many entries the queue 155 should have to ensure that the extension pipeline is sufficiently decoupled from the main pipeline. Additionally, in various embodiments, the main pipeline can issue a Sequence Run (vrun) instruction to instruct the extension pipeline to use its own front end 160, G1 to G2 in the diagram, to execute instruction sequences stored in a record memory 156, causing the extension pipeline to fetch and execute instructions autonomously. In various embodiments, while the extension pipeline, G1-G2 and D1-D4, is performing operations, the main pipeline can keep issuing extension instructions that accumulate in the queue 155 until the extension pipeline executes a Sequence Record End (vendrec) instruction. After the vendrec instruction is executed, the extension pipeline resumes executing instructions issued to the queue 155.
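- By way of illustration only, the coupled and decoupled switching described for FIG. 4 can be sketched in software. The following is a toy Python model, not the disclosed hardware: the class name ExtensionPipeline, the tuple encoding of vrun with a start address, and the instruction names q0, q1, r0 and r1 are assumptions introduced for this example. Only the roles of the queue 155, the record memory 156, and the vrun and vendrec instructions come from the description.

```python
from collections import deque


class ExtensionPipeline:
    """Toy model of the extension pipeline switching between coupled and decoupled modes."""

    def __init__(self, record_memory):
        self.queue = deque()                # stands in for queue 155
        self.record_memory = record_memory  # stands in for record memory 156
        self.pc = None                      # None means coupled (reading the queue)
        self.executed = []                  # trace of executed operations

    def issue(self, instr):
        """Called by the main pipeline front end; the queue is unbounded in this sketch."""
        self.queue.append(instr)

    def step(self):
        """Execute one instruction; return False when there is nothing left to do."""
        if self.pc is not None:
            # Decoupled: fetch from record memory through the pipeline's own front end.
            instr = self.record_memory[self.pc]
            self.pc += 1
            if instr == "vendrec":
                self.pc = None              # end of sequence: return to the queue
            else:
                self.executed.append(instr)
            return True
        if not self.queue:
            return False
        instr = self.queue.popleft()        # coupled: take the next queued instruction
        if isinstance(instr, tuple) and instr[0] == "vrun":
            self.pc = instr[1]              # hypothetical encoding: ("vrun", start_address)
        else:
            self.executed.append(instr)
        return True


record_memory = ["r0", "r1", "vendrec"]
ext = ExtensionPipeline(record_memory)
ext.issue("q0")
ext.issue(("vrun", 0))
ext.issue("q1")                             # accumulates in the queue while the sequence runs
while ext.step():
    pass
print(ext.executed)                         # ['q0', 'r0', 'r1', 'q1']
```

- In this sketch the instruction q1, issued after vrun, is executed only after the recorded sequence reaches vendrec, which mirrors the call-and-return analogy drawn in the description.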
- Therefore, instead of trying to get what effectively becomes two independent processors to work together as in the pipeline depicted in FIG. 3, the pipeline depicted in FIG. 4 is designed to switch between being coupled, that is, executing instructions for the main pipeline front end 145, and being decoupled, that is, during autonomous runtime of the extended pipeline. As such, the instructions vrun and vendrec, which dynamically switch the pipeline between the coupling states, can be designed to be lightweight, executing in, for example, a single cycle. These instructions can then be seen as parallel analogs of the conventional call and return instructions. That is, when instructing the extension pipeline to fetch and execute instructions autonomously, the main processor pipeline is issuing a parallel function call that runs concurrently with its own thread of instruction execution to maximize speedup of the application. The two threads of instruction execution eventually join back into one after the extension pipeline executes the vendrec instruction, which is the last instruction of the program thread autonomously executed by the extension pipeline.
- In addition to efficient operation, another advantage of this architecture is that during debugging, such as, for example, instruction stepping, the two parallel threads can be forced to be serialized such that the CPU front portion 145 will not issue any instruction after issuing vrun to the extension pipeline until the latter fetches and executes the vendrec instruction. In various embodiments, this will give the programmer the view of a single program thread that has the same functional behavior as the parallel program when executed normally and hence will greatly simplify the task of debugging.
- Another advantage of the processor pipeline containing a parallel extendible pipeline that can be dynamically coupled and decoupled is the ability to use two separate clock domains. In low power applications, it is often necessary to run specific parts of the integrated circuit at varying clock frequencies in order to reduce and/or minimize power consumption. Using dynamic decoupling, the front end portion 145 of the main pipeline can utilize an operating clock frequency different from that of the parallel pipeline 165 of stages D1-D4, with the primary clock partitioning occurring naturally at the queue 155, labeled as Q in FIG. 4.
- Referring now to FIG. 5, a flow chart of an exemplary method for sending instructions from a main processor pipeline to an extended processor pipeline according to at least one embodiment of the invention is depicted. Operation of the method begins in step 200 and proceeds to step 205, where an instruction is fetched by the main processor pipeline. In step 210, because the instruction is determined to be one for processing by the parallel extended pipeline, the instruction is passed from the main pipeline to the parallel extended pipeline via an instruction queue coupling the two pipelines. In various embodiments, if the parallel extended pipeline is currently processing instructions from the queue, that instruction will be processed in turn by the parallel extended pipeline as specified in step 220. Otherwise, the instruction will remain in the queue until the parallel extended pipeline has ceased its autonomous operation. In step 225, while the instruction is either sitting in the queue or being processed by the parallel pipeline, the main pipeline is able to continue processing instructions. The queue provides a mechanism for the main pipeline to offload instructions to the parallel extended pipeline without stalling the main pipeline. Operation of the method stops in step 230.
- Referring now to FIG. 6, a flow chart of an exemplary method for dynamically decoupling an extended processor pipeline from a main pipeline according to at least one embodiment of the invention is depicted. Operation of the method begins in step 300 and proceeds to step 305, where the main processor pipeline sends a run instruction to the parallel extended pipeline via the instruction queue coupling the pipelines. In step 310, the parallel pipeline retrieves the run instruction from the queue. As noted above, this may occur instantly or after the parallel pipeline has retrieved and processed other instructions in front of the run instruction in the queue. In various embodiments, this run instruction will specify the starting location, in a record memory accessible by the parallel extended pipeline, of a sequence of recorded instructions. Next, in step 315, based on receipt of the run instruction, the parallel extended pipeline begins executing the series of recorded instructions, that is, it begins autonomous operation. In various embodiments this comprises fetching and executing its own instructions independent of the main pipeline's instruction stack. Also, in various embodiments, the parallel extended pipeline may operate at a different clock frequency than the main pipeline, such as, for example, a fraction of the main pipeline's frequency (e.g., ½, ¼, etc.). Concurrent with the parallel extended pipeline's autonomous execution, the main processor pipeline can continue sending instructions to the parallel extended pipeline as depicted in step 320. Then, in step 325, after the parallel pipeline has processed an end instruction recorded at the end of the sequence of recorded instructions, autonomous operation of that pipeline ceases. In step 330, the parallel pipeline returns to the queue to process any queued instructions received from the main pipeline. In step 335, the parallel extended pipeline continues processing instructions issued by the main pipeline that appear in the queue until an instruction to begin autonomous operation is received.
- As discussed above, in the microprocessor architecture according to the various embodiments of the invention, a main processor pipeline is extended through a dynamically coupled parallel SIMD instruction pipeline. In various embodiments, the main processor pipeline may issue instructions to the extended pipeline through an instruction queue that effectively decouples the extended pipeline. In various embodiments, the extended SIMD pipeline is also able to run prerecorded macros that are stored in a local SIMD instruction memory so that a single macro instruction sent to the SIMD pipeline via the queue allows many pre-determined instructions to be executed, as discussed in commonly assigned U.S. patent application XX/XXX,XXX titled “Systems and Methods for Recording Instructions Sequences in a Microprocessor Having a Dynamically Decoupleable Extended Instruction Pipeline,” filed concurrently herewith, the disclosure of which is hereby incorporated by reference in its entirety. This architecture, among other things, allows the SIMD media engine (the extended pipeline) to operate in parallel with the primary pipeline (processor core) and allows the processor core to operate far in advance of the parallel SIMD pipeline.
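- As a hedged illustration of the method of FIG. 5 combined with the separate clock domains discussed above, the following Python sketch models a main pipeline that hands extension instructions to a queue while an extension pipeline drains the queue at a fraction of the main clock. The mnemonics vadd, vmul, vsub, add, sub and mov, the convention that extension mnemonics begin with "v", the divider parameter, and the unbounded queue are all assumptions made for this example and are not taken from the disclosure.

```python
from collections import deque

EXT_PREFIX = "v"  # assumption: extension mnemonics start with "v", as vrun does


def run(program, divider=2):
    """Return (cycle at which the main pipeline finished, main trace, extension trace).

    The extension pipeline retires one queued instruction every `divider`
    main-clock cycles; the queue is unbounded in this sketch.
    """
    queue, main_trace, ext_trace = deque(), [], []
    pending = deque(program)
    cycle = 0
    main_done_at = None
    while pending or queue:
        cycle += 1
        if pending:
            instr = pending.popleft()           # step 205: fetch
            if instr.startswith(EXT_PREFIX):
                queue.append(instr)             # step 210: hand off via the queue
            else:
                main_trace.append(instr)        # step 225: main pipeline keeps working
            if not pending:
                main_done_at = cycle
        if cycle % divider == 0 and queue:
            ext_trace.append(queue.popleft())   # step 220: processed in turn
    return main_done_at, main_trace, ext_trace


program = ["vadd", "add", "vmul", "sub", "vsub", "mov"]
print(run(program, divider=2))
print(run(program, divider=4))
```

- Under these assumptions the main pipeline finishes its six instructions in six cycles for either divider, because handing an instruction to the queue never waits on the slower extension clock; only the time at which the extension pipeline empties the queue changes.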
- One consideration of using an instruction queue to decouple the extended SIMD pipeline from the processor core (main pipeline) is that it becomes possible for the processor core to issue too many instructions causing the queue to become full. When the main processor pipeline can no longer issue instructions to the queue, the pipeline will have to stall until the queue frees up a slot for the instruction that caused the pipeline to stall. Pipeline stalls have a negative effect on overall system performance. In this case in particular, a pipeline stall means that the processor core will stop being able to operate in parallel, therefore negating the gains derived from the dynamically decoupled extended parallel SIMD pipeline.
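- The stall behavior described above can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the function name count_stalls, the one-instruction-per-cycle issue rate, and the fixed drain interval are hypothetical simplifications, and the queue sizes used are chosen only to make the effect visible.

```python
from collections import deque


def count_stalls(n_instrs, capacity, drain_every):
    """Count main-pipeline stall cycles for a bounded queue.

    The main pipeline tries to issue one extension instruction per cycle;
    the extension pipeline retires one instruction every `drain_every` cycles.
    """
    queue = deque()
    issued = stalls = cycle = 0
    while issued < n_instrs:
        cycle += 1
        if cycle % drain_every == 0 and queue:
            queue.popleft()                 # extension pipeline frees a slot
        if len(queue) < capacity:
            queue.append(issued)            # slot available: issue succeeds
            issued += 1
        else:
            stalls += 1                     # queue full: main pipeline stalls this cycle
    return stalls


print(count_stalls(n_instrs=8, capacity=2, drain_every=3))
print(count_stalls(n_instrs=8, capacity=8, drain_every=3))
```

- With a two-entry queue the main pipeline spends most of its cycles stalled, whereas a queue at least as deep as the burst of issued instructions absorbs the burst without any stall, which is the design tradeoff on queue depth noted in connection with FIG. 4.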
- Therefore, to keep the main processor pipeline from issuing instructions to a full queue and stalling as a result, in various embodiments, the SIMD pipeline queue uses condition codes to notify the processor pipeline of the condition of the queue. In various embodiments, the SIMD queue sets a condition code of QF, for queue nearly full, whenever fewer than a predetermined number of empty slots remain in the queue. In various embodiments, this number may be 16; in other embodiments, the number may differ from 16. In various embodiments, the SIMD queue sets a condition code of QNF, the opposite of QF, when more than the predetermined number of slots remain available.
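The way the two condition codes follow from the number of free slots can be sketched in C. This is a minimal illustration rather than the disclosed circuit: the queue depth of 64 and all identifiers are assumptions, 16 is only the example threshold mentioned above, and treating QNF as the exact logical complement of QF (so that QNF is set when exactly 16 slots are free) is one reading of "the opposite of QF".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes: the text gives 16 only as one possible threshold
   and does not state the queue depth. */
#define SIMD_QUEUE_DEPTH       64u
#define QF_FREE_SLOT_THRESHOLD 16u

typedef struct {
    size_t used;  /* instructions currently waiting in the queue */
} simd_queue_t;

/* Number of empty slots left in the queue. */
static size_t simd_queue_free_slots(const simd_queue_t *q) {
    assert(q->used <= SIMD_QUEUE_DEPTH);
    return SIMD_QUEUE_DEPTH - q->used;
}

/* QF: set when fewer than the threshold number of slots are empty. */
static bool simd_queue_qf(const simd_queue_t *q) {
    return simd_queue_free_slots(q) < QF_FREE_SLOT_THRESHOLD;
}

/* QNF: modeled here as the logical opposite of QF (an assumption
   about the boundary case of exactly 16 free slots). */
static bool simd_queue_qnf(const simd_queue_t *q) {
    return !simd_queue_qf(q);
}
```

Because both codes are pure functions of the occupancy, the queue can keep them up to date on every enqueue and dequeue without involvement from the main pipeline.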
- In various embodiments, rather than using several instructions to load these status values and test the value before branching on the test result, two conditional branch instructions using these condition codes directly test for such conditions, thereby reducing the number of instructions required to perform this task. In various embodiments, these instructions will only branch when the condition code used is set. In various embodiments, these instructions may have the mnemonic “BQF” for branch when queue is nearly full and “BQNF” for branch when queue is not nearly full. Such condition codes make the queue-full status an integral part of the main processor programming model and make it possible for software to make frequent, light-weight, intelligent decisions that maximize overall performance. These condition codes are maintained by the queue itself based on the queue's status. The instructions that check the condition codes are branch instructions specified to test those particular codes. In various embodiments of the invention, the condition codes are checked by placing condition-code-checking branch instructions where necessary, such as before issuing any instructions to the extended pipeline. Thus, the condition codes provide an easy mechanism for preventing main pipeline stalls caused by trying to issue instructions to a full queue.
- These two conditional branch instructions allow the main processor pipeline to regularly check the status of the queue before issuing more instructions into the extended SIMD pipeline queue. The main processor core can use these instructions to avoid stalling the processor when the queue is full or nearly full, and branch to another task that does not involve the SIMD engine until these queue conditions change. Therefore, these instructions provide the processor with an effective and relatively low overhead means of scheduling work load on the available resources while preventing main pipeline stalls.
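The scheduling pattern described above, testing the queue status before each issue and switching to other work when the queue is nearly full, can be sketched as follows. This is a software model under stated assumptions, not the patented instruction encoding: the `if` statement stands in for a BQF branch, the queue depth of 32 is hypothetical, and `core_model_t`, `core_step`, and `simd_drain` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define Q_DEPTH        32u  /* hypothetical queue depth */
#define Q_QF_THRESHOLD 16u  /* example threshold from the text */

typedef struct {
    size_t used;         /* occupied queue slots */
    size_t simd_issued;  /* SIMD instructions accepted so far */
    size_t other_work;   /* units of non-SIMD work done instead */
} core_model_t;

/* Condition code QF: fewer than Q_QF_THRESHOLD free slots remain. */
static bool cond_qf(const core_model_t *c) {
    assert(c->used <= Q_DEPTH);
    return (Q_DEPTH - c->used) < Q_QF_THRESHOLD;
}

/* One scheduling decision by the main pipeline. The `if` plays the role of
   a BQF branch placed before the SIMD issue. Returns true if an instruction
   was issued to the queue. */
static bool core_step(core_model_t *c) {
    if (cond_qf(c)) {     /* branch taken: avoid a possible stall */
        c->other_work++;  /* run a task that does not need the SIMD engine */
        return false;
    }
    c->used++;            /* issue one instruction into the queue */
    c->simd_issued++;
    return true;
}

/* The extended pipeline retires up to n queued instructions. */
static void simd_drain(core_model_t *c, size_t n) {
    c->used -= (n < c->used) ? n : c->used;
}
```

In this model the core never reaches a full queue: it stops issuing once fewer than 16 slots are free and resumes as soon as the extended pipeline has drained enough entries.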
- The embodiments of the present inventions are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to systems and methods for dynamically decoupling a parallel pipeline in a microprocessor-based system having a main instruction pipeline and an extended instruction pipeline, the principles herein are equally applicable to other aspects of microprocessor design and function. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although some of the embodiments of the present invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.
Claims (20)
1. A microprocessor architecture comprising:
a first processor instruction pipeline, comprising a front end portion and a rear portion;
a second processor instruction pipeline, comprising a front end portion and a rear portion; and
an instruction queue coupling the first and second instruction pipeline between their respective front end and rear portions.
2. The microprocessor architecture according to claim 1 , where the instruction queue is located in the second instruction pipeline between that pipeline's front end and rear portions.
3. The microprocessor architecture according to claim 1 , wherein the queue is configured to store instructions issued by the first instruction pipeline to the second instruction pipeline.
4. The microprocessor architecture according to claim 1 , wherein the first instruction pipeline is configured to be able to instruct the second instruction pipeline to operate autonomously.
5. The microprocessor architecture according to claim 4 , wherein operating autonomously comprises fetching and executing its own instructions via the second pipeline's front end portion.
6. The microprocessor architecture according to claim 4 , wherein operating autonomously comprises operating on a different clock frequency than the first instruction pipeline.
7. The microprocessor architecture according to claim 5 , wherein instructions issued to the second instruction pipeline accumulate in the queue during its autonomous operation.
8. The microprocessor architecture according to claim 7 , wherein the instruction queue comprises at least one condition code.
9. The microprocessor architecture according to claim 8 , wherein the at least one condition code comprises a code indicative of at least one state of the queue selected from the group consisting of queue having less than a predetermined number of free slots, queue having more than a predetermined number of free slots, and queue full.
10. The microprocessor architecture according to claim 9 , wherein the first processor instruction pipeline uses the at least one condition code to determine whether to send an instruction to the queue or to branch to another instruction that does not require the second instruction pipeline.
11. The microprocessor architecture according to claim 7 , wherein the second instruction pipeline is adapted to return from autonomous operation to first instruction pipeline controlled operation by executing a return instruction.
12. The microprocessor architecture according to claim 11 , wherein instructions accumulated in the queue are executed by the second instruction pipeline when it returns from autonomous operation.
13. A method of dynamically decoupling a parallel extended processor pipeline from a main processor pipeline comprising:
sending an instruction from the main processor pipeline to the parallel extended processor pipeline instructing the parallel extended processor pipeline to operate autonomously;
operating the parallel extended processor pipeline autonomously;
storing subsequent instructions from the main processor pipeline to the parallel extended processor pipeline in an instruction queue;
executing an instruction with the parallel extended processor pipeline to cease autonomous execution; and
thereafter executing instructions supplied by the main processor pipeline in the queue.
14. The method according to claim 13 , further comprising executing an instruction on the main processor pipeline to check a condition code of the instruction queue before sending subsequent instructions to the queue.
15. The method according to claim 14 , further comprising either branching to another instruction that doesn't require the parallel extended processor pipeline or sending the instruction to the instruction queue based on the condition code.
16. The method according to claim 13 , wherein operating the parallel extended processor pipeline autonomously comprises fetching and executing its own instructions via that pipeline's own front end, independent of instructions fetched and executed by the main processor pipeline.
17. The method according to claim 13 , wherein operating the parallel extended processor pipeline autonomously comprises operating at a different clock frequency than the main processor pipeline.
18. The method according to claim 13 , wherein the main processor pipeline continues to fetch and execute instructions while the parallel extended processor pipeline is operating autonomously.
19. The method according to claim 13 , wherein executing an instruction with the parallel extended processor pipeline to cease autonomous execution comprises returning from autonomous operation to first instruction pipeline controlled operation without being instructed to do so by the first instruction pipeline.
20. A method of performing dynamically controlled parallel instruction processing in a microprocessor comprising:
fetching and executing instructions with a main processor pipeline;
sending instructions from the main processor pipeline to a parallel extended processor pipeline via an instruction queue coupling the two pipelines; and
if the instruction is an instruction to be executed by the parallel extended pipeline, executing that instruction with the parallel extended pipeline;
otherwise if the instruction is an instruction instructing that parallel extended pipeline to begin autonomous execution, thereafter fetching and executing instructions autonomously with the parallel extended pipeline independent of the main pipeline's instruction fetches, and storing instructions from main pipeline for the parallel extended pipeline in the instruction queue until autonomous processing has ceased.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,434 US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US72110805P | 2005-09-28 | 2005-09-28 | |
US11/528,434 US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
Publications (1)
Publication Number | Publication Date |
---|---|
US20070074004A1 true US20070074004A1 (en) | 2007-03-29 |
Family
ID=37968194
Family Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,470 Abandoned US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US11/528,327 Active 2029-03-30 US7747088B2 (en) | 2005-09-28 | 2006-09-28 | System and methods for performing deblocking in microprocessor-based video codec applications |
US11/528,338 Active 2027-04-24 US7971042B2 (en) | 2005-09-28 | 2006-09-28 | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US11/528,325 Active 2030-06-13 US8212823B2 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US11/528,434 Abandoned US20070074004A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US11/528,432 Active 2031-04-08 US8218635B2 (en) | 2005-09-28 | 2006-09-28 | Systolic-array based systems and methods for performing block matching in motion compensation |
US11/528,326 Abandoned US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
Family Applications Before (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,470 Abandoned US20070073925A1 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US11/528,327 Active 2029-03-30 US7747088B2 (en) | 2005-09-28 | 2006-09-28 | System and methods for performing deblocking in microprocessor-based video codec applications |
US11/528,338 Active 2027-04-24 US7971042B2 (en) | 2005-09-28 | 2006-09-28 | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US11/528,325 Active 2030-06-13 US8212823B2 (en) | 2005-09-28 | 2006-09-28 | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
Family Applications After (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/528,432 Active 2031-04-08 US8218635B2 (en) | 2005-09-28 | 2006-09-28 | Systolic-array based systems and methods for performing block matching in motion compensation |
US11/528,326 Abandoned US20070074007A1 (en) | 2005-09-28 | 2006-09-28 | Parameterizable clip instruction and method of performing a clip operation using the same |
Country Status (2)
Country | Link |
---|---|
US (7) | US20070073925A1 (en) |
WO (1) | WO2007049150A2 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20090049281A1 (en) * | 2007-07-24 | 2009-02-19 | Samsung Electronics Co., Ltd. | Multimedia decoding method and multimedia decoding apparatus based on multi-core processor |
US20090217275A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Pipelining hardware accelerators to computer systems |
US20090213127A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Guided attachment of accelerators to computer systems |
US20090217266A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Streaming attachment of hardware accelerators to computer systems |
US20140355691A1 (en) * | 2013-06-03 | 2014-12-04 | Texas Instruments Incorporated | Multi-threading in a video hardware engine |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
US11042502B2 (en) * | 2014-12-24 | 2021-06-22 | Samsung Electronics Co., Ltd. | Vector processing core shared by a plurality of scalar processing cores for scheduling and executing vector instructions |
Families Citing this family (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9015397B2 (en) | 2012-11-29 | 2015-04-21 | Sandisk Technologies Inc. | Method and apparatus for DMA transfer with synchronization optimization |
US9330060B1 (en) | 2003-04-15 | 2016-05-03 | Nvidia Corporation | Method and device for encoding and decoding video image data |
US8660182B2 (en) | 2003-06-09 | 2014-02-25 | Nvidia Corporation | MPEG motion estimation based on dual start points |
TWI239474B (en) * | 2004-07-28 | 2005-09-11 | Novatek Microelectronics Corp | Circuit for counting sum of absolute difference |
TWI295540B (en) * | 2005-06-15 | 2008-04-01 | Novatek Microelectronics Corp | Motion estimation circuit and operating method thereof |
TWI296091B (en) * | 2005-11-15 | 2008-04-21 | Novatek Microelectronics Corp | Motion estimation circuit and motion estimation processing element |
US8731071B1 (en) | 2005-12-15 | 2014-05-20 | Nvidia Corporation | System for performing finite input response (FIR) filtering in motion estimation |
US20070217515A1 (en) * | 2006-03-15 | 2007-09-20 | Yu-Jen Wang | Method for determining a search pattern for motion estimation |
US8724702B1 (en) | 2006-03-29 | 2014-05-13 | Nvidia Corporation | Methods and systems for motion estimation used in video coding |
US8660380B2 (en) | 2006-08-25 | 2014-02-25 | Nvidia Corporation | Method and system for performing two-dimensional transform on data value array with reduced power consumption |
US9094686B2 (en) * | 2006-09-06 | 2015-07-28 | Broadcom Corporation | Systems and methods for faster throughput for compressed video data decoding |
KR101354659B1 (en) * | 2006-11-08 | 2014-01-28 | 삼성전자주식회사 | Method and apparatus for motion compensation supporting multicodec |
US7958177B2 (en) * | 2006-11-29 | 2011-06-07 | Arcsoft, Inc. | Method of parallelly filtering input data words to obtain final output data words containing packed half-pel pixels |
US8756482B2 (en) | 2007-05-25 | 2014-06-17 | Nvidia Corporation | Efficient encoding/decoding of a sequence of data frames |
US9118927B2 (en) * | 2007-06-13 | 2015-08-25 | Nvidia Corporation | Sub-pixel interpolation and its application in motion compensated encoding of a video signal |
US8873625B2 (en) | 2007-07-18 | 2014-10-28 | Nvidia Corporation | Enhanced compression in representing non-frame-edge blocks of image frames |
JP2009054032A (en) * | 2007-08-28 | 2009-03-12 | Toshiba Corp | Parallel processor |
JP5159258B2 (en) * | 2007-11-06 | 2013-03-06 | 株式会社東芝 | Arithmetic processing unit |
US8437410B1 (en) | 2007-11-21 | 2013-05-07 | Marvell International Ltd. | System and method to execute a clipping instruction |
US20090188521A1 (en) * | 2008-01-17 | 2009-07-30 | Evazynajad Ali M | Dental Floss Formed from Botanic and Botanically Derived Fiber |
CN102112971A (en) | 2008-08-06 | 2011-06-29 | 阿斯奔收购公司 | Haltable and restartable dma engine |
US8386547B2 (en) | 2008-10-31 | 2013-02-26 | Intel Corporation | Instruction and logic for performing range detection |
US9179166B2 (en) * | 2008-12-05 | 2015-11-03 | Nvidia Corporation | Multi-protocol deblock engine core system and method |
US8666181B2 (en) | 2008-12-10 | 2014-03-04 | Nvidia Corporation | Adaptive multiple engine image motion detection system and method |
US20100180100A1 (en) * | 2009-01-13 | 2010-07-15 | Mavrix Technology, Inc. | Matrix microprocessor and method of operation |
CN102055969B (en) * | 2009-10-30 | 2012-12-19 | 鸿富锦精密工业(深圳)有限公司 | Image deblocking filter and image processing device using same |
US9390539B2 (en) * | 2009-11-04 | 2016-07-12 | Intel Corporation | Performing parallel shading operations |
CN103502935B (en) | 2011-04-01 | 2016-10-12 | 英特尔公司 | The friendly instruction format of vector and execution thereof |
TWI449433B (en) * | 2011-08-01 | 2014-08-11 | Novatek Microelectronics Corp | Image processing circuit and image processing method |
CN102346769B (en) * | 2011-09-20 | 2014-10-22 | 奇智软件(北京)有限公司 | Method and device for consolidating registry file |
US9389861B2 (en) * | 2011-12-22 | 2016-07-12 | Intel Corporation | Systems, apparatuses, and methods for mapping a source operand to a different range |
US10157061B2 (en) * | 2011-12-22 | 2018-12-18 | Intel Corporation | Instructions for storing in general purpose registers one of two scalar constants based on the contents of vector write masks |
CN104025021A (en) * | 2011-12-23 | 2014-09-03 | 英特尔公司 | Apparatus and method for sliding window data gather |
US9152424B2 (en) * | 2012-06-14 | 2015-10-06 | International Business Machines Corporation | Mitigating instruction prediction latency with independently filtered presence predictors |
US9241163B2 (en) * | 2013-03-15 | 2016-01-19 | Intersil Americas LLC | VC-2 decoding using parallel decoding paths |
US9330022B2 (en) * | 2013-06-25 | 2016-05-03 | Intel Corporation | Power logic for memory address conversion |
JP6262621B2 (en) * | 2013-09-25 | 2018-01-17 | 株式会社メガチップス | Image enlargement / reduction processing apparatus and image enlargement / reduction processing method |
US9547493B2 (en) * | 2013-10-03 | 2017-01-17 | Synopsys, Inc. | Self-timed user-extension instructions for a processing device |
US20160125263A1 (en) * | 2014-11-03 | 2016-05-05 | Texas Instruments Incorporated | Method to compute sliding window block sum using instruction based selective horizontal addition in vector processor |
US9715464B2 (en) | 2015-03-27 | 2017-07-25 | Microsoft Technology Licensing, Llc | Direct memory access descriptor processing |
US10091904B2 (en) * | 2016-07-22 | 2018-10-02 | Intel Corporation | Storage sled for data center |
US10108581B1 (en) | 2017-04-03 | 2018-10-23 | Google Llc | Vector reduction processor |
GB2563384B (en) | 2017-06-07 | 2019-12-25 | Advanced Risc Mach Ltd | Programmable instruction buffering |
US10437740B2 (en) * | 2017-12-15 | 2019-10-08 | Exten Technologies, Inc. | High performance raid operations offload with minimized local buffering |
CN113411575B (en) * | 2018-09-24 | 2022-07-22 | 华为技术有限公司 | Image processing apparatus, method and storage medium for performing quality optimized deblocking |
US11099973B2 (en) * | 2019-01-28 | 2021-08-24 | Salesforce.Com, Inc. | Automated test case management systems and methods |
CN114341805A (en) * | 2019-07-03 | 2022-04-12 | 华夏芯(北京)通用处理器技术有限公司 | Pure function language neural network accelerator system and structure |
US11880231B2 (en) * | 2020-12-14 | 2024-01-23 | Microsoft Technology Licensing, Llc | Accurate timestamp or derived counter value generation on a complex CPU |
CN113312088B (en) * | 2021-06-29 | 2022-05-17 | 北京熵核科技有限公司 | Method and device for executing program instruction |
US11567775B1 (en) * | 2021-10-25 | 2023-01-31 | Sap Se | Dynamic generation of logic for computing systems |
WO2023235004A1 (en) * | 2022-06-02 | 2023-12-07 | Micron Technology, Inc. | Time-division multiplexed simd function unit |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US5923892A (en) * | 1997-10-27 | 1999-07-13 | Levy; Paul S. | Host processor and coprocessor arrangement for processing platform-independent code |
US6757019B1 (en) * | 1999-03-13 | 2004-06-29 | The Board Of Trustees Of The Leland Stanford Junior University | Low-power parallel processor and imager having peripheral control circuitry |
US6865663B2 (en) * | 2000-02-24 | 2005-03-08 | Pts Corporation | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US6950929B2 (en) * | 2001-05-24 | 2005-09-27 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device having a coprocessor |
US20060047934A1 (en) * | 2004-08-31 | 2006-03-02 | Schmisseur Mark A | Integrated circuit capable of memory access control |
US7079147B2 (en) * | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
Family Cites Families (213)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4594659A (en) | 1982-10-13 | 1986-06-10 | Honeywell Information Systems Inc. | Method and apparatus for prefetching instructions for a central execution pipeline unit |
JPS63225822A (en) | 1986-08-11 | 1988-09-20 | Toshiba Corp | Barrel shifter |
US4905178A (en) | 1986-09-19 | 1990-02-27 | Performance Semiconductor Corporation | Fast shifter method and structure |
JPS6398729A (en) | 1986-10-15 | 1988-04-30 | Fujitsu Ltd | Barrel shifter |
US4914622A (en) | 1987-04-17 | 1990-04-03 | Advanced Micro Devices, Inc. | Array-organized bit map with a barrel shifter |
DE3889812T2 (en) | 1987-08-28 | 1994-12-15 | Nec Corp | Data processor with a test structure for multi-position shifters. |
KR970005453B1 (en) | 1987-12-25 | 1997-04-16 | 가부시기가이샤 히다찌세이사꾸쇼 | Data processing apparatus for high speed processing |
US4926323A (en) | 1988-03-03 | 1990-05-15 | Advanced Micro Devices, Inc. | Streamlined instruction processor |
JPH01263820A (en) | 1988-04-15 | 1989-10-20 | Hitachi Ltd | Microprocessor |
EP0344347B1 (en) | 1988-06-02 | 1993-12-29 | Deutsche ITT Industries GmbH | Digital signal processing unit |
GB2229832B (en) | 1989-03-30 | 1993-04-07 | Intel Corp | Byte swap instruction for memory format conversion within a microprocessor |
JPH03185530A (en) | 1989-12-14 | 1991-08-13 | Mitsubishi Electric Corp | Data processor |
EP0436341B1 (en) | 1990-01-02 | 1997-05-07 | Motorola, Inc. | Sequential prefetch method for 1, 2 or 3 word instructions |
JPH03248226A (en) | 1990-02-26 | 1991-11-06 | Nec Corp | Microprocessor |
JP2560889B2 (en) | 1990-05-22 | 1996-12-04 | 日本電気株式会社 | Microprocessor |
US5778423A (en) | 1990-06-29 | 1998-07-07 | Digital Equipment Corporation | Prefetch instruction for improving performance in reduced instruction set processor |
CA2045790A1 (en) | 1990-06-29 | 1991-12-30 | Richard Lee Sites | Branch prediction in high-performance processor |
US5155843A (en) | 1990-06-29 | 1992-10-13 | Digital Equipment Corporation | Error transition mode for multi-processor system |
JP2556612B2 (en) | 1990-08-29 | 1996-11-20 | 日本電気アイシーマイコンシステム株式会社 | Barrel shifter circuit |
US5636363A (en) | 1991-06-14 | 1997-06-03 | Integrated Device Technology, Inc. | Hardware control structure and method for off-chip monitoring entries of an on-chip cache |
US5493687A (en) | 1991-07-08 | 1996-02-20 | Seiko Epson Corporation | RISC microprocessor architecture implementing multiple typed register sets |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
US5450586A (en) | 1991-08-14 | 1995-09-12 | Hewlett-Packard Company | System for analyzing and debugging embedded software through dynamic and interactive use of code markers |
US5283874A (en) | 1991-10-21 | 1994-02-01 | Intel Corporation | Cross coupling mechanisms for simultaneously completing consecutive pipeline instructions even if they begin to process at the same microprocessor of the issue fee |
CA2073516A1 (en) | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
FR2690299B1 (en) * | 1992-04-17 | 1994-06-17 | Telecommunications Sa | METHOD AND DEVICE FOR SPATIAL FILTERING OF DIGITAL IMAGES DECODED BY BLOCK TRANSFORMATION. |
US5423011A (en) | 1992-06-11 | 1995-06-06 | International Business Machines Corporation | Apparatus for initializing branch prediction information |
US5542074A (en) | 1992-10-22 | 1996-07-30 | Maspar Computer Corporation | Parallel processor system with highly flexible local control capability, including selective inversion of instruction signal and control of bit shift amount |
US5696958A (en) | 1993-01-11 | 1997-12-09 | Silicon Graphics, Inc. | Method and apparatus for reducing delays following the execution of a branch instruction in an instruction pipeline |
GB2275119B (en) | 1993-02-03 | 1997-05-14 | Motorola Inc | A cached processor |
US5937202A (en) | 1993-02-11 | 1999-08-10 | 3-D Computing, Inc. | High-speed, parallel, processor architecture for front-end electronics, based on a single type of ASIC, and method use thereof |
US5454117A (en) | 1993-08-25 | 1995-09-26 | Nexgen, Inc. | Configurable branch prediction for a processor performing speculative execution |
JP2801135B2 (en) | 1993-11-26 | 1998-09-21 | 富士通株式会社 | Instruction reading method and instruction reading device for pipeline processor |
US6116768A (en) | 1993-11-30 | 2000-09-12 | Texas Instruments Incorporated | Three input arithmetic logic unit with barrel rotator |
US5590350A (en) | 1993-11-30 | 1996-12-31 | Texas Instruments Incorporated | Three input arithmetic logic unit with mask generator |
US5509129A (en) | 1993-11-30 | 1996-04-16 | Guttag; Karl M. | Long instruction word controlling plural independent processor operations |
US5590351A (en) | 1994-01-21 | 1996-12-31 | Advanced Micro Devices, Inc. | Superscalar execution unit for sequential instruction pointer updates and segment limit checks |
JPH07253922A (en) | 1994-03-14 | 1995-10-03 | Texas Instr Japan Ltd | Address generating circuit |
US5530825A (en) | 1994-04-15 | 1996-06-25 | Motorola, Inc. | Data processor with branch target address cache and method of operation |
US5517436A (en) | 1994-06-07 | 1996-05-14 | Andreas; David C. | Digital signal processor for audio applications |
AU698055B2 (en) * | 1994-07-14 | 1998-10-22 | Johnson-Grace Company | Method and apparatus for compressing images |
US5809293A (en) | 1994-07-29 | 1998-09-15 | International Business Machines Corporation | System and method for program execution tracing within an integrated processor |
US5692168A (en) | 1994-10-18 | 1997-11-25 | Cyrix Corporation | Prefetch buffer using flow control bit to identify changes of flow within the code stream |
US5600674A (en) | 1995-03-02 | 1997-02-04 | Motorola Inc. | Method and apparatus of an enhanced digital signal processor |
US5655122A (en) | 1995-04-05 | 1997-08-05 | Sequent Computer Systems, Inc. | Optimizing compiler with static prediction of branch probability, branch frequency and function frequency |
US5835753A (en) | 1995-04-12 | 1998-11-10 | Advanced Micro Devices, Inc. | Microprocessor with dynamically extendable pipeline stages and a classifying circuit |
US5920711A (en) | 1995-06-02 | 1999-07-06 | Synopsys, Inc. | System for frame-based protocol, graphical capture, synthesis, analysis, and simulation |
US5842004A (en) | 1995-08-04 | 1998-11-24 | Sun Microsystems, Inc. | Method and apparatus for decompression of compressed geometric three-dimensional graphics data |
US6292879B1 (en) | 1995-10-25 | 2001-09-18 | Anthony S. Fong | Method and apparatus to specify access control list and cache enabling and cache coherency requirement enabling on individual operands of an instruction of a computer |
US5727211A (en) | 1995-11-09 | 1998-03-10 | Chromatic Research, Inc. | System and method for fast context switching between tasks |
US5996071A (en) | 1995-12-15 | 1999-11-30 | Via-Cyrix, Inc. | Detecting self-modifying code in a pipelined processor with branch processing by comparing latched store address to subsequent target address |
US5896305A (en) | 1996-02-08 | 1999-04-20 | Texas Instruments Incorporated | Shifter circuit for an arithmetic logic unit in a microprocessor |
US5752014A (en) | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | Automatic selection of branch prediction methodology for subsequent branch instruction based on outcome of previous branch prediction |
US5784636A (en) | 1996-05-28 | 1998-07-21 | National Semiconductor Corporation | Reconfigurable computer architecture for use in signal processing applications |
US20010025337A1 (en) | 1996-06-10 | 2001-09-27 | Frank Worrell | Microprocessor including a mode detector for setting compression mode |
US5805876A (en) | 1996-09-30 | 1998-09-08 | International Business Machines Corporation | Method and system for reducing average branch resolution time and effective misprediction penalty in a processor |
US5964884A (en) | 1996-09-30 | 1999-10-12 | Advanced Micro Devices, Inc. | Self-timed pulse control circuit |
US5848264A (en) | 1996-10-25 | 1998-12-08 | S3 Incorporated | Debug and video queue for multi-processor chip |
US6061521A (en) | 1996-12-02 | 2000-05-09 | Compaq Computer Corp. | Computer having multimedia operations executable as two distinct sets of operations within a single instruction cycle |
US5909572A (en) | 1996-12-02 | 1999-06-01 | Compaq Computer Corp. | System and method for conditionally moving an operand from a source register to a destination register |
EP0855645A3 (en) | 1996-12-31 | 2000-05-24 | Texas Instruments Incorporated | System and method for speculative execution of instructions with data prefetch |
KR100236533B1 (en) | 1997-01-16 | 2000-01-15 | 윤종용 | Digital signal processor |
US6185732B1 (en) | 1997-04-08 | 2001-02-06 | Advanced Micro Devices, Inc. | Software debug port for a microprocessor |
US6154857A (en) | 1997-04-08 | 2000-11-28 | Advanced Micro Devices, Inc. | Microprocessor-based device incorporating a cache for capturing software performance profiling data |
US6088786A (en) | 1997-06-27 | 2000-07-11 | Sun Microsystems, Inc. | Method and system for coupling a stack based processor to register based functional unit |
US6026478A (en) | 1997-08-01 | 2000-02-15 | Micron Technology, Inc. | Split embedded DRAM processor |
US6226738B1 (en) | 1997-08-01 | 2001-05-01 | Micron Technology, Inc. | Split embedded DRAM processor |
US6760833B1 (en) | 1997-08-01 | 2004-07-06 | Micron Technology, Inc. | Split embedded DRAM processor |
US6157988A (en) | 1997-08-01 | 2000-12-05 | Micron Technology, Inc. | Method and apparatus for high performance branching in pipelined microsystems |
JPH1185515A (en) | 1997-09-10 | 1999-03-30 | Ricoh Co Ltd | Microprocessor |
US5978909A (en) | 1997-11-26 | 1999-11-02 | Intel Corporation | System for speculative branch target prediction having a dynamic prediction history buffer and a static prediction history buffer |
US6044458A (en) | 1997-12-12 | 2000-03-28 | Motorola, Inc. | System for monitoring program flow utilizing fixwords stored sequentially to opcodes |
US6014743A (en) | 1998-02-05 | 2000-01-11 | Intergrated Device Technology, Inc. | Apparatus and method for recording a floating point error pointer in zero cycles |
US6151672A (en) | 1998-02-23 | 2000-11-21 | Hewlett-Packard Company | Methods and apparatus for reducing interference in a branch history table of a microprocessor |
US6374349B2 (en) | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6377970B1 (en) * | 1998-03-31 | 2002-04-23 | Intel Corporation | Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry |
US6584585B1 (en) | 1998-05-08 | 2003-06-24 | Gateway, Inc. | Virtual device driver and methods employing the same |
US6289417B1 (en) | 1998-05-18 | 2001-09-11 | Arm Limited | Operand supply to an execution unit |
US6466333B2 (en) | 1998-06-26 | 2002-10-15 | Canon Kabushiki Kaisha | Streamlined tetrahedral interpolation |
US20020053015A1 (en) | 1998-07-14 | 2002-05-02 | Sony Corporation And Sony Electronics Inc. | Digital signal processor particularly suited for decoding digital audio |
US6327651B1 (en) | 1998-09-08 | 2001-12-04 | International Business Machines Corporation | Wide shifting in the vector permute unit |
US6253287B1 (en) | 1998-09-09 | 2001-06-26 | Advanced Micro Devices, Inc. | Using three-dimensional storage to make variable-length instructions appear uniform in two dimensions |
US6339822B1 (en) | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6671743B1 (en) | 1998-11-13 | 2003-12-30 | Creative Technology, Ltd. | Method and system for exposing proprietary APIs in a privileged device driver to an application |
US6529930B1 (en) | 1998-11-16 | 2003-03-04 | Hitachi America, Ltd. | Methods and apparatus for performing a signed saturation operation |
US6189091B1 (en) | 1998-12-02 | 2001-02-13 | Ip First, L.L.C. | Apparatus and method for speculatively updating global history and restoring same on branch misprediction detection |
US6341348B1 (en) | 1998-12-03 | 2002-01-22 | Sun Microsystems, Inc. | Software branch prediction filtering for a microprocessor |
US6957327B1 (en) | 1998-12-31 | 2005-10-18 | Stmicroelectronics, Inc. | Block-based branch target buffer |
US6477683B1 (en) | 1999-02-05 | 2002-11-05 | Tensilica, Inc. | Automated processor generation system for designing a configurable processor and method for the same |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6499101B1 (en) | 1999-03-18 | 2002-12-24 | I.P. First L.L.C. | Static branch prediction mechanism for conditional branch instructions |
US6427206B1 (en) | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
US6560754B1 (en) | 1999-05-13 | 2003-05-06 | Arc International Plc | Method and apparatus for jump control in a pipelined processor |
US6622240B1 (en) | 1999-06-18 | 2003-09-16 | Intrinsity, Inc. | Method and apparatus for pre-branch instruction |
US6518974B2 (en) | 1999-07-16 | 2003-02-11 | Intel Corporation | Pixel engine |
JP2001034504A (en) | 1999-07-19 | 2001-02-09 | Mitsubishi Electric Corp | Source level debugger |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6546481B1 (en) | 1999-11-05 | 2003-04-08 | Ip - First Llc | Split history tables for branch prediction |
US7072398B2 (en) * | 2000-12-06 | 2006-07-04 | Kai-Kuang Ma | System and method for motion vector generation and analysis of digital video clips |
US6609194B1 (en) | 1999-11-12 | 2003-08-19 | Ip-First, Llc | Apparatus for performing branch target address calculation based on branch type |
US6909744B2 (en) | 1999-12-09 | 2005-06-21 | Redrock Semiconductor, Inc. | Processor architecture for compression and decompression of video and images |
KR100395763B1 (en) | 2000-02-01 | 2003-08-25 | 삼성전자주식회사 | A branch predictor for microprocessor having multiple processes |
US6412038B1 (en) | 2000-02-14 | 2002-06-25 | Intel Corporation | Integral modular cache for a processor |
US6629167B1 (en) | 2000-02-18 | 2003-09-30 | Hewlett-Packard Development Company, L.P. | Pipeline decoupling buffer for handling early data and late data |
US6519696B1 (en) | 2000-03-30 | 2003-02-11 | I.P. First, Llc | Paired register exchange using renaming register map |
US6876703B2 (en) | 2000-05-11 | 2005-04-05 | Ub Video Inc. | Method and apparatus for video coding |
US7079579B2 (en) * | 2000-07-13 | 2006-07-18 | Samsung Electronics Co., Ltd. | Block matching processor and method for block matching motion estimation in video compression |
US6681295B1 (en) | 2000-08-31 | 2004-01-20 | Hewlett-Packard Development Company, L.P. | Fast lane prefetching |
US6718460B1 (en) | 2000-09-05 | 2004-04-06 | Sun Microsystems, Inc. | Mechanism for error handling in a computer system |
US20020065860A1 (en) * | 2000-10-04 | 2002-05-30 | Grisenthwaite Richard Roy | Data processing apparatus and method for saturating data values |
US20030070013A1 (en) | 2000-10-27 | 2003-04-10 | Daniel Hansson | Method and apparatus for reducing power consumption in a digital processor |
US6948054B2 (en) | 2000-11-29 | 2005-09-20 | Lsi Logic Corporation | Simple branch prediction and misprediction recovery method |
KR100386639B1 (en) * | 2000-12-04 | 2003-06-02 | 주식회사 오픈비주얼 | Method for decompression of images and video using regularized dequantizer |
TW477954B (en) | 2000-12-05 | 2002-03-01 | Faraday Tech Corp | Memory data accessing architecture and method for a processor |
US20020073301A1 (en) | 2000-12-07 | 2002-06-13 | International Business Machines Corporation | Hardware for use with compiler generated branch information |
US7139903B2 (en) | 2000-12-19 | 2006-11-21 | Hewlett-Packard Development Company, L.P. | Conflict free parallel read access to a bank interleaved branch predictor in a processor |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US6963554B1 (en) | 2000-12-27 | 2005-11-08 | National Semiconductor Corporation | Microwire dynamic sequencer pipeline stall |
US20020087851A1 (en) | 2000-12-28 | 2002-07-04 | Matsushita Electric Industrial Co., Ltd. | Microprocessor and an instruction converter |
US8285976B2 (en) | 2000-12-28 | 2012-10-09 | Micron Technology, Inc. | Method and apparatus for predicting branches using a meta predictor |
US6925634B2 (en) | 2001-01-24 | 2005-08-02 | Texas Instruments Incorporated | Method for maintaining cache coherency in software in a shared memory system |
US7039901B2 (en) | 2001-01-24 | 2006-05-02 | Texas Instruments Incorporated | Software shared memory bus |
US6823447B2 (en) | 2001-03-01 | 2004-11-23 | International Business Machines Corporation | Software hint to improve the branch target prediction accuracy |
AU2002238325A1 (en) | 2001-03-02 | 2002-09-19 | Atsana Semiconductor Corp. | Data processing apparatus and system and method for controlling memory access |
JP3890910B2 (en) | 2001-03-21 | 2007-03-07 | 株式会社日立製作所 | Instruction execution result prediction device |
US7010558B2 (en) | 2001-04-19 | 2006-03-07 | Arc International | Data processor with enhanced instruction execution and method |
US20020194461A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Speculative branch target address cache |
US6886093B2 (en) | 2001-05-04 | 2005-04-26 | Ip-First, Llc | Speculative hybrid branch direction predictor |
US20020194462A1 (en) | 2001-05-04 | 2002-12-19 | Ip First Llc | Apparatus and method for selecting one of multiple target addresses stored in a speculative branch target address cache per instruction cache line |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US7165168B2 (en) | 2003-01-14 | 2007-01-16 | Ip-First, Llc | Microprocessor with branch target address cache update queue |
US7165169B2 (en) | 2001-05-04 | 2007-01-16 | Ip-First, Llc | Speculative branch target address cache with selective override by secondary predictor based on branch instruction type |
GB0112275D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for normalization of floating point significands in a simd array mpp |
GB0112269D0 (en) | 2001-05-21 | 2001-07-11 | Micron Technology Inc | Method and circuit for alignment of floating point significands in a simd array mpp |
JP3805339B2 (en) | 2001-06-29 | 2006-08-02 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Method for predicting branch target, processor, and compiler |
US7162619B2 (en) | 2001-07-03 | 2007-01-09 | Ip-First, Llc | Apparatus and method for densely packing a branch instruction predicted by a branch target address cache and associated target instructions into a byte-wide instruction buffer |
US6823444B1 (en) | 2001-07-03 | 2004-11-23 | Ip-First, Llc | Apparatus and method for selectively accessing disparate instruction buffer stages based on branch target address cache hit and instruction stage wrap |
JP4145586B2 (en) * | 2001-07-24 | 2008-09-03 | セイコーエプソン株式会社 | Image processing apparatus, image processing program, and image processing method |
US7010675B2 (en) | 2001-07-27 | 2006-03-07 | Stmicroelectronics, Inc. | Fetch branch architecture for reducing branch penalty without branch prediction |
US7191445B2 (en) | 2001-08-31 | 2007-03-13 | Texas Instruments Incorporated | Method using embedded real-time analysis components with corresponding real-time operating system software objects |
JP2003131902A (en) | 2001-10-24 | 2003-05-09 | Toshiba Corp | Software debugger, system-level debugger, debug method and debug program |
US7272622B2 (en) | 2001-10-29 | 2007-09-18 | Intel Corporation | Method and apparatus for parallel shift right merge of data |
US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction |
US20040054877A1 (en) | 2001-10-29 | 2004-03-18 | Macy William W. | Method and apparatus for shuffling data |
US7051239B2 (en) | 2001-12-28 | 2006-05-23 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip |
US20030225998A1 (en) | 2002-01-31 | 2003-12-04 | Khan Mohammed Noshad | Configurable data processor with multi-length instruction set architecture |
US7168067B2 (en) | 2002-02-08 | 2007-01-23 | Agere Systems Inc. | Multiprocessor system with cache-based software breakpoints |
US7181596B2 (en) | 2002-02-12 | 2007-02-20 | Ip-First, Llc | Apparatus and method for extending a microprocessor instruction set |
US7529912B2 (en) | 2002-02-12 | 2009-05-05 | Via Technologies, Inc. | Apparatus and method for instruction-level specification of floating point format |
US7328328B2 (en) | 2002-02-19 | 2008-02-05 | Ip-First, Llc | Non-temporal memory reference control mechanism |
US7315921B2 (en) | 2002-02-19 | 2008-01-01 | Ip-First, Llc | Apparatus and method for selective memory attribute control |
US7546446B2 (en) | 2002-03-08 | 2009-06-09 | Ip-First, Llc | Selective interrupt suppression |
US7395412B2 (en) | 2002-03-08 | 2008-07-01 | Ip-First, Llc | Apparatus and method for extending data modes in a microprocessor |
US7180943B1 (en) * | 2002-03-26 | 2007-02-20 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Compression of a data stream by selection among a set of compression tools |
US7155598B2 (en) | 2002-04-02 | 2006-12-26 | Ip-First, Llc | Apparatus and method for conditional instruction execution |
US7185180B2 (en) | 2002-04-02 | 2007-02-27 | Ip-First, Llc | Apparatus and method for selective control of condition code write back |
US7302551B2 (en) | 2002-04-02 | 2007-11-27 | Ip-First, Llc | Suppression of store checking |
US7373483B2 (en) | 2002-04-02 | 2008-05-13 | Ip-First, Llc | Mechanism for extending the number of registers in a microprocessor |
US7380103B2 (en) | 2002-04-02 | 2008-05-27 | Ip-First, Llc | Apparatus and method for selective control of results write back |
US20030198295A1 (en) * | 2002-04-12 | 2003-10-23 | Liang-Gee Chen | Global elimination algorithm for motion estimation and the hardware architecture thereof |
US7380109B2 (en) | 2002-04-15 | 2008-05-27 | Ip-First, Llc | Apparatus and method for providing extended address modes in an existing instruction set for a microprocessor |
US20030204705A1 (en) | 2002-04-30 | 2003-10-30 | Oldfield William H. | Prediction of branch instructions in a data processing apparatus |
KR100450753B1 (en) | 2002-05-17 | 2004-10-01 | 한국전자통신연구원 | Programmable variable length decoder including interface of CPU processor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US6718504B1 (en) | 2002-06-05 | 2004-04-06 | Arc International | Method and apparatus for implementing a data processor adapted for turbo decoding |
US7493480B2 (en) | 2002-07-18 | 2009-02-17 | International Business Machines Corporation | Method and apparatus for prefetching branch history information |
US7392368B2 (en) * | 2002-08-09 | 2008-06-24 | Marvell International Ltd. | Cross multiply and add instruction and multiply and subtract instruction SIMD execution on real and imaginary components of a plurality of complex data elements |
US7000095B2 (en) | 2002-09-06 | 2006-02-14 | Mips Technologies, Inc. | Method and apparatus for clearing hazards using jump instructions |
WO2004030369A1 (en) * | 2002-09-27 | 2004-04-08 | Videosoft, Inc. | Real-time video coding/decoding |
US20050125634A1 (en) | 2002-10-04 | 2005-06-09 | Fujitsu Limited | Processor and instruction control method |
US6968444B1 (en) | 2002-11-04 | 2005-11-22 | Advanced Micro Devices, Inc. | Microprocessor employing a fixed position dispatch unit |
US8667252B2 (en) * | 2002-11-21 | 2014-03-04 | Stmicroelectronics, Inc. | Method and apparatus to adapt the clock rate of a programmable coprocessor for optimal performance and power dissipation |
US7227901B2 (en) | 2002-11-21 | 2007-06-05 | Ub Video Inc. | Low-complexity deblocking filter |
US7266676B2 (en) | 2003-03-21 | 2007-09-04 | Analog Devices, Inc. | Method and apparatus for branch prediction based on branch targets utilizing tag and data arrays |
US6774832B1 (en) | 2003-03-25 | 2004-08-10 | Raytheon Company | Multi-bit output DDS with real time delta sigma modulation look up from memory |
US7174444B2 (en) | 2003-03-31 | 2007-02-06 | Intel Corporation | Preventing a read of a next sequential chunk in branch prediction of a subject chunk |
US20040193855A1 (en) | 2003-03-31 | 2004-09-30 | Nicolas Kacevas | System and method for branch prediction access |
US7590829B2 (en) | 2003-03-31 | 2009-09-15 | Stretch, Inc. | Extension adapter |
US20040225870A1 (en) | 2003-05-07 | 2004-11-11 | Srinivasan Srikanth T. | Method and apparatus for reducing wrong path execution in a speculative multi-threaded processor |
US7010676B2 (en) | 2003-05-12 | 2006-03-07 | International Business Machines Corporation | Last iteration loop branch prediction upon counter threshold and resolution upon counter one |
US20040252766A1 (en) * | 2003-06-11 | 2004-12-16 | Daeyang Foundation (Sejong University) | Motion vector search method and apparatus |
US20040255104A1 (en) | 2003-06-12 | 2004-12-16 | Intel Corporation | Method and apparatus for recycling candidate branch outcomes after a wrong-path execution in a superscalar processor |
US7668897B2 (en) | 2003-06-16 | 2010-02-23 | Arm Limited | Result partitioning within SIMD data processing systems |
US7539714B2 (en) * | 2003-06-30 | 2009-05-26 | Intel Corporation | Method, apparatus, and instruction for performing a sign operation that multiplies |
US7783871B2 (en) | 2003-06-30 | 2010-08-24 | Intel Corporation | Method to remove stale branch predictions for an instruction prior to execution within a microprocessor |
US7424501B2 (en) | 2003-06-30 | 2008-09-09 | Intel Corporation | Nonlinear filtering and deblocking applications utilizing SIMD sign and absolute value operations |
US7373642B2 (en) | 2003-07-29 | 2008-05-13 | Stretch, Inc. | Defining instruction extensions in a standard programming language |
US20050027974A1 (en) | 2003-07-31 | 2005-02-03 | Oded Lempel | Method and system for conserving resources in an instruction pipeline |
US20050024486A1 (en) | 2003-07-31 | 2005-02-03 | Viresh Ratnakar | Video codec system with real-time complexity adaptation |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
JP2005078234A (en) | 2003-08-29 | 2005-03-24 | Renesas Technology Corp | Information processor |
US7237098B2 (en) | 2003-09-08 | 2007-06-26 | Ip-First, Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US20050066305A1 (en) | 2003-09-22 | 2005-03-24 | Lisanke Robert John | Method and machine for efficient simulation of digital hardware within a software development environment |
US7277592B1 (en) * | 2003-10-21 | 2007-10-02 | Redrock Semiconductory Ltd. | Spacial deblocking method using limited edge differences only to linearly correct blocking artifact |
US7457362B2 (en) | 2003-10-24 | 2008-11-25 | Texas Instruments Incorporated | Loop deblock filtering of block coded video in a very long instruction word processor |
KR100980076B1 (en) | 2003-10-24 | 2010-09-06 | 삼성전자주식회사 | System and method for branch prediction with low-power consumption |
US7363544B2 (en) | 2003-10-30 | 2008-04-22 | International Business Machines Corporation | Program debug method and apparatus |
US7219207B2 (en) | 2003-12-03 | 2007-05-15 | Intel Corporation | Reconfigurable trace cache |
US8069336B2 (en) | 2003-12-03 | 2011-11-29 | Globalfoundries Inc. | Transitioning from instruction cache to trace cache on label boundaries |
US7401328B2 (en) | 2003-12-18 | 2008-07-15 | Lsi Corporation | Software-implemented grouping techniques for use in a superscalar data processing system |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US8607209B2 (en) | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
US7613911B2 (en) | 2004-03-12 | 2009-11-03 | Arm Limited | Prefetching exception vectors by early lookup exception vectors within a cache memory |
US20050216713A1 (en) | 2004-03-25 | 2005-09-29 | International Business Machines Corporation | Instruction text controlled selectively stated branches for prediction via a branch target buffer |
US7281120B2 (en) | 2004-03-26 | 2007-10-09 | International Business Machines Corporation | Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor |
US20050223202A1 (en) | 2004-03-31 | 2005-10-06 | Intel Corporation | Branch prediction in a pipelined processor |
US20050289323A1 (en) | 2004-05-19 | 2005-12-29 | Kar-Lik Wong | Barrel shifter for a microprocessor |
US20060015706A1 (en) | 2004-06-30 | 2006-01-19 | Chunrong Lai | TLB correlated branch predictor and method for use thereof |
TWI253024B (en) * | 2004-07-20 | 2006-04-11 | Realtek Semiconductor Corp | Method and apparatus for block matching |
TWI305323B (en) | 2004-08-23 | 2009-01-11 | Faraday Tech Corp | Method for verification branch prediction mechanisms and readable recording medium for storing program thereof |
US20060095713A1 (en) * | 2004-11-03 | 2006-05-04 | Stexar Corporation | Clip-and-pack instruction for processor |
WO2006096612A2 (en) | 2005-03-04 | 2006-09-14 | The Trustees Of Columbia University In The City Of New York | System and method for motion estimation and mode decision for low-complexity h.264 decoder |
US8879636B2 (en) | 2007-05-25 | 2014-11-04 | Synopsys, Inc. | Adaptive video encoding apparatus and methods |
- 2006
- 2006-09-28 US US11/528,470 patent/US20070073925A1/en not_active Abandoned
- 2006-09-28 US US11/528,327 patent/US7747088B2/en active Active
- 2006-09-28 US US11/528,338 patent/US7971042B2/en active Active
- 2006-09-28 US US11/528,325 patent/US8212823B2/en active Active
- 2006-09-28 US US11/528,434 patent/US20070074004A1/en not_active Abandoned
- 2006-09-28 US US11/528,432 patent/US8218635B2/en active Active
- 2006-09-28 WO PCT/IB2006/003358 patent/WO2007049150A2/en active Application Filing
- 2006-09-28 US US11/528,326 patent/US20070074007A1/en not_active Abandoned
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884057A (en) * | 1994-01-11 | 1999-03-16 | Exponential Technology, Inc. | Temporal re-alignment of a floating point pipeline to an integer pipeline for emulation of a load-operate architecture on a load/store processor |
US5923892A (en) * | 1997-10-27 | 1999-07-13 | Levy; Paul S. | Host processor and coprocessor arrangement for processing platform-independent code |
US6757019B1 (en) * | 1999-03-13 | 2004-06-29 | The Board Of Trustees Of The Leland Stanford Junior University | Low-power parallel processor and imager having peripheral control circuitry |
US6865663B2 (en) * | 2000-02-24 | 2005-03-08 | Pts Corporation | Control processor dynamically loading shadow instruction register associated with memory entry of coprocessor in flexible coupling mode |
US6950929B2 (en) * | 2001-05-24 | 2005-09-27 | Samsung Electronics Co., Ltd. | Loop instruction processing using loop buffer in a data processing device having a coprocessor |
US7079147B2 (en) * | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US20060047934A1 (en) * | 2004-08-31 | 2006-03-02 | Schmisseur Mark A | Integrated circuit capable of memory access control |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070073925A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US20070074007A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Parameterizable clip instruction and method of performing a clip operation using the same |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7747088B2 (en) | 2005-09-28 | 2010-06-29 | Arc International (Uk) Limited | System and methods for performing deblocking in microprocessor-based video codec applications |
US20070071101A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systolic-array based systems and methods for performing block matching in motion compensation |
US20070070080A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US20070074012A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for recording instruction sequences in a microprocessor having a dynamically decoupleable extended instruction pipeline |
US20070073925A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for synchronizing multiple processing engines of a microprocessor |
US20070071106A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Systems and methods for performing deblocking in microprocessor-based video codec applications |
US20070074007A1 (en) * | 2005-09-28 | 2007-03-29 | Arc International (Uk) Limited | Parameterizable clip instruction and method of performing a clip operation using the same |
US8218635B2 (en) | 2005-09-28 | 2012-07-10 | Synopsys, Inc. | Systolic-array based systems and methods for performing block matching in motion compensation |
US8212823B2 (en) | 2005-09-28 | 2012-07-03 | Synopsys, Inc. | Systems and methods for accelerating sub-pixel interpolation in video processing applications |
US7971042B2 (en) | 2005-09-28 | 2011-06-28 | Synopsys, Inc. | Microprocessor system and method for instruction-initiated recording and execution of instruction sequences in a dynamically decoupleable extended instruction pipeline |
US8634470B2 (en) * | 2007-07-24 | 2014-01-21 | Samsung Electronics Co., Ltd. | Multimedia decoding method and multimedia decoding apparatus based on multi-core processor |
US20090049281A1 (en) * | 2007-07-24 | 2009-02-19 | Samsung Electronics Co., Ltd. | Multimedia decoding method and multimedia decoding apparatus based on multi-core processor |
US20090217266A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Streaming attachment of hardware accelerators to computer systems |
US20090213127A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Guided attachment of accelerators to computer systems |
US20090217275A1 (en) * | 2008-02-22 | 2009-08-27 | International Business Machines Corporation | Pipelining hardware accelerators to computer systems |
US8250578B2 (en) | 2008-02-22 | 2012-08-21 | International Business Machines Corporation | Pipelining hardware accelerators to computer systems |
US8726289B2 (en) | 2008-02-22 | 2014-05-13 | International Business Machines Corporation | Streaming attachment of hardware accelerators to computer systems |
US7953912B2 (en) | 2008-02-22 | 2011-05-31 | International Business Machines Corporation | Guided attachment of accelerators to computer systems |
US11736700B2 (en) | 2013-06-03 | 2023-08-22 | Texas Instruments Incorporated | Multi-threading in a video hardware engine |
US20140355691A1 (en) * | 2013-06-03 | 2014-12-04 | Texas Instruments Incorporated | Multi-threading in a video hardware engine |
US11228769B2 (en) * | 2013-06-03 | 2022-01-18 | Texas Instruments Incorporated | Multi-threading in a video hardware engine |
US10296340B2 (en) | 2014-03-13 | 2019-05-21 | Arm Limited | Data processing apparatus for executing an access instruction for N threads |
US11042502B2 (en) * | 2014-12-24 | 2021-06-22 | Samsung Electronics Co., Ltd. | Vector processing core shared by a plurality of scalar processing cores for scheduling and executing vector instructions |
Also Published As
Publication number | Publication date |
---|---|
US8218635B2 (en) | 2012-07-10 |
US8212823B2 (en) | 2012-07-03 |
US20070071106A1 (en) | 2007-03-29 |
US20070070080A1 (en) | 2007-03-29 |
US7971042B2 (en) | 2011-06-28 |
US20070073925A1 (en) | 2007-03-29 |
WO2007049150A3 (en) | 2007-12-27 |
US20070074012A1 (en) | 2007-03-29 |
US20070071101A1 (en) | 2007-03-29 |
US20070074007A1 (en) | 2007-03-29 |
US7747088B2 (en) | 2010-06-29 |
WO2007049150A2 (en) | 2007-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20070074004A1 (en) | | Systems and methods for selectively decoupling a parallel extended instruction pipeline |
US7451296B2 (en) | | Method and apparatus for pausing execution in a processor or the like |
TWI308295B (en) | | Apparatus and method for switchable conditional execution in a vliw processor |
US6205543B1 (en) | | Efficient handling of a large register file for context switching |
US7366874B2 (en) | | Apparatus and method for dispatching very long instruction word having variable length |
US7395410B2 (en) | | Processor system with an improved instruction decode control unit that controls data transfer between processor and coprocessor |
US20050223253A1 (en) | | Methods and apparatus for power control in a scalable array of processor elements |
US8589664B2 (en) | | Program flow control |
US20080141013A1 (en) | | Digital processor with control means for the execution of nested loops |
US20210294639A1 (en) | | Entering protected pipeline mode without annulling pending instructions |
US20240036876A1 (en) | | Pipeline protection for cpus with save and restore of intermediate results |
US20210326136A1 (en) | | Entering protected pipeline mode with clearing |
US20050149931A1 (en) | | Multithread processor architecture for triggered thread switching without any cycle time loss, and without any switching program command |
US20020087844A1 (en) | | Apparatus and method for concealing switch latency |
US20080065870A1 (en) | | Information processing apparatus |
US6275924B1 (en) | | System for buffering instructions in a processor by reissuing instruction fetches during decoder stall time |
US7302555B2 (en) | | Zero overhead branching and looping in time stationary processors |
WO2009026221A2 (en) | | Stall-free pipelined cache for statically scheduled and dispatched execution |
KR100861073B1 (en) | | Parallel processing processor architecture adapting adaptive pipeline |
WO2002046917A1 (en) | | Digital signal processing apparatus |
JPH05197547A (en) | | Vliw type arithmetic processor |
US6697933B1 (en) | | Method and apparatus for fast, speculative floating point register renaming |
US20090292908A1 (en) | | Method and arrangements for multipath instruction processing |
JPH03163627A (en) | | Instruction processor |
JPH04184532A (en) | | Computer system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: ARC INTERNATIONAL (UK) LIMITED, UNITED KINGDOM. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: WONG, KAR-LIK; GRAHAM, CARL NORMAN; LIM, SEOW CHUAN; AND OTHERS; REEL/FRAME: 018357/0561; SIGNING DATES FROM 20060926 TO 20060927 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |