US20050177703A1 - Thread ID in a multithreaded processor - Google Patents
- Publication number
- US20050177703A1 (application US10/774,226)
- Authority
- US
- United States
- Prior art keywords
- thread
- pipeline
- stage
- trap
- multithreaded processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
Definitions
- the present invention relates to microprocessor systems, and more particularly to thread identification in a multithreaded processor.
- Modern computer systems utilize a variety of different microprocessor architectures to perform program execution. Each microprocessor architecture is configured to execute programs made up of a number of macro instructions and micro instructions. Many macro instructions are translated or decoded into a sequence of micro instructions before processing. Micro instructions are simple machine instructions that can be executed directly by a microprocessor.
- each pipeline consists of multiple stages. Each stage in a pipeline operates in parallel with the other stages. However, each stage operates on a different macro or micro instruction.
- Pipelines are usually synchronous with respect to the system clock signal. Therefore, each pipeline stage is designed to perform its function in a single clock cycle. Thus, the instructions move through the pipeline with each active clock edge of a clock signal.
- Some microprocessors use asynchronous pipelines. Rather than a clock signal, handshaking signals are used between pipeline stages to indicate when the various stages are ready to accept new instructions.
- the present invention can be used with microprocessors using either (or both) synchronous or asynchronous pipelines.
- FIG. 1 shows an instruction fetch and issue unit, having an instruction fetch stage (I stage) 105 and a pre-decode stage (PD stage) 110 , coupled via an instruction buffer 115 to a typical four stage integer pipeline 120 for a microprocessor.
- Integer pipeline 120 comprises a decode stage (D stage) 130 , an execute one stage (E 1 stage) 140 , an execute two stage (E 2 stage) 150 , and a write back stage (W stage) 160 .
- Instruction fetch stage 105 fetches instructions to be processed.
- Pre-decode stage 110 predecodes instructions and stores them into the instruction buffer. It also groups instructions so that they can be issued in the next stage to one or more pipelines. Ideally, instructions are issued into integer pipeline 120 every clock cycle. Each instruction passes through the pipeline and is processed by each stage as necessary. Thus, during ideal operating conditions, integer pipeline 120 is simultaneously processing four instructions. However, many conditions, as explained below, may prevent the ideal operation of integer pipeline 120 .
- FIG. 2 shows a typical four stage load/store pipeline 200 for a microprocessor coupled to a memory system 270 , instruction fetch stage 105 and pre-decode stage 110 .
- Load/store pipeline 200 includes a decode stage (D stage) 230 , an execute one stage (E 1 stage) 240 , an execute two stage (E 2 stage) 250 , and a write back stage (W stage) 260 .
- memory system 270 includes a data cache 274 and main memory 278 .
- Other embodiments of memory system 270 may be configured as scratch pad memory using SRAMs. Because memory systems, data caches, and scratch pad memories, are well known in the art, the function and performance of memory system 270 is not described in detail.
- Load/store pipeline 200 is specifically tailored to perform load and store instructions.
- Decode stage 230 decodes the instruction and reads the register file (not shown) for the needed information regarding the instruction.
- Execute one stage 240 calculates memory addresses for the load or store instructions. Because the address is calculated in execute one stage 240 and load instructions only provide the address, execute one stage 240 configures memory system 270 to provide the appropriate data at the next active clock cycle for loads from memory. However, for store instructions, the data to be stored is typically not available at execute one stage 240 .
- execute two stage 250 retrieves information from the appropriate location in memory system 270 .
- execute two stage 250 prepares to write the data to the appropriate location.
- execute two stage 250 configures memory system 270 to store the data on the next active clock edge.
- write back stage 260 writes the appropriate value into a register file.
- pipelining can increase overall throughput in a processor
- pipelining also introduces data dependency issues between instructions in the pipeline. For example, if instruction “LD D 0 , [A 0 ]”, which means to load data register D 0 with the value at memory address A 0 , is followed by “MUL D 2 , D 0 , D 1 ”, which means to multiply the value in data register D 0 by the value in data register D 1 and store the result into data register D 2 , then “MUL D 2 , D 0 , D 1 ” cannot be executed until after “LD D 0 , [A 0 ]” is complete. Otherwise, “MUL D 2 , D 0 , D 1 ” may use an outdated value in data register D 0 .
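
The dependency between the two example instructions can be illustrated with a minimal sketch. The `Instr` record and the `has_raw_hazard` helper are hypothetical names introduced only for this illustration; they are not taken from the patent, and a real decoder tracks far more state.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction: one destination, some source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def has_raw_hazard(older: Instr, younger: Instr) -> bool:
    """True when the younger instruction reads a register the older one writes."""
    return older.dest is not None and older.dest in younger.srcs


# "LD D0, [A0]" followed by "MUL D2, D0, D1" from the text above.
ld = Instr(op="LD", dest="D0", srcs=("A0",))
mul = Instr(op="MUL", dest="D2", srcs=("D0", "D1"))

if has_raw_hazard(ld, mul):
    print("MUL must wait for (or be forwarded) the value loaded into D0")
```

In this sketch the check fires for the LD/MUL pair because MUL reads D0, the register LD writes.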
- integer pipeline 120 and load/store pipeline 200 can execute instructions every clock cycle. However, many situations may occur that cause parts of integer pipeline 120 or load/store pipeline 200 to stall, which degrades the performance of the microprocessor.
- a common problem which causes pipeline stalls is latency in memory system 270 caused by cache misses. For example, a load instruction “LD D 0 , [A 0 ]” loads data from address A 0 of memory system 270 into data register D 0 . If the value for address A 0 is in a data cache 274 , the value in data register D 0 can be simply replaced by the data value for address A 0 in data cache 274 .
- memory system 270 may cause load/store pipeline 200 to stall as the cache miss causes a refill operation. Furthermore, if the cache has no empty set and the previous cache data are dirty, the refill operation would need to be preceded by a write back operation.
- multithreaded processors can switch from a current thread to a second thread that can use the processor cycles that would have been wasted in single threaded processors.
- the processor holds the state of several active threads, which can be executed independently. When one of the threads becomes blocked, for example due to a cache miss, another thread can be executed so that processor cycles are not wasted.
- thread switching may also be caused by timer interrupts and progress-monitoring software in a real-time kernel. Because the processor does not have to waste cycles waiting on a blocked thread, overall performance of the processor is increased. However, different threads generally operate on different register contexts. Thus, data forwarding between threads should be avoided.
- Traps are generally caused by error conditions, which lead to a redirection of the program flow to execute a trap handler.
- the error conditions can occur in different pipeline stages and need to be prioritized in case of simultaneous occurrences.
- Synchronous traps need to be synchronous to the instruction flow, which means the instruction that caused the trap is directly followed by the trap handler in the program execution.
- Asynchronous traps usually get handled some cycles after the trap is detected.
- a trap handler needs to be able to correlate a trap to the thread that caused the trap.
- most conventional processors using data forwarding or supporting synchronous traps do not allow multiple threads to coexist in the same pipeline. In these processors, processing cycles are wasted during a thread switch to allow the pipelines to empty the current thread before switching to the new thread.
- Other conventional processors allow multiple threads to coexist in the pipeline but do not support data forwarding and synchronous traps.
- program tracing becomes complicated due to thread switching.
- Conventional embedded processors incorporate program trace output for debugging and development purposes.
- a program trace is a list of entries that tracks the actual instructions issued by the instruction fetch and issue unit with the program counter at the time each instruction is issued.
- a list of program instructions without correlation to the actual threads owning the instruction would be useless for debugging.
- a multithreaded processor in accordance with the present invention includes a thread ID for each instruction or operand in a pipeline stage. Data forwarding is only performed between pipeline stages having the same thread ID. Furthermore, the relationship between threads and traps is easily maintained because the thread ID for each instruction that can cause the trap is available at each pipeline stage. Furthermore, the thread ID is incorporated into the program trace so that the relationship between instructions and threads can be determined.
- a multithreaded processor includes an instruction fetch and issue unit and a pipeline.
- the instruction fetch and issue unit includes an instruction fetch stage configured to fetch one or more sets of fetched bits representing one or more instructions and an instruction buffer to store the sets of fetched bits.
- the instruction buffer stores an associated thread ID for each set of fetched bits.
- the pipeline is coupled to the instruction fetch and issue unit and configured to receive a set of fetched bits and the associated thread ID.
- Each pipeline stage of the pipeline has a thread ID memory to store a thread ID associated with the instruction or operand within the pipeline stage.
- the multithreaded processor can also include a data forwarding unit for forwarding data between a first pipeline stage having a first thread ID and a second pipeline stage having a second thread ID.
- When the first thread ID is equal to the second thread ID, data forwarding is allowed.
- data forwarding is prevented when the first thread ID does not match the second thread ID.
- Some embodiments of the present invention also include a trap handler, which prevents trap resolution of a trap when the active thread is not the same as the thread that generated the trap. When the thread that generated the trap becomes the active thread, the trap handler resolves the trap.
- a trace generation unit detects issuance of instructions and generates program trace entries that include the thread IDs of the thread containing the instructions.
- the trace generation unit detects thread switches and generates a thread switch marker for the program trace.
- the thread switch marker can contain the thread ID of threads involved in the thread switch.
- FIG. 1 is a simplified diagram of a conventional integer pipeline.
- FIG. 2 is a simplified diagram of a conventional load/store pipeline.
- FIG. 3 is a simplified block diagram of a pipeline with a data forwarding unit in accordance with one embodiment of the present invention.
- FIG. 4 is a simplified block diagram of a trace unit with a pipeline in accordance with one embodiment of the present invention.
- processors in accordance with the present invention attach a thread ID to instructions and operands in pipelines.
- the thread ID identifies the thread to which the operand or instruction belongs. Data forwarding is prevented between pipeline stages when the thread IDs of the instructions or operands in the pipeline stages do not match.
- the thread IDs also allow traps to be correlated with the thread from which the trap was generated. Furthermore, the thread ID can be incorporated into a program trace to aid in debugging and development.
- FIG. 3 is a simplified block diagram of an instruction fetch and issue unit 300 with a pipeline 330 .
- pipeline 330 can represent both load/store pipelines and execution pipelines.
- most embodiments of the present invention would include additional pipelines coupled to instruction fetch and issue unit 300 .
- Instruction fetch and issue unit 300 includes an instruction fetch stage (I stage) 305 , a pre-decode stage (PD stage) 310 , and an instruction buffer 320 .
- Instruction fetch stage 305 fetches instructions to be processed.
- instruction fetch stage 305 fetches a set of bits having a size equal to the data width of the memory system of the processor.
- instruction fetch stage 305 would read a set of 64 bits.
- Pre-decode stage 310 performs predecoding, such as generating operand pointers and grouping instructions, on each set of fetched bits.
- the set of fetched bits is stored in a row of the instruction buffer 320 .
- Each set of fetched bits is also given a thread ID corresponding to the thread that owns the particular set of fetched bits.
- a set of bits stored in row 320 _ 0 has a corresponding thread ID stored with the set at thread ID memory 320 _ 0 _TID.
- the set of fetched bits and the corresponding thread ID are issued to a pipeline, such as pipeline 330 .
- the set of fetched bits has been predecoded to include operand pointers, instructions, and predecode information.
- the thread ID associated with the set of fetched bits attaches to the instructions and operands of the set of fetched bits.
- each stage of pipeline 330 includes a thread ID memory to store the thread ID associated with the operand or instruction in the pipeline stage.
- decode stage 340 includes a thread ID memory TID_ 340 .
- execute one stage 350 , execute two stage 360 , and write back stage 370 include thread ID memories TID_ 350 , TID_ 360 , and TID_ 370 , respectively.
- thread ID memories may be a single bit, which is sufficient to distinguish two threads.
- when more threads must be distinguished, the thread ID memories would be larger than a single bit.
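
As a rough sizing sketch, the width of a thread ID memory can be related to the number of active threads. This assumes a plain binary encoding of thread IDs, which the text does not specify; `thread_id_bits` is a hypothetical helper.

```python
def thread_id_bits(max_active_threads: int) -> int:
    """Bits needed to give each active thread a distinct binary thread ID.

    Assumes a plain binary encoding; at least one bit is always reserved.
    """
    if max_active_threads < 1:
        raise ValueError("at least one active thread is required")
    return max(1, (max_active_threads - 1).bit_length())


for n in (2, 3, 4, 8):
    print(n, "threads ->", thread_id_bits(n), "bit(s)")
```

Under this assumption two threads fit in a single bit, while three or four threads need two bits.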
- pipeline 330 also includes a first thread register file 392 and a second thread register file 394 , a data forwarding unit 380 , and a trap handler 395 .
- For clarity, only two thread register files are shown. In general, the number of thread register files is equal to the maximum number of active threads supported by the processor. Each active thread has a corresponding thread register file. The instructions and operands of a thread make use of only the corresponding thread register file.
- data forwarding unit 380 includes a thread ID comparator 383 as well as an operand comparator 386 .
- Operand comparator 386 determines when an operand in a later stage may be needed in an earlier stage. However, the operand is forwarded only when the thread ID associated with the operand in the later stage is equal to the thread ID associated with the operand in the earlier stage.
- a pipeline stall is avoided by forwarding the value from memory address A 0 obtained in execute two stage 360 to execute one stage 350 , which needs the data from data register D 0 to process “MUL D 2 , D 0 , D 1 ”.
- if instruction “LD D 0 , [A 0 ]” is associated with a different thread than instruction “MUL D 2 , D 0 , D 1 ”, then data forwarding should not occur.
- data forwarding unit 380 uses thread ID comparator 383 to compare thread ID TID_ 360 with thread ID TID_ 350 . If thread ID TID_ 350 matches thread ID TID_ 360 then data forwarding is allowed. Otherwise, data forwarding is prevented.
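
The forwarding decision described for data forwarding unit 380 can be sketched as the conjunction of two checks. The `StageSlot` record and the function names below are hypothetical and only mirror the roles of operand comparator 386 and thread ID comparator 383; the actual hardware is not described at this level of detail.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StageSlot:
    """Hypothetical contents of one pipeline stage, including its thread ID memory."""
    tid: int
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


def operands_match(later: StageSlot, earlier: StageSlot) -> bool:
    """Role of the operand comparator: the earlier stage needs the later stage's result."""
    return later.dest is not None and later.dest in earlier.srcs


def thread_ids_match(later: StageSlot, earlier: StageSlot) -> bool:
    """Role of the thread ID comparator: both stages hold the same thread."""
    return later.tid == earlier.tid


def may_forward(later: StageSlot, earlier: StageSlot) -> bool:
    """Forward only when the operand is needed and the thread IDs are equal."""
    return operands_match(later, earlier) and thread_ids_match(later, earlier)


e2 = StageSlot(tid=0, dest="D0")                    # "LD D0, [A0]" in execute two
e1_same = StageSlot(tid=0, srcs=("D0", "D1"))       # "MUL D2, D0, D1", same thread
e1_other = StageSlot(tid=1, srcs=("D0", "D1"))      # same registers, other thread

print(may_forward(e2, e1_same), may_forward(e2, e1_other))
```

The second case shows why the register name alone is not enough: both threads name D0, but each uses its own thread register file.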
- processors in accordance with the present invention can support multiple threads in a single pipeline with data forwarding when appropriate. Therefore, processors in accordance with the present invention achieve higher performance than conventional processors.
- trap handler 395 includes a trap thread register 396 , which stores the thread ID of the active thread when a trap is detected. Trap handler 395 resolves traps only if the thread ID of the active thread matches the thread ID of the thread that generated the trap, which is stored in trap thread register 396 . Thus, if a thread switch occurs after detection of a trap but before the trap can be resolved, trap handler 395 would delay handling of the trap until the thread that generated the trap is the active thread.
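
The deferred trap resolution can be sketched as a small state machine. The class below is a hypothetical software model; the names `detect`, `step`, and the single pending-trap slot are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Optional, Tuple


class TrapHandlerModel:
    """Hypothetical model of a trap handler with a trap thread register."""

    def __init__(self) -> None:
        self.trap_thread_register: Optional[int] = None
        self.pending_trap: Optional[str] = None
        self.resolved: List[Tuple[str, int]] = []

    def detect(self, trap: str, active_tid: int) -> None:
        """Record the trap and the thread ID that was active when it was detected."""
        self.pending_trap = trap
        self.trap_thread_register = active_tid

    def step(self, active_tid: int) -> bool:
        """Resolve the pending trap only if its thread is the active thread."""
        if self.pending_trap is None:
            return False
        if active_tid != self.trap_thread_register:
            return False  # a thread switch happened; delay resolution
        self.resolved.append((self.pending_trap, active_tid))
        self.pending_trap = None
        self.trap_thread_register = None
        return True


handler = TrapHandlerModel()
handler.detect("example_trap", active_tid=0)
print(handler.step(active_tid=1))  # thread 1 is active: trap stays pending
print(handler.step(active_tid=0))  # thread 0 is active again: trap is resolved
```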
- program tracing can include thread information.
- Conventional embedded processors incorporate program trace output for debugging and development purposes.
- a program trace is a list of entries that tracks the actual instructions issued by the instruction fetch and issue unit with the program counter at the time each instruction is issued.
- the program trace is compressed.
- One common compression technique for program traces is to tokenize the number of instructions issued, along with a notification of any program flow change, such as branches, jumps, or calls.
- a synchronization operation is performed that inserts into the program trace a series of tokens that represent the number of instructions issued since the last synchronization operation, a synchronization token, and the current program counter.
- a software debugger uses the tokens of the program trace to determine the behavior of the processor during program execution.
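
The tokenizing scheme can be sketched as follows. The token names (`COUNT`, `FLOW_CHANGE`, `SYNC`, `PC`) and the choice to flush the instruction count at every flow change are assumptions made for this illustration; the text does not define a token format.

```python
from typing import List, Tuple


class TraceCompressorModel:
    """Hypothetical tokenizer: counts issued instructions between trace events."""

    def __init__(self) -> None:
        self.tokens: List[Tuple] = []
        self.count = 0  # instructions issued since the last emitted token

    def _flush_count(self) -> None:
        self.tokens.append(("COUNT", self.count))
        self.count = 0

    def issue(self, flow_change: bool = False) -> None:
        """Record one issued instruction; note branches, jumps, or calls."""
        self.count += 1
        if flow_change:
            self._flush_count()
            self.tokens.append(("FLOW_CHANGE",))

    def synchronize(self, pc: int) -> None:
        """Emit the outstanding count, a synchronization token, and the current PC."""
        self._flush_count()
        self.tokens.append(("SYNC",))
        self.tokens.append(("PC", pc))


tc = TraceCompressorModel()
tc.issue()
tc.issue()
tc.issue(flow_change=True)   # third instruction is a branch
tc.issue()
tc.issue()
tc.synchronize(pc=0x120)
print(tc.tokens)
```

A debugger reading such a stream can replay the program from the last synchronized program counter by stepping the counted number of instructions and following each flow change.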
- some embodiments of the present invention include a trace unit 400 coupled to instruction fetch and issue unit 300 and pipeline 330 . Although not shown, trace unit 400 would also be coupled to the other pipelines in the processor.
- Trace unit 400 includes a trace generation unit 420 and a trace compression unit 410 .
- Trace generation unit 420 receives the instruction and thread ID of each issued instruction from instruction fetch and issue unit 300 . Furthermore, trace generation unit 420 monitors the pipelines to detect branches, jumps, or calls. Trace generation unit 420 generates program trace entries that together form the program trace.
- the program trace entries include thread identification information using the thread IDs from the pipelines and instruction fetch and issue unit 300 .
- Trace compression unit 410 receives the program trace from trace generation unit 420 and compresses the program trace into a compressed program trace. Any form of compression can be used in trace compression unit 410 .
- Most embodiments of the present invention tokenize the instructions as explained above.
- the present invention includes several methods of embedding thread identification in the program trace. For example, some embodiments of the present invention add a thread identification field to each program trace entry.
- the thread identification field includes the thread ID of the instruction from instruction fetch and issue unit 300 .
- a thread switch marker is inserted into the program trace whenever a thread switch occurs. For embodiments of trace compression unit 410 that tokenize the program trace, the thread switch marker is tokenized into a thread switch token.
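
Inserting thread switch markers into an uncompressed trace can be sketched as below. The entry layout (`INSTR`, `THREAD_SWITCH`) and the function name are hypothetical; the sketch assumes each marker carries the thread IDs of both threads involved in the switch, which the text describes as one option.

```python
from typing import Iterable, List, Optional, Tuple


def add_thread_switch_markers(issued: Iterable[Tuple[int, int]]) -> List[Tuple]:
    """Build trace entries from (thread_id, pc) pairs, marking every thread switch."""
    trace: List[Tuple] = []
    prev_tid: Optional[int] = None
    for tid, pc in issued:
        if prev_tid is not None and tid != prev_tid:
            # Marker records the outgoing and incoming thread IDs.
            trace.append(("THREAD_SWITCH", prev_tid, tid))
        trace.append(("INSTR", tid, pc))
        prev_tid = tid
    return trace


issued = [(0, 0x100), (0, 0x104), (1, 0x200), (0, 0x108)]
for entry in add_thread_switch_markers(issued):
    print(entry)
```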
- each program trace entry includes a thread identification field.
- Trace compression unit 410 tokenizes the program trace into tokens that include a thread identification field. Furthermore, a synchronization operation is performed whenever a thread switch occurs. As explained above, a synchronization operation inserts into the program trace a series of tokens that represent the number of instructions issued since the last synchronization operation, a synchronization token, and the current program counter.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Debugging And Monitoring (AREA)
- Advance Control (AREA)
Abstract
Description
- The present invention relates to microprocessor systems, and more particularly to thread identification in a multithreaded processor.
- Modern computer systems utilize a variety of different microprocessor architectures to perform program execution. Each microprocessor architecture is configured to execute programs made up of a number of macro instructions and micro instructions. Many macro instructions are translated or decoded into a sequence of micro instructions before processing. Micro instructions are simple machine instructions that can be executed directly by a microprocessor.
- To increase processing power, most microprocessors use multiple pipelines, such as an integer pipeline and a load/store pipeline to process the macro and micro instructions. Typically, each pipeline consists of multiple stages. Each stage in a pipeline operates in parallel with the other stages. However, each stage operates on a different macro or micro instruction. Pipelines are usually synchronous with respect to the system clock signal. Therefore, each pipeline stage is designed to perform its function in a single clock cycle. Thus, the instructions move through the pipeline with each active clock edge of a clock signal. Some microprocessors use asynchronous pipelines. Rather than a clock signal, handshaking signals are used between pipeline stages to indicate when the various stages are ready to accept new instructions. The present invention can be used with microprocessors using either (or both) synchronous or asynchronous pipelines.
- FIG. 1 shows an instruction fetch and issue unit, having an instruction fetch stage (I stage) 105 and a pre-decode stage (PD stage) 110, coupled via an instruction buffer 115 to a typical four stage integer pipeline 120 for a microprocessor. Integer pipeline 120 comprises a decode stage (D stage) 130, an execute one stage (E1 stage) 140, an execute two stage (E2 stage) 150, and a write back stage (W stage) 160. Instruction fetch stage 105 fetches instructions to be processed. Pre-decode stage 110 predecodes instructions and stores them into the instruction buffer. It also groups instructions so that they can be issued in the next stage to one or more pipelines. Ideally, instructions are issued into integer pipeline 120 every clock cycle. Each instruction passes through the pipeline and is processed by each stage as necessary. Thus, during ideal operating conditions, integer pipeline 120 is simultaneously processing four instructions. However, many conditions, as explained below, may prevent the ideal operation of integer pipeline 120.
- FIG. 2 shows a typical four stage load/store pipeline 200 for a microprocessor coupled to a memory system 270, instruction fetch stage 105, and pre-decode stage 110. Load/store pipeline 200 includes a decode stage (D stage) 230, an execute one stage (E1 stage) 240, an execute two stage (E2 stage) 250, and a write back stage (W stage) 260. In one embodiment, memory system 270 includes a data cache 274 and main memory 278. Other embodiments of memory system 270 may be configured as scratch pad memory using SRAMs. Because memory systems, data caches, and scratch pad memories are well known in the art, the function and performance of memory system 270 are not described in detail. Load/store pipeline 200 is specifically tailored to perform load and store instructions. Decode stage 230 decodes the instruction and reads the register file (not shown) for the needed information regarding the instruction. Execute one stage 240 calculates memory addresses for the load or store instructions. Because the address is calculated in execute one stage 240 and load instructions only provide the address, execute one stage 240 configures memory system 270 to provide the appropriate data at the next active clock cycle for loads from memory. However, for store instructions, the data to be stored is typically not available at execute one stage 240. For load instructions, execute two stage 250 retrieves information from the appropriate location in memory system 270. For store instructions, execute two stage 250 prepares to write the data to the appropriate location. For example, for stores to memory, execute two stage 250 configures memory system 270 to store the data on the next active clock edge. For register load operations, write back stage 260 writes the appropriate value into a register file. By including both a load/store pipeline and an integer pipeline, overall performance of a microprocessor is enhanced because the load/store pipeline and integer pipeline can operate in parallel.
- While pipelining can increase overall throughput in a processor, pipelining also introduces data dependency issues between instructions in the pipeline. For example, if instruction “LD D0, [A0]”, which means to load data register D0 with the value at memory address A0, is followed by “MUL D2, D0, D1”, which means to multiply the value in data register D0 by the value in data register D1 and store the result into data register D2, then “MUL D2, D0, D1” cannot be executed until after “LD D0, [A0]” is complete. Otherwise, “MUL D2, D0, D1” may use an outdated value in data register D0. However, stalling the pipeline to delay the execution of “MUL D2, D0, D1” would waste processor cycles. Many data dependency problems can be solved by forwarding data between pipeline stages. For example, the pipeline stage holding the value loaded from [A0] and targeting data register D0 could forward the value to the pipeline stage holding “MUL D2, D0, D1” to resolve the data dependency without stalling the pipeline.
- Ideally, integer pipeline 120 and load/store pipeline 200 can execute instructions every clock cycle. However, many situations may occur that cause parts of integer pipeline 120 or load/store pipeline 200 to stall, which degrades the performance of the microprocessor. A common problem that causes pipeline stalls is latency in memory system 270 caused by cache misses. For example, a load instruction “LD D0, [A0]” loads data from address A0 of memory system 270 into data register D0. If the value for address A0 is in data cache 274, the value in data register D0 can simply be replaced by the data value for address A0 in data cache 274. However, if the value for address A0 is not in data cache 274, the value needs to be obtained from the main memory. Thus, memory system 270 may cause load/store pipeline 200 to stall as the cache miss causes a refill operation. Furthermore, if the cache has no empty set and the previous cache data are dirty, the refill operation would need to be preceded by a write back operation.
- Rather than stalling the pipeline and wasting processor cycles, some processors (called multithreaded processors) can switch from a current thread to a second thread that can use the processor cycles that would have been wasted in single threaded processors. Specifically, in multithreaded processors, the processor holds the state of several active threads, which can be executed independently. When one of the threads becomes blocked, for example due to a cache miss, another thread can be executed so that processor cycles are not wasted. Furthermore, thread switching may also be caused by timer interrupts and progress-monitoring software in a real-time kernel. Because the processor does not have to waste cycles waiting on a blocked thread, overall performance of the processor is increased. However, different threads generally operate on different register contexts. Thus, data forwarding between threads should be avoided.
- Another related problem is caused by traps. Traps are generally caused by error conditions, which lead to a redirection of the program flow to execute a trap handler. The error conditions can occur in different pipeline stages and need to be prioritized in case of simultaneous occurrences. Synchronous traps need to be synchronous to the instruction flow, which means the instruction that caused the trap is directly followed by the trap handler in the program execution. Asynchronous traps usually get handled some cycles after the trap is detected. In a multithreaded processor, a trap handler needs to be able to correlate a trap to the thread that caused the trap. Thus, most conventional processors using data forwarding or supporting synchronous traps do not allow multiple threads to coexist in the same pipeline. In these processors, processing cycles are wasted during a thread switch to allow the pipelines to empty the current thread before switching to the new thread. Other conventional processors allow multiple threads to coexist in the pipeline but do not support data forwarding and synchronous traps.
- Another issue with conventional multi-threaded processors is that program tracing becomes complicated due to thread switching. Conventional embedded processors incorporate program trace output for debugging and development purposes. Generally, a program trace is a list of entries that tracks the actual instructions issued by the instruction fetch and issue unit with the program counter at the time each instruction is issued. However, for multi-threaded processors, a list of program instructions without correlation to the actual threads owning the instructions would be useless for debugging.
- Hence, there is a need for a method or system to allow pipelines to have multiple threads without the limitations of conventional systems with regard to program tracing, data forwarding, and trap handling.
- Accordingly, a multithreaded processor in accordance with the present invention includes a thread ID for each instruction or operand in a pipeline stage. Data forwarding is only performed between pipeline stages having the same thread ID. Furthermore, the relationship between threads and traps is easily maintained because the thread ID for each instruction that can cause the trap is available at each pipeline stage. Furthermore, the thread ID is incorporated into the program trace so that the relationship between instructions and threads can be determined.
- For example, in one embodiment of the present invention, a multithreaded processor includes an instruction fetch and issue unit and a pipeline. The instruction fetch and issue unit includes an instruction fetch stage configured to fetch one or more sets of fetched bits representing one or more instructions and an instruction buffer to store the sets of fetched bits. In addition, the instruction buffer stores an associated thread ID for each set of fetched bits. The pipeline is coupled to the instruction fetch and issue unit and configured to receive a set of fetched bits and the associated thread ID. Each pipeline stage of the pipeline has a thread ID memory to store a thread ID associated with the instruction or operand within the pipeline stage. The multithreaded processor can also include a data forwarding unit for forwarding data between a first pipeline stage having a first thread ID and a second pipeline stage having a second thread ID. When the first thread ID is equal to the second thread ID, data forwarding is allowed. However, data forwarding is prevented when the first thread ID does not match the second thread ID.
- Some embodiments of the present invention also include a trap handler, which prevents trap resolution of a trap when the active thread is not the same as the thread that generated the trap. When the thread that generated the trap becomes the active thread, the trap handler resolves the trap.
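The deferral rule in the preceding paragraph can be illustrated with a small software model. The Python sketch below is illustrative only and is not taken from the disclosure: the class name `TrapHandler`, its method names, and the restriction to a single pending trap are assumptions made to keep the example short. It records the ID of the thread that was active when the trap was detected and refuses to resolve the trap while any other thread is active.

```python
from typing import List, Optional, Tuple


class TrapHandler:
    """Hypothetical model: resolve a trap only while its originating thread is active."""

    def __init__(self) -> None:
        self.trap_thread: Optional[int] = None  # models a register holding the trapping thread's ID
        self.pending: Optional[str] = None      # the unresolved trap, if any (assumption: at most one)
        self.resolved: List[Tuple[str, int]] = []

    def detect(self, trap: str, active_tid: int) -> None:
        """Record the trap together with the thread that was active when it was detected."""
        self.pending = trap
        self.trap_thread = active_tid

    def try_resolve(self, active_tid: int) -> bool:
        """Resolve the pending trap only if the active thread is the one that trapped."""
        if self.pending is None or active_tid != self.trap_thread:
            return False  # nothing to do, or a thread switch happened: defer
        self.resolved.append((self.pending, active_tid))
        self.pending = None
        self.trap_thread = None
        return True


handler = TrapHandler()
handler.detect("illegal_opcode", active_tid=0)
print(handler.try_resolve(active_tid=1))  # False: thread 1 is active, so the trap is deferred
print(handler.try_resolve(active_tid=0))  # True: thread 0 is active again
```

A real design would likely need to prioritize several simultaneous traps from different pipeline stages; the single `pending` slot here sidesteps that question.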
- Some embodiments of the present invention use thread IDs in the generation of program traces. Specifically, in one embodiment of the present invention, a trace generation unit detects issuance of instructions and generates program trace entries that include the thread IDs of the thread containing the instructions. In another embodiment of the present invention, the trace generation unit detects thread switches and generates a thread switch marker for the program trace. The thread switch marker can contain the thread ID of threads involved in the thread switch.
- The present invention will be more fully understood in view of the following description and drawings.
- FIG. 1 is a simplified diagram of a conventional integer pipeline.
- FIG. 2 is a simplified diagram of a conventional load/store pipeline.
- FIG. 3 is a simplified block diagram of a pipeline with a data forwarding unit in accordance with one embodiment of the present invention.
- FIG. 4 is a simplified block diagram of a trace unit with a pipeline in accordance with one embodiment of the present invention.
- As explained above, conventional multithreaded processors supporting data forwarding in pipelines waste processing cycles to empty the current thread before loading a new thread during a thread switch, or they have limited forwarding capability; for example, cycle-by-cycle multithreading usually does not support forwarding into any pipeline stage. Processors in accordance with the present invention attach a thread ID to instructions and operands in pipelines. The thread ID identifies the thread to which the operand or instruction belongs. Data forwarding is prevented between pipeline stages when the thread IDs of the instructions or operands in the pipeline stages do not match. The thread IDs also allow traps to be correlated with the thread from which the trap was generated. Furthermore, the thread ID can be incorporated into a program trace to aid in debugging and development.
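The idea of a thread ID travelling with each instruction can be sketched in software. The following Python model is a rough illustration under stated assumptions, not the disclosed hardware: the `Pipeline` class, the four stage names, and the tuple layout `(instruction, thread_id)` are hypothetical. Each stage slot holds an instruction together with its thread ID, and both advance one stage per clock.

```python
from typing import List, Optional, Tuple

# Hypothetical stage names for a four-stage pipeline.
STAGES = ["decode", "execute_one", "execute_two", "write_back"]

Slot = Optional[Tuple[str, int]]  # (instruction text, thread ID), or None for an empty stage


class Pipeline:
    """Toy model in which every stage stores a thread ID next to its instruction."""

    def __init__(self) -> None:
        self.slots: List[Slot] = [None] * len(STAGES)

    def clock(self, issued: Slot) -> Slot:
        """Advance one cycle: shift every slot down a stage and return what leaves the pipeline."""
        retired = self.slots[-1]
        self.slots = [issued] + self.slots[:-1]
        return retired

    def thread_ids(self) -> List[Optional[int]]:
        """Thread ID currently held by each stage, first stage first."""
        return [slot[1] if slot is not None else None for slot in self.slots]


pipe = Pipeline()
pipe.clock(("LD D0, [A0]", 0))     # thread 0
pipe.clock(("ADD D3, D4, D5", 1))  # thread 1 enters the same pipeline
pipe.clock(("MUL D2, D0, D1", 0))  # thread 0 again
print(pipe.thread_ids())           # [0, 1, 0, None]
```

Because the thread ID is stored per stage, instructions from two threads can sit in adjacent stages without the pipeline being drained between them.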
- FIG. 3 is a simplified block diagram of an instruction fetch and issue unit 300 with a pipeline 330. Because load/store pipelines and execution pipelines behave similarly with respect to the present invention, pipeline 330 can represent both load/store pipelines and execution pipelines. Furthermore, most embodiments of the present invention would include additional pipelines coupled to instruction fetch and issue unit 300. However, for clarity and conciseness, only pipeline 330 is described. Instruction fetch and issue unit 300 includes an instruction fetch stage (I stage) 305, a pre-decode stage (PD stage) 310, and an instruction buffer 320. Instruction fetch stage 305 fetches instructions to be processed. Typically, instruction fetch stage 305 fetches a set of bits having a size equal to the data width of the memory system of the processor. For example, in 64-bit systems, instruction fetch stage 305 would read a set of 64 bits. Pre-decode stage 310 performs predecoding, such as generating operand pointers and grouping instructions, on each set of fetched bits. The set of fetched bits is stored in a row of the instruction buffer 320. Each set of fetched bits is also given a thread ID corresponding to the thread that owns the particular set of fetched bits. As illustrated in FIG. 3, a set of bits stored in row 320_0 has a corresponding thread ID stored with the set at thread ID memory 320_0_TID.
- The set of fetched bits and the corresponding thread ID are issued to a pipeline, such as pipeline 330. Generally, the set of fetched bits has been predecoded to include operand pointers, instructions, and predecode information. Unlike in conventional pipelines, the thread ID associated with the set of fetched bits attaches to the instructions and operands of the set of fetched bits. Thus, each stage of pipeline 330 includes a thread ID memory to store the thread ID associated with the operand or instruction in the pipeline stage. For example, decode stage 340 includes a thread ID memory TID_340. Similarly, execute one stage 350, execute two stage 360, and write back stage 370 include thread ID memories TID_350, TID_360, and TID_370, respectively. Some embodiments of the present invention support only two active threads. For these embodiments, the thread ID memories may be a single bit. For other embodiments having more active threads, the thread ID memories would be larger than a single bit.
- In addition to the pipeline stages, pipeline 330 also includes a first thread register file 392 and a second thread register file 394, a data forwarding unit 380, and a trap handler 395. For clarity, only two thread register files are shown. In general, the number of thread register files is equal to the maximum number of active threads supported by the processor. Each active thread has a corresponding thread register file. The instructions and operands of a thread make use of only the corresponding thread register file.
- In conventional processors supporting data forwarding, operands are forwarded when the operand being stored in a later pipeline stage is required in an earlier pipeline stage. However, in a multithreaded processor, data forwarding should only occur if the operands belong to the same thread. Thus, data forwarding unit 380 includes a thread ID comparator 383 as well as an operand comparator 386. Operand comparator 386 determines when an operand in a later stage may be needed in an earlier stage. However, the operand is forwarded only when the thread ID associated with the operand in the later stage is equal to the thread ID associated with the operand in the earlier stage.
- For example, as explained above, if instruction “LD D0, [A0]”, which means to load data register D0 with the value at memory address A0, is followed by “MUL D2, D0, D1”, which means to multiply the value in data register D0 with the value in data register D1 and store the result into data register D2, “MUL D2, D0, D1” cannot be executed until after “LD D0, [A0]” is complete. Otherwise, “MUL D2, D0, D1” may use an outdated value in data register D0. A pipeline stall is avoided by forwarding the value from memory address A0 obtained in execute two stage 360 to execute one stage 350, which needs the data from data register D0 to process “MUL D2, D0, D1”. However, if instruction “LD D0, [A0]” is associated with a different thread than instruction “MUL D2, D0, D1”, then data forwarding should not occur. Thus, in accordance with the present invention, data forwarding unit 380 uses thread ID comparator 383 to compare thread ID TID_360 with thread ID TID_350. If thread ID TID_350 matches thread ID TID_360, then data forwarding is allowed. Otherwise, data forwarding is prevented.
- Thus, processors in accordance with the present invention can support multiple threads in a single pipeline with data forwarding when appropriate. Therefore, processors in accordance with the present invention achieve higher performance than conventional processors.
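The forwarding decision described above combines two checks: an operand match and a thread ID match. The Python sketch below models that decision under stated assumptions; the `StageSlot` type and the function names are hypothetical, and the loaded value 42 is made up for the example. It is a behavioral illustration, not a description of the actual circuit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StageSlot:
    """Hypothetical contents of one pipeline stage."""
    tid: int                      # value held in the stage's thread ID memory
    dest: Optional[str] = None    # register written by the instruction in this stage
    srcs: Tuple[str, ...] = ()    # registers read by the instruction in this stage
    result: Optional[int] = None  # value available for forwarding, if any


def operand_match(later: StageSlot, earlier: StageSlot) -> bool:
    """Role of the operand comparator: the earlier stage reads what the later stage writes."""
    return later.dest is not None and later.dest in earlier.srcs


def thread_match(later: StageSlot, earlier: StageSlot) -> bool:
    """Role of the thread ID comparator: both stages belong to the same thread."""
    return later.tid == earlier.tid


def forward(later: StageSlot, earlier: StageSlot) -> Optional[int]:
    """Return the value to forward, or None when forwarding must be prevented."""
    if operand_match(later, earlier) and thread_match(later, earlier):
        return later.result
    return None


# "LD D0, [A0]" is in execute two; "MUL D2, D0, D1" is in execute one.
ld = StageSlot(tid=0, dest="D0", result=42)
mul_same_thread = StageSlot(tid=0, dest="D2", srcs=("D0", "D1"))
mul_other_thread = StageSlot(tid=1, dest="D2", srcs=("D0", "D1"))

print(forward(ld, mul_same_thread))   # 42: same register and same thread
print(forward(ld, mul_other_thread))  # None: register names match but threads differ
```

The second case shows why the thread check matters: both threads name a register D0, but each refers to its own thread register file, so forwarding across threads would hand the multiply a value from the wrong context.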
- As described above, another common problem in conventional multi-threaded processors is trap handling. If a trap is detected but not resolved prior to a thread switch, errors are likely to occur as the trap is resolved. Thus, as illustrated in FIG. 3, in one embodiment of the present invention, trap handler 395 includes a trap thread register 396, which stores the thread ID of the active thread when a trap is detected. Trap handler 395 resolves traps only if the thread ID of the active thread matches the thread ID of the thread that generated the trap, which is stored in trap thread register 396. Thus, if a thread switch occurs after detection of a trap but before the trap can be resolved, trap handler 395 would delay handling of the trap until the thread that generated the trap is the active thread.
- Another advantage of using the novel thread identification of the present invention is that program tracing can include thread information. Conventional embedded processors incorporate program trace output for debugging and development purposes. Generally, a program trace is a list of entries that tracks the actual instructions issued by the instruction fetch and issue unit with the program counter at the time each instruction is issued. Often, the program trace is compressed. One common compression technique for program traces is to tokenize the number of instructions issued, along with a notification of any program flow change, i.e., branches, jumps, or calls. Periodically (every 128 or 256 instructions, for example), a synchronization operation is performed that inserts into the program trace a series of tokens that represent the number of instructions issued since the last synchronization operation, a synchronization token, and the current program counter. A software debugger uses the tokens of the program trace to determine the behavior of the processor during program execution.
- However, conventional program tracing is insufficient for multithreaded processors. Specifically, a list of issued instructions even with a program counter would not be enough to determine the behavior of the processor without information about which thread contained the instructions.
- As illustrated in FIG. 4, some embodiments of the present invention include a trace unit 400 coupled to instruction fetch and issue unit 300 and pipeline 330. Although not shown, trace unit 400 would also be coupled to the other pipelines in the processor. Trace unit 400 includes a trace generation unit 420 and a trace compression unit 410. Trace generation unit 420 receives the instruction and thread ID of each issued instruction from instruction fetch and issue unit 300. Furthermore, trace generation unit 420 monitors the pipelines to detect branches, jumps, or calls. Trace generation unit 420 generates program trace entries that together form the program trace. The program trace entries include thread identification information using the thread IDs from the pipelines and instruction fetch and issue unit 300. Trace compression unit 410 receives the program trace from trace generation unit 420 and compresses the program trace into compressed program trace 410. Any form of compression can be used in trace compression unit 410. Most embodiments of the present invention tokenize the instructions as explained above.
- The present invention includes several methods of embedding thread identification in the program trace. For example, some embodiments of the present invention add a thread identification field to each program trace entry. The thread identification field includes the thread ID of the instruction from instruction fetch and issue unit 300. In other embodiments of the present invention, a thread switch marker is inserted into the program trace whenever a thread switch occurs. For embodiments of trace compression unit 410 that tokenize the program trace, the thread switch marker is tokenized into a thread switch token.
- In a specific embodiment of the present invention, each program trace entry includes a thread identification field. Trace compression unit 410 tokenizes the program trace into tokens that include a thread identification field. Furthermore, a synchronization operation is performed whenever a thread switch occurs. As explained above, a synchronization operation inserts into the program trace a series of tokens that represent the number of instructions issued since the last synchronization operation, a synchronization token, and the current program counter.
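The specific embodiment above, in which tokens carry a thread identification field and a synchronization is forced on every thread switch, can be sketched as follows. This Python example is a simplified illustration under several assumptions: the token layouts `("COUNT", n, tid)`, `("SYNC",)` and `("PC", pc)` are invented for the example, the "current program counter" is taken to be the address of the next instruction to issue, and flow-change notifications for branches, jumps, and calls are omitted.

```python
from typing import Iterable, List, Optional, Tuple

SYNC_INTERVAL = 128  # assumption: the text gives 128 or 256 instructions only as examples


def trace_tokens(issued: Iterable[Tuple[int, int]],
                 sync_interval: int = SYNC_INTERVAL) -> List[tuple]:
    """Tokenize a stream of (program_counter, thread_id) pairs, one per issued instruction.

    A synchronization (count token, sync token, program counter) is emitted
    periodically and also whenever the thread ID changes.
    """
    tokens: List[tuple] = []
    count = 0                          # instructions issued since the last synchronization
    current_tid: Optional[int] = None  # thread that owns the instructions being counted
    for pc, tid in issued:
        thread_switch = current_tid is not None and tid != current_tid
        if thread_switch or count == sync_interval:
            tokens.append(("COUNT", count, current_tid))  # count token with thread ID field
            tokens.append(("SYNC",))
            tokens.append(("PC", pc))
            count = 0
        current_tid = tid
        count += 1
    if count:
        tokens.append(("COUNT", count, current_tid))  # flush the final partial count
    return tokens


stream = [(0x100, 0), (0x104, 0), (0x108, 0), (0x200, 1), (0x204, 1), (0x10C, 0)]
for token in trace_tokens(stream):
    print(token)
# ('COUNT', 3, 0)
# ('SYNC',)
# ('PC', 512)
# ('COUNT', 2, 1)
# ('SYNC',)
# ('PC', 268)
# ('COUNT', 1, 0)
```

Because every run of counted instructions is closed by a synchronization at a thread switch, each count token covers instructions from exactly one thread, which is what lets a debugger attribute the trace to threads.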
- In the various embodiments of this invention, novel structures and methods have been described to use thread IDs to maintain data coherency and to generate meaningful program traces in multithreaded processors. The various embodiments of the structures and methods of this invention that are described above are illustrative only of the principles of this invention and are not intended to limit the scope of the invention to the particular embodiments described. For example, in view of this disclosure, those skilled in the art can define other instruction fetch and issue units, pipelines, pipeline stages, instruction buffers, data forwarding units, thread ID comparators, operand comparators, trace units, trace generation units, trace compression units, and so forth, and use these alternative features to create a method or system according to the principles of this invention. Thus, the invention is limited only by the following claims.
Claims (27)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/774,226 US7263599B2 (en) | 2004-02-06 | 2004-02-06 | Thread ID in a multithreaded processor |
DE602005005726T DE602005005726T2 (en) | 2004-02-06 | 2005-02-03 | Spread a thread ID in a multithreaded pipeline processor |
EP05002225A EP1562109B1 (en) | 2004-02-06 | 2005-02-03 | Thread id propagation in a multithreaded pipelined processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/774,226 US7263599B2 (en) | 2004-02-06 | 2004-02-06 | Thread ID in a multithreaded processor |
Publications (2)
Publication Number | Publication Date |
---|---|
US20050177703A1 (en) | 2005-08-11 |
US7263599B2 (en) | 2007-08-28 |
Family
ID=34679405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/774,226 Active 2024-12-19 US7263599B2 (en) | 2004-02-06 | 2004-02-06 | Thread ID in a multithreaded processor |
Country Status (3)
Country | Link |
---|---|
US (1) | US7263599B2 (en) |
EP (1) | EP1562109B1 (en) |
DE (1) | DE602005005726T2 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060095732A1 (en) * | 2004-08-30 | 2006-05-04 | Tran Thang M | Processes, circuits, devices, and systems for scoreboard and other processor improvements |
US7877734B2 (en) * | 2006-01-12 | 2011-01-25 | International Business Machines Corporation | Selective profiling of program code executing in a runtime environment |
US8032737B2 (en) * | 2006-08-14 | 2011-10-04 | Marvell World Trade Ltd. | Methods and apparatus for handling switching among threads within a multithread processor |
US20080082793A1 (en) * | 2006-09-29 | 2008-04-03 | Mips Technologies, Inc. | Detection and prevention of write-after-write hazards, and applications thereof |
US9946547B2 (en) | 2006-09-29 | 2018-04-17 | Arm Finance Overseas Limited | Load/store unit for a processor, and applications thereof |
US7594079B2 (en) | 2006-09-29 | 2009-09-22 | Mips Technologies, Inc. | Data cache virtual hint way prediction, and applications thereof |
US8443341B2 (en) * | 2006-11-09 | 2013-05-14 | Rogue Wave Software, Inc. | System for and method of capturing application characteristics data from a computer system and modeling target system |
US9329870B2 (en) | 2013-02-13 | 2016-05-03 | International Business Machines Corporation | Extensible execution unit interface architecture with multiple decode logic and multiple execution units |
CN108958798B (en) * | 2018-06-15 | 2021-04-20 | 上海兆芯集成电路有限公司 | Instruction translation circuit, processor circuit and execution method thereof |
US11288072B2 (en) | 2019-09-11 | 2022-03-29 | Ceremorphic, Inc. | Multi-threaded processor with thread granularity |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5890008A (en) * | 1997-06-25 | 1999-03-30 | Sun Microsystems, Inc. | Method for dynamically reconfiguring a processor |
US6088788A (en) * | 1996-12-27 | 2000-07-11 | International Business Machines Corporation | Background completion of instruction and associated fetch request in a multithread processor |
US20020116587A1 (en) * | 2000-12-22 | 2002-08-22 | Modelski Richard P. | External memory engine selectable pipeline architecture |
US6470443B1 (en) * | 1996-12-31 | 2002-10-22 | Compaq Computer Corporation | Pipelined multi-thread processor selecting thread instruction in inter-stage buffer based on count information |
US20020194457A1 (en) * | 1997-12-16 | 2002-12-19 | Haitham Akkary | Memory system for ordering load and store instructions in a processor that performs out-of-order multithread execution |
US6507862B1 (en) * | 1999-05-11 | 2003-01-14 | Sun Microsystems, Inc. | Switching method in a multi-threaded processor |
US6609193B1 (en) * | 1999-12-30 | 2003-08-19 | Intel Corporation | Method and apparatus for multi-thread pipelined instruction decoder |
US7010669B2 (en) * | 2001-06-22 | 2006-03-07 | Intel Corporation | Determining whether thread fetch operation will be blocked due to processing of another thread |
- 2004-02-06: US, application US10/774,226, patent US7263599B2, status: Active
- 2005-02-03: DE, application DE602005005726T, patent DE602005005726T2, status: Active
- 2005-02-03: EP, application EP05002225A, patent EP1562109B1, status: Not active (Expired - Fee Related)
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9389869B2 (en) | 2004-08-30 | 2016-07-12 | Texas Instruments Incorporated | Multithreaded processor with plurality of scoreboards each issuing to plurality of pipelines |
US20110099393A1 (en) * | 2004-08-30 | 2011-04-28 | Texas Instruments Incorporated | Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture |
US20070204137A1 (en) * | 2004-08-30 | 2007-08-30 | Texas Instruments Incorporated | Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture |
US9015504B2 (en) | 2004-08-30 | 2015-04-21 | Texas Instruments Incorporated | Managing power of thread pipelines according to clock frequency and voltage specified in thread registers |
US7890735B2 (en) | 2004-08-30 | 2011-02-15 | Texas Instruments Incorporated | Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture |
US20110099355A1 (en) * | 2004-08-30 | 2011-04-28 | Texas Instruments Incorporated | Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture |
GB2448276B (en) * | 2006-02-28 | 2011-06-15 | Mips Tech Inc | Distributive scoreboard scheduling in an out-of-order processor |
US20080115011A1 (en) * | 2006-11-15 | 2008-05-15 | Lucian Codrescu | Method and system for trusted/untrusted digital signal processor debugging operations |
US8533530B2 (en) | 2006-11-15 | 2013-09-10 | Qualcomm Incorporated | Method and system for trusted/untrusted digital signal processor debugging operations |
US8341604B2 (en) | 2006-11-15 | 2012-12-25 | Qualcomm Incorporated | Embedded trace macrocell for enhanced digital signal processor debugging operations |
US8370806B2 (en) | 2006-11-15 | 2013-02-05 | Qualcomm Incorporated | Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor |
US8380966B2 (en) | 2006-11-15 | 2013-02-19 | Qualcomm Incorporated | Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging |
US20080115115A1 (en) * | 2006-11-15 | 2008-05-15 | Lucian Codrescu | Embedded trace macrocell for enhanced digital signal processor debugging operations |
US20080115113A1 (en) * | 2006-11-15 | 2008-05-15 | Lucian Codrescu | Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor |
US20080114972A1 (en) * | 2006-11-15 | 2008-05-15 | Lucian Codrescu | Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging |
US8484516B2 (en) * | 2007-04-11 | 2013-07-09 | Qualcomm Incorporated | Inter-thread trace alignment method and system for a multi-threaded processor |
US20140122845A1 (en) * | 2011-12-30 | 2014-05-01 | Jaewoong Chung | Overlapping atomic regions in a processor |
US9710280B2 (en) * | 2011-12-30 | 2017-07-18 | Intel Corporation | Overlapping atomic regions in a processor |
US9658985B2 (en) | 2013-10-31 | 2017-05-23 | Silicon Tailor Limited | Pipelined configurable processor |
JP2016535913A (en) * | 2013-10-31 | 2016-11-17 | シリコン テーラー リミテッド | Pipelined configurable processor |
WO2015063466A1 (en) * | 2013-10-31 | 2015-05-07 | Silicon Tailor Limited | Pipelined configurable processor |
US10275390B2 (en) | 2013-10-31 | 2019-04-30 | Silicon Tailor Limited | Pipelined configurable processor |
US20150324202A1 (en) * | 2014-05-07 | 2015-11-12 | Alibaba Group Holding Limited | Detecting data dependencies of instructions associated with threads in a simultaneous multithreading scheme |
US10545763B2 (en) * | 2014-05-07 | 2020-01-28 | Alibaba Group Holding Limited | Detecting data dependencies of instructions associated with threads in a simultaneous multithreading scheme |
WO2022212218A1 (en) * | 2021-03-27 | 2022-10-06 | Ceremorphic, Inc. | Reconfigurable multi-thread processor for simultaneous operations on split instructions and operands |
US11782719B2 (en) | 2021-03-27 | 2023-10-10 | Ceremorphic, Inc. | Reconfigurable multi-thread processor for simultaneous operations on split instructions and operands |
Also Published As
Publication number | Publication date |
---|---|
DE602005005726T2 (en) | 2009-04-30 |
DE602005005726D1 (en) | 2008-05-15 |
EP1562109A1 (en) | 2005-08-10 |
EP1562109B1 (en) | 2008-04-02 |
US7263599B2 (en) | 2007-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1562109B1 (en) | Thread id propagation in a multithreaded pipelined processor | |
EP1562108B1 (en) | Program tracing in a multithreaded processor | |
US6308261B1 (en) | Computer system having an instruction for probing memory latency | |
CN108287730B (en) | Processor pipeline device | |
JP5889986B2 (en) | System and method for selectively committing the results of executed instructions | |
US8386754B2 (en) | Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism | |
US6578137B2 (en) | Branch and return on blocked load or store | |
US6415380B1 (en) | Speculative execution of a load instruction by associating the load instruction with a previously executed store instruction | |
US6523110B1 (en) | Decoupled fetch-execute engine with static branch prediction support | |
JP5323936B2 (en) | Apparatus and method for speculative interrupt vector prefetch | |
US6629271B1 (en) | Technique for synchronizing faults in a processor having a replay system | |
US20140108771A1 (en) | Using Register Last Use Information to Perform Decode Time Computer Instruction Optimization | |
EP1886216B1 (en) | Controlling out of order execution pipelines using skew parameters | |
US7454598B2 (en) | Controlling out of order execution pipelines issue tagging | |
US8171266B2 (en) | Look-ahead load pre-fetch in a processor | |
KR20050084661A (en) | In order multithreading recycle and dispatch mechanism | |
KR20100108591A (en) | Processor including hybrid redundancy for logic error protection | |
US20010005882A1 (en) | Circuit and method for initiating exception routines using implicit exception checking | |
US20240036876A1 (en) | Pipeline protection for cpus with save and restore of intermediate results | |
US5634136A (en) | Data processor and method of controlling the same | |
US7831979B2 (en) | Processor with instruction-based interrupt handling | |
CN116414458A (en) | Instruction processing method and processor | |
US6453412B1 (en) | Method and apparatus for reissuing paired MMX instructions singly during exception handling | |
US20230315446A1 (en) | Arithmetic processing apparatus and method for arithmetic processing | |
US20220308887A1 (en) | Mitigation of branch misprediction penalty in a hardware multi-thread microprocessor |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: INFINEON TECHNOLOGIES NORTH AMERICA CORP., CALIFOR; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:NORDEN, ERIK K.;OBER, ROBERT E.;ARNOLD, ROGER D.;AND OTHERS;REEL/FRAME:014973/0760;SIGNING DATES FROM 20031215 TO 20040130 |
AS | Assignment | Owner name: INFINEON TECHNOLOGIES AG, GERMANY; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INFINEON TECHNOLOGIES NORTH AMERICA CORP.;REEL/FRAME:015019/0284; Effective date: 20040823 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FEPP | Fee payment procedure | Free format text: PAYER NUMBER DE-ASSIGNED (ORIGINAL EVENT CODE: RMPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
CC | Certificate of correction | |
FPAY | Fee payment | Year of fee payment: 4 |
FPAY | Fee payment | Year of fee payment: 8 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 12 |