US20080215864A1 - Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor - Google Patents
- Publication number
- US20080215864A1 (application US11/638,315)
- Authority
- US
- United States
- Prior art keywords
- multiplexer
- thread
- processor
- instruction pointer
- steer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
Abstract
A simultaneous multithreaded processor is disclosed that, compared with current systems, reduces both the number of hardware components required and the complexity of design. Rather than requiring an individual storage element to save instruction pointer information for each re-steer logic component in the processor pipeline, the present invention allows the instruction pointer information of an inactive thread to be stored in a single ‘inactive thread’ storage element until the thread becomes active again.
Description
- This application is a continuation of, and claims the benefit of priority from, application U.S. Ser. No. 09/753,764, filed on Dec. 29, 2000, which will issue as U.S. Pat. No. 7,149,880 on Dec. 12, 2006.
- The present invention relates to processor design. More specifically, the present invention relates to a system that reduces the number of hardware components necessary for instruction pointer generation in a simultaneous multithreaded processor.
- Multithreaded processors have become increasingly popular as a way to minimize unproductive time spent by a processor. Multithreading enables a processor to perform tasks for a given thread until a specific event occurs, such as a certain number of execution cycles passing, a higher priority thread requiring attention, or the current thread being forced into a stall while waiting for data, and then to begin processing another thread.
- To allow multiple program threads to be actively executed, simultaneous multithreaded (SMT) implementations require that multiple threads be fetched and readied for execution. Different methods exist for fetching instructions from the various active threads. One approach is to use multiple ‘front ends’, one for each thread, to fetch instructions and fill the decoupling buffers that feed the ‘back end’ execution pipes. This approach requires a large amount of hardware for the multiple front ends (each of which also includes a level one instruction cache).
- An alternative, and more common, method of facilitating multithreading is to time-multiplex between the various threads in a ‘round-robin’ fashion.
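The switching behavior described above (run a thread until an event such as a cycle quota expiring or a stall, then move on in round-robin order) can be sketched in Python. This is a minimal model under stated assumptions: `ThreadState`, `round_robin_schedule`, the `stalled` flag and the `quantum` parameter are hypothetical names introduced here, and the policy is a software illustration of front-end fetch selection, not circuitry described in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    tid: int
    stalled: bool = False  # hypothetical flag: thread is waiting on data


def round_robin_schedule(threads: List[ThreadState], cycles: int, quantum: int) -> List[int]:
    """Return the id of the thread that owns the fetch slot on each cycle.

    Assumed policy: the active thread keeps the slot until it has used
    `quantum` cycles or is stalled; the slot then passes to the next
    non-stalled thread in round-robin order. If every other thread is
    stalled, the current thread keeps the slot.
    """
    n = len(threads)
    active, used = 0, 0
    trace: List[int] = []
    for _ in range(cycles):
        if used >= quantum or threads[active].stalled:
            # Scan forward (wrapping around) for the next thread that can run.
            for step in range(1, n + 1):
                candidate = (active + step) % n
                if not threads[candidate].stalled:
                    active = candidate
                    break
            used = 0
        trace.append(threads[active].tid)
        used += 1
    return trace
```

For two threads with a quantum of 2 cycles, the fetch slot alternates 0, 0, 1, 1, 0, 0; marking a thread as stalled makes the scheduler skip it.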
FIG. 2 provides an illustration of simple instruction pointer logic that uses such round-robin time-multiplexing between two threads. A problem with this method is that it requires not only twice as many inputs on the multiplexers 218, 220 and a separate storage element for each re-steer logic component, but also a next level multiplexer 246 to select between the multiple threads.
- It is therefore desirable to have a system for a simultaneous multithreaded processor that minimizes the number of hardware components necessary as well as the complexity of design.
- FIG. 1 illustrates how a typical processor utilizing pipelining methodology operates with regard to instruction pointer generation.
- FIG. 2 illustrates the operation of instruction pointer generation of a typical simultaneous multithreaded processor that utilizes pipelining methodology.
- FIG. 3 provides an illustration of an embodiment of the present invention.
- FIG. 4 provides a flowchart for the process of an embodiment of the present invention.
- A system and method are disclosed for a simultaneous multithreaded processor that, compared with current multithreaded processor systems, reduces both the number of hardware components necessary and the complexity of design. As stated previously, prior systems require a storage element for each re-steer logic component (FIG. 2). Further, the number of inputs to each multiplexer must be doubled compared with a non-multithreaded design: for each necessary input, there is a direct (live) feed 260 and a pre-recorded data path 258 from the storage element.
- In an embodiment of the present invention, instruction pointer information of the inactive thread is saved in a storage element 348, 350 (FIG. 3) located after the respective multiplexer 318, 320. When the thread becomes active again, the saved instruction pointer is fed back into the multiplexer 318, 320 (as an ‘inactive thread re-steer’), through the common multiplexer 346, and on to the processor pipeline.
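To make the hardware comparison concrete, the sketch below counts storage elements and per-thread multiplexer inputs for both arrangements. The counting rules are assumptions drawn from the description (a live feed plus a pre-recorded path for every input in the prior art; one storage element and one extra ‘inactive thread re-steer’ input per thread in the embodiment). The further assumption that the prior art needs one storage element per re-steer source for each thread is made here for illustration, and `ip_mux_cost` is a hypothetical helper.

```python
from typing import Dict


def ip_mux_cost(num_resteer_sources: int, num_threads: int) -> Dict[str, Dict[str, int]]:
    """Rough component counts for instruction pointer selection.

    Counting rules (assumptions for illustration):
    - prior art (FIG. 2): each per-thread multiplexer takes a live feed and
      a pre-recorded path for every re-steer source, and every source has
      its own storage element for each thread;
    - embodiment (FIG. 3): each per-thread multiplexer takes every source
      once plus one 'inactive thread re-steer' input, backed by a single
      storage element per thread.
    Both arrangements use a common multiplexer with one input per thread.
    """
    r, t = num_resteer_sources, num_threads
    return {
        "prior_art": {
            "inputs_per_thread_mux": 2 * r,
            "storage_elements": r * t,
            "common_mux_inputs": t,
        },
        "embodiment": {
            "inputs_per_thread_mux": r + 1,
            "storage_elements": t,
            "common_mux_inputs": t,
        },
    }
```

With the six instruction pointer sources of FIG. 1 (debug, exception/fault, branch mispredict, ‘1’ bubble, ‘0’ bubble and sequential IP) and two threads, these rules give 12 inputs per thread multiplexer and 12 storage elements for the prior art, versus 7 inputs and 2 storage elements for the embodiment.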
FIG. 1 illustrates how a typical processor utilizing pipelining methodology operates with regard to instruction pointer generation. Pipelining organizes instructions into a kind of assembly line, where the microprocessor begins executing a second instruction before the first has completed. That is, different instructions are in the pipeline simultaneously, each at a different processing stage. This improves efficiency and minimizes unproductive time spent by the processor.
- In the first stage, Instruction Pointer Generation-1 (IPG-1) 102, a pointer is provided for debug operations 122, such as ‘design for testability’ (DFT) operations. This instruction pointer path has the highest priority at the multiplexer 118, so it pre-empts any other instruction pointer received at the multiplexer in the same cycle. The next stage in the processor pipeline, instruction pointer generation (IPG) 104, is the stage in which the various re-steers are presented to the multiplexer 118 for priority routing to the processor. All of the re-steer logic paths are fed into the multiplexer 118 with their associated multiplexer priority. The priority is denoted by the order of placement on the multiplexer: the higher the input location, the higher the priority. Among the re-steer logic components, the ‘1’ bubble branch re-steer 128 has a one cycle penalty (the one cycle preceding it must be flushed). Therefore, it has a higher priority than the ‘0’ bubble branch re-steer 130, which does not have to flush a cycle. The sequential IP 132 has the lowest priority, because a simple instruction pointer increment should not be performed unless it is assured that no re-steers need to be performed.
- Before passing to the next stage of the processor pipeline, the instruction pointer passes through an IPG/IPG+1 staging storage element 134, where it is held until the instruction pointer generation +1 (IPG+1) 106 stage is ready to accept it. Based on the past history of logic decisions with regard to jumps to different address locations, the ‘0’ bubble branch re-steer logic 130 decides whether there is a significant likelihood that, upon execution, the processor will jump to a different (non-sequential) address. If the pipeline had been filled with successive instruction pointers for each successive pipeline stage and a jump turned out to be necessary, the entire contents of the pipeline prior to that point would need to be flushed. Using the past history of logic decisions minimizes the number of times the pipeline needs to be flushed.
- Based on this history, the ‘0’ bubble re-steer logic 130 provides a new instruction pointer to the multiplexer 118 if necessary. This re-steer logic is called ‘0 bubble’ because it has a zero cycle penalty: no stages of the pipeline have to be flushed. In contrast, a ‘1’ bubble branch re-steer requires that the contents of IPG+1 106 be flushed, as explained below.
- Within the IPG+1 stage of the processor pipeline, the instruction pointer is incremented at the sequential IP (IP+1) 132 to move to the next address if necessary. On moving to the next stage of the processor pipeline, the instruction pointer passes through, and may be held if necessary at, the IPG+1/IPG+2 staging storage element 136 until the instruction pointer generation +2 (IPG+2) 108 stage can accept it. At this stage, the ‘1’ bubble branch re-steer logic 128 determines, based on history, whether a new instruction pointer should be supplied to the pipeline. After the IPG+2 stage, the instruction pointer may be held at the IPG+2/REG staging storage element 138, and it is then received in the register access (REG) stage 110. In the REG stage 110, data is read from one or more registers to be processed. After the REG/EXE staging storage element 140, the required function (such as addition or subtraction) is performed on the register data in the execute (EXE) stage 112.
- After the EXE stage 112, the instruction pointer may be held in the EXE/XPN staging storage element 142, and then the exception detection (XPN) stage 114 is entered. In the XPN stage 114, the processor checks whether the instruction encountered an exception. The instruction pointer with its associated data then passes through an XPN/CMT staging storage element 144 and on to the commit (CMT) stage 116. In the CMT stage 116, if no exception was encountered in the XPN stage 114, the register is updated with the resulting value. If an exception was encountered in the XPN stage 114, the exception/fault re-steer logic 124 provides an instruction pointer for exception handling.
- If the branch mispredict re-steer logic 126 determines at this stage that the address predicted by the ‘0’ bubble branch re-steer logic 130 or the ‘1’ bubble branch re-steer logic 128 was in fact wrong, that is, that the earlier stages (to the left) were pre-loaded with incorrect instruction pointers, the processor pipeline is flushed and the branch mispredict re-steer logic 126 provides the correct instruction pointer back to the multiplexer 118.
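The priority routing at the multiplexer 118 amounts to a fixed-priority selector: the first valid input in priority order wins. The Python sketch below models it under assumptions: the source names, `PRIORITY_ORDER` and `priority_mux` are hypothetical, instruction pointers are plain integers, and placing the exception/fault 124 and branch mispredict 126 paths between the debug path and the ‘1’ bubble branch re-steer is inferred from the reference-numeral order rather than stated in the text.

```python
from typing import Dict, Optional, Tuple

# Inputs of multiplexer 118, highest priority first (upper input = higher
# priority). The slots for exception/fault and branch mispredict are
# assumed from the reference-numeral order 122..132.
PRIORITY_ORDER = (
    "debug",              # 122: DFT pointer from IPG-1, always wins
    "exception_fault",    # 124
    "branch_mispredict",  # 126
    "one_bubble",         # 128: one-cycle penalty
    "zero_bubble",        # 130: zero-cycle penalty
    "sequential",         # 132: IP + 1, only if no re-steer is pending
)


def priority_mux(requests: Dict[str, Optional[int]]) -> Tuple[str, int]:
    """Return (source, ip) for the highest-priority valid request.

    `requests` maps a source name to an instruction pointer, or to None
    when that source is not requesting a re-steer this cycle.
    """
    for source in PRIORITY_ORDER:
        ip = requests.get(source)
        if ip is not None:
            return source, ip
    raise ValueError("no valid instruction pointer presented")
```

With only a sequential increment presented, the mux passes IP+1; a ‘0’ bubble prediction overrides it, a ‘1’ bubble prediction overrides both, and the debug path overrides everything.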
FIG. 2 illustrates the operation of instruction pointer generation in a typical simultaneous multithreaded processor that utilizes pipelining methodology. In the IPG stage 204, a separate multiplexer, MUX1 218, is used for thread 1, and MUX2 220 is used for thread 2. A common multiplexer 246 switches between the two threads, depending on which one is active. Compared with the single-thread pipelined processor described in FIG. 1, the typical multithreaded version has twice as many inputs for each multiplexer 218, 220.
- For example, while thread 1 is active, re-steer information is fed from the different re-steer logic components through MUX1 218 and the common multiplexer 246 to the processor pipeline. The common multiplexer 246 chooses between the different multiplexers 218, 220 according to which thread is active, and thread 1's re-steer information is received at the multiplexer 218 directly from whichever re-steer logic component supplies it.
- While thread 1 is active and is being fed into the processor, thread 2 is inactive. Re-steer information needed for thread 2 must be saved during the inactivity of thread 2 to prevent its loss. Therefore, in a typical multithreaded processor, each re-steer logic component has its own storage element in which thread 2's re-steer information is recorded. When thread 1 becomes inactive, because of an event such as a stall while waiting for another process's output or because a predetermined number of clocks has passed, thread 2 can become active once again. MUX2 220 can then receive the pre-recorded re-steer information 258 and resume activity as if no interruption had occurred.
- As stated previously, this system for a simultaneous multithreaded processor requires a large amount of hardware, because of the individually utilized storage elements, and considerable complexity in the wiring design.
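The prior-art arrangement of FIG. 2 can be modeled as a per-thread multiplexer in which every re-steer source has its own storage element. In this hypothetical sketch (`PriorArtThreadMux`, `record`, `select` and the source names are illustrative, not taken from the patent), re-steers that arrive while the thread is inactive are recorded per source and replayed through the pre-recorded path when the thread becomes active; the one-storage-element-per-source cost shows up as one entry per source in `recorded`.

```python
from typing import Dict, Iterable, Optional, Tuple


class PriorArtThreadMux:
    """Hypothetical model of one per-thread multiplexer in FIG. 2.

    Every re-steer source owns a storage element (an entry in `recorded`),
    so the storage cost grows with the number of sources.
    """

    def __init__(self, priority_order: Iterable[str]) -> None:
        self.priority_order = tuple(priority_order)
        # One storage element per re-steer source; None means empty.
        self.recorded: Dict[str, Optional[int]] = {s: None for s in self.priority_order}

    def record(self, source: str, ip: int) -> None:
        """Save a re-steer that arrives while this thread is inactive."""
        self.recorded[source] = ip

    def select(self, live: Dict[str, int]) -> Tuple[str, int]:
        """Pick the highest-priority pointer from live feeds or recorded paths."""
        for source in self.priority_order:
            ip = live.get(source)  # direct (live) feed
            if ip is None:
                ip = self.recorded[source]  # pre-recorded data path
            if ip is not None:
                # Clear this source's storage element once it has supplied
                # (or been superseded by) a pointer.
                self.recorded[source] = None
                return source, ip
        raise ValueError("no instruction pointer available")
```

Recording a branch mispredict pointer while the thread is idle lets it win on the first cycle after reactivation, after which the live sequential feed takes over.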
-
FIG. 3 provides an illustration of an embodiment of the present invention. Rather than storing re-steer information in individual storage elements associated with each of the re-steer logic components, the instruction pointer information of the inactive thread is stored in an ‘inactive thread’ storage element 348, 350 located after the respective multiplexer 318, 320, and is fed back into that multiplexer as an inactive thread re-steer before passing through the common multiplexer 346. When the threads change, the ‘0’ bubble branch re-steer 330 and the sequential IP 332 would not be utilized, and the newly active thread's instruction pointer would come from the inactive thread re-steer storage element into the multiplexer. Because the ‘1’ bubble branch re-steer 328 was generated a cycle earlier, it has a higher priority than the inactive thread re-steer, which in turn has a higher priority than the ‘0’ bubble branch re-steer 330 and the sequential IP 332.
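The embodiment of FIG. 3 uses a single storage element after each thread multiplexer instead of per-source storage elements. The sketch below is a behavioral model under assumptions: `ThreadIPMux`, `common_mux_cycle` and the source names are hypothetical, threads are indexed from 0, instruction pointers are plain integers, and only the ordering ‘1’ bubble > inactive thread re-steer > ‘0’ bubble > sequential IP is taken from the text; placing the exception/fault and branch mispredict inputs above them is assumed.

```python
from typing import Dict, List, Optional, Tuple

Selection = Tuple[str, int]


class ThreadIPMux:
    """Hypothetical model of a thread multiplexer (318/320) plus its single
    'inactive thread' storage element (348/350)."""

    # Highest priority first. Only one_bubble > inactive_thread >
    # zero_bubble > sequential comes from the text; the first two are assumed.
    ORDER = ("exception_fault", "branch_mispredict", "one_bubble",
             "inactive_thread", "zero_bubble", "sequential")

    def __init__(self) -> None:
        self.saved_ip: Optional[int] = None  # the one storage element

    def select(self, live: Dict[str, int]) -> Selection:
        requests = dict(live)
        if self.saved_ip is not None:
            requests["inactive_thread"] = self.saved_ip  # fed back as a re-steer
        for source in self.ORDER:
            if requests.get(source) is not None:
                return source, requests[source]
        raise ValueError("no instruction pointer available")

    def step(self, live: Dict[str, int], active: bool) -> Optional[Selection]:
        source, ip = self.select(live)
        if active:
            self.saved_ip = None  # stored pointer consumed or superseded
            return source, ip
        self.saved_ip = ip  # park the winner until the thread is active again
        return None


def common_mux_cycle(muxes: List[ThreadIPMux], live: List[Dict[str, int]],
                     active_tid: int) -> Optional[Selection]:
    """One IPG cycle: every thread mux steps, and only the active thread's
    output is forwarded by the common multiplexer (346)."""
    forwarded = None
    for tid, mux in enumerate(muxes):
        out = mux.step(live[tid], active=(tid == active_tid))
        if tid == active_tid:
            forwarded = out
    return forwarded
```

In a short run, thread 1's pointer 0x2000 is parked while thread 0 runs, re-enters as the inactive thread re-steer when thread 1 becomes active (beating that thread's sequential IP), and a ‘1’ bubble re-steer outranks a parked pointer, matching the priorities stated above.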
FIG. 4 provides a flowchart for the process of an embodiment of the present invention. In an embodiment of the present invention, for each of one or more threads, the highest priority (appropriate) instruction pointer 416 is presented in the processor. The system checks which thread is active 406, 408. In this embodiment, when one thread is active, the other(s) must be inactive. If a thread is inactive, the instruction pointer of that thread is stored in a storage element until the thread becomes active again, while the instruction pointer of the active thread proceeds into the processor pipeline.
- In an embodiment, the processor next performs ‘register access’ 430, where it reads the appropriate values from the registers. The processor then enters the ‘execute’ stage 432, where it performs the necessary function on the register values. In an embodiment, the processor then checks whether an exception was encountered 434. If an exception was encountered 436, an appropriate instruction pointer is generated 438 and provided to the processor.
- Although several embodiments are specifically illustrated and described herein, it will be appreciated that modifications and variations of the present invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
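The later steps of the flowchart (register access 430, execute 432, exception check 434/436 and pointer generation 438) can be sketched as a single function. This is a hypothetical software model: `back_end`, the register-file dictionary, the `handler_ip` parameter, and the use of a Python `ArithmeticError` to stand in for a hardware exception are all assumptions made for illustration; the commit step follows the CMT behavior described for FIG. 1.

```python
from typing import Callable, Dict, Optional, Tuple


def back_end(regs: Dict[str, int], op: Callable[[int, int], int],
             src1: str, src2: str, dst: str,
             handler_ip: int) -> Tuple[bool, Optional[int]]:
    """Hypothetical walk through steps 430-438 for one instruction.

    Returns (committed, resteer_ip): resteer_ip is the exception-handling
    pointer when an exception is detected, otherwise None.
    """
    # Register access (430): read the source operands.
    a, b = regs[src1], regs[src2]
    # Execute (432): apply the required function.
    try:
        result = op(a, b)
    except ArithmeticError:
        # Exception encountered (434/436): nothing is committed and an
        # exception-handling pointer is generated (438).
        return False, handler_ip
    # Commit: no exception, so the destination register is updated.
    regs[dst] = result
    return True, None
```

A division that succeeds commits its result; dividing by zero leaves the destination register unchanged and returns the handler pointer instead.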
Claims (9)
1. A processor comprising:
a first multiplexer to receive a first instruction thread;
a second multiplexer to receive a second instruction thread;
a first storage element to store a set of instruction pointers associated with the first instruction thread from the first multiplexer if the first thread is inactive;
a second storage element to store a set of instruction pointers associated with the second instruction thread from the second multiplexer if the second thread is inactive;
a first inactive thread re-steer logic to cause the first multiplexer to receive the first thread if the first thread becomes active;
a second inactive thread re-steer logic to cause the second multiplexer to receive the second thread if the second thread becomes active.
2. The processor of claim 1, further comprising a common multiplexer coupled between said first and second multiplexer.
3. The processor of claim 2, wherein the common multiplexer is to receive instruction pointer information sequentially from the first multiplexer and the second multiplexer in a time-multiplexed manner.
4. The processor of claim 2, wherein the common multiplexer is to receive instruction pointer information sequentially from the first multiplexer and the second multiplexer in an alternating manner.
5. The processor of claim 1, wherein the first multiplexer and the second multiplexer are priority multiplexers.
6. The processor of claim 5, wherein the first multiplexer and the second multiplexer are to receive instruction pointer information and data from a plurality of stages in a processor pipeline.
7. The processor of claim 6, wherein the first multiplexer and the second multiplexer are to receive instruction pointer information and data from re-steer logic at the plurality of stages in the processor pipeline.
8. The processor of claim 7, wherein the first multiplexer and the second multiplexer are to pass the instruction pointer information and data to a common multiplexer along with a priority indicator.
9. The processor of claim 1, wherein the storage element is a flip-flop device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/638,315 US20080215864A1 (en) | 2000-12-29 | 2006-12-12 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/753,764 US7149880B2 (en) | 2000-12-29 | 2000-12-29 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
US11/638,315 US20080215864A1 (en) | 2000-12-29 | 2006-12-12 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/753,764 Continuation US7149880B2 (en) | 2000-12-29 | 2000-12-29 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080215864A1 (en) | 2008-09-04 |
Family
ID=25032059
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/753,764 Expired - Fee Related US7149880B2 (en) | 2000-12-29 | 2000-12-29 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
US11/638,315 Abandoned US20080215864A1 (en) | 2000-12-29 | 2006-12-12 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/753,764 Expired - Fee Related US7149880B2 (en) | 2000-12-29 | 2000-12-29 | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
Country Status (1)
Country | Link |
---|---|
US (2) | US7149880B2 (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7149880B2 (en) * | 2000-12-29 | 2006-12-12 | Intel Corporation | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
US7350060B2 (en) * | 2003-04-24 | 2008-03-25 | International Business Machines Corporation | Method and apparatus for sending thread-execution-state-sensitive supervisory commands to a simultaneous multi-threaded (SMT) processor |
US7401207B2 (en) * | 2003-04-25 | 2008-07-15 | International Business Machines Corporation | Apparatus and method for adjusting instruction thread priority in a multi-thread processor |
US8694976B2 (en) * | 2003-12-19 | 2014-04-08 | Intel Corporation | Sleep state mechanism for virtual multithreading |
US7681014B2 (en) | 2005-02-04 | 2010-03-16 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
US7631130B2 (en) * | 2005-02-04 | 2009-12-08 | Mips Technologies, Inc | Barrel-incrementer-based round-robin apparatus and instruction dispatch scheduler employing same for use in multithreading microprocessor |
US7490230B2 (en) | 2005-02-04 | 2009-02-10 | Mips Technologies, Inc. | Fetch director employing barrel-incrementer-based round-robin apparatus for use in multithreading microprocessor |
US7657883B2 (en) * | 2005-02-04 | 2010-02-02 | Mips Technologies, Inc. | Instruction dispatch scheduler employing round-robin apparatus supporting multiple thread priorities for use in multithreading microprocessor |
US7823153B1 (en) | 2005-09-30 | 2010-10-26 | Symantec Corporation | System and method for detecting and logging in-line synchronization primitives in application program code |
US7930684B2 (en) * | 2005-10-12 | 2011-04-19 | Symantec Operating Corporation | System and method for logging and replaying asynchronous events |
US8117600B1 (en) | 2005-12-29 | 2012-02-14 | Symantec Operating Corporation | System and method for detecting in-line synchronization primitives in binary applications |
US7975272B2 (en) * | 2006-12-30 | 2011-07-05 | Intel Corporation | Thread queuing method and apparatus |
US7945764B2 (en) * | 2008-01-11 | 2011-05-17 | International Business Machines Corporation | Processing unit incorporating multirate execution unit |
US20160283233A1 (en) * | 2015-03-24 | 2016-09-29 | Freescale Semiconductor, Inc. | Computer systems and methods for context switching |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5907702A (en) * | 1997-03-28 | 1999-05-25 | International Business Machines Corporation | Method and apparatus for decreasing thread switch latency in a multithread processor |
US5933627A (en) * | 1996-07-01 | 1999-08-03 | Sun Microsystems | Thread switch on blocked load or store using instruction thread field |
US7149880B2 (en) * | 2000-12-29 | 2006-12-12 | Intel Corporation | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
- 2000-12-29: US09/753,764 filed (patent US7149880B2; status: not active, Expired - Fee Related)
- 2006-12-12: US11/638,315 filed (publication US20080215864A1; status: not active, Abandoned)
Also Published As
Publication number | Publication date |
---|---|
US20020087843A1 (en) | 2002-07-04 |
US7149880B2 (en) | 2006-12-12 |
Similar Documents
Publication | Title |
---|---|
US20080215864A1 (en) | Method and apparatus for instruction pointer storage element configuration in a simultaneous multithreaded processor |
US6240510B1 (en) | System for processing a cluster of instructions where the instructions are issued to the execution units having a priority order according to a template associated with the cluster of instructions |
EP0399762B1 (en) | Multiple instruction issue computer architecture |
US6304960B1 (en) | Validating prediction for branches in a cluster via comparison of predicted and condition selected tentative target addresses and validation of branch conditions |
US20160291982A1 (en) | Parallelized execution of instruction sequences based on pre-monitoring |
KR100616722B1 (en) | Pipelined instruction dispatch unit in a superscalar processor |
EP0730221B1 (en) | Superscalar processor with multiple register windows and speculative return address generation |
EP1562108B1 (en) | Program tracing in a multithreaded processor |
US20060248319A1 (en) | Validating branch resolution to avoid mis-steering instruction fetch |
EP1562109A1 (en) | Thread id propagation in a multithreaded pipelined processor |
KR101081674B1 (en) | A system and method for using a working global history register |
US20070083736A1 (en) | Instruction packer for digital signal processor |
WO2003067424A2 (en) | Processor with delayed branch resolution |
EP3306468A1 (en) | A method and a processor |
US5778208A (en) | Flexible pipeline for interlock removal |
US5634136A (en) | Data processor and method of controlling the same |
US20030120883A1 (en) | Electronic processing device and method of pipelining in such a device |
US5737562A (en) | CPU pipeline having queuing stage to facilitate branch instructions |
US6601162B1 (en) | Processor which executes pipeline processing having a plurality of stages and which has an operand bypass predicting function |
US7013256B2 (en) | Computer system with debug facility |
WO2016156955A1 (en) | Parallelized execution of instruction sequences based on premonitoring |
US6453412B1 (en) | Method and apparatus for reissuing paired MMX instructions singly during exception handling |
KR102379886B1 (en) | Vector instruction processing |
US10296350B2 (en) | Parallelized execution of instruction sequences |
US6718460B1 (en) | Mechanism for error handling in a computer system |
Legal Events
Code | Title | Description |
---|---|---|
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |