WO2008027823A1 - Dependent instruction thread scheduling - Google Patents
- Publication number
- WO2008027823A1 (PCT/US2007/076867)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- thread
- instruction
- accordance
- unpredictable latency
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
Definitions
- the present invention relates generally to graphics processors and more specifically to thread scheduling of threads having dependent instructions.
- Specialized processors are often used to perform specific functions related to a type of application in order to efficiently and quickly perform operations related to the application.
- graphics processors perform various graphics operations to process image data and render an image and are efficient at manipulating and displaying computer graphics. Due to their highly parallel structure, graphics processors are more effective than typical general-purpose processors for a range of complex algorithms.
- a graphics processor implements a number of graphics primitive operations in a way that makes executing the operations much faster than presenting the graphics directly to a screen with the host central processing unit (CPU).
- a thread scheduler is used to control the timing of the execution of the instructions of the various threads and efficiently allocate resources to the threads.
- Some instructions require the retrieval of data from a data source having an unpredictable latency. For example, retrieval of data from some memories within a processor system may have an unpredictable latency due to data size or location.
- a texture engine that returns texture data in a graphics processor.
- An instruction of a thread may or may not require the retrieval of data from a data source with unpredictable latency.
- a dependent instruction may require data acquired by a previous instruction to execute the dependent instruction. Where the required data is acquired from an unpredictable latency data source, the required data may not be returned in time to execute the dependent instruction.
- One technique for managing threads in conventional systems includes checking data availability before executing every instruction of a particular thread. Such methods, however, require complicated detection schemes that consume resources.
- Another technique used in conventional systems includes placing instructions on hold until a load instruction is completed, which results in low efficiency.
- Therefore, there is a need for a thread scheduler for efficiently managing the execution of threads having dependent instructions.
- a thread scheduler includes thread context units for submitting instructions of threads for scheduling by the thread scheduler where each context register includes a load reference counter for maintaining a counter value indicative of a difference between a number of data requests and a number of data returns associated with the particular context register.
- a thread context controller of the context unit is configured to refrain from submitting an instruction of a thread when the counter value is nonzero and the instruction includes a data dependency indicator indicating the instruction requires data returned by a previous instruction.
- FIG. 1 is a block diagram of a thread management system in accordance with the exemplary embodiment of the invention.
- FIG. 2 is a state machine diagram of thread control in accordance with the exemplary embodiment.
- FIG. 3 is a flow chart of a method of managing threads with dependent instructions in accordance with the exemplary embodiment.
- FIG. 1 is a block diagram of a thread managing system 100 in accordance with an exemplary embodiment of the invention.
- the thread managing system 100 may include other processes, entities, engines, and/or functions in addition to those discussed with reference to FIG. 1.
- the blocks illustrated in FIG. 1 may be implemented using any combination of hardware, software, and/or firmware. Further, the functions and operations of the blocks described in FIG. 1 may be implemented in any number of devices, circuits, or elements.
- Two or more of the functional blocks may be integrated in a single device and the functions described as performed in any single device may be implemented over several devices.
- An example of a suitable implementation of the thread management system 100 includes implementing the system 100 as part of a multi-thread processor 101. The techniques discussed, however, may be applied to any of various processors or computer systems that are used to schedule and process multiple threads.
- a thread scheduler 102, an arithmetic logic unit (ALU) 106, a load controller 108, an instruction cache (not shown), a register file bank (not shown), constant random access memory (RAM) (not shown), and other functions are implemented in a multi-thread processor 101.
- the multi-thread processor 101 is a programmable processor configured to efficiently process particular types of data streams.
- An example of a suitable multi-thread processor 101 includes a multi-thread processor that includes constant data for efficiently processing multi-media data streams (e.g., video, audio, etc.).
- the constant RAM supports the ALU by improving register bank retrieval efficiency.
- An instruction cache stores instructions for the threads to provide instructions to the thread scheduler 102.
- load controller 108 loads the instruction cache with instructions from memory 128 and loads the constant RAM and the register file bank with data from the memory 128 and/or the texture engine 130.
- the instructions indicate specific operations to be performed for each thread. Examples of suitable operations include arithmetic operations, elementary functions, and memory access operations.
- the constant RAM stores constant values used by ALU 106.
- the register file bank may store temporary results as well as final results from ALU 106 for threads.
- An output interface (not shown) receives the final results for the executed threads from register file bank and provides the results to the corresponding applications.
- the thread managing system 100 receives threads, such as graphics threads for example, from an application.
- the thread scheduler 102 receives a thread stream and performs various functions to schedule and manage execution of threads. For example, the thread scheduler 102 may schedule processing of threads, determine whether resources needed by a particular thread are available, and move thread data to a register file bank via the load controller 108. The thread scheduler 102 interfaces with the load controller 108 in order to synchronize the resources for received threads. The thread scheduler 102 may also monitor the order in which threads are received from a particular application and cause those threads to be outputted in the same order or sequence as received.
- the thread scheduler 102 selects active threads for execution, checks for read/write port conflicts among the selected threads and, if there are no conflicts, assigns an appropriate instruction from a thread into an ALU 106 and sends another instruction of another thread to the load controller 108.
- the load controller 108 may also be configured to obtain data associated with a thread from an unpredictable latency data source 104 such as a texture engine 130 or external memory 128.
- the memory 128 may include a global data cache and/or an external memory device, for example.
- the load controller 108 loads thread data into a register file bank (not shown) and associated instructions into an instruction cache (not shown).
- the thread scheduler 102 also removes threads that have been processed by ALU 106.
- the ALU 106 may be a single quad ALU or may include four scalar ALUs. In the exemplary embodiment, the ALU 106 performs pixel-parallel processing on one component of an attribute for up to four pixels. In some circumstances, the ALU 106 performs component-parallel processing on up to four components of an attribute for a single pixel.
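- The two ALU modes described above can be illustrated with a minimal sketch. The function names, the tuple layout of a pixel, and the `op` callback are assumptions made for illustration only; the source does not describe the ALU at this level.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: a pixel attribute is a tuple of up to four components.
Pixel = Sequence[int]


def pixel_parallel(pixels: Sequence[Pixel], component: int,
                   op: Callable[[int], int]) -> List[int]:
    """Apply op to ONE component of an attribute for up to four pixels."""
    if len(pixels) > 4:
        raise ValueError("at most four pixels per operation")
    return [op(p[component]) for p in pixels]


def component_parallel(pixel: Pixel, op: Callable[[int], int]) -> List[int]:
    """Apply op to up to FOUR components of an attribute for a single pixel."""
    if len(pixel) > 4:
        raise ValueError("at most four components per pixel")
    return [op(c) for c in pixel]


def double(x: int) -> int:
    return 2 * x


pixels = [(1, 2, 3, 4), (5, 6, 7, 8)]
print(pixel_parallel(pixels, 0, double))      # [2, 10]
print(component_parallel(pixels[0], double))  # [2, 4, 6, 8]
```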
- the ALU 106 fetches data from the register file bank and receives constants from constant RAM. Ideally, the ALU 106 processes data at every clock cycle so that it is not idle, thereby increasing processing efficiency.
- the ALU 106 may include multiple read and write ports on an interface to register file bank so that it is able to provide thread results while new thread data is fetched/read on each clock cycle.
- the thread scheduler includes a plurality of thread context units 110, 112, 114 for managing and submitting threads for execution through a thread arbitration/resource manager (referred to herein as a resource manager) 132. Each thread is assigned to a thread slot and managed on an associated thread context unit 110, 112, 114.
- a thread context register 134, 136, 138 within each thread context unit 110, 112, 114 stores the instruction type and other information for each instruction of the thread.
- a thread context controller 140, 142, 144 within each thread context unit 110, 112, 114 controls the submission of instructions to the resource manager 132.
- a thread context unit requests resources from the resource manager 132. When a resource is available, the resource manager grants the request.
- the flow controller (not shown) and the resource manager 132 within the thread scheduler 102 allow each thread to access the ALU 106, load controller 108, and instruction cache (not shown) to allow each thread context unit 110, 112, 114 to load data and to have instructions executed by the appropriate resource to execute the thread.
- In order to execute an instruction, the thread context unit first requests that the instruction be loaded into an instruction cache. The instruction is at least partly decoded and the context controller 140 within the thread context unit 110 determines whether the instruction should be executed based on the existence of a dependency indicator and the value of the load reference counter (LRC) 116 of the thread context unit 110.
- LRC load reference counter
- an instruction type indicates to the resource manager 132 the resource needed to execute the instruction.
- the requested resource may be the ALU 106, memory 128, or texture engine 130, for example.
- the resource manager (thread arbitration manager) 132, therefore, manages the allocation of the requested resources and grants them to the requesting thread context units.
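- The request/grant exchange between thread context units and the resource manager might be sketched as follows. The arrival-order policy, the `release` call, and the use of the reference numerals 110, 112, 114 as context identifiers are assumptions; the source only states that a request is granted when the resource is available.

```python
from collections import deque
from typing import Deque, Iterable, List, Set, Tuple

Request = Tuple[int, str]  # (context unit id, resource name), both hypothetical


class ResourceManager:
    """Toy arbiter: grants queued requests whose resource is currently free."""

    def __init__(self, resources: Iterable[str]) -> None:
        self.free: Set[str] = set(resources)
        self.waiting: Deque[Request] = deque()

    def request(self, context_id: int, resource: str) -> None:
        # A context unit presents the resource its instruction type needs.
        self.waiting.append((context_id, resource))

    def grant(self) -> List[Request]:
        # Walk the queue in arrival order; requests for busy resources stay queued.
        granted: List[Request] = []
        still_waiting: Deque[Request] = deque()
        for context_id, resource in self.waiting:
            if resource in self.free:
                self.free.remove(resource)
                granted.append((context_id, resource))
            else:
                still_waiting.append((context_id, resource))
        self.waiting = still_waiting
        return granted

    def release(self, resource: str) -> None:
        # Called when the resource finishes the instruction it was granted for.
        self.free.add(resource)


manager = ResourceManager({"alu", "load_controller"})
manager.request(110, "alu")
manager.request(112, "alu")
manager.request(114, "load_controller")
first_round = manager.grant()
manager.release("alu")
second_round = manager.grant()
```

- In this sketch context 112 stays queued until the ALU is released, while context 114 is granted the load controller in the first round.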
- An instruction within a thread may have a predictable latency or an unpredictable latency. Instructions that have a predictable latency are referred to herein as predictable latency instructions and include instructions that are executed within a known number of clock cycles. Examples of predictable latency instructions include ALU operations and other operations that do not require resources external to the multithreaded processor core. In the exemplary embodiment, internal multi-thread processor 101 operations are designed to have a standard latency. Instructions that do not have predictable latencies are referred to herein as unpredictable latency instructions. Examples of unpredictable latency instructions include operations that require external resources outside of the multi-threaded processor 101. For example, texture related instructions and data retrieval instructions accessing external memory have unpredictable latency.
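- The distinction between predictable and unpredictable latency instructions can be expressed as a simple classification by the resource an instruction needs. The enum members and the function name below are illustrative assumptions, not names taken from the source.

```python
from enum import Enum


class Resource(Enum):
    ALU = "alu"                        # internal to the processor core
    MEMORY = "memory"                  # external memory 128
    TEXTURE_ENGINE = "texture_engine"  # texture engine 130


# Resources outside the multi-thread processor, per the description above.
UNPREDICTABLE_SOURCES = frozenset({Resource.MEMORY, Resource.TEXTURE_ENGINE})


def has_predictable_latency(resource: Resource) -> bool:
    """True when the instruction completes within a known number of clock cycles."""
    return resource not in UNPREDICTABLE_SOURCES


print(has_predictable_latency(Resource.ALU))             # True
print(has_predictable_latency(Resource.TEXTURE_ENGINE))  # False
```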
- a particular instruction may require the retrieval of data from a data source that has an unpredictable latency.
- the unpredictable data source 104 may be a texture engine 130 or memory 128.
- a texture related instruction within the thread requires the texture engine 130 to process information provided by the instruction and to return texture data.
- a memory related instruction requires the retrieval of stored data stored within the memory 128. Where the memory is off chip, the time required to retrieve the requested data may not be predictable. Other situations may arise where a particular data retrieval event has an unpredictable latency.
- the thread context units include appropriate resources for managing the execution of the threads.
- the thread context unit 110, 112, 114 includes a context controller 140, 142, 144 implemented in logic for managing the execution of the thread.
- a context controller can be modeled with a state machine where the context controller transitions between a finite set of states.
- each thread context unit 110, 112, 114 includes a load reference counter (LRC) 116, 120, 124 and each context controller 140, 142, 144 includes a dependency indicator detector (DID) 118, 122, 126.
- DID dependency indicator detector
- Each LRC is incremented by one when a request for data is placed from the associated thread context unit executing the thread.
- Each LRC is decremented by one when the requested data is returned from the source. Therefore, the load reference counter (LRC) is configured to maintain a counter value indicative of a difference between a number of data requests and a number of data returns for the particular thread.
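- The counter behavior described above can be sketched in a few lines. The class and method names are hypothetical, and the underflow check is an added assumption; the source only specifies the increment on request and the decrement on return.

```python
class LoadReferenceCounter:
    """Per-thread count of data requests that have not yet been returned."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial

    def on_request(self) -> None:
        # A data request was placed by the associated thread context unit.
        self.value += 1

    def on_return(self) -> None:
        # Requested data came back from the source.
        if self.value == 0:
            raise RuntimeError("data returned with no outstanding request")
        self.value -= 1

    @property
    def pending(self) -> bool:
        # Nonzero means at least one request is still in flight.
        return self.value != 0


lrc = LoadReferenceCounter()
lrc.on_request()
lrc.on_request()
lrc.on_return()
print(lrc.value)  # 1 (two requests minus one return)
```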
- the dependency indicator detectors 118, 122, 126 are configured to detect a dependency indicator in an instruction.
- the dependency indicator may be any type of data, flag, bit, or other indicator that indicates whether the instruction requires data from a previous instruction in the thread.
- the dependency indicators are added. In the exemplary embodiment, a single bit dependency indicator is added to each data dependent instruction of the thread.
- each instruction of a thread that requires data from a previous instruction includes the dependency indicator.
- each LRC is incremented and decremented in accordance with the data requests and returns for the corresponding thread.
- when the dependency indicator detector (DID) 118, 122, 126 identifies an instruction as dependent on data from a previous instruction, the LRC is evaluated to determine if it is zero. If the LRC is nonzero, the thread context unit refrains from submitting the instruction and the thread context unit is placed in a wait state. The instruction is executed when the LRC is equal to zero.
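- The submission rule amounts to a single predicate over the dependency indicator and the LRC value. A minimal sketch, assuming a hypothetical `Instruction` record with a one-bit `dependent` field:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    opcode: str
    dependent: bool = False  # single-bit dependency indicator


def may_submit(instr: Instruction, lrc_value: int) -> bool:
    """Hold back only instructions that are dependent while loads are outstanding."""
    return not (instr.dependent and lrc_value != 0)


print(may_submit(Instruction("add"), lrc_value=2))                  # True
print(may_submit(Instruction("mul", dependent=True), lrc_value=2))  # False
print(may_submit(Instruction("mul", dependent=True), lrc_value=0))  # True
```

- Independent instructions are never held back by outstanding loads, which is what lets the thread keep issuing work while data is in flight.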
- FIG. 2 is a state machine diagram of thread control in accordance with the exemplary embodiment.
- the state machine diagram is a model of behavior composed of states, transitions and actions.
- a state stores information about the past since it reflects the input changes from the system start to the present.
- a transition is indicated by the arrows in FIG. 2 and indicates a state change. Particular conditions must be met in order for a transition to occur.
- FIG. 2 is representative of the states and transitions of a thread context unit. Accordingly, each thread context unit may be in a different state during operation of the thread scheduler.
- the thread context unit may be in one of seven states including an IDLE state 202, an ALU state 204, a LOAD state 206, a FLOW state 208, a DEP state 210, a WAIT state 212, and an EMIT state 214. All threads begin and end in the IDLE state 202. The thread context unit is awaiting a new thread in the IDLE state 202. An instruction type is presented to the resource manager 132 when transitioning from the IDLE state 202 to the WAIT state 212 where the thread context unit waits for a resource to become available. When the load controller is available, the instruction is loaded from the instruction cache and the instruction is executed.
- the instruction may include an ALU operation, a load operation, or a texture operation.
- the thread context unit transitions to the appropriate state.
- the thread context unit performs flow control tasks and updates the program counter.
- the thread context unit performs load tasks such as initial loading, loading texture data, loading data from memory and other load and store execution.
- the thread context unit waits for a resource.
- the arithmetic logic unit (ALU) 106 is accessed in the ALU state 204 to perform an arithmetic or logic instruction.
- the thread context unit enters the DEP state 210.
- when the LRC returns to zero, the thread context unit transitions to the previous exit state.
- In the EMIT state 214, the returned result of the instruction is moved out of the context register.
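- The seven states and the DEP detour can be sketched as a small state holder. The enum values reuse the reference numerals from FIG. 2; the class, its method names, and the reading of "previous exit state" as the state the unit was in when it entered DEP are assumptions. The remaining transitions are omitted because the text does not spell them out.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    IDLE = 202
    ALU = 204
    LOAD = 206
    FLOW = 208
    DEP = 210
    WAIT = 212
    EMIT = 214


class ThreadContextState:
    """Tracks the current state and where to resume after a DEP stall."""

    def __init__(self) -> None:
        self.state = State.IDLE          # all threads begin (and end) in IDLE
        self.resume_state: Optional[State] = None

    def enter_dep(self) -> None:
        # Dependent instruction seen while the LRC is nonzero: park in DEP.
        self.resume_state = self.state
        self.state = State.DEP

    def on_lrc_update(self, lrc_value: int) -> None:
        # Leave DEP only once every outstanding request has returned.
        if self.state is State.DEP and lrc_value == 0:
            assert self.resume_state is not None
            self.state = self.resume_state
            self.resume_state = None


ctx = ThreadContextState()
ctx.state = State.WAIT
ctx.enter_dep()
ctx.on_lrc_update(1)   # still one request outstanding, stays in DEP
ctx.on_lrc_update(0)   # counter back at zero, resumes in WAIT
```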
- FIG. 3 is a flow chart of a method of managing threads with dependent instructions in accordance with the exemplary embodiment.
- Although the method may be applied to other structures, the method is executed by a thread context unit in the exemplary embodiment.
- the following steps provide an example of performing the dependent instruction related techniques discussed with reference to FIG. 2. Accordingly, other operations of the thread context unit can be modeled with other blocks or flow charts that are omitted in FIG. 3 in the interest of clarity and brevity.
- the method is performed by logic circuitry performing the operations of the context controller 140, 142, 144.
- the actions of the context controller logic are dictated by the compiled thread code. Accordingly, during the compiling procedure, the compiler identifies the dependencies indicated in the higher level code and inserts the dependency indicators at the appropriate locations in the instructions, where they are recognized by the context controller.
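- One way a compiler could place the dependency indicators is sketched below. This is an illustrative pass under stated assumptions, not the compiler described in the source: the `Op` record, its register fields, and the rule "mark an instruction if it reads a register written by a load that may still be in flight" are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Op:
    name: str
    dst: str
    srcs: Tuple[str, ...] = ()
    is_load: bool = False    # requests data from an unpredictable latency source
    dependent: bool = False  # dependency indicator bit, set by the pass


def mark_dependencies(ops: List[Op]) -> List[Op]:
    pending: Set[str] = set()  # registers written by loads possibly in flight
    marked: List[Op] = []
    for op in ops:
        needs_wait = any(src in pending for src in op.srcs)
        if needs_wait:
            # A dependent instruction only issues once the LRC is zero,
            # so every earlier load has returned by then.
            pending.clear()
        if op.is_load:
            pending.add(op.dst)
        marked.append(replace(op, dependent=needs_wait))
    return marked


program = [
    Op("tex", "r0", ("r9",), is_load=True),
    Op("add", "r1", ("r2", "r3")),
    Op("mul", "r4", ("r0", "r1")),
    Op("sub", "r5", ("r0",)),
]
flags = [op.dependent for op in mark_dependencies(program)]
print(flags)  # [False, False, True, False]
```

- Only `mul` is marked: it is the first reader of `r0` after the load, and once it has issued the earlier load is known to be complete, so `sub` needs no indicator.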
- LRC_VAL are initialized for the new thread.
- a global register sets the initialized values for the thread.
- INIT_PC may be nonzero and the PC may be set to a nonzero value.
- the INIT_LRC_VAL is zero. If, however, the thread requires data to be preloaded such as texture data, for example, the INIT_LRC_VAL may be set to a nonzero value.
- the thread context unit is in an IDLE state 202 and waiting for a new thread. When a new thread is assigned, the method continues at step 304.
- At step 304, it is determined whether the current instruction is a dependent instruction. If a dependency indicator is detected in the instruction, the instruction is identified as a dependent instruction and the method continues at step 306. Otherwise, the method proceeds to step 312.
- At step 306, it is determined whether the LRC value (LRC_VAL) is equal to zero. If the LRC is equal to zero, the method continues at step 312. If the LRC value is nonzero, the method proceeds to step 308.
- At step 308, it is determined whether requested data has been retrieved. In the exemplary embodiment, the load controller indicates that the data has been returned to the appropriate register file. If data has been returned, the method continues at step 310, where the LRC is decremented by 1. Otherwise, the method returns to step 308. After step 310, the method returns to step 306.
- a request is placed for the appropriate resource to execute the instruction.
- the resource manager (thread arbitration manager) 132
- the instruction is executed by the appropriate resource.
- At step 314, it is determined whether the instruction has requested data. If data has been requested, the method continues at step 316 where the LRC value is incremented by 1 before the method proceeds to step 318. If no data has been requested, the method proceeds to step 318.
- At step 318, the program counter is incremented by 1.
- At step 320, it is determined whether the next PC instruction exists for the thread. If there are more instructions, the method returns to step 304. Otherwise, the method returns to step 302 to wait for the next thread.
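- The flow of FIG. 3 can be traced with a small simulation for a single thread. This is a sketch under stated assumptions: the `Instr` record, the per-instruction `latency` in ticks, the one-tick cost of each step, and the trace strings are hypothetical, and the initial PC and LRC values are taken as zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    name: str
    dependent: bool = False      # dependency indicator bit
    requests_data: bool = False  # places a data request when executed
    latency: int = 0             # hypothetical: ticks until the data returns


def run_thread(program: List[Instr]) -> List[str]:
    pc, lrc, tick = 0, 0, 0             # step 302: initialize PC and LRC_VAL
    in_flight: List[int] = []           # tick at which each pending request returns
    trace: List[str] = []
    while pc < len(program):            # step 320: more instructions?
        instr = program[pc]
        if instr.dependent:             # step 304: dependency indicator present?
            while lrc != 0:             # step 306: LRC_VAL equal to zero?
                tick += 1               # step 308: has requested data returned?
                ready = [t for t in in_flight if t <= tick]
                if ready:
                    in_flight.remove(ready[0])
                    lrc -= 1            # step 310: decrement LRC
                    trace.append(f"{tick}: data returned")
                else:
                    trace.append(f"{tick}: stall {instr.name}")
        tick += 1
        trace.append(f"{tick}: exec {instr.name}")   # step 312: request and execute
        if instr.requests_data:         # step 314: did the instruction request data?
            lrc += 1                    # step 316: increment LRC
            in_flight.append(tick + instr.latency)
        pc += 1                         # step 318: increment program counter
    return trace


trace = run_thread([
    Instr("load", requests_data=True, latency=3),
    Instr("add"),
    Instr("mul", dependent=True),
])
for line in trace:
    print(line)
```

- With these inputs the independent `add` runs while the load is outstanding, and only the dependent `mul` waits for the counter to reach zero.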
- DSP digital signal processor
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
- a processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- a software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
- the storage medium may be integral to the processor.
- the processor and the storage medium may reside in an ASIC.
- the ASIC may reside in a user terminal.
- the processor and the storage medium may reside as discrete components in a user terminal.
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Executing Machine-Instructions (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Advance Control (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07814471A EP2080090A1 (en) | 2006-08-29 | 2007-08-27 | Dependent instruction thread scheduling |
JP2009526842A JP2010503070A (en) | 2006-08-29 | 2007-08-27 | Dependent instruction thread scheduling |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/468,221 US8291431B2 (en) | 2006-08-29 | 2006-08-29 | Dependent instruction thread scheduling |
US11/468,221 | 2006-08-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008027823A1 (en) | 2008-03-06 |
Family
ID=38739898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/076867 WO2008027823A1 (en) | 2006-08-29 | 2007-08-27 | Dependent instruction thread scheduling |
Country Status (6)
Country | Link |
---|---|
US (1) | US8291431B2 (en) |
EP (1) | EP2080090A1 (en) |
JP (1) | JP2010503070A (en) |
KR (1) | KR20090045944A (en) |
CN (1) | CN101506774A (en) |
WO (1) | WO2008027823A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9317290B2 (en) * | 2007-05-04 | 2016-04-19 | Nvidia Corporation | Expressing parallel execution relationships in a sequential programming language |
US8527740B2 (en) * | 2009-11-13 | 2013-09-03 | International Business Machines Corporation | Mechanism of supporting sub-communicator collectives with O(64) counters as opposed to one counter for each sub-communicator |
US9311102B2 (en) * | 2010-07-13 | 2016-04-12 | Advanced Micro Devices, Inc. | Dynamic control of SIMDs |
KR20120017294A (en) | 2010-08-18 | 2012-02-28 | 삼성전자주식회사 | System and method of scheduling |
US8732711B2 (en) * | 2010-09-24 | 2014-05-20 | Nvidia Corporation | Two-level scheduler for multi-threaded processing |
KR101869939B1 (en) * | 2012-01-05 | 2018-06-21 | 삼성전자주식회사 | Method and apparatus for graphic processing using multi-threading |
CN102830954B (en) * | 2012-08-24 | 2014-10-29 | 北京中科信芯科技有限责任公司 | Method and device for instruction scheduling |
US9400653B2 (en) | 2013-03-14 | 2016-07-26 | Samsung Electronics Co., Ltd. | System and method to clear and rebuild dependencies |
US20140362098A1 (en) * | 2013-06-10 | 2014-12-11 | Sharp Laboratories Of America, Inc. | Display stream compression |
US9417876B2 (en) * | 2014-03-27 | 2016-08-16 | International Business Machines Corporation | Thread context restoration in a multithreading computer system |
US10108419B2 (en) * | 2014-09-26 | 2018-10-23 | Qualcomm Incorporated | Dependency-prediction of instructions |
TWI564807B (en) | 2015-11-16 | 2017-01-01 | 財團法人工業技術研究院 | Scheduling method and processing device using the same |
CN111045814B (en) * | 2018-10-11 | 2023-12-08 | 华为技术有限公司 | Resource scheduling method and terminal equipment |
US11740908B2 (en) * | 2020-03-16 | 2023-08-29 | Arm Limited | Systems and methods for defining a dependency of preceding and succeeding instructions |
US11740907B2 (en) * | 2020-03-16 | 2023-08-29 | Arm Limited | Systems and methods for determining a dependency of instructions |
US20220188144A1 (en) * | 2020-12-11 | 2022-06-16 | Oracle International Corporation | Intra-Process Caching and Reuse of Threads |
US20230409336A1 (en) * | 2022-06-17 | 2023-12-21 | Advanced Micro Devices, Inc. | VLIW Dynamic Communication |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5923862A (en) * | 1997-01-28 | 1999-07-13 | Samsung Electronics Co., Ltd. | Processor that decodes a multi-cycle instruction into single-cycle micro-instructions and schedules execution of the micro-instructions |
WO2000033183A1 (en) * | 1998-12-03 | 2000-06-08 | Sun Microsystems, Inc. | Method and structure for local stall control in a microprocessor |
US20030005260A1 (en) * | 1992-03-31 | 2003-01-02 | Sanjiv Garg | Superscalar RISC instruction scheduling |
US6557095B1 (en) * | 1999-12-27 | 2003-04-29 | Intel Corporation | Scheduling operations using a dependency matrix |
US6950927B1 (en) * | 2001-04-13 | 2005-09-27 | The United States Of America As Represented By The Secretary Of The Navy | System and method for instruction-level parallelism in a programmable multiple network processor environment |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0463965B1 (en) * | 1990-06-29 | 1998-09-09 | Digital Equipment Corporation | Branch prediction unit for high-performance processor |
US7847803B1 (en) * | 2000-07-26 | 2010-12-07 | Ati Technologies Ulc | Method and apparatus for interleaved graphics processing |
EP1227666A1 (en) * | 2001-01-18 | 2002-07-31 | Sony Service Centre (Europe) N.V. | Method and device for downloading application data |
US20070260856A1 (en) * | 2006-05-05 | 2007-11-08 | Tran Thang M | Methods and apparatus to detect data dependencies in an instruction pipeline |
JP2008015562A (en) | 2006-06-30 | 2008-01-24 | Kenichiro Ishikawa | Cache mistake/hit prediction |
- 2006-08-29 US US11/468,221 patent/US8291431B2/en not_active Expired - Fee Related
- 2007-08-27 CN CNA2007800316587A patent/CN101506774A/en active Pending
- 2007-08-27 WO PCT/US2007/076867 patent/WO2008027823A1/en active Application Filing
- 2007-08-27 JP JP2009526842A patent/JP2010503070A/en active Pending
- 2007-08-27 KR KR1020097006133A patent/KR20090045944A/en not_active Application Discontinuation
- 2007-08-27 EP EP07814471A patent/EP2080090A1/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
THEOBALD K B ET AL: "Superconducting processors for HTMT: issues and challenges", FRONTIERS OF MASSIVELY PARALLEL COMPUTATION, 1999. FRONTIERS '99. THE SEVENTH SYMPOSIUM ON THE ANNAPOLIS, MD, USA 21-25 FEB. 1999, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 21 February 1999 (1999-02-21), pages 260-267, XP010323709, ISBN: 0-7695-0087-0 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101079001B1 (en) | 2008-06-30 | 2011-11-01 | Intel Corporation | Managing active thread dependencies in graphics processing |
GB2514618A (en) * | 2013-05-31 | 2014-12-03 | Advanced Risc Mach Ltd | Data processing systems |
US10176546B2 (en) | 2013-05-31 | 2019-01-08 | Arm Limited | Data processing systems |
GB2514618B (en) * | 2013-05-31 | 2020-11-11 | Advanced Risc Mach Ltd | Data processing systems |
EP3367235A1 (en) * | 2017-02-24 | 2018-08-29 | Advanced Micro Devices, Inc. | Separate tracking of pending loads and stores |
US11074075B2 (en) | 2017-02-24 | 2021-07-27 | Advanced Micro Devices, Inc. | Wait instruction for preventing execution of one or more instructions until a load counter or store counter reaches a specified value |
Also Published As
Publication number | Publication date |
---|---|
EP2080090A1 (en) | 2009-07-22 |
US8291431B2 (en) | 2012-10-16 |
US20080059966A1 (en) | 2008-03-06 |
CN101506774A (en) | 2009-08-12 |
JP2010503070A (en) | 2010-01-28 |
KR20090045944A (en) | 2009-05-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8291431B2 (en) | Dependent instruction thread scheduling | |
US9069605B2 (en) | Mechanism to schedule threads on OS-sequestered sequencers without operating system intervention | |
US9690581B2 (en) | Computer processor with deferred operations | |
US8250396B2 (en) | Hardware wake-and-go mechanism for a data processing system | |
US11900122B2 (en) | Methods and systems for inter-pipeline data hazard avoidance | |
EP2179350B1 (en) | Compound instructions in a multi-threaded processor | |
EP0913767A2 (en) | A method and apparatus for affecting subsequent instruction processing in a data processor | |
US8635621B2 (en) | Method and apparatus to implement software to hardware thread priority | |
WO2008145653A1 (en) | Scheduling threads in a processor | |
US20100122064A1 (en) | Method for increasing configuration runtime of time-sliced configurations | |
JP2007249960A (en) | Method, device and program for performing cacheline polling, and information processing system | |
US10558418B2 (en) | Monitor support on accelerated processing device | |
CN110659115A (en) | Multi-threaded processor core with hardware assisted task scheduling | |
KR20150101870A (en) | Method and apparatus for avoiding bank conflict in memory | |
CN112559403B (en) | Processor and interrupt controller therein | |
US20110247018A1 (en) | API For Launching Work On a Processor | |
CN114035847B (en) | Method and apparatus for parallel execution of kernel programs | |
CN117501254A (en) | Providing atomicity for complex operations using near-memory computation | |
US7996848B1 (en) | Systems and methods for suspending and resuming threads | |
US20150363903A1 (en) | Wavefront Resource Virtualization | |
KR100861701B1 (en) | Register renaming system and method based on value similarity | |
US11947487B2 (en) | Enabling accelerated processing units to perform dataflow execution | |
JP2024523339A (en) | Providing atomicity for composite operations using near-memory computing | |
Sotudeh et al. | Intelligent co-operative processor-in-memory |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 200780031658.7; Country of ref document: CN |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07814471; Country of ref document: EP; Kind code of ref document: A1 |
WWE | WIPO information: entry into national phase | Ref document number: 273/MUMNP/2009; Country of ref document: IN |
WWE | WIPO information: entry into national phase | Ref document number: 2009526842; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 1020097006133; Country of ref document: KR |
NENP | Non-entry into the national phase | Ref country code: RU |
WWE | WIPO information: entry into national phase | Ref document number: 2007814471; Country of ref document: EP |