GB2579534A - Load-store unit with partitioned reorder queues with single cam port - Google Patents

Load-store unit with partitioned reorder queues with single cam port

Info

Publication number
GB2579534A
Authority
GB
United Kingdom
Prior art keywords
store
load
instruction
processing unit
partition
Prior art date
Legal status
Granted
Application number
GB2006338.4A
Other versions
GB202006338D0
GB2579534B
Inventor
Sinharoy Balaram
Lloyd Bryan
Gonzales Christopher
Current Assignee
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date
2017-10-06
Filing date
2018-10-03
Publication date
2020-06-24
Priority claimed from US15/726,627 (US11175924B2)
Priority claimed from US15/726,596 (US10606591B2)
Application filed by International Business Machines Corp
Publication of GB202006338D0
Publication of GB2579534A
Application granted
Publication of GB2579534B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F9/00 Arrangements for program control, e.g. control units
            • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
                • G06F9/30181 Instruction operation extension or modification
                  • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag
                • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
                  • G06F9/3824 Operand accessing
                    • G06F9/383 Operand prefetching
                      • G06F9/3832 Value prediction for operands; operand history buffers
                    • G06F9/3834 Maintaining memory consistency
                  • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
                    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
                    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
                  • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
                    • G06F9/3856 Reordering of instructions, e.g. using queues or age tags
          • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
            • G06F12/02 Addressing or allocation; Relocation
              • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
                • G06F12/10 Address translation
                  • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
                    • G06F12/1045 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
                      • G06F12/1063 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache the data cache being concurrently virtually addressed
          • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
            • G06F2212/10 Providing a specific technical effect
              • G06F2212/1008 Correctness of operation, e.g. memory ordering
            • G06F2212/65 Details of virtual memory and virtual address translation
              • G06F2212/652 Page size control
              • G06F2212/655 Same page detection
              • G06F2212/657 Virtual address space management
            • G06F2212/68 Details of translation look-aside buffer [TLB]
              • G06F2212/681 Multi-level TLB, e.g. microTLB and main TLB

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

Technical solutions are described for a load-store unit (LSU) that executes a plurality of instructions in an out-of-order (OoO) window using multiple LSU pipes. The execution includes selecting an instruction from the OoO window, the instruction using an effective address. If the instruction is a load instruction and the processing unit is operating in single-thread mode, an entry is created in a first partition of a load reorder queue (LRQ) if the instruction is issued on a first load pipe, and in a second partition of the LRQ if the instruction is issued on a second load pipe. If the processing unit is operating in a multi-thread mode, the entry is created in a first predetermined portion of the first partition of the LRQ if the instruction is issued on the first load pipe and by a first thread of the processing unit.
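The allocation rule in the abstract (and in claims 1 and 2) can be sketched as a small behavioural model. This is a hedged Python illustration, not IBM's design: the class and method names, the sizes (two load pipes, eight entries per partition, four threads) and the even split of each partition into one fixed portion per thread in multi-thread mode are all assumptions. The claims only state that the portion is predetermined and specific to a given thread and pipe.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):
    SINGLE_THREAD = "ST"
    MULTI_THREAD = "MT"


@dataclass
class PartitionedReorderQueue:
    """Hypothetical reorder queue (e.g. an LRQ) with one partition per LSU pipe.

    Assumption: in multi-thread mode each partition is split evenly into one
    fixed (predetermined) portion per hardware thread.
    """
    num_pipes: int
    entries_per_partition: int
    num_threads: int
    partitions: List[List[Optional[str]]] = field(init=False)

    def __post_init__(self) -> None:
        if self.entries_per_partition % self.num_threads:
            raise ValueError("partition size must divide evenly among threads")
        self.partitions = [[None] * self.entries_per_partition
                           for _ in range(self.num_pipes)]

    def slot_range(self, mode: Mode, thread: int) -> Tuple[int, int]:
        """Half-open range of slots an instruction may use inside its partition."""
        if mode is Mode.SINGLE_THREAD:
            return 0, self.entries_per_partition  # whole partition available
        if not 0 <= thread < self.num_threads:
            raise ValueError("unknown thread")
        size = self.entries_per_partition // self.num_threads
        return thread * size, (thread + 1) * size  # this thread's portion only

    def allocate(self, mode: Mode, pipe: int, thread: int,
                 tag: str) -> Optional[Tuple[int, int]]:
        """Create an entry; the issuing pipe alone selects the partition."""
        part = self.partitions[pipe]
        start, end = self.slot_range(mode, thread)
        for slot in range(start, end):
            if part[slot] is None:
                part[slot] = tag
                return pipe, slot
        return None  # portion full; stall/reject handling is not modelled


lrq = PartitionedReorderQueue(num_pipes=2, entries_per_partition=8, num_threads=4)
print(lrq.allocate(Mode.SINGLE_THREAD, pipe=0, thread=0, tag="ld A"))  # (0, 0)
print(lrq.allocate(Mode.SINGLE_THREAD, pipe=1, thread=0, tag="ld B"))  # (1, 0)

smt = PartitionedReorderQueue(num_pipes=2, entries_per_partition=8, num_threads=4)
print(smt.allocate(Mode.MULTI_THREAD, pipe=0, thread=1, tag="ld C"))   # (0, 2)
```

In this model the partition is fixed by the pipe at issue time; the thread only narrows which slots inside that partition may be used, and only in multi-thread mode.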

Claims (20)

1. A processing unit for executing one or more instructions, the processing unit comprising: a load-store unit (LSU) configured to execute a plurality of instructions in an out-of-order (OoO) window using multiple LSU pipes by: selecting an instruction from the OoO window, the instruction using an effective address; and in response to the instruction being a load instruction: in response to the processing unit operating in a single thread mode, creating an entry in a first partition of a load reorder queue based on the instruction being issued on a first load pipe, and creating the entry in a second partition of the load reorder queue based on the instruction being issued on a second load pipe; and in response to the processing unit operating in a multi-thread mode where multiple threads are processed simultaneously, creating the entry in a first predetermined portion of the first partition of the load reorder queue based on the instruction being issued on the first load pipe and by a first thread of the processing unit.
2. The processing unit of claim 1, wherein in the multi-thread mode the first predetermined portion of the first partition of the load reorder queue is specific to load instructions issued by the first thread of the processing unit using the first load pipe.
3. The processing unit of claim 1, the load-store unit further configured to: in response to the instruction being a store instruction: in response to the processing unit operating in the single thread mode, creating a store entry in a first partition of a store reorder queue based on the store instruction being issued on a first store pipe, and creating the store entry in a second partition of the store reorder queue based on the store instruction being issued on a second store pipe; and in response to the processing unit operating in the multi-thread mode, creating the store entry in a first predetermined portion of the first partition of the store reorder queue based on the store instruction being issued on the first store pipe and by the first thread of the processing unit.
4. The processing unit of claim 1, wherein the load reorder queue comprises one partition for each load pipe of the LSU.
5. The processing unit of claim 4, wherein the LSU operates multiple load instructions concurrently, one load instruction using each respective load pipe.
6. The processing unit of claim 1, wherein the store reorder queue comprises one partition for each store pipe of the LSU.
7. The processing unit of claim 6, wherein the LSU operates multiple store instructions concurrently, one store instruction using each respective load pipe.
8. A computer-implemented method for out-of-order execution of one or more instructions by a processing unit, the method comprising: receiving, by a load-store unit (LSU) of the processing unit, an out-of-order window of instructions comprising a plurality of instructions to be executed out-of-order; and issuing, by the LSU, instructions from the OoO window by: selecting an instruction from the OoO window, the instruction using an effective address; in response to the instruction being a load instruction: in response to the processing unit operating in a single thread mode, creating an entry in a first partition of a load reorder queue based on the instruction being issued on a first load pipe, and creating the entry in a second partition of the load reorder queue based on the instruction being issued on a second load pipe; and in response to the processing unit operating in a multi-thread mode, creating the entry in a first predetermined portion of the first partition of the load reorder queue based on the instruction being issued on the first load pipe and by a first thread of the processing unit.
9. The computer-implemented method of claim 8, wherein in the multi-thread mode the first predetermined portion of the first partition of the load reorder queue is specific to load instructions issued by the first thread of the processing unit using the first load pipe.
10. The computer-implemented method of claim 8, further comprising: in response to the instruction being a store instruction: in response to the processing unit operating in the single thread mode, creating a store entry in a first partition of a store reorder queue based on the store instruction being issued on a first store pipe, and creating the store entry in a second partition of the store reorder queue based on the store instruction being issued on a second store pipe; and in response to the processing unit operating in the multi-thread mode, creating the store entry in a first predetermined portion of the first partition of the store reorder queue based on the store instruction being issued on the first store pipe and by the first thread of the processing unit.
11. The computer-implemented method of claim 8, wherein the load reorder queue comprises one partition for each load pipe of the LSU.
12. The computer-implemented method of claim 11, wherein the LSU operates multiple load instructions concurrently, one load instruction using each respective load pipe.
13. The computer-implemented method of claim 8, wherein the store reorder queue comprises one partition for each store pipe of the LSU.
14. The computer-implemented method of claim 13, wherein the LSU operates multiple store instructions concurrently, one store instruction using each respective load pipe.
15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processing unit to cause the processing unit to perform operations comprising: receiving, by a load-store unit (LSU) of the processing unit, an out-of-order window of instructions comprising a plurality of instructions to be executed out-of-order; and issuing, by the LSU, instructions from the OoO window by: selecting an instruction from the OoO window, the instruction using an effective address; in response to the instruction being a load instruction: in response to the processing unit operating in a single thread mode, creating an entry in a first partition of a load reorder queue based on the instruction being issued on a first load pipe, and creating the entry in a second partition of the load reorder queue based on the instruction being issued on a second load pipe; and in response to the processing unit operating in a multi-thread mode, creating the entry in a first predetermined portion of the first partition of the load reorder queue based on the instruction being issued on the first load pipe and by a first thread of the processing unit.
16. The computer program product of claim 15, wherein in the multi-thread mode the first predetermined portion of the first partition of the load reorder queue is specific to load instructions issued by the first thread of the processing unit using the first load pipe.
17. The computer program product of claim 15, wherein in response to the instruction being a store instruction: in response to the processing unit operating in the single thread mode, creating a store entry in a first partition of a store reorder queue based on the store instruction being issued on a first store pipe, and creating the store entry in a second partition of the store reorder queue based on the store instruction being issued on a second store pipe; and in response to the processing unit operating in the multi-thread mode, creating the store entry in a first predetermined portion of the first partition of the store reorder queue based on the store instruction being issued on the first store pipe and by the first thread of the processing unit.
18. The computer program product of claim 15, wherein the load reorder queue comprises one partition for each load pipe of the LSU.
19. The computer program product of claim 18, wherein the LSU operates multiple load instructions concurrently, one load instruction using each respective load pipe.
20. The computer program product of claim 15, wherein the store reorder queue comprises one partition for each store pipe of the LSU, and wherein the LSU operates multiple store instructions concurrently, one store instruction using each respective load pipe.
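Claims 4 to 7 pair each reorder-queue partition with one LSU pipe and let the LSU issue one instruction per pipe concurrently. One plausible reading of the title's "single CAM port" is that, because only one pipe ever feeds a given partition, no partition sees more than one new access per cycle. The sketch below counts worst-case per-cycle accesses for a unified versus a partitioned queue. It is a hypothetical model: the `(queue, pipe)` encoding, the trace, and the function name are invented, and treating every issued load or store as exactly one access to the queue it enters is a simplifying assumption.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: (queue, pipe), e.g. ("LRQ", 0) is a load issued on
# load pipe 0 and ("SRQ", 1) is a store issued on store pipe 1.
Access = Tuple[str, int]


def ports_needed(cycles: List[List[Access]], partitioned: bool) -> int:
    """Worst-case number of accesses one queue structure receives in a cycle.

    partitioned=True  -> each (queue, pipe) pair is a separate partition.
    partitioned=False -> all pipes of a queue share one unified structure.
    """
    worst = 0
    for batch in cycles:
        # At most one instruction per pipe per cycle (cf. claims 5 and 7).
        if len(set(batch)) != len(batch):
            raise ValueError("two instructions on the same pipe in one cycle")
        targets = batch if partitioned else [queue for queue, _ in batch]
        worst = max(worst, max(Counter(targets).values(), default=0))
    return worst


trace = [
    [("LRQ", 0), ("LRQ", 1)],              # two loads issued together
    [("LRQ", 0), ("SRQ", 0), ("SRQ", 1)],  # one load and two stores
]
print(ports_needed(trace, partitioned=False))  # 2
print(ports_needed(trace, partitioned=True))   # 1
```

In this model a unified LRQ or SRQ fed by two pipes must accept two accesses in one cycle, while splitting each queue per pipe caps every partition at one. That is consistent with, though not proof of, the single-port design named in the title.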

Applications Claiming Priority (5)

Application Number | Publication | Priority Date | Filing Date | Title
US15/726,627 | US11175924B2 | 2017-10-06 | 2017-10-06 | Load-store unit with partitioned reorder queues with single cam port
US15/726,596 | US10606591B2 | 2017-10-06 | 2017-10-06 | Handling effective address synonyms in a load-store unit that operates without address translation
US15/825,494 | US10606592B2 | 2017-10-06 | 2017-11-29 | Handling effective address synonyms in a load-store unit that operates without address translation
US15/825,453 | US11175925B2 | 2017-10-06 | 2017-11-29 | Load-store unit with partitioned reorder queues with single cam port
PCT/IB2018/057695 | WO2019069256A1 | 2017-10-06 | 2018-10-03 | Load-store unit with partitioned reorder queues with single cam port

Publications (3)

Publication Number | Publication Date
GB202006338D0 | 2020-06-17
GB2579534A | 2020-06-24
GB2579534B | 2020-12-16

Family

Family ID: 65994519

Family Applications (2)

Application Number | Status | Publication | Priority Date | Filing Date | Title
GB2006338.4A | Active | GB2579534B | 2017-10-06 | 2018-10-03 | Load-store unit with partitioned reorder queues with single CAM port
GB2006344.2A | Active | GB2579757B | 2017-10-06 | 2018-10-03 | Handling effective address synonyms in a load-store unit that operates without address translation

Family Applications After (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
GB2006344.2A | Active | GB2579757B | 2017-10-06 | 2018-10-03 | Handling effective address synonyms in a load-store unit that operates without address translation

Country Status (5)

Country | Link
JP (2) | JP7025100B2
CN (2) | CN111133413B
DE (2) | DE112018004004T5
GB (2) | GB2579534B
WO (2) | WO2019069256A1

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2023056289A | 2021-10-07 | 2023-04-19 | Fujitsu Limited | Arithmetic processing unit, and arithmetic processing method
CN114780146B * | 2022-06-17 | 2022-08-26 | 深流微智能科技(深圳)有限公司 | Resource address query method, device and system

Citations (4)

Publication number | Priority date | Publication date | Assignee | Title
CN101324840A * | 2007-06-15 | 2008-12-17 | International Business Machines Corporation | Method and system for performing independent loading for reinforcement processing unit
US7730282B2 * | 2004-08-11 | 2010-06-01 | International Business Machines Corporation | Method and apparatus for avoiding data dependency hazards in a microprocessor pipeline architecture using a multi-bit age vector
US20130346729A1 * | 2012-06-26 | 2013-12-26 | International Business Machines Corporation | Pipelining out-of-order instructions
CN104094223A * | 2012-02-06 | 2014-10-08 | International Business Machines Corporation | Multi-threaded processor instruction balancing through instruction uncertainty

Family Cites Families (12)

Publication number | Priority date | Publication date | Assignee | Title
US6694425B1 | 2000-05-04 | 2004-02-17 | International Business Machines Corporation | Selective flush of shared and other pipeline stages in a multithread processor
US6931639B1 * | 2000-08-24 | 2005-08-16 | International Business Machines Corporation | Method for implementing a variable-partitioned queue for simultaneous multithreaded processors
US7343469B1 * | 2000-09-21 | 2008-03-11 | Intel Corporation | Remapping I/O device addresses into high memory using GART
US20040117587A1 * | 2002-12-12 | 2004-06-17 | International Business Machines Corp. | Hardware managed virtual-to-physical address translation mechanism
US8645974B2 * | 2007-08-02 | 2014-02-04 | International Business Machines Corporation | Multiple partition adjunct instances interfacing multiple logical partitions to a self-virtualizing input/output device
US7711929B2 * | 2007-08-30 | 2010-05-04 | International Business Machines Corporation | Method and system for tracking instruction dependency in an out-of-order processor
US8639884B2 * | 2011-02-28 | 2014-01-28 | Freescale Semiconductor, Inc. | Systems and methods for configuring load/store execution units
US8966232B2 * | 2012-02-10 | 2015-02-24 | Freescale Semiconductor, Inc. | Data processing system operable in single and multi-thread modes and having multiple caches and method of operation
CN103198028B * | 2013-03-18 | 2015-12-23 | Huawei Technologies Co., Ltd. | Internal storage data moving method, apparatus and system
US9740409B2 * | 2013-12-13 | 2017-08-22 | Ineda Systems, Inc. | Virtualized storage systems
US10209995B2 * | 2014-10-24 | 2019-02-19 | International Business Machines Corporation | Processor core including pre-issue load-hit-store (LHS) hazard prediction to reduce rejection of load instructions
US10089240B2 * | 2014-12-26 | 2018-10-02 | Wisconsin Alumni Research Foundation | Cache accessed using virtual addresses

Also Published As

Publication number | Publication date
CN111133421A | 2020-05-08
CN111133421B | 2023-09-29
GB2579757B | 2020-11-18
DE112018004004T5 | 2020-04-16
DE112018004006T5 | 2020-04-16
JP2020536308A | 2020-12-10
GB2579757A | 2020-07-01
CN111133413B | 2023-09-29
DE112018004006B4 | 2021-03-25
JP7064273B2 | 2022-05-10
CN111133413A | 2020-05-08
WO2019069256A1 | 2019-04-11
GB202006344D0 | 2020-06-17
GB202006338D0 | 2020-06-17
JP2020536310A | 2020-12-10
JP7025100B2 | 2022-02-24
WO2019069255A1 | 2019-04-11
GB2579534B | 2020-12-16

Similar Documents

Publication | Title
US9733945B2 | Pipelining out-of-order instructions
US7734897B2 | Allocation of memory access operations to memory access capable pipelines in a superscalar data processing apparatus and method having a plurality of execution threads
US20160357669A1 | Flushing control within a multi-threaded processor
EP2140347B1 | Processing long-latency instructions in a pipelined processor
PH12017550124A1 | Decoupled processor instruction window and operand buffer
WO2015153121A8 | A data processing apparatus and method for executing a stream of instructions out of order with respect to original program order
JP2016207232A5 | Processor
JP5803972B2 | Multi-core processor
JP2015527681A5 |
GB2581759A | Completing coalesced global completion table entries in an out-of-order processor
KR20150041740A | Decoding a complex program instruction corresponding to multiple micro-operations
JP2011529603A5 |
GB2579534A | Load-store unit with partitioned reorder queues with single cam port
CA2533741A1 | Programmable delayed dispatch in a multi-threaded pipeline
WO2015017129A4 | Multi-threaded gpu pipeline
US20170123808A1 | Instruction fusion
US7809930B2 | Selective suppression of register renaming
RU2017103951A | EFFICIENT INTERRUPTION ROUTING FOR A MULTI-THREAD PROCESS
JP2006243864A5 |
US20150286501A1 | Register-type-aware scheduling of virtual central processing units
US9213547B2 | Processor and method for processing instructions using at least one processing pipeline
GB2581945A | Scalable dependency matrix with one or a plurality of summary bits in an out-of-order processor
US10705587B2 | Mode switching in dependence upon a number of active threads
WO2015131445A1 | Microengine and packet processing method therefor, and computer storage medium
US9977679B2 | Apparatus and method for suspending execution of a thread in response to a hint instruction

Legal Events

Code | Title | Description
746 | Register noted 'licences of right' (sect. 46/1977) | Effective date: 2021-01-22