CN107368285B - Multithreaded architecture - Google Patents

Publication number
CN107368285B
Authority
CN
China
Prior art keywords
instruction
register
thread
cache
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710264673.1A
Other languages
English (en)
Other versions
CN107368285A (zh
Inventor
M. Abdallah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Publication of CN107368285A publication Critical patent/CN107368285A/zh
Application granted granted Critical
Publication of CN107368285B publication Critical patent/CN107368285B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30123Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/3009Thread control instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30134Register stacks; shift registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • G06F9/30138Extension of register space, e.g. register cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/3016Decoding the operand specifier, e.g. specifier format
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
    • G06F9/342Extension of operand address space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching
    • G06F9/3808Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/3826Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • G06F9/384Register renaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858Result writeback, i.e. updating the architectural state or memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • G06F9/3891Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/461Saving or restoring of program or task context
    • G06F9/462Saving or restoring of program or task context with multiple register sets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

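The present application relates to an apparatus and method for processing instructions in a multithreaded architecture that uses context switching. A unified architecture for dynamic generation, execution, synchronization, and parallelization of complex instruction formats includes a virtual register file, a register cache, and a register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching.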

Description

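Multithreaded architecture
This application is a divisional of the invention patent application entitled "Multithreaded architecture", international application number PCT/US2007/084710, international filing date 2007/11/14, national-phase application number 200780046679.6.
Technical Field
The present invention relates generally to digital circuits. More specifically, it relates to improved processor design.
Background
Processors are required to handle multiple tasks that are either dependent on one another or totally independent. The internal state of such a processor usually consists of registers that may hold different values at each particular instant of program execution. At each instant of program execution, the image of the internal state is called the architecture state of the processor.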
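When code execution is switched to run another function (e.g., another thread, process, or program), the state of the machine/processor has to be saved so that the new function can use the internal registers to build its new state. Once the new function terminates, its state can be discarded, the state of the previous context is restored, and execution resumes. Such a switch is called a context switch, and it usually takes tens or hundreds of cycles, especially on modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out-of-order execution.
In thread-aware hardware architectures, the hardware normally supports multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when a new thread is executed. However, it still has several drawbacks, namely the area, power, and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, a context switch must still be performed. This is common, because fine-granularity parallelism tends to require a large number of threads. Hardware thread-aware architectures that duplicate the hardware storage for context state do not help non-threaded software code, and they only reduce the number of context switches for threaded software. Moreover, those threads are usually constructed for coarse-grain parallelism and incur heavy software overhead for initiation and synchronization, which leaves fine-grain parallelism, such as function calls and parallel loop execution, without efficient thread initiation/auto-generation. These overheads make it difficult to parallelize such code automatically with state-of-the-art compilers, or with user parallelization techniques for software code that is not explicitly or easily parallelized/threaded.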
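Summary of the Invention
A unified architecture for dynamic generation, execution, synchronization, and parallelization of complex instruction formats includes a virtual register file, a register cache, and a register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching.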
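Brief Description of the Drawings
The invention can be more fully understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
Figure 1 shows immediate saving and restoration of register state using direct connectivity to the register file;
Figure 2 shows a virtual register file with register location mapping in accordance with an embodiment of the invention;
Figure 3 shows a multi-level register file hierarchy with temporary storage for speculative register state in accordance with an embodiment of the invention;
Figure 4 shows a register cache for virtualizing software contexts/threads in accordance with an embodiment of the invention;
Figure 5 shows a register cache with an automatic mechanism to spill and reload registers upon a context switch in accordance with an embodiment of the invention;
Figure 6 shows a scheme that resolves false dependencies on register names using a counter scheme in accordance with an embodiment of the invention;
Figure 7 shows a known superscalar execution architecture;
Figure 8A shows an execution architecture utilized in accordance with an embodiment of the invention;
Figure 8B shows a buffer, scheduler, and reorder unit that store inheritance information in accordance with an embodiment of the invention;
Figure 9A shows a global front end and inheritance vector used to enforce sequential consistency among threads across a group of cores in accordance with an embodiment of the invention;
Figure 9B shows a register interconnect topology that allows inter-thread and intra-thread register communication across cores in accordance with an embodiment of the invention;
Figure 9C shows location-based register referencing across a physically segmented, architecturally unified register file in accordance with an embodiment of the invention;
Figure 10 shows a mechanism to automatically generate threads that span loop iterations and boundaries in accordance with an embodiment of the invention;
Figure 11A shows a mechanism to automatically generate threads that span control paths in accordance with an embodiment of the invention;
Figure 11B shows test and branch instruction processing in accordance with an embodiment of the invention;
Figure 12 shows a segmented/distributed cache with separate tag and data arrays utilized in accordance with an embodiment of the invention;
Figure 13A shows a mapping mechanism that supports precise scheduling between a prefetch instruction and its load-consuming instruction;
Figure 13B shows dynamic allocation of Temp and Chain to physical storage by virtualizing hardware storage in accordance with an embodiment of the invention;
Figure 14 shows a counter mechanism in the scheduler vector, utilized in accordance with an embodiment of the invention, that enables renaming of address registers without requiring extra physical registers;
Figure 15 shows a bypass and check mechanism implemented in accordance with an embodiment of the invention.
Like reference numerals refer to corresponding parts throughout the several views of the drawings.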
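Detailed Description
One aspect of the invention reduces the penalty incurred by context switching. Context switching carries a penalty because saving the architecture state (e.g., data registers, control registers, program counters, etc.) takes time, and the state of the previous context has to be restored before execution resumes. This means that many loads/stores of a large number of registers occur for something as common as an involved function call. The disclosed architecture state structure performs both a gradual self context switch and an instant context switch on the architecture state. In addition, the concepts of a virtual register file and a register cache are introduced as possible implementation components. A virtual register file or register cache can also support a virtually larger number of threads or contexts in hardware than is possible with traditional hardware thread support. Multi-level register file support provides larger bandwidth to the register file.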
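The technique can also be combined with threading support. One part of the architecture state undergoes an instant hardware/software-supported context switch, in which all of the architecture state is saved quickly/instantly by hardware using a LIFO (last-in, first-out) circuit. Compared with a register file or random-access memory implementation, a LIFO circuit allows lower complexity and an efficient implementation in terms of speed and area. Each subsequent context switch of this category is saved on top of the older context in the LIFO. This technique is well suited to recursive function calls or nested parallel loops, where the context of one call in the recursion naturally enters and leaves in LIFO order. Figure 1 shows one embodiment of this hardware context-switch support.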
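The LIFO behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `LifoContextStore`, its method names, and the 8-register file size are hypothetical and are not taken from the patent, which describes a hardware circuit rather than software.

```python
# Hypothetical software model of the instant context switch: the whole
# register file is pushed onto a LIFO at a call and popped at the return.
class LifoContextStore:
    def __init__(self, num_registers=8):
        self.registers = [0] * num_registers   # live architecture state
        self._lifo = []                        # saved contexts, newest last

    def save_context(self):
        """Push a snapshot of every register in one step."""
        self._lifo.append(list(self.registers))

    def restore_context(self):
        """Pop the most recently saved snapshot back into the registers."""
        if not self._lifo:
            raise RuntimeError("no saved context to restore")
        self.registers = self._lifo.pop()


def recursive_sum(store, n):
    """Recursive call that overwrites register 0 and relies on LIFO restore."""
    if n == 0:
        return 0
    store.save_context()               # caller's state goes on top of the LIFO
    store.registers[0] = n             # callee overwrites the caller's value
    partial = recursive_sum(store, n - 1)
    result = store.registers[0] + partial
    store.restore_context()            # caller's state comes back, newest first
    return result
```

Because each nested call restores exactly the snapshot pushed by its own entry, the caller's registers are intact after the outermost return, which is the property that makes a LIFO a natural fit for recursion.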
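Another way of quickly saving the architecture state is to save the register file contents as one block in memory or cache (e.g., one cache line or block). A register batch is a series of consecutive register numbers (e.g., registers 0 to 7 might be a subgroup of a larger register set 0-64) that can be stored as one 64-byte cache line when a context switch involving those 8 registers occurs. Rather than using the usual loads and stores to save and restore each individual register, the register file has dedicated wide buses 101 and 102 that carry all of its contents directly in the same cycle or in a small number of cycles. One possible implementation of the register-file-to-cache transfer uses buses 102 and 101 to connect to the fill buffers that are normally used to fill cache lines, as a back-door path for moving registers directly to the cache. An alternative implementation can use write/read combining buffers, which are typically used to accumulate partial stores and loads in a buffer before writing them over the bus to memory as a single transaction, instead of each forming a partial bus transaction. Such a combining buffer can have a side-door bus connection via 101 and 102 to combine a series of registers into one write-combining buffer, which can then transfer the whole cache line worth of registers to the cache or memory, while the read-combining buffer is used for the opposite transfer. In another implementation, a dedicated local memory with a LIFO circuit or a random-access memory buffer can be used. The LIFO circuit is simpler and has fewer implementation constraints, but it imposes a strict order of saving and restoring data (last in, first out). A random-access memory, on the other hand, such as one that uses a direct bus to the processor cache, offers the flexibility of using local memory and allows contexts to be saved and restored in arbitrary order, but it imposes hardware implementation constraints.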
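The arithmetic behind the batch-as-cache-line idea (8 registers filling a 64-byte line, which implies 8 bytes per register) can be sketched as follows. The function names and the little-endian 64-bit layout are assumptions made for illustration; the text does not specify a byte layout.

```python
import struct

REGS_PER_BATCH = 8
LINE_FORMAT = "<8Q"   # assumption: eight little-endian 64-bit registers per line


def pack_register_batch(register_file, first_reg):
    """Return one 64-byte 'cache line' holding a batch of 8 consecutive registers."""
    batch = register_file[first_reg:first_reg + REGS_PER_BATCH]
    if len(batch) != REGS_PER_BATCH:
        raise ValueError("batch runs past the end of the register file")
    return struct.pack(LINE_FORMAT, *batch)


def unpack_register_batch(register_file, first_reg, line):
    """Restore a batch of 8 registers from a 64-byte line in a single step."""
    values = struct.unpack(LINE_FORMAT, line)
    register_file[first_reg:first_reg + REGS_PER_BATCH] = values
```

A whole batch moves as one unit in each direction, in contrast to eight individual load or store operations.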
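Another part of the architecture state can be saved and restored gradually as the new context replaces the old one. In other words, an individual register is swapped in and out/read in and out by the hardware when that particular register is used in the new or old context. This gradual context switch is suitable for global variables and for variables that are passed from one function to another through the context switch. It also suits variables that are passed through nested loops when the loops are threaded, such that the inner loop is assigned to a different thread than the outer loop. It can also be applied where different iterations of a loop are assigned to different threads and those iterations pass variables from one iteration to others (loop-carried dependencies). In all of these cases it is beneficial to context-switch registers gradually, because the number of registers commonly used between threads (registers that pass data between contexts) is large; those registers can be shared and need no context switch, while the remaining registers can be swapped gradually as needed.
The invention can also use virtual registers to allow a register set larger than the one directly supported by the instruction set architecture. The extra registers become part of the hardware-supported context switch or of hardware/software threads. The concept of a register cache or register hierarchy is also applicable to such scenarios. A virtual register file is a register file architecture in which the instruction set registers visible to the architecture are extended, and remain visible, using virtual register file storage. This makes it possible to keep multiple copies of the same architectural register that belong to different contexts or threads. It also allows a larger or variable number of registers than is directly accessible with a traditional physical register file. Those copies of the architectural registers are visible to the hardware when the hardware generates the threads/contexts (hardware threads), and they can also be overloaded with software threads when the software generates them (software threads).
Figure 2 shows one embodiment of such a virtual register file using a hierarchy of register files. The lower hierarchy level 201 supports very high bandwidth to the execution units. The higher hierarchy level 200 supports lower-bandwidth access. Some levels of the register file hierarchy contain the basic register set supported by the instruction set architecture, while other levels contain virtual copies of the basic register set. The basic register set can also be duplicated to support multithreading.
A register cache can also be included in this hierarchy, where registers are cached in and out of these structures based on their relevance to the current context and their immediate access needs. One way to implement this virtual register file and register cache is to provide tags for the register locations, so that they are accessed by tags that include the actual register number plus other information, such as a context ID number or static thread number and possibly the memory address where they should be saved for the context switch. Alternative techniques are also described here, in which an index map 202 uses location-based information about each architectural register. The location of the register copy in the hierarchy is looked up as part of the decode stage of the register read access of the register file 203. Other mechanisms pass the location of the register as part of the scoreboard mechanism from the instruction that produces the register to its consumer instructions. Consumer instructions need to read the register, so they must know which copy of the architectural register to access; in the location-based scheme, however, no tags are needed, because the register is accessed by its register number and by the location where that particular copy physically resides among the multiple copies of that register in the multi-level register file.
The gradual context switch can use similar mechanisms to save and restore individual registers for the current context switch, with the following differences. An individual register is swapped/read by the hardware when a new context attempts to use it and the local copy belongs to the old context. At that point the register belonging to the old context is swapped with the global register copy in 200 that belongs to the new context, which is brought into the local register file 201. In one implementation, multiple registers that carry the same register number but different context ID tags can coexist. One embodiment of the tag scheme can resemble a virtual register file in which the same register from different contexts is present. A virtual register file is interesting because it allows a large virtual register file to be implemented as a hierarchy of multi-level register files with different bandwidth and capacity variations. The following example illustrates different scenarios in which functions are presented. Assume that different function calls within a single program/task are called at subsequent times.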
Function 1(values,references)
{
a-d:local variables
X-Z:global variables
Code
}
Function 2(values,references)
{
a-d:local variables
X-Z:global variables
Code
}
Function 3(values,references)
{
a-d:local variables
X-Z:global variables
If(condition)then call Function 3()
}
Main{
Function 1()
Function 2()
Function 3()
}
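If the old context is not needed, the old local variables in the local register file have to be saved as a whole, and instant saving of the context makes sense here. Another case in which the whole context has to be saved is when the operating system calls a new independent process.
On the other hand, if not all of the old context's registers need to be saved, only those that need saving are gradually swapped with the new context's register values. This is the case where functions 1 and 2 coexist in the same local register file and share the same global registers in the global register file. This is the case for fine-grain threads. In such a case, when a particular register in a particular context needs to be accessed, its context ID is checked against the one that exists in the local register file. If they match, the register can be used; otherwise it has to be brought/read from a higher storage level (such as the global register file or the register cache) and can be swapped with the one in the local level of storage (e.g., the temporary local register).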
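The context-ID check and swap described above can be modeled roughly as follows. This is a simplified sketch, not the patented circuit: the class name `TwoLevelRegisterFile`, the dictionary-based storage, and the default value of 0 for a register that has never been written are all assumptions made for illustration.

```python
# Hypothetical model of the gradual context switch: a register is swapped
# between the local and global levels only when a different context touches it.
class TwoLevelRegisterFile:
    def __init__(self):
        self.local = {}     # reg_num -> (context_id, value), one copy per register
        self.global_ = {}   # (context_id, reg_num) -> value, higher storage level

    def read(self, context_id, reg_num):
        entry = self.local.get(reg_num)
        if entry is not None and entry[0] == context_id:
            return entry[1]                      # context IDs match: use local copy
        if entry is not None:                    # local copy belongs to another context
            old_context, old_value = entry
            self.global_[(old_context, reg_num)] = old_value
        # Bring in this context's copy (assumed to be 0 if never written).
        value = self.global_.pop((context_id, reg_num), 0)
        self.local[reg_num] = (context_id, value)
        return value

    def write(self, context_id, reg_num, value):
        self.read(context_id, reg_num)           # make sure the local slot is ours
        self.local[reg_num] = (context_id, value)
```

Only the registers that are actually touched by the new context migrate between levels; everything else stays where it is, which is the point of the gradual scheme.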
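At a function call, the new variables that are written within the function scope are assigned to new register copies (with a new context/thread ID compared with the scope outside the function). Those new register-name copies can be assigned either by a hardware management unit (HMU) or by a compiler/programmer that generates code for this architecture. In the HMU case, upon entering the function code (the code between the call and return instructions), the HMU assigns a new context/thread ID number to each register name written within the function scope, thereby generating copies of the register names. When the return instruction is reached, this context/thread ID becomes free for reassignment and all the register-name copies can be reused. There is no overhead in creating and synchronizing these temporary contexts, because the hardware automatically manages the assignment and release of these temporary register-name copies.
The compiler can handle the management of register names in different contexts if the instruction architecture allows the register name to be extended with a context/thread ID number. In this way an instruction that writes to a specific register name also specifies the context/thread that it writes to and reads from. Consider a typical instruction with three sources (S1, S2, S3) and one destination (Dest), encoded as follows: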
Dest/S1←ADD(S2,S3)
In this example, the registers and their association with threads are formatted as:
Thy:Rx←ADD(Thz:Ri,Thw:Rk)
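This shows how the instruction architecture allows a context/thread ID, i.e., Thy (where y is the thread ID), to be specified together with its association with register Rx, whose register name is x. The compiler or the programmer can specify the context/thread number to which the register names belong. When the function code is compiled, the compiler can use a special instruction that increments a thread ID counter to increment the context/thread ID number; the counter is then decremented with a decrement instruction on exiting the function call. Another mechanism that the instruction architecture can use is a free-thread selector instruction. The compiler/programmer uses this instruction to poll the hardware for a free context/thread ID that it can use. On returning from the function, it can use another instruction, such as a context/thread free instruction. The two methods are shown below.
Increment thread ID counter
Function A:(Th+:Ry←Move Th:Ry)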
{
Function body
}
Return(Th-:Ri←Move Th:Ri)
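Decrement thread ID counter
The reason for incrementing before the function call and decrementing after the return is to allow values to be passed between the caller and the callee. The increment creates the new context/thread number, and the decrement frees it.
An alternative method that performs the same task uses one instruction that selects a free context/thread number and another instruction that frees that context/thread number.
J = current thread number
I = select free thread ID number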
Function A:(Th{I}:Ry←Move Th{J}:Ry)
{
Function body
}
Return(Th{J}:Rx←Move Th:Rx)
Free thread ID number{I}
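The select-free/free method in the listing above can be sketched in Python. The pool class, the `call_function` helper, and the policy of always handing out the lowest free ID are assumptions made for illustration; the text only states that an instruction polls the hardware for a free context/thread ID and that another instruction releases it.

```python
# Hypothetical model of the "select free thread ID" and "free thread ID"
# instructions wrapped around a function call.
class ThreadIdPool:
    def __init__(self, num_ids):
        self._free = set(range(num_ids))

    def select_free(self):
        """Return a free context/thread ID (lowest first, by assumption)."""
        if not self._free:
            raise RuntimeError("no free context/thread ID")
        thread_id = min(self._free)
        self._free.remove(thread_id)
        return thread_id

    def free(self, thread_id):
        """Release a context/thread ID so that it can be reassigned."""
        self._free.add(thread_id)


def call_function(pool, regs, caller_id, reg_name, body):
    """Run body in a fresh context, passing one register in and back out."""
    callee_id = pool.select_free()                                # I = select free ID
    regs[(callee_id, reg_name)] = regs[(caller_id, reg_name)]     # move caller -> callee
    body(regs, callee_id)                                         # function body
    regs[(caller_id, reg_name)] = regs[(callee_id, reg_name)]     # move callee -> caller
    del regs[(callee_id, reg_name)]
    pool.free(callee_id)                                          # free thread ID I
```

Registers are keyed by (thread ID, register name) here, mirroring the Th:R notation of the listing.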
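To reduce the encoding space for the context/thread number associated with each register name, the instruction architecture can specify that a register is associated with either the parent thread or the current child thread; this can be encoded with one bit (which we call the parent/child bit, or "P/C" bit). Each newly generated thread specifies its parent thread (the thread from which it was spawned). The parent ID number is kept along with the other thread state. A hybrid scheme can be used in which the instruction specifies one or more sources from the current child or parent thread using the parent/child bit, and one or more sources from another, unrelated thread using an explicit thread number along with that source's register name. The following instruction set architecture illustrates this hybrid scheme: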
P/C:Rx←ADD(P/C:Ri,P/C:Rj,Thw:Rk)
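The "P/C" bit specifies whether the register belongs to the parent thread or the child thread. This scheme can be generalized so that more bits are designated to encode the active threads within a higher-level context. The more general scheme is shown next. A high-level context defines a small number of threads within its state. The processor architecture allows for a small amount of stored state that the instruction set can abbreviate. Assuming that the processor hardware can support a total of N threads, the instruction set can support an abbreviation of those threads that allows M threads to communicate with one another through registers in the instruction set encoding, where M<N. This allows the instruction set architecture to support a larger number of threads than the abbreviated encoding bits allow. For example, assume that the hardware can support 128 threads (N=128). The instruction set encoding can be defined to encode only 4 threads (M=4), so that the registers of those threads can be accessed simultaneously by 4 active threads out of the 128. These 4 encodings can rotate among the 128 threads depending on the active region of the program. A few bits (T1…T(log2 M)) can be used to encode M threads in the instruction. In this example, 2 bits T1T2 are used to encode 4 threads.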
T1T2:Rx←ADD(T1T2:Ri,T1T2:Rj,Thw:Rk)
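The active threads within a program region can be mapped to the overall set of threads supported by the hardware. Mapping, or assignment, instructions are illustrated below: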
Map(T1T2=00),TH=0001101
Map(T1T2=01),TH=1001000
Map(T1T2=10),TH=1010101
Map(T1T2=11),TH=0111100
01:Rx←ADD(01:Ri,10:Rj,1000110:Rk)
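In the code example above, the mapping instructions map hardware threads to the abbreviated encoding represented by the 2 bits T1T2. By way of example, the ADD instruction encodes the thread of each source and of the destination with only 2 abbreviated bits. The third source, however, encodes its thread explicitly. The number of bits used in the instruction set to abbreviate threads is an implementation decision that can vary from one architecture to another; likewise, including a third source, or having its thread explicitly encoded, is an implementation choice.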
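The mapping table and the decoding of the ADD instruction above can be sketched as a small lookup. The bit strings are taken from the listing; the helper names `map_thread` and `resolve_thread`, and the use of Python strings to represent bit fields, are assumptions made for illustration.

```python
# Hypothetical model of the Map instructions: a 2-bit abbreviated thread
# field is translated into one of the 128 (7-bit) hardware thread numbers.
thread_map = {}


def map_thread(abbreviated_bits, hardware_thread_bits):
    """Model of Map(T1T2=..), TH=..: record one abbreviated-to-full mapping."""
    thread_map[abbreviated_bits] = hardware_thread_bits


def resolve_thread(thread_field):
    """Return the hardware thread number named by an instruction's thread field."""
    if len(thread_field) == 2:                   # abbreviated T1T2 field
        return int(thread_map[thread_field], 2)
    return int(thread_field, 2)                  # explicit full thread number


map_thread("00", "0001101")
map_thread("01", "1001000")
map_thread("10", "1010101")
map_thread("11", "0111100")

# 01:Rx <- ADD(01:Ri, 10:Rj, 1000110:Rk)
destination_thread = resolve_thread("01")
source_threads = [resolve_thread(f) for f in ("01", "10", "1000110")]
```

With N=128 hardware threads, each explicit thread number needs 7 bits, while the abbreviated fields need only 2 bits for M=4 active threads.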
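To implement the pipeline of a processor that executes such instructions or micro-operations (the microcode implementation of the instructions), the pipeline stores these maps and uses them to access the correct thread state when it executes an instruction that encodes the abbreviated thread bits. These mapping states can extend the register allocation algorithm that maps single-thread variables to registers: the compiler can allocate a larger set of threads to the limited number of mapping states and then perform register allocation within each thread. In this way the mapping states can be reallocated to new threads throughout the program using thread live ranges, similar to the live ranges of registers.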
[Figure BDA0001275703390000111: image not reproduced in the text]
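If the processor is an out-of-order processor, it can also allow those abbreviated maps to be renamed so that more of those active-region threads can execute simultaneously. For example, in such an out-of-order processor, the code above is decoded so that the abbreviated bits inside the actual micro-operation encoding are renamed to the full thread number. In such an out-of-order machine, temporary speculative registers are also implemented to hold those renamed registers and states. Figure 3 implements temporary registers, such as 301, in which speculative state can reside before it is committed to the global register file.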
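The general scheme described enables fine-grain threads to communicate, cooperate, and synchronize execution at the register level without going through memory, while keeping the instruction set encoding very compact by providing register association with threads and reducing the instruction encoding space. All of the schemes described above require the compiler to insert instructions that manage the threads and set the extension bits; this can alternatively be done by a hardware management unit that performs actions similar to those instructions and manages the setting of the extension bits (as described throughout this document).
We generalize the above mechanisms to handle fine-grain threads beyond function calls, to include threads across loop boundaries and threads created between the body of a loop and the code outside the loop body. Consider the following code: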
Loop(j=1 to 100)
{
Instruction A
Instruction B
Loop(i=1 to j)
{
Instruction 1
Instruction 2
}
Instruction C
}
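Using the compiler or the hardware management unit, the thread that represents the inner loop body (instructions 1 and 2) and the thread that represents the instructions outside the inner loop body can be mapped to hardware-supported threads.
By way of example, the following implementation represents one scheme in which those fine-grain threads can reside in the register file/register cache and can be swapped in and out to the higher-level register file hierarchy and the memory hierarchy. Assume that the compiler is designed to use a convention that saves context depending on the number of registers used in the called function. It can therefore save a number of registers in an incremental way. For example, if the design convention uses register batches in increments of 8, then a function that needs to use between 1 and 8 registers saves registers 1 to 8 of the old context (and restores them when it restores the old context). If the function needs to use between 1 and 16 registers, it saves registers 1 to 16, and so on, up to the full set of registers supported by the architecture. Another scheme that allows a virtual register file to be built is aliasing the registers of different contexts/threads that need fewer registers than the instruction architecture provides. For example, a small function call or worker thread may need only a subset of all the registers, e.g., 8 registers (registers 0 to 7) out of the 32 registers available to the instruction set. If this is the case, the hardware/compiler aliases this batch of 8 logical registers on top of another free physical register batch; the register batch that the thread is mapped to does not need to carry the same register numbers. That is, registers 0-7 can be mapped onto physical registers 8-15, 16-23, or 24-31 that are available in another context's physical state storage.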
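The batch-of-8 save convention and the search for a free physical batch to alias onto can be sketched as follows. The constants follow the numbers used in the text (batches of 8, 32 registers); the function names and the first-free search policy are assumptions made for illustration.

```python
BATCH_SIZE = 8          # save granularity used in the example in the text
TOTAL_REGISTERS = 32    # registers available to the instruction set in the example


def registers_to_save(registers_used, batch_size=BATCH_SIZE, total=TOTAL_REGISTERS):
    """Round the callee's register usage up to whole batches, capped at the full set."""
    if registers_used < 1:
        return 0
    batches = -(-registers_used // batch_size)   # ceiling division
    return min(batches * batch_size, total)


def free_batch_for_alias(occupied_batches, batch_size=BATCH_SIZE, total=TOTAL_REGISTERS):
    """Return the first free physical range that logical registers 0-7 can alias onto.

    occupied_batches holds the indices of batches already in use; the
    first-free policy is an assumption, not something the text prescribes.
    """
    for batch in range(total // batch_size):
        if batch not in occupied_batches:
            first = batch * batch_size
            return (first, first + batch_size - 1)
    return None
```

For instance, a function that uses 9 registers saves two batches (16 registers), and a small thread whose logical registers 0-7 cannot use batch 0 is aliased onto physical registers 8-15.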
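Figure 4 shows such a scenario. The state 402 of thread 5, which consists of registers 0-7, is aliased on top of a free physical register batch 8-15. This means that more threads can be accommodated than in the typical case, in which each thread is mapped onto the full 32-register state. In Figure 4, typically only 4 threads can coexist at the same time, because only 4 sets of 32 registers can be used simultaneously. However, if most of the additional threads (besides the main thread, which itself might need the large 32-register state) need only a small set of registers and can each live with a small 8-register state, then a total of 13 threads can be processed simultaneously. The maximum number of simultaneous threads that can coexist in this 8-register-batch aliasing configuration is 16 small-state threads. Obviously, we do not need to store a tag per register, as shown in the figure for illustration purposes. In an efficient implementation, a thread tag can be stored per batch (8 registers in this example), or alternatively threads can be allocated according to a number-aliasing scheme in which the thread number starts at a boundary that is a multiple of 8. For example, if we start allocating registers based on this scheme, Figure 4 shows software thread numbers 0, 1, 3, 4, 5, 6, 7, and 9. As described above, the software threads are mapped to physical hardware threads either by the instruction set (through the compiler) or by a hardware management unit. The aliasing scheme now allows 16 hardware threads to be mapped to the software threads. In this scenario, software thread 0 reserves hardware threads 0, 1, 2, and 3, because it needs the whole large 32-register state and therefore uses 4 batches of 8 registers each. Software thread 1, however, reserves only one small-state hardware thread storage and is therefore mapped onto hardware thread 4. Software thread 5 is mapped onto hardware thread 5, software thread 3 is mapped onto hardware thread 6, and software thread 6 is mapped onto hardware thread 12 (shown as point 400 in the figure).
To support the aliasing scheme described above in the instruction set or micro-instruction encoding, the reference to a thread-register pair can be a contiguous encoding of bits, in which the top 4 bits are the hardware thread number (indicating which batch of 8 registers the thread references) and the lower 3 bits indicate the individual register within the register batch 0-7.
As an example, assume that the software needs to perform the following high-level threaded task of adding specific registers in different threads and writing the result into register 2 of software thread 1, shown as 403 in Figure 4: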
Th1:R2←ADD(Th0:R13,Th3:R2,Th6:R3)
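The software threads are then mapped to hardware threads by the compiler or the hardware management unit. This mapping ensures that a software thread that needs more than 8 registers reserves more than one hardware-thread storage, so that no other software thread can use those reserved hardware threads to map onto. The actual instruction encoding of the task is as follows:
0100010←ADD(0001101,0110010,1100011), which decodes to:
HWTh4:R2←ADD(HWTh1:R5,HWTh6:R2,HWTh12:R3)
Note that software thread 0 register 13 is encoded as hardware thread 1 (401) register 5 (0001101: the first 4 bits are the hardware thread number and the last 3 bits are the register number), because the mapping of software thread 0 is aliased over the 4 small-state hardware threads 0, 1, 2, and 3, each of which has a batch of only 8 registers. Alternatively, the same encoding can be read as hardware thread 0 register 13 (00 01101: the first 2 bits are the number of the large-state hardware thread (with a 32-register state) and the last 5 bits are the register number among the 32 registers).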
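The 7-bit thread-register encoding worked through above (4 bits of hardware thread number followed by 3 bits of register number) can be checked with a few lines of Python. The function names are hypothetical; the bit layout and the example values come from the text.

```python
REGISTER_BITS = 3   # lower 3 bits select a register within a batch of 8
BATCH_SIZE = 1 << REGISTER_BITS


def decode_small_state(reference):
    """Split a 7-bit reference into (hardware thread, register 0-7)."""
    return reference >> REGISTER_BITS, reference & (BATCH_SIZE - 1)


def decode_large_state(reference):
    """Read the same 7 bits as (large-state thread, register 0-31)."""
    return reference >> 5, reference & 0b11111


def encode_aliased(base_hardware_thread, software_register):
    """Encode a software thread's register given the first hardware thread it reserves."""
    hardware_thread = base_hardware_thread + software_register // BATCH_SIZE
    return (hardware_thread << REGISTER_BITS) | (software_register % BATCH_SIZE)
```

Software thread 0 reserves hardware threads 0 to 3, so its register 13 lands in hardware thread 1 as register 5, which is exactly the pattern 0001101.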
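As shown in the figure, another scheme that can be implemented on top of the previous register cache scheme provides an automatic store/load mechanism between this register cache and the memory cache, allowing the register cache to spill and reload registers automatically upon context switches and thread context swaps.
The following code illustrates the scheme and how the hardware described in Figure 5 works to allow seamless multithreaded context switching. The register cache (built similarly to typical set-associative caches with sets and ways) has two parts: the tag array and the data array. The data array is accessed like any threaded register array; a register read takes place whenever any instruction accesses that thread's register. The tag array, however, is accessed whenever a load or store instruction is part of a context switch, which is also known as saving and restoring the context. When the current thread context of a register is stored (saving the context), the value does not actually need to be stored to memory as long as there is still empty space in that particular register set in the register file cache; only the tag part of the register cache is updated with the current stack memory address for that particular register. This tag is used later to actually evict the register when there is no free space in the register cache to store a new thread that needs to allocate the same register name. The following code illustrates this: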
[Figures BDA0001275703390000141 and BDA0001275703390000151: code listing images not reproduced in the text]
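In the code above, the compiler wants to save onto the memory stack registers R0 to R7 of thread 0, which will be overwritten by threads 1, 4, and 6. In this scheme, a memory region in the stack is allocated to hold the register batch (0-7), but the hardware does not actually store the registers of thread 0 to memory/the memory cache when threads 1, 4, and 6 are encountered. This is because the multi-context register file cache is able to provide storage 501 for the extra 8 registers (0-7) that those threads need without destroying registers 0-7 of the original thread (thread 0). As long as the multi-context register file cache can accommodate those registers, it only updates the tag array in the register cache with the stack memory address that the store instructions intended to use. Note that when thread 8 ("function foo") is encountered, registers 0-7 of thread 0 have to be stored, because the ways of the register cache are all occupied for register batch 0-7. This is in fact similar to the eviction process in typical caches: registers 0-7 of thread 0 are evicted to the real stack memory/cache hierarchy storage 500, and the register file cache thereafter contains the values of registers 0-7 of thread 8 in storage area 502.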
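The lazy spill behavior described above (update only the tag on a context save, write to memory only on eviction) can be modeled as follows. This is a sketch under assumptions: the class name, the 4-way capacity, and the oldest-first victim choice are illustrative and are not specified by the text.

```python
# Hypothetical model of one register-cache set that holds copies of register
# batch 0-7 for several threads and spills to the stack only when it is full.
class RegisterCacheSet:
    def __init__(self, ways=4):
        self.ways = ways
        self.entries = []    # each entry: thread, values, stack_addr (the tag)
        self.memory = {}     # stack address -> spilled register values

    def save_context(self, thread_id, stack_addr):
        """Context-save store: record the stack address in the tag, write nothing."""
        for entry in self.entries:
            if entry["thread"] == thread_id:
                entry["stack_addr"] = stack_addr
                return
        raise KeyError("thread has no registers in this set")

    def allocate(self, thread_id, values):
        """Give a thread its own copy of the batch, evicting only when full."""
        if len(self.entries) == self.ways:
            victim = self.entries.pop(0)          # assumption: oldest entry first
            if victim["stack_addr"] is None:
                raise RuntimeError("victim has no saved stack address")
            self.memory[victim["stack_addr"]] = victim["values"]   # real spill
        self.entries.append(
            {"thread": thread_id, "values": list(values), "stack_addr": None}
        )
```

Replaying the scenario from the text, threads 0, 1, 4, and 6 fit in four ways with no memory traffic, and only the arrival of thread 8 pushes thread 0's registers out to the stack address recorded in its tag.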
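The context-switch mechanism can also facilitate other techniques, such as dynamic self-generation and synchronization of threads. Concurrent execution of different parts of a single task or program currently requires the implementation of an out-of-order "superscalar" processor architecture. However, out-of-order execution requires considerable hardware investment and complexity. The following are the typical implementation components needed for such concurrent execution using an out-of-order microarchitecture:
1. Allocation logic, which allocates all resources in the sequential order of the program.
2. Rename logic and a register map, which map logical registers to physical registers at allocation and update the logical registers at the retirement of the physical registers.
3. A reservation station, which dispatches operations out of order according to the availability of resources and after dependencies have been resolved dynamically.
4. A reorder buffer, which allows results to be reordered into the initial sequential order before they are retired to the architecture state. The reorder buffer also maintains precise exception behavior.
5. A memory reorder buffer, which reorders memory accesses that are dispatched out of order so as to maintain the same consistency model as if they had been dispatched in order, for example by checking a load after a store to the same address. It is also necessary to maintain the order of access for special accesses such as memory-mapped I/O.
Few architectures have capitalized on the superscalar nature of out-of-order machines and their ability to dispatch multiple independent streams of instructions in order to allow multiple software threads to execute simultaneously in the machine. This simultaneous multithreading requires duplicating the architecture state of the machine for each hardware-supported thread, partitioning the microarchitecture resources between threads, and sharing memory structures such as caches between those threads. Despite the fact that those machines are out of order, the out-of-order hardware itself does not initiate, self-dispatch, synchronize, or enforce concurrency between threads. Instead, all of those tasks are left to the software, which initiates/generates and dispatches the threads when their dispatch conditions/dependencies are met (e.g., by using barriers). The software synchronizes them when there is common control or dependencies between them (e.g., by using locks). The software also determines the concurrency between them. The software mentioned here can be the actual master thread code or a compiler that tries to statically parallelize non-threaded code. Because of the difficulty of automatically parallelizing such code, those threads are usually implemented using coarse-grain parallelism, leaving fine-grain parallelism without efficient thread initiation.
The following discussion shows that, with a new paradigm of hardware/software, automatic parallelization can be achieved even without elaborate out-of-order microarchitecture support. The new hardware generates, synchronizes, parallelizes, and concurrently executes those fine/coarse-grain static and dynamic threads by dynamically detecting that control conditions and/or dependencies are resolved, and/or by allowing cross-referencing of registers between threads, and also by disambiguating memory references across threads. This is done by having the hardware monitor either the control condition associated with triggering a thread or the write-back of the cross-referenced register values on which the current thread depends.
Dynamic initiation/generation of a thread by the hardware is triggered when all of the registers and/or control conditions on which the thread depends, particularly those of a cross-reference nature (e.g., inter-thread registers), are satisfied. This can be monitored through an explicit list of the sources on which the thread depends. These hardware-initiated dynamic threads can be thought of as, or can actually be, a function call, or a method or subroutine, or even a loop instance encapsulated in the form of a repeated function call. This function call has a parameter list on which its execution depends. Those parameter lists or registers are monitored for the occurrence of their write-back stage. A flag is set once each parameter/register has been written back after the instruction that updates it has executed. When all the flags of those parameters/registers are set, the function/thread is dynamically dispatched by the hardware. Allowing cross-referencing of registers or virtual registers and/or memory references among the different threads (both hardware-generated and software-generated) establishes a way to generate, synchronize, parallelize, and concurrently execute those threads using a natural dependency-resolution method, such as the one used to create consistent execution of programs, in which registers are used to communicate dependencies.
The lazy gradual context switch mentioned above also helps to allow the new state of those dynamically generated or software static threads to be built/swapped with other existing dynamic/static threads, as shown above.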
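Figure 6 shows one possible mechanism for resolving false dependencies on same-name registers during the dynamic execution of those threads. It also maintains the sequential order of register updates by augmenting the register number with a differentiating bit field that is tagged at instruction allocation time in the machine. At the front end, each register has 2 counter values that correspond to this bit field. One counter value (the lead counter) 601 is incremented at each new assignment of that register as a destination of an instruction. Another pointer 602 (the lag counter) is incremented at every commit of the corresponding register to the architecture state. The lag counter can never pass the lead counter. The lead counter can wrap around to its initial value after reaching the maximum; the lag counter behaves in the same way. These counters can be extended to be thread-specific, so that each thread has its own counters.
The lead counter 601 of a particular register keeps incrementing as new instances of the same register are assigned as a destination to new instructions. Each time a register is reassigned as a destination to a newly allocated instruction 603, the bit field of that register is incremented (in the front end of the machine) and attached to the hardware instruction 605. Each subsequent instruction that uses this register as a source 606 uses the newly incremented register-number extension field. Whenever that register commits to the final architecture state, the lag counter 602 of that register's bit field at the front end is incremented. The reason for using and incrementing the lag counter is to determine (at the instruction register-read stage) whether a particular register value needs to be read from the speculative reorder buffer or from the retired register file, by comparing the register extension field of the hardware instruction with the lag counter of that register taken from the lag counter pointer. The lag counter 602 is incremented at the retirement pipeline stage 604. This mechanism can also be applied to individual instructions and is not limited to function calls or threads.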
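The lead/lag counter pair for a single register can be modeled as below. The class name, the 2-bit default width, and the rule that allocation stalls when only one free encoding remains are assumptions made so that the wrap-around comparison stays unambiguous; the text itself only states that the lag counter never passes the lead counter and that both wrap around.

```python
# Hypothetical model of the per-register lead and lag counters of Figure 6.
class RegisterVersionCounters:
    def __init__(self, bits=2):
        self.modulus = 1 << bits
        self.lead = 0   # incremented when the register is assigned as a destination
        self.lag = 0    # incremented when a version commits to architecture state

    def in_flight(self):
        """Number of allocated versions that have not committed yet."""
        return (self.lead - self.lag) % self.modulus

    def allocate_destination(self):
        """Tag a new writer of this register; returns its extension field."""
        if self.in_flight() == self.modulus - 1:
            raise RuntimeError("stall: lead counter would catch up with lag counter")
        self.lead = (self.lead + 1) % self.modulus
        return self.lead

    def commit(self):
        """Retire the oldest in-flight version of this register."""
        if self.in_flight() == 0:
            raise RuntimeError("nothing to commit")
        self.lag = (self.lag + 1) % self.modulus

    def read_from_retired_file(self, extension_field):
        """True if the tagged version has committed, False if still speculative."""
        age = (self.lead - extension_field) % self.modulus
        return age >= self.in_flight()
```

A consumer carries the extension field of its producer; comparing that field against the counters at register-read time decides between the speculative reorder buffer and the retired register file.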
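The lead counter and lag counter table shown in Figure 6 can be replaced by an inheritance vector that has a field for each architectural register. At allocation time, this vector forwards the incremented values from each instruction to the next, while each instruction increments the value in the bit field related to its destination register. A similar concept is shown in Figures 8A/8B, except that it applies to a group of instructions instead of only one instruction.
The following paragraphs describe a preferred method for unifying automatic/dynamic thread generation and synchronization along with the handling of context switches described above. In addition, the concept of physically segmented, architecturally unified register files is disclosed, together with a register hierarchy that is well suited to such an embodiment.
The architecture may also include an instruction matrix/block/bucket architecture (matrix, block, and bucket are used interchangeably), in which instructions are part of an Ultra Large Instruction Matrix, referred to as the "ULIM architecture" in a previous invention by the same inventor, application number PCT/US2007/066536, which is incorporated herein by reference. An instruction matrix/bucket/block is a collection of instructions that may be totally dependent on one another, totally independent, or something in between (some instructions depend on others while some are independent of others). Usually the instruction matrix/block encapsulates instructions in a matrix format in which dependent instructions must occupy a column within the matrix and independent instructions occupy rows or columns. The matrix dimensions (rows, columns) can be selected from a few possible configurations that map the architecture to its hardware execution units. The matrix/block/bucket concept allows dependencies to be passed at the level of the instruction block/bucket itself instead of at the level of individual sources. The architecture thus efficiently removes the complexity associated with the CAM (content-addressable match) that matches sources and destinations in large-scale concurrent instruction execution architectures. The disclosed invention can be designed without explicitly using the ULIM architecture, or even instruction buckets/blocks, by using instead a virtual identifier that serves a role similar to that of the block/bucket: it tracks and resolves dependencies at the level of a source group without actually forming a matrix or block, using groupings of simple instructions. This is an advantage over existing schemes, which track and resolve dependencies for individual sources on a per-instruction basis.
Within an instruction bucket and across buckets, instructions can be combined as VLIW, SIMD, or MIMD. The buckets are then dispatched and executed as dynamic threads. In addition, static software threads can share the same mechanisms as dynamic threads, with the difference that they are generated based on software application threads.
Figure 7 shows a traditional superscalar out-of-order architecture as described in the related art. Individual instructions are renamed using a rename table (not shown in the figure) that maps architectural registers into a pool of physical registers; the instructions are then scheduled, executed, and retired. Figure 7 shows a typical instruction scheduler 701 with 3 dispatch ports 704. Using those ports, up to 3 instructions can be scheduled and dispatched out of program order for execution on the 3 execution units 702. Using the write-back ports 705, the instructions write their results back to the reorder buffer 703 each cycle. Up to 3 instructions can then retire from the instruction reorder/retirement buffer 703, so that their destination registers can non-speculatively and permanently update the architecture state.
In contrast to the typical implementation of out-of-order hardware in Figure 7, the present invention describes a dynamically scheduled out-of-order architecture, shown in Figure 8A, that scales much better by using the concepts of the instruction matrix/bucket, group-level dependency checking, and instruction-group dispatch and execution. An instruction matrix buffer and scheduler 801 stores those instruction matrices/blocks/buckets.
In one example, the instruction bucket 806 is determined to have its dependency sources satisfied and is therefore ready to be dispatched, and so are buckets 808 and 809. Those buckets can execute in a single cycle if the underlying hardware allows it, or they can be staged on the execution units 802 over multiple consecutive cycles in a pipelined or non-pipelined fashion. After their execution finishes, their results are written into the instruction matrix retirement register file 803. They are written as a group of registers representing the bucket, using the write-back ports 805. The destination registers of those buckets stay in the bucket retirement buffer until the bucket as a whole can update the architecture state in the original sequential order of the program.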
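One possible embodiment of the concept described here has 3 components:
1- Front end
2- Scheduler and execution
3- Back-end retirement
The front end includes the speculative thread bucket pointers and the bucket source and destination lists. The scheduler and execution buckets include the bucket dispatch selector and the virtual register match and read, including a possible register hierarchy and register cache. The back end is where executed buckets are logged and where exception ordering is enforced before retirement. The register hierarchy/cache also serves as intermediate storage for the results of executed buckets until they are non-speculative and can update the architecture state (similar to 803 shown in Figure 8A). The following discloses one possible implementation of the front end, the dispatch stage, and the back end where executed buckets are logged. These are shown in Figure 8B.
The process starts by fetching a new thread matrix/block/bucket, and the new thread bucket is then assigned to a vacant bucket slot in the bucket buffer. Each thread allocation pointer in the thread allocation pointer array 852 comprises an interval of buckets in which the thread is allowed to physically place its blocks/buckets of instructions. Each such thread keeps allocating buckets in a round-robin fashion into the bucket buffer array inside its corresponding interval of contiguous space. The buckets/blocks inside each thread space are assigned a new number 852 that is incremented each time a new bucket/block is assigned. For each valid source in the bucket 850, there is a valid read bit "Rv" indicating that this source is needed by the instructions inside this bucket. By the same convention, each destination register that is written back by instructions in this bucket has its valid bit "Wv" set in the bucket, and it has a field in the destination inheritance vector 853. When a new bucket is fetched into the bucket buffer, it inherits the destination inheritance vector from the previously allocated bucket, which is pointed at by the thread bucket allocation pointer 852. The inheritance vector is copied from the previously allocated bucket, and the fields of the valid destinations, which correspond to the registers that will be updated by the bucket's instructions, are then overwritten. The valid destinations are labeled with the current bucket number, while the invalid destinations are copied from the corresponding inheritance vector inside the bucket. The thread bucket pointer is then updated for the newly fetched bucket by incrementing it (it wraps around within its interval).
In the bucket dispatch and execution stage, whenever a bucket is executed without any exception handling, the bucket execution flag 854 (containing the bucket number) is set and broadcast throughout the bucket buffer, and it is latched/monitored within each bucket that has a source with that bucket number as a source. Other related information can also be passed along with the bucket number, such as information about virtual register locations. When all the execution flags of the source buckets are set within a bucket, that bucket's ready bit 855 is set and the bucket is ready to be dispatched and executed. When the bucket has executed without any exception and is ready to update the architecture state in the sequential order of the program, the bucket is retired and the retirement thread pointer 857 is incremented to the next bucket in the array. The retired bucket location can be assigned to a new bucket.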
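The broadcast of the execution flag and the setting of the ready bit can be sketched as a small model. The class and function names are hypothetical, and a Python set stands in for the per-source flags that the hardware would latch.

```python
# Hypothetical model of bucket wake-up: each bucket tracks the bucket numbers
# that produce its sources and becomes ready once all of them have executed.
class Bucket:
    def __init__(self, number, source_buckets):
        self.number = number
        self.pending = set(source_buckets)   # producers that have not executed yet
        self.ready = not self.pending        # ready bit (855 in the figure)

    def observe_broadcast(self, executed_number):
        """Latch an execution flag that was broadcast through the buffer."""
        self.pending.discard(executed_number)
        if not self.pending:
            self.ready = True


def broadcast_execution(bucket_buffer, executed_number):
    """Broadcast the number of a bucket that executed without exceptions."""
    for bucket in bucket_buffer:
        bucket.observe_broadcast(executed_number)
```

Dependencies are resolved per bucket number rather than per individual source register, which is what keeps the matching logic small.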
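All of those closely related threads can coexist inside the matrix/bucket/block buffer; each thread occupies an interval of consecutive buckets that belong to that thread. The allocation pointer of that thread moves inside this interval of buckets in a round-robin fashion, fetching new instruction buckets and allocating them inside the thread interval in the described round-robin fashion. With such interval sectioning, the whole bucket buffer is divided dynamically into bucket intervals of different or equal lengths.
The concept of an inheritance vector is introduced here for the instruction bucket as well as for the thread. Each instruction matrix/block/bucket writes into particular registers among the architectural registers. Each new bucket at the allocation stage updates this inheritance vector by writing its own thread and bucket number into it, leaving the fields of the registers that it does not write to un-updated. This bucket inheritance vector B_iv 856 is forwarded from each bucket to the next one in program order. In Figure 8B, each matrix writes its own number into the architectural destination registers if the instructions in that matrix write into those registers; otherwise it inherits the value from the B_iv of the previous bucket in that thread.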
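The forwarding of the bucket inheritance vector can be sketched in a few lines. The function names, the 8-register vector, and the use of `None` for a register that no bucket has written yet are assumptions made for illustration.

```python
# Hypothetical model of inheritance-vector forwarding between buckets.
def forward_inheritance_vector(previous_vector, bucket_number, destination_registers):
    """Return the vector a bucket passes on: its own number in the fields of
    the registers it writes, inherited values everywhere else."""
    new_vector = list(previous_vector)
    for register in destination_registers:
        new_vector[register] = bucket_number
    return new_vector


def producers_of(inherited_vector, source_registers):
    """Look up which earlier bucket produces each source of a new bucket."""
    return {register: inherited_vector[register] for register in source_registers}


initial = [None] * 8                                         # no writers yet
after_bucket_1 = forward_inheritance_vector(initial, 1, [0, 5])
after_bucket_2 = forward_inheritance_vector(after_bucket_1, 2, [1, 2, 3, 4, 6, 0, 7])
```

A bucket that reads registers 5 and 6 after these two allocations would find bucket 1 as the producer of register 5 and bucket 2 as the producer of register 6, without comparing against every earlier instruction.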
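As shown by the embodiment in FIG. 9, this concept scales from a bucket buffer that manages a small number of closely coupled threads to hardware circuits that manage multiple bucket buffers and threads. Circuits that can be expanded to handle a larger number of threads with less close interaction are described as a global front end, which processes thread headers 902 without needing to process the actual instructions within the threads in order to enforce dependency checking across those distant threads. The header of a thread and the sub-headers of its buckets contain only information about the architectural registers that the thread and its buckets write into (the destination registers of their instructions); the headers need not include the actual instructions or their sources. In fact, it is enough to list the destination registers, or to provide a bit vector in which one bit is set for each register that is a destination of an instruction. The header does not need to be physically placed as a header of the instructions; it can be a packet of any format, or a compact representation, of the destination registers of the instructions within the thread, and it may or may not be stored with the rest of the instruction information.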
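The global front end fetches only the headers of the threads/blocks, in program order, and generates dynamic thread and/or bucket inheritance vectors 901 (Tiv and/or Biv). Each time a new thread is allocated, the inheritance vectors are forwarded by keeping the old fields that the current thread's buckets do not write or update, as shown at 903. The inheritance vectors are distributed to a large number of engines/cores or processors 904, each of which may include a local front end and a fetch unit (which fetches and stores the actual instructions and produces the dependency vector for each bucket) as well as a local matrix/block/bucket buffer with a local register file 905. The local front end then fetches the actual instructions and uses the information from the inheritance vectors obtained from the global front end to fill in the dependency information for the sources of the instructions brought into the engine for execution. FIG. 9 shows a global front end embodiment and how it disseminates the inheritance vectors to the different engines 904 using only concise information about the instructions (namely, the registers those instructions write). Other useful information to place in the header concerns changes of control path within or across threads. A global branch predictor can be used to predict the flow of control across these threads, so the headers can include branch destinations and offsets. In addition to using the branch predictor to determine control flow, the hardware/compiler can decide to dispatch independent threads along the 2 control paths of a branch. In that case it later merges the execution of the 2 paths using the inheritance vector, as shown in FIG. 11. FIG. 9 also shows the forwarding process when the global front end fetches the header of a new thread: for example, thread 2 (906) updates the corresponding inheritance vector 901 forwarded to it, resulting in vector 910, in which registers 1, 2, 3, 4, 6, 0, and 7 are updated with T2 labels. Note that in 910 register 5 is not written by the T2 buckets, so its label is inherited from the previous inheritance vector.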
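The T2 example above can be reproduced with a destination bit vector as the thread header. This is a sketch under stated assumptions: the previous vector is taken to hold the label "T1" in every field (the figure's actual contents are not available here), and `forward_thread_vector` is a hypothetical name.

```python
def forward_thread_vector(prev, thread_label, dest_mask):
    """Header-only forwarding: bit r of dest_mask is set if the thread writes register r."""
    return [thread_label if (dest_mask >> r) & 1 else prev[r]
            for r in range(len(prev))]

tiv_before = ["T1"] * 8        # assumed contents of the incoming vector
t2_mask = 0b11011111           # T2 writes registers 0-4, 6 and 7; bit 5 is clear
tiv_910 = forward_thread_vector(tiv_before, "T2", t2_mask)
```

Only the header (the bit mask) is needed to produce the forwarded vector; no instruction of T2 is inspected, and register 5 keeps its inherited label.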
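One notable observation is that the register files allow communication across cores/engines. An early request for registers needed from other engines (to reduce access latency) can be placed as soon as the thread's instruction buckets are fetched and allocated in the local bucket buffer. The source dependency information is populated at that time, so cross-engine thread references can be issued then, possibly long before the actual instructions are dispatched for execution. In any case, an instruction is not dispatched until the cross-referenced source has been forwarded and has arrived. The cross-referenced source can be stored in the local multithreaded register file or register cache. Alternatively, it can be stored in a buffer similar to the load store buffer (it can reuse the load store buffer's physical storage and dependency checking mechanisms, but as a register load instead of a memory load). Many topologies can be used to connect the register files across engines/cores, such as a ring topology, a crossbar topology, or a mesh-routed interconnect.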
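The following discussion shows how register file segmentation can be used inside an engine and across engines. When a bucket is dispatched, its sources are sent (simultaneously or sequentially) to both the register file and the register cache. If the register file is physically unified and has direct support for threading, the operand is read directly from the corresponding thread register section. If the register file is a virtual register file, including a physically segmented register file that uses tags, then a tag match must be done as part of the virtual register read. If the tag matches, the read is performed from the segmented register file.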
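FIG. 9.b shows the overall virtual register file and register cache. It also shows the segmentation of execution and of the low-level register files. A SIMD or MIMD instruction accesses the corresponding SIMD/MIMD register data segment in each section, while a VLIW instruction accesses a different register in each segment independently. In threading mode, each section independently has separate instructions that access registers independently and can access registers across segments through a unified view. More ways in which threads can use the register file hierarchy are shown in FIGS. 1-5.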
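To allow the option of keeping the instruction schedule as composed by the compiler's auto-parallelization or by VLIW or SIMD/MIMD composition, without allowing the hardware to rearrange the section buckets independently, a locking mechanism is employed through a bit or flag that locks 4 buckets together as a super bucket. If this bit or flag is set, all the buckets within the super bucket (4 buckets in this case) execute in parallel within the same cycle. If the flag is not set, the buckets are not locked together and can execute independently at different times.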
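The increased parallelism that results from threading, out-of-order execution, or VLIW architectures puts enormous pressure on the limited-bandwidth register file. Register files are usually designed as a unified resource to allow access to all registers. Segmented register files have been designed before, but they require cross reads/writes to be handled at the architecture/software level, which prevents them from being used as a unified set of resources and adds overhead to the cross reads/writes.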
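The register architecture disclosed in the present invention supports software threads, hardware-generated threads, VLIW execution, SIMD and MIMD execution, and emulation of out-of-order superscalar execution. Although it is physically segmented, it appears as a unified architectural resource. The segmented register file is part of the virtual register file, which may include a register hierarchy and a register cache as well as mechanisms to store and check register tags. Tag access can be eliminated by using a location-based scheme that takes advantage of the dependency inheritance vector. The scheme works as follows: when the number of an executed bucket is broadcast during the dispatch stage, all the sources of subsequent instructions perform a CAM (content addressable match) that compares their source buckets with the just dispatched/executed bucket in order to set the ready flag for that source. The physical location where the bucket executed can also be propagated along with the register number, so that any ambiguity is resolved. For example, in FIG. 9.c there are 4 register file segments, each containing 16 registers. When bucket #x is dispatched to section 2, the bucket number x is broadcast to the bucket buffer together with segment #2, so that all sources that depend on bucket x record that it wrote all of its registers in segment 2. When the time comes to dispatch those instructions, they know that they need to read their registers from segment 2 and not from any other segment, even though the same register number exists in the other segments. The same applies to the register cache, to avoid using tags. The concept can be extended to the global front end, where, in addition to the thread information, the inheritance vector can specify in which engine the instruction bucket that writes a given register was allocated.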
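The location-based scheme can be sketched as follows, using the 4 segments of 16 registers from the example above. The `Source` class and the helper names are assumptions introduced for illustration; the hardware CAM is modeled as a plain loop.

```python
class Source:
    """Hypothetical record of one instruction source waiting on a producer bucket."""

    def __init__(self, reg, producer_bucket):
        self.reg = reg
        self.producer_bucket = producer_bucket
        self.ready = False
        self.segment = None      # filled in by the broadcast; no tag is stored

def broadcast_dispatch(sources, bucket_number, segment):
    for s in sources:
        if s.producer_bucket == bucket_number:   # CAM-style match on the bucket number
            s.ready = True
            s.segment = segment                  # remember where the value will live

def read_operand(segments, source):
    # The recorded segment disambiguates identical register numbers.
    return segments[source.segment][source.reg]

segments = [[0] * 16 for _ in range(4)]   # 4 segments of 16 registers each
segments[2][7] = 42                       # bucket 5 executed in segment 2 and wrote R7
segments[0][7] = 99                       # the same register number exists elsewhere
src = Source(reg=7, producer_bucket=5)
broadcast_dispatch([src], bucket_number=5, segment=2)
value = read_operand(segments, src)
```

The consumer reads R7 from segment 2 without any tag comparison, because the segment number travelled with the bucket number at broadcast time.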
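The following describes a unified dynamic execution architecture capable of issuing SIMD/MIMD, VLIW, and ULIB bucket instructions as well as dynamic and static threads. The architecture also supports emulation of a superscalar out-of-order implementation without explicit support from out-of-order components. The invention may also include a physically segmented, architecturally unified register file and a possible register cache hierarchy.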
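The disclosed techniques allow both software-initiated threads (static threads) and hardware-generated threads (dynamic threads) to be executed using one unified method and implementation. As mentioned above, fine-grained parallelism is hard to identify at the level of static software optimization, so dynamic threading is a way for the hardware to generate this fine-grained parallelism by assigning dynamic thread contexts to software buckets such as subroutine calls, method procedures, independent object code, or parallel loops. With the disclosed implementation and method, this is accomplished by reserving a few of the threads and their bit identification fields for dynamic threads. For example, in the embodiment above, the hardware dynamic threads can be assigned a subset thread group out of the "m" available threads. The hardware can use those allocated thread reservations to dynamically generate and/or issue hardware threads. For example, on a subroutine call or method call (as indicated by an assembly "Call" instruction), the hardware can assign a new hardware thread to the call and eventually reclaim the thread reservation after the subroutine returns. The same applies to loops, where a hardware thread can be generated at the entrance of the loop (as indicated by an assembly "Loop" instruction) and reclaimed after the loop reaches its end instruction. Other instructions can also serve as hardware threading opportunities that map directly to events to which software threading applies, such as object-oriented method execution, object-associated code, or any high-level or object-oriented programming thread opportunity. In all cases, synchronization of those threads can be resolved through register/bucket cross-referencing and control flags, which can be tracked and resolved by the mechanisms described above. FIG. 10 shows how the hardware (or the compiler or run-time system) can use dynamic hardware threads to overlap the execution of instructions in different regions within and around a loop body.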
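Again, the thread header encapsulates the destination registers of each bucket within the loop body 1004 as well as of the code outside the loop body. In this case, the hardware or the compiler allocates thread T1 to the code before the loop body and, as an example, only one thread T2 to the loop body (multiple threads can be assigned to the loop body by splitting the loop into multiple iterations, up to the point where each loop iteration is assigned to its own thread; the amount of loop threading is a dynamic decision that can be made at run time or based on compiler analysis), while the code after the loop is assigned to thread T3.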
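Note that the inheritance vector 1005 generated by the buckets of the loop body and fed back to the loop body itself (T2) differs slightly from the inheritance vector 1005 forwarded to the thread outside the loop body (T3). This is because T3 needs the last copy of the registers iterated on by the loop; this last copy is indicated by n in the inheritance vector 1002. When the last iteration of the loop is reached, all the buckets in that iteration carry a flag n that is set as part of the bucket number (i.e., the most significant bit).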
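T3 registers that depend on the loop body T2 depend on the last iteration n of the iterating loop T2. This is also recorded in the inheritance vector forwarded to T3. The assumption here is that T2 has no internal control path that causes the loop registers to be updated in an iteration other than the last one; predication can be used to resolve such a control path if it exists inside the loop body. The identifier n can be carried and forwarded as part of the inheritance vector, as in 1003. It essentially refers to the buckets that make up the last iteration of the loop. In addition, these dynamic threads can be assigned to branching code (the code that follows a branch or control change).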
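The technique can be used to issue 2 threads, one along each of the 2 possible directions of a branch. In that case, once the branch is resolved architecturally, one of the threads commits and the other is discarded. To handle the case where the 2 control paths affect different registers, the forwarded inheritance vector specifies a flag condition F, which is embedded in its bucket number field 1101. Each source in the dependent buckets on this control path then specifies the condition flag within the inheritance vector 1102 for those source registers that are affected by the variable control path. The flag is broadcast, together with the number of the bucket that executes/resolves it, to all dependent buckets during the CAM dispatch match. This is shown in FIG. 11. The flag means that the distant buckets do not yet know which thread will forward the particular register value they need (until the flag condition is executed and resolved). The threads along the 2 possible branch paths continue executing along their paths until the flag is resolved and one of them (the one along the mispredicted path) is destroyed. During execution along these 2 paths, however, many other buckets following the branch have been fetched and allocated, and their inheritance vectors forwarded. Those buckets need the flag so that (while waiting for their dependencies to resolve) they can substitute it with the right bucket number from the thread of control along the correct path of execution. Note that the thread and bucket headers can also be used to find the needed information.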
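One way to picture the conditional entry in the inheritance vector is as a placeholder that names both candidate producers and is replaced once the flag resolves. The sketch below is an interpretation for illustration only: the `Pending` record, the string labels, and `resolve_flag` are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pending:
    """Placeholder for a register whose producer depends on an unresolved flag."""
    flag: str
    if_taken: str        # producer label if the branch is taken
    if_not_taken: str    # producer label otherwise

def resolve_flag(vector, flag, taken):
    """Substitute every placeholder guarded by `flag` with the surviving producer."""
    return [
        (e.if_taken if taken else e.if_not_taken)
        if isinstance(e, Pending) and e.flag == flag else e
        for e in vector
    ]

# The middle register is written on both branch paths, by different threads.
vector = ["T1.B4", Pending("F", "T2.B0", "T3.B0"), "T1.B2"]
resolved = resolve_flag(vector, "F", taken=False)
```

Entries that do not depend on the flag pass through unchanged, which mirrors the idea that only registers affected by the variable control path carry the condition.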
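Another innovation is the inclusion of a flag field within each architectural register, so that each architectural or physical register has a flag field as part of its visible state. This allows a simple predication mechanism for conditional execution of instructions and makes it easy to create delay slots for branch instructions. For example, in a loop, the test instruction that tests the loop end condition normally has to precede the branch instruction with no other flag-modifying instruction in between. Using the register flag field, and by specifying that the branch depends on the flag of a particular register (the destination of the test instruction in this example), the test can be placed at the beginning of the loop while the actual branch is at the end of the loop. Therefore, by the time the branch instruction reaches the front end of the machine, the test instruction has already executed, and the branch can execute at the front-end stage rather than at the execution stage, eliminating bubbles. These register flags also facilitate simple instruction predication or thread predication, in which instructions, or the instructions of a particular thread, are predicated on the result of a particular instruction that does not need to immediately precede them in the execution sequence. This is shown in FIG. 11.b. The test instruction 1150 is executed ahead of where the branch is located. Based on its result, the correct path of program instructions is fetched and kept aside until the branch instruction 1151 is allocated at the front end, and those instructions are then fed into the pipeline. The architecture can also predicate the instructions after the branch 1152, execute them before the branch, and use a conditional move after executing the branch. As shown in FIG. 11.b, it is also possible to issue 2 threads along the 2 possible control paths of execution.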
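In addition, as shown in FIG. 11.b, the test instruction 1150 can execute speculatively even if its sources are speculative or predicted by the hardware or the compiler, but it then writes into a speculative flag F2. When the actual test instruction that precedes the branch 1151 executes, it writes into flag F1. At that point the 2 flags are compared. If they match, the speculative thread of execution that started with the test instruction 1150 continues as being on the correct control path, while the other thread stops fetching instructions, using the inheritance merging mechanism shown in FIG. 11.a. If the flags do not match, the other thread continues fetching and the thread of the early test stops. This allows thread speculation with fast recovery from misspeculation (no latency penalty for early branch speculation).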
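The flag comparison reduces to a small decision, sketched below. The function name and the returned action strings are hypothetical; the sketch only captures which thread keeps fetching, not the merging mechanism of FIG. 11.a.

```python
def resolve_speculation(f2_speculative, f1_actual):
    """Compare the early (speculative) test result F2 with the actual result F1."""
    if f2_speculative == f1_actual:
        # The early test was right: its thread is on the correct control path.
        return {"early_test_thread": "continue", "other_thread": "stop_fetching"}
    # The early test was wrong: the other thread carries on instead.
    return {"early_test_thread": "stop", "other_thread": "continue"}

match = resolve_speculation(f2_speculative=True, f1_actual=True)
mismatch = resolve_speculation(f2_speculative=True, f1_actual=False)
```

In both outcomes one thread is already running on the correct path when the comparison is made, which is why the text describes recovery as carrying no added latency.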
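The handling of register cross-references by threads is another key element in automatic thread generation and/or automatic synchronization. Such references can also be made through memory accesses via loads and stores.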
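FIG. 12 shows a comprehensive scheme and an implementation of the synchronization scheme among threads and/or among loads and stores in general. The scheme describes a preferred method for synchronization and disambiguation of memory references across load/store architectures and/or across memory references and/or threads' memory accesses. FIG. 12 shows multiple segments of register files (address and/or data registers), execution units, address calculation units, level 1 caches and/or load store buffers, level 2 caches, address register interconnects 1200, and address calculation unit interconnects 1201. These segmented elements can be built within one core/processor by segmenting and distributing its centralized resources into several engines, or they can be built from elements of different cores/processors in multi-core/multi-processor configurations. One of the segments, 1211, is shown in the figure as segment number 1; the segments can be scaled to a large number (in general to N segments, as shown in the figure).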
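This mechanism also serves as a coherency scheme for the memory architecture among these engines/cores/processors. The scheme starts with an address request from one of the address calculation units in one segment/core/processor. For example, assume the address is requested by segment 1 (1211). Segment 1 can obtain and calculate its address using address registers that belong to its own segment and/or registers from other segments, using the address interconnect bus 1200. After calculating the address, it creates the 32-bit or 64-bit reference address used to access caches and memory. This address is usually divided into a tag field and set and line fields. The requesting segment/engine/core stores the address into its load store buffer and/or L1 and/or L2 address arrays 1202; at the same time, it creates a compressed version of the tag (with fewer bits than the original tag field of the address) by using a compression technique. Furthermore, the different segments/engines/cores/processors use the set field, or a subset of it, as an index to identify which segment/core/processor maintains the address. Indexing the segments by the address set field bits ensures exclusive ownership of the address in one particular segment/core/engine, even though the memory data corresponding to that address may reside in one or more other segments/engines/cores/processors. Although the address CAM/tag arrays 1202/1206 are shown coupled with the data arrays 1207 in each segment, they may be coupled only through physical proximity of placement and layout, or through the fact that both belong to a particular engine/core/processor; there is no relation between the addresses kept in the address arrays and the data in the data arrays inside one segment. After the compressed address tag is formed, the set field bits are used to identify the particular address CAM array 1202 to which the request is addressed. Assume that the set field bits identify segment/engine/core/processor N as the owner of that address range. The compressed tag is then sent to the L1 CAM and/or corresponding L2 CAM arrays of that segment (segment N), which perform a content address match (CAM) between the incoming compressed tag and all the compressed tags stored in the array. Because the tag is compressed, the CAM array is narrow, so it is affordable to build a CAM array with a number of entries similar to that of the data array (CAM arrays are usually very expensive and power hungry). The CAM match can result in a hit, a miss, or multiple hits (a good compression technique should reduce the number of multiple hits). In the case of a hit or multiple hits, the Loc array (location array) 1203 entry corresponding to the CAM hit in segment N is read to identify the location of the memory data that belongs to this address CAM hit. This location is sent back to the requesting address calculation unit in the requesting segment (segment 1 in this example). The request buffer 1204 of that address calculation unit sends the full address to the address/tag array 1206 of that location using the address bus of the interconnect 1205. A tag lookup and comparison is then performed to ensure a confirmed/unique hit. If the hit is confirmed, the data is returned using the data bus of the interconnect 1205. If a miss occurs in either the CAM or the full tag array, a request to main memory is placed on behalf of the requesting segment. It is also possible to cut latency further by sending the full address to the segment that owns the address range (using the set field bits as an index, as described above); at that segment the compressed tag is formed and accesses the CAM array first, while the full address is held there until the CAM result has been checked and the location array 1203 has been read to identify the entry number in the tag array 1206 that corresponds to the hit in the compressed tag array 1202. The full comparison is then done on the full address using comparators rather than the CAM. The CAM array 1202 does not actually need to be implemented as a CAM: because the comparison is made on a few bits obtained from the full tag and compressed, the CAM array can be implemented as a direct-indexed memory array (RAM) or register file. For example, if the compressed tag is formed as an 8-bit word, the whole CAM array of each segment can be built as a 256-entry RAM array; when the 8-bit compressed tag arrives, it is directly decoded to read one of the 256 entries of the location array 1203 (in fact, in the 8-bit compressed tag case, the CAM array does not need to exist as a physical array), so it only needs to serve as the decoder for the location array 1203.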
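The lookup flow above can be sketched with a direct-indexed location table, as the 8-bit example suggests. Several details here are assumptions made for illustration and are not stated in the text: the field widths, the XOR-fold used as the compression function, the modulo mapping from set field to owning segment, and keeping the data in the owning segment (the text allows the data to reside elsewhere).

```python
LINE_BITS, SET_BITS, N_SEGMENTS = 6, 4, 4      # assumed field widths and segment count

def split(addr):
    set_field = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (LINE_BITS + SET_BITS)
    return tag, set_field

def compress(tag):
    """Assumed compression: XOR-fold the full tag down to 8 bits."""
    c = 0
    while tag:
        c ^= tag & 0xFF
        tag >>= 8
    return c

def owner(addr):
    """The set field selects the one segment that owns this address."""
    return split(addr)[1] % N_SEGMENTS

class Segment:
    def __init__(self):
        self.loc = [[] for _ in range(256)]    # compressed tag -> candidate locations
        self.full_tags = {}                    # location -> (tag, set_field)
        self.data = {}                         # location -> cached value

    def fill(self, addr, location, value):
        tag, set_field = split(addr)
        self.loc[compress(tag)].append(location)
        self.full_tags[location] = (tag, set_field)
        self.data[location] = value

def lookup(segments, addr):
    tag, set_field = split(addr)
    seg = segments[owner(addr)]
    for location in seg.loc[compress(tag)]:            # narrow, cheap first-level match
        if seg.full_tags[location] == (tag, set_field):  # full compare confirms the hit
            return seg.data[location]
    return None                                        # miss: would go to main memory

segments = [Segment() for _ in range(N_SEGMENTS)]
addr_a = (0x0102 << 10) | (5 << 6)     # tag 0x0102, set 5
addr_b = (0x0003 << 10) | (5 << 6)     # different tag, same compressed tag and set
segments[owner(addr_a)].fill(addr_a, location=0, value="line A")
hit = lookup(segments, addr_a)
false_match = lookup(segments, addr_b)
```

`addr_b` collides with `addr_a` in the compressed tag, so the first-level match finds a candidate, but the full tag comparison rejects it and the access is treated as a miss.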
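The scheme described next reduces memory latency by issuing memory loads ahead of time, without increasing register pressure, by defining transient storage variables called "Temp" and "Chain". A Temp is used only once, to allow an instruction to refer to a location in memory and prefetch it to a particular location in the cache buffer. This differs from typical schemes that prefetch data into the cache, because with Temp and Chain there is a direct 1-to-1 mapping between the prefetched Temp value and its consuming instruction, similar to what a register assignment achieves. In typical cache prefetching schemes, by contrast, the memory data is prefetched into an address location in the cache, and the consuming instruction can reference the prefetched data only by its memory address. That address is not known to the scheduler of the consuming instruction, so the consuming instruction has no control over whether it finds the data when it executes. With the new Temp and Chain scheme described here, the memory data is brought into the cache buffer (loaded into the cache) and is assigned a name, e.g. "Temp 0"; the instruction that consumes the loaded data can now refer explicitly to the name Temp 0 and expect, with an absolute guarantee, to find the data in the Temp 0 location when it executes. Because the Temp number is indicated as a dependency source in the consuming instruction, the scheduler will not issue that instruction until the Temp is valid (the data has actually been loaded from a higher-level cache or memory), similar to what happens when a memory value is loaded into a register. Besides precise prefetching (1-to-1 mapping) and precise scheduling of the consuming instruction, this has the additional benefit of allowing distributed caches with variable latency, as shown in FIG. 12.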
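To prevent a Temp or Chain from being destroyed before its consumer instructions release it, the compiler ensures that it does not reassign that register until its consumer instruction has consumed it.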
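There are few differences between a register assignment and a Temp assignment. A Temp is used to bring the needed data into a transient location in the cache buffer, which is automatically deallocated once the consuming instruction (the instruction that references that transient location) consumes the value. This ensures that the live range of the Temp is short, so it is used very efficiently. One way to assign a Temp in a program is to use a label that goes with the load address calculation instruction (or the address calculation instruction bucket), i.e.: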
Label Temp 0:Load@Address(R5,R2,offset)
..........
..........
R10←R17,Temp 0
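The labels of the Temps used within a code segment or subroutine must be allocated before they are used. So if the compiler intends to use 10 Temp locations to prefetch its data, it should use an allocate instruction/routine to allocate those Temp locations in the cache/buffer. After each instruction consumes its Temp, the Temp location becomes free. The hardware can reallocate these Temp locations/labels to another subroutine through a context switch; this is made possible by having the cache/buffer store the allocated Temp data together with thread ID tags. When the hardware encounters the label allocate instruction of a new subroutine, it assigns a new thread ID and new Temp locations. If there is no room in the cache/buffer to accommodate the new thread, then during execution of the allocate instruction/routine the Temp locations that are not free (because their instructions have not yet consumed them) are saved/stored in memory along with the rest of the old thread context being switched out. This Temp data must be restored and brought back when the old thread context is brought back to continue execution. This handles unexpected context switches (e.g., on a call to an exception handler or a system call).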
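The life cycle of a Temp (allocate, load, issue only when valid, consume, free) can be modeled as below. This is a simplified sketch: the `TempBuffer` class and its method names are hypothetical, thread ID tags and context-switch save/restore are omitted, and the constant 10 stands in for the value of R17 in the listing above.

```python
class TempBuffer:
    """Hypothetical model of named transient locations in a cache buffer."""

    def __init__(self, n):
        self.free = [True] * n
        self.valid = [False] * n
        self.value = [None] * n

    def allocate(self, count):
        """Models the allocate instruction: reserve `count` Temp locations."""
        picked = [i for i, is_free in enumerate(self.free) if is_free][:count]
        if len(picked) < count:
            raise RuntimeError("not enough free Temp locations")
        for i in picked:
            self.free[i] = False
        return picked

    def load_complete(self, temp, value):
        """The prefetch/load has returned; the Temp becomes valid."""
        self.valid[temp], self.value[temp] = True, value

    def can_issue(self, temp):
        """The scheduler holds the consuming instruction until the Temp is valid."""
        return self.valid[temp]

    def consume(self, temp):
        if not self.valid[temp]:
            raise RuntimeError("consuming instruction issued before Temp was valid")
        v = self.value[temp]
        # A Temp is used once: consuming it frees the location automatically.
        self.valid[temp], self.value[temp], self.free[temp] = False, None, True
        return v

buf = TempBuffer(4)
(temp0,) = buf.allocate(1)
issue_before_load = buf.can_issue(temp0)
buf.load_complete(temp0, 7)            # e.g. Load @ Address(R5, R2, offset) returns 7
issue_after_load = buf.can_issue(temp0)
r10 = 10 + buf.consume(temp0)          # R10 <- R17, Temp 0 with R17 assumed to be 10
```

The consuming instruction names the Temp rather than a memory address, so the scheduler can gate it on the Temp's valid bit exactly as it would on a register.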
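Chains are similar to Temps, with two differences: first, a Chain is a sequence of locations rather than just one; second, the whole Chain is labeled with a single label no matter how many elements ("locations") it has. A Chain is allocated using an allocate instruction/routine similar to that of a Temp. Likewise, when this allocate instruction executes, if there is no new thread space for the Chain in the cache/buffer that can be assigned to the new thread's Chain, the elements of the existing Chain are checked to see which have already been freed (because their consuming instructions have used them); those that have not been freed are saved to memory, and the new thread's Chain can use that space. Chains allow a 1-to-1 mapping to be created between a list of prefetch/load instructions and their consuming instructions, giving an effect similar to a register with a variable name number, e.g. R(i). This is useful for accessing arrays inside loops, where the Chain can start prefetching values inside the loop long before the consuming instructions in the loop need them, without creating register pressure. A cache structure or cache buffer can be used to store the Temp and Chain elements. FIG. 13 shows an implementation of this concept in an out-of-order execution fashion. Here again are shown the instruction scheduler/block buffer 1304 and the forwarded inheritance vector. Once a Chain is allocated, its write counter 1300 in the inheritance vector is reset to 0. With each new assignment to the Chain through a load, the write counter is incremented, so the destination element in each instruction that writes to a Chain element gets this newly incremented number; for example, after 3 assignments that load three memory locations into the cache buffer, the counter value is 3 (1301). Inside the loop code, and for each loop iteration, the write counter keeps incrementing with each Chain load instruction; the unrolled dynamic instructions that load into the Chain carry these incremented numbers. On the consuming side there is a read counter 1303, which also keeps incrementing each time a consuming instruction that reads from a Chain element is encountered. For example, the first instruction that references Chain 0 to read its first element increments the read counter from 0 to 1; the next consuming instruction that reads from the Chain increments the counter to 2 and thus knows that it needs to read the second element of Chain 0. This scheme of write and read counters ensures implicit synchronization between the loads of Chain elements and their consuming instructions without actually relying on the load addresses. It has an effect similar to a variable register index (e.g., R(i)).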
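The write-counter/read-counter pairing can be sketched as follows. The `Chain` class is a hypothetical model: it treats the chain storage as a small circular buffer and adds an overflow check, neither of which is specified in the text.

```python
class Chain:
    """Hypothetical model of a Chain: one label, many sequential elements."""

    def __init__(self, size):
        self.slots = [None] * size
        self.write_counter = 0     # reset to 0 when the Chain is allocated
        self.read_counter = 0

    def load(self, value):
        """Each load into the Chain takes the next write number as its element."""
        if self.write_counter - self.read_counter >= len(self.slots):
            raise RuntimeError("chain full: oldest element not yet consumed")
        self.write_counter += 1
        self.slots[(self.write_counter - 1) % len(self.slots)] = value
        return self.write_counter

    def can_consume(self):
        return self.read_counter < self.write_counter

    def consume(self):
        """Each consuming instruction takes the next read number as its element."""
        if not self.can_consume():
            raise RuntimeError("no loaded element available yet")
        self.read_counter += 1
        slot = (self.read_counter - 1) % len(self.slots)
        value, self.slots[slot] = self.slots[slot], None
        return value

chain0 = Chain(4)
for element in [10, 20, 30]:           # three loads from an array, issued early
    chain0.load(element)
writes_after_loads = chain0.write_counter
consumed = [chain0.consume() for _ in range(3)]
```

The k-th consumer is paired with the k-th load purely by counter value, with no address comparison, which is what gives the R(i)-like effect described above.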
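FIG. 13.b shows how entries in the cache or buffer 1351 (e.g., Temp 16, shown as 1350) can be dynamically assigned to Temps and Chains depending on the allocate instruction 1352 inserted by the compiler/hardware. In the figure, Chain A is assigned to hardware Chains 0, 1, 2, and 3 (each having 4 locations in the buffer; by combining their storage 1353, Chain A is allocated 16 entries in total). The figure also shows how the storage of Chains and Temps is overlaid, so that the storage can be assigned to one or the other at any given time.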
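The mechanism described in FIG. 14 provides out-of-order execution for computing memory addresses using address registers, without requiring excessive storage space to rename those address registers. The scheme takes advantage of the fact that address registers are usually updated by incrementing or decrementing them 1400, while the full memory address can be built by adding and scaling several such address registers (e.g., adding a base register to an index register and using a scaling factor and an offset). The scheme uses the hardware scheduler/instruction bucket buffer 1401 and the inheritance vector 1402 to store the increment/decrement counter 1403; that is, every time an address register is incremented or decremented, the inheritance vector records the instruction/bucket that made the update, and the counter 1403 of increment/decrement events is tracked inside the scheduler/bucket buffer. There is only one actual physical address register per address register name (so there are no multiple physical registers referring to the same register name, as there are in typical renaming schemes). To load other values, or to perform updates other than incrementing or decrementing the address register, the regular data registers must be used (because they are renamed and can store multiple versions of a register name and its values). When such an involved update instruction executes, a copy of the data register is sent to the one address register, and thereafter the address register can be incremented or decremented as described, using the inheritance vector to keep track of the updates.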
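One possible reading of this mechanism is sketched below: a single physical copy of the address register, plus a running count of increments/decrements whose snapshot travels with each bucket. Representing the counter as a net delta, and the names `AddressRegister`, `snapshot`, and `effective_address`, are assumptions made for this example rather than details given in the text.

```python
class AddressRegister:
    """Hypothetical model: one physical register, updates tracked as a counter."""

    def __init__(self, value):
        self.base = value    # the single physical copy, not renamed
        self.delta = 0       # net increments/decrements seen so far in program order

    def snapshot(self):
        """Value recorded with a bucket at allocation time."""
        return self.delta

    def update(self, step):
        """An increment/decrement instruction; returns the snapshot it produces."""
        self.delta += step
        return self.delta

def effective_address(reg, snapshot, index=0, scale=1, offset=0):
    """Rebuild the address a bucket would have seen, using its own snapshot."""
    return reg.base + snapshot + index * scale + offset

r = AddressRegister(1000)
s0 = r.snapshot()        # bucket 0 uses the register before any update
s1 = r.update(+8)        # bucket 1 increments by 8, then uses it
s2 = r.update(+8)        # bucket 2 increments again
# Buckets may compute their addresses in any order, here in reverse:
addresses = [effective_address(r, s) for s in (s2, s1, s0)]
```

Each bucket recovers its own view of the address register from the shared physical value and its snapshot, so no renamed copies of the register are needed for increment/decrement updates.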
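FIG. 15 extends the use of loads to allow speculative prefetching/loading of data past earlier store instructions 1500 (stores that precede the load in program order).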
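The scheme uses a qualifier in the Temp assigning instruction 1503 (the prefetch/load instruction). This Temp assigning/prefetch instruction brings the data into a transient location in the cache without performing any ordering check on the address (that is, checking whether the address of the load matches that of a store that has not yet executed; this ordering check is normally done to prevent the load from bringing in data from memory that has not yet been updated by a store that precedes the load in sequential execution). The ordering check is performed instead when the consuming instruction 1504 executes. At that point, if the ordering check finds that the prefetch loaded from the same address as a preceding store, the consuming instruction 1504 uses a bypass mechanism to get the store's value instead of the Temp's value. The bypass mechanism can be implemented together with the ordering check, which is performed as part of the execution phase of the consuming instruction 1504, by inspecting the store buffer for a matching address; if a match is found for an earlier store that executed after the Temp instruction loaded its data from memory, the value of that store is bypassed to the consuming instruction in place of the Temp value. The consuming instruction itself can be executed speculatively ahead of the store instruction by using the same thread control speculation discussed with FIG. 11, by issuing and conditionally executing a prefetch/load thread on the ordering check, which is performed explicitly or implicitly when the preceding store 1500 executes.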
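The deferred ordering check with bypass can be sketched as a scan of the store buffer at consume time. The tuple layout of the store buffer, the use of integer time stamps, and the function name `consume_with_bypass` are assumptions introduced for this illustration.

```python
def consume_with_bypass(temp_value, temp_addr, temp_load_time, store_buffer):
    """Ordering check performed when the consuming instruction executes.

    store_buffer holds (exec_time, addr, value) tuples for stores that precede
    the load in program order, listed in program order.
    """
    bypassed = None
    for exec_time, addr, value in store_buffer:
        # A matching store that executed after the prefetch read memory means
        # the Temp holds stale data; the youngest such store wins.
        if addr == temp_addr and exec_time > temp_load_time:
            bypassed = value
    return temp_value if bypassed is None else bypassed

# The prefetch read address 0x40 at time 1 and saw the old value 5.
stale = consume_with_bypass(5, 0x40, 1, [(3, 0x40, 9)])   # store ran later: bypass
fresh = consume_with_bypass(5, 0x40, 1, [(0, 0x40, 5)])   # store ran earlier: Temp is fine
other = consume_with_bypass(5, 0x40, 1, [(3, 0x80, 9)])   # different address: Temp is fine
```

Only the first case needs the bypass; in the other two the prefetched Temp value is already correct, so the early load costs nothing.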
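In describing the present invention, Temps, Chains, and instruction buckets were used by way of example; the schemes described herein can generally be applied to loads or to simple instructions.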
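Moreover, the synchronization and coherency schemes shown do not apply only to distributed caches or separate address register files; they can also be used in centralized caches shared among different processors, and they can be used partially or with slight modifications.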
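An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The medium and computer code may be specially designed and constructed for the purposes of the present invention, or they may be of a kind well known and available to those skilled in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs, and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits ("ASICs"), programmable logic devices ("PLDs"), and ROM and RAM devices. Examples of computer code include machine code, such as that produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or another object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.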
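The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the invention. The foregoing descriptions of specific embodiments of the invention are thus presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and its various embodiments with various modifications as suited to the particular use contemplated. The scope of the invention is defined by the appended claims and their equivalents.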

Claims (25)

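1. A system for instruction processing, comprising:
a cache adapted to store a prefetched value for a consuming instruction, wherein the cache associates the prefetched value with a corresponding name, and wherein the consuming instruction explicitly references the corresponding name rather than a memory address, wherein the corresponding name is a label associated with a load address calculation instruction within a program, the program being executed to obtain the prefetched value; and
a scheduler to issue the consuming instruction after the prefetched value has been loaded into the cache, and to free the location of the prefetched value in the cache after an execution unit executes the consuming instruction using the prefetched value.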
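2. The system of claim 1, wherein the corresponding name indicates a dependency source of the consuming instruction.
3. The system of any one of claims 1-2, wherein the scheduler confirms, before the execution unit executes the consuming instruction, that the prefetched value has been validly loaded from higher-level memory.
4. The system of claim 1, wherein the location of the prefetched value is freed using a context switch in response to a call to an exception handler or a system call.
5. The system of claim 1, further comprising a physical register having a flag field to facilitate conditional execution of instructions and delay slots for branch instructions.
6. The system of claim 1, further comprising an instruction fetch unit to selectively execute a conditional branch instruction in response to evaluation of a flag field.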
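7. The system of claim 1,
wherein the cache is adapted to store a sequence of prefetched values for a sequence of consuming instructions, wherein the sequence of prefetched values has a single corresponding name regardless of the number of prefetched values,
wherein the execution unit is adapted to execute the sequence of consuming instructions using the sequence of prefetched values, and
wherein the scheduler is adapted to free the locations of the sequence of prefetched values in the cache after the execution unit executes the sequence of consuming instructions.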
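8. The system of claim 7, wherein the sequence of prefetched values is used to access an array within a loop.
9. The system of any one of claims 7-8, wherein the scheduler resets a write counter in an inheritance vector when the sequence of prefetched values is allocated.
10. The system of claim 8, wherein the scheduler increments a write counter for each iteration of the loop.
11. The system of claim 7, wherein the scheduler maintains a read counter for each consuming instruction.
12. The system of claim 11, wherein the scheduler increments the read counter for each consuming instruction read from the sequence of consuming instructions.
13. The system of claim 7, wherein the cache stores the sequence of prefetched values without performing an ordering check.
14. The system of claim 13, wherein the scheduler performs an ordering check against stores when a consuming instruction executes.
15. The system of claim 14, wherein the scheduler includes a bypass mechanism to selectively fetch a stored value instead of a prefetched value.
16. The system of claim 7, wherein the execution unit speculatively executes a consuming instruction ahead of a store instruction through thread control speculation.
17. The system of claim 16, wherein the execution unit conditionally executes a prefetch thread upon an ordering check, the ordering check being made when a preceding store executes.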
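18. A method implemented by a processor, comprising:
storing in a cache a prefetched value for a consuming instruction, wherein the cache associates the prefetched value with a corresponding name, and wherein the consuming instruction explicitly references the corresponding name rather than a memory address, wherein the corresponding name is a label associated with a load address calculation instruction within a program, the program being executed to obtain the prefetched value;
issuing the consuming instruction after the prefetched value has been loaded into the cache; and
freeing the location of the prefetched value in the cache after the consuming instruction is executed using the prefetched value.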
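19. The method of claim 18, wherein the corresponding name indicates a dependency source of the consuming instruction.
20. The method of any one of claims 18-19, further comprising:
confirming, before the consuming instruction is executed, that the prefetched value has been validly loaded from higher-level memory.
21. The method of claim 18, wherein the location of the prefetched value is freed using a context switch in response to a call to an exception handler or a system call.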
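22. The method of claim 18, further comprising:
storing in the cache a sequence of prefetched values for a sequence of consuming instructions, wherein the sequence of prefetched values has a single corresponding name regardless of the number of prefetched values;
executing the sequence of consuming instructions using the sequence of prefetched values; and
freeing the locations of the sequence of prefetched values in the cache after the sequence of consuming instructions is executed.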
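23. The method of claim 22, wherein the sequence of prefetched values is used to access an array within a loop.
24. A machine-readable medium having code stored therein which, when executed, causes a processor to perform the method of any one of claims 18-23.
25. A computing device comprising a plurality of modules for performing corresponding steps of the method of any one of claims 18-23.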
CN201710264673.1A 2006-11-14 2007-11-14 Multi-threaded architecture Active CN107368285B (zh)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US86581606P 2006-11-14 2006-11-14
US86582006P 2006-11-14 2006-11-14
US86581806P 2006-11-14 2006-11-14
US86581306P 2006-11-14 2006-11-14
US60/865,816 2006-11-14
US60/865,813 2006-11-14
US60/865,820 2006-11-14
US60/865,818 2006-11-14
CN200780046679.6A CN101627365B (zh) 2006-11-14 2007-11-14 Multi-threaded architecture

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN200780046679.6A Division CN101627365B (zh) 2006-11-14 2007-11-14 Multi-threaded architecture

Publications (2)

Publication Number Publication Date
CN107368285A CN107368285A (zh) 2017-11-21
CN107368285B true CN107368285B (zh) 2020-10-09

Family

ID=39402461

Family Applications (2)

Application Number Title Priority Date Filing Date
CN200780046679.6A Active CN101627365B (zh) 2006-11-14 2007-11-14 Multi-threaded architecture
CN201710264673.1A Active CN107368285B (zh) 2006-11-14 2007-11-14 Multi-threaded architecture

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200780046679.6A Active CN101627365B (zh) 2006-11-14 2007-11-14 Multi-threaded architecture

Country Status (4)

Country Link
US (4) US8677105B2 (zh)
EP (3) EP2527972A3 (zh)
CN (2) CN101627365B (zh)
WO (1) WO2008061154A2 (zh)

Families Citing this family (154)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2011018B1 (en) 2006-04-12 2016-07-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
US8275963B2 (en) * 2008-02-01 2012-09-25 International Business Machines Corporation Asynchronous memory move across physical nodes with dual-sided communication
US8095758B2 (en) * 2008-02-01 2012-01-10 International Business Machines Corporation Fully asynchronous memory mover
US8245004B2 (en) * 2008-02-01 2012-08-14 International Business Machines Corporation Mechanisms for communicating with an asynchronous memory mover to perform AMM operations
US8356151B2 (en) * 2008-02-01 2013-01-15 International Business Machines Corporation Reporting of partially performed memory move
US8327101B2 (en) * 2008-02-01 2012-12-04 International Business Machines Corporation Cache management during asynchronous memory move operations
US8015380B2 (en) * 2008-02-01 2011-09-06 International Business Machines Corporation Launching multiple concurrent memory moves via a fully asynchronoous memory mover
US20090240928A1 (en) * 2008-03-18 2009-09-24 Freescale Semiconductor, Inc. Change in instruction behavior within code block based on program action external thereto
US8312448B2 (en) * 2008-06-24 2012-11-13 Sony Corporation Configuration to order software deployment and management
US8078844B1 (en) 2008-12-09 2011-12-13 Nvidia Corporation System, method, and computer program product for removing a register of a processor from an active state
US8200949B1 (en) * 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US10698859B2 (en) 2009-09-18 2020-06-30 The Board Of Regents Of The University Of Texas System Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture
CN101799771B (zh) * 2010-04-09 2012-12-12 上海华为技术有限公司 一种线程切换的矢量现场切换方法和装置
JP5531760B2 (ja) * 2010-04-28 2014-06-25 富士通株式会社 情報処理システム及び情報処理方法
US9501285B2 (en) * 2010-05-27 2016-11-22 International Business Machines Corporation Register allocation to threads
US8825982B2 (en) * 2010-06-10 2014-09-02 Global Supercomputing Corporation Storage unsharing
US9021241B2 (en) 2010-06-18 2015-04-28 The Board Of Regents Of The University Of Texas System Combined branch target and predicate prediction for instruction blocks
WO2012037491A2 (en) 2010-09-17 2012-03-22 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US9552206B2 (en) * 2010-11-18 2017-01-24 Texas Instruments Incorporated Integrated circuit with control node circuitry and processing circuitry
CN108376097B (zh) * 2011-03-25 2022-04-15 英特尔公司 用于通过使用由可分割引擎实例化的虚拟核来支持代码块执行的寄存器文件段
TWI533129B (zh) 2011-03-25 2016-05-11 軟體機器公司 使用可分割引擎實體化的虛擬核心執行指令序列程式碼區塊
EP2689326B1 (en) * 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
TWI666551B (zh) 2011-05-20 2019-07-21 美商英特爾股份有限公司 以複數個引擎作資源與互連結構的分散式分配以支援指令序列的執行
EP2710480B1 (en) 2011-05-20 2018-06-20 Intel Corporation An interconnect structure to support the execution of instruction sequences by a plurality of engines
US9600288B1 (en) * 2011-07-18 2017-03-21 Apple Inc. Result bypass cache
US20130024647A1 (en) * 2011-07-20 2013-01-24 Gove Darryl J Cache backed vector registers
US8850557B2 (en) * 2012-02-29 2014-09-30 International Business Machines Corporation Processor and data processing method with non-hierarchical computer security enhancements for context states
CN104040491B (zh) * 2011-11-22 2018-06-12 英特尔公司 微处理器加速的代码优化器
IN2014CN03678A (zh) 2011-11-22 2015-09-25 Soft Machines Inc
WO2013101323A1 (en) * 2011-12-30 2013-07-04 Intel Corporation Micro-architecture for eliminating mov operations
US9454371B2 (en) 2011-12-30 2016-09-27 Intel Corporation Micro-architecture for eliminating MOV operations
US8756194B1 (en) * 2012-05-04 2014-06-17 Sencha, Inc. Cloud-based data replication for web applications with replica identifier reassignment feature
KR101996592B1 (ko) * 2012-06-15 2019-07-04 인텔 코포레이션 명확화 없는 비순차 load store 큐를 갖는 재정렬된 투기적 명령어 시퀀스들
CN104583943B (zh) 2012-06-15 2018-06-08 英特尔公司 拥有具有分布式结构的动态分派窗口的虚拟加载存储队列
EP2862063B1 (en) * 2012-06-15 2022-09-14 Intel Corporation A lock-based and synch-based method for out of order loads in a memory consistency model using shared memory resources
CN104583936B (zh) * 2012-06-15 2019-01-04 英特尔公司 具有组成按序从存储器进行读取的加载的存储器一致性模型中的乱序加载的信号量方法和系统
CN104583975B (zh) * 2012-06-15 2017-07-14 英特尔公司 无消歧乱序加载存储队列
EP2862084A4 (en) 2012-06-15 2016-11-30 Soft Machines Inc METHOD AND SYSTEM FOR IMPLEMENTING RECOVERY FROM A SPECULATIVE TRANSMISSION OF FAULT FORECASTS / ERRORS DUE TO THE CHANGE AND OPTIMIZATION OF MEMORY LOADS
WO2013188311A1 (en) * 2012-06-15 2013-12-19 Soft Machines, Inc. A load store buffer agnostic to threads implementing forwarding from different threads based on store seniority
KR101996351B1 (ko) 2012-06-15 2019-07-05 인텔 코포레이션 통합된 구조를 갖는 동적 디스패치 윈도우를 가지는 가상 load store 큐
CN104583956B (zh) 2012-06-15 2019-01-04 英特尔公司 用于实现加载存储重新排序和优化的指令定义
GB2514956B (en) * 2013-01-21 2015-04-01 Imagination Tech Ltd Allocating resources to threads based on speculation metric
CN103973600B (zh) * 2013-02-01 2018-10-09 德克萨斯仪器股份有限公司 用于分组处理旋转掩码合并和寄存字段指令的方法及装置
US20140244977A1 (en) * 2013-02-22 2014-08-28 Mips Technologies, Inc. Deferred Saving of Registers in a Shared Register Pool for a Multithreaded Microprocessor
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
WO2014151018A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing multithreaded instructions grouped onto blocks
US20140281116A1 (en) 2013-03-15 2014-09-18 Soft Machines, Inc. Method and Apparatus to Speed up the Load Access and Data Return Speed Path Using Early Lower Address Bits
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
WO2014151043A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9436476B2 (en) 2013-03-15 2016-09-06 Soft Machines Inc. Method and apparatus for sorting elements in hardware structures
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
US9627038B2 (en) 2013-03-15 2017-04-18 Intel Corporation Multiport memory cell having improved density area
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9569216B2 (en) * 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
CN105190541A (zh) * 2013-03-15 2015-12-23 索夫特机械公司 利用具有寄存器视图、源视图、指令视图以及多个注册模板的微处理器体系架构执行指令块的方法
US9582322B2 (en) 2013-03-15 2017-02-28 Soft Machines Inc. Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
WO2014150806A1 (en) * 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
GB2514618B (en) * 2013-05-31 2020-11-11 Advanced Risc Mach Ltd Data processing systems
GB2520731B (en) * 2013-11-29 2017-02-08 Imagination Tech Ltd Soft-partitioning of a register file cache
GB2556740A (en) * 2013-11-29 2018-06-06 Imagination Tech Ltd Soft-partitioning of a register file cache
US9471307B2 (en) * 2014-01-03 2016-10-18 Nvidia Corporation System and processor that include an implementation of decoupled pipelines
US9921847B2 (en) * 2014-01-21 2018-03-20 Nvidia Corporation Tree-based thread management
CN106796506B (zh) 2014-05-12 2019-09-27 英特尔公司 用于向自修改代码提供硬件支持的方法和装置
US10061592B2 (en) 2014-06-27 2018-08-28 Samsung Electronics Co., Ltd. Architecture and execution for efficient mixed precision computations in single instruction multiple data/thread (SIMD/T) devices
US10061591B2 (en) 2014-06-27 2018-08-28 Samsung Electronics Company, Ltd. Redundancy elimination in single instruction multiple data/thread (SIMD/T) execution processing
US9996386B2 (en) * 2014-07-23 2018-06-12 Intel Corporation Mid-thread pre-emption with software assisted context switch
US11281481B2 (en) 2014-07-25 2022-03-22 Intel Corporation Using a plurality of conversion tables to implement an instruction set agnostic runtime architecture
US20160026484A1 (en) * 2014-07-25 2016-01-28 Soft Machines, Inc. System converter that executes a just in time optimizer for executing code from a guest image
US10353680B2 (en) 2014-07-25 2019-07-16 Intel Corporation System converter that implements a run ahead run time guest instruction conversion/decoding process and a prefetching process where guest code is pre-fetched from the target of guest branches in an instruction sequence
US9823939B2 (en) 2014-07-25 2017-11-21 Intel Corporation System for an instruction set agnostic runtime architecture
US9348595B1 (en) 2014-12-22 2016-05-24 Centipede Semi Ltd. Run-time code parallelization with continuous monitoring of repetitive instruction sequences
US9946549B2 (en) 2015-03-04 2018-04-17 Qualcomm Incorporated Register renaming in block-based instruction set architecture
US9208066B1 (en) 2015-03-04 2015-12-08 Centipede Semi Ltd. Run-time code parallelization with approximate monitoring of instruction sequences
US10571993B2 (en) * 2015-03-20 2020-02-25 Sanken Electric Co., Ltd. Micro controller unit
CN104778082B (zh) * 2015-03-25 2018-04-24 小米科技有限责任公司 开放式图形库操作的处理方法和装置
US10296350B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences
US10296346B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences based on pre-monitoring
US9715390B2 (en) * 2015-04-19 2017-07-25 Centipede Semi Ltd. Run-time parallelization of code execution based on an approximate register-access specification
US10409599B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Decoding information about a group of instructions including a size of the group of instructions
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US10175988B2 (en) 2015-06-26 2019-01-08 Microsoft Technology Licensing, Llc Explicit instruction scheduler state information for a processor
US10169044B2 (en) 2015-06-26 2019-01-01 Microsoft Technology Licensing, Llc Processing an encoding format field to interpret header information regarding a group of instructions
US9946548B2 (en) 2015-06-26 2018-04-17 Microsoft Technology Licensing, Llc Age-based management of instruction blocks in a processor instruction window
US10191747B2 (en) 2015-06-26 2019-01-29 Microsoft Technology Licensing, Llc Locking operand values for groups of instructions executed atomically
US9940136B2 (en) 2015-06-26 2018-04-10 Microsoft Technology Licensing, Llc Reuse of decoded instructions
US10409606B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Verifying branch targets
US9952867B2 (en) 2015-06-26 2018-04-24 Microsoft Technology Licensing, Llc Mapping instruction blocks based on block size
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US10936316B2 (en) 2015-09-19 2021-03-02 Microsoft Technology Licensing, Llc Dense read encoding for dataflow ISA
US10452399B2 (en) 2015-09-19 2019-10-22 Microsoft Technology Licensing, Llc Broadcast channel architectures for block-based processors
US10061584B2 (en) 2015-09-19 2018-08-28 Microsoft Technology Licensing, Llc Store nullification in the target field
US10031756B2 (en) 2015-09-19 2018-07-24 Microsoft Technology Licensing, Llc Multi-nullification
US10678544B2 (en) 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US10719321B2 (en) 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
US10198263B2 (en) 2015-09-19 2019-02-05 Microsoft Technology Licensing, Llc Write nullification
US10776115B2 (en) 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US11126433B2 (en) 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US10095519B2 (en) 2015-09-19 2018-10-09 Microsoft Technology Licensing, Llc Instruction block address register
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US11977891B2 (en) 2015-09-19 2024-05-07 Microsoft Technology Licensing, Llc Implicit program order
US10871967B2 (en) 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US10180840B2 (en) 2015-09-19 2019-01-15 Microsoft Technology Licensing, Llc Dynamic generation of null instructions
MA44821A (fr) * 2016-02-27 2019-01-02 Kinzinger Automation Gmbh Procédé d'allocation d'une pile de registres virtuels dans une machine à pile
GB2547912B (en) * 2016-03-02 2019-01-30 Advanced Risc Mach Ltd Register access control
US11106467B2 (en) 2016-04-28 2021-08-31 Microsoft Technology Licensing, Llc Incremental scheduler for out-of-order block ISA processors
US10423446B2 (en) 2016-11-28 2019-09-24 Arm Limited Data processing
US10552212B2 (en) * 2016-11-28 2020-02-04 Arm Limited Data processing
US10671426B2 (en) 2016-11-28 2020-06-02 Arm Limited Data processing
US11531552B2 (en) 2017-02-06 2022-12-20 Microsoft Technology Licensing, Llc Executing multiple programs simultaneously on a processor core
US10558464B2 (en) * 2017-02-09 2020-02-11 International Business Machines Corporation Infinite processor thread balancing
US10831537B2 (en) 2017-02-17 2020-11-10 International Business Machines Corporation Dynamic update of the number of architected registers assigned to software threads using spill counts
CA3053391A1 (en) 2017-02-20 2018-08-23 Intensivate Inc. Super-thread processor
US10572265B2 (en) 2017-04-18 2020-02-25 International Business Machines Corporation Selecting register restoration or register reloading
US11010192B2 (en) 2017-04-18 2021-05-18 International Business Machines Corporation Register restoration using recovery buffers
US10489382B2 (en) 2017-04-18 2019-11-26 International Business Machines Corporation Register restoration invalidation based on a context switch
US10782979B2 (en) 2017-04-18 2020-09-22 International Business Machines Corporation Restoring saved architected registers and suppressing verification of registers to be restored
US10838733B2 (en) 2017-04-18 2020-11-17 International Business Machines Corporation Register context restoration based on rename register recovery
US10564977B2 (en) 2017-04-18 2020-02-18 International Business Machines Corporation Selective register allocation
US10540184B2 (en) 2017-04-18 2020-01-21 International Business Machines Corporation Coalescing store instructions for restoration
US10963261B2 (en) 2017-04-18 2021-03-30 International Business Machines Corporation Sharing snapshots across save requests
US10649785B2 (en) 2017-04-18 2020-05-12 International Business Machines Corporation Tracking changes to memory via check and recovery
US10552164B2 (en) 2017-04-18 2020-02-04 International Business Machines Corporation Sharing snapshots between restoration and recovery
US10545766B2 (en) 2017-04-18 2020-01-28 International Business Machines Corporation Register restoration using transactional memory register snapshots
US10740108B2 (en) 2017-04-18 2020-08-11 International Business Machines Corporation Management of store queue based on restoration operation
US10877757B2 (en) * 2017-11-14 2020-12-29 Nvidia Corporation Binding constants at runtime for improved resource utilization
US10963379B2 (en) 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths
US11132228B2 (en) 2018-03-21 2021-09-28 International Business Machines Corporation SMT processor to create a virtual vector register file for a borrower thread from a number of donated vector register files
US10922780B2 (en) 2018-04-10 2021-02-16 Graphisoft Se Method to distribute the drawing calculation of architectural data elements between multiple threads
US10691430B2 (en) * 2018-08-27 2020-06-23 Intel Corporation Latency scheduling mehanism
US10824429B2 (en) 2018-09-19 2020-11-03 Microsoft Technology Licensing, Llc Commit logic and precise exceptions in explicit dataflow graph execution architectures
US10922139B2 (en) 2018-10-11 2021-02-16 Visa International Service Association System, method, and computer program product for processing large data sets by balancing entropy between distributed data segments
JP7217341B2 (ja) * 2019-04-18 2023-02-02 NSITEXE, Inc. Processor and register inheritance method
US11216278B2 (en) 2019-08-12 2022-01-04 Advanced New Technologies Co., Ltd. Multi-thread processing
CN110569067B (zh) * 2019-08-12 2021-07-13 Advanced New Technologies Co., Ltd. Method, apparatus, and system for multi-thread processing
US10990393B1 (en) * 2019-10-21 2021-04-27 Advanced Micro Devices, Inc. Address-based filtering for load/store speculation
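CN112925566A (zh) * 2019-12-06 2021-06-08 Cambricon Technologies Corporation Limited Method and device for establishing virtual register live intervals, and compiling method and device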
US11709664B2 (en) * 2020-06-02 2023-07-25 SambaNova Systems, Inc. Anti-congestion flow control for reconfigurable processors
US11392384B2 (en) 2020-09-04 2022-07-19 Alibaba Group Holding Limited Methods of breaking down coarse-grained tasks for fine-grained task re-scheduling
US20220100518A1 (en) * 2020-09-25 2022-03-31 Advanced Micro Devices, Inc. Compression metadata assisted computation
CN112445616B (zh) * 2020-11-25 2023-03-21 Hygon Information Technology Co., Ltd. Resource allocation method and device
US11907718B2 (en) * 2020-12-31 2024-02-20 Micron Technology, Inc. Loop execution in a reconfigurable compute fabric using flow controllers for respective synchronous flows
US11442696B1 (en) 2021-03-23 2022-09-13 SambaNova Systems, Inc. Floating point multiply-add, accumulate unit with exception processing
CN113076182B (zh) * 2021-03-24 2024-03-29 Chengdu Haiguang Integrated Circuit Design Co., Ltd. Method and device for restoring computing tasks, user equipment, and storage medium
US11966726B2 (en) * 2022-02-25 2024-04-23 International Business Machines Corporation Operating system (OS) scheduler and compiler for code generation optimization in a (simultaneous multi-threading) SMT enabled CPU
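CN114495399B (zh) * 2022-04-01 2022-07-22 Wuhan Zhongke Tongda High-Tech Co., Ltd. System and method for processing alarm information from national-standard cameras, computer device, and medium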
US11954492B1 (en) 2022-09-19 2024-04-09 Apple Inc. Fence enforcement techniques based on stall characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5829028A (en) * 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
CN1613056A (zh) * 2002-01-03 2005-05-04 Intel Corporation Dependence-chain processor
CN1650260A (zh) * 2002-05-01 2005-08-03 Koninklijke Philips Electronics N.V. Memory-region-based data prefetching

Family Cites Families (490)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US727487A (en) 1902-10-21 1903-05-05 Swan F Swanson Dumping-car.
US4075704A (en) 1976-07-02 1978-02-21 Floating Point Systems, Inc. Floating point data processor for high speech operation
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4245344A (en) 1979-04-02 1981-01-13 Rockwell International Corporation Processing system with dual buses
US4527237A (en) 1979-10-11 1985-07-02 Nanodata Computer Corporation Data processing system
US4414624A (en) 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4524415A (en) 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4597061B1 (en) 1983-01-03 1998-06-09 Texas Instruments Inc Memory system using pipleline circuitry for improved system
US4577273A (en) 1983-06-06 1986-03-18 Sperry Corporation Multiple microcomputer system for digital computers
US4682281A (en) 1983-08-30 1987-07-21 Amdahl Corporation Data storage unit employing translation lookaside buffer pointer
US4633434A (en) 1984-04-02 1986-12-30 Sperry Corporation High performance storage unit
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
JPS6140643A (ja) 1984-07-31 1986-02-26 Hitachi Ltd System resource allocation control method
US4835680A (en) 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
JPS6289149A (ja) 1985-10-15 1987-04-23 Agency Of Ind Science & Technol Multi-port memory system
JPH0658650B2 (ja) 1986-03-14 1994-08-03 Hitachi, Ltd. Virtual computer system
US4920477A (en) 1987-04-20 1990-04-24 Multiflow Computer, Inc. Virtual address table look aside buffer miss recovery method and apparatus
US4943909A (en) 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
US5339398A (en) 1989-07-31 1994-08-16 North American Philips Corporation Memory architecture and method of data organization optimized for hashing
US5471593A (en) * 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
US5317754A (en) 1990-10-23 1994-05-31 International Business Machines Corporation Method and apparatus for enabling an interpretive execution subset
US5317705A (en) 1990-10-24 1994-05-31 International Business Machines Corporation Apparatus and method for TLB purge reduction in a multi-level machine system
US6282583B1 (en) 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
JPH0820949B2 (ja) 1991-11-26 1996-03-04 Matsushita Electric Industrial Co., Ltd. Information processing device
GB2277181B (en) 1991-12-23 1995-12-13 Intel Corp Interleaved cache for multiple accesses per clock in a microprocessor
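KR100309566B1 (ko) 1992-04-29 2001-12-15 리패치 Method and apparatus for grouping multiple instructions, issuing grouped instructions simultaneously, and executing grouped instructions in a pipelined processor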
DE69308548T2 (de) * 1992-05-01 1997-06-12 Seiko Epson Corp Device and method for instruction completion in a superscalar processor
EP0576262B1 (en) 1992-06-25 2000-08-23 Canon Kabushiki Kaisha Apparatus for multiplying integers of many figures
JPH0637202A (ja) 1992-07-20 1994-02-10 Mitsubishi Electric Corp Package for microwave IC
JPH06110781A (ja) 1992-09-30 1994-04-22 Nec Corp Cache memory device
US5493660A (en) 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5819088A (en) 1993-03-25 1998-10-06 Intel Corporation Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer
JPH0784883A (ja) 1993-09-17 1995-03-31 Hitachi Ltd Method for purging the address translation buffer of a virtual computer system
US6948172B1 (en) * 1993-09-21 2005-09-20 Microsoft Corporation Preemptive multi-tasking with cooperative groups of tasks
US5469376A (en) 1993-10-14 1995-11-21 Abdallah; Mohammad A. F. F. Digital circuit for the evaluation of mathematical expressions
US5517651A (en) 1993-12-29 1996-05-14 Intel Corporation Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
US5761476A (en) 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
US5956753A (en) 1993-12-30 1999-09-21 Intel Corporation Method and apparatus for handling speculative memory access operations
JP3048498B2 (ja) 1994-04-13 2000-06-05 Toshiba Corporation Semiconductor memory device
JPH07287668A (ja) 1994-04-19 1995-10-31 Hitachi Ltd Data processing device
CN1084005C (zh) 1994-06-27 2002-05-01 International Business Machines Corporation Method and apparatus for dynamically controlling address space allocation
US5548742A (en) 1994-08-11 1996-08-20 Intel Corporation Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory
US5813031A (en) 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US5640534A (en) 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5835951A (en) 1994-10-18 1998-11-10 National Semiconductor Branch processing unit with target cache read prioritization protocol for handling multiple hits
JP3569014B2 (ja) 1994-11-25 2004-09-22 Fujitsu Limited Processor and processing method supporting multiple contexts
US5724565A (en) 1995-02-03 1998-03-03 International Business Machines Corporation Method and system for processing first and second sets of instructions by first and second types of processing systems
US5649136A (en) 1995-02-14 1997-07-15 Hal Computer Systems, Inc. Processor structure and method for maintaining and restoring precise state at any instruction boundary
US5675759A (en) 1995-03-03 1997-10-07 Shebanow; Michael C. Method and apparatus for register management using issue sequence prior physical register and register association validity information
US5634068A (en) 1995-03-31 1997-05-27 Sun Microsystems, Inc. Packet switched cache coherent multiprocessor system
US5751982A (en) 1995-03-31 1998-05-12 Apple Computer, Inc. Software emulation system with dynamic translation of emulated instructions for increased processing speed
US6209085B1 (en) * 1995-05-05 2001-03-27 Intel Corporation Method and apparatus for performing process switching in multiprocessor computer systems
US6643765B1 (en) 1995-08-16 2003-11-04 Microunity Systems Engineering, Inc. Programmable processor with group floating point operations
US5710902A (en) 1995-09-06 1998-01-20 Intel Corporation Instruction dependency chain indentifier
US6341324B1 (en) 1995-10-06 2002-01-22 Lsi Logic Corporation Exception processing in superscalar microprocessor
US5864657A (en) 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5983327A (en) 1995-12-01 1999-11-09 Nortel Networks Corporation Data path architecture and arbitration scheme for providing access to a shared system resource
US5793941A (en) 1995-12-04 1998-08-11 Advanced Micro Devices, Inc. On-chip primary cache testing circuit and test method
US5911057A (en) 1995-12-19 1999-06-08 Texas Instruments Incorporated Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods
US5699537A (en) 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US6882177B1 (en) 1996-01-10 2005-04-19 Altera Corporation Tristate structures for programmable logic devices
US5754818A (en) 1996-03-22 1998-05-19 Sun Microsystems, Inc. Architecture and method for sharing TLB entries through process IDS
US5904892A (en) 1996-04-01 1999-05-18 Saint-Gobain/Norton Industrial Ceramics Corp. Tape cast silicon carbide dummy wafer
US5752260A (en) 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US5806085A (en) 1996-05-01 1998-09-08 Sun Microsystems, Inc. Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database
US6108769A (en) 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5881277A (en) 1996-06-13 1999-03-09 Texas Instruments Incorporated Pipelined microprocessor with branch misprediction cache circuits, systems and methods
US5860146A (en) 1996-06-25 1999-01-12 Sun Microsystems, Inc. Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces
US5903760A (en) 1996-06-27 1999-05-11 Intel Corporation Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA
US5974506A (en) 1996-06-28 1999-10-26 Digital Equipment Corporation Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system
US6167490A (en) 1996-09-20 2000-12-26 University Of Washington Using global memory information to manage memory in a computer network
KR19980032776A (ko) 1996-10-16 1998-07-25 Kanai Tsutomu Data processor and data processing system
JP2000507016A (ja) 1996-11-04 2000-06-06 Philips Electronics N.V. Processing device that reads instructions from memory
US6385715B1 (en) 1996-11-13 2002-05-07 Intel Corporation Multi-threading for a processor utilizing a replay queue
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5978906A (en) 1996-11-19 1999-11-02 Advanced Micro Devices, Inc. Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US5903750A (en) 1996-11-20 1999-05-11 Institute For The Development Of Emerging Architectures, L.L.P. Dynamic branch prediction for branch instructions with multiple targets
US6212542B1 (en) * 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
US6134634A (en) 1996-12-20 2000-10-17 Texas Instruments Incorporated Method and apparatus for preemptive cache write-back
US5918251A (en) 1996-12-23 1999-06-29 Intel Corporation Method and apparatus for preloading different default address translation attributes
US6016540A (en) 1997-01-08 2000-01-18 Intel Corporation Method and apparatus for scheduling instructions in waves
US6065105A (en) 1997-01-08 2000-05-16 Intel Corporation Dependency matrix
US5802602A (en) 1997-01-17 1998-09-01 Intel Corporation Method and apparatus for performing reads of related data from a set-associative cache memory
US6088780A (en) 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
US6314511B2 (en) * 1997-04-03 2001-11-06 University Of Washington Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
JPH1124929A (ja) 1997-06-30 1999-01-29 Sony Corp Arithmetic processing device and method therefor
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6170051B1 (en) * 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6085315A (en) 1997-09-12 2000-07-04 Siemens Aktiengesellschaft Data processing device with loop pipeline
US6101577A (en) 1997-09-15 2000-08-08 Advanced Micro Devices, Inc. Pipelined instruction cache and branch prediction mechanism therefor
US5901294A (en) 1997-09-18 1999-05-04 International Business Machines Corporation Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access
US6185660B1 (en) * 1997-09-23 2001-02-06 Hewlett-Packard Company Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
US5905509A (en) 1997-09-30 1999-05-18 Compaq Computer Corp. Accelerated Graphics Port two level Gart cache having distributed first level caches
US6226732B1 (en) 1997-10-02 2001-05-01 Hitachi Micro Systems, Inc. Memory system architecture
US5922065A (en) * 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
US6178482B1 (en) * 1997-11-03 2001-01-23 Brecis Communications Virtual register sets
US6021484A (en) 1997-11-14 2000-02-01 Samsung Electronics Co., Ltd. Dual instruction set architecture
US6256728B1 (en) 1997-11-17 2001-07-03 Advanced Micro Devices, Inc. Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
US6260131B1 (en) 1997-11-18 2001-07-10 Intrinsity, Inc. Method and apparatus for TLB memory ordering
US6016533A (en) 1997-12-16 2000-01-18 Advanced Micro Devices, Inc. Way prediction logic for cache array
US6219776B1 (en) 1998-03-10 2001-04-17 Billions Of Operations Per Second Merged array controller and processing element
US6609189B1 (en) 1998-03-12 2003-08-19 Yale University Cycle segmented prefix circuits
JP3657424B2 (ja) 1998-03-20 2005-06-08 Matsushita Electric Industrial Co., Ltd. Center device and terminal device for broadcasting program information
US6216215B1 (en) 1998-04-02 2001-04-10 Intel Corporation Method and apparatus for senior loads
US6157998A (en) 1998-04-03 2000-12-05 Motorola Inc. Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers
US6205545B1 (en) 1998-04-30 2001-03-20 Hewlett-Packard Company Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance
US6115809A (en) 1998-04-30 2000-09-05 Hewlett-Packard Company Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction
US6256727B1 (en) 1998-05-12 2001-07-03 International Business Machines Corporation Method and system for fetching noncontiguous instructions in a single clock cycle
JPH11338710A (ja) 1998-05-28 1999-12-10 Toshiba Corp Compiling method and device for a processor having multiple types of instruction sets, and recording medium on which the method is programmed and recorded
US6272616B1 (en) 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6988183B1 (en) 1998-06-26 2006-01-17 Derek Chi-Lan Wong Methods for increasing instruction-level parallelism in microprocessors and digital system
US6260138B1 (en) 1998-07-17 2001-07-10 Sun Microsystems, Inc. Method and apparatus for branch instruction processing in a processor
US6122656A (en) 1998-07-31 2000-09-19 Advanced Micro Devices, Inc. Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US6272662B1 (en) 1998-08-04 2001-08-07 International Business Machines Corporation Distributed storage system using front-end and back-end locking
JP2000057054A (ja) 1998-08-12 2000-02-25 Fujitsu Ltd High-speed address translation system
US8631066B2 (en) 1998-09-10 2014-01-14 Vmware, Inc. Mechanism for providing virtual machines for use by multiple users
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6332189B1 (en) 1998-10-16 2001-12-18 Intel Corporation Branch prediction architecture
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP3110404B2 (ja) 1998-11-18 2000-11-20 NEC Kofu, Ltd. Microprocessor device, method for accelerating its software instructions, and recording medium storing its control program
US6490673B1 (en) 1998-11-27 2002-12-03 Matsushita Electric Industrial Co., Ltd Processor, compiling apparatus, and compile program recorded on a recording medium
US6519682B2 (en) 1998-12-04 2003-02-11 Stmicroelectronics, Inc. Pipelined non-blocking level two cache system with inherent transaction collision-avoidance
US6477562B2 (en) 1998-12-16 2002-11-05 Clearwater Networks, Inc. Prioritized instruction scheduling for multi-streaming processors
US7020879B1 (en) 1998-12-16 2006-03-28 Mips Technologies, Inc. Interrupt and exception handling for multi-streaming digital processors
US6247097B1 (en) 1999-01-22 2001-06-12 International Business Machines Corporation Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions
US6321298B1 (en) 1999-01-25 2001-11-20 International Business Machines Corporation Full cache coherency across multiple raid controllers
JP3842474B2 (ja) 1999-02-02 2006-11-08 Renesas Technology Corp. Data processing device
US6327650B1 (en) * 1999-02-12 2001-12-04 VLSI Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6732220B2 (en) 1999-02-17 2004-05-04 Elbrus International Method for emulating hardware features of a foreign architecture in a host operating system environment
US6668316B1 (en) 1999-02-17 2003-12-23 Elbrus International Limited Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6437789B1 (en) 1999-02-19 2002-08-20 Evans & Sutherland Computer Corporation Multi-level cache controller
US6850531B1 (en) 1999-02-23 2005-02-01 Alcatel Multi-service network switch
US6212613B1 (en) 1999-03-22 2001-04-03 Cisco Technology, Inc. Methods and apparatus for reusing addresses in a computer
US6529928B1 (en) 1999-03-23 2003-03-04 Silicon Graphics, Inc. Floating-point adder performing floating-point and integer operations
DE69938621D1 (de) 1999-05-03 2008-06-12 St Microelectronics Sa Instruction issue in a computer
US6449671B1 (en) 1999-06-09 2002-09-10 Ati International Srl Method and apparatus for busing data elements
US6473833B1 (en) * 1999-07-30 2002-10-29 International Business Machines Corporation Integrated cache and directory structure for multi-level caches
US6643770B1 (en) 1999-09-16 2003-11-04 Intel Corporation Branch misprediction recovery using a side memory
US6772325B1 (en) 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6704822B1 (en) 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US6457120B1 (en) 1999-11-01 2002-09-24 International Business Machines Corporation Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions
US7441110B1 (en) 1999-12-10 2008-10-21 International Business Machines Corporation Prefetching using future branch path information derived from branch prediction
US7107434B2 (en) 1999-12-20 2006-09-12 Board Of Regents, The University Of Texas System, method and apparatus for allocating hardware resources using pseudorandom sequences
JP4693326B2 (ja) 1999-12-22 2011-06-01 Ubicom, Inc. System and method for instruction-level multithreading in an embedded processor using a zero-time context switch
US6557095B1 (en) 1999-12-27 2003-04-29 Intel Corporation Scheduling operations using a dependency matrix
EP1244962B1 (en) 2000-01-03 2003-10-08 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6594755B1 (en) 2000-01-04 2003-07-15 National Semiconductor Corporation System and method for interleaved execution of multiple independent threads
US6728872B1 (en) 2000-02-04 2004-04-27 International Business Machines Corporation Method and apparatus for verifying that instructions are pipelined in correct architectural sequence
GB0002848D0 (en) * 2000-02-08 2000-03-29 Siroyan Limited Communicating instruction results in processors and compiling methods for processors
GB2365661A (en) 2000-03-10 2002-02-20 British Telecomm Allocating switch requests within a packet switch
US6615340B1 (en) * 2000-03-22 2003-09-02 Wilmot, Ii Richard Byron Extended operand management indicator structure and method
US6604187B1 (en) 2000-06-19 2003-08-05 Advanced Micro Devices, Inc. Providing global translations with address space numbers
US6557083B1 (en) 2000-06-30 2003-04-29 Intel Corporation Memory system for multiple data types
US6704860B1 (en) 2000-07-26 2004-03-09 International Business Machines Corporation Data processing system and method for fetching instruction blocks in response to a detected block sequence
US7206925B1 (en) * 2000-08-18 2007-04-17 Sun Microsystems, Inc. Backing Register File for processors
US6728866B1 (en) 2000-08-31 2004-04-27 International Business Machines Corporation Partitioned issue queue and allocation strategy
US6721874B1 (en) * 2000-10-12 2004-04-13 International Business Machines Corporation Method and system for dynamically shared completion table supporting multiple threads in a processing system
US6639866B2 (en) * 2000-11-03 2003-10-28 Broadcom Corporation Very small swing high performance asynchronous CMOS static memory (multi-port register file) with power reducing column multiplexing scheme
US7757065B1 (en) 2000-11-09 2010-07-13 Intel Corporation Instruction segment recording scheme
JP2002185513A (ja) 2000-12-18 2002-06-28 Hitachi Ltd Packet communication network and packet transfer control method
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US6907600B2 (en) 2000-12-27 2005-06-14 Intel Corporation Virtual translation lookaside buffer
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
FR2820921A1 (fr) 2001-02-14 2002-08-16 Canon Kk Device and method for transmission in a switch
US6985951B2 (en) 2001-03-08 2006-01-10 International Business Machines Corporation Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment
US6950927B1 (en) * 2001-04-13 2005-09-27 The United States Of America As Represented By The Secretary Of The Navy System and method for instruction-level parallelism in a programmable multiple network processor environment
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US7707397B2 (en) 2001-05-04 2010-04-27 Via Technologies, Inc. Variable group associativity branch target address cache delivering multiple target addresses per cache line
US6658549B2 (en) 2001-05-22 2003-12-02 Hewlett-Packard Development Company, Lp. Method and system allowing a single entity to manage memory comprising compressed and uncompressed data
US6985591B2 (en) 2001-06-29 2006-01-10 Intel Corporation Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media
US7203824B2 (en) 2001-07-03 2007-04-10 Ip-First, Llc Apparatus and method for handling BTAC branches that wrap across instruction cache lines
US7024545B1 (en) 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US6954846B2 (en) 2001-08-07 2005-10-11 Sun Microsystems, Inc. Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode
KR20030018999A (ko) 2001-08-31 2003-03-06 LG Electronics Inc. Apparatus and method for determining stuff time for generating virtual container pointers in a synchronous digital hierarchy
US6718440B2 (en) 2001-09-28 2004-04-06 Intel Corporation Memory access latency hiding with hint buffer
US7150021B1 (en) 2001-10-12 2006-12-12 Palau Acquisition Corporation (Delaware) Method and system to allocate resources within an interconnect device according to a resource allocation table
US7117347B2 (en) 2001-10-23 2006-10-03 Ip-First, Llc Processor including fallback branch prediction mechanism for far jump and far call instructions
US7272832B2 (en) * 2001-10-25 2007-09-18 Hewlett-Packard Development Company, L.P. Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform
US6964043B2 (en) 2001-10-30 2005-11-08 Intel Corporation Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code
GB2381886B (en) 2001-11-07 2004-06-23 Sun Microsystems Inc Computer system with virtual memory and paging mechanism
US7092869B2 (en) 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US6640333B2 (en) 2002-01-10 2003-10-28 Lsi Logic Corporation Architecture for a sea of platforms
US7055021B2 (en) 2002-02-05 2006-05-30 Sun Microsystems, Inc. Out-of-order processor that reduces mis-speculation using a replay scoreboard
US7331040B2 (en) 2002-02-06 2008-02-12 Transitive Limited Condition code flag emulation for program code conversion
US20030154363A1 (en) * 2002-02-11 2003-08-14 Soltis Donald C. Stacked register aliasing in data hazard detection to reduce circuit
US6839816B2 (en) 2002-02-26 2005-01-04 International Business Machines Corporation Shared cache line update mechanism
US6731292B2 (en) 2002-03-06 2004-05-04 Sun Microsystems, Inc. System and method for controlling a number of outstanding data transactions within an integrated circuit
JP3719509B2 (ja) 2002-04-01 2005-11-24 Sony Computer Entertainment Inc. Serial operation pipeline, arithmetic device, arithmetic logic circuit, and operation method using a serial operation pipeline
US7565509B2 (en) 2002-04-17 2009-07-21 Microsoft Corporation Using limits on address translation to control access to an addressable entity
US6920530B2 (en) 2002-04-23 2005-07-19 Sun Microsystems, Inc. Scheme for reordering instructions via an instruction caching mechanism
US7113488B2 (en) 2002-04-24 2006-09-26 International Business Machines Corporation Reconfigurable circular bus
US7281055B2 (en) 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US7117346B2 (en) * 2002-05-31 2006-10-03 Freescale Semiconductor, Inc. Data processing system having multiple register contexts and method therefor
US6938151B2 (en) 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US6735747B2 (en) 2002-06-10 2004-05-11 Lsi Logic Corporation Pre-silicon verification path coverage
US8024735B2 (en) 2002-06-14 2011-09-20 Intel Corporation Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution
JP3845043B2 (ja) 2002-06-28 2006-11-15 Fujitsu Limited Instruction fetch control device
JP3982353B2 (ja) 2002-07-12 2007-09-26 NEC Corporation Fault-tolerant computer device, resynchronization method therefor, and resynchronization program
US6944744B2 (en) 2002-08-27 2005-09-13 Advanced Micro Devices, Inc. Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor
US7546422B2 (en) 2002-08-28 2009-06-09 Intel Corporation Method and apparatus for the synchronization of distributed caches
US6950925B1 (en) 2002-08-28 2005-09-27 Advanced Micro Devices, Inc. Scheduler for use in a microprocessor that supports data-speculative execution
TW200408242A (en) 2002-09-06 2004-05-16 Matsushita Electric Ind Co Ltd Home terminal apparatus and communication system
US6895491B2 (en) 2002-09-26 2005-05-17 Hewlett-Packard Development Company, L.P. Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching
US7334086B2 (en) 2002-10-08 2008-02-19 Rmi Corporation Advanced processor with system on a chip interconnect technology
US6829698B2 (en) 2002-10-10 2004-12-07 International Business Machines Corporation Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction
US7213248B2 (en) 2002-10-10 2007-05-01 International Business Machines Corporation High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US7222218B2 (en) 2002-10-22 2007-05-22 Sun Microsystems, Inc. System and method for goal-based scheduling of blocks of code for concurrent execution
US20040103251A1 (en) 2002-11-26 2004-05-27 Mitchell Alsup Microprocessor including a first level cache and a second level cache having different cache line sizes
WO2004051449A2 (en) * 2002-12-04 2004-06-17 Koninklijke Philips Electronics N.V. Register file gating to reduce microprocessor power dissipation
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US7073042B2 (en) 2002-12-12 2006-07-04 Intel Corporation Reclaiming existing fields in address translation data structures to extend control over memory accesses
US20040117594A1 (en) 2002-12-13 2004-06-17 Vanderspek Julius Memory management method
US20040122887A1 (en) 2002-12-20 2004-06-24 Macy William W. Efficient multiplication of small matrices using SIMD registers
US7191349B2 (en) 2002-12-26 2007-03-13 Intel Corporation Mechanism for processor power state aware distribution of lowest priority interrupt
US20040139441A1 (en) * 2003-01-09 2004-07-15 Kabushiki Kaisha Toshiba Processor, arithmetic operation processing method, and priority determination method
US6925421B2 (en) 2003-01-09 2005-08-02 International Business Machines Corporation Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources
US7178010B2 (en) 2003-01-16 2007-02-13 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
US7089374B2 (en) 2003-02-13 2006-08-08 Sun Microsystems, Inc. Selectively unmarking load-marked cache lines during transactional program execution
US7278030B1 (en) 2003-03-03 2007-10-02 Vmware, Inc. Virtualization system for computers having multiple protection mechanisms
US6912644B1 (en) 2003-03-06 2005-06-28 Intel Corporation Method and apparatus to steer memory access operations in a virtual memory system
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7143273B2 (en) 2003-03-31 2006-11-28 Intel Corporation Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history
CN1214666C (zh) 2003-04-07 2005-08-10 Huawei Technologies Co., Ltd. Method for limiting location information request traffic in location services
US7058764B2 (en) 2003-04-14 2006-06-06 Hewlett-Packard Development Company, L.P. Method of adaptive cache partitioning to increase host I/O performance
US7469407B2 (en) 2003-04-24 2008-12-23 International Business Machines Corporation Method for resource balancing using dispatch flush in a simultaneous multithread processor
US7290261B2 (en) 2003-04-24 2007-10-30 International Business Machines Corporation Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
EP1471421A1 (en) 2003-04-24 2004-10-27 STMicroelectronics Limited Speculative load instruction control
US7139855B2 (en) 2003-04-24 2006-11-21 International Business Machines Corporation High performance synchronization of resource allocation in a logically-partitioned system
US7055003B2 (en) 2003-04-25 2006-05-30 International Business Machines Corporation Data cache scrub mechanism for large L2/L3 data cache structures
US7007108B2 (en) 2003-04-30 2006-02-28 Lsi Logic Corporation System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address
US7743238B2 (en) * 2003-05-09 2010-06-22 Arm Limited Accessing items of architectural state from a register cache in a data processing apparatus when performing branch prediction operations for an indirect branch instruction
WO2004114128A2 (en) 2003-06-25 2004-12-29 Koninklijke Philips Electronics N.V. Instruction controlled data processing device
JP2005032018A (ja) 2003-07-04 2005-02-03 Semiconductor Energy Lab Co Ltd Microprocessor using a genetic algorithm
US7149872B2 (en) 2003-07-10 2006-12-12 Transmeta Corporation System and method for identifying TLB entries associated with a physical address of a specified range
WO2006082091A2 (en) 2005-02-07 2006-08-10 Pact Xpp Technologies Ag Low latency massive parallel data processing device
US20050021457A1 (en) * 2003-07-25 2005-01-27 Johnson A. Wayne Financial account up-front incentives management system and method
US7089398B2 (en) 2003-07-31 2006-08-08 Silicon Graphics, Inc. Address translation using a page size tag
US8296771B2 (en) 2003-08-18 2012-10-23 Cray Inc. System and method for mapping between resource consumers and resource providers in a computing system
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
US7594089B2 (en) * 2003-08-28 2009-09-22 Mips Technologies, Inc. Smart memory based synchronization controller for a multi-threaded multiprocessor SoC
US7849297B2 (en) * 2003-08-28 2010-12-07 Mips Technologies, Inc. Software emulation of directed exceptions in a multithreading processor
US9032404B2 (en) 2003-08-28 2015-05-12 Mips Technologies, Inc. Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor
EP1660993B1 (en) 2003-08-28 2008-11-19 MIPS Technologies, Inc. Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
US7111126B2 (en) 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
JP4057989B2 (ja) 2003-09-26 2008-03-05 Toshiba Corporation Scheduling method and information processing system
US7047322B1 (en) 2003-09-30 2006-05-16 Unisys Corporation System and method for performing conflict resolution and flow control in a multiprocessor system
US7373637B2 (en) 2003-09-30 2008-05-13 International Business Machines Corporation Method and apparatus for counting instruction and memory location ranges
FR2860313B1 (fr) 2003-09-30 2005-11-04 Commissariat Energie Atomique Component with dynamically reconfigurable architecture
TWI281121B (en) 2003-10-06 2007-05-11 Ip First Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US7395372B2 (en) 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
US7243170B2 (en) 2003-11-24 2007-07-10 International Business Machines Corporation Method and circuit for reading and writing an instruction buffer
US20050120191A1 (en) * 2003-12-02 2005-06-02 Intel Corporation (A Delaware Corporation) Checkpoint-based register reclamation
US20050132145A1 (en) 2003-12-15 2005-06-16 Finisar Corporation Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access
US7310722B2 (en) 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7293164B2 (en) 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US20050204118A1 (en) * 2004-02-27 2005-09-15 National Chiao Tung University Method for inter-cluster communication that employs register permutation
US7478374B2 (en) * 2004-03-22 2009-01-13 Intel Corporation Debug system having assembler correcting register allocation errors
US20050216920A1 (en) 2004-03-24 2005-09-29 Vijay Tewari Use of a virtual machine to emulate a hardware device
KR100877138B1 (ko) 2004-03-29 2009-01-09 National University Corporation Kyoto University Data processing device, data processing program, and recording medium storing the data processing program
US7383427B2 (en) 2004-04-22 2008-06-03 Sony Computer Entertainment Inc. Multi-scalar extension for SIMD instruction set processors
US20050251649A1 (en) 2004-04-23 2005-11-10 Sony Computer Entertainment Inc. Methods and apparatus for address map optimization on a multi-scalar extension
US7418582B1 (en) 2004-05-13 2008-08-26 Sun Microsystems, Inc. Versatile register file design for a multi-threaded processor utilizing different modes and register windows
US7478198B2 (en) 2004-05-24 2009-01-13 Intel Corporation Multithreaded clustered microarchitecture with dynamic back-end assignment
US7594234B1 (en) 2004-06-04 2009-09-22 Sun Microsystems, Inc. Adaptive spin-then-block mutual exclusion in multi-threaded processing
US7284092B2 (en) * 2004-06-24 2007-10-16 International Business Machines Corporation Digital data processing apparatus having multi-level register file
US20050289530A1 (en) 2004-06-29 2005-12-29 Robison Arch D Scheduling of instructions in program compilation
EP1628235A1 (en) 2004-07-01 2006-02-22 Texas Instruments Incorporated Method and system of ensuring integrity of a secure mode entry sequence
US8044951B1 (en) 2004-07-02 2011-10-25 Nvidia Corporation Integer-based functionality in a graphics shading language
US7339592B2 (en) 2004-07-13 2008-03-04 Nvidia Corporation Simulating multiported memories using lower port count memories
US7398347B1 (en) 2004-07-14 2008-07-08 Altera Corporation Methods and apparatus for dynamic instruction controlled reconfigurable register file
EP1619593A1 (en) * 2004-07-22 2006-01-25 Sap Ag Computer-Implemented method and system for performing a product availability check
JP4064380B2 (ja) 2004-07-29 2008-03-19 Fujitsu Limited Arithmetic processing device and control method therefor
US8443171B2 (en) 2004-07-30 2013-05-14 Hewlett-Packard Development Company, L.P. Run-time updating of prediction hint instructions
US7213106B1 (en) 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US7318143B2 (en) 2004-10-20 2008-01-08 Arm Limited Reuseable configuration data
US20090150890A1 (en) 2007-12-10 2009-06-11 Yourst Matt T Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system
US7707578B1 (en) 2004-12-16 2010-04-27 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system
US7257695B2 (en) 2004-12-28 2007-08-14 Intel Corporation Register file regions for a processing system
US7996644B2 (en) 2004-12-29 2011-08-09 Intel Corporation Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US7050922B1 (en) 2005-01-14 2006-05-23 Agilent Technologies, Inc. Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same
US20060179277A1 (en) * 2005-02-04 2006-08-10 Flachs Brian K System and method for instruction line buffer holding a branch target buffer
US7657891B2 (en) 2005-02-04 2010-02-02 Mips Technologies, Inc. Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency
US7681014B2 (en) 2005-02-04 2010-03-16 Mips Technologies, Inc. Multithreading instruction scheduler employing thread group priorities
US7400548B2 (en) 2005-02-09 2008-07-15 International Business Machines Corporation Method for providing multiple reads/writes using a 2read/2write register file array
US7343476B2 (en) * 2005-02-10 2008-03-11 International Business Machines Corporation Intelligent SMT thread hang detect taking into account shared resource contention/blocking
US7152155B2 (en) 2005-02-18 2006-12-19 Qualcomm Incorporated System and method of correcting a branch misprediction
US20060200655A1 (en) 2005-03-04 2006-09-07 Smith Rodney W Forward looking branch target address caching
US20060212853A1 (en) 2005-03-18 2006-09-21 Marvell World Trade Ltd. Real-time control apparatus having a multi-thread processor
US8195922B2 (en) 2005-03-18 2012-06-05 Marvell World Trade, Ltd. System for dynamically allocating processing time to multiple threads
GB2424727B (en) 2005-03-30 2007-08-01 Transitive Ltd Preparing instruction groups for a processor having a multiple issue ports
US8522253B1 (en) 2005-03-31 2013-08-27 Guillermo Rozas Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US7313775B2 (en) 2005-04-06 2007-12-25 Lsi Corporation Integrated circuit with relocatable processor hardmac
US8230423B2 (en) 2005-04-07 2012-07-24 International Business Machines Corporation Multithreaded processor architecture with operational latency hiding
US20060230409A1 (en) 2005-04-07 2006-10-12 Matteo Frigo Multithreaded processor architecture with implicit granularity adaptation
US20060230253A1 (en) * 2005-04-11 2006-10-12 Lucian Codrescu Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment
US20060236074A1 (en) 2005-04-14 2006-10-19 Arm Limited Indicating storage locations within caches
US7437543B2 (en) 2005-04-19 2008-10-14 International Business Machines Corporation Reducing the fetch time of target instructions of a predicted taken branch instruction
US7461237B2 (en) 2005-04-20 2008-12-02 Sun Microsystems, Inc. Method and apparatus for suppressing duplicative prefetches for branch target cache lines
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
GB2426084A (en) 2005-05-13 2006-11-15 Agilent Technologies Inc Updating data in a dual port memory
US7861055B2 (en) 2005-06-07 2010-12-28 Broadcom Corporation Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
WO2007027671A2 (en) 2005-08-29 2007-03-08 Searete Llc Scheduling mechanism of a hierarchical processor including multiple parallel clusters
CN101263465B (zh) 2005-09-14 2011-11-09 Koninklijke Philips Electronics N.V. Method and system for bus arbitration
US7350056B2 (en) 2005-09-27 2008-03-25 International Business Machines Corporation Method and apparatus for issuing instructions from an issue queue in an information handling system
US7676634B1 (en) 2005-09-28 2010-03-09 Sun Microsystems, Inc. Selective trace cache invalidation for self-modifying code via memory aging
US7231106B2 (en) 2005-09-30 2007-06-12 Lucent Technologies Inc. Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host
US7627735B2 (en) * 2005-10-21 2009-12-01 Intel Corporation Implementing vector memory operations
US7613131B2 (en) 2005-11-10 2009-11-03 Citrix Systems, Inc. Overlay network infrastructure
US7681019B1 (en) 2005-11-18 2010-03-16 Sun Microsystems, Inc. Executing functions determined via a collection of operations from translated instructions
US7861060B1 (en) * 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7770161B2 (en) 2005-12-28 2010-08-03 International Business Machines Corporation Post-register allocation profile directed instruction scheduling
US8423682B2 (en) 2005-12-30 2013-04-16 Intel Corporation Address space emulation
US20070186050A1 (en) 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for data lines
GB2435362B (en) 2006-02-20 2008-11-26 Cramer Systems Ltd Method of configuring devices in a telecommunications network
JP4332205B2 (ja) 2006-02-27 2009-09-16 Fujitsu Limited Cache control device and cache control method
US7543282B2 (en) 2006-03-24 2009-06-02 Sun Microsystems, Inc. Method and apparatus for selectively executing different executable code versions which are optimized in different ways
EP2011018B1 (en) 2006-04-12 2016-07-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US7577820B1 (en) 2006-04-14 2009-08-18 Tilera Corporation Managing data in a parallel processing environment
US7610571B2 (en) 2006-04-14 2009-10-27 Cadence Design Systems, Inc. Method and system for simulating state retention of an RTL design
CN100485636C (zh) 2006-04-24 2009-05-06 Huawei Technologies Co., Ltd. Debugging method and apparatus for model-driven development of carrier-grade services
US7804076B2 (en) 2006-05-10 2010-09-28 Taiwan Semiconductor Manufacturing Co., Ltd Insulator for high current ion implanters
US8145882B1 (en) 2006-05-25 2012-03-27 Mips Technologies, Inc. Apparatus and method for processing template based user defined instructions
US20080126771A1 (en) 2006-07-25 2008-05-29 Lei Chen Branch Target Extension for an Instruction Cache
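CN100495324C (zh) 2006-07-27 2009-06-03 Institute of Computing Technology, Chinese Academy of Sciences Depth-first exception handling method in a complex instruction set architecture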
US7904704B2 (en) 2006-08-14 2011-03-08 Marvell World Trade Ltd. Instruction dispatching method and apparatus
US8046775B2 (en) 2006-08-14 2011-10-25 Marvell World Trade Ltd. Event-based bandwidth allocation mode switching method and apparatus
US7539842B2 (en) 2006-08-15 2009-05-26 International Business Machines Corporation Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US7594060B2 (en) 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
US7752474B2 (en) 2006-09-22 2010-07-06 Apple Inc. L1 cache flush when processor is entering low power mode
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US7774549B2 (en) 2006-10-11 2010-08-10 Mips Technologies, Inc. Horizontally-shared cache victims in multiple core processors
TWI337495B (en) 2006-10-26 2011-02-11 Au Optronics Corp System and method for operation scheduling
US7680988B1 (en) 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
US8108625B1 (en) * 2006-10-30 2012-01-31 Nvidia Corporation Shared memory with parallel access and access conflict resolution mechanism
US7617384B1 (en) 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi-threaded architecture supporting various context switch modes and virtualization schemes
US7493475B2 (en) 2006-11-15 2009-02-17 Stmicroelectronics, Inc. Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address
US7934179B2 (en) 2006-11-20 2011-04-26 Et International, Inc. Systems and methods for logic verification
US20080235500A1 (en) 2006-11-21 2008-09-25 Davis Gordon T Structure for instruction cache trace formation
JP2008130056A (ja) 2006-11-27 2008-06-05 Renesas Technology Corp Semiconductor circuit
US7783869B2 (en) 2006-12-19 2010-08-24 Arm Limited Accessing branch predictions ahead of instruction fetching
WO2008077088A2 (en) 2006-12-19 2008-06-26 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations System and method for branch misprediction prediction using complementary branch predictors
EP1940028B1 (en) 2006-12-29 2012-02-29 STMicroelectronics Srl Asynchronous interconnection system for 3D inter-chip communication
US8321849B2 (en) 2007-01-26 2012-11-27 Nvidia Corporation Virtual architecture and instruction set for parallel thread computing
TW200833002A (en) 2007-01-31 2008-08-01 Univ Nat Yunlin Sci & Tech Distributed switching circuit having fairness
US20080189501A1 (en) 2007-02-05 2008-08-07 Irish John D Methods and Apparatus for Issuing Commands on a Bus
US7685410B2 (en) 2007-02-13 2010-03-23 Global Foundries Inc. Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US7647483B2 (en) 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
JP4980751B2 (ja) 2007-03-02 2012-07-18 Fujitsu Semiconductor Limited Data processing device and memory read-active control method
US8452907B2 (en) 2007-03-27 2013-05-28 Arm Limited Data processing apparatus and method for arbitrating access to a shared resource
US20080250227A1 (en) 2007-04-04 2008-10-09 Linderman Michael D General Purpose Multiprocessor Programming Apparatus And Method
US7716183B2 (en) 2007-04-11 2010-05-11 Dot Hill Systems Corporation Snapshot preserved data cloning
US7941791B2 (en) 2007-04-13 2011-05-10 Perry Wang Programming environment for heterogeneous processor resource integration
US7769955B2 (en) 2007-04-27 2010-08-03 Arm Limited Multiple thread instruction fetch from different cache levels
US7711935B2 (en) 2007-04-30 2010-05-04 Netlogic Microsystems, Inc. Universal branch identifier for invalidation of speculative instructions
US8555039B2 (en) 2007-05-03 2013-10-08 Qualcomm Incorporated System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor
US8219996B1 (en) 2007-05-09 2012-07-10 Hewlett-Packard Development Company, L.P. Computer processor with fairness monitor
US9495290B2 (en) 2007-06-25 2016-11-15 Sonics, Inc. Various methods and apparatus to support outstanding requests to multiple targets while maintaining transaction ordering
CN101344840B (zh) 2007-07-10 2011-08-31 苏州简约纳电子有限公司 Microprocessor and method for executing instructions in a microprocessor
US7937568B2 (en) 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090025004A1 (en) 2007-07-16 2009-01-22 Microsoft Corporation Scheduling by Growing and Shrinking Resource Allocation
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7711929B2 (en) 2007-08-30 2010-05-04 International Business Machines Corporation Method and system for tracking instruction dependency in an out-of-order processor
US8725991B2 (en) 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US8082420B2 (en) 2007-10-24 2011-12-20 International Business Machines Corporation Method and apparatus for executing instructions
US7856530B1 (en) 2007-10-31 2010-12-21 Network Appliance, Inc. System and method for implementing a dynamic cache for a data storage system
US7877559B2 (en) 2007-11-26 2011-01-25 Globalfoundries Inc. Mechanism to accelerate removal of store operations from a queue
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7809925B2 (en) 2007-12-07 2010-10-05 International Business Machines Corporation Processing unit incorporating vectorizable execution unit
US8145844B2 (en) 2007-12-13 2012-03-27 Arm Limited Memory controller with write data cache and read data cache
US7831813B2 (en) 2007-12-17 2010-11-09 Globalfoundries Inc. Uses of known good code for implementing processor architectural modifications
US7870371B2 (en) 2007-12-17 2011-01-11 Microsoft Corporation Target-frequency based indirect jump prediction for high-performance processors
US20090165007A1 (en) 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US8782384B2 (en) 2007-12-20 2014-07-15 Advanced Micro Devices, Inc. Branch history with polymorphic indirect branch information
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
US8645965B2 (en) 2007-12-31 2014-02-04 Intel Corporation Supporting metered clients with manycore through time-limited partitioning
US9244855B2 (en) 2007-12-31 2016-01-26 Intel Corporation Method, system, and apparatus for page sizing extension
US7877582B2 (en) 2008-01-31 2011-01-25 International Business Machines Corporation Multi-addressable register file
WO2009101563A1 (en) 2008-02-11 2009-08-20 Nxp B.V. Multiprocessing implementing a plurality of virtual processors
US9021240B2 (en) 2008-02-22 2015-04-28 International Business Machines Corporation System and method for Controlling restarting of instruction fetching using speculative address computations
US7949972B2 (en) 2008-03-19 2011-05-24 International Business Machines Corporation Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis
US7987343B2 (en) 2008-03-19 2011-07-26 International Business Machines Corporation Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8120608B2 (en) 2008-04-04 2012-02-21 Via Technologies, Inc. Constant buffering for a computational core of a programmable graphics processing unit
TWI364703B (en) 2008-05-26 2012-05-21 Faraday Tech Corp Processor and early execution method of data load thereof
US8131982B2 (en) 2008-06-13 2012-03-06 International Business Machines Corporation Branch prediction instructions having mask values involving unloading and loading branch history data
US8145880B1 (en) 2008-07-07 2012-03-27 Ovics Matrix processor data switch routing systems and methods
EP2297647A4 (en) 2008-07-10 2012-12-12 Rocketick Technologies Ltd Efficient parallel calculation of dependence problems
JP2010039536A (ja) 2008-07-31 2010-02-18 Panasonic Corp Program conversion device, program conversion method, and program conversion program
US8316435B1 (en) 2008-08-14 2012-11-20 Juniper Networks, Inc. Routing device having integrated MPLS-aware firewall with virtual security system support
US8135942B2 (en) 2008-08-28 2012-03-13 International Business Machines Corporation System and method for double-issue instructions using a dependency matrix and a side issue queue
US7769984B2 (en) 2008-09-11 2010-08-03 International Business Machines Corporation Dual-issuance of microprocessor instructions using dual dependency matrices
US8225048B2 (en) 2008-10-01 2012-07-17 Hewlett-Packard Development Company, L.P. Systems and methods for resource access
US9244732B2 (en) 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
US7941616B2 (en) 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
US8423749B2 (en) 2008-10-22 2013-04-16 International Business Machines Corporation Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node
GB2464703A (en) 2008-10-22 2010-04-28 Advanced Risc Mach Ltd An array of interconnected processors executing a cycle-based program
RU2011120258A (ru) 2008-10-30 2012-12-10 Nokia Corporation Method and apparatus for interleaving a data block
US8032678B2 (en) 2008-11-05 2011-10-04 Mediatek Inc. Shared resource arbitration
US7848129B1 (en) 2008-11-20 2010-12-07 Netlogic Microsystems, Inc. Dynamically partitioned CAM array
US8868838B1 (en) 2008-11-21 2014-10-21 Nvidia Corporation Multi-class data cache policies
US8171223B2 (en) 2008-12-03 2012-05-01 Intel Corporation Method and system to increase concurrency and control replication in a multi-core cache hierarchy
US8200949B1 (en) 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US8312268B2 (en) 2008-12-12 2012-11-13 International Business Machines Corporation Virtual machine
US8099586B2 (en) 2008-12-30 2012-01-17 Oracle America, Inc. Branch misprediction recovery mechanism for microprocessors
US20100169578A1 (en) 2008-12-31 2010-07-01 Texas Instruments Incorporated Cache tag memory
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
JP5417879B2 (ja) 2009-02-17 2014-02-19 Fujitsu Semiconductor Limited Cache device
US8505013B2 (en) 2010-03-12 2013-08-06 Lsi Corporation Reducing data read latency in a network communications processor architecture
US8805788B2 (en) 2009-05-04 2014-08-12 Moka5, Inc. Transactional virtual disk with differential snapshots
US8332854B2 (en) 2009-05-19 2012-12-11 Microsoft Corporation Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US8533437B2 (en) 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
GB2471067B (en) 2009-06-12 2011-11-30 Graeme Roy Smith Shared resource multi-thread array processor
US9122487B2 (en) 2009-06-23 2015-09-01 Oracle America, Inc. System and method for balancing instruction loads between multiple execution units using assignment history
US8386754B2 (en) 2009-06-24 2013-02-26 Arm Limited Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism
CN101582025B (zh) 2009-06-25 2011-05-25 Zhejiang University Method for implementing a global register renaming table in a chip multiprocessor architecture
US8397049B2 (en) 2009-07-13 2013-03-12 Apple Inc. TLB prefetching
US8539486B2 (en) 2009-07-17 2013-09-17 International Business Machines Corporation Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode
JP5423217B2 (ja) 2009-08-04 2014-02-19 Fujitsu Limited Arithmetic processing device, information processing device, and control method for arithmetic processing device
US8127078B2 (en) 2009-10-02 2012-02-28 International Business Machines Corporation High performance unaligned cache access
US20110082983A1 (en) 2009-10-06 2011-04-07 Alcatel-Lucent Canada, Inc. Cpu instruction and data cache corruption prevention system
US8695002B2 (en) 2009-10-20 2014-04-08 Lantiq Deutschland Gmbh Multi-threaded processors and multi-processor systems comprising shared resources
US8364933B2 (en) 2009-12-18 2013-01-29 International Business Machines Corporation Software assisted translation lookaside buffer search mechanism
JP2011150397A (ja) 2010-01-19 2011-08-04 Panasonic Corp Bus arbitration device
KR101699910B1 (ko) 2010-03-04 2017-01-26 Samsung Electronics Co., Ltd. Reconfigurable processor and control method thereof
US20120005462A1 (en) 2010-07-01 2012-01-05 International Business Machines Corporation Hardware Assist for Optimizing Code During Processing
US8312258B2 (en) 2010-07-22 2012-11-13 Intel Corporation Providing platform independent memory logic
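CN101916180B (zh) 2010-08-11 2013-05-29 Institute of Computing Technology, Chinese Academy of Sciences Method and system for executing register-type instructions in a RISC processor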
US8751745B2 (en) 2010-08-11 2014-06-10 Advanced Micro Devices, Inc. Method for concurrent flush of L1 and L2 caches
US9201801B2 (en) 2010-09-15 2015-12-01 International Business Machines Corporation Computing device with asynchronous auxiliary execution unit
US8856460B2 (en) 2010-09-15 2014-10-07 Oracle International Corporation System and method for zero buffer copying in a middleware environment
WO2012037491A2 (en) 2010-09-17 2012-03-22 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US20120079212A1 (en) 2010-09-23 2012-03-29 International Business Machines Corporation Architecture for sharing caches among multiple processes
WO2012051262A2 (en) 2010-10-12 2012-04-19 Soft Machines, Inc. An instruction sequence buffer to enhance branch prediction efficiency
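TWI525541B (zh) 2010-10-12 2016-03-11 Soft Machines, Inc. Method and system for outputting reliably predictable instruction sequences, and microprocessor implementing a method of identifying instructions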
US8370553B2 (en) 2010-10-18 2013-02-05 International Business Machines Corporation Formal verification of random priority-based arbiters using property strengthening and underapproximations
US9047178B2 (en) 2010-12-13 2015-06-02 SanDisk Technologies, Inc. Auto-commit memory synchronization
US8677355B2 (en) 2010-12-17 2014-03-18 Microsoft Corporation Virtual machine branching and parallel execution
WO2012103245A2 (en) 2011-01-27 2012-08-02 Soft Machines Inc. Guest instruction block with near branching and far branching sequence construction to native instruction block
CN108376097B (zh) 2011-03-25 2022-04-15 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
TWI533129B (zh) 2011-03-25 2016-05-11 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US20120254592A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9740494B2 (en) 2011-04-29 2017-08-22 Arizona Board Of Regents For And On Behalf Of Arizona State University Low complexity out-of-order issue logic using static circuits
US8843690B2 (en) 2011-07-11 2014-09-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Memory conflicts learning capability
US8930432B2 (en) 2011-08-04 2015-01-06 International Business Machines Corporation Floating point execution unit with fixed point functionality
US20130046934A1 (en) 2011-08-15 2013-02-21 Robert Nychka System caching using heterogenous memories
US8839025B2 (en) 2011-09-30 2014-09-16 Oracle International Corporation Systems and methods for retiring and unretiring cache lines
WO2013077872A1 (en) 2011-11-22 2013-05-30 Soft Machines, Inc. A microprocessor accelerated code optimizer and dependency reordering method
CN104040491B (zh) 2011-11-22 2018-06-12 Intel Corporation Microprocessor accelerated code optimizer
IN2014CN03678A (zh) 2011-11-22 2015-09-25 Soft Machines Inc
US20130138888A1 (en) 2011-11-30 2013-05-30 Jama I. Barreh Storing a target address of a control transfer instruction in an instruction field
US8930674B2 (en) 2012-03-07 2015-01-06 Soft Machines, Inc. Systems and methods for accessing a unified translation lookaside buffer
KR20130119285A (ko) 2012-04-23 2013-10-31 Electronics and Telecommunications Research Institute Apparatus and method for resource allocation in a cluster computing environment
US9684601B2 (en) 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US9996348B2 (en) 2012-06-14 2018-06-12 Apple Inc. Zero cycle load
US9940247B2 (en) 2012-06-26 2018-04-10 Advanced Micro Devices, Inc. Concurrent access to cache dirty bits
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9430410B2 (en) 2012-07-30 2016-08-30 Soft Machines, Inc. Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10037228B2 (en) 2012-10-25 2018-07-31 Nvidia Corporation Efficient memory virtualization in multi-threaded processing units
US9195506B2 (en) 2012-12-21 2015-11-24 International Business Machines Corporation Processor provisioning by a middleware processing system for a plurality of logical processor partitions
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
WO2014151043A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for emulating a guest centralized flag architecture by using a native distributed flag architecture
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
WO2014151018A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing multithreaded instructions grouped onto blocks
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
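CN105190541A (zh) 2013-03-15 2015-12-23 Soft Machines, Inc. Method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates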
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5829028A (en) * 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
CN1613056A (zh) * 2002-01-03 2005-05-04 Intel Corporation Dependence chain processor
CN1650260A (zh) * 2002-05-01 2005-08-03 Koninklijke Philips Electronics N.V. Memory region based data pre-fetching

Also Published As

Publication number Publication date
CN101627365B (zh) 2017-03-29
US9501280B2 (en) 2016-11-22
CN101627365A (zh) 2010-01-13
US20180293073A1 (en) 2018-10-11
US9965281B2 (en) 2018-05-08
US20170139714A1 (en) 2017-05-18
EP2523101A1 (en) 2012-11-14
US20100161948A1 (en) 2010-06-24
CN107368285A (zh) 2017-11-21
EP2122461A4 (en) 2010-03-24
US10585670B2 (en) 2020-03-10
WO2008061154A2 (en) 2008-05-22
WO2008061154A9 (en) 2008-07-03
EP2527972A3 (en) 2014-08-06
WO2008061154A3 (en) 2008-09-04
EP2523101B1 (en) 2014-06-04
US20140181475A1 (en) 2014-06-26
EP2527972A2 (en) 2012-11-28
EP2122461A2 (en) 2009-11-25
US8677105B2 (en) 2014-03-18

Similar Documents

Publication Publication Date Title
US10585670B2 (en) Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
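KR101355496B1 (ko) Scheduling mechanism of a hierarchical processor including multiple parallel clusters
CN108108188B (zh) Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
JP4578042B2 (ja) Fast multithreading for closely coupled multiprocessors
CN108376097B (zh) Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines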
US10019263B2 (en) Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US10048964B2 (en) Disambiguation-free out of order load store queue
US9965277B2 (en) Virtual load store queue having a dynamic dispatch window with a unified structure
EP2862072B1 (en) A load store buffer agnostic to threads implementing forwarding from different threads based on store seniority
US10592300B2 (en) Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9904552B2 (en) Virtual load store queue having a dynamic dispatch window with a distributed structure
EP2862058B1 (en) A semaphore method and system with out of order loads in a memory consistency model that constitutes loads reading from memory in order
US9990198B2 (en) Instruction definition to implement load store reordering and optimization
US20150095588A1 (en) Lock-based and synch-based method for out of order loads in a memory consistency model using shared memory resources
US20150095591A1 (en) Method and system for filtering the stores to prevent all stores from having to snoop check against all words of a cache
Kavi et al. Execution And Cache Performance Of A Decoupled Non-Blocking Multithreaded Architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant