CN105210040A - Method for executing multithreaded instructions grouped into blocks - Google Patents
Method for executing multithreaded instructions grouped into blocks
- Publication number
- CN105210040A
- Authority
- CN
- China
- Prior art keywords
- block
- instruction
- thread
- register
- carry out
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/3858—Result writeback, i.e. updating the architectural state or memory
- G06F9/30094—Condition code generation, e.g. Carry, Zero flag
- G06F9/3824—Operand accessing
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4887—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5038—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/524—Deadlock detection or avoidance
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
Abstract
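A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction blocks to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in a pipeline.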
Description
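This application claims the benefit of co-pending, commonly assigned U.S. Provisional Patent Application Serial No. 61/800,123, titled "A METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS", filed on March 15, 2013 by Mohammad A. Abdallah, which is incorporated herein by reference in its entirety.
Cross-Reference to Related Applications
This application is related to co-pending, commonly assigned U.S. Patent Application Serial No. 2009/0113170, titled "APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL AND INDEPENDENT OPERATIONS", filed on April 12, 2007 by Mohammad A. Abdallah, which is incorporated herein by reference in its entirety.
This application is related to co-pending, commonly assigned U.S. Patent Application Serial No. 2010/0161948, titled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES", filed on November 14, 2007 by Mohammad A. Abdallah, which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates generally to digital computer systems and, more particularly, to systems and methods for selecting instructions comprising an instruction sequence.
Background
Processors are required to handle multiple tasks that are either dependent on one another or completely independent. The internal state of such processors usually consists of registers that might hold different values at each particular instant of program execution. At each instant of program execution, the internal state image is called the architecture state of the processor.
When code execution is switched to run another function (e.g., another thread, process, or program), the state of the machine/processor has to be saved so that the new function can use the internal registers to build its new state. Once the new function terminates, its state can be discarded, the state of the previous context is restored, and execution resumes. Such a switch process is called a context switch and usually takes tens or hundreds of cycles, especially in modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out-of-order execution.
In thread-aware hardware architectures, it is normal for the hardware to support multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, this still has multiple drawbacks, namely the area, power, and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, a context switch must still be performed.
This becomes common as parallelism is needed on a fine-grained basis requiring a large number of threads. Hardware thread-aware architectures with duplicated context-state hardware storage do not help non-threaded software code and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse-grained parallelism and incur heavy software overhead for initiation and synchronization, leaving fine-grained parallelism, such as function calls and parallel execution of loops, without efficient thread initiation or auto-generation. These overheads are compounded by the difficulty of automatically parallelizing such code with state-of-the-art compiler or user parallelization techniques when the software is not explicitly or easily parallelized/threaded.
Summary
In one embodiment, the present invention is implemented as a method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction blocks to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in a pipeline.
The foregoing is a summary and thus necessarily contains simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.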
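Brief Description of the Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.
Figure 1 shows an overview diagram of grouping instructions into blocks and tracking dependencies between instructions by using a register template.
Figure 2 shows an overview diagram of a register view, a source view, and an instruction view in accordance with one embodiment of the present invention.
Figure 3 shows a diagram illustrating an exemplary register template and how the source view is populated by information from the register template, in accordance with one embodiment of the present invention.
Figure 4 shows a diagram illustrating a first embodiment for dependency broadcasting within the source view. In this embodiment, each column comprises one instruction block.
Figure 5 shows a diagram illustrating a second embodiment for dependency broadcasting within the source view.
Figure 6 shows a diagram illustrating the selection of ready blocks for dispatch starting from the commit pointer and the broadcasting of the corresponding port assignments, in accordance with one embodiment of the present invention.
Figure 7 shows an adder tree structure used to implement the selector array described in Figure 6, in accordance with one embodiment of the present invention.
Figure 8 shows exemplary logic of the selector array adder tree in greater detail.
Figure 9 shows a parallel implementation of the adder tree for implementing the selector array, in accordance with one embodiment of the present invention.
Figure 10 shows an exemplary diagram illustrating how adder X from Figure 9 can be implemented by using carry-save adders, in accordance with one embodiment of the present invention.
Figure 11 shows a masking embodiment for masking ready bits for scheduling starting from the commit pointer and using the selector array adders, in accordance with the present invention.
Figure 12 shows an overview diagram of how register view entries are populated by register templates, in accordance with one embodiment of the present invention.
Figure 13 shows a first embodiment for a reduced register view footprint, in accordance with one embodiment of the present invention.
Figure 14 shows a second embodiment for a reduced register footprint, in accordance with one embodiment of the present invention.
Figure 15 shows an exemplary format for a delta between snapshots, in accordance with one embodiment of the present invention.
Figure 16 shows a diagram of a process for creating register template snapshots upon allocation of blocks of instructions, in accordance with one embodiment of the present invention.
Figure 17 shows another diagram of a process for creating register template snapshots upon allocation of blocks of instructions, in accordance with one embodiment of the present invention.
Figure 18 shows an overview diagram of hardware for a serial implementation of creating a subsequent register template from a previous register template, in accordance with one embodiment of the present invention.
Figure 19 shows an overview diagram of hardware for a parallel implementation of creating a subsequent register template from a previous register template, in accordance with one embodiment of the present invention.
Figure 20 shows an overview diagram of hardware for block-based execution of instructions and how it works with the source view, the instruction view, the register templates, and the register view, in accordance with one embodiment of the present invention.
Figure 21 shows an example of a chunking architecture in accordance with one embodiment of the present invention.
Figure 22 shows a depiction of how threads are allocated in accordance with their block numbers and thread IDs, in accordance with one embodiment of the present invention.
Figure 23 shows an implementation of a scheduler that uses thread pointer maps pointing to physical storage locations to manage multithreaded execution, in accordance with one embodiment of the present invention.
Figure 24 shows another implementation of a scheduler that uses thread-based pointer maps, in accordance with one embodiment of the present invention.
Figure 25 shows a diagram of dynamic calendar-based allocation of execution resources to threads, in accordance with one embodiment of the present invention.
Figure 26 shows a dual dispatch process in accordance with one embodiment of the present invention.
Figure 27 shows a dual dispatch transient multiply-accumulate in accordance with one embodiment of the present invention.
Figure 28 shows a dual dispatch architecturally visible state multiply-add in accordance with one embodiment of the present invention.
Figure 29 shows an overview diagram of the fetching and forming of instruction blocks for execution on grouped execution unit processes, in accordance with one embodiment of the present invention.
Figure 30 shows an exemplary diagram of instruction grouping in accordance with one embodiment of the present invention. In the Figure 30 embodiment, two instructions are shown with a third auxiliary operation.
Figure 31 shows how half-block pairs within a block stack map onto the execution block units, in accordance with one embodiment of the present invention.
Figure 32 shows a diagram depicting intermediate block result storage as a first-level register file, in accordance with one embodiment of the present invention.
Figure 33 shows an odd/even port scheduler in accordance with one embodiment of the present invention.
Figure 34 shows a more detailed version of Figure 33, in which four execution units are shown receiving results from the scheduler array and writing outputs to a temporary register file segment.
Figure 35 shows a diagram depicting guest flag architecture emulation in accordance with one embodiment of the present invention.
Figure 36 shows a diagram illustrating the front end of the machine, the scheduler, the execution units, and a central flag register, in accordance with one embodiment of the present invention.
Figure 37 shows a diagram of the central flag register emulation process as implemented by embodiments of the present invention.
Figure 38 shows a flowchart of the steps of a process 3800 for emulating central flag register behavior in a guest setting.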
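Detailed Description
Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections are set forth. It is to be understood, however, that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail, in order to avoid unnecessarily obscuring this description.
References within the specification to "one embodiment" or "an embodiment" are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase "in one embodiment" in various places within the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Some portions of the detailed descriptions that follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer-readable storage medium and are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing", "accessing", "writing", "storing", or "replicating" refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates data represented as physical (electronic) quantities within the computer system's registers and memories and other computer-readable media and transforms it into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
Figure 1 shows an overview diagram of grouping instructions into blocks and tracking dependencies between instructions by using a register template.
Figure 1 shows an instruction block having a header and a body. The block is created from a group of instructions. The block comprises an entity that encapsulates the group of instructions. In the microprocessor of the present embodiment, the level of abstraction is raised to blocks instead of individual instructions. Blocks, not individual instructions, are processed for dispatch. Each block is labeled with a block number, which simplifies the machine's out-of-order management. One key feature is finding a way to manage a large number of in-flight instructions without greatly increasing the management overhead of the machine.
Embodiments of the present invention achieve this objective by implementing instruction blocks, register templates, and inheritance vectors. In the block shown in Figure 1, the header of the block lists and encapsulates all the sources and destinations of the instructions of the block and where those sources come from (e.g., from which blocks). The header includes the destinations that update the register template. The sources included in the header are concatenated with the block numbers stored in the register template.
The number of instructions that are processed out of order determines the management complexity of the out-of-order machine: more out-of-order instructions lead to greater complexity. In the out-of-order dispatch window of the processor, sources need to be compared against the destinations of prior instructions.
As shown in Figure 1, the register template has a field for each register from R0 to R63. Blocks write their respective block numbers into the register template fields that correspond to the block destinations. Each block reads from the register template the register fields that represent its register sources. When a block retires and writes its destination register contents into the register file, its number is erased from the register template. This means that those registers can then be read as sources from the register file itself.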
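The register template bookkeeping described above can be sketched behaviorally in Python. This is a minimal model under stated assumptions, not the patented hardware: the class name `RegisterTemplate`, the use of `None` for "value is in the register file", and the rule that retirement clears a field only when no younger block has since overwritten it are all illustrative choices.

```python
NUM_REGS = 64  # one field per architectural register, R0..R63


class RegisterTemplate:
    """Hypothetical model: each field holds the number of the latest allocated
    block that writes that register, or None if the value is in the register file."""

    def __init__(self):
        self.fields = [None] * NUM_REGS

    def allocate(self, block_num, sources, destinations):
        # Read the producers of this block's sources before recording its own writes.
        producers = {r: self.fields[r] for r in sources}
        for r in destinations:
            self.fields[r] = block_num
        return producers

    def retire(self, block_num, destinations):
        # Assumption: erase the block number only if no younger block has
        # overwritten the field, so later readers go to the register file.
        for r in destinations:
            if self.fields[r] == block_num:
                self.fields[r] = None
```

For example, if block 10 writes R2 and R3 and block 11 then reads R2 and R5, `allocate` reports block 10 as the producer of R2 and `None` (register file) for R5.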
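In the present embodiment, the register template is updated each machine cycle whenever a block is allocated. As new template updates are generated, prior snapshots of the register template are stored into an array (e.g., the register view shown in Figure 2), one per block. This information is retained until the corresponding block is retired. This allows the machine to recover from mispredictions and flush very quickly (e.g., by obtaining the last known dependency state).
In one embodiment, the register templates stored in the register view can be compressed (thereby saving storage space) by storing only the deltas between successive snapshots (the incremental changes from one snapshot to the next). In this manner, the machine obtains a shrunken register view. Further compression can be obtained by storing templates only for blocks that contain a branch instruction.
If a recovery point other than a branch misprediction is needed, recovery is first obtained at the branch recovery point, and the state can then be rebuilt by allocating instructions (but not executing them) until the machine reaches the sought recovery point.
It should be noted that in one embodiment, the term "register template" as used herein is synonymous with the term "inheritance vector" as described in the earlier filed, commonly assigned patent application "EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES" by Mohammad Abdallah, filed on March 23, 2012, Serial No. 13/428,440, which is incorporated herein by reference in its entirety.
Figure 2 shows an overview diagram of a register view, a source view, and an instruction view in accordance with one embodiment of the present invention. This figure shows one embodiment of a scheduler architecture (e.g., having a source view, an instruction view, a register view, etc.). Other implementations of a scheduler architecture that achieve the same functionality by combining or splitting one or more of the above structures are possible.
Figure 2 diagrams the functional entities supporting the operation of the register templates and the retention of the machine state. The left-hand side of Figure 2 shows register templates T0 through T4, with the arrows indicating the inheritance of information from one register template/inheritance vector to the next. The register view, source view, and instruction view each comprise data structures for storing information that relates to the instruction blocks. Figure 2 also shows an exemplary instruction block having a header and how the instruction block includes both sources and destinations for the registers of the machine. Information about the registers referred to by the blocks is stored in the register view data structure. Information about the sources referred to by the blocks is stored in the source view data structure. Information about the instructions themselves referred to by the blocks is stored in the instruction view data structure. The register templates/inheritance vectors themselves comprise data structures storing the inheritance and dependency information referred to by the blocks.
Figure 3 shows a diagram illustrating an exemplary register template and how the source view is populated by information from the register template, in accordance with one embodiment of the present invention.
In the present embodiment, it should be noted that the goal of the source view is to determine when particular blocks can be dispatched. When a block is dispatched, it broadcasts its block number to all remaining blocks. Any match against the sources of the other blocks (e.g., a compare) causes a ready bit (or some other type of indicator) to be set. When all ready bits are set (e.g., an AND gate), the block is ready to be dispatched. Blocks become ready for dispatch based on the readiness of the other blocks they depend on.
When multiple blocks are ready for dispatch, the oldest block is chosen for dispatch ahead of younger blocks. For example, in one embodiment, a find-first circuit can be used to find the oldest block based on proximity to the commit pointer, and subsequent blocks based on relative proximity to the commit pointer (e.g., operating on each block's ready bit).
Referring still to Figure 3, in this example, the register template snapshot created upon the arrival of block 20 is being examined. As described above, the register template has a field for each register from R0 to R63. Blocks write their respective block numbers into the register template fields that correspond to the block destinations. Each block reads from the register template the register fields that represent its register sources. The first number is the block that wrote to the register, and the second number is the destination number of that block.
For example, when block 20 arrives, it reads the snapshot of the register template and looks up its own register sources in the register template to determine the latest block that wrote to each of its sources, and it populates the source view according to the updates that its destinations make to the previous register template snapshot. Subsequent blocks will update the register template with their own destinations. This is shown in the bottom left corner of Figure 3, where block 20 populates its sources: source 1, source 2, source 3, all the way to source 8.
Figure 4 shows a diagram illustrating a first embodiment for dependency broadcasting within the source view. In this embodiment, each column comprises one instruction block. When a block is allocated, it marks (e.g., by writing 0) the columns of all the blocks on which its sources depend. When any other block is dispatched, its number is broadcast across the exact column that relates to that block. It should be noted that writing 1 is the default value, indicating that there is no dependency on that block.
When all ready bits in a block are ready, that block is dispatched and its number is broadcast back to all the remaining blocks. The block number is compared against all the numbers stored in the sources of the other blocks. If there is a match, the ready bit for that source is set. For example, if the block number broadcast on source 1 equals 11, then the ready bit for source 1 of block 20 will be set.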
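The broadcast-and-compare wakeup just described can be sketched as follows. This is a behavioral illustration only: the class name `SourceView`, the dictionary layout, and the convention that a `None` producer means "already available" are assumptions, and real hardware performs all comparisons in parallel rather than in a loop.

```python
class SourceView:
    """Hypothetical sketch: each allocated block stores, per source, the number
    of the producing block and a ready bit."""

    def __init__(self):
        self.entries = {}  # block number -> list of [producer, ready]

    def allocate(self, block_num, producers):
        # A source with no in-flight producer (None) is ready immediately.
        self.entries[block_num] = [[p, p is None] for p in producers]

    def broadcast(self, dispatched):
        # Compare the broadcast block number against every stored source.
        for sources in self.entries.values():
            for src in sources:
                if src[0] == dispatched:
                    src[1] = True

    def dispatch(self, block_num):
        # A dispatched block leaves the view and wakes up its consumers.
        del self.entries[block_num]
        self.broadcast(block_num)

    def ready_blocks(self):
        # A block is ready when the AND of all of its source ready bits is true.
        return [b for b, srcs in self.entries.items() if all(r for _, r in srcs)]
```

Mirroring the example in the text, if source 1 of block 20 waits on block 11, dispatching block 11 sets that ready bit; block 20 becomes ready only once its remaining sources are also satisfied.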
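Figure 5 shows a diagram illustrating a second embodiment for dependency broadcasting within the source view. This embodiment is organized by sources rather than by blocks. This is shown by the sources S1 through S8 across the source view data structure. In a manner similar to that described for Figure 4 above, in the Figure 5 embodiment, when all ready bits in a block are ready, that block is dispatched and its number is broadcast back to all the remaining blocks. The block number is compared against all the numbers stored in the sources of the other blocks. If there is a match, the ready bit for that source is set. For example, if the block number broadcast on source 1 equals 11, then the ready bit for source 1 of block 20 will be set.
The Figure 5 embodiment also shows how the comparison is enabled only for the blocks between the commit pointer and the allocate pointer. All other blocks are invalid.
Figure 6 shows a diagram illustrating the selection of ready blocks for dispatch starting from the commit pointer and the broadcasting of the corresponding port assignments, in accordance with one embodiment of the present invention. The source view data structure is shown on the left of Figure 6. The instruction view data structure is shown on the right of Figure 6. A selector array is shown between the source view and the instruction view. In this embodiment, the selector array dispatches four blocks per cycle via the four dispatch ports P1 through P4.
As described above, blocks are selected for dispatch starting from the commit pointer and wrapping around to the allocate pointer (e.g., trying to dispatch the oldest blocks first). The selector array is used to find the first four ready blocks starting from the commit pointer, since it is desirable to dispatch the oldest ready blocks. In one embodiment, the selector array can be implemented by using an adder tree structure. This is described with Figure 7 below.
Figure 6 also shows how the selector array is coupled to each of the four ports that pass through the entries in the instruction view. In this embodiment, the ports are coupled via port enables, enabling one of the four ports to be activated so that the instruction view entry passes down through that dispatch port to the execution units. Additionally, as described above, dispatched blocks are broadcast back to the source view: the block numbers of the selected blocks are broadcast back. This is shown on the far right of Figure 6.
Figure 7 shows an adder tree structure used to implement the selector array described in Figure 6, in accordance with one embodiment of the present invention. The depicted adder tree structure implements the functionality of the selector array. The adder tree picks the first four ready blocks and mounts them onto the four available ports for dispatch (e.g., read port 1 through read port 4). No arbitration is used. The actual logic used to enable each specific port is shown explicitly in entry number 1; for clarity, the logic is not shown in the other entries. In this manner, Figure 7 shows one specific embodiment of how direct selection of each particular port for block dispatch is implemented. It should be noted, however, that an embodiment using priority encoders can be implemented instead.
Figure 8 shows exemplary logic of the selector array adder tree in greater detail. In the Figure 8 embodiment, logic is shown for the range exceed bit. The range exceed bit ensures that no more than four blocks are selected for dispatch: if a fifth block is ready while the first four are also ready, the range exceed bit will not allow the fifth block to be dispatched. It should be noted that the sum bits S0 to S3 are used both to enable the dispatch port and to propagate to the next adder stage in the serial implementation.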
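The serial adder chain of Figures 7 and 8 can be modeled in a few lines. This is a behavioral sketch, not the circuit: the function name `select_ready`, the oldest-first list order (index 0 at the commit pointer), and the 1-based port numbering are assumptions. Each entry's running count of older ready entries chooses its port, and a count that has already reached four plays the role of the range exceed bit.

```python
def select_ready(ready_bits, num_ports=4):
    """Hypothetical model of the selector array. Returns {entry index: port}."""
    assignments = {}
    running = 0  # serial adder chain: number of older entries that are ready
    for idx, ready in enumerate(ready_bits):
        if ready:
            if running < num_ports:  # range not exceeded
                assignments[idx] = running + 1  # dispatch port 1..num_ports
            running += 1  # propagate the sum to the next adder stage
    return assignments
```

With ready bits `[1, 0, 1, 1, 0, 1, 1, 1]`, entries 0, 2, 3, and 5 receive ports 1 through 4, and the remaining ready entries are held back, because the running count has already reached four when the chain reaches them.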
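Figure 9 shows a parallel implementation of the adder tree for implementing the selector array, in accordance with one embodiment of the present invention. The parallel implementation does not forward the sum from each adder to the next. Instead, each adder uses all of its necessary inputs directly, via a multiple-input addition implementation such as a multiple-input carry-save adder tree. For example, adder "X" sums all of the previous inputs. This parallel implementation is desirable for performing the computation faster (e.g., in a single cycle).
Figure 10 shows an exemplary diagram illustrating how adder X from Figure 9 can be implemented by using carry-save adders, in accordance with one embodiment of the present invention. Figure 10 shows a structure that can sum 32 inputs in a single cycle. The structure is built from 4:2 carry-save adders.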
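The carry-save idea behind adder X can be illustrated on integers. The sketch below is an assumption-laden illustration, not the circuit in Figure 10: it builds each 4:2 compressor from two chained 3:2 carry-save stages (one common construction), reduces the inputs level by level until two operands remain, and then performs a single carry-propagate addition. The function names are hypothetical; in the selector array the inputs would be 32 one-bit ready signals, but any non-negative integers work here.

```python
def csa(a, b, c):
    """3:2 carry-save adder: returns (sum, carry) with a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_4_2(a, b, c, d):
    # One common way to build a 4:2 compressor: two chained 3:2 stages.
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def csa_tree_sum(values):
    """Reduce the inputs with 4:2 (and, for 3 leftovers, 3:2) compressors
    until two operands remain, then do one carry-propagate add."""
    vals = list(values)
    while len(vals) > 2:
        nxt = []
        while len(vals) >= 4:
            nxt.extend(compress_4_2(*vals[:4]))
            vals = vals[4:]
        if len(vals) == 3:
            nxt.extend(csa(*vals))
            vals = []
        nxt.extend(vals)  # 0-2 leftovers pass straight to the next level
        vals = nxt
    return sum(vals)  # final carry-propagate addition
```

For 32 inputs, the reduction goes 32, 16, 8, 4, 2 operands. No level waits on a carry chain, which is what makes a single-cycle implementation plausible.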
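Figure 11 shows a masking embodiment for masking ready bits for scheduling starting from the commit pointer and using the selector array adders, in accordance with the present invention. In this implementation, the selector array adders try to select the first four ready blocks for dispatch, starting from the commit pointer and potentially wrapping around to the allocate pointer. In this implementation, multiple-input parallel adders are used. Additionally, this implementation makes use of the sources of these circular buffers.
Figure 11 shows how the ready bits are ANDed with each of the two masks (individually or separately) and applied to two adder trees in parallel. The first four ready blocks are selected by using the two adder trees and comparing against the threshold of four. An "X" mark denotes "excluded from the selector array for that adder tree", so the "X" value is 0. A "Y" mark, on the other hand, denotes "included in the selector array for that adder tree", so the "Y" value is 1.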
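The two-mask scheme can be sketched for a circular scheduler window. Assumptions to note: the function names are hypothetical; the window runs from `commit_ptr` up to, but not including, `alloc_ptr`; equal pointers are treated as an empty window; and tree B's running count is offset by tree A's total so that the older entries (commit pointer to end of array) win the ports. In hardware both trees can evaluate in parallel. The sequential loops below only model the result.

```python
def _pick(bits, count, num_ports, selected):
    # One adder tree: a running sum of masked ready bits; entries whose
    # running sum is still below num_ports are selected.
    for i, bit in enumerate(bits):
        if bit:
            if count < num_ports:
                selected.append(i)
            count += 1
    return count


def select_wraparound(ready_bits, commit_ptr, alloc_ptr, num_ports=4):
    """Hypothetical sketch: return up to num_ports entry indices, oldest first."""
    n = len(ready_bits)
    wraps = alloc_ptr < commit_ptr  # assumption: equal pointers mean an empty window
    end_a = n if wraps else alloc_ptr
    # Mask A ("Y" = 1): commit pointer up to the end of the array or the allocate pointer.
    mask_a = [1 if commit_ptr <= i < end_a else 0 for i in range(n)]
    # Mask B: start of the array up to the allocate pointer, only when wrapping.
    mask_b = [1 if wraps and i < alloc_ptr else 0 for i in range(n)]
    selected = []
    total_a = _pick([r & m for r, m in zip(ready_bits, mask_a)], 0, num_ports, selected)
    _pick([r & m for r, m in zip(ready_bits, mask_b)], total_a, num_ports, selected)
    return selected
```

For an 8-entry array with the commit pointer at 6 and the allocate pointer at 3, entries 6 and 7 are considered before the wrapped entries 0 to 2.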
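Figure 12 shows an overview diagram of how register view entries are populated by register templates, in accordance with one embodiment of the present invention.
As described above, register view entries are populated by register templates. The register view stores snapshots of the register template for each block, in sequence. When a speculation is invalid (e.g., a branch misprediction), the register view holds the latest valid snapshot from before the invalid speculation point. The machine can roll its state back to the last valid snapshot by reading that register view entry and loading it into the base of the register template. Each register view entry shows all of the register inheritance states. For example, in the Figure 12 embodiment, if the register view for block F is invalid, the machine state can be rolled back to an earlier, last valid register template snapshot.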
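A per-block snapshot register view and its rollback can be sketched as follows. This is a simplified model: the class name `RegisterView` is hypothetical, block numbers are assumed to increase monotonically (no wraparound), and each block's snapshot is taken as the template just before that block writes its destinations, so rolling back to a mispredicted block discards that block and every younger one.

```python
class RegisterView:
    """Hypothetical sketch: one full register template snapshot per allocated
    block, kept in allocation order until the block retires."""

    def __init__(self, num_regs=64):
        self.template = [None] * num_regs  # live register template
        self.snapshots = {}  # block number -> template before this block's writes

    def allocate(self, block_num, destinations):
        self.snapshots[block_num] = list(self.template)
        for r in destinations:
            self.template[r] = block_num

    def rollback(self, bad_block):
        # Reload the last valid snapshot into the template base, then drop the
        # snapshots of the mispredicted block and all younger blocks.
        self.template = list(self.snapshots[bad_block])
        for b in [b for b in self.snapshots if b >= bad_block]:
            del self.snapshots[b]
```

Storing a full copy per block is the simplest version. The reduced-footprint variants described next trade this memory for reconstruction work.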
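Figure 13 shows a first embodiment for a reduced register view footprint, in accordance with one embodiment of the present invention. The amount of memory needed to store the register view entries can be reduced by storing only those register template snapshots that contain branch instructions. When an exception occurs (e.g., an invalid speculation, a branch misprediction, etc.), the last valid snapshot can be rebuilt from the branch instruction that occurred prior to the exception. Instructions are fetched from the branch prior to the exception down to the exception in order to build the last valid snapshot. The instructions are fetched but not executed. As shown in Figure 13, only those snapshots that include branch instructions are saved in the reduced register view. This greatly reduces the amount of memory needed to store the register template snapshots.
Figure 14 shows a second embodiment for a reduced register footprint, in accordance with one embodiment of the present invention. The amount of memory needed to store the register view entries can be reduced by storing only a sequential subset of the snapshots (e.g., one out of every four snapshots). The changes between successive snapshots can be stored as a "delta" from an original snapshot, using comparatively less memory than full successive snapshots. When an exception occurs (e.g., an invalid speculation, a branch misprediction, etc.), the last valid snapshot can be rebuilt from the original snapshot that occurred prior to the exception, using the "deltas" from that original snapshot and the successive snapshots. Starting from the original state, the deltas are accumulated to arrive at the state of the required snapshot.
Figure 15 shows an exemplary format for a delta between snapshots, in accordance with one embodiment of the present invention. Figure 15 shows an original snapshot and two deltas. In one delta, R5 and R6 are the only registers updated, by B3; the rest of the entries are unchanged. In the other delta, R1 and R7 are the only registers updated, by B2; the rest of the entries are unchanged.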
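Delta storage and reconstruction can be sketched with dictionaries. Assumptions: snapshots map register names to block numbers, a delta holds only the fields that changed, and the helper names `make_delta` and `apply_deltas` are hypothetical. The test mirrors Figure 15, with B2 updating R1 and R7 and B3 updating R5 and R6, over a made-up original snapshot.

```python
def make_delta(prev, cur):
    # Store only the register fields that differ from the previous snapshot.
    return {reg: blk for reg, blk in cur.items() if prev.get(reg) != blk}


def apply_deltas(original, deltas):
    """Rebuild a snapshot by accumulating deltas, oldest first, onto a stored
    original snapshot; registers absent from a delta are inherited unchanged."""
    snapshot = dict(original)
    for delta in deltas:
        snapshot.update(delta)
    return snapshot
```

Each delta here costs two entries instead of a full 64-field template. The price is that recovery must replay every delta since the last stored original snapshot.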
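Figure 16 shows a diagram of a process for creating register template snapshots upon allocation of blocks of instructions, in accordance with one embodiment of the present invention. In this embodiment, the left side of Figure 16 shows two demultiplexers, and the top of Figure 16 shows a snapshot register template. Figure 16 shows a diagram for creating a subsequent register template from a previous register template (e.g., a serial implementation).
This serial implementation shows how register template snapshots are created upon allocation of blocks of instructions. Those snapshots capture the latest register architectural state updates, which are used for dependency tracking (e.g., as described in Figures 1 through 4) and for updating the register view for handling mispredictions/exceptions (e.g., as described in Figures 12 through 15).
The demultiplexers work by selecting which incoming source is passed on. For example, register R2 demultiplexes to a 1 at the second output, while R8 demultiplexes to a 1 at the seventh output, and so on.
Figure 17 shows another diagram of a process for creating register template snapshots upon allocation of blocks of instructions, in accordance with one embodiment of the present invention. The Figure 17 embodiment also shows the creation of a subsequent register template from a previous register template, as well as an example of register template block inheritance. The figure shows how the register template is updated from the allocated block numbers. For example, block Bf updates R2, R8, and R10, and Bg updates R1 and R9. The dotted arrows indicate values inherited from the prior snapshot. This process continues all the way down to block Bi. Thus, for example, because no snapshot updates register R7, its original value Bb propagates all the way down.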
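The serial template chain of Figures 16 through 18 can be modeled as one stage per allocated block. In each stage, every register either inherits the previous block number (the dotted arrows) or takes the current block number when the decoded destinations select it (the two-input multiplexer per register). The function name `build_templates` and the starting template, in which every register holds Bb, are illustrative assumptions. Bh and Bi's destinations in the test are likewise made up.

```python
def build_templates(initial, allocations):
    """Serial sketch: allocations is a list of (block, destination registers).
    Returns one register template snapshot per allocated block."""
    templates = []
    prev = dict(initial)
    for block, dests in allocations:
        selects = set(dests)  # decoded destinations act as the mux selects
        cur = {reg: (block if reg in selects else prev[reg]) for reg in prev}
        templates.append(cur)
        prev = cur  # the next stage builds on this snapshot
    return templates
```

Because each stage depends on the previous one, latency grows with the number of blocks allocated per cycle. The parallel implementation of Figure 19 avoids this with specially encoded multiplexer controls.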
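Figure 18 shows an overview diagram of hardware for a serial implementation of creating a subsequent register template from a previous register template, in accordance with one embodiment of the present invention. The demultiplexers control a series of two-input multiplexers that decide which of two block numbers is propagated down to the next stage: either the block number from the previous stage or the current block number.
Figure 19 shows an overview diagram of hardware for a parallel implementation of creating a subsequent register template from a previous register template, in accordance with one embodiment of the present invention. This parallel implementation uses specially encoded multiplexer controls to create a subsequent register template from a previous register template.
Figure 20 shows an overview diagram of hardware for block-based execution of instructions and how it works with the source view, the instruction view, the register templates, and the register view, in accordance with one embodiment of the present invention.
In this implementation, the allocator scheduler in the dispatcher receives instructions fetched by the machine's front end. These instructions go through block formation in the manner described earlier. As described earlier, the blocks yield register templates, and these register templates are used to populate the register view. Sources are transferred from the source view to the register file hierarchy, and broadcasts go back to the source view in the manner described above. The instruction view transfers instructions to the execution units. The instructions are executed by the execution units as the sources they need arrive from the register file hierarchy. The executed instructions are then transferred out of the execution units and back into the register file hierarchy.
Figure 21 shows an example of a chunking architecture in accordance with one embodiment of the present invention. The importance of chunking is that, by using the four multiplexers shown, it reduces the number of write ports into each scheduler entry from 4 to 1, while still densely packing all the entries without forming bubbles.
The importance of chunking can be seen from the following example (note that allocation of blocks in each cycle starts at the top position, in this case B0). Assume that in cycle 1, three instruction blocks are allocated to the scheduler entries (e.g., the three blocks occupy the first 3 entries in the scheduler). In the next cycle (e.g., cycle 2), another two instruction blocks are allocated. To avoid creating bubbles in the scheduler array entries, the scheduler array entries would have to be built with support for four write ports. This is expensive in terms of power consumption, timing, area, and the like. The chunking structure above simplifies all scheduler arrays to a single write port by using the multiplexing structure before allocating into the arrays. In the above example, B0 in cycle 2 is selected by the last multiplexer, while B1 in cycle 2 is selected by the first multiplexer (e.g., going from left to right).
In this manner, each chunk of entries needs only one write port per entry and four read ports per entry. There is a trade-off in cost because the multiplexers must be implemented; however, that cost is recovered many times over by the savings from not implementing four write ports per entry, since there can be very many entries.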
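One way to picture the chunked allocation is to assume that scheduler entry `e` is fed only by multiplexer `e % 4`, so each entry needs a single write port while consecutive cycles still pack densely. This mapping is an assumption chosen because it reproduces the text's example (B0 of cycle 2 on the last multiplexer, B1 of cycle 2 on the first). The function name is hypothetical, and the array is treated as unbounded, with no wraparound.

```python
def chunked_allocate(cycles, num_muxes=4):
    """Hypothetical sketch. cycles[i] is the number of blocks allocated in
    cycle i+1, labeled B0, B1, ... from the top position each cycle.
    Returns (cycle, block label, scheduler entry, mux index) tuples."""
    tail = 0  # next free scheduler entry (dense packing, no bubbles)
    log = []
    for cycle, count in enumerate(cycles, start=1):
        if count > num_muxes:
            raise ValueError("at most one block per multiplexer per cycle")
        for k in range(count):
            entry = tail + k
            # Entry e is written only through mux e % num_muxes: one write port.
            log.append((cycle, f"B{k}", entry, entry % num_muxes))
        tail += count
    return log
```

With `chunked_allocate([3, 2])`, cycle 1 fills entries 0 to 2 through multiplexers 0 to 2. In cycle 2, B0 lands in entry 3 via the last multiplexer (index 3) and B1 lands in entry 4 via the first (index 0), which matches the example in the text.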
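Figure 21 also shows an intermediate allocation buffer. If the scheduler arrays cannot accept all the chunks sent to them, the chunks can be stored temporarily in the intermediate allocation buffer. When the scheduler arrays have free space, the chunks are transferred from the intermediate allocation buffer to the scheduler arrays.
Figure 22 shows a depiction of how threads are allocated in accordance with their block numbers and thread IDs, in accordance with one embodiment of the present invention. Blocks are allocated to the scheduler array via the chunking implementation described above. Each thread's blocks maintain a sequential order among themselves by using the block number. Blocks from different threads can be interleaved (e.g., blocks for thread Th1 and blocks for thread Th2 are interleaved in the scheduler array). In this manner, blocks from different threads are present within the scheduler array at the same time.
Figure 23 shows an implementation of a scheduler that uses thread pointer maps pointing to physical storage locations to manage multithreaded execution, in accordance with one embodiment of the present invention. In this embodiment, management of the threads is implemented through control of the thread maps. For example, Figure 23 shows a thread 1 map and a thread 2 map. The maps track the locations of the blocks of each individual thread. The physical storage location entries in a map are allocated to the blocks belonging to that thread. In this implementation, each thread has an allocation counter, so there is a count for each of the two threads. The overall count cannot exceed N divided by 2 (e.g., exceed the available space). The allocation counters have adjustable thresholds in order to implement fairness in allocating the total entries from the pool. The allocation counters can prevent one thread from using all of the available space.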
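Per-thread allocation counters with adjustable thresholds can be sketched as follows. The class name and method names are hypothetical. Each thread's threshold starts at N divided by the number of threads, matching the N/2 limit for two threads, and a thread that reaches its threshold simply stalls its allocation instead of taking the whole pool.

```python
class ThreadAllocator:
    """Hypothetical sketch: allocation counters sharing a pool of N scheduler
    entries, each thread capped by its own adjustable threshold."""

    def __init__(self, pool_size, num_threads=2):
        self.pool_size = pool_size
        self.counts = [0] * num_threads
        self.thresholds = [pool_size // num_threads] * num_threads

    def try_allocate(self, thread):
        if self.counts[thread] >= self.thresholds[thread]:
            return False  # this thread has used its share; stall its allocation
        if sum(self.counts) >= self.pool_size:
            return False  # no free physical entry left in the pool
        self.counts[thread] += 1
        return True

    def retire(self, thread):
        # A retired block returns its entry to the pool.
        self.counts[thread] -= 1
```

With an 8-entry pool, thread 0 can hold at most 4 entries, leaving room for thread 1 even if thread 0 is allocating as fast as it can.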
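Figure 24 shows another implementation of a scheduler that uses thread-based pointer maps, in accordance with one embodiment of the present invention. Figure 24 shows the relationship between the commit pointer and the allocate pointer. As shown, each thread has a commit pointer and an allocate pointer. The arrow shows how the actual pointer for thread 2 can wrap around the physical storage, allocating blocks B1 and B2, but it cannot allocate block B9 until the commit pointer for thread 2 moves down. This is shown by the position of thread 2's commit pointer and the strikethrough. The right side of Figure 24 shows the relationship between the allocation of blocks and the commit pointer as it moves around counterclockwise.
Figure 25 shows a diagram of dynamic calendar-based allocation of execution resources to threads, in accordance with one embodiment of the present invention. Fairness can be controlled dynamically with the allocation counters, based on the forward progress of each thread. If both threads are making substantial forward progress, both allocation counters are set to the same threshold (e.g., 9). However, if one thread makes slow forward progress, for example because of an L2 cache miss or a similar event, the ratio of the counter thresholds can be adjusted in favor of the thread that is still making substantial forward progress. If one thread is stalled or suspended (e.g., in a wait or spin state waiting on an OS or I/O response), the ratio can be shifted entirely to the other thread, except for a single return entry that is reserved for the suspended thread to signal the release of its wait state.
In one embodiment, the process starts with a 50%:50% ratio. Upon detection of an L2 cache miss on block 22, the front end of the pipeline stops any further fetching of thread 2 blocks into the pipeline and any further allocation of thread 2 blocks into the scheduler. As thread 2 blocks retire from the scheduler, their entries are made available for thread 1 allocation until the new dynamic ratio of thread allocation is reached. For example, 3 of the recently retired thread 2 blocks are returned to the pool for allocation to thread 1 instead of thread 2, making the thread 1 to thread 2 ratio 75%:25%.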
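The ratio shift in this example can be checked with a small helper. Assumptions: a 12-entry pool, chosen because 50%:50% is then 6:6 and moving the 3 retired thread 2 entries to thread 1 gives 9:3, which is 75%:25% as in the text. Threads are indexed 0 (thread 1) and 1 (thread 2). The function name `rebalance` is hypothetical, and the rule of always leaving the slow thread at least one entry models the reserved return entry.

```python
def rebalance(pool_size, slow_thread, share_fast):
    """Hypothetical sketch: new thresholds for two threads, giving the
    fast thread `share_fast` of the pool but always keeping one entry
    for the slow or suspended thread (its reserved return entry)."""
    fast = 1 - slow_thread
    fast_entries = min(round(pool_size * share_fast), pool_size - 1)
    thresholds = [0, 0]
    thresholds[fast] = fast_entries
    thresholds[slow_thread] = pool_size - fast_entries
    return thresholds
```

For example, `rebalance(12, slow_thread=1, share_fast=0.75)` yields `[9, 3]`, and a fully suspended thread 2 (`share_fast=1.0`) still keeps one entry.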
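It should be noted that stalling thread 2 blocks at the front of the pipeline might require flushing those blocks from the front of the pipeline if there is no hardware mechanism to bypass them (e.g., by letting thread 1 blocks pass the stalled thread 2 blocks).
Figure 26 shows a dual dispatch process in accordance with one embodiment of the present invention. Multi-dispatch generally involves dispatching a block (which contains multiple instructions) multiple times, so that different instructions within the block can each execute on a pass through the execution units. One example is the dispatch of an address calculation instruction followed by a subsequent dispatch that consumes the resulting data. Another example is a floating point operation, where the first part is executed as a fixed point operation and the second part is executed to complete the operation by performing rounding, flag generation/calculation, exponent adjustment, or the like. Blocks are allocated, committed, and retired atomically as a single entry.
A main benefit of multi-dispatch is that it avoids allocating multiple separate blocks into the machine window, thereby making the machine window effectively larger. A larger machine window means more opportunities for optimization and reordering.
Looking at the bottom left of Figure 26, an instruction block is depicted. This block cannot be dispatched in a single cycle because there is latency between the load address calculation and the load returning data from the cache/memory. The block is therefore first dispatched with its intermediate result held as transient state (the result is passed on the fly to the second dispatch and is invisible to the architectural state). The first dispatch sends components 1 and 2, which are used for the address calculation and the dispatch of the LA. The second dispatch sends components 3 and 4, which are the execution parts that operate on the load data once the load returns data from the cache/memory.
Looking at the bottom right of Figure 26, a floating point multiply-accumulate operation is depicted. When the hardware does not have enough incoming source bandwidth to dispatch the operation in a single phase, dual dispatch is used, as the multiply-accumulate figure shows. The first dispatch is a fixed point multiply, as shown. The second dispatch is a floating point addition with rounding, as shown. When both dispatched instructions have executed, they effectively perform the floating point multiply/add.
Figure 27 shows a dual dispatch transient multiply-accumulate in accordance with one embodiment of the present invention. As shown in Figure 27, the first dispatch is an integer 32-bit multiply, and the second dispatch is an integer accumulate. The state communicated between the first dispatch and the second dispatch (the result of the multiply) is transient and not architecturally visible. In one implementation, the transient storage can hold the results of more than one multiply and can tag them to identify the corresponding multiply-accumulate pair, thereby allowing multiple multiply-accumulate pairs to be intermixed and dispatched in an arbitrary fashion (e.g., interleaved).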
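The tagged transient storage can be sketched behaviorally. Assumptions: the class and method names are hypothetical, operands are treated as unsigned 32-bit values, and the full product is held without truncation. The source does not specify the product width. The key property shown is that products are keyed by a pair tag, so the two dispatches of different multiply-accumulate pairs can interleave in any order.

```python
class TransientMacStorage:
    """Hypothetical sketch of the transient storage between the two
    dispatches of a multiply-accumulate pair."""

    MASK32 = 0xFFFFFFFF

    def __init__(self):
        self._products = {}  # pair tag -> product; never architecturally visible

    def first_dispatch(self, tag, a, b):
        # Integer 32-bit multiply (unsigned operands assumed).
        self._products[tag] = (a & self.MASK32) * (b & self.MASK32)

    def second_dispatch(self, tag, acc):
        # Integer accumulate consumes, and frees, the tagged product.
        return acc + self._products.pop(tag)
```

For example, two pairs tagged `"p0"` and `"p1"` can complete their second dispatches in reverse order without mixing up their products.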
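Note that other instructions can use the same hardware for their implementation (e.g., floating point, etc.).
Figure 28 shows a dual dispatch architecturally visible state multiply-add in accordance with one embodiment of the present invention. The first dispatch is a single precision multiply, and the second dispatch is a single precision add. In this implementation, the state information communicated between the first dispatch and the second dispatch (e.g., the result of the multiply) is architecturally visible, because this storage is an architecture state register.
Figure 29 shows an overview diagram of the fetching and forming of instruction blocks for execution on grouped execution unit processes, in accordance with one embodiment of the present invention. Embodiments of the present invention use a process whereby instructions are fetched and formed into blocks by the hardware or by a dynamic converter/JIT. The instructions in the blocks are organized such that the result of an early instruction in a block feeds a source of a subsequent instruction in the block. This is shown by the dotted arrows in the instruction block. This property enables the block to execute efficiently on the stacked execution units of the execution block. Instructions can also be grouped even if they execute in parallel, for example if they share the same source (not shown explicitly in this figure).
One alternative to forming the blocks in hardware is to form them in software (statically or at runtime), where instruction pairs, triplets, quads, etc., are formed.
Other implementations of instruction grouping functionality can be found in commonly assigned U.S. Patent 8,327,115.
Figure 30 shows an exemplary diagram of instruction grouping in accordance with one embodiment of the present invention. In the Figure 30 embodiment, two instructions are shown with a third auxiliary operation. The instruction block on the left side of Figure 31 comprises an upper half block/1 slot and a lower half block/1 slot. The vertical arrows coming down from the top indicate sources entering the block, while the vertical arrows going down from the bottom indicate destinations going back to memory. Proceeding from the left side of Figure 30 to the right side, different possible instruction combinations are shown. In this implementation, each half block can receive three sources and can pass on two destinations. OP1 and OP2 are normal operations. Auxiliary OPs are auxiliary operations such as logical, shift, move, sign extend, branch, etc. The benefit of dividing the block into two halves is that each half can be dispatched independently on its own, or otherwise combined dynamically into one block based on dependency resolution (for port utilization or because of resource constraints), which gives better utilization of execution time. At the same time, having the two half blocks correspond to one block allows the machine to abstract away the complexity of the two half blocks and manage them like one block (i.e., at allocation and retirement).
Figure 31 shows how half-block pairs within a block stack map onto the execution block units, in accordance with one embodiment of the present invention. As shown in the execution block, each execution block has two slots, slot 1 and slot 2. The objective is to map the block onto the execution units such that the first half block executes on slot 1 and the second half block executes on slot 2, and to allow the two half blocks to dispatch independently if the instruction group of each half block does not depend on the other half. The paired arrows entering the execution block from the top are the two 32-bit words of a source. The paired arrows leaving the execution block downward are the two 32-bit words of a destination. Going from left to right in Figure 31, different exemplary combinations of instructions that can be stacked onto the execution block units are shown.
The top of Figure 31 summarizes how a pair of half blocks executes in a full block context or in any half block context. Each execution block has two slots/half blocks, and each half block/execution slot executes a single, paired, or triplet grouped operation. There are four types of block execution. The first is parallel halves (each half block executes independently once its own sources are ready, but the two half blocks can still execute as one block on one execution unit if both are ready at the same time). The second is atomic parallel halves (half blocks that could execute in parallel because there is no dependency between them, but that are forced to execute together as one block because resource sharing between the two halves makes it preferable or necessary for them to execute together atomically within the resources available in each execution block). The third type is atomic serial halves (the first half must forward data to the second half, through transient forwarding with or without internal storage). The fourth type is sequential halves (as in dual dispatch), where the second half depends on the first half, is dispatched on a later cycle than the first, and receives the data through external storage that is tracked for dependency resolution, similar to the dual dispatch case.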
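The four half-block execution types can be summarized as a decision rule. This is a hypothetical reading of the paragraph above, not a rule stated by the source: the three boolean inputs (whether the second half depends on the first, whether the halves must share execution-block resources, and whether data can be forwarded transiently within one execution block) are illustrative flags.

```python
from enum import Enum


class HalfBlockMode(Enum):
    PARALLEL = "parallel halves"
    ATOMIC_PARALLEL = "atomic parallel halves"
    ATOMIC_SERIAL = "atomic serial halves"
    SEQUENTIAL = "sequential halves"


def classify(second_depends_on_first, must_share_resources, transient_forwarding):
    """Hypothetical mapping from half-block properties to an execution type."""
    if not second_depends_on_first:
        # Independent halves: free to dispatch separately unless shared
        # resources force them to execute together atomically.
        if must_share_resources:
            return HalfBlockMode.ATOMIC_PARALLEL
        return HalfBlockMode.PARALLEL
    if transient_forwarding:
        # Dependent halves forwarded inside one execution block.
        return HalfBlockMode.ATOMIC_SERIAL
    # Dependent halves dispatched on a later cycle via tracked external storage.
    return HalfBlockMode.SEQUENTIAL
```

Only the sequential case needs a second dispatch cycle. The other three can occupy a single execution block in one dispatch.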
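Figure 32 shows a diagram depicting intermediate block result storage as a first-level register file, in accordance with one embodiment of the present invention. Each group of registers represents a block of instructions (representing two half blocks), in which both 32-bit results and 64-bit results can be supported by using two 32-bit registers to support one 64-bit register. The storage per block assumes a virtual block storage, which means that two half blocks from different blocks can write into the same virtual block storage. The combined result storage of two half blocks makes up one virtual block storage.
Figure 33 shows an odd/even port scheduler in accordance with one embodiment of the present invention. In this implementation, the result storage is asymmetrical: some of the result storage is three 64-bit result registers per half block, while the rest is one 64-bit result register per half block. However, alternative implementations can use symmetrical storage per half block and could additionally employ the 64-bit and 32-bit partitioning described in Figure 32. In these embodiments, storage is assigned per half block rather than per block. This implementation reduces the number of ports needed for dispatch by using the ports as odd or even.
Figure 34 shows a more detailed version of Figure 33, in which four execution units are shown receiving results from the scheduler array and writing outputs to a temporary register file segment. The ports are attached at odd and even intervals. The left side of the scheduling array shows block numbers, and the right side shows half block numbers.
Each core has even and odd ports into the scheduling array, where each port is connected to an odd or even half block position. In one implementation, the even ports and their corresponding half blocks can reside in a different core than the odd ports and their corresponding half blocks. In another implementation, the odd and even ports are distributed across multiple different cores, as shown in this figure. As described in the commonly assigned patent application Serial No. 13/428,440, titled "EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES", filed on March 23, 2012 by Mohammad Abdallah, which is incorporated herein by reference in its entirety, the cores can be physical cores or virtual cores.
In some types of blocks, one half of a block can be dispatched independently of the other half. In other types of blocks, both halves of a block need to be dispatched simultaneously to the same execution block unit. In still other types of blocks, the two halves of a block need to be dispatched sequentially (the second half after the first half).
Figure 35 shows a diagram depicting guest flag architecture emulation in accordance with one embodiment of the present invention. The left side of Figure 35 shows a centralized flag register having five flags. The right side of Figure 35 shows a distributed flag architecture having distributed flag registers, wherein the flags are distributed among the registers themselves.
During architecture emulation, the distributed flag architecture must emulate the behavior of the centralized guest flag architecture. A distributed flag architecture can also be implemented by using multiple independent flag registers, as opposed to flag fields associated with data registers. For example, data registers can be implemented as R0 to R15, while independent flag registers can be implemented as F0 to F3. In this case, those flag registers are not directly associated with the data registers.
Figure 36 shows a diagram illustrating the front end of the machine, the scheduler, the execution units, and a central flag register, in accordance with one embodiment of the present invention. In this implementation, the front end categorizes incoming instructions based on the manner in which they update guest instruction flags. In one embodiment, the guest instructions are categorized into 4 native instruction types, T1, T2, T3, and T4. T1-T4 are instruction types that indicate which flag fields each guest instruction type updates. Guest instruction types update different guest instruction flags based on their type. For example, logical guest instructions update T1 native instructions.
Figure 37 shows a diagram of the central flag register emulation process as implemented by embodiments of the present invention. The actors in Figure 37 comprise a latest update type table, a renaming table extension, physical registers, and distributed flag registers. Figure 37 is now described by way of the flowchart of Figure 38.
Figure 38 shows a flowchart of the steps of a process 3800 for emulating central flag register behavior in a guest setting.
In step 3801, the front end/dynamic converter (hardware or software) categorizes incoming instructions based on the manner in which they update guest instruction flags. In one embodiment, the guest instructions are categorized into four flag architectural types, T1, T2, T3, and T4. T1-T4 are instruction types that indicate which flag fields each guest instruction type updates. Guest instruction types update different guest flags based on their type. For example, logical guest instructions update T1 type flags, shift guest instructions update T2 type flags, arithmetic guest instructions update T3 type flags, and special guest instructions update T4 type flags. It should be noted that guest instructions can be architectural instruction representations, while native instructions can be what the machine executes internally (e.g., microcode). Alternatively, guest instructions can be instructions from an emulated architecture (e.g., x86, Java, ARM code, etc.).
In step 3802, the order in which those instruction types update their respective guest flags is recorded in a latest update type table data structure. In one embodiment, this action is performed by the front end of the machine.
In step 3803, when those instruction types reach the scheduler (the in-order part of the allocation/renaming stage), the scheduler assigns an implicit physical destination that corresponds to the architectural type and records that assignment in a renaming/mapping table data structure.
In step 3804, when a subsequent guest instruction reaches the allocation/renaming stage in the scheduler and that instruction wants to read guest flag fields: (a) the machine determines which flag architectural types need to be accessed to perform the read; (b) if all needed flags are found in the same latest update flag type (e.g., as determined by the latest update type table), the corresponding physical register (the one mapped to that latest flag type) is read to obtain the needed flags; (c) if the needed flags are not all found in the same latest update flag type, each flag needs to be read from the corresponding physical register mapped to its individual latest update flag type.
In step 3805, each flag is read individually from the physical register holding its latest value, as last updated and tracked by the latest update flag type table.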
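Steps 3802 through 3805 can be sketched with two tables. Assumptions: the flag groups per type are hypothetical, loosely modeled on the example list of flag-updating instructions that follows (T1 for logicals, T2 for shifts, T3 for arithmetic, T4 for special instructions), physical register names `P0`, `P1`, ... are illustrative, and the class name `FlagRenamer` is made up.

```python
# Hypothetical flag groups per instruction type; the source does not list them.
TYPE_FLAGS = {
    "T1": {"SF", "ZF"},               # e.g., logical
    "T2": {"SF", "ZF", "CF"},         # e.g., shift
    "T3": {"CF", "OF", "SF", "ZF"},   # e.g., arithmetic
    "T4": {"ZF"},                     # e.g., special
}


class FlagRenamer:
    def __init__(self):
        self.latest_type = {}  # flag -> type that last updated it (latest update type table)
        self.rename = {}       # type -> physical register (renaming/mapping table)
        self.next_preg = 0

    def allocate_writer(self, itype):
        # Step 3803: assign an implicit physical destination for this type.
        preg = f"P{self.next_preg}"
        self.next_preg += 1
        self.rename[itype] = preg
        # Step 3802: this type is now the latest updater of its flags.
        for flag in TYPE_FLAGS[itype]:
            self.latest_type[flag] = itype
        return preg

    def read_sources(self, flags):
        # Step 3804(a): find the latest update type of each needed flag.
        # If they all share one type, this is one register read (3804(b));
        # otherwise each flag comes from its own type's register (3804(c), 3805).
        return {f: self.rename[self.latest_type[f]] for f in flags}
```

After an arithmetic (T3) writer and then a logical (T1) writer, SF and ZF both come from the T1 register, while a read of CF and ZF needs two registers.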
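It should be noted that if a latest update type includes another type, then all subset types must map to the same physical registers as the superset type.
At retirement, the destination flag fields are merged with a cloned centralized/guest flag architecture register. It should be noted that the cloning is performed because the native architecture uses a distributed flag architecture rather than a single-register centralized flag architecture.
Examples of instructions that update certain flag types:
CF, OF, SF, ZF: arithmetic instructions and load/write flags instructions
SF, ZF, and conditional CF: logicals and shifts
SF, ZF: moves/loads, EXTR, some multiplies
ZF: POPCNT and STREX[P]
GE: SIMD instructions (?)
Examples of conditions/predicates that read certain flags: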
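Code | Mnemonic | Meaning (integer) | Meaning (floating point) | Condition flags |
---|---|---|---|---|
0000 | EQ | Equal | Equal | Z == 1 |
0001 | NE | Not equal | Not equal, or unordered | Z == 0 |
0010 | CS | Carry set | Greater than or equal, or unordered | C == 1 |
0011 | CC | Carry clear | Less than | C == 0 |
0100 | MI | Minus, negative | Less than | N == 1 |
0101 | PL | Plus, positive or zero | Greater than or equal, or unordered | N == 0 |
0110 | VS | Overflow | Unordered | V == 1 |
0111 | VC | No overflow | Not unordered | V == 0 |
1000 | HI | Unsigned higher | Greater than, or unordered | C == 1 and Z == 0 |
1001 | LS | Unsigned lower or same | Less than or equal | C == 0 or Z == 1 |
1010 | GE | Signed greater than or equal | Greater than or equal | N == V |
1011 | LT | Signed less than | Less than, or unordered | N != V |
1100 | GT | Signed greater than | Greater than | Z == 0 and N == V |
1101 | LE | Signed less than or equal | Less than or equal, or unordered | Z == 1 or N != V |
1110 | None (AL) | Always (unconditional) | Always (unconditional) | Any flags set to any value |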
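The predicate column of the table above can be expressed as a small evaluator. This is a sketch that transcribes the table's flag conditions. The function name `condition_holds` is hypothetical, and codes not listed in the table (such as 1111) raise `KeyError`.

```python
def condition_holds(code, n, z, c, v):
    """Evaluate a 4-bit condition code against N, Z, C, V flags (booleans),
    following the condition table above."""
    conditions = {
        0b0000: z,                   # EQ
        0b0001: not z,               # NE
        0b0010: c,                   # CS
        0b0011: not c,               # CC
        0b0100: n,                   # MI
        0b0101: not n,               # PL
        0b0110: v,                   # VS
        0b0111: not v,               # VC
        0b1000: c and not z,         # HI
        0b1001: (not c) or z,        # LS
        0b1010: n == v,              # GE
        0b1011: n != v,              # LT
        0b1100: (not z) and n == v,  # GT
        0b1101: z or n != v,         # LE
        0b1110: True,                # AL: any flags set to any value
    }
    return conditions[code]
```

For example, with Z set, EQ holds. With all four flags clear, GT holds because Z == 0 and N == V.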
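The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and various embodiments with various modifications as may be suited to the particular use contemplated.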
Claims (20)
1.一种用于执行分组成块的多线程指令的方法,包括:
使用全局前端接收进入的指令序列;
将所述指令分组以形成指令块,其中所述指令块的所述指令与多个线程交错;
调度所述指令块的所述指令以依照所述多个线程执行;以及
跟踪对所述多个线程的执行以强制执行管线中的公正性。
2.根据权利要求1所述的方法,其中属于不同线程的块能够在调度器阵列中交错。
3.根据权利要求1所述的方法,其中使用调度器线程指针来映射调度器阵列内交错的属于不同线程的块。
4.根据权利要求1所述的方法,其中使用分配计数器来分配调度器阵列内的线程的块以实现公正策略。
5.根据权利要求1所述的方法,其中使用动态的基于日历的分配来分配调度器阵列内的线程的块,以实现公正策略。
6.根据权利要求1所述的方法,其中使用动态的基于日历的分配来分配调度器阵列内的线程的块,以实现线程分配的动态比例。
7.根据权利要求1所述的方法,其中使用分配计数器来分配调度器阵列内的线程的块,以实现阻止一个线程阻塞另一个线程的进度。
8.一种非暂时性计算机可读介质,具有计算机可读代码,所述代码当由计算机系统执行时,致使所述计算机系统实现用于执行分组成块的多线程指令的方法,包括:
使用全局前端接收进入的指令序列;
将所述指令分组以形成指令块,其中所述指令块的所述指令与多个线程交错;
调度所述指令块的所述指令以依照所述多个线程执行;以及
跟踪对所述多个线程的执行以强制执行管线中的公正性。
9.根据权利要求8所述的计算机可读介质,其中属于不同线程的块能够在调度器阵列中交错。
10.根据权利要求8所述的计算机可读介质,其中使用调度器线程指针来映射调度器阵列内交错的属于不同线程的块。
11.根据权利要求8所述的计算机可读介质,其中使用分配计数器来分配调度器阵列内的线程的块,以实现公正策略。
12.根据权利要求8所述的计算机可读介质,其中使用动态的基于日历的分配来分配调度器阵列内的线程的块,以实现公正策略。
13.根据权利要求8所述的计算机可读介质,其中使用动态的基于日历的分配来分配调度器阵列内的线程的块,以实现线程分配的动态比例。
14.根据权利要求8所述的计算机可读介质,其中使用分配计数器来分配调度器阵列内的线程的块,以实现阻止一个线程阻塞另一个线程的进度。
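15. A computer system having a processor coupled to a memory, the memory having computer-readable code which, when executed by the computer system, causes the computer system to implement a method for executing multithreaded instructions grouped into blocks, comprising:
receiving an incoming instruction sequence using a global front end;
grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with a plurality of threads;
scheduling the instructions of the instruction blocks to execute in accordance with the plurality of threads; and
tracking execution of the plurality of threads to enforce fairness in a pipeline.
16. The computer system according to claim 15, wherein blocks belonging to different threads can be interleaved within a scheduler array.
17. The computer system according to claim 15, wherein a scheduler thread pointer is used to map blocks belonging to different threads that are interleaved within a scheduler array.
18. The computer system according to claim 15, wherein an allocation counter is used to allocate blocks of threads within a scheduler array to implement a fairness policy.
19. The computer system according to claim 15, wherein dynamic calendar-based allocation is used to allocate blocks of threads within a scheduler array to implement a fairness policy.
20. A method for implementing a reduced-size register view data structure in a microprocessor, comprising:
receiving an incoming instruction sequence using a global front end;
grouping the instructions to form instruction blocks;
using a plurality of multiplexers to access ports of a scheduling array to store the instruction blocks as a series of chunks.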
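Claims 2 through 7 name the mechanisms (blocks of different threads interleaved in a scheduler array, scheduler thread pointers, allocation counters, and dynamic calendar-based allocation) without fixing an implementation. The sketch below is one hypothetical way they could fit together; the class `BlockScheduler`, the per-thread slot limits, and the choice to skip a calendar turn when its thread cannot allocate are assumptions for illustration, not the claimed design.

```python
from collections import deque


class BlockScheduler:
    """Hypothetical scheduler array that interleaves blocks from several threads."""

    def __init__(self, num_slots, limits, calendar):
        self.slots = [None] * num_slots                  # shared scheduler array
        self.limits = dict(limits)                       # thread -> max slots it may hold
        self.counters = {t: 0 for t in limits}           # allocation counters
        self.thread_ptrs = {t: deque() for t in limits}  # slot indices, oldest first
        self.calendar = list(calendar)                   # repeating order of allocation turns
        self.turn = 0

    def set_calendar(self, calendar):
        """Replace the calendar at run time to change the allocation ratio."""
        self.calendar, self.turn = list(calendar), 0

    def allocate(self, pending):
        """Run one calendar turn. `pending` maps thread -> deque of waiting blocks.
        Returns (thread, slot) on success, or None if the turn is skipped."""
        thread = self.calendar[self.turn]
        self.turn = (self.turn + 1) % len(self.calendar)
        if not pending.get(thread):
            return None                                  # nothing waiting for this thread
        if self.counters[thread] >= self.limits[thread]:
            return None                                  # thread already holds its share
        if None not in self.slots:
            return None                                  # scheduler array is full
        slot = self.slots.index(None)
        self.slots[slot] = (thread, pending[thread].popleft())
        self.counters[thread] += 1
        self.thread_ptrs[thread].append(slot)
        return thread, slot

    def retire_oldest(self, thread):
        """Retire the thread's oldest block, freeing its slot and decrementing its counter."""
        slot = self.thread_ptrs[thread].popleft()
        self.slots[slot] = None
        self.counters[thread] -= 1
        return slot
```

Capping each thread's counter keeps a stalled thread from occupying every slot, even when free slots remain; swapping the calendar, for example from `["A", "B"]` to `["A", "A", "B"]`, shifts allocation turns 2:1 toward thread A without changing the limits.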
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361800123P | 2013-03-15 | 2013-03-15 | |
US61/800,123 | 2013-03-15 | ||
PCT/US2014/024775 WO2014151018A1 (en) | 2013-03-15 | 2014-03-12 | A method for executing multithreaded instructions grouped onto blocks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105210040A (zh) | 2015-12-30 |
CN105210040B (zh) | 2019-04-02 |
Family ID: 51534783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480024528.0A, Active, CN105210040B (zh) | Method for executing multithreaded instructions grouped into blocks | 2013-03-15 | 2014-03-12 |
Country Status (6)
Country | Link |
---|---|
US (2) | US9811377B2 (zh) |
EP (1) | EP2972845B1 (zh) |
KR (2) | KR101708591B1 (zh) |
CN (1) | CN105210040B (zh) |
TW (1) | TWI619077B (zh) |
WO (1) | WO2014151018A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484519A (zh) * | 2016-10-11 | 2017-03-08 | Southeast University | Asynchronous thread reorganization method and SIMT processor based on the method |
CN108475196A (zh) * | 2015-12-02 | 2018-08-31 | Swarm64 LLC | Data processing |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
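CN103646009B (zh) | 2006-04-12 | 2016-08-17 | Soft Machines, Inc. | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
CN101627365B (zh) | 2006-11-14 | 2017-03-29 | Soft Machines, Inc. | Multithreaded architecture |
CN103250131B (zh) | 2010-09-17 | 2015-12-16 | Soft Machines, Inc. | Single-cycle multi-branch prediction including a shadow cache for early far branch prediction |
TWI518504B (zh) | 2011-03-25 | 2016-01-21 | Soft Machines, Inc. | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
WO2012135050A2 (en) | 2011-03-25 | 2012-10-04 | Soft Machines, Inc. | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103547993B (zh) | 2011-03-25 | 2018-06-26 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
TWI603198B (zh) | 2011-05-20 | 2017-10-21 | Intel Corporation | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
CN103649931B (zh) | 2011-05-20 | 2016-10-12 | Soft Machines, Inc. | Interconnect structure for supporting the execution of instruction sequences by a plurality of engines |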
EP2783281B1 (en) | 2011-11-22 | 2020-05-13 | Intel Corporation | A microprocessor accelerated code optimizer |
WO2013077875A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | An accelerated code optimizer for a multiengine microprocessor |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9632825B2 (en) | 2013-03-15 | 2017-04-25 | Intel Corporation | Method and apparatus for efficient scheduling for asymmetrical execution units |
KR101708591B1 (ko) * | 2013-03-15 | 2017-02-20 | Soft Machines, Inc. | Method for executing multithreaded instructions grouped into blocks |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
EP2972836B1 (en) | 2013-03-15 | 2022-11-09 | Intel Corporation | A method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
US9348595B1 (en) | 2014-12-22 | 2016-05-24 | Centipede Semi Ltd. | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US9135015B1 (en) | 2014-12-25 | 2015-09-15 | Centipede Semi Ltd. | Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction |
US9208066B1 (en) * | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
US10296346B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences based on pre-monitoring |
US10296350B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences |
US9715390B2 (en) | 2015-04-19 | 2017-07-25 | Centipede Semi Ltd. | Run-time parallelization of code execution based on an approximate register-access specification |
US9935927B2 (en) * | 2015-10-16 | 2018-04-03 | Quanta Computer Inc. | System and method for low energy double authentication between mobile device and server nodes |
US10120688B2 (en) | 2016-11-15 | 2018-11-06 | Andes Technology Corporation | Data processing system and method for executing block call and block return instructions |
CN110032407B (zh) * | 2019-03-08 | 2020-12-22 | Advanced New Technologies Co., Ltd. | Method and apparatus for improving CPU parallel performance, and electronic device |
US11372769B1 (en) * | 2019-08-29 | 2022-06-28 | Xilinx, Inc. | Fine-grained multi-tenant cache management |
US20210073056A1 (en) * | 2019-09-11 | 2021-03-11 | Advanced Micro Devices, Inc. | Distributed scheduler providing execution pipe balance |
US11288072B2 (en) * | 2019-09-11 | 2022-03-29 | Ceremorphic, Inc. | Multi-threaded processor with thread granularity |
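CN112527393A (zh) * | 2019-09-18 | 2021-03-19 | Wuxi Jiangnan Institute of Computing Technology | Instruction scheduling optimization apparatus and method for processors with a master-slave fused architecture |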
USD915534S1 (en) | 2019-09-25 | 2021-04-06 | Coulter Ventures, Llc. | Weight rack with name plate |
KR102224446B1 (ko) * | 2019-10-14 | 2021-03-09 | Ewha Womans University Industry-Academic Cooperation Foundation | Method and apparatus for extending GPGPU thread block scheduling |
US20220121487A1 (en) * | 2020-10-20 | 2022-04-21 | Micron Technology, Inc. | Thread scheduling control and memory splitting in a barrel processor |
US20220121593A1 (en) * | 2021-12-23 | 2022-04-21 | Intel Corporation | Systems And Methods For Processor Circuits |
USD1010033S1 (en) | 2022-05-23 | 2024-01-02 | Coulter Ventures, Llc. | Weight rack |
Family Cites Families (473)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US727487A (en) | 1902-10-21 | 1903-05-05 | Swan F Swanson | Dumping-car. |
US4075704A (en) | 1976-07-02 | 1978-02-21 | Floating Point Systems, Inc. | Floating point data processor for high speech operation |
US4228496A (en) | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4245344A (en) | 1979-04-02 | 1981-01-13 | Rockwell International Corporation | Processing system with dual buses |
US4527237A (en) | 1979-10-11 | 1985-07-02 | Nanodata Computer Corporation | Data processing system |
US4414624A (en) | 1980-11-19 | 1983-11-08 | The United States Of America As Represented By The Secretary Of The Navy | Multiple-microcomputer processing |
US4524415A (en) | 1982-12-07 | 1985-06-18 | Motorola, Inc. | Virtual machine data processor |
US4597061B1 (en) | 1983-01-03 | 1998-06-09 | Texas Instruments Inc | Memory system using pipleline circuitry for improved system |
US4577273A (en) | 1983-06-06 | 1986-03-18 | Sperry Corporation | Multiple microcomputer system for digital computers |
US4682281A (en) | 1983-08-30 | 1987-07-21 | Amdahl Corporation | Data storage unit employing translation lookaside buffer pointer |
US4600986A (en) | 1984-04-02 | 1986-07-15 | Sperry Corporation | Pipelined split stack with high performance interleaved decode |
US4633434A (en) | 1984-04-02 | 1986-12-30 | Sperry Corporation | High performance storage unit |
JPS6140643A (ja) | 1984-07-31 | 1986-02-26 | Hitachi Ltd | System resource allocation control scheme |
US4835680A (en) | 1985-03-15 | 1989-05-30 | Xerox Corporation | Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs |
JPS6289149A (ja) | 1985-10-15 | 1987-04-23 | Agency Of Ind Science & Technol | Multi-port memory system |
JPH0658650B2 (ja) | 1986-03-14 | 1994-08-03 | Hitachi, Ltd. | Virtual computer system |
US4920477A (en) | 1987-04-20 | 1990-04-24 | Multiflow Computer, Inc. | Virtual address table look aside buffer miss recovery method and apparatus |
US4943909A (en) | 1987-07-08 | 1990-07-24 | At&T Bell Laboratories | Computational origami |
US5339398A (en) | 1989-07-31 | 1994-08-16 | North American Philips Corporation | Memory architecture and method of data organization optimized for hashing |
US5471593A (en) | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
US5197130A (en) | 1989-12-29 | 1993-03-23 | Supercomputer Systems Limited Partnership | Cluster architecture for a highly parallel scalar/vector multiprocessor system |
US5317754A (en) | 1990-10-23 | 1994-05-31 | International Business Machines Corporation | Method and apparatus for enabling an interpretive execution subset |
US5317705A (en) | 1990-10-24 | 1994-05-31 | International Business Machines Corporation | Apparatus and method for TLB purge reduction in a multi-level machine system |
US6282583B1 (en) | 1991-06-04 | 2001-08-28 | Silicon Graphics, Inc. | Method and apparatus for memory access in a matrix processor computer |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
JPH0820949B2 (ja) | 1991-11-26 | 1996-03-04 | Matsushita Electric Industrial Co., Ltd. | Information processing apparatus |
JPH07502358A (ja) | 1991-12-23 | 1995-03-09 | Intel Corporation | Interleaved cache for multiple accesses per microprocessor clock |
KR100309566B1 (ko) * | 1992-04-29 | 2001-12-15 | 리패치 | Method and apparatus for grouping multiple instructions, issuing the grouped instructions simultaneously, and executing the grouped instructions in a pipelined processor |
DE69308548T2 (de) | 1992-05-01 | 1997-06-12 | Seiko Epson Corp | Apparatus and method for instruction completion in a superscalar processor |
DE69329260T2 (de) | 1992-06-25 | 2001-02-22 | Canon Kk | Apparatus for multiplying integers having many digits |
JPH0637202A (ja) | 1992-07-20 | 1994-02-10 | Mitsubishi Electric Corp | Package for microwave IC |
JPH06110781A (ja) | 1992-09-30 | 1994-04-22 | Nec Corp | Cache memory device |
US5493660A (en) | 1992-10-06 | 1996-02-20 | Hewlett-Packard Company | Software assisted hardware TLB miss handler |
US5513335A (en) | 1992-11-02 | 1996-04-30 | Sgs-Thomson Microelectronics, Inc. | Cache tag memory having first and second single-port arrays and a dual-port array |
US5819088A (en) | 1993-03-25 | 1998-10-06 | Intel Corporation | Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer |
JPH0784883A (ja) | 1993-09-17 | 1995-03-31 | Hitachi Ltd | Address translation buffer purge method for a virtual computer system |
US6948172B1 (en) | 1993-09-21 | 2005-09-20 | Microsoft Corporation | Preemptive multi-tasking with cooperative groups of tasks |
US5469376A (en) | 1993-10-14 | 1995-11-21 | Abdallah; Mohammad A. F. F. | Digital circuit for the evaluation of mathematical expressions |
US5517651A (en) | 1993-12-29 | 1996-05-14 | Intel Corporation | Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes |
US5761476A (en) | 1993-12-30 | 1998-06-02 | Intel Corporation | Non-clocked early read for back-to-back scheduling of instructions |
US5956753A (en) | 1993-12-30 | 1999-09-21 | Intel Corporation | Method and apparatus for handling speculative memory access operations |
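JP3048498B2 (ja) | 1994-04-13 | 2000-06-05 | Toshiba Corporation | Semiconductor memory device |
JPH07287668A (ja) | 1994-04-19 | 1995-10-31 | Hitachi Ltd | Data processing device |
CN1084005C (zh) | 1994-06-27 | 2002-05-01 | International Business Machines Corporation | Method and apparatus for dynamically controlling address space allocation |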
US5548742A (en) | 1994-08-11 | 1996-08-20 | Intel Corporation | Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory |
US5813031A (en) | 1994-09-21 | 1998-09-22 | Industrial Technology Research Institute | Caching tag for a large scale cache computer memory system |
US5640534A (en) | 1994-10-05 | 1997-06-17 | International Business Machines Corporation | Method and system for concurrent access in a data cache array utilizing multiple match line selection paths |
US5835951A (en) | 1994-10-18 | 1998-11-10 | National Semiconductor | Branch processing unit with target cache read prioritization protocol for handling multiple hits |
JP3569014B2 (ja) | 1994-11-25 | 2004-09-22 | Fujitsu Limited | Processor supporting multiple contexts and processing method |
US5724565A (en) | 1995-02-03 | 1998-03-03 | International Business Machines Corporation | Method and system for processing first and second sets of instructions by first and second types of processing systems |
US5673408A (en) | 1995-02-14 | 1997-09-30 | Hal Computer Systems, Inc. | Processor structure and method for renamable trap-stack |
US5675759A (en) | 1995-03-03 | 1997-10-07 | Shebanow; Michael C. | Method and apparatus for register management using issue sequence prior physical register and register association validity information |
US5751982A (en) | 1995-03-31 | 1998-05-12 | Apple Computer, Inc. | Software emulation system with dynamic translation of emulated instructions for increased processing speed |
US5634068A (en) | 1995-03-31 | 1997-05-27 | Sun Microsystems, Inc. | Packet switched cache coherent multiprocessor system |
US6209085B1 (en) | 1995-05-05 | 2001-03-27 | Intel Corporation | Method and apparatus for performing process switching in multiprocessor computer systems |
US6643765B1 (en) | 1995-08-16 | 2003-11-04 | Microunity Systems Engineering, Inc. | Programmable processor with group floating point operations |
US5710902A (en) | 1995-09-06 | 1998-01-20 | Intel Corporation | Instruction dependency chain indentifier |
US6341324B1 (en) | 1995-10-06 | 2002-01-22 | Lsi Logic Corporation | Exception processing in superscalar microprocessor |
US5864657A (en) | 1995-11-29 | 1999-01-26 | Texas Micro, Inc. | Main memory system and checkpointing protocol for fault-tolerant computer system |
US5983327A (en) | 1995-12-01 | 1999-11-09 | Nortel Networks Corporation | Data path architecture and arbitration scheme for providing access to a shared system resource |
US5793941A (en) | 1995-12-04 | 1998-08-11 | Advanced Micro Devices, Inc. | On-chip primary cache testing circuit and test method |
US5911057A (en) | 1995-12-19 | 1999-06-08 | Texas Instruments Incorporated | Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods |
US5699537A (en) | 1995-12-22 | 1997-12-16 | Intel Corporation | Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions |
US6882177B1 (en) | 1996-01-10 | 2005-04-19 | Altera Corporation | Tristate structures for programmable logic devices |
US5754818A (en) | 1996-03-22 | 1998-05-19 | Sun Microsystems, Inc. | Architecture and method for sharing TLB entries through process IDS |
US5904892A (en) | 1996-04-01 | 1999-05-18 | Saint-Gobain/Norton Industrial Ceramics Corp. | Tape cast silicon carbide dummy wafer |
US5752260A (en) | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses |
US5806085A (en) | 1996-05-01 | 1998-09-08 | Sun Microsystems, Inc. | Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database |
US5829028A (en) | 1996-05-06 | 1998-10-27 | Advanced Micro Devices, Inc. | Data cache configured to store data in a use-once manner |
US6108769A (en) | 1996-05-17 | 2000-08-22 | Advanced Micro Devices, Inc. | Dependency table for reducing dependency checking hardware |
US5881277A (en) | 1996-06-13 | 1999-03-09 | Texas Instruments Incorporated | Pipelined microprocessor with branch misprediction cache circuits, systems and methods |
US5860146A (en) | 1996-06-25 | 1999-01-12 | Sun Microsystems, Inc. | Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces |
US5903760A (en) | 1996-06-27 | 1999-05-11 | Intel Corporation | Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA |
US5974506A (en) | 1996-06-28 | 1999-10-26 | Digital Equipment Corporation | Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system |
US6167490A (en) | 1996-09-20 | 2000-12-26 | University Of Washington | Using global memory information to manage memory in a computer network |
KR19980032776A (ko) | 1996-10-16 | 1998-07-25 | Tsutomu Kanai | Data processor and data processing system |
KR19990076967A (ko) | 1996-11-04 | 1999-10-25 | J.G.A. Rolfes | Processing device and reading of instructions in memory |
US5978906A (en) | 1996-11-19 | 1999-11-02 | Advanced Micro Devices, Inc. | Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions |
US6253316B1 (en) | 1996-11-19 | 2001-06-26 | Advanced Micro Devices, Inc. | Three state branch history using one bit in a branch prediction mechanism |
US5903750A (en) | 1996-11-20 | 1999-05-11 | Institute For The Development Of Emerging Architectures, L.L.P. | Dynamic branch prediction for branch instructions with multiple targets |
US6212542B1 (en) | 1996-12-16 | 2001-04-03 | International Business Machines Corporation | Method and system for executing a program within a multiscalar processor by processing linked thread descriptors |
US6134634A (en) | 1996-12-20 | 2000-10-17 | Texas Instruments Incorporated | Method and apparatus for preemptive cache write-back |
US5918251A (en) | 1996-12-23 | 1999-06-29 | Intel Corporation | Method and apparatus for preloading different default address translation attributes |
US6016540A (en) | 1997-01-08 | 2000-01-18 | Intel Corporation | Method and apparatus for scheduling instructions in waves |
US6065105A (en) | 1997-01-08 | 2000-05-16 | Intel Corporation | Dependency matrix |
US5802602A (en) | 1997-01-17 | 1998-09-01 | Intel Corporation | Method and apparatus for performing reads of related data from a set-associative cache memory |
US6088780A (en) | 1997-03-31 | 2000-07-11 | Institute For The Development Of Emerging Architecture, L.L.C. | Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address |
US6075938A (en) | 1997-06-10 | 2000-06-13 | The Board Of Trustees Of The Leland Stanford Junior University | Virtual machine monitors for scalable multiprocessors |
US6073230A (en) | 1997-06-11 | 2000-06-06 | Advanced Micro Devices, Inc. | Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches |
JPH1124929A (ja) | 1997-06-30 | 1999-01-29 | Sony Corp | Arithmetic processing device and method |
US6128728A (en) | 1997-08-01 | 2000-10-03 | Micron Technology, Inc. | Virtual shadow registers and virtual register windows |
US6170051B1 (en) | 1997-08-01 | 2001-01-02 | Micron Technology, Inc. | Apparatus and method for program level parallelism in a VLIW processor |
US6085315A (en) | 1997-09-12 | 2000-07-04 | Siemens Aktiengesellschaft | Data processing device with loop pipeline |
US6101577A (en) | 1997-09-15 | 2000-08-08 | Advanced Micro Devices, Inc. | Pipelined instruction cache and branch prediction mechanism therefor |
US5901294A (en) | 1997-09-18 | 1999-05-04 | International Business Machines Corporation | Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access |
US6185660B1 (en) | 1997-09-23 | 2001-02-06 | Hewlett-Packard Company | Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss |
US5905509A (en) | 1997-09-30 | 1999-05-18 | Compaq Computer Corp. | Accelerated Graphics Port two level Gart cache having distributed first level caches |
US6226732B1 (en) | 1997-10-02 | 2001-05-01 | Hitachi Micro Systems, Inc. | Memory system architecture |
US5922065A (en) | 1997-10-13 | 1999-07-13 | Institute For The Development Of Emerging Architectures, L.L.C. | Processor utilizing a template field for encoding instruction sequences in a wide-word format |
US6178482B1 (en) | 1997-11-03 | 2001-01-23 | Brecis Communications | Virtual register sets |
US6021484A (en) | 1997-11-14 | 2000-02-01 | Samsung Electronics Co., Ltd. | Dual instruction set architecture |
US6256728B1 (en) | 1997-11-17 | 2001-07-03 | Advanced Micro Devices, Inc. | Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction |
US6260131B1 (en) | 1997-11-18 | 2001-07-10 | Intrinsity, Inc. | Method and apparatus for TLB memory ordering |
US6016533A (en) | 1997-12-16 | 2000-01-18 | Advanced Micro Devices, Inc. | Way prediction logic for cache array |
US6219776B1 (en) | 1998-03-10 | 2001-04-17 | Billions Of Operations Per Second | Merged array controller and processing element |
US6609189B1 (en) | 1998-03-12 | 2003-08-19 | Yale University | Cycle segmented prefix circuits |
JP3657424B2 (ja) | 1998-03-20 | 2005-06-08 | Matsushita Electric Industrial Co., Ltd. | Center device and terminal device for broadcasting program information |
US6216215B1 (en) | 1998-04-02 | 2001-04-10 | Intel Corporation | Method and apparatus for senior loads |
US6157998A (en) | 1998-04-03 | 2000-12-05 | Motorola Inc. | Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers |
US6205545B1 (en) | 1998-04-30 | 2001-03-20 | Hewlett-Packard Company | Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance |
US6115809A (en) | 1998-04-30 | 2000-09-05 | Hewlett-Packard Company | Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction |
US6256727B1 (en) | 1998-05-12 | 2001-07-03 | International Business Machines Corporation | Method and system for fetching noncontiguous instructions in a single clock cycle |
JPH11338710A (ja) | 1998-05-28 | 1999-12-10 | Toshiba Corp | Compiling method and apparatus for a processor having multiple instruction sets, and recording medium on which the method is programmed and recorded |
US6272616B1 (en) | 1998-06-17 | 2001-08-07 | Agere Systems Guardian Corp. | Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths |
US6988183B1 (en) | 1998-06-26 | 2006-01-17 | Derek Chi-Lan Wong | Methods for increasing instruction-level parallelism in microprocessors and digital system |
US6260138B1 (en) | 1998-07-17 | 2001-07-10 | Sun Microsystems, Inc. | Method and apparatus for branch instruction processing in a processor |
US6122656A (en) | 1998-07-31 | 2000-09-19 | Advanced Micro Devices, Inc. | Processor configured to map logical register numbers to physical register numbers using virtual register numbers |
US6272662B1 (en) | 1998-08-04 | 2001-08-07 | International Business Machines Corporation | Distributed storage system using front-end and back-end locking |
JP2000057054A (ja) | 1998-08-12 | 2000-02-25 | Fujitsu Ltd | High-speed address translation system |
US8631066B2 (en) | 1998-09-10 | 2014-01-14 | Vmware, Inc. | Mechanism for providing virtual machines for use by multiple users |
US6339822B1 (en) * | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6332189B1 (en) | 1998-10-16 | 2001-12-18 | Intel Corporation | Branch prediction architecture |
GB9825102D0 (en) | 1998-11-16 | 1999-01-13 | Insignia Solutions Plc | Computer system |
JP3110404B2 (ja) | 1998-11-18 | 2000-11-20 | NEC Kofu, Ltd. | Microprocessor device, method for accelerating its software instructions, and recording medium recording its control program |
US6490673B1 (en) | 1998-11-27 | 2002-12-03 | Matsushita Electric Industrial Co., Ltd | Processor, compiling apparatus, and compile program recorded on a recording medium |
US6519682B2 (en) | 1998-12-04 | 2003-02-11 | Stmicroelectronics, Inc. | Pipelined non-blocking level two cache system with inherent transaction collision-avoidance |
US6477562B2 (en) * | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US7020879B1 (en) | 1998-12-16 | 2006-03-28 | Mips Technologies, Inc. | Interrupt and exception handling for multi-streaming digital processors |
US6247097B1 (en) | 1999-01-22 | 2001-06-12 | International Business Machines Corporation | Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions |
US6321298B1 (en) | 1999-01-25 | 2001-11-20 | International Business Machines Corporation | Full cache coherency across multiple raid controllers |
JP3842474B2 (ja) | 1999-02-02 | 2006-11-08 | Renesas Technology Corp. | Data processing device |
US6327650B1 (en) | 1999-02-12 | 2001-12-04 | Vsli Technology, Inc. | Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor |
US6732220B2 (en) | 1999-02-17 | 2004-05-04 | Elbrus International | Method for emulating hardware features of a foreign architecture in a host operating system environment |
US6668316B1 (en) | 1999-02-17 | 2003-12-23 | Elbrus International Limited | Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6437789B1 (en) | 1999-02-19 | 2002-08-20 | Evans & Sutherland Computer Corporation | Multi-level cache controller |
US6850531B1 (en) | 1999-02-23 | 2005-02-01 | Alcatel | Multi-service network switch |
US6212613B1 (en) | 1999-03-22 | 2001-04-03 | Cisco Technology, Inc. | Methods and apparatus for reusing addresses in a computer |
US6529928B1 (en) | 1999-03-23 | 2003-03-04 | Silicon Graphics, Inc. | Floating-point adder performing floating-point and integer operations |
EP1050808B1 (en) | 1999-05-03 | 2008-04-30 | STMicroelectronics S.A. | Computer instruction scheduling |
US6449671B1 (en) | 1999-06-09 | 2002-09-10 | Ati International Srl | Method and apparatus for busing data elements |
US6473833B1 (en) | 1999-07-30 | 2002-10-29 | International Business Machines Corporation | Integrated cache and directory structure for multi-level caches |
US6643770B1 (en) | 1999-09-16 | 2003-11-04 | Intel Corporation | Branch misprediction recovery using a side memory |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6704822B1 (en) | 1999-10-01 | 2004-03-09 | Sun Microsystems, Inc. | Arbitration protocol for a shared data cache |
US6457120B1 (en) | 1999-11-01 | 2002-09-24 | International Business Machines Corporation | Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions |
US7441110B1 (en) | 1999-12-10 | 2008-10-21 | International Business Machines Corporation | Prefetching using future branch path information derived from branch prediction |
US7107434B2 (en) | 1999-12-20 | 2006-09-12 | Board Of Regents, The University Of Texas | System, method and apparatus for allocating hardware resources using pseudorandom sequences |
JP4693326B2 (ja) | 1999-12-22 | 2011-06-01 | Ubicom, Inc. | System and method for instruction-level multithreading using zero-time context switching in an embedded processor |
US6557095B1 (en) * | 1999-12-27 | 2003-04-29 | Intel Corporation | Scheduling operations using a dependency matrix |
DE60005860T2 (de) | 2000-01-03 | 2004-08-05 | Advanced Micro Devices, Inc., Sunnyvale | Scheduler for issuing and reissuing chains of dependent instructions |
US6542984B1 (en) | 2000-01-03 | 2003-04-01 | Advanced Micro Devices, Inc. | Scheduler capable of issuing and reissuing dependency chains |
US6594755B1 (en) | 2000-01-04 | 2003-07-15 | National Semiconductor Corporation | System and method for interleaved execution of multiple independent threads |
US6728872B1 (en) | 2000-02-04 | 2004-04-27 | International Business Machines Corporation | Method and apparatus for verifying that instructions are pipelined in correct architectural sequence |
GB0002848D0 (en) | 2000-02-08 | 2000-03-29 | Siroyan Limited | Communicating instruction results in processors and compiling methods for processors |
GB2365661A (en) | 2000-03-10 | 2002-02-20 | British Telecomm | Allocating switch requests within a packet switch |
US6615340B1 (en) | 2000-03-22 | 2003-09-02 | Wilmot, Ii Richard Byron | Extended operand management indicator structure and method |
US6604187B1 (en) | 2000-06-19 | 2003-08-05 | Advanced Micro Devices, Inc. | Providing global translations with address space numbers |
US6557083B1 (en) | 2000-06-30 | 2003-04-29 | Intel Corporation | Memory system for multiple data types |
US6704860B1 (en) | 2000-07-26 | 2004-03-09 | International Business Machines Corporation | Data processing system and method for fetching instruction blocks in response to a detected block sequence |
US7206925B1 (en) | 2000-08-18 | 2007-04-17 | Sun Microsystems, Inc. | Backing Register File for processors |
US6728866B1 (en) | 2000-08-31 | 2004-04-27 | International Business Machines Corporation | Partitioned issue queue and allocation strategy |
US6721874B1 (en) | 2000-10-12 | 2004-04-13 | International Business Machines Corporation | Method and system for dynamically shared completion table supporting multiple threads in a processing system |
US7757065B1 (en) | 2000-11-09 | 2010-07-13 | Intel Corporation | Instruction segment recording scheme |
JP2002185513A (ja) | 2000-12-18 | 2002-06-28 | Hitachi Ltd | パケット通信ネットワークおよびパケット転送制御方法 |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US6907600B2 (en) | 2000-12-27 | 2005-06-14 | Intel Corporation | Virtual translation lookaside buffer |
US6647466B2 (en) | 2001-01-25 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy |
FR2820921A1 (fr) | 2001-02-14 | 2002-08-16 | Canon Kk | Device and method for transmission in a switch |
US6985951B2 (en) | 2001-03-08 | 2006-01-10 | International Business Machines Corporation | Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment |
US6950927B1 (en) | 2001-04-13 | 2005-09-27 | The United States Of America As Represented By The Secretary Of The Navy | System and method for instruction-level parallelism in a programmable multiple network processor environment |
US7707397B2 (en) | 2001-05-04 | 2010-04-27 | Via Technologies, Inc. | Variable group associativity branch target address cache delivering multiple target addresses per cache line |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US6658549B2 (en) | 2001-05-22 | 2003-12-02 | Hewlett-Packard Development Company, Lp. | Method and system allowing a single entity to manage memory comprising compressed and uncompressed data |
US6985591B2 (en) | 2001-06-29 | 2006-01-10 | Intel Corporation | Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media |
US7203824B2 (en) | 2001-07-03 | 2007-04-10 | Ip-First, Llc | Apparatus and method for handling BTAC branches that wrap across instruction cache lines |
US7024545B1 (en) | 2001-07-24 | 2006-04-04 | Advanced Micro Devices, Inc. | Hybrid branch prediction device with two levels of branch prediction cache |
US6954846B2 (en) | 2001-08-07 | 2005-10-11 | Sun Microsystems, Inc. | Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode |
US6718440B2 (en) | 2001-09-28 | 2004-04-06 | Intel Corporation | Memory access latency hiding with hint buffer |
US7150021B1 (en) | 2001-10-12 | 2006-12-12 | Palau Acquisition Corporation (Delaware) | Method and system to allocate resources within an interconnect device according to a resource allocation table |
US7117347B2 (en) | 2001-10-23 | 2006-10-03 | Ip-First, Llc | Processor including fallback branch prediction mechanism for far jump and far call instructions |
US7272832B2 (en) | 2001-10-25 | 2007-09-18 | Hewlett-Packard Development Company, L.P. | Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform |
US6964043B2 (en) | 2001-10-30 | 2005-11-08 | Intel Corporation | Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code |
GB2381886B (en) | 2001-11-07 | 2004-06-23 | Sun Microsystems Inc | Computer system with virtual memory and paging mechanism |
US7092869B2 (en) | 2001-11-14 | 2006-08-15 | Ronald Hilton | Memory address prediction under emulation |
US7363467B2 (en) | 2002-01-03 | 2008-04-22 | Intel Corporation | Dependence-chain processing using trace descriptors having dependency descriptors |
US6640333B2 (en) | 2002-01-10 | 2003-10-28 | Lsi Logic Corporation | Architecture for a sea of platforms |
US7055021B2 (en) | 2002-02-05 | 2006-05-30 | Sun Microsystems, Inc. | Out-of-order processor that reduces mis-speculation using a replay scoreboard |
US7331040B2 (en) | 2002-02-06 | 2008-02-12 | Transitive Limited | Condition code flag emulation for program code conversion |
US6839816B2 (en) | 2002-02-26 | 2005-01-04 | International Business Machines Corporation | Shared cache line update mechanism |
US6731292B2 (en) | 2002-03-06 | 2004-05-04 | Sun Microsystems, Inc. | System and method for controlling a number of outstanding data transactions within an integrated circuit |
JP3719509B2 (ja) | 2002-04-01 | 2005-11-24 | Sony Computer Entertainment Inc. | Serial arithmetic pipeline, arithmetic device, arithmetic logic circuit, and arithmetic method using a serial arithmetic pipeline |
US7565509B2 (en) | 2002-04-17 | 2009-07-21 | Microsoft Corporation | Using limits on address translation to control access to an addressable entity |
US6920530B2 (en) | 2002-04-23 | 2005-07-19 | Sun Microsystems, Inc. | Scheme for reordering instructions via an instruction caching mechanism |
US7113488B2 (en) | 2002-04-24 | 2006-09-26 | International Business Machines Corporation | Reconfigurable circular bus |
US7281055B2 (en) | 2002-05-28 | 2007-10-09 | Newisys, Inc. | Routing mechanisms in systems having multiple multi-processor clusters |
US7117346B2 (en) | 2002-05-31 | 2006-10-03 | Freescale Semiconductor, Inc. | Data processing system having multiple register contexts and method therefor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US8024735B2 (en) * | 2002-06-14 | 2011-09-20 | Intel Corporation | Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution |
JP3845043B2 (ja) | 2002-06-28 | 2006-11-15 | Fujitsu Limited | Instruction fetch control device |
JP3982353B2 (ja) | 2002-07-12 | 2007-09-26 | NEC Corporation | Fault-tolerant computer device, resynchronization method therefor, and resynchronization program |
US6944744B2 (en) | 2002-08-27 | 2005-09-13 | Advanced Micro Devices, Inc. | Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor |
US6950925B1 (en) | 2002-08-28 | 2005-09-27 | Advanced Micro Devices, Inc. | Scheduler for use in a microprocessor that supports data-speculative execution |
US7546422B2 (en) | 2002-08-28 | 2009-06-09 | Intel Corporation | Method and apparatus for the synchronization of distributed caches |
TW200408242A (en) | 2002-09-06 | 2004-05-16 | Matsushita Electric Ind Co Ltd | Home terminal apparatus and communication system |
US6895491B2 (en) | 2002-09-26 | 2005-05-17 | Hewlett-Packard Development Company, L.P. | Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching |
US7334086B2 (en) | 2002-10-08 | 2008-02-19 | Rmi Corporation | Advanced processor with system on a chip interconnect technology |
US7213248B2 (en) | 2002-10-10 | 2007-05-01 | International Business Machines Corporation | High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system |
US6829698B2 (en) | 2002-10-10 | 2004-12-07 | International Business Machines Corporation | Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction |
US7222218B2 (en) * | 2002-10-22 | 2007-05-22 | Sun Microsystems, Inc. | System and method for goal-based scheduling of blocks of code for concurrent execution |
US20040103251A1 (en) | 2002-11-26 | 2004-05-27 | Mitchell Alsup | Microprocessor including a first level cache and a second level cache having different cache line sizes |
US7539879B2 (en) | 2002-12-04 | 2009-05-26 | Nxp B.V. | Register file gating to reduce microprocessor power dissipation |
US6981083B2 (en) | 2002-12-05 | 2005-12-27 | International Business Machines Corporation | Processor virtualization mechanism via an enhanced restoration of hard architected states |
US7073042B2 (en) | 2002-12-12 | 2006-07-04 | Intel Corporation | Reclaiming existing fields in address translation data structures to extend control over memory accesses |
US20040117594A1 (en) | 2002-12-13 | 2004-06-17 | Vanderspek Julius | Memory management method |
US20040122887A1 (en) | 2002-12-20 | 2004-06-24 | Macy William W. | Efficient multiplication of small matrices using SIMD registers |
US7191349B2 (en) | 2002-12-26 | 2007-03-13 | Intel Corporation | Mechanism for processor power state aware distribution of lowest priority interrupt |
US6925421B2 (en) | 2003-01-09 | 2005-08-02 | International Business Machines Corporation | Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources |
US20040139441A1 (en) | 2003-01-09 | 2004-07-15 | Kabushiki Kaisha Toshiba | Processor, arithmetic operation processing method, and priority determination method |
US7178010B2 (en) | 2003-01-16 | 2007-02-13 | Ip-First, Llc | Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack |
US7089374B2 (en) | 2003-02-13 | 2006-08-08 | Sun Microsystems, Inc. | Selectively unmarking load-marked cache lines during transactional program execution |
US7278030B1 (en) | 2003-03-03 | 2007-10-02 | Vmware, Inc. | Virtualization system for computers having multiple protection mechanisms |
US6912644B1 (en) | 2003-03-06 | 2005-06-28 | Intel Corporation | Method and apparatus to steer memory access operations in a virtual memory system |
US7111145B1 (en) | 2003-03-25 | 2006-09-19 | Vmware, Inc. | TLB miss fault handler and method for accessing multiple page tables |
US7143273B2 (en) | 2003-03-31 | 2006-11-28 | Intel Corporation | Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history |
CN1214666C (zh) | 2003-04-07 | 2005-08-10 | Huawei Technologies Co., Ltd. | Method for limiting location information request traffic in location services |
US7058764B2 (en) | 2003-04-14 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | Method of adaptive cache partitioning to increase host I/O performance |
US7139855B2 (en) | 2003-04-24 | 2006-11-21 | International Business Machines Corporation | High performance synchronization of resource allocation in a logically-partitioned system |
EP1471421A1 (en) | 2003-04-24 | 2004-10-27 | STMicroelectronics Limited | Speculative load instruction control |
US7290261B2 (en) | 2003-04-24 | 2007-10-30 | International Business Machines Corporation | Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor |
US7469407B2 (en) * | 2003-04-24 | 2008-12-23 | International Business Machines Corporation | Method for resource balancing using dispatch flush in a simultaneous multithread processor |
US7055003B2 (en) | 2003-04-25 | 2006-05-30 | International Business Machines Corporation | Data cache scrub mechanism for large L2/L3 data cache structures |
US7007108B2 (en) | 2003-04-30 | 2006-02-28 | Lsi Logic Corporation | System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address |
CN1809810B (zh) | 2003-06-25 | 2010-06-09 | Koninklijke Philips Electronics N.V. | Instruction-controlled data processing device |
JP2005032018A (ja) | 2003-07-04 | 2005-02-03 | Semiconductor Energy Lab Co Ltd | Microprocessor using a genetic algorithm |
US7149872B2 (en) | 2003-07-10 | 2006-12-12 | Transmeta Corporation | System and method for identifying TLB entries associated with a physical address of a specified range |
US7089398B2 (en) | 2003-07-31 | 2006-08-08 | Silicon Graphics, Inc. | Address translation using a page size tag |
US8296771B2 (en) | 2003-08-18 | 2012-10-23 | Cray Inc. | System and method for mapping between resource consumers and resource providers in a computing system |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
US7849297B2 (en) | 2003-08-28 | 2010-12-07 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US9032404B2 (en) * | 2003-08-28 | 2015-05-12 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7594089B2 (en) | 2003-08-28 | 2009-09-22 | Mips Technologies, Inc. | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
US7424599B2 (en) * | 2003-08-28 | 2008-09-09 | Mips Technologies, Inc. | Apparatus, method, and instruction for software management of multiple computational contexts in a multithreaded microprocessor |
US7111126B2 (en) | 2003-09-24 | 2006-09-19 | Arm Limited | Apparatus and method for loading data values |
JP4057989B2 (ja) | 2003-09-26 | 2008-03-05 | Toshiba Corporation | Scheduling method and information processing system |
FR2860313B1 (fr) | 2003-09-30 | 2005-11-04 | Commissariat Energie Atomique | Component with a dynamically reconfigurable architecture |
US7373637B2 (en) | 2003-09-30 | 2008-05-13 | International Business Machines Corporation | Method and apparatus for counting instruction and memory location ranges |
US7047322B1 (en) | 2003-09-30 | 2006-05-16 | Unisys Corporation | System and method for performing conflict resolution and flow control in a multiprocessor system |
TWI281121B (en) | 2003-10-06 | 2007-05-11 | Ip First Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US7395372B2 (en) | 2003-11-14 | 2008-07-01 | International Business Machines Corporation | Method and system for providing cache set selection which is power optimized |
US7243170B2 (en) | 2003-11-24 | 2007-07-10 | International Business Machines Corporation | Method and circuit for reading and writing an instruction buffer |
US20050120191A1 (en) | 2003-12-02 | 2005-06-02 | Intel Corporation (A Delaware Corporation) | Checkpoint-based register reclamation |
US20050132145A1 (en) | 2003-12-15 | 2005-06-16 | Finisar Corporation | Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access |
US7310722B2 (en) | 2003-12-18 | 2007-12-18 | Nvidia Corporation | Across-thread out of order instruction dispatch in a multithreaded graphics processor |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US20050204118A1 (en) | 2004-02-27 | 2005-09-15 | National Chiao Tung University | Method for inter-cluster communication that employs register permutation |
US20050216920A1 (en) | 2004-03-24 | 2005-09-29 | Vijay Tewari | Use of a virtual machine to emulate a hardware device |
KR100877138B1 (ko) | 2004-03-29 | 2009-01-09 | Kyoto University | Data processing device, data processing program, and recording medium storing the data processing program |
US7383427B2 (en) | 2004-04-22 | 2008-06-03 | Sony Computer Entertainment Inc. | Multi-scalar extension for SIMD instruction set processors |
US20050251649A1 (en) | 2004-04-23 | 2005-11-10 | Sony Computer Entertainment Inc. | Methods and apparatus for address map optimization on a multi-scalar extension |
US7418582B1 (en) | 2004-05-13 | 2008-08-26 | Sun Microsystems, Inc. | Versatile register file design for a multi-threaded processor utilizing different modes and register windows |
US7478198B2 (en) | 2004-05-24 | 2009-01-13 | Intel Corporation | Multithreaded clustered microarchitecture with dynamic back-end assignment |
US7594234B1 (en) * | 2004-06-04 | 2009-09-22 | Sun Microsystems, Inc. | Adaptive spin-then-block mutual exclusion in multi-threaded processing |
US7284092B2 (en) | 2004-06-24 | 2007-10-16 | International Business Machines Corporation | Digital data processing apparatus having multi-level register file |
US20050289530A1 (en) | 2004-06-29 | 2005-12-29 | Robison Arch D | Scheduling of instructions in program compilation |
EP1628235A1 (en) | 2004-07-01 | 2006-02-22 | Texas Instruments Incorporated | Method and system of ensuring integrity of a secure mode entry sequence |
US8044951B1 (en) | 2004-07-02 | 2011-10-25 | Nvidia Corporation | Integer-based functionality in a graphics shading language |
US7339592B2 (en) | 2004-07-13 | 2008-03-04 | Nvidia Corporation | Simulating multiported memories using lower port count memories |
US7398347B1 (en) | 2004-07-14 | 2008-07-08 | Altera Corporation | Methods and apparatus for dynamic instruction controlled reconfigurable register file |
EP1619593A1 (en) | 2004-07-22 | 2006-01-25 | Sap Ag | Computer-Implemented method and system for performing a product availability check |
JP4064380B2 (ja) | 2004-07-29 | 2008-03-19 | Fujitsu Limited | Arithmetic processing device and control method therefor |
US8443171B2 (en) | 2004-07-30 | 2013-05-14 | Hewlett-Packard Development Company, L.P. | Run-time updating of prediction hint instructions |
US7213106B1 (en) | 2004-08-09 | 2007-05-01 | Sun Microsystems, Inc. | Conservative shadow cache support in a point-to-point connected multiprocessing node |
US7318143B2 (en) | 2004-10-20 | 2008-01-08 | Arm Limited | Reuseable configuration data |
US20090150890A1 (en) | 2007-12-10 | 2009-06-11 | Yourst Matt T | Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system |
US7707578B1 (en) * | 2004-12-16 | 2010-04-27 | Vmware, Inc. | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US7257695B2 (en) | 2004-12-28 | 2007-08-14 | Intel Corporation | Register file regions for a processing system |
US7996644B2 (en) | 2004-12-29 | 2011-08-09 | Intel Corporation | Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache |
US8719819B2 (en) | 2005-06-30 | 2014-05-06 | Intel Corporation | Mechanism for instruction set based thread execution on a plurality of instruction sequencers |
US7050922B1 (en) | 2005-01-14 | 2006-05-23 | Agilent Technologies, Inc. | Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same |
US7681014B2 (en) * | 2005-02-04 | 2010-03-16 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
US7657891B2 (en) | 2005-02-04 | 2010-02-02 | Mips Technologies, Inc. | Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency |
EP1849095B1 (en) | 2005-02-07 | 2013-01-02 | Richter, Thomas | Low latency massive parallel data processing device |
US7400548B2 (en) | 2005-02-09 | 2008-07-15 | International Business Machines Corporation | Method for providing multiple reads/writes using a 2read/2write register file array |
US7343476B2 (en) | 2005-02-10 | 2008-03-11 | International Business Machines Corporation | Intelligent SMT thread hang detect taking into account shared resource contention/blocking |
US7152155B2 (en) | 2005-02-18 | 2006-12-19 | Qualcomm Incorporated | System and method of correcting a branch misprediction |
US20060200655A1 (en) | 2005-03-04 | 2006-09-07 | Smith Rodney W | Forward looking branch target address caching |
US8195922B2 (en) * | 2005-03-18 | 2012-06-05 | Marvell World Trade, Ltd. | System for dynamically allocating processing time to multiple threads |
US20060212853A1 (en) * | 2005-03-18 | 2006-09-21 | Marvell World Trade Ltd. | Real-time control apparatus having a multi-thread processor |
GB2424727B (en) | 2005-03-30 | 2007-08-01 | Transitive Ltd | Preparing instruction groups for a processor having a multiple issue ports |
US8522253B1 (en) | 2005-03-31 | 2013-08-27 | Guillermo Rozas | Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches |
US7313775B2 (en) | 2005-04-06 | 2007-12-25 | Lsi Corporation | Integrated circuit with relocatable processor hardmac |
US20060230243A1 (en) | 2005-04-06 | 2006-10-12 | Robert Cochran | Cascaded snapshots |
US8230423B2 (en) * | 2005-04-07 | 2012-07-24 | International Business Machines Corporation | Multithreaded processor architecture with operational latency hiding |
US20060230409A1 (en) * | 2005-04-07 | 2006-10-12 | Matteo Frigo | Multithreaded processor architecture with implicit granularity adaptation |
US20060230253A1 (en) | 2005-04-11 | 2006-10-12 | Lucian Codrescu | Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment |
US20060236074A1 (en) | 2005-04-14 | 2006-10-19 | Arm Limited | Indicating storage locations within caches |
US7437543B2 (en) | 2005-04-19 | 2008-10-14 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US7461237B2 (en) | 2005-04-20 | 2008-12-02 | Sun Microsystems, Inc. | Method and apparatus for suppressing duplicative prefetches for branch target cache lines |
US8713286B2 (en) | 2005-04-26 | 2014-04-29 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
GB2426084A (en) | 2005-05-13 | 2006-11-15 | Agilent Technologies Inc | Updating data in a dual port memory |
US7861055B2 (en) | 2005-06-07 | 2010-12-28 | Broadcom Corporation | Method and system for on-chip configurable data ram for fast memory and pseudo associative caches |
US8010969B2 (en) * | 2005-06-13 | 2011-08-30 | Intel Corporation | Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers |
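KR101355496B1 (ko) | 2005-08-29 | 2014-01-28 | The Invention Science Fund I, LLC | Scheduling mechanism for a hierarchical processor including multiple parallel clusters |
CN101263465B (zh) | 2005-09-14 | 2011-11-09 | Koninklijke Philips Electronics N.V. | Method and system for bus arbitration |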
US7350056B2 (en) | 2005-09-27 | 2008-03-25 | International Business Machines Corporation | Method and apparatus for issuing instructions from an issue queue in an information handling system |
US7676634B1 (en) | 2005-09-28 | 2010-03-09 | Sun Microsystems, Inc. | Selective trace cache invalidation for self-modifying code via memory aging |
US7231106B2 (en) | 2005-09-30 | 2007-06-12 | Lucent Technologies Inc. | Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host |
US7613131B2 (en) | 2005-11-10 | 2009-11-03 | Citrix Systems, Inc. | Overlay network infrastructure |
US7681019B1 (en) | 2005-11-18 | 2010-03-16 | Sun Microsystems, Inc. | Executing functions determined via a collection of operations from translated instructions |
US7861060B1 (en) | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US7634637B1 (en) | 2005-12-16 | 2009-12-15 | Nvidia Corporation | Execution of parallel groups of threads with per-instruction serialization |
US7770161B2 (en) | 2005-12-28 | 2010-08-03 | International Business Machines Corporation | Post-register allocation profile directed instruction scheduling |
US8423682B2 (en) | 2005-12-30 | 2013-04-16 | Intel Corporation | Address space emulation |
GB2435362B (en) * | 2006-02-20 | 2008-11-26 | Cramer Systems Ltd | Method of configuring devices in a telecommunications network |
JP4332205B2 (ja) | 2006-02-27 | 2009-09-16 | Fujitsu Limited | Cache control device and cache control method |
US7543282B2 (en) | 2006-03-24 | 2009-06-02 | Sun Microsystems, Inc. | Method and apparatus for selectively executing different executable code versions which are optimized in different ways |
CN103646009B (zh) | 2006-04-12 | 2016-08-17 | Soft Machines, Inc. | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US7610571B2 (en) | 2006-04-14 | 2009-10-27 | Cadence Design Systems, Inc. | Method and system for simulating state retention of an RTL design |
US7577820B1 (en) | 2006-04-14 | 2009-08-18 | Tilera Corporation | Managing data in a parallel processing environment |
CN100485636C (zh) | 2006-04-24 | 2009-05-06 | Huawei Technologies Co., Ltd. | Debugging method and device for model-driven carrier-grade service development |
US7804076B2 (en) | 2006-05-10 | 2010-09-28 | Taiwan Semiconductor Manufacturing Co., Ltd | Insulator for high current ion implanters |
US8145882B1 (en) | 2006-05-25 | 2012-03-27 | Mips Technologies, Inc. | Apparatus and method for processing template based user defined instructions |
US20080126771A1 (en) | 2006-07-25 | 2008-05-29 | Lei Chen | Branch Target Extension for an Instruction Cache |
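CN100495324C (zh) | 2006-07-27 | 2009-06-03 | Institute of Computing Technology, Chinese Academy of Sciences | Depth-first exception handling method in a complex instruction set architecture |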
US8046775B2 (en) * | 2006-08-14 | 2011-10-25 | Marvell World Trade Ltd. | Event-based bandwidth allocation mode switching method and apparatus |
US7904704B2 (en) | 2006-08-14 | 2011-03-08 | Marvell World Trade Ltd. | Instruction dispatching method and apparatus |
US7539842B2 (en) | 2006-08-15 | 2009-05-26 | International Business Machines Corporation | Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables |
US7594060B2 (en) | 2006-08-23 | 2009-09-22 | Sun Microsystems, Inc. | Data buffer allocation in a non-blocking data services platform using input/output switching fabric |
US7752474B2 (en) | 2006-09-22 | 2010-07-06 | Apple Inc. | L1 cache flush when processor is entering low power mode |
US7716460B2 (en) | 2006-09-29 | 2010-05-11 | Qualcomm Incorporated | Effective use of a BHT in processor having variable length instruction set execution modes |
US7774549B2 (en) | 2006-10-11 | 2010-08-10 | Mips Technologies, Inc. | Horizontally-shared cache victims in multiple core processors |
TWI337495B (en) | 2006-10-26 | 2011-02-11 | Au Optronics Corp | System and method for operation scheduling |
US7680988B1 (en) | 2006-10-30 | 2010-03-16 | Nvidia Corporation | Single interconnect providing read and write access to a memory shared by concurrent threads |
US7617384B1 (en) | 2006-11-06 | 2009-11-10 | Nvidia Corporation | Structured programming control flow using a disable mask in a SIMD architecture |
CN101627365B (zh) | 2006-11-14 | 2017-03-29 | Soft Machines, Inc. | Multithreading architecture |
US7493475B2 (en) | 2006-11-15 | 2009-02-17 | Stmicroelectronics, Inc. | Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address |
US7934179B2 (en) * | 2006-11-20 | 2011-04-26 | Et International, Inc. | Systems and methods for logic verification |
US20080235500A1 (en) | 2006-11-21 | 2008-09-25 | Davis Gordon T | Structure for instruction cache trace formation |
JP2008130056A (ja) | 2006-11-27 | 2008-06-05 | Renesas Technology Corp | Semiconductor circuit |
WO2008077088A2 (en) | 2006-12-19 | 2008-06-26 | The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | System and method for branch misprediction prediction using complementary branch predictors |
US7783869B2 (en) | 2006-12-19 | 2010-08-24 | Arm Limited | Accessing branch predictions ahead of instruction fetching |
EP1940028B1 (en) | 2006-12-29 | 2012-02-29 | STMicroelectronics Srl | Asynchronous interconnection system for 3D inter-chip communication |
US8321849B2 (en) | 2007-01-26 | 2012-11-27 | Nvidia Corporation | Virtual architecture and instruction set for parallel thread computing |
TW200833002A (en) * | 2007-01-31 | 2008-08-01 | Univ Nat Yunlin Sci & Tech | Distributed switching circuit having fairness |
US20080189501A1 (en) | 2007-02-05 | 2008-08-07 | Irish John D | Methods and Apparatus for Issuing Commands on a Bus |
US7685410B2 (en) | 2007-02-13 | 2010-03-23 | Global Foundries Inc. | Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects |
US7647483B2 (en) | 2007-02-20 | 2010-01-12 | Sony Computer Entertainment Inc. | Multi-threaded parallel processor methods and apparatus |
JP4980751B2 (ja) | 2007-03-02 | 2012-07-18 | Fujitsu Semiconductor Limited | Data processing device and memory read-active control method |
US8452907B2 (en) | 2007-03-27 | 2013-05-28 | Arm Limited | Data processing apparatus and method for arbitrating access to a shared resource |
US20080250227A1 (en) | 2007-04-04 | 2008-10-09 | Linderman Michael D | General Purpose Multiprocessor Programming Apparatus And Method |
US7716183B2 (en) | 2007-04-11 | 2010-05-11 | Dot Hill Systems Corporation | Snapshot preserved data cloning |
US7941791B2 (en) | 2007-04-13 | 2011-05-10 | Perry Wang | Programming environment for heterogeneous processor resource integration |
US7769955B2 (en) | 2007-04-27 | 2010-08-03 | Arm Limited | Multiple thread instruction fetch from different cache levels |
US7711935B2 (en) | 2007-04-30 | 2010-05-04 | Netlogic Microsystems, Inc. | Universal branch identifier for invalidation of speculative instructions |
US8555039B2 (en) | 2007-05-03 | 2013-10-08 | Qualcomm Incorporated | System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor |
US8219996B1 (en) * | 2007-05-09 | 2012-07-10 | Hewlett-Packard Development Company, L.P. | Computer processor with fairness monitor |
US9292436B2 (en) * | 2007-06-25 | 2016-03-22 | Sonics, Inc. | Various methods and apparatus to support transactions whose data address sequence within that transaction crosses an interleaved channel address boundary |
CN101344840B (zh) | 2007-07-10 | 2011-08-31 | 苏州简约纳电子有限公司 | Microprocessor and method for executing instructions in a microprocessor |
US7937568B2 (en) | 2007-07-11 | 2011-05-03 | International Business Machines Corporation | Adaptive execution cycle control method for enhanced instruction throughput |
US20090025004A1 (en) | 2007-07-16 | 2009-01-22 | Microsoft Corporation | Scheduling by Growing and Shrinking Resource Allocation |
US8108545B2 (en) | 2007-08-27 | 2012-01-31 | International Business Machines Corporation | Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture |
US7711929B2 (en) | 2007-08-30 | 2010-05-04 | International Business Machines Corporation | Method and system for tracking instruction dependency in an out-of-order processor |
US8725991B2 (en) | 2007-09-12 | 2014-05-13 | Qualcomm Incorporated | Register file system and method for pipelined processing |
US8082420B2 (en) | 2007-10-24 | 2011-12-20 | International Business Machines Corporation | Method and apparatus for executing instructions |
US7856530B1 (en) | 2007-10-31 | 2010-12-21 | Network Appliance, Inc. | System and method for implementing a dynamic cache for a data storage system |
US7877559B2 (en) | 2007-11-26 | 2011-01-25 | Globalfoundries Inc. | Mechanism to accelerate removal of store operations from a queue |
US8245232B2 (en) * | 2007-11-27 | 2012-08-14 | Microsoft Corporation | Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems |
US7809925B2 (en) | 2007-12-07 | 2010-10-05 | International Business Machines Corporation | Processing unit incorporating vectorizable execution unit |
US8145844B2 (en) | 2007-12-13 | 2012-03-27 | Arm Limited | Memory controller with write data cache and read data cache |
US7831813B2 (en) | 2007-12-17 | 2010-11-09 | Globalfoundries Inc. | Uses of known good code for implementing processor architectural modifications |
US7870371B2 (en) | 2007-12-17 | 2011-01-11 | Microsoft Corporation | Target-frequency based indirect jump prediction for high-performance processors |
US20090165007A1 (en) * | 2007-12-19 | 2009-06-25 | Microsoft Corporation | Task-level thread scheduling and resource allocation |
US8782384B2 (en) | 2007-12-20 | 2014-07-15 | Advanced Micro Devices, Inc. | Branch history with polymorphic indirect branch information |
US7917699B2 (en) | 2007-12-21 | 2011-03-29 | Mips Technologies, Inc. | Apparatus and method for controlling the exclusivity mode of a level-two cache |
US9244855B2 (en) | 2007-12-31 | 2016-01-26 | Intel Corporation | Method, system, and apparatus for page sizing extension |
US8645965B2 (en) | 2007-12-31 | 2014-02-04 | Intel Corporation | Supporting metered clients with manycore through time-limited partitioning |
US7877582B2 (en) | 2008-01-31 | 2011-01-25 | International Business Machines Corporation | Multi-addressable register file |
WO2009101563A1 (en) | 2008-02-11 | 2009-08-20 | Nxp B.V. | Multiprocessing implementing a plurality of virtual processors |
US7949972B2 (en) | 2008-03-19 | 2011-05-24 | International Business Machines Corporation | Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis |
US7987343B2 (en) | 2008-03-19 | 2011-07-26 | International Business Machines Corporation | Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass |
US9513905B2 (en) | 2008-03-28 | 2016-12-06 | Intel Corporation | Vector instructions to enable efficient synchronization and parallel reduction operations |
US8120608B2 (en) | 2008-04-04 | 2012-02-21 | Via Technologies, Inc. | Constant buffering for a computational core of a programmable graphics processing unit |
TWI364703B (en) | 2008-05-26 | 2012-05-21 | Faraday Tech Corp | Processor and early execution method of data load thereof |
US8145880B1 (en) | 2008-07-07 | 2012-03-27 | Ovics | Matrix processor data switch routing systems and methods |
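CN102089752B (zh) | 2008-07-10 | 2014-05-07 | Rocketick Technologies Ltd. | Efficient parallel computation of dependency problems |
JP2010039536A (ja) | 2008-07-31 | 2010-02-18 | Panasonic Corp | Program conversion device, program conversion method, and program conversion program |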
US8316435B1 (en) | 2008-08-14 | 2012-11-20 | Juniper Networks, Inc. | Routing device having integrated MPLS-aware firewall with virtual security system support |
US8135942B2 (en) | 2008-08-28 | 2012-03-13 | International Business Machines Corporation | System and method for double-issue instructions using a dependency matrix and a side issue queue |
US7769984B2 (en) | 2008-09-11 | 2010-08-03 | International Business Machines Corporation | Dual-issuance of microprocessor instructions using dual dependency matrices |
US8225048B2 (en) | 2008-10-01 | 2012-07-17 | Hewlett-Packard Development Company, L.P. | Systems and methods for resource access |
US9244732B2 (en) * | 2009-08-28 | 2016-01-26 | Vmware, Inc. | Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution |
US7941616B2 (en) * | 2008-10-21 | 2011-05-10 | Microsoft Corporation | System to reduce interference in concurrent programs |
US8423749B2 (en) | 2008-10-22 | 2013-04-16 | International Business Machines Corporation | Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node |
GB2464703A (en) | 2008-10-22 | 2010-04-28 | Advanced Risc Mach Ltd | An array of interconnected processors executing a cycle-based program |
BRPI0921720A2 (pt) | 2008-10-30 | 2016-01-05 | Nokia Corp | Method and apparatus for interleaving a data block |
US8032678B2 (en) | 2008-11-05 | 2011-10-04 | Mediatek Inc. | Shared resource arbitration |
US7848129B1 (en) | 2008-11-20 | 2010-12-07 | Netlogic Microsystems, Inc. | Dynamically partitioned CAM array |
US8868838B1 (en) | 2008-11-21 | 2014-10-21 | Nvidia Corporation | Multi-class data cache policies |
US8171223B2 (en) | 2008-12-03 | 2012-05-01 | Intel Corporation | Method and system to increase concurrency and control replication in a multi-core cache hierarchy |
US8200949B1 (en) | 2008-12-09 | 2012-06-12 | Nvidia Corporation | Policy based allocation of register file cache to threads in multi-threaded processor |
US8312268B2 (en) * | 2008-12-12 | 2012-11-13 | International Business Machines Corporation | Virtual machine |
US8099586B2 (en) | 2008-12-30 | 2012-01-17 | Oracle America, Inc. | Branch misprediction recovery mechanism for microprocessors |
US20100169578A1 (en) | 2008-12-31 | 2010-07-01 | Texas Instruments Incorporated | Cache tag memory |
US20100205603A1 (en) | 2009-02-09 | 2010-08-12 | Unisys Corporation | Scheduling and dispatching tasks in an emulated operating system |
JP5417879B2 (ja) | 2009-02-17 | 2014-02-19 | Fujitsu Semiconductor Ltd | Cache device |
US8505013B2 (en) * | 2010-03-12 | 2013-08-06 | Lsi Corporation | Reducing data read latency in a network communications processor architecture |
US8805788B2 (en) | 2009-05-04 | 2014-08-12 | Moka5, Inc. | Transactional virtual disk with differential snapshots |
US8332854B2 (en) * | 2009-05-19 | 2012-12-11 | Microsoft Corporation | Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups |
US8533437B2 (en) | 2009-06-01 | 2013-09-10 | Via Technologies, Inc. | Guaranteed prefetch instruction |
GB2471067B (en) * | 2009-06-12 | 2011-11-30 | Graeme Roy Smith | Shared resource multi-thread array processor |
US9122487B2 (en) | 2009-06-23 | 2015-09-01 | Oracle America, Inc. | System and method for balancing instruction loads between multiple execution units using assignment history |
CN101582025B (zh) | 2009-06-25 | 2011-05-25 | Zhejiang University | Method for implementing a global register renaming table in a chip multiprocessor architecture |
US8397049B2 (en) | 2009-07-13 | 2013-03-12 | Apple Inc. | TLB prefetching |
US8539486B2 (en) * | 2009-07-17 | 2013-09-17 | International Business Machines Corporation | Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode |
JP5423217B2 (ja) | 2009-08-04 | 2014-02-19 | Fujitsu Ltd | Arithmetic processing device, information processing device, and control method for an arithmetic processing device |
US8127078B2 (en) | 2009-10-02 | 2012-02-28 | International Business Machines Corporation | High performance unaligned cache access |
US20110082983A1 (en) | 2009-10-06 | 2011-04-07 | Alcatel-Lucent Canada, Inc. | Cpu instruction and data cache corruption prevention system |
US8695002B2 (en) * | 2009-10-20 | 2014-04-08 | Lantiq Deutschland Gmbh | Multi-threaded processors and multi-processor systems comprising shared resources |
US8364933B2 (en) | 2009-12-18 | 2013-01-29 | International Business Machines Corporation | Software assisted translation lookaside buffer search mechanism |
JP2011150397A (ja) | 2010-01-19 | 2011-08-04 | Panasonic Corp | Bus arbitration device |
KR101699910B1 (ko) | 2010-03-04 | 2017-01-26 | Samsung Electronics Co., Ltd. | Reconfigurable processor and control method thereof |
US20120005462A1 (en) | 2010-07-01 | 2012-01-05 | International Business Machines Corporation | Hardware Assist for Optimizing Code During Processing |
US8312258B2 (en) | 2010-07-22 | 2012-11-13 | Intel Corporation | Providing platform independent memory logic |
US8751745B2 (en) | 2010-08-11 | 2014-06-10 | Advanced Micro Devices, Inc. | Method for concurrent flush of L1 and L2 caches |
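CN101916180B (zh) | 2010-08-11 | 2013-05-29 | Institute of Computing Technology, Chinese Academy of Sciences | Method and system for executing register-type instructions in a RISC processor |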
US9201801B2 (en) | 2010-09-15 | 2015-12-01 | International Business Machines Corporation | Computing device with asynchronous auxiliary execution unit |
US8756329B2 (en) | 2010-09-15 | 2014-06-17 | Oracle International Corporation | System and method for parallel multiplexing between servers in a cluster |
CN103250131B (zh) | 2010-09-17 | 2015-12-16 | Soft Machines, Inc. | Single-cycle multiple branch prediction including a shadow cache for early far branch prediction |
US20120079212A1 (en) | 2010-09-23 | 2012-03-29 | International Business Machines Corporation | Architecture for sharing caches among multiple processes |
TWI541721B (zh) | 2010-10-12 | 2016-07-11 | Soft Machines, Inc. | Method, system, and microprocessor for enhancing branch prediction performance using an instruction sequence buffer |
EP3306466B1 (en) | 2010-10-12 | 2020-05-13 | INTEL Corporation | An instruction sequence buffer to store branches having reliably predictable instruction sequences |
US8370553B2 (en) | 2010-10-18 | 2013-02-05 | International Business Machines Corporation | Formal verification of random priority-based arbiters using property strengthening and underapproximations |
US9047178B2 (en) | 2010-12-13 | 2015-06-02 | SanDisk Technologies, Inc. | Auto-commit memory synchronization |
US8677355B2 (en) | 2010-12-17 | 2014-03-18 | Microsoft Corporation | Virtual machine branching and parallel execution |
WO2012103245A2 (en) | 2011-01-27 | 2012-08-02 | Soft Machines Inc. | Guest instruction block with near branching and far branching sequence construction to native instruction block |
WO2012135050A2 (en) | 2011-03-25 | 2012-10-04 | Soft Machines, Inc. | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
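CN103547993B (zh) | 2011-03-25 | 2018-06-26 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
TWI518504B (zh) | 2011-03-25 | 2016-01-21 | Soft Machines, Inc. | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |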
US20120254592A1 (en) | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location |
US9740494B2 (en) * | 2011-04-29 | 2017-08-22 | Arizona Board Of Regents For And On Behalf Of Arizona State University | Low complexity out-of-order issue logic using static circuits |
US8843690B2 (en) | 2011-07-11 | 2014-09-23 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Memory conflicts learning capability |
US8930432B2 (en) | 2011-08-04 | 2015-01-06 | International Business Machines Corporation | Floating point execution unit with fixed point functionality |
US20130046934A1 (en) | 2011-08-15 | 2013-02-21 | Robert Nychka | System caching using heterogenous memories |
US8839025B2 (en) | 2011-09-30 | 2014-09-16 | Oracle International Corporation | Systems and methods for retiring and unretiring cache lines |
EP2783281B1 (en) | 2011-11-22 | 2020-05-13 | Intel Corporation | A microprocessor accelerated code optimizer |
WO2013077875A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | An accelerated code optimizer for a multiengine microprocessor |
KR101648278B1 (ko) | 2011-11-22 | 2016-08-12 | Soft Machines, Inc. | Microprocessor accelerated code optimizer and dependency reordering method |
US8930674B2 (en) | 2012-03-07 | 2015-01-06 | Soft Machines, Inc. | Systems and methods for accessing a unified translation lookaside buffer |
KR20130119285A (ko) | 2012-04-23 | 2013-10-31 | Electronics and Telecommunications Research Institute | Apparatus and method for resource allocation in a cluster computing environment |
US9684601B2 (en) | 2012-05-10 | 2017-06-20 | Arm Limited | Data processing apparatus having cache and translation lookaside buffer |
US9940247B2 (en) | 2012-06-26 | 2018-04-10 | Advanced Micro Devices, Inc. | Concurrent access to cache dirty bits |
US9740612B2 (en) | 2012-07-30 | 2017-08-22 | Intel Corporation | Systems and methods for maintaining the coherency of a store coalescing cache and a load cache |
US9229873B2 (en) | 2012-07-30 | 2016-01-05 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load and store accesses of a cache |
US9916253B2 (en) | 2012-07-30 | 2018-03-13 | Intel Corporation | Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput |
US9430410B2 (en) | 2012-07-30 | 2016-08-30 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load accesses of a cache in a single cycle |
US9710399B2 (en) | 2012-07-30 | 2017-07-18 | Intel Corporation | Systems and methods for flushing a cache with modified data |
US9678882B2 (en) | 2012-10-11 | 2017-06-13 | Intel Corporation | Systems and methods for non-blocking implementation of cache flush instructions |
US10037228B2 (en) * | 2012-10-25 | 2018-07-31 | Nvidia Corporation | Efficient memory virtualization in multi-threaded processing units |
US9195506B2 (en) * | 2012-12-21 | 2015-11-24 | International Business Machines Corporation | Processor provisioning by a middleware processing system for a plurality of logical processor partitions |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
KR101708591B1 (ko) | 2013-03-15 | 2017-02-20 | Soft Machines, Inc. | Method for executing multithreaded instructions grouped into blocks |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
EP2972836B1 (en) | 2013-03-15 | 2022-11-09 | Intel Corporation | A method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US9632825B2 (en) | 2013-03-15 | 2017-04-25 | Intel Corporation | Method and apparatus for efficient scheduling for asymmetrical execution units |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
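CN105190541A (zh) | 2013-03-15 | 2015-12-23 | Soft Machines, Inc. | Method for executing blocks of instructions using a microprocessor architecture having a register view, a source view, an instruction view, and a plurality of register templates |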
2014
- 2014-03-12 KR KR1020157028745A patent/KR101708591B1/ko active Application Filing
- 2014-03-12 KR KR1020177003623A patent/KR102063656B1/ko active IP Right Grant
- 2014-03-12 EP EP14769450.9A patent/EP2972845B1/en active Active
- 2014-03-12 WO PCT/US2014/024775 patent/WO2014151018A1/en active Application Filing
- 2014-03-12 CN CN201480024528.0A patent/CN105210040B/zh active Active
- 2014-03-14 TW TW103109479A patent/TWI619077B/zh active
- 2014-03-14 US US14/214,280 patent/US9811377B2/en active Active
2017
- 2017-09-21 US US15/712,017 patent/US10146576B2/en active Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
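CN108475196A (zh) * | 2015-12-02 | 2018-08-31 | Swarm64 AS | Data processing |
CN106484519A (zh) * | 2016-10-11 | 2017-03-08 | Southeast University | Asynchronous thread regrouping method and SIMT processor based on the method |
CN106484519B (zh) * | 2016-10-11 | 2019-11-08 | Suzhou Research Institute of Southeast University | Asynchronous thread regrouping method and SIMT processor based on the method |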
Also Published As
Publication number | Publication date |
---|---|
KR102063656B1 (ko) | 2020-01-09 |
EP2972845A1 (en) | 2016-01-20 |
TW201504948A (zh) | 2015-02-01 |
US20180011738A1 (en) | 2018-01-11 |
US20140282592A1 (en) | 2014-09-18 |
WO2014151018A1 (en) | 2014-09-25 |
US9811377B2 (en) | 2017-11-07 |
CN105210040B (zh) | 2019-04-02 |
EP2972845B1 (en) | 2021-07-07 |
EP2972845A4 (en) | 2017-03-22 |
KR20150128968A (ko) | 2015-11-18 |
KR20170018475A (ko) | 2017-02-17 |
US10146576B2 (en) | 2018-12-04 |
TWI619077B (zh) | 2018-03-21 |
KR101708591B1 (ko) | 2017-02-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105210040A (zh) | | Method for executing multithreaded instructions grouped into blocks |
US11656875B2 | | Method and system for instruction block to execution unit grouping |
US10169045B2 | | Method for dependency broadcasting through a source organized source view data structure |
US10255076B2 | | Method for performing dual dispatch of blocks and half blocks |
US10146548B2 | | Method for populating a source view data structure by using register template snapshots |
CN105190541A (zh) | | Method for executing blocks of instructions using a microprocessor architecture having a register view, a source view, an instruction view, and a plurality of register templates |
US10198266B2 | | Method for populating register view data structure by using register template snapshots |
US9891924B2 | | Method for implementing a reduced size register view data structure in a microprocessor |
TWI533221B (zh) | | Method, non-transitory computer-readable medium, and computer system for dependency broadcasting through a block-organized source view data structure |
US20140281427A1 | | Method for implementing a reduced size register view data structure in a microprocessor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 2017-03-29. Applicant after: Intel Corporation (California, USA). Applicant before: Soft Machines Inc. (California, USA). |
| | GR01 | Patent grant | |