CN108376097B - Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines - Google Patents

Publication number
CN108376097B
Authority
CN
China
Prior art keywords
ports
arbiter
entries
cache
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810154116.9A
Other languages
English (en)
Other versions
CN108376097A (zh)
Inventor
M. Abdallah
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intel Corp
Publication of CN108376097A
Application granted
Publication of CN108376097B
Anticipated expiration

Classifications

    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0848 Partitioned cache, e.g. separate instruction and operand caches
    • G06F13/1657 Access to multiple memories
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30043 LOAD or STORE instructions; Clear instruction
    • G06F9/30123 Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
    • G06F9/3834 Maintaining memory consistency
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838 Dependency mechanisms, e.g. register scoreboarding
    • G06F9/3858 Result writeback, i.e. updating the architectural state or memory
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G06F2212/282 Partitioned cache

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

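The present application relates to register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines. It provides a system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, and wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of register file segments are coupled to the partitionable engines to provide data storage.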

Description

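Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
This application is a divisional of the invention patent application with international application number PCT/US2012/030383, international filing date March 23, 2012, and national phase application number 201280024054.0, entitled "Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines."
This application claims the benefit of co-pending, commonly assigned U.S. Provisional Patent Application No. 61/467,939, entitled "REGISTER FILE SEGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES," filed by Mohammad A. Abdallah on March 25, 2011, which is incorporated herein in its entirety.
Cross-Reference to Related Applications
This application is related to co-pending, commonly assigned U.S. Patent Application Publication No. 2009/0113170, entitled "APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL IN DEPENDENT OPERATIONS," filed by Mohammad A. Abdallah on April 12, 2007, which is incorporated herein in its entirety.
This application is related to co-pending, commonly assigned U.S. Patent Application Publication No. 2010/0161948, entitled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES," filed by Mohammad A. Abdallah on November 14, 2007, which is incorporated herein in its entirety.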
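Technical Field
The present invention relates generally to digital computer systems and, more particularly, to a system and method for selecting instructions comprising an instruction sequence.
Background
Processors are required to handle multiple tasks that are either dependent or totally independent. The internal state of such processors usually consists of registers that may hold different values at each particular instant of program execution. At each instant of program execution, the internal state image is called the architecture state of the processor.
When code execution is switched to run another function (e.g., another thread, process or program), the state of the machine/processor has to be saved so that the new function can use the internal registers to build its new state. Once the new function terminates, its state can be discarded, the state of the previous context is restored, and execution resumes. Such a switch process is called a context switch and usually takes tens or hundreds of cycles, especially on modern architectures that employ large numbers of registers (e.g., 64, 128, 256) and/or out-of-order execution.
In thread-aware hardware architectures, the hardware normally supports multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, it still has multiple drawbacks, namely the area, power and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, the context switch must still be performed.
This becomes common as parallelism is needed on a fine-granularity basis requiring a large number of threads. Hardware thread-aware architectures with duplicated context-state hardware storage do not help non-threaded software code and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse-grain parallelism and result in heavy software overhead for initiation and synchronization, leaving fine-grain parallelism, such as function calls and parallel execution of loops, without efficient threading initiation/auto-generation. These overheads are accompanied by the difficulty of automatically parallelizing such code using state-of-the-art compiler or user parallelization techniques for software code that is not explicitly or easily parallelized/threaded.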
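Summary of the Invention
In one embodiment, the present invention is implemented as a system for executing instructions using a plurality of virtual cores for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, and wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of register file segments are coupled to the partitionable engines to provide data storage.
Other embodiments of the present invention utilize a common scheduler, a common register file and a common memory subsystem to implement fragmented address spaces for multiple partitionable engines of a processor. The partitionable engines can be used to implement a plurality of virtual cores. Fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores to cooperatively execute instruction sequences. The fragmentation hierarchy can be the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy can divide the address space into fragments using address bits, where the address bits are used such that the fragments are above cache line boundaries and below page boundaries. Each fragment can be configured to utilize a multiport bank structure for storage.
The foregoing is a summary and thus necessarily contains simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.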
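Brief Description of the Drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.
FIG. 1A shows an overview of the manner in which the global front end generates code blocks and inheritance vectors to support the execution of code sequences on their respective partitionable engines.
FIG. 1B shows an overview diagram of partitionable engines and their components, including a segmented scheduler and register files, global interconnects and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention.
FIG. 2 shows a scheduler flow diagram in accordance with one embodiment of the present invention.
FIG. 3 shows a diagram of exemplary hardware circuits depicting a segmented register file storing operands and results with an interconnect, in accordance with one embodiment of the present invention.
FIG. 4 shows a diagram depicting a global front end fetch and scheduler in accordance with one embodiment of the present invention.
FIG. 5 shows an alternative implementation of the distribution of instructions across many virtual cores in accordance with one embodiment of the present invention.
FIG. 6 shows a plurality of register segments with a corresponding plurality of register files and operand and result buffers in accordance with one embodiment of the present invention.
FIG. 7 shows a more detailed diagram of a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention.
FIG. 8 shows a diagram depicting how bits of an address can be used by address generation to enumerate fragments in accordance with one embodiment of the present invention.
FIG. 9 shows a diagram of how loads and stores are handled by embodiments of the present invention.
FIG. 10 shows the manner in which fragments can be split into two or more domains in accordance with one embodiment of the present invention.
FIG. 11 shows a mode of operation of the processor in which virtual cores are configured as physical cores that correspond to logical cores in executing applications, in accordance with one embodiment of the present invention.
FIG. 12 shows a mode of operation of the processor in which virtual cores are configured as soft cores that correspond to logical cores in executing applications, in accordance with one embodiment of the present invention.
FIG. 13 shows a mode of operation of the processor in which virtual cores are configured as soft cores that correspond to a single logical core in executing applications, in accordance with one embodiment of the present invention.
FIG. 14 shows an exemplary implementation of fragment segmentation used to support logical core and virtual core functionality in accordance with one embodiment of the present invention.
FIG. 15 shows a fragment memory of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with one embodiment of the present invention.
FIG. 16 shows a fragment memory of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with an alternative embodiment of the present invention.
FIG. 17 shows a fragment memory of an exemplary four-fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.
FIG. 18 shows a fragment memory of an exemplary four-fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.
FIG. 19 shows address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four-fragment processor implementing a physical cores to logical cores mode in accordance with one embodiment of the present invention.
FIG. 20 shows an alternative implementation of the address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with one embodiment of the present invention.
FIG. 21 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four-fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.
FIG. 22 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four-fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.
FIG. 23 shows a diagram of an exemplary microprocessor pipeline in accordance with one embodiment of the present invention.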
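Detailed Description
Although the present invention has been described in connection with one embodiment, it is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections have been set forth. It is to be understood, however, that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail, in order to avoid unnecessarily obscuring this description.
References within the specification to "one embodiment" or "an embodiment" are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase "in one embodiment" in various places within the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Some portions of the detailed description which follows are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer readable storage medium and are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "accessing" or "writing" or "storing" or "replicating" or the like refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
Embodiments of the present invention utilize a common global front end scheduler, a plurality of segmented register files, and a memory subsystem to implement fragmented address spaces for multiple cores of a multicore processor. In one embodiment, fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores (e.g., soft cores) to cooperatively execute instruction sequences comprising one or more threads. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy divides the address space into fragments using address bits, where the address bits are used such that the fragments are identified by bits that are above cache line boundaries and below page boundaries. Each fragment is configured to utilize a multiport bank structure for storage. Embodiments of the present invention are further described with reference to FIGS. 1A and 1B below.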
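FIG. 1A shows an overview diagram of a processor in accordance with one embodiment of the present invention. As depicted in FIG. 1A, the processor includes a global front end fetch and scheduler 150 and a plurality of partitionable engines 11-14.
FIG. 1A shows an overview of the manner in which the global front end generates code blocks and inheritance vectors to support the execution of code sequences on their respective partitionable engines. Each of the code sequences 20-23 can belong to the same logical core/thread or to different logical cores/threads, depending upon the particular virtual core execution mode. The global front end fetch and scheduler processes the code sequences 20-23 to generate code blocks and inheritance vectors. These code blocks and inheritance vectors are allocated to the particular partitionable engines 11-14 as shown.
The partitionable engines implement virtual cores in accordance with a selected mode. A partitionable engine includes a segment, a fragment and a number of execution units. The resources within the partitionable engines can be used to implement virtual cores that have multiple modes. As provisioned by the virtual core mode, one soft core, or many soft cores, can be implemented to support one logical core/thread. In the FIG. 1A embodiment, depending on the selected mode, the virtual cores can support one logical core/thread or four logical cores/threads. In an embodiment where the virtual cores support four logical cores/threads, the resources of each virtual core are spread across each of the partitionable engines. In an embodiment where the virtual cores support one logical core/thread, the resources of all the engines are dedicated to that core/thread. The engines are partitioned such that each engine provides a subset of the resources that comprise each virtual core. In other words, a virtual core comprises a subset of the resources of each of the engines 11-14. Communication between the resources of each of the engines 11-14 is provided by a global interconnection structure 30 in order to facilitate this process. Alternatively, the engines 11-14 can be used to implement a physical mode in which the resources of the engines 11-14 are dedicated to supporting the execution of a dedicated core/thread. In this manner, the soft cores implemented by the engines comprise virtual cores that have resources spread across each of the engines. The virtual core execution modes are further described in the following figures.
It should be noted that in a conventional core implementation, only resources within one core/engine are solely allocated to one logical thread/core. In contrast, in embodiments of the present invention, the resources of any engine/core can be partitioned to instantiate, collectively with other engine/core partitions, a virtual core that is allocated to one logical thread/core. Additionally, embodiments of the present invention can implement multiple virtual execution modes in which those same engines can be partitioned to support many dedicated cores/threads, many dynamically allocated cores/threads, or an embodiment where all of the resources of all engines support the execution of a single core/thread. These embodiments are further described below.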
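FIG. 1B shows an overview diagram of partitionable engines and their components, including a segmented scheduler and register files, global interconnects and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention. As depicted in FIG. 1B, four fragments 101-104 are shown. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the load store buffer). Data can be exchanged between each of the L1 caches, each of the L2 caches and each of the load store buffers via the memory global interconnect 110a.
The memory global interconnect comprises a routing matrix that allows a plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored at any point in the fragmented cache hierarchy (e.g., L1 cache, load store buffer and L2 cache). FIG. 1B also depicts the manner in which each of the fragments 101-104 can be accessed by the address calculation and execution units 121-124 via the memory global interconnect 110a.
The execution global interconnect 110b similarly comprises a routing matrix that allows the plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored in any of the segmented register files. Thus, the cores have access to data stored in any of the fragments and to data stored in any of the segments via the memory global interconnect 110a or the execution global interconnect 110b. Additionally, it should be noted that in one embodiment, another global interconnect exists between each of the common partition fetch and schedulers. This is shown by the horizontal arrows between, and connecting, each common partition fetch and scheduler.
FIG. 1B further shows a global front end fetch and scheduler 150, which has a view of the entire machine and which manages the utilization of the register file segments and the fragmented memory subsystem. Address generation comprises the basis for fragment definition. The global front end fetch and scheduler functions by allocating instruction sequences to each segment's partition scheduler. The common partition scheduler then dispatches those instruction sequences for execution on the address calculation and execution units 121-124.
It should be noted that in one embodiment, the functionality of the common partition fetch and schedulers can be incorporated into the global front end scheduler 150. In such an embodiment, the segments would not include respective common partition fetch and schedulers, and there would be no need for an interconnect between them.
Additionally, it should be noted that the partitionable engines shown in FIG. 1A can be nested in a hierarchical way. In such an embodiment, a first-level partitionable engine would include a local front end fetch and scheduler and multiple secondary partitionable engines connected to it.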
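FIG. 2 shows a scheduler flow diagram in accordance with one embodiment of the present invention. As depicted in FIG. 2, a bucket buffer is shown that includes speculative thread bucket pointers and bucket source and destination lists. The scheduler and execution buckets include a bucket dispatch selector and the virtual register match and read, including the possibility of a register hierarchy and a register cache. The back end is where executed buckets are logged and exception ordering is enforced before retirement. The register hierarchy/cache also serves as intermediate storage for the executed bucket results until they are non-speculative and can update the architecture state. The following discloses one possible implementation of the front end, the dispatch stage and the back end where executed buckets are logged.
FIG. 2 shows the manner in which the concept scales from a bucket buffer managing a small number of closely coupled threads into hardware circuits that manage multiple bucket buffers and threads. Those circuits that can be expanded to process larger numbers of threads that might have less close interaction are described as a global front end (e.g., global front end scheduler 150 shown in FIG. 1B).
The process starts by fetching a new thread matrix/bucket/block, and then the new thread bucket is assigned to a vacant bucket slot in the bucket buffer. Each of the thread allocation pointers in the thread allocation pointer array 852 defines an interval of buckets in which the thread is physically allowed to place its blocks/buckets of instructions. Each of those threads keeps allocating buckets into the bucket buffer array, inside its corresponding interval of contiguous space, in round-robin fashion. The buckets/blocks inside each thread space are assigned a new number 854 that is incremented each time a new bucket/block is assigned. Each valid source in a bucket has a valid read bit "Rv" indicating that the source is needed by the instructions inside the bucket. By the same convention, each destination register that is to be written back by instructions in the bucket has a valid bit "Wv" set in the bucket and has a field in a destination inheritance vector 853. When a new bucket is to be fetched into the bucket buffer, it inherits the destination inheritance vector from the previously allocated bucket pointed at by the thread bucket allocation pointer 852. The inheritance vector is copied from the previously allocated bucket, and then the valid destination fields that correspond to the registers that will be updated by the bucket's instructions are overwritten. The valid destinations are labeled with the current bucket number, while the invalid destinations are copied from the corresponding inheritance vector inside the bucket. Then the thread bucket pointer is updated for the newly fetched bucket by incrementing it (the pointer wraps around within its interval).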
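In the bucket dispatch and execute stage, whenever a bucket is executed without any exception handling, the bucket execution flag (containing the bucket number) 854 is set and broadcast throughout the bucket buffer, and it is latched/monitored within each bucket that has a source with that bucket number as a source. It is also possible to pass other related information along with the bucket number, such as information about virtual register locations. When all the execution flags of the source buckets are set within a bucket, that bucket's ready bit 855 is set and the bucket is ready to be dispatched and executed. When the bucket executes without any exception and is ready to update the architecture state in the sequential order of the program, the bucket is retired and the retirement thread pointer 857 is incremented to the next bucket in the array. The retired bucket location can be assigned to a new bucket.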
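The wakeup behavior described above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware circuit: the `Bucket` class, its field names, and the `broadcast_execution_flag` helper are hypothetical names introduced here, and the set-based bookkeeping stands in for the latched execution flags.

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    number: int
    source_buckets: set                      # bucket numbers this bucket reads from
    executed_sources: set = field(default_factory=set)  # latched execution flags

    @property
    def ready(self) -> bool:
        # Ready bit: set once every source bucket has broadcast its flag.
        return self.source_buckets <= self.executed_sources

def broadcast_execution_flag(bucket_buffer, executed_number):
    """Latch the execution flag in every bucket that sources the executed bucket."""
    for bucket in bucket_buffer:
        if executed_number in bucket.source_buckets:
            bucket.executed_sources.add(executed_number)

# Bucket 3 depends on buckets 1 and 2; bucket 4 depends only on bucket 2.
bucket_buffer = [Bucket(3, {1, 2}), Bucket(4, {2})]

broadcast_execution_flag(bucket_buffer, 2)
ready_after_first = [b.number for b in bucket_buffer if b.ready]

broadcast_execution_flag(bucket_buffer, 1)
ready_after_second = [b.number for b in bucket_buffer if b.ready]
```

After bucket 2 executes, only bucket 4 is ready; bucket 3 becomes ready once bucket 1 has also broadcast its flag.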
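Those closely related threads can all coexist inside the matrix/bucket/block buffer; each thread occupies an interval of consecutive buckets that belongs to that thread. The allocation pointer of that thread moves inside this interval of buckets in round-robin fashion, fetching new instruction buckets and allocating them inside the thread interval in the described round-robin fashion. With such interval sectioning, the whole bucket buffer is divided dynamically into bucket intervals of different or equal lengths.
The concept of the inheritance vector is introduced here for the instruction bucket as well as for the thread. Each instruction matrix/block/bucket writes into particular registers among the architectural registers. Each new bucket, at the allocation stage, updates this inheritance vector by writing its own thread and bucket number into the vector, leaving the fields for the registers that it does not write un-updated. This bucket inheritance vector B_iv 856 is forwarded from each bucket to the next one in program order. In FIG. 2, each matrix writes its own number into the fields of the architecture destination registers that its instructions write; otherwise it inherits the value from the B_iv of the previous bucket in that thread.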
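As a rough illustration of the inheritance vector update just described, the following Python sketch models B_iv as a list indexed by architectural register number. The function name `allocate_bucket`, the eight-register file, and the convention that bucket 0 produced the initial state are assumptions made for this example only.

```python
NUM_REGS = 8  # assumed architectural register count, for illustration only

def allocate_bucket(prev_b_iv, bucket_number, written_registers):
    """Return the inheritance vector forwarded by a newly allocated bucket.

    The vector is copied from the previous bucket; fields for registers this
    bucket writes (valid destinations) are overwritten with its own number,
    and all other fields are inherited unchanged.
    """
    b_iv = list(prev_b_iv)
    for reg in written_registers:
        b_iv[reg] = bucket_number
    return b_iv

def resolve_sources(prev_b_iv, source_registers):
    """Map each source register to the bucket that last wrote it."""
    return {reg: prev_b_iv[reg] for reg in source_registers}

b_iv0 = [0] * NUM_REGS                      # bucket 0 is assumed to hold the initial state
b_iv1 = allocate_bucket(b_iv0, 1, [2, 5])   # bucket 1 writes r2 and r5
deps2 = resolve_sources(b_iv1, [2, 3])      # bucket 2 reads r2 and r3
b_iv2 = allocate_bucket(b_iv1, 2, [5, 7])   # bucket 2 writes r5 and r7
```

Here bucket 2 learns from the vector it inherits that r2 was produced by bucket 1 and r3 by bucket 0, without inspecting any earlier instructions.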
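FIG. 3 shows a diagram of exemplary hardware circuits depicting a segmented register file storing operands and results with an interconnect, in accordance with one embodiment of the present invention. FIG. 3 shows an operand result buffer coupled via the execution global interconnect to a plurality of execution units.
FIG. 4 shows a diagram depicting a global front end scheduler in accordance with one embodiment of the present invention. The global front end scheduler is configured to process larger numbers of threads that might have less close interaction (e.g., global front end scheduler 150 shown in FIG. 1B). This diagram shows how a sequence of instructions from one logical core is distributed across many virtual cores. This process is repeated for each logical core present in the machine. It should be noted that the "engine" of FIG. 4 comprises the components of a virtual core, where the register file is explicitly depicted to show aspects of inter-virtual-core communication at the register file level.
For example, as depicted in FIG. 4, the global front end scheduler can process a thread header 902 but does not need to process the actual instructions within the thread to enforce the dependency check across those distant threads. The header of the thread and the sub-headers of its buckets contain only information about the architecture registers that those threads and buckets write into (the destination registers of those instructions); there is no need to include in those headers the actual instructions or the sources of those instructions. In fact, it is enough to list those destination registers, or to provide a bit vector in which one bit is set for each register that is a destination of an instruction. The header does not need to be physically placed as a header for the instructions; it can be any formatted packet or compact representation of the destination registers of the instructions within the threads, which may or may not be stored with the rest of the instruction information.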
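This global front end fetches only the headers of the threads/blocks in program order and generates dynamic thread and/or bucket inheritance vectors 901 (Tiv and/or Biv). Each time a new thread is allocated, those inheritance vectors are forwarded by keeping the old fields that the current thread bucket will not write to or update, as shown by 903. Those inheritance vectors are distributed to a large number of engines/cores or processors 904, each of which might include a local front end and a fetch unit (which fetches and stores the actual instructions and produces the dependency vector for each bucket) and a local matrix/block/bucket buffer with local register files 905. The local front ends then fetch the actual instructions and use the information from the inheritance vectors obtained from the global front end to fill in the dependency information for the instruction sources of the instructions that are brought into those engines for execution. FIG. 4 illustrates a global front end implementation and the way it disseminates the inheritance vectors to the different engines 904 using only concise information about the instructions (namely, just the registers that those instructions write into). Other information that is helpful to place in the header is information about a change in the control path within or across the threads. A global branch predictor can be used to predict the flow of control across those threads, so such headers can include branch destinations and offsets. In addition to the branch predictor determining the control flow, the hardware/compiler can decide to dispatch independent threads across the 2 control paths of a branch. In such a case, it later merges the execution of those 2 paths using the inheritance vector. FIG. 4 also shows the forwarding process when the header of a new thread is fetched by the front end: thread 2 (906), for example, updates the corresponding inheritance vector 901 that is forwarded to it, resulting in vector 910, in which registers 1, 2, 3, 4, 6, 0 and 7 are updated with T2 labels. Note that in 910 register 5 was not written by T2 buckets, and thus its label is inherited from a previous inheritance vector.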
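The thread 2 example can be reproduced with a small Python sketch in which the header is a destination-register bit vector. The helper names `header_bits` and `forward_inheritance_vector`, the eight-register width, and the assumption that every field previously carried a T1 label are illustrative choices, not details taken from the figures.

```python
def header_bits(dest_registers):
    """Encode a thread header as a bit vector: bit r is set if register r is written."""
    bits = 0
    for r in dest_registers:
        bits |= 1 << r
    return bits

def forward_inheritance_vector(prev_tiv, thread_label, header):
    """Overwrite the fields of written registers; inherit all other fields."""
    return [thread_label if (header >> r) & 1 else old
            for r, old in enumerate(prev_tiv)]

prev_tiv = ["T1"] * 8                              # assumed incoming vector (901)
t2_header = header_bits([1, 2, 3, 4, 6, 0, 7])     # destinations listed in the text
tiv_after_t2 = forward_inheritance_vector(prev_tiv, "T2", t2_header)
```

Only the header is consulted; register 5 keeps its inherited label because its bit is clear in the header of thread 2.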
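One interesting observation is that the register files allow cross communication among the cores/engines. An early request (to reduce access latency) for the registers that are needed from other engines can be placed as soon as the instruction buckets of the thread are fetched and allocated in the local bucket buffer. At that time the source dependency information is populated, so that cross-engine thread references can be issued, possibly long before the actual instructions are dispatched for execution. In any case, an instruction is not dispatched until the cross-referenced source has been forwarded and has arrived. This cross-referenced source can be stored in the local multithreaded register file or register cache. Although this cross-referenced source can be stored in a buffer similar to the load store buffer (it can reuse the load store buffer's physical storage and dependency check mechanisms), it is treated as a register load instead of a memory load. Many topologies can be used to connect the register files across the engines/cores, such as a ring topology, a crossbar topology, or a mesh routed interconnect.
The following discussion illustrates how register file segmentation can be used inside an engine and also across engines. When a bucket is dispatched, its sources are sent to both the register file and the register cache (simultaneously or sequentially). If the register file is physically unified and has direct support for threading, the operand is read directly from the corresponding thread register section. If the register file is a virtual register, including a physically segmented register file that uses tags, a tag match has to be done as part of the virtual register read. If the tag matches, the read happens from the segmented register file.
Disclosed is a register architecture that supports software threads, hardware-generated threads, VLIW execution, SIMD and MIMD execution, as well as emulation of out-of-order superscalar execution. Although it is physically segmented, it appears as a unified architecture resource. This segmented register is part of a virtual register file, which might include a register hierarchy and a register cache as well as mechanisms to store and check register tags. The tag access can be eliminated if a location-based scheme is used that takes advantage of the dependency inheritance vector. The scheme works so that when the executed bucket number is broadcast during the dispatch stage, all the sources of subsequent instructions perform a CAM (content addressable match), which compares their source buckets with the just dispatched/executed bucket to set the ready flag for that source. Here the physical location where that bucket executed can also be propagated along with the register number so that any ambiguity is resolved.
For example, consider an implementation with 4 register file segments, each containing 16 registers. Upon dispatching a bucket #x to section 2, the bucket number x is broadcast to the bucket buffer and the segment number 2 is broadcast with it, such that all sources that have a dependency on bucket x record that it wrote all its registers in segment 2. When the time comes to dispatch those instructions, they know that they need to read their registers from segment 2 and not from any other segment, even if the same register number exists in the other segments. This also applies to the register cache, to avoid using tags. This concept can be extended to the global front end, where, in addition to the thread information, the inheritance vector can specify in which engine the instruction bucket writing to this register was allocated.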
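The location-based scheme in this example can be modeled as follows. This is a minimal Python sketch under stated assumptions: the `Source` class and the `broadcast_dispatch` and `read_operand` helpers are hypothetical names, and a linear scan stands in for the hardware CAM match.

```python
NUM_SEGMENTS = 4
REGS_PER_SEGMENT = 16

# One small register file per segment; the same register number exists in each.
register_files = [[0] * REGS_PER_SEGMENT for _ in range(NUM_SEGMENTS)]

class Source:
    """An instruction source operand waiting on a producer bucket."""
    def __init__(self, reg, producer_bucket):
        self.reg = reg
        self.producer_bucket = producer_bucket
        self.segment = None     # recorded when the producer's dispatch is broadcast
        self.ready = False

def broadcast_dispatch(sources, bucket_number, segment):
    """CAM-style match: every source that depends on the bucket records its segment."""
    for src in sources:
        if src.producer_bucket == bucket_number:
            src.segment = segment
            src.ready = True

def read_operand(src):
    """Read from the recorded segment; no tag comparison is needed."""
    return register_files[src.segment][src.reg]

register_files[2][5] = 111      # bucket 7 executed in segment 2 and wrote r5
register_files[0][5] = 999      # r5 in segment 0 holds an unrelated value

src = Source(reg=5, producer_bucket=7)
broadcast_dispatch([src], bucket_number=7, segment=2)
value = read_operand(src)
```

Because the source carries the segment number it learned from the broadcast, it reads r5 from segment 2 even though r5 also exists in segment 0.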
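FIG. 5 shows an alternative implementation of the distribution of instructions across many virtual cores in accordance with one embodiment of the present invention. FIG. 5 shows a runtime optimizer scheduler 550 that functions by distributing inheritance-vector-encoded code segments to the virtual cores. In one embodiment, the optimizer looks at a number of code blocks of instructions and reschedules instructions across all of the code blocks to create code segments and inheritance vectors. The goal of the optimizer is to maximize the efficiency of the overlapped execution of the code segments on their respective virtual cores.
FIG. 6 shows a plurality of register segments with a corresponding plurality of register files and operand result buffers in accordance with one embodiment of the present invention. As depicted in FIG. 6, an execution global interconnect connects each register segment to a plurality of address calculation and execution units.
The register segments in FIG. 6 can be used to implement one of 3 execution modes. In the first, the matrices are grouped together by the compiler/programmer to form a MIMD super instruction matrix. In the second, each matrix is executed independently in a threaded mode where separate threads execute simultaneously on each of the 4 hardware sections. The last possible execution mode is the ability to dynamically execute 4 different instruction matrices from a single thread, using a hardware dependency check to ensure that no dependency exists between those different matrices that execute simultaneously on the 4 different hardware sections.
The register files in FIG. 6 may be alternately configured depending upon the execution mode. In one mode, the register files are viewed either as a MIMD sectioned register file serving a MIMD width of 4 sections, or as 4 individual register files, each serving a separate thread. The register files can also support a dynamic execution mode in which the 4 sections form one unified register file, where data written to any register in a particular section is accessible by all units in the other sections. Switching between those modes can be seamless, as different execution modes can alternate between individual thread baseline instruction matrices and MIMD super instruction matrix threads.
In a multithread execution mode, each register file and the execution unit that executes its thread is totally independent of the other register files and their threads. This is similar to each thread having its own register state. However, dependencies between those threads can be specified. Each matrix that belongs to a thread executes in the execution unit of that thread's register file. If only one thread or a single non-threaded program is executed on the hardware, the following method is used to allow parallel matrices belonging to that single thread/program to access the results written into the registers in the other sections. This is done by allowing any matrix that writes results into any one of the 4 register file sections to generate copies of those registers in the other register file sections. Physically, this is done by extending the write ports of each section into the remaining sections. However, this is not scalable, because an efficient register file cannot be built with each memory cell having as many as 4 times the write ports needed for one section alone. A mechanism is presented where the register file is built such that it is not impacted by such a single-thread register-broadcast extension.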
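It should be noted that additional aspects regarding register segments as used in embodiments of the present invention can be found in U.S. Patent Application Publication No. 2010/0161948, entitled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES," filed by Mohammad A. Abdallah on November 14, 2007.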
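FIG. 7 shows a more detailed diagram of a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention. FIG. 7 shows a comprehensive scheme and implementation of synchronization among threads and/or among loads and stores in general. The scheme describes a preferred method for synchronization and disambiguation of memory references across load/store architectures and/or across memory references and/or threads' memory accesses. FIG. 7 shows multiple segments of register files (address and/or data registers), execution units, address calculation units, and fragments of level 1 caches and/or load store buffers and level 2 caches, together with address register interconnects 1200 and address calculation unit interconnects 1201. Those fragmented elements could be constructed within one core/processor by fragmenting and distributing its centralized resources into several engines, or they can be constructed from elements of different cores/processors in multi-core/multi-processor configurations. One of those fragments 1211 is shown in the figure as fragment number 1; the fragments can be scaled to a large number (in general to N fragments, as shown in the figure).
This mechanism also serves as a coherency scheme for the memory architecture among those engines/cores/processors. The scheme starts with an address request from one of the address calculation units in one fragment/core/processor. For example, assume the address is requested by fragment 1 (1211). It can obtain and calculate its address using address registers that belong to its own fragment and/or registers in other fragments, using the address interconnect bus 1200. After calculating the address, it creates the reference address, either a 32-bit address or a 64-bit address, that is used to access caches and memory. This address is usually fragmented into a tag field and set and line fields. This particular fragment/engine/core stores the address into its load store buffer and/or L1 and/or L2 address arrays 1202, and at the same time it creates a compressed version of the tag (with fewer bits than the original tag field of the address) by using a compression technique.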
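To make the address fields concrete, here is a small Python sketch that splits a reference address into tag, set and line fields and then shortens the tag. The field widths (6 line bits, 6 set bits) and the XOR-folding function are assumptions chosen for illustration; the text only says that "a compression technique" is used and does not specify which one.

```python
LINE_BITS = 6             # assumed: 64-byte cache lines
SET_BITS = 6              # assumed: 64 sets
COMPRESSED_TAG_BITS = 8   # assumed width of the compressed tag

def split_address(addr):
    """Split a reference address into (tag, set index, line offset)."""
    line = addr & ((1 << LINE_BITS) - 1)
    set_index = (addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (LINE_BITS + SET_BITS)
    return tag, set_index, line

def compress_tag(tag):
    """XOR-fold the tag into COMPRESSED_TAG_BITS bits (one possible technique)."""
    mask = (1 << COMPRESSED_TAG_BITS) - 1
    folded = 0
    while tag:
        folded ^= tag & mask
        tag >>= COMPRESSED_TAG_BITS
    return folded

addr = 0x12345678
tag, set_index, line = split_address(addr)
short_tag = compress_tag(tag)
```

A folded tag can alias (two different tags may compress to the same value), so a design using such a scheme would still need the full tag for a final comparison; that detail is outside what the text describes.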
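Moreover, the different fragments/engines/cores/processors use the set field or a subset of the set field as an index to identify which fragment/core/processor the address is maintained in. This indexing of the fragments by the address set field bits ensures exclusive ownership of the address in a particular fragment/core/engine, even though the memory data that corresponds to that address can live in one or more other fragments/engines/cores/processors. Even though the address CAM/tag arrays 1202/1206 are shown in each fragment as coupled with the data arrays 1207, they might be coupled only by physical proximity of placement and layout, or even only by the fact that both belong to a particular engine/core/processor; there is no relation between the addresses kept in the address arrays and the data in the data arrays inside one fragment.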
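FIG. 8 shows a diagram depicting how bits of an address can be used by address generation to enumerate fragments in accordance with one embodiment of the present invention. In the present embodiment, as depicted in FIG. 8, fragments are defined by address bits that are above cache line boundaries and below page boundaries. Staying below the page boundary avoids causing TLB misses during the translation from virtual addresses to physical addresses. Staying above the cache line boundary keeps cache lines complete so that they fit correctly within the hardware cache hierarchy. For example, in a system that employs 64-byte cache lines, the fragment boundary would avoid the last six address bits. By comparison, in a system that employs 32-byte cache lines, the fragment boundary would avoid the last five bits. Once defined, the fragment hierarchy is the same across all cache hierarchies of the processor.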
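A minimal Python sketch of fragment enumeration by address bits follows. The 4 KiB page size and the choice of the bits immediately above the line offset are assumptions made for illustration; the text fixes only the constraint that the fragment bits lie above the cache line boundary and below the page boundary.

```python
def fragment_of(addr, num_fragments=4, line_bytes=64, page_bytes=4096):
    """Select a fragment from the address bits just above the cache line offset."""
    line_bits = line_bytes.bit_length() - 1      # 6 for 64-byte lines, 5 for 32-byte
    page_bits = page_bytes.bit_length() - 1      # 12 for the assumed 4 KiB page
    frag_bits = num_fragments.bit_length() - 1   # 2 for four fragments
    if line_bits + frag_bits > page_bits:
        raise ValueError("fragment bits would cross the page boundary")
    return (addr >> line_bits) & (num_fragments - 1)

same_line = (fragment_of(0x40), fragment_of(0x7F))       # both bytes in one 64-byte line
next_lines = [fragment_of(a) for a in (0x00, 0x40, 0x80, 0xC0, 0x100)]
other_page = fragment_of(0x1040)                         # same page offset, different page
narrow_line = fragment_of(0x20, line_bytes=32)           # 32-byte lines skip five bits
```

Because the selected bits sit inside the page offset, the fragment can be chosen before address translation completes, and because they sit above the line offset, a whole cache line always maps to a single fragment.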
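FIG. 9 shows a diagram of how loads and stores are handled by embodiments of the present invention. As depicted in FIG. 9, each fragment is associated with its load store buffer and store retirement buffer. For any given fragment, loads and stores that designate an address range associated with that fragment or another fragment are sent to that fragment's load store buffer for processing. It should be noted that they may arrive out of order, since the cores execute instructions out of order. Each core has access not only to its own register file but also to each of the other cores' register files.
Embodiments of the present invention implement a distributed load store ordering system. The system is distributed across multiple fragments. Within a fragment, local data dependency checking is performed by that fragment. This is because the fragment only loads and stores within the store retirement buffer of that particular fragment. This limits the need to look to other fragments to maintain data coherency. In this manner, data dependencies within a fragment are enforced locally.
With respect to data consistency, the store dispatch gate enforces store retirement in accordance with strict in-program-order memory consistency rules. Stores arrive out of order at the load store buffers. Loads also arrive out of order at the load store buffers. Concurrently, the out-of-order loads and stores are forwarded to the store retirement buffers for processing. It should be noted that although stores are retired in order within a given fragment, they can be out of order across the multiple fragments as they go to the store dispatch gate. The store dispatch gate enforces a policy that ensures that even though stores may reside across store retirement buffers out of order, and even though the buffers may forward stores to the store dispatch gate out of order with respect to other buffers' stores, the stores are forwarded to fragment memory strictly in order. This is because the store dispatch gate has a global view of retiring stores and only allows stores to leave for the globally visible side of the memory in order across all the fragments, i.e., globally. In this manner, the store dispatch gate functions as a global observer that ensures that stores ultimately return to memory in order across all fragments.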
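The ordering policy of the store dispatch gate can be sketched in Python as follows. The sketch assumes that each store carries a global program-order sequence number and that each store retirement buffer is already in order internally; the class name `StoreDispatchGate` and the `(sequence, name)` tuple layout are hypothetical.

```python
from collections import deque

class StoreDispatchGate:
    """Releases stores to memory in global program order across all fragments."""

    def __init__(self, retirement_buffers):
        self.buffers = retirement_buffers   # one in-order deque per fragment
        self.next_seq = 0                   # next program-order store allowed to leave

    def drain(self):
        """Release every store that can leave without violating global order."""
        released = []
        progress = True
        while progress:
            progress = False
            for buf in self.buffers:
                if buf and buf[0][0] == self.next_seq:
                    released.append(buf.popleft())
                    self.next_seq += 1
                    progress = True
        return released

# Stores are in order within each fragment's buffer but interleaved across fragments.
buffers = [deque([(1, "B"), (2, "C")]), deque([(0, "A"), (3, "D")])]
gate = StoreDispatchGate(buffers)
order = [name for _, name in gate.drain()]

# A gate whose oldest outstanding store (sequence 0) has not yet retired must wait.
stalled = StoreDispatchGate([deque([(1, "B")]), deque()]).drain()
```

The gate only ever inspects the head of each buffer, which is sufficient because each fragment retires its own stores in order.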
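FIG. 10 shows the manner in which fragments can be split into two or more domains in accordance with one embodiment of the present invention. FIG. 10 shows the manner in which a single fragment can be split into multiple domains. Domain splitting can be implemented via the address generation process. Domain splitting changes the manner in which load store checks have to be done within a fragment, since in this case they only have to be done per domain, as opposed to across the entire fragment. Domain splitting is also advantageous in that it can enable single-ported memory to behave like multiport memory, where the single port is accessed per different domain.
FIG. 11 shows a mode of operation of the processor in which the hardware resources of the partitionable engines are used to function like logical cores in executing applications, in accordance with one embodiment of the present invention. In this embodiment, the hardware resources of the engines of the virtual cores are configured as physical cores. In the FIG. 11 mode, each physical core is configured to function as a logical core. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software.
FIG. 12 shows a mode of operation of the processor in which soft cores are used to function like logical cores in executing applications, in accordance with one embodiment of the present invention. In this embodiment, the partitionable engines of the virtual cores support a plurality of soft cores. In the FIG. 12 mode, each soft core is configured to function as a logical core. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software.
FIG. 13 shows a mode of operation of the processor in which the soft cores are used to function like a single logical core in executing applications, in accordance with one embodiment of the present invention. In the FIG. 13 mode, the soft cores are configured to function together as a single logical core. In such an implementation, a single-threaded application has its instruction sequence divided up and allocated among the virtual cores, where they are cooperatively executed to achieve high single-threaded performance. In this manner, single-threaded performance can scale with the addition of soft cores.
A number of strategies can be used in choosing the operating mode of the processor. For a processor having a large number of engines (e.g., 8 engines, 12 engines, etc.), a number of soft cores can be configured to function as a single logical core, while the remaining cores operate in the other modes. This attribute allows an intelligent partitioning of resources to maximize utilization of the hardware and/or minimize wasted power consumption. For example, in one embodiment, cores (e.g., soft or logical) can be allocated on a per-thread basis depending upon the type of application that is executing.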
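FIG. 14 shows an exemplary implementation of fragment segmentation used to support logical core and virtual core functionality in accordance with one embodiment of the present invention. As discussed above, fragment segmentation allows the processor to be configured to support the different virtual core execution modes described above.
The global interconnect allows the cores' threads to access any of the ports 1401. It should be noted that the term "thread" as used herein refers to a representation of instruction sequences from different logical cores, instruction sequences from the same logical core, or some mixture of the two.
The manner in which the threads utilize one of the ports 1401 to access the load store buffer is adjustable in accordance with the policies of the arbiters, as shown. Thus, a thread using any one of the ports 1401 can have a greater or lesser amount of access to the load store buffer via the ports 1402. The size of the allocation and the manner in which the allocation is managed are controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with the demands of a particular thread.
The load store buffer is configured to have a plurality of entries spread across the ports. Access to the load store buffer is controlled by the arbiter. In this way, the arbiter can dynamically allocate entries in the load store buffer to the different threads.
FIG. 14 also shows arbiters on the ports between the load store buffer and the L1 cache. Thus, as with the load store buffer described above, a thread using any one of the ports 1403 can have a greater or lesser amount of access to the L1 cache via the ports 1404. The size of the allocation and the manner in which the allocation is managed are controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with the demands of a particular thread.
The L1 cache is configured to have a plurality of ways spread across the ports. Access to the L1 cache is controlled by the arbiter. In this way, the arbiter can dynamically allocate entries in the L1 cache to the different threads.
In one embodiment, the arbiters are configured to function with a plurality of tracking counters 1460 that provide a tracking function and a plurality of threshold registers 1450 that provide a limiting function. The limiting function specifies the maximum resource allocation percentage for a given thread. The tracking function tracks the actual resources allocated to a given thread at any given time. These tracking and limiting functions affect the number of per-thread entries, ways, or ports allocated for the load store buffer, L1 cache, L2 cache or the global interconnects. For example, the total number of entries in the load store buffer allocated to each thread can be dynamically checked against a variable threshold. This variable threshold can be updated in accordance with a given thread's forward progress. For example, in one embodiment, threads that are slowed down (e.g., by a large number of L2 misses) are quantified as making slow forward progress, and thus their respective resource allocation thresholds are lowered, including the entry thresholds, the way thresholds and the port thresholds.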
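The interaction between tracking counters and threshold registers can be illustrated with a simplified Python model. The `Arbiter` class, its method names, and the percentage-based thresholds are assumptions for this sketch; the policy that decides when and by how much to lower a threshold is not specified in the text and is left to the caller here.

```python
class Arbiter:
    """Allocates load store buffer entries to threads under per-thread limits."""

    def __init__(self, total_entries, max_percent):
        self.total = total_entries
        self.threshold = dict(max_percent)              # threshold registers (percent)
        self.allocated = {t: 0 for t in max_percent}    # tracking counters

    def limit(self, thread):
        return self.total * self.threshold[thread] // 100

    def try_allocate(self, thread):
        """Grant one entry if the buffer has room and the thread is under its limit."""
        free = self.total - sum(self.allocated.values())
        if free > 0 and self.allocated[thread] < self.limit(thread):
            self.allocated[thread] += 1
            return True
        return False

    def release(self, thread):
        if self.allocated[thread] > 0:
            self.allocated[thread] -= 1

    def set_threshold(self, thread, percent):
        """Lower (or raise) a thread's limit, e.g. when it makes slow forward progress."""
        self.threshold[thread] = percent

arbiter = Arbiter(total_entries=48, max_percent={"T0": 50, "T1": 50})
granted = sum(arbiter.try_allocate("T0") for _ in range(30))   # capped at 24 entries

arbiter.set_threshold("T0", 25)          # T0 is assumed to be stalling on L2 misses
denied = arbiter.try_allocate("T0")      # already above the new limit of 12
t1_granted = arbiter.try_allocate("T1")  # T1 is unaffected
```

In this model a lowered threshold does not evict entries that are already held; it only blocks further grants until the thread releases enough entries to fall under the new limit.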
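FIG. 14 also shows a shared L2 cache. In the present embodiment, the shared L2 cache has a fixed port arrangement without any arbitration between accesses coming from the L1 cache. Threads executing on the processor all share access to the L2 cache and the resources of the L2 cache.
FIG. 15 shows a fragment memory of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with one embodiment of the present invention.
The shading on FIG. 15 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 11 mode of operation (the many physical cores to many logical cores mode, in which the physical cores are used to function like logical cores in executing applications), each logical core is configured to have a fixed ratio of the resources of the load store buffer and the L1 cache. The ports can be specifically assigned to each thread or core. Entries in the load store buffer can be specifically reserved per thread or core. Ways within the L1 cache can be specifically reserved per thread or core. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software. This is shown by the one logical core having an allocated port and an allocated portion of the store buffer and the L1 cache of each of the fragments. In this manner, the logical core comprises a fixed allocated slice of the resources of each fragment.
In one embodiment, in the many physical cores to many logical cores mode, the four fragments can be partitioned in accordance with the number of ports (e.g., ports 1401) that access each fragment. For example, in an embodiment with 6 ports per fragment, the resources of each fragment, and hence the resources of each partition of the engine, can be divided in such a way as to support 6 physical cores across the 4 fragments and the 4 partition double engines. Each partition can be allocated its own port. Similarly, the resources of the load store buffer and the L1 cache are allocated in such a way as to support 6 physical cores. For example, in an embodiment where the load store buffer has 48 entries, the 48 entries can be allocated such that there are 12 entries per physical core to support a mode where 4 physical cores are implemented, or they can be allocated such that there are 8 entries per physical core to support a mode where 6 physical cores are implemented.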
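The fixed partitioning arithmetic in the 48-entry example can be written out as a short Python sketch. The helper names and the choice of contiguous index ranges are assumptions for illustration; the text specifies only the per-core entry counts.

```python
def entries_per_core(total_entries, physical_cores):
    """Number of load store buffer entries reserved for each physical core."""
    if total_entries % physical_cores:
        raise ValueError("entries do not divide evenly among the cores")
    return total_entries // physical_cores

def fixed_partition(total_entries, physical_cores):
    """Assign each core a contiguous, fixed range of entry indices."""
    per_core = entries_per_core(total_entries, physical_cores)
    return {core: range(core * per_core, (core + 1) * per_core)
            for core in range(physical_cores)}

four_core = entries_per_core(48, 4)     # 12 entries per physical core
six_core = entries_per_core(48, 6)      # 8 entries per physical core
layout = fixed_partition(48, 6)         # e.g. core 5 owns entries 40..47
```

Unlike the arbiter-controlled modes, this layout never changes at run time, which is what distinguishes the fixed-ratio physical core mode.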
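FIG. 16 shows a fragment memory of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with an alternative embodiment of the present invention.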
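As with FIG. 15, the shading on FIG. 16 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 11 mode of operation (the many physical cores to many logical cores mode), an entire partitionable engine is dedicated to supporting the execution of a single logical core. This is shown by the shading in FIG. 16. The physical resource engine is used to function like a logical core in executing applications.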
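FIG. 17 shows a fragment memory of an exemplary four-fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.
The shading on FIG. 17 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 12 mode of operation (the many soft cores to many logical cores mode, in which virtual cores are used to function like logical cores in executing applications), the size of the allocation of the resources of the load store buffer and the manner in which the allocation is managed are controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with the demands of a particular thread or core. Similarly, the size of the allocation of the resources of the L1 cache and the manner in which the allocation is managed are controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with the demands of a particular thread or core. Thus, at any given instant, the logical thread/core (e.g., the shaded one) can use different arbiters and different ports.
In this manner, access to the resources of the load store buffer and access to the resources of the L1 cache can be more policy driven and can be based more on the needs of the individual threads or cores making forward progress. This is shown by the one logical core having a dynamically allocated port and a dynamically allocated portion of the store buffer and the L1 cache of each of the fragments. In this manner, the logical core comprises a non-fixed, dynamically allocated slice of the resources of each fragment.
FIG. 18 shows a fragment memory of an exemplary four-fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.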
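In the FIG. 13 mode of operation (the many soft cores to one logical core mode, in which the soft cores are used to function like a single logical core in executing applications), each of the soft cores is configured to function cooperatively with the other soft cores as a single logical core. A single thread or core has all the resources of the load store buffers and all the resources of the L1 caches. In such an implementation, a single-threaded application has its instruction sequence divided up and allocated among the soft cores, where they are cooperatively executed to achieve high single-threaded performance. In this manner, single-threaded performance can scale with the addition of soft cores. This is shown in FIG. 18, where the shading of all the resources of the processor shows one exemplary logical core and its relationship with the resources of the processor.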
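FIG. 19 shows address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with one embodiment of the present invention.
The shading on FIG. 19 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 11 mode of operation (the many physical cores to many logical cores mode, in which the physical cores are used to function like logical cores in executing applications), each logical core is configured to have a fixed ratio of the resources of the address calculation units, operand/result buffers, threaded register files, and common partition scheduler. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software. This is shown by the one logical core having an allocated address calculation and execution unit, an allocated threaded register file and an allocated common partition scheduler. In this manner, the logical core comprises a fixed allocated segment. However, in one embodiment, in this mode of operation, the address calculation and execution units can still be shared (e.g., meaning each of the address calculation and execution units would be unshaded).
FIG. 20 shows an alternative implementation of the address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four-fragment processor implementing a many physical cores to many logical cores mode in accordance with one embodiment of the present invention.
The shading on FIG. 20 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 20 embodiment, however, the resources of a physical core are spread across each of the segments and each of the partitionable engines. This is shown by the one logical core having an allocated portion of the address calculation and execution units, an allocated portion of the threaded register files and an allocated portion of the common partition schedulers across each of the segments. Additionally, FIG. 20 shows how the one logical core has been allocated a portion of the resources of each of the address calculation and execution units. In this manner, the logical core comprises a fixed allocated portion of each of the segments.
FIG. 21 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four-fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.
The shading on FIG. 21 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 12 mode of operation (the many soft cores to many logical cores mode, in which the soft cores are used to function like logical cores in executing applications), each logical core is configured to have shared access to any one of the address calculation units and to a dynamically allocated portion of the operand/result buffers, threaded register files, and common partition scheduler. Multithreaded applications and multithreaded functionality depend on the threaded programmability of the application's software.
FIG. 22 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four-fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.
The shading on FIG. 22 shows one exemplary logical core and its relationship with the resources of the processor. In the FIG. 13 mode of operation (the many soft cores to one logical core mode, in which the soft cores are used to function like a single logical core in executing applications), each soft core is configured to have shared access to all of the address calculation units and all of the operand/result buffers, threaded register files, and common partition schedulers. In such an implementation, a single-threaded application has its instruction sequence divided up and allocated among the virtual cores, where they are cooperatively executed to achieve high single-threaded performance. In this manner, single-threaded performance can scale with the addition of soft cores.
FIG. 23 shows a diagram of an exemplary microprocessor pipeline 2300 in accordance with one embodiment of the present invention. The microprocessor pipeline 2300 includes a fetch module 2301 that implements the functionality of the process described above for identifying and extracting the instructions comprising an execution. In the FIG. 23 embodiment, the fetch module is followed by a decode module 2302, an allocation module 2303, a dispatch module 2304, an execution module 2305 and a retirement module 2306. It should be noted that the microprocessor pipeline 2300 is just one example of a pipeline that implements the functionality of the embodiments of the present invention described above. One skilled in the art will recognize that other microprocessor pipelines can be implemented that include the functionality of the decode module described above.
The foregoing description has, for the purpose of explanation, been given with reference to specific embodiments. However, the discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and its various embodiments with such modifications as may be suited to the particular use contemplated.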

Claims (20)

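1. A fragmented memory subsystem, comprising:
a plurality of memory fragments, each of the plurality of memory fragments comprising:
a cache having a plurality of entries, each of the plurality of entries being accessible by one of a first plurality of ports,
a load store buffer having a plurality of entries, each of the plurality of entries being accessible by one of a second plurality of ports, and
a first arbiter operable to provide a connection between the first plurality of ports and the second plurality of ports, the first plurality of ports enabling the first arbiter to access the plurality of entries in the cache, and the second plurality of ports enabling the first arbiter to access the plurality of entries in the load store buffer.
2. The fragmented memory subsystem of claim 1, wherein the first arbiter is configurable to dynamically allocate the plurality of entries in the cache to instruction threads.
3. The fragmented memory subsystem of claim 1, wherein the first arbiter is configurable to dynamically allocate the plurality of entries in the load store buffer to instruction threads.
4. The fragmented memory subsystem of claim 1, wherein the first arbiter is configurable to dynamically allocate access to the plurality of entries of the cache by allocating at least one of the first plurality of ports to an instruction thread.
5. The fragmented memory subsystem of claim 4, wherein the first arbiter is further operable to control the size and manner of the instruction thread's access to the cache.
6. The fragmented memory subsystem of claim 1, wherein the cache comprises a first level cache and a second level cache.
7. The fragmented memory subsystem of claim 1, wherein the first arbiter controls access to the load store buffer by dynamically allocating entries in the load store buffer to instruction threads, and wherein each entry in the load store buffer is accessible by each of the second plurality of ports.
8. The fragmented memory subsystem of claim 1, wherein each memory fragment further comprises:
a second arbiter operable to provide a connection between a third plurality of ports and a fourth plurality of ports, the third plurality of ports enabling the second arbiter to access the load store buffer, and the fourth plurality of ports enabling the second arbiter to access a global interconnect connecting the plurality of memory fragments.
9. The fragmented memory subsystem of claim 8, wherein the second arbiter is configurable to dynamically allocate access to the plurality of entries in the load store buffer by allocating at least one of the third plurality of ports to an instruction thread.
10. The fragmented memory subsystem of claim 1, further comprising a plurality of counters for tracking resources in the memory subsystem that are allocated to instruction threads.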
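11. A processor, comprising:
a plurality of physical cores configured to execute:
a scheduler, and
a plurality of virtual cores for receiving instruction threads allocated by the scheduler, wherein each virtual core comprises resources of a plurality of partitionable engines, and
a fragmented memory subsystem, comprising:
a plurality of memory fragments, each of the plurality of memory fragments comprising:
a cache having a plurality of entries, each of the plurality of entries being accessible by one of a first plurality of ports,
a load store buffer having a plurality of entries, each of the plurality of entries being accessible by one of a second plurality of ports, and
a first arbiter operable to provide a connection between the first plurality of ports and the second plurality of ports, the first plurality of ports enabling the first arbiter to access the plurality of entries in the cache, and the second plurality of ports enabling the first arbiter to access the plurality of entries in the load store buffer.
12. The processor of claim 11, wherein the first arbiter is configurable to dynamically allocate the plurality of entries in the cache to instruction threads.
13. The processor of claim 11, wherein the first arbiter is configurable to dynamically allocate the plurality of entries in the load store buffer to instruction threads.
14. The processor of claim 11, wherein the first arbiter is configurable to dynamically allocate access to the plurality of entries of the cache by allocating at least one of the first plurality of ports to an instruction thread.
15. The processor of claim 14, wherein the first arbiter is further operable to control the size and manner of the instruction thread's access to the cache.
16. The processor of claim 11, wherein the cache comprises a first level cache and a second level cache.
17. The processor of claim 11, wherein each memory fragment further comprises:
a second arbiter operable to provide a connection between a third plurality of ports and a fourth plurality of ports, the third plurality of ports enabling the second arbiter to access the load store buffer, and the fourth plurality of ports enabling the second arbiter to access a global interconnect connecting the plurality of memory fragments.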
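18. A computing system, comprising:
a main memory;
a system interconnect; and
a processor coupled to the system interconnect and the main memory, the processor comprising:
a plurality of physical cores configured to execute:
a scheduler, and
a plurality of virtual cores for receiving instruction threads allocated by the scheduler, wherein each virtual core comprises resources of a plurality of partitionable engines, and
a fragmented memory subsystem, comprising:
a plurality of memory fragments, each of the plurality of memory fragments comprising:
a cache having a plurality of entries, each of the plurality of entries being accessible by one of a first plurality of ports,
a load store buffer having a plurality of entries, each of the plurality of entries being accessible by one of a second plurality of ports, and
a first arbiter operable to provide connections between the first plurality of ports and the second plurality of ports, the first plurality of ports enabling the first arbiter to access the plurality of entries in the cache, and the second plurality of ports allowing the first arbiter to access the plurality of entries in the load store buffer.
19. The computing system of claim 18, wherein each memory fragment further comprises:
a second arbiter operable to provide connections between a third plurality of ports and a fourth plurality of ports, the third plurality of ports enabling the second arbiter to access the load store buffer, and the fourth plurality of ports enabling the second arbiter to access a global interconnect that connects the plurality of memory fragments.
20. The computing system of claim 18, further comprising a plurality of counters for tracking resources in the memory subsystem that are allocated to instruction threads.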
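As an informal illustration only (it is not part of the claims), the relationship between partitionable engines, virtual cores, and the scheduler recited in claims 11 and 18 can be sketched in Python. The names `Engine`, `VirtualCore`, `make_virtual_core`, and `schedule`, the notion of counting engine resources in "units", and the round-robin policy are all assumptions introduced for this sketch; the claims do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """A partitionable engine with a fixed number of resource units (hypothetical model)."""
    name: str
    free_units: int

    def carve(self, units):
        """Reserve `units` of this engine and return a (name, units) share."""
        if units > self.free_units:
            raise ValueError(f"{self.name}: only {self.free_units} units free")
        self.free_units -= units
        return (self.name, units)


@dataclass
class VirtualCore:
    """A virtual core holds a share of every engine plus the threads assigned to it."""
    shares: list
    threads: list = field(default_factory=list)


def make_virtual_core(engines, units_per_engine):
    """Build a virtual core from an equal share of each engine."""
    return VirtualCore(shares=[e.carve(units_per_engine) for e in engines])


def schedule(threads, virtual_cores):
    """Assign threads to virtual cores round-robin (an assumed policy)."""
    for i, thread in enumerate(threads):
        virtual_cores[i % len(virtual_cores)].threads.append(thread)


engines = [Engine("E0", 4), Engine("E1", 4)]
vc_small = make_virtual_core(engines, 1)  # one unit from each engine
vc_large = make_virtual_core(engines, 3)  # the remaining three units from each
schedule(["T0", "T1", "T2"], [vc_small, vc_large])
```

The point of the sketch is that a virtual core is not tied to one engine: it is assembled from a slice of each engine, so virtual cores of different sizes can coexist on the same hardware.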
CN201810154116.9A 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines Active CN108376097B (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201161467939P 2011-03-25 2011-03-25
US61/467,939 2011-03-25
PCT/US2012/030383 WO2012135041A2 (en) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN201280024054.0A CN103562866B (zh) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201280024054.0A Division CN103562866B (zh) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Publications (2)

Publication Number Publication Date
CN108376097A (zh) 2018-08-07
CN108376097B (zh) 2022-04-15

Family

ID=46878335

Family Applications (2)

Application Number Title Priority Date Filing Date
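CN201810154116.9A Active CN108376097B (zh) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN201280024054.0A Active CN103562866B (zh) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines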

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201280024054.0A Active CN103562866B (zh) 2011-03-25 2012-03-23 Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines

Country Status (6)

Country Link
US (2) US9842005B2 (zh)
EP (1) EP2689330B1 (zh)
KR (1) KR101620676B1 (zh)
CN (2) CN108376097B (zh)
TW (1) TWI518504B (zh)
WO (1) WO2012135041A2 (zh)

Families Citing this family (66)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8327115B2 (en) 2006-04-12 2012-12-04 Soft Machines, Inc. Plural matrices of execution units for processing matrices of row dependent instructions in single clock cycle in super or separate mode
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
KR101685247B1 (ko) 2010-09-17 2016-12-09 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US8789065B2 (en) 2012-06-08 2014-07-22 Throughputer, Inc. System and method for input data load adaptive parallel processing
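CN108376097B (zh) 2011-03-25 2022-04-15 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN103547993B (zh) 2011-03-25 2018-06-26 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines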
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
WO2012162189A1 (en) 2011-05-20 2012-11-29 Soft Machines, Inc. An interconnect structure to support the execution of instruction sequences by a plurality of engines
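TWI603198B (zh) 2011-05-20 2017-10-21 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines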
US9448847B2 (en) 2011-07-15 2016-09-20 Throughputer, Inc. Concurrent program execution optimization
EP2783280B1 (en) 2011-11-22 2019-09-11 Intel Corporation An accelerated code optimizer for a multiengine microprocessor
WO2013077876A1 (en) 2011-11-22 2013-05-30 Soft Machines, Inc. A microprocessor accelerated code optimizer
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
US9582322B2 (en) 2013-03-15 2017-02-28 Soft Machines Inc. Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping
CN105247484B (zh) 2013-03-15 2021-02-23 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US20140281116A1 (en) 2013-03-15 2014-09-18 Soft Machines, Inc. Method and Apparatus to Speed up the Load Access and Data Return Speed Path Using Early Lower Address Bits
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
EP2972845B1 (en) 2013-03-15 2021-07-07 Intel Corporation A method for executing multithreaded instructions grouped onto blocks
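KR101800948B1 (ko) * 2013-03-15 2017-11-23 Intel Corporation Method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates
JP6086230B2 (ja) * 2013-04-01 2017-03-01 NEC Corporation Central processing unit, information processing apparatus, and method for acquiring register values in a virtual core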
US9672043B2 (en) 2014-05-12 2017-06-06 International Business Machines Corporation Processing of multiple instruction streams in a parallel slice processor
JP2017516228A (ja) 2014-05-12 2017-06-15 Intel Corporation Method and apparatus for providing hardware support for self-modifying code
US9720696B2 (en) 2014-09-30 2017-08-01 International Business Machines Corporation Independent mapping of threads
US9977678B2 (en) * 2015-01-12 2018-05-22 International Business Machines Corporation Reconfigurable parallel execution and load-store slice processor
US10133576B2 (en) 2015-01-13 2018-11-20 International Business Machines Corporation Parallel slice processor having a recirculating load-store queue for fast deallocation of issue queue entries
US10133581B2 (en) 2015-01-13 2018-11-20 International Business Machines Corporation Linkable issue queue parallel execution slice for a processor
US11544214B2 (en) * 2015-02-02 2023-01-03 Optimum Semiconductor Technologies, Inc. Monolithic vector processor configured to operate on variable length vectors using a vector length register
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US9946548B2 (en) 2015-06-26 2018-04-17 Microsoft Technology Licensing, Llc Age-based management of instruction blocks in a processor instruction window
US10191747B2 (en) 2015-06-26 2019-01-29 Microsoft Technology Licensing, Llc Locking operand values for groups of instructions executed atomically
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US10409606B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Verifying branch targets
US10175988B2 (en) 2015-06-26 2019-01-08 Microsoft Technology Licensing, Llc Explicit instruction scheduler state information for a processor
US10169044B2 (en) 2015-06-26 2019-01-01 Microsoft Technology Licensing, Llc Processing an encoding format field to interpret header information regarding a group of instructions
US10409599B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Decoding information about a group of instructions including a size of the group of instructions
US9952867B2 (en) 2015-06-26 2018-04-24 Microsoft Technology Licensing, Llc Mapping instruction blocks based on block size
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US11126433B2 (en) * 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US9983875B2 (en) 2016-03-04 2018-05-29 International Business Machines Corporation Operation of a multi-slice processor preventing early dependent instruction wakeup
US10037211B2 (en) 2016-03-22 2018-07-31 International Business Machines Corporation Operation of a multi-slice processor with an expanded merge fetching queue
US10346174B2 (en) 2016-03-24 2019-07-09 International Business Machines Corporation Operation of a multi-slice processor with dynamic canceling of partial loads
US10761854B2 (en) 2016-04-19 2020-09-01 International Business Machines Corporation Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor
US10037229B2 (en) 2016-05-11 2018-07-31 International Business Machines Corporation Operation of a multi-slice processor implementing a load/store unit maintaining rejected instructions
US9934033B2 (en) 2016-06-13 2018-04-03 International Business Machines Corporation Operation of a multi-slice processor implementing simultaneous two-target loads and stores
US10042647B2 (en) 2016-06-27 2018-08-07 International Business Machines Corporation Managing a divided load reorder queue
US10318419B2 (en) 2016-08-08 2019-06-11 International Business Machines Corporation Flush avoidance in a load store unit
US10069732B2 (en) * 2016-09-02 2018-09-04 Telefonaktiebolaget Lm Ericsson (Publ) Techniques for architecture-independent dynamic flow learning in a packet forwarder
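CN106484394B (zh) * 2016-09-13 2019-07-12 Xiamen Zhaohui Network Technology Co., Ltd. Dual-engine rapid software development system
CN108009007B (zh) * 2016-10-27 2021-01-15 Huawei Technologies Co., Ltd. Lightweight thread scheduling method, cooperative manager, and vCPU scheduler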
US11531552B2 (en) 2017-02-06 2022-12-20 Microsoft Technology Licensing, Llc Executing multiple programs simultaneously on a processor core
JP6979777B2 (ja) * 2017-03-22 2021-12-15 Canon Inc. Interface device and control method therefor
US11016776B2 (en) * 2017-12-22 2021-05-25 Alibaba Group Holding Limited System and method for executing instructions
US11068305B2 (en) * 2018-05-07 2021-07-20 Micron Technology, Inc. System call management in a user-mode, multi-threaded, self-scheduling processor
EP3857371A1 (en) * 2019-12-19 2021-08-04 Google LLC Resource management unit for capturing operating system configuration states and memory management
WO2021126216A1 (en) 2019-12-19 2021-06-24 Google Llc Resource management unit for capturing operating system configuration states and offloading tasks
US11775307B2 (en) * 2021-09-24 2023-10-03 Apple Inc. Systems and methods for synchronizing data processing in a cellular modem

Family Cites Families (458)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US727487A (en) 1902-10-21 1903-05-05 Swan F Swanson Dumping-car.
US4075704A (en) 1976-07-02 1978-02-21 Floating Point Systems, Inc. Floating point data processor for high speech operation
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4245344A (en) 1979-04-02 1981-01-13 Rockwell International Corporation Processing system with dual buses
US4527237A (en) 1979-10-11 1985-07-02 Nanodata Computer Corporation Data processing system
US4414624A (en) 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4524415A (en) 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4597061B1 (en) 1983-01-03 1998-06-09 Texas Instruments Inc Memory system using pipleline circuitry for improved system
US4577273A (en) 1983-06-06 1986-03-18 Sperry Corporation Multiple microcomputer system for digital computers
US4682281A (en) 1983-08-30 1987-07-21 Amdahl Corporation Data storage unit employing translation lookaside buffer pointer
US4633434A (en) 1984-04-02 1986-12-30 Sperry Corporation High performance storage unit
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
JPS6140643A (ja) 1984-07-31 1986-02-26 Hitachi Ltd System resource allocation control scheme
US4835680A (en) 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
JPS6289149A (ja) 1985-10-15 1987-04-23 Agency Of Ind Science & Technol Multi-port memory system
JPH0658650B2 (ja) 1986-03-14 1994-08-03 Hitachi, Ltd. Virtual computer system
US4920477A (en) 1987-04-20 1990-04-24 Multiflow Computer, Inc. Virtual address table look aside buffer miss recovery method and apparatus
US4943909A (en) 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
US5339398A (en) 1989-07-31 1994-08-16 North American Philips Corporation Memory architecture and method of data organization optimized for hashing
US5471593A (en) 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
US5317754A (en) 1990-10-23 1994-05-31 International Business Machines Corporation Method and apparatus for enabling an interpretive execution subset
US5317705A (en) 1990-10-24 1994-05-31 International Business Machines Corporation Apparatus and method for TLB purge reduction in a multi-level machine system
US6282583B1 (en) 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
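JPH0820949B2 (ja) 1991-11-26 1996-03-04 Matsushita Electric Industrial Co., Ltd. Information processing device
JPH07502358A (ja) 1991-12-23 1995-03-09 Intel Corporation Interleaved cache for multiple accesses per microprocessor clock
KR100309566B1 (ko) 1992-04-29 2001-12-15 리패치 Method and apparatus for grouping multiple instructions, issuing the grouped instructions simultaneously, and executing the grouped instructions in a pipelined processor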
EP0638183B1 (en) 1992-05-01 1997-03-05 Seiko Epson Corporation A system and method for retiring instructions in a superscalar microprocessor
EP0576262B1 (en) 1992-06-25 2000-08-23 Canon Kabushiki Kaisha Apparatus for multiplying integers of many figures
JPH0637202A (ja) 1992-07-20 1994-02-10 Mitsubishi Electric Corp Package for microwave IC
JPH06110781A (ja) 1992-09-30 1994-04-22 Nec Corp Cache memory device
US5493660A (en) 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5819088A (en) 1993-03-25 1998-10-06 Intel Corporation Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer
JPH0784883A (ja) 1993-09-17 1995-03-31 Hitachi Ltd Method for purging address translation buffer of virtual computer system
US6948172B1 (en) 1993-09-21 2005-09-20 Microsoft Corporation Preemptive multi-tasking with cooperative groups of tasks
US5469376A (en) 1993-10-14 1995-11-21 Abdallah; Mohammad A. F. F. Digital circuit for the evaluation of mathematical expressions
US5517651A (en) 1993-12-29 1996-05-14 Intel Corporation Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
US5956753A (en) 1993-12-30 1999-09-21 Intel Corporation Method and apparatus for handling speculative memory access operations
US5761476A (en) 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
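JPH07287668A (ja) 1994-04-19 1995-10-31 Hitachi Ltd Data processing device
CN1084005C (zh) 1994-06-27 2002-05-01 International Business Machines Corporation Method and apparatus for dynamically controlling address space allocation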
US5548742A (en) 1994-08-11 1996-08-20 Intel Corporation Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory
US5813031A (en) 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US5640534A (en) 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5835951A (en) 1994-10-18 1998-11-10 National Semiconductor Branch processing unit with target cache read prioritization protocol for handling multiple hits
JP3569014B2 (ja) 1994-11-25 2004-09-22 Fujitsu Limited Processor and processing method supporting multiple contexts
US5724565A (en) 1995-02-03 1998-03-03 International Business Machines Corporation Method and system for processing first and second sets of instructions by first and second types of processing systems
US5644742A (en) 1995-02-14 1997-07-01 Hal Computer Systems, Inc. Processor structure and method for a time-out checkpoint
US5675759A (en) 1995-03-03 1997-10-07 Shebanow; Michael C. Method and apparatus for register management using issue sequence prior physical register and register association validity information
US5634068A (en) 1995-03-31 1997-05-27 Sun Microsystems, Inc. Packet switched cache coherent multiprocessor system
US5751982A (en) 1995-03-31 1998-05-12 Apple Computer, Inc. Software emulation system with dynamic translation of emulated instructions for increased processing speed
US6209085B1 (en) 1995-05-05 2001-03-27 Intel Corporation Method and apparatus for performing process switching in multiprocessor computer systems
US6643765B1 (en) 1995-08-16 2003-11-04 Microunity Systems Engineering, Inc. Programmable processor with group floating point operations
US5710902A (en) 1995-09-06 1998-01-20 Intel Corporation Instruction dependency chain indentifier
US6341324B1 (en) 1995-10-06 2002-01-22 Lsi Logic Corporation Exception processing in superscalar microprocessor
US5864657A (en) 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5983327A (en) 1995-12-01 1999-11-09 Nortel Networks Corporation Data path architecture and arbitration scheme for providing access to a shared system resource
US5793941A (en) 1995-12-04 1998-08-11 Advanced Micro Devices, Inc. On-chip primary cache testing circuit and test method
US5911057A (en) 1995-12-19 1999-06-08 Texas Instruments Incorporated Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods
US5699537A (en) 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US6882177B1 (en) 1996-01-10 2005-04-19 Altera Corporation Tristate structures for programmable logic devices
US5754818A (en) 1996-03-22 1998-05-19 Sun Microsystems, Inc. Architecture and method for sharing TLB entries through process IDS
US5904892A (en) 1996-04-01 1999-05-18 Saint-Gobain/Norton Industrial Ceramics Corp. Tape cast silicon carbide dummy wafer
US5752260A (en) 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US5806085A (en) 1996-05-01 1998-09-08 Sun Microsystems, Inc. Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database
US5829028A (en) 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
US6108769A (en) 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5881277A (en) 1996-06-13 1999-03-09 Texas Instruments Incorporated Pipelined microprocessor with branch misprediction cache circuits, systems and methods
US5860146A (en) 1996-06-25 1999-01-12 Sun Microsystems, Inc. Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces
US5903760A (en) 1996-06-27 1999-05-11 Intel Corporation Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA
US5974506A (en) 1996-06-28 1999-10-26 Digital Equipment Corporation Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system
US6167490A (en) 1996-09-20 2000-12-26 University Of Washington Using global memory information to manage memory in a computer network
KR19980032776A (ko) 1996-10-16 1998-07-25 Tsutomu Kanai Data processor and data processing system
KR19990076967A (ko) 1996-11-04 1999-10-25 J.G.A. Rolfes Processing device and reading of instructions in memory
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5903750A (en) 1996-11-20 1999-05-11 Institute For The Development Of Emerging Architectures, L.L.P. Dynamic branch prediction for branch instructions with multiple targets
US6212542B1 (en) 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
US6134634A (en) 1996-12-20 2000-10-17 Texas Instruments Incorporated Method and apparatus for preemptive cache write-back
US5918251A (en) 1996-12-23 1999-06-29 Intel Corporation Method and apparatus for preloading different default address translation attributes
US6016540A (en) 1997-01-08 2000-01-18 Intel Corporation Method and apparatus for scheduling instructions in waves
US6065105A (en) 1997-01-08 2000-05-16 Intel Corporation Dependency matrix
US5802602A (en) 1997-01-17 1998-09-01 Intel Corporation Method and apparatus for performing reads of related data from a set-associative cache memory
US6088780A (en) 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
JPH1124929A (ja) 1997-06-30 1999-01-29 Sony Corp Arithmetic processing device and method thereof
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6170051B1 (en) 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6101577A (en) 1997-09-15 2000-08-08 Advanced Micro Devices, Inc. Pipelined instruction cache and branch prediction mechanism therefor
US5901294A (en) 1997-09-18 1999-05-04 International Business Machines Corporation Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access
US6185660B1 (en) 1997-09-23 2001-02-06 Hewlett-Packard Company Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
US5905509A (en) 1997-09-30 1999-05-18 Compaq Computer Corp. Accelerated Graphics Port two level Gart cache having distributed first level caches
US6226732B1 (en) 1997-10-02 2001-05-01 Hitachi Micro Systems, Inc. Memory system architecture
US5922065A (en) 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
US6178482B1 (en) 1997-11-03 2001-01-23 Brecis Communications Virtual register sets
US6021484A (en) 1997-11-14 2000-02-01 Samsung Electronics Co., Ltd. Dual instruction set architecture
US6256728B1 (en) 1997-11-17 2001-07-03 Advanced Micro Devices, Inc. Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
US6260131B1 (en) 1997-11-18 2001-07-10 Intrinsity, Inc. Method and apparatus for TLB memory ordering
US6016533A (en) 1997-12-16 2000-01-18 Advanced Micro Devices, Inc. Way prediction logic for cache array
US6219776B1 (en) 1998-03-10 2001-04-17 Billions Of Operations Per Second Merged array controller and processing element
US6609189B1 (en) 1998-03-12 2003-08-19 Yale University Cycle segmented prefix circuits
JP3657424B2 (ja) 1998-03-20 2005-06-08 Matsushita Electric Industrial Co., Ltd. Center device and terminal device for broadcasting program information
US6216215B1 (en) 1998-04-02 2001-04-10 Intel Corporation Method and apparatus for senior loads
US6157998A (en) 1998-04-03 2000-12-05 Motorola Inc. Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers
US6205545B1 (en) 1998-04-30 2001-03-20 Hewlett-Packard Company Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance
US6115809A (en) 1998-04-30 2000-09-05 Hewlett-Packard Company Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction
US6256727B1 (en) 1998-05-12 2001-07-03 International Business Machines Corporation Method and system for fetching noncontiguous instructions in a single clock cycle
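JPH11338710A (ja) 1998-05-28 1999-12-10 Toshiba Corp Compiling method and apparatus for a processor having multiple types of instruction sets, and recording medium on which the method is programmed and recorded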
US6272616B1 (en) 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6988183B1 (en) 1998-06-26 2006-01-17 Derek Chi-Lan Wong Methods for increasing instruction-level parallelism in microprocessors and digital system
US6260138B1 (en) 1998-07-17 2001-07-10 Sun Microsystems, Inc. Method and apparatus for branch instruction processing in a processor
US6122656A (en) 1998-07-31 2000-09-19 Advanced Micro Devices, Inc. Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US6272662B1 (en) 1998-08-04 2001-08-07 International Business Machines Corporation Distributed storage system using front-end and back-end locking
JP2000057054A (ja) 1998-08-12 2000-02-25 Fujitsu Ltd High-speed address translation system
US8631066B2 (en) 1998-09-10 2014-01-14 Vmware, Inc. Mechanism for providing virtual machines for use by multiple users
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6332189B1 (en) 1998-10-16 2001-12-18 Intel Corporation Branch prediction architecture
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP3110404B2 (ja) 1998-11-18 2000-11-20 NEC Kofu, Ltd. Microprocessor device, method for accelerating its software instructions, and recording medium recording its control program
US6490673B1 (en) 1998-11-27 2002-12-03 Matsushita Electric Industrial Co., Ltd Processor, compiling apparatus, and compile program recorded on a recording medium
US6519682B2 (en) 1998-12-04 2003-02-11 Stmicroelectronics, Inc. Pipelined non-blocking level two cache system with inherent transaction collision-avoidance
US6049501A (en) * 1998-12-14 2000-04-11 Motorola, Inc. Memory data bus architecture and method of configuring multi-wide word memories
US6477562B2 (en) 1998-12-16 2002-11-05 Clearwater Networks, Inc. Prioritized instruction scheduling for multi-streaming processors
US7020879B1 (en) 1998-12-16 2006-03-28 Mips Technologies, Inc. Interrupt and exception handling for multi-streaming digital processors
US6247097B1 (en) 1999-01-22 2001-06-12 International Business Machines Corporation Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions
US6321298B1 (en) 1999-01-25 2001-11-20 International Business Machines Corporation Full cache coherency across multiple raid controllers
JP3842474B2 (ja) 1999-02-02 2006-11-08 Renesas Technology Corp. Data processing device
US6327650B1 (en) 1999-02-12 2001-12-04 Vsli Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6668316B1 (en) 1999-02-17 2003-12-23 Elbrus International Limited Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US6732220B2 (en) 1999-02-17 2004-05-04 Elbrus International Method for emulating hardware features of a foreign architecture in a host operating system environment
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6437789B1 (en) 1999-02-19 2002-08-20 Evans & Sutherland Computer Corporation Multi-level cache controller
US6212613B1 (en) 1999-03-22 2001-04-03 Cisco Technology, Inc. Methods and apparatus for reusing addresses in a computer
US6529928B1 (en) 1999-03-23 2003-03-04 Silicon Graphics, Inc. Floating-point adder performing floating-point and integer operations
US6449671B1 (en) 1999-06-09 2002-09-10 Ati International Srl Method and apparatus for busing data elements
US6473833B1 (en) * 1999-07-30 2002-10-29 International Business Machines Corporation Integrated cache and directory structure for multi-level caches
US6643770B1 (en) 1999-09-16 2003-11-04 Intel Corporation Branch misprediction recovery using a side memory
US6704822B1 (en) 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US6772325B1 (en) 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6457120B1 (en) 1999-11-01 2002-09-24 International Business Machines Corporation Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions
US7107434B2 (en) 1999-12-20 2006-09-12 Board Of Regents, The University Of Texas System, method and apparatus for allocating hardware resources using pseudorandom sequences
JP4693326B2 (ja) 1999-12-22 2011-06-01 Ubicom, Inc. System and method for instruction-level multithreading in an embedded processor using zero-time context switching
US6557095B1 (en) 1999-12-27 2003-04-29 Intel Corporation Scheduling operations using a dependency matrix
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
JP2003519833A (ja) 2000-01-03 2003-06-24 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6594755B1 (en) 2000-01-04 2003-07-15 National Semiconductor Corporation System and method for interleaved execution of multiple independent threads
US6728872B1 (en) 2000-02-04 2004-04-27 International Business Machines Corporation Method and apparatus for verifying that instructions are pipelined in correct architectural sequence
GB0002848D0 (en) 2000-02-08 2000-03-29 Siroyan Limited Communicating instruction results in processors and compiling methods for processors
GB2365661A (en) 2000-03-10 2002-02-20 British Telecomm Allocating switch requests within a packet switch
US6615340B1 (en) 2000-03-22 2003-09-02 Wilmot, Ii Richard Byron Extended operand management indicator structure and method
US6604187B1 (en) 2000-06-19 2003-08-05 Advanced Micro Devices, Inc. Providing global translations with address space numbers
US6557083B1 (en) 2000-06-30 2003-04-29 Intel Corporation Memory system for multiple data types
US6704860B1 (en) 2000-07-26 2004-03-09 International Business Machines Corporation Data processing system and method for fetching instruction blocks in response to a detected block sequence
US7206925B1 (en) 2000-08-18 2007-04-17 Sun Microsystems, Inc. Backing Register File for processors
US6728866B1 (en) 2000-08-31 2004-04-27 International Business Machines Corporation Partitioned issue queue and allocation strategy
US6721874B1 (en) 2000-10-12 2004-04-13 International Business Machines Corporation Method and system for dynamically shared completion table supporting multiple threads in a processing system
US7757065B1 (en) 2000-11-09 2010-07-13 Intel Corporation Instruction segment recording scheme
US6907600B2 (en) 2000-12-27 2005-06-14 Intel Corporation Virtual translation lookaside buffer
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
FR2820921A1 (fr) 2001-02-14 2002-08-16 Canon Kk Device and method for transmission in a switch
US6985951B2 (en) 2001-03-08 2006-01-10 International Business Machines Corporation Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment
US6950927B1 (en) 2001-04-13 2005-09-27 The United States Of America As Represented By The Secretary Of The Navy System and method for instruction-level parallelism in a programmable multiple network processor environment
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US6658549B2 (en) 2001-05-22 2003-12-02 Hewlett-Packard Development Company, L.P. Method and system allowing a single entity to manage memory comprising compressed and uncompressed data
US6985591B2 (en) 2001-06-29 2006-01-10 Intel Corporation Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media
US7024545B1 (en) 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US6954846B2 (en) 2001-08-07 2005-10-11 Sun Microsystems, Inc. Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode
US6718440B2 (en) 2001-09-28 2004-04-06 Intel Corporation Memory access latency hiding with hint buffer
US7150021B1 (en) 2001-10-12 2006-12-12 Palau Acquisition Corporation (Delaware) Method and system to allocate resources within an interconnect device according to a resource allocation table
US7117347B2 (en) 2001-10-23 2006-10-03 Ip-First, Llc Processor including fallback branch prediction mechanism for far jump and far call instructions
US7272832B2 (en) 2001-10-25 2007-09-18 Hewlett-Packard Development Company, L.P. Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform
US6964043B2 (en) 2001-10-30 2005-11-08 Intel Corporation Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code
GB2381886B (en) 2001-11-07 2004-06-23 Sun Microsystems Inc Computer system with virtual memory and paging mechanism
US7092869B2 (en) 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US7080169B2 (en) * 2001-12-11 2006-07-18 Emulex Design & Manufacturing Corporation Receiving data from interleaved multiple concurrent transactions in a FIFO memory having programmable buffer zones
US7363467B2 (en) 2002-01-03 2008-04-22 Intel Corporation Dependence-chain processing using trace descriptors having dependency descriptors
US6640333B2 (en) 2002-01-10 2003-10-28 Lsi Logic Corporation Architecture for a sea of platforms
US7055021B2 (en) 2002-02-05 2006-05-30 Sun Microsystems, Inc. Out-of-order processor that reduces mis-speculation using a replay scoreboard
US7331040B2 (en) 2002-02-06 2008-02-12 Transitive Limited Condition code flag emulation for program code conversion
US6839816B2 (en) 2002-02-26 2005-01-04 International Business Machines Corporation Shared cache line update mechanism
US6731292B2 (en) 2002-03-06 2004-05-04 Sun Microsystems, Inc. System and method for controlling a number of outstanding data transactions within an integrated circuit
JP3719509B2 (ja) 2002-04-01 2005-11-24 Sony Computer Entertainment Inc. Serial arithmetic pipeline, arithmetic unit, arithmetic logic circuit, and arithmetic method using a serial arithmetic pipeline
US7565509B2 (en) 2002-04-17 2009-07-21 Microsoft Corporation Using limits on address translation to control access to an addressable entity
US6920530B2 (en) 2002-04-23 2005-07-19 Sun Microsystems, Inc. Scheme for reordering instructions via an instruction caching mechanism
US7113488B2 (en) 2002-04-24 2006-09-26 International Business Machines Corporation Reconfigurable circular bus
US7281055B2 (en) 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US7117346B2 (en) 2002-05-31 2006-10-03 Freescale Semiconductor, Inc. Data processing system having multiple register contexts and method therefor
US6938151B2 (en) 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US8024735B2 (en) 2002-06-14 2011-09-20 Intel Corporation Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution
JP3845043B2 (ja) 2002-06-28 2006-11-15 Fujitsu Limited Instruction fetch control device
JP3982353B2 (ja) 2002-07-12 2007-09-26 NEC Corporation Fault-tolerant computer device, resynchronization method therefor, and resynchronization program
US6944744B2 (en) 2002-08-27 2005-09-13 Advanced Micro Devices, Inc. Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor
US6950925B1 (en) 2002-08-28 2005-09-27 Advanced Micro Devices, Inc. Scheduler for use in a microprocessor that supports data-speculative execution
US7546422B2 (en) 2002-08-28 2009-06-09 Intel Corporation Method and apparatus for the synchronization of distributed caches
US6895491B2 (en) 2002-09-26 2005-05-17 Hewlett-Packard Development Company, L.P. Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching
US7334086B2 (en) 2002-10-08 2008-02-19 Rmi Corporation Advanced processor with system on a chip interconnect technology
US7213248B2 (en) 2002-10-10 2007-05-01 International Business Machines Corporation High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US6829698B2 (en) 2002-10-10 2004-12-07 International Business Machines Corporation Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction
US7222218B2 (en) 2002-10-22 2007-05-22 Sun Microsystems, Inc. System and method for goal-based scheduling of blocks of code for concurrent execution
US20040103251A1 (en) 2002-11-26 2004-05-27 Mitchell Alsup Microprocessor including a first level cache and a second level cache having different cache line sizes
EP1570334A2 (en) 2002-12-04 2005-09-07 Koninklijke Philips Electronics N.V. Register file gating to reduce microprocessor power dissipation
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US7073042B2 (en) 2002-12-12 2006-07-04 Intel Corporation Reclaiming existing fields in address translation data structures to extend control over memory accesses
US20040117594A1 (en) 2002-12-13 2004-06-17 Vanderspek Julius Memory management method
US20040122887A1 (en) 2002-12-20 2004-06-24 Macy William W. Efficient multiplication of small matrices using SIMD registers
US7191349B2 (en) 2002-12-26 2007-03-13 Intel Corporation Mechanism for processor power state aware distribution of lowest priority interrupt
US20040139441A1 (en) 2003-01-09 2004-07-15 Kabushiki Kaisha Toshiba Processor, arithmetic operation processing method, and priority determination method
US6925421B2 (en) 2003-01-09 2005-08-02 International Business Machines Corporation Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources
US7178010B2 (en) 2003-01-16 2007-02-13 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
US7089374B2 (en) 2003-02-13 2006-08-08 Sun Microsystems, Inc. Selectively unmarking load-marked cache lines during transactional program execution
US7278030B1 (en) 2003-03-03 2007-10-02 Vmware, Inc. Virtualization system for computers having multiple protection mechanisms
US6912644B1 (en) 2003-03-06 2005-06-28 Intel Corporation Method and apparatus to steer memory access operations in a virtual memory system
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7143273B2 (en) 2003-03-31 2006-11-28 Intel Corporation Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history
CN1214666C (zh) 2003-04-07 2005-08-10 Huawei Technologies Co., Ltd. Method for limiting location information request traffic in a location service
US7058764B2 (en) 2003-04-14 2006-06-06 Hewlett-Packard Development Company, L.P. Method of adaptive cache partitioning to increase host I/O performance
US7469407B2 (en) 2003-04-24 2008-12-23 International Business Machines Corporation Method for resource balancing using dispatch flush in a simultaneous multithread processor
EP1471421A1 (en) 2003-04-24 2004-10-27 STMicroelectronics Limited Speculative load instruction control
US7139855B2 (en) 2003-04-24 2006-11-21 International Business Machines Corporation High performance synchronization of resource allocation in a logically-partitioned system
US7290261B2 (en) 2003-04-24 2007-10-30 International Business Machines Corporation Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
US7055003B2 (en) 2003-04-25 2006-05-30 International Business Machines Corporation Data cache scrub mechanism for large L2/L3 data cache structures
US7007108B2 (en) 2003-04-30 2006-02-28 Lsi Logic Corporation System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address
JP2007519052A (ja) 2003-06-25 2007-07-12 Koninklijke Philips Electronics N.V. Instruction-controlled data processing device
JP2005032018A (ja) 2003-07-04 2005-02-03 Semiconductor Energy Lab Co Ltd Microprocessor using a genetic algorithm
US7149872B2 (en) 2003-07-10 2006-12-12 Transmeta Corporation System and method for identifying TLB entries associated with a physical address of a specified range
US7089398B2 (en) 2003-07-31 2006-08-08 Silicon Graphics, Inc. Address translation using a page size tag
US8296771B2 (en) 2003-08-18 2012-10-23 Cray Inc. System and method for mapping between resource consumers and resource providers in a computing system
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
JP4818919B2 (ja) 2003-08-28 2011-11-16 MIPS Technologies, Inc. Integrated mechanism for suspending and deallocating computational threads of execution in a processor
US9032404B2 (en) 2003-08-28 2015-05-12 Mips Technologies, Inc. Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor
US7594089B2 (en) 2003-08-28 2009-09-22 Mips Technologies, Inc. Smart memory based synchronization controller for a multi-threaded multiprocessor SoC
US7849297B2 (en) 2003-08-28 2010-12-07 Mips Technologies, Inc. Software emulation of directed exceptions in a multithreading processor
US7111126B2 (en) 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
JP4057989B2 (ja) 2003-09-26 2008-03-05 Toshiba Corporation Scheduling method and information processing system
FR2860313B1 (fr) 2003-09-30 2005-11-04 Commissariat Energie Atomique Component with a dynamically reconfigurable architecture
US7373637B2 (en) 2003-09-30 2008-05-13 International Business Machines Corporation Method and apparatus for counting instruction and memory location ranges
US7047322B1 (en) 2003-09-30 2006-05-16 Unisys Corporation System and method for performing conflict resolution and flow control in a multiprocessor system
TWI281121B (en) 2003-10-06 2007-05-11 Ip First Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US8407433B2 (en) 2007-06-25 2013-03-26 Sonics, Inc. Interconnect implementing internal controls
US7395372B2 (en) 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
US7243170B2 (en) 2003-11-24 2007-07-10 International Business Machines Corporation Method and circuit for reading and writing an instruction buffer
US20050120191A1 (en) 2003-12-02 2005-06-02 Intel Corporation (A Delaware Corporation) Checkpoint-based register reclamation
US20050132145A1 (en) 2003-12-15 2005-06-16 Finisar Corporation Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access
US7310722B2 (en) 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7293164B2 (en) 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US20050204118A1 (en) 2004-02-27 2005-09-15 National Chiao Tung University Method for inter-cluster communication that employs register permutation
US20050216920A1 (en) * 2004-03-24 2005-09-29 Vijay Tewari Use of a virtual machine to emulate a hardware device
US7386679B2 (en) * 2004-04-15 2008-06-10 International Business Machines Corporation System, method and storage medium for memory management
US7383427B2 (en) 2004-04-22 2008-06-03 Sony Computer Entertainment Inc. Multi-scalar extension for SIMD instruction set processors
US20050251649A1 (en) 2004-04-23 2005-11-10 Sony Computer Entertainment Inc. Methods and apparatus for address map optimization on a multi-scalar extension
US7418582B1 (en) 2004-05-13 2008-08-26 Sun Microsystems, Inc. Versatile register file design for a multi-threaded processor utilizing different modes and register windows
US7478198B2 (en) 2004-05-24 2009-01-13 Intel Corporation Multithreaded clustered microarchitecture with dynamic back-end assignment
US7594234B1 (en) 2004-06-04 2009-09-22 Sun Microsystems, Inc. Adaptive spin-then-block mutual exclusion in multi-threaded processing
US7284092B2 (en) 2004-06-24 2007-10-16 International Business Machines Corporation Digital data processing apparatus having multi-level register file
US20050289530A1 (en) 2004-06-29 2005-12-29 Robison Arch D Scheduling of instructions in program compilation
EP1628235A1 (en) 2004-07-01 2006-02-22 Texas Instruments Incorporated Method and system of ensuring integrity of a secure mode entry sequence
US8044951B1 (en) 2004-07-02 2011-10-25 Nvidia Corporation Integer-based functionality in a graphics shading language
US7339592B2 (en) 2004-07-13 2008-03-04 Nvidia Corporation Simulating multiported memories using lower port count memories
US7398347B1 (en) 2004-07-14 2008-07-08 Altera Corporation Methods and apparatus for dynamic instruction controlled reconfigurable register file
EP1619593A1 (en) 2004-07-22 2006-01-25 Sap Ag Computer-Implemented method and system for performing a product availability check
JP4064380B2 (ja) 2004-07-29 2008-03-19 Fujitsu Limited Arithmetic processing device and control method therefor
US8443171B2 (en) 2004-07-30 2013-05-14 Hewlett-Packard Development Company, L.P. Run-time updating of prediction hint instructions
US7213106B1 (en) 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US7318143B2 (en) 2004-10-20 2008-01-08 Arm Limited Reuseable configuration data
US20090150890A1 (en) 2007-12-10 2009-06-11 Yourst Matt T Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system
US7707578B1 (en) 2004-12-16 2010-04-27 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system
US7257695B2 (en) 2004-12-28 2007-08-14 Intel Corporation Register file regions for a processing system
US7996644B2 (en) 2004-12-29 2011-08-09 Intel Corporation Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US7050922B1 (en) 2005-01-14 2006-05-23 Agilent Technologies, Inc. Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same
US7681014B2 (en) 2005-02-04 2010-03-16 Mips Technologies, Inc. Multithreading instruction scheduler employing thread group priorities
US7657891B2 (en) 2005-02-04 2010-02-02 Mips Technologies, Inc. Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency
JP2008530642A (ja) 2005-02-07 2008-08-07 PACT XPP Technologies AG Low-latency massively parallel data processing device
US7400548B2 (en) 2005-02-09 2008-07-15 International Business Machines Corporation Method for providing multiple reads/writes using a 2read/2write register file array
US7343476B2 (en) 2005-02-10 2008-03-11 International Business Machines Corporation Intelligent SMT thread hang detect taking into account shared resource contention/blocking
US7152155B2 (en) 2005-02-18 2006-12-19 Qualcomm Incorporated System and method of correcting a branch misprediction
US20060200655A1 (en) 2005-03-04 2006-09-07 Smith Rodney W Forward looking branch target address caching
US8195922B2 (en) 2005-03-18 2012-06-05 Marvell World Trade, Ltd. System for dynamically allocating processing time to multiple threads
US8522253B1 (en) 2005-03-31 2013-08-27 Guillermo Rozas Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US7313775B2 (en) 2005-04-06 2007-12-25 Lsi Corporation Integrated circuit with relocatable processor hardmac
US7447869B2 (en) * 2005-04-07 2008-11-04 Ati Technologies, Inc. Method and apparatus for fragment processing in a virtual memory system
US20060230409A1 (en) 2005-04-07 2006-10-12 Matteo Frigo Multithreaded processor architecture with implicit granularity adaptation
US8230423B2 (en) 2005-04-07 2012-07-24 International Business Machines Corporation Multithreaded processor architecture with operational latency hiding
US20060230253A1 (en) 2005-04-11 2006-10-12 Lucian Codrescu Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment
US20060236074A1 (en) 2005-04-14 2006-10-19 Arm Limited Indicating storage locations within caches
US7461237B2 (en) 2005-04-20 2008-12-02 Sun Microsystems, Inc. Method and apparatus for suppressing duplicative prefetches for branch target cache lines
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
US7861055B2 (en) 2005-06-07 2010-12-28 Broadcom Corporation Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
CN101263465B (zh) 2005-09-14 2011-11-09 Koninklijke Philips Electronics N.V. Method and system for bus arbitration
US7562271B2 (en) * 2005-09-26 2009-07-14 Rambus Inc. Memory system topologies including a buffer device and an integrated circuit memory device
US7350056B2 (en) 2005-09-27 2008-03-25 International Business Machines Corporation Method and apparatus for issuing instructions from an issue queue in an information handling system
US7606975B1 (en) 2005-09-28 2009-10-20 Sun Microsystems, Inc. Trace cache for efficient self-modifying code processing
US7231106B2 (en) 2005-09-30 2007-06-12 Lucent Technologies Inc. Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host
US7681019B1 (en) 2005-11-18 2010-03-16 Sun Microsystems, Inc. Executing functions determined via a collection of operations from translated instructions
US7861060B1 (en) 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7770161B2 (en) * 2005-12-28 2010-08-03 International Business Machines Corporation Post-register allocation profile directed instruction scheduling
US8423682B2 (en) 2005-12-30 2013-04-16 Intel Corporation Address space emulation
GB2435362B (en) 2006-02-20 2008-11-26 Cramer Systems Ltd Method of configuring devices in a telecommunications network
JP4332205B2 (ja) 2006-02-27 2009-09-16 Fujitsu Limited Cache control device and cache control method
US8327115B2 (en) 2006-04-12 2012-12-04 Soft Machines, Inc. Plural matrices of execution units for processing matrices of row dependent instructions in single clock cycle in super or separate mode
US7610571B2 (en) 2006-04-14 2009-10-27 Cadence Design Systems, Inc. Method and system for simulating state retention of an RTL design
US7577820B1 (en) 2006-04-14 2009-08-18 Tilera Corporation Managing data in a parallel processing environment
CN100485636C (zh) 2006-04-24 2009-05-06 Huawei Technologies Co., Ltd. Debugging method and apparatus for model-driven carrier-grade service development
US7804076B2 (en) 2006-05-10 2010-09-28 Taiwan Semiconductor Manufacturing Co., Ltd Insulator for high current ion implanters
US8145882B1 (en) 2006-05-25 2012-03-27 Mips Technologies, Inc. Apparatus and method for processing template based user defined instructions
US20080126771A1 (en) 2006-07-25 2008-05-29 Lei Chen Branch Target Extension for an Instruction Cache
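CN100495324C (zh) 2006-07-27 2009-06-03 Institute of Computing Technology, Chinese Academy of Sciences Depth-first exception handling method in a complex instruction set architecture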
US8046775B2 (en) 2006-08-14 2011-10-25 Marvell World Trade Ltd. Event-based bandwidth allocation mode switching method and apparatus
US7904704B2 (en) 2006-08-14 2011-03-08 Marvell World Trade Ltd. Instruction dispatching method and apparatus
US7539842B2 (en) 2006-08-15 2009-05-26 International Business Machines Corporation Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US7594060B2 (en) 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
US7752474B2 (en) 2006-09-22 2010-07-06 Apple Inc. L1 cache flush when processor is entering low power mode
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US7774549B2 (en) 2006-10-11 2010-08-10 Mips Technologies, Inc. Horizontally-shared cache victims in multiple core processors
TWI337495B (en) 2006-10-26 2011-02-11 Au Optronics Corp System and method for operation scheduling
US7680988B1 (en) 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
EP2527972A3 (en) 2006-11-14 2014-08-06 Soft Machines, Inc. Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes
US7493475B2 (en) 2006-11-15 2009-02-17 Stmicroelectronics, Inc. Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address
US7934179B2 (en) 2006-11-20 2011-04-26 Et International, Inc. Systems and methods for logic verification
US20080235500A1 (en) 2006-11-21 2008-09-25 Davis Gordon T Structure for instruction cache trace formation
JP2008130056A (ja) 2006-11-27 2008-06-05 Renesas Technology Corp Semiconductor circuit
WO2008077088A2 (en) 2006-12-19 2008-06-26 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations System and method for branch misprediction prediction using complementary branch predictors
US7783869B2 (en) 2006-12-19 2010-08-24 Arm Limited Accessing branch predictions ahead of instruction fetching
EP1940028B1 (en) 2006-12-29 2012-02-29 STMicroelectronics Srl Asynchronous interconnection system for 3D inter-chip communication
US8321849B2 (en) 2007-01-26 2012-11-27 Nvidia Corporation Virtual architecture and instruction set for parallel thread computing
TW200833002A (en) 2007-01-31 2008-08-01 Univ Nat Yunlin Sci & Tech Distributed switching circuit having fairness
US20080189501A1 (en) 2007-02-05 2008-08-07 Irish John D Methods and Apparatus for Issuing Commands on a Bus
US7685410B2 (en) 2007-02-13 2010-03-23 Global Foundries Inc. Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US7647483B2 (en) 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
JP4980751B2 (ja) 2007-03-02 2012-07-18 Fujitsu Semiconductor Limited Data processing device and memory read-active control method
US8452907B2 (en) 2007-03-27 2013-05-28 Arm Limited Data processing apparatus and method for arbitrating access to a shared resource
US20080250227A1 (en) 2007-04-04 2008-10-09 Linderman Michael D General Purpose Multiprocessor Programming Apparatus And Method
US7716183B2 (en) 2007-04-11 2010-05-11 Dot Hill Systems Corporation Snapshot preserved data cloning
US7941791B2 (en) 2007-04-13 2011-05-10 Perry Wang Programming environment for heterogeneous processor resource integration
US7769955B2 (en) 2007-04-27 2010-08-03 Arm Limited Multiple thread instruction fetch from different cache levels
US7711935B2 (en) 2007-04-30 2010-05-04 Netlogic Microsystems, Inc. Universal branch identifier for invalidation of speculative instructions
US8555039B2 (en) 2007-05-03 2013-10-08 Qualcomm Incorporated System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor
US8219996B1 (en) 2007-05-09 2012-07-10 Hewlett-Packard Development Company, L.P. Computer processor with fairness monitor
CN101344840B (zh) 2007-07-10 2011-08-31 苏州简约纳电子有限公司 Microprocessor and method for executing instructions in a microprocessor
US7937568B2 (en) 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090025004A1 (en) 2007-07-16 2009-01-22 Microsoft Corporation Scheduling by Growing and Shrinking Resource Allocation
US8433851B2 (en) * 2007-08-16 2013-04-30 International Business Machines Corporation Reducing wiring congestion in a cache subsystem utilizing sectored caches with discontiguous addressing
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7711929B2 (en) * 2007-08-30 2010-05-04 International Business Machines Corporation Method and system for tracking instruction dependency in an out-of-order processor
US8725991B2 (en) 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US8082420B2 (en) 2007-10-24 2011-12-20 International Business Machines Corporation Method and apparatus for executing instructions
US7856530B1 (en) 2007-10-31 2010-12-21 Network Appliance, Inc. System and method for implementing a dynamic cache for a data storage system
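CN100478918C (zh) * 2007-10-31 2009-04-15 National University of Defense Technology of the Chinese People's Liberation Army Design method for a segmented cache in a microprocessor, and segmented cache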
US7877559B2 (en) 2007-11-26 2011-01-25 Globalfoundries Inc. Mechanism to accelerate removal of store operations from a queue
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7809925B2 (en) * 2007-12-07 2010-10-05 International Business Machines Corporation Processing unit incorporating vectorizable execution unit
US8145844B2 (en) 2007-12-13 2012-03-27 Arm Limited Memory controller with write data cache and read data cache
US7870371B2 (en) 2007-12-17 2011-01-11 Microsoft Corporation Target-frequency based indirect jump prediction for high-performance processors
US7831813B2 (en) 2007-12-17 2010-11-09 Globalfoundries Inc. Uses of known good code for implementing processor architectural modifications
US20090165007A1 (en) 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US8782384B2 (en) 2007-12-20 2014-07-15 Advanced Micro Devices, Inc. Branch history with polymorphic indirect branch information
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
US8645965B2 (en) 2007-12-31 2014-02-04 Intel Corporation Supporting metered clients with manycore through time-limited partitioning
US9244855B2 (en) 2007-12-31 2016-01-26 Intel Corporation Method, system, and apparatus for page sizing extension
US7877582B2 (en) 2008-01-31 2011-01-25 International Business Machines Corporation Multi-addressable register file
WO2009101563A1 (en) 2008-02-11 2009-08-20 Nxp B.V. Multiprocessing implementing a plurality of virtual processors
US7987343B2 (en) 2008-03-19 2011-07-26 International Business Machines Corporation Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8120608B2 (en) 2008-04-04 2012-02-21 Via Technologies, Inc. Constant buffering for a computational core of a programmable graphics processing unit
TWI364703B (en) 2008-05-26 2012-05-21 Faraday Tech Corp Processor and early execution method of data load thereof
US8145880B1 (en) 2008-07-07 2012-03-27 Ovics Matrix processor data switch routing systems and methods
JP5733860B2 (ja) 2008-07-10 2015-06-10 Rocketick Technologies Ltd. Efficient parallel computation of dependency problems
JP2010039536A (ja) * 2008-07-31 2010-02-18 Panasonic Corp Program conversion device, program conversion method, and program conversion program
US8316435B1 (en) 2008-08-14 2012-11-20 Juniper Networks, Inc. Routing device having integrated MPLS-aware firewall with virtual security system support
US8135942B2 (en) 2008-08-28 2012-03-13 International Business Machines Corporation System and method for double-issue instructions using a dependency matrix and a side issue queue
US7769984B2 (en) 2008-09-11 2010-08-03 International Business Machines Corporation Dual-issuance of microprocessor instructions using dual dependency matrices
US8225048B2 (en) 2008-10-01 2012-07-17 Hewlett-Packard Development Company, L.P. Systems and methods for resource access
US9244732B2 (en) 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
US7941616B2 (en) 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
GB2464703A (en) 2008-10-22 2010-04-28 Advanced Risc Mach Ltd An array of interconnected processors executing a cycle-based program
US8423749B2 (en) 2008-10-22 2013-04-16 International Business Machines Corporation Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node
KR101374452B1 (ko) 2008-10-30 2014-03-17 Nokia Corporation Method and apparatus for interleaving data blocks
US8032678B2 (en) 2008-11-05 2011-10-04 Mediatek Inc. Shared resource arbitration
US7848129B1 (en) 2008-11-20 2010-12-07 Netlogic Microsystems, Inc. Dynamically partitioned CAM array
US8868838B1 (en) 2008-11-21 2014-10-21 Nvidia Corporation Multi-class data cache policies
US8171223B2 (en) 2008-12-03 2012-05-01 Intel Corporation Method and system to increase concurrency and control replication in a multi-core cache hierarchy
US8200949B1 (en) 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US8312268B2 (en) 2008-12-12 2012-11-13 International Business Machines Corporation Virtual machine
US8099586B2 (en) 2008-12-30 2012-01-17 Oracle America, Inc. Branch misprediction recovery mechanism for microprocessors
US20100169578A1 (en) 2008-12-31 2010-07-01 Texas Instruments Incorporated Cache tag memory
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
JP5417879B2 (ja) 2009-02-17 2014-02-19 Fujitsu Semiconductor Ltd. Cache device
US8505013B2 (en) 2010-03-12 2013-08-06 Lsi Corporation Reducing data read latency in a network communications processor architecture
US8805788B2 (en) 2009-05-04 2014-08-12 Moka5, Inc. Transactional virtual disk with differential snapshots
US8332854B2 (en) 2009-05-19 2012-12-11 Microsoft Corporation Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US8533437B2 (en) 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
GB2471067B (en) 2009-06-12 2011-11-30 Graeme Roy Smith Shared resource multi-thread array processor
US9122487B2 (en) 2009-06-23 2015-09-01 Oracle America, Inc. System and method for balancing instruction loads between multiple execution units using assignment history
CN101582025B (zh) 2009-06-25 2011-05-25 Zhejiang University Method for implementing a global register renaming table under a chip multiprocessor architecture
US8397049B2 (en) 2009-07-13 2013-03-12 Apple Inc. TLB prefetching
US8539486B2 (en) 2009-07-17 2013-09-17 International Business Machines Corporation Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode
JP5423217B2 (ja) 2009-08-04 2014-02-19 Fujitsu Ltd. Arithmetic processing device, information processing device, and method for controlling an arithmetic processing device
US8127078B2 (en) 2009-10-02 2012-02-28 International Business Machines Corporation High performance unaligned cache access
US20110082983A1 (en) 2009-10-06 2011-04-07 Alcatel-Lucent Canada, Inc. Cpu instruction and data cache corruption prevention system
US8695002B2 (en) 2009-10-20 2014-04-08 Lantiq Deutschland Gmbh Multi-threaded processors and multi-processor systems comprising shared resources
US8364933B2 (en) 2009-12-18 2013-01-29 International Business Machines Corporation Software assisted translation lookaside buffer search mechanism
JP2011150397A (ja) 2010-01-19 2011-08-04 Panasonic Corp Bus arbitration device
KR101699910B1 (ko) 2010-03-04 2017-01-26 Samsung Electronics Co., Ltd. Reconfigurable processor and control method thereof
US20120005462A1 (en) 2010-07-01 2012-01-05 International Business Machines Corporation Hardware Assist for Optimizing Code During Processing
US8312258B2 (en) 2010-07-22 2012-11-13 Intel Corporation Providing platform independent memory logic
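CN101916180B (zh) 2010-08-11 2013-05-29 Institute of Computing Technology, Chinese Academy of Sciences Method and system for executing register-type instructions in a RISC processor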
US8751745B2 (en) 2010-08-11 2014-06-10 Advanced Micro Devices, Inc. Method for concurrent flush of L1 and L2 caches
US8756329B2 (en) 2010-09-15 2014-06-17 Oracle International Corporation System and method for parallel multiplexing between servers in a cluster
US9201801B2 (en) 2010-09-15 2015-12-01 International Business Machines Corporation Computing device with asynchronous auxiliary execution unit
KR101685247B1 (ko) 2010-09-17 2016-12-09 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US20120079212A1 (en) 2010-09-23 2012-03-29 International Business Machines Corporation Architecture for sharing caches among multiple processes
US9678755B2 (en) 2010-10-12 2017-06-13 Intel Corporation Instruction sequence buffer to enhance branch prediction efficiency
US9733944B2 (en) 2010-10-12 2017-08-15 Intel Corporation Instruction sequence buffer to store branches having reliably predictable instruction sequences
US8370553B2 (en) 2010-10-18 2013-02-05 International Business Machines Corporation Formal verification of random priority-based arbiters using property strengthening and underapproximations
US9047178B2 (en) 2010-12-13 2015-06-02 SanDisk Technologies, Inc. Auto-commit memory synchronization
US8677355B2 (en) 2010-12-17 2014-03-18 Microsoft Corporation Virtual machine branching and parallel execution
CN103547993B (zh) 2011-03-25 2018-06-26 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN108376097B (zh) 2011-03-25 2022-04-15 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US20120254592A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9740494B2 (en) 2011-04-29 2017-08-22 Arizona Board Of Regents For And On Behalf Of Arizona State University Low complexity out-of-order issue logic using static circuits
US8843690B2 (en) 2011-07-11 2014-09-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Memory conflicts learning capability
US8930432B2 (en) 2011-08-04 2015-01-06 International Business Machines Corporation Floating point execution unit with fixed point functionality
US20130046934A1 (en) 2011-08-15 2013-02-21 Robert Nychka System caching using heterogenous memories
US8839025B2 (en) 2011-09-30 2014-09-16 Oracle International Corporation Systems and methods for retiring and unretiring cache lines
WO2013077876A1 (en) 2011-11-22 2013-05-30 Soft Machines, Inc. A microprocessor accelerated code optimizer
EP2783280B1 (en) 2011-11-22 2019-09-11 Intel Corporation An accelerated code optimizer for a multiengine microprocessor
KR101648278B1 (ko) 2011-11-22 2016-08-12 Soft Machines, Inc. A microprocessor accelerated code optimizer and dependency reordering method
US8930674B2 (en) 2012-03-07 2015-01-06 Soft Machines, Inc. Systems and methods for accessing a unified translation lookaside buffer
KR20130119285A (ko) 2012-04-23 2013-10-31 Electronics and Telecommunications Research Institute Apparatus and method for allocating resources in a cluster computing environment
US9684601B2 (en) 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US9940247B2 (en) 2012-06-26 2018-04-10 Advanced Micro Devices, Inc. Concurrent access to cache dirty bits
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9430410B2 (en) 2012-07-30 2016-08-30 Soft Machines, Inc. Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10037228B2 (en) 2012-10-25 2018-07-31 Nvidia Corporation Efficient memory virtualization in multi-threaded processing units
US9195506B2 (en) 2012-12-21 2015-11-24 International Business Machines Corporation Processor provisioning by a middleware processing system for a plurality of logical processor partitions
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
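KR101800948B1 (ko) 2013-03-15 2017-11-23 Intel Corporation A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates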
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
CN105247484B (zh) 2013-03-15 2021-02-23 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
EP2972845B1 (en) 2013-03-15 2021-07-07 Intel Corporation A method for executing multithreaded instructions grouped onto blocks
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots

Also Published As

Publication number Publication date
EP2689330A2 (en) 2014-01-29
US9934072B2 (en) 2018-04-03
TWI518504B (zh) 2016-01-21
CN103562866A (zh) 2014-02-05
EP2689330B1 (en) 2022-12-21
US20160210176A1 (en) 2016-07-21
EP2689330A4 (en) 2014-08-13
US9842005B2 (en) 2017-12-12
WO2012135041A2 (en) 2012-10-04
CN108376097A (zh) 2018-08-07
WO2012135041A3 (en) 2012-12-27
CN103562866B (zh) 2018-03-30
TW201305820A (zh) 2013-02-01
KR101620676B1 (ko) 2016-05-23
KR20140018946A (ko) 2014-02-13
US20120246450A1 (en) 2012-09-27

Similar Documents

Publication Publication Date Title
US11204769B2 (en) Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN108376097B (zh) Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9990200B2 (en) Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US10013391B1 (en) Architecture emulation in a parallel processing environment
KR20180021165A (ko) Bulk allocation of instruction blocks to a processor instruction window

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant