CN108427574B - Microprocessor accelerated code optimizer - Google Patents
Microprocessor accelerated code optimizer
- Publication number
- CN108427574B (application CN201810449173.XA)
- Authority
- CN
- China
- Prior art keywords
- dependency
- instruction
- instructions
- sequence
- microprocessor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/3017—Runtime instruction translation, e.g. macros
- G06F9/30174—Runtime instruction translation, e.g. macros for non-native instruction set, e.g. Javabyte, legacy code
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
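The present invention relates to a microprocessor accelerated code optimizer. A method for accelerating code optimization in a microprocessor is described. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.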
Description
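This patent application is a divisional application of the invention patent application with international application number PCT/US2011/061957, international filing date November 22, 2011, Chinese national phase application number 201180076248.0, entitled "Microprocessor accelerated code optimizer".
Cross-reference to related applications
This application is related to co-pending, commonly assigned U.S. Patent Application No. 2010/0161948, entitled "APPARATUS AND METHOD FOR PROCESSING COMPLEX INSTRUCTION FORMATS IN A MULTITHREADED ARCHITECTURE SUPPORTING VARIOUS CONTEXT SWITCH MODES AND VIRTUALIZATION SCHEMES", filed by Mohammad A. Abdallah on January 5, 2010, which is incorporated herein in its entirety.
This application is related to co-pending, commonly assigned U.S. Patent Application No. 2009/0113170, entitled "APPARATUS AND METHOD FOR PROCESSING AN INSTRUCTION MATRIX SPECIFYING PARALLEL IN DEPENDENT OPERATIONS", filed by Mohammad A. Abdallah on December 19, 2008, which is incorporated herein in its entirety.
This application is related to co-pending, commonly assigned U.S. Patent Application No. 61/384,198, entitled "SINGLE CYCLE MULTI-BRANCH PREDICTION INCLUDING SHADOW CACHE FOR EARLY FAR BRANCH PREDICTION", filed by Mohammad A. Abdallah on September 17, 2010, which is incorporated herein in its entirety.
This application is related to co-pending, commonly assigned U.S. Patent Application No. 61/467,944, entitled "EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES", filed by Mohammad A. Abdallah on March 25, 2011, which is incorporated herein in its entirety.
Technical field
The present invention relates generally to digital computer systems, and more particularly to a system and method for selecting instructions comprising an instruction sequence.
Background
Processors are required to handle multiple tasks that are either dependent on one another or totally independent. The internal state of such processors usually consists of registers that might hold different values at each particular instant of program execution. At each instant of program execution, the internal state image is called the architecture state of the processor.
When code execution is switched to run another function (e.g., another thread, process, or program), the state of the machine/processor has to be saved so that the new function can utilize the internal registers to build its new state. Once the new function is terminated, its state can be discarded, and the state of the previous context is restored and execution resumes. Such a switch process is called a context switch and usually takes tens or hundreds of cycles, especially with modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out-of-order execution.
In thread-aware hardware architectures, it is normal for the hardware to support multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, this still has multiple drawbacks, namely the area, power, and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, the context switch must still be performed.
This becomes common as parallelism is needed on a fine-granularity basis requiring a large number of threads. Hardware thread-aware architectures with duplicate context-state hardware storage do not help non-threaded software code and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse-grain parallelism and result in heavy software overhead for initiating and synchronizing, leaving fine-grain parallelism, such as function calls and loop parallel execution, without efficient threading initiation/auto-generation. Such overheads are accompanied by the difficulty of auto-parallelizing such code using state-of-the-art compiler or user parallelization techniques for software code that is not explicitly/easily parallelized/threaded.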
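Summary of the invention
In one embodiment, the present invention is implemented as a method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is then output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.
The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.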
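Brief description of the drawings
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar elements.
Figure 1 shows an overview diagram of an allocation/issue stage of a microprocessor in accordance with one embodiment of the present invention.
Figure 2 shows an overview diagram illustrating an optimization process in accordance with one embodiment of the present invention.
Figure 3 shows a multistep optimization process in accordance with one embodiment of the present invention.
Figure 4 shows a multistep optimization and instruction moving process in accordance with one embodiment of the present invention.
Figure 5 shows a flowchart of the steps of an exemplary hardware optimization process in accordance with one embodiment of the present invention.
Figure 6 shows a flowchart of the steps of an alternative exemplary hardware optimization process in accordance with one embodiment of the present invention.
Figure 7 shows a diagram illustrating the operation of the CAM matching hardware and the priority encoding hardware of the allocation/issue stage in accordance with one embodiment of the present invention.
Figure 8 shows a diagram illustrating optimized scheduling ahead of a branch in accordance with one embodiment of the present invention.
Figure 9 shows a diagram illustrating optimized scheduling ahead of a store in accordance with one embodiment of the present invention.
Figure 10 shows a diagram of an exemplary software optimization process in accordance with one embodiment of the present invention.
Figure 11 shows a flow diagram of a SIMD software-based optimization process in accordance with one embodiment of the present invention.
Figure 12 shows a flowchart of the operating steps of an exemplary SIMD software-based optimization process in accordance with one embodiment of the present invention.
Figure 13 shows a software-based dependency broadcast process in accordance with one embodiment of the present invention.
Figure 14 shows an exemplary flow diagram illustrating how the dependency grouping of instructions can be used to build variably bounded groups of dependent instructions in accordance with one embodiment of the present invention.
Figure 15 shows a flow diagram depicting hierarchical scheduling of instructions in accordance with one embodiment of the present invention.
Figure 16 shows a flow diagram depicting hierarchical scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention.
Figure 17 shows a flow diagram depicting hierarchical moving window scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention.
Figure 18 shows how variably sized dependency chains (e.g., variably bounded groups) of instructions are allocated to a plurality of computing engines in accordance with one embodiment of the present invention.
Figure 19 shows a flow diagram depicting block allocation to the scheduling queues and the hierarchical moving window scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention.
Figure 20 shows how dependent code blocks (e.g., dependency groups or dependency chains) are executed on the engines in accordance with one embodiment of the present invention.
Figure 21 shows an overview diagram of a plurality of engines and their components, including a global front end fetch and scheduler and register files, global interconnects, and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention.
Figure 22 shows a plurality of segments, a plurality of segmented common partition schedulers and the interconnect, and the ports into the segments in accordance with one embodiment of the present invention.
Figure 23 shows a diagram of an exemplary microprocessor pipeline in accordance with one embodiment of the present invention.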
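Detailed description
Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.
In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections have been set forth. It is to be understood, however, that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail, in order to avoid unnecessarily obscuring this description.
References within the specification to "one embodiment" or "an embodiment" are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase "in one embodiment" in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.
Some portions of the detailed description that follows are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer-readable storage medium and are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "accessing" or "writing" or "storing" or "replicating" or the like refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer-readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
In one embodiment, the present invention is implemented as a method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to the optimized microinstruction sequence.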
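Figure 1 shows an overview diagram of an allocation/issue stage of a microprocessor 100 in accordance with one embodiment of the present invention. As illustrated in Figure 1, the microprocessor 100 includes a fetch component 101, a native decode component 102, an instruction scheduling and optimizing component 110, and the remaining pipeline 105 of the microprocessor.
In the Figure 1 embodiment, macroinstructions are fetched by the fetch component 101 and decoded into native microinstructions by the native decode component 102, which then provides the native microinstructions to a microinstruction cache 121 and to the instruction scheduling and optimizer component 110. In one embodiment, the fetched macroinstructions comprise a sequence of instructions that is assembled by predicting certain branches.
The macroinstruction sequence is decoded into a resulting microinstruction sequence by the native decode component 102. This microinstruction sequence is then transmitted to the instruction scheduling and optimizing component 110 through a multiplexer 103. The instruction scheduling and optimizer component functions by performing optimization processing, for example by reordering certain instructions of the microinstruction sequence for more efficient execution. This results in an optimized microinstruction sequence that is then transferred to the remaining pipeline 105 (e.g., the allocation, dispatch, execution, and retirement stages, and the like) through a multiplexer 104. The optimized microinstruction sequence results in faster and more efficient execution of the instructions.
In one embodiment, the macroinstructions can be instructions from a high-level instruction set architecture, while the microinstructions are low-level machine instructions. In another embodiment, the macroinstructions can be guest instructions from a plurality of different instruction set architectures (e.g., CISC-like, x86, RISC-like, MIPS, SPARC, ARM, virtual-like, JAVA, and the like), while the microinstructions are low-level machine instructions or instructions of a different native instruction set architecture. Similarly, in one embodiment, the macroinstructions can be native instructions of an architecture, and the microinstructions can be native microinstructions of that same architecture that have been reordered and optimized, for example X86 macroinstructions and X86 microcoded microinstructions.
In one embodiment, to accelerate the execution performance of code that is frequently encountered (e.g., hot code), copies of frequently encountered microinstruction sequences are cached in the microinstruction cache 121, and copies of frequently encountered optimized microinstruction sequences are cached within a sequence cache 122. As code is fetched, decoded, optimized, and executed, certain optimized microinstruction sequences can be evicted or fetched in accordance with the size of the sequence cache through the depicted eviction and fill path 130. This eviction and fill path allows optimized microinstruction sequences to be transferred to and from the memory hierarchy of the microprocessor (e.g., L1 cache, L2 cache, a special cacheable memory range, or the like).
It should be noted that in one embodiment, the microinstruction cache 121 can be omitted. In such an embodiment, the acceleration of hot code is provided by storing optimized microinstruction sequences within the sequence cache 122. For example, the space saved by omitting the microinstruction cache 121 can be used to implement a larger sequence cache 122.
Figure 2 shows an overview diagram illustrating an optimization process in accordance with one embodiment of the present invention. The left-hand side of Figure 2 shows an incoming microinstruction sequence as received from, for example, the native decode component 102 or the microinstruction cache 121. Upon first receiving these instructions, they are not optimized.
One objective of the optimization process is to locate and identify instructions that depend upon one another and move them into their respective dependency groups so that they can execute more efficiently. In one embodiment, groups of dependent instructions can be dispatched together so that they can execute more efficiently, since their respective sources and destinations are grouped together for locality. It should be noted that this optimization processing can be used in both an out-of-order processor and an in-order processor. For example, within an in-order processor, instructions are dispatched in order. However, they can be moved around so that dependent instructions are placed in respective groups as described above, so that the groups can then execute independently.
For example, the incoming instructions include loads, operations, and stores. For example, instruction 1 comprises an operation where source registers (e.g., register 9 and register 5) are added and the result is stored in register 5. Hence, register 5 is a destination and register 9 and register 5 are sources. In this manner, the sequence of 16 instructions includes destination registers and source registers, as shown.
The Figure 2 embodiment implements the reordering of instructions to create dependency groups, where instructions belonging to a group are dependent upon one another. To accomplish this, an algorithm is executed that performs hazard checks with respect to the loads and stores of the 16 incoming instructions. For example, stores cannot move past earlier loads without a dependency check. Stores cannot pass earlier stores. Loads cannot pass earlier stores without a dependency check. Loads can pass loads. Instructions can pass prior path predicted branches (e.g., dynamically constructed branches) by using a renaming technique. In the case of non-dynamically predicted branches, movements of instructions need to consider the scopes of the branches. Each of the above rules can be implemented by adding a virtual dependency (e.g., by artificially adding virtual sources or destinations to instructions to enforce the rules).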
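The load/store movement rules above can be sketched as a small predicate. This is a minimal illustration only, assuming each memory operation is tagged simply as a load or a store; the names `MemOp` and `may_pass` and the `dependency_checked` flag are hypothetical and do not come from the patent, and branch handling is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    kind: str  # "load" or "store" (hypothetical tagging for this sketch)


def may_pass(later: MemOp, earlier: MemOp, dependency_checked: bool = False) -> bool:
    """Return True if `later` may be moved ahead of `earlier` under the stated rules."""
    if later.kind == "load" and earlier.kind == "load":
        return True  # loads can pass loads
    if later.kind == "store" and earlier.kind == "store":
        return False  # stores cannot pass earlier stores
    # A store passing an earlier load, or a load passing an earlier store,
    # is allowed only once a dependency check has been performed.
    return dependency_checked
```

In a hardware setting the same constraints would be expressed as virtual dependencies, as the text notes; the predicate form is used here only because it is easy to read.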
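Referring still to Figure 2, as described above, the objective of the optimization process is to locate dependent instructions and move them into a common dependency group. This process must be done in accordance with the hazard checking algorithm. The optimization algorithm looks for instruction dependencies. The instruction dependencies further comprise true dependencies, output dependencies, and anti-dependencies.
The algorithm begins by looking for true dependencies first. To identify true dependencies, each destination of the 16-instruction sequence is compared with the subsequent sources that occur later in the 16-instruction sequence. The subsequent instructions that are truly dependent on an earlier instruction are marked "_1" to signify their true dependence. This is shown in Figure 2 by the instruction numbers that proceed from left to right over the 16-instruction sequence. For example, considering instruction number 4, the destination register R3 is compared with the sources of the subsequent instructions, and each subsequent source that matches is marked "_1" to indicate that instruction's true dependence. In this case, instruction 6, instruction 7, instruction 11, and instruction 15 are marked "_1".
The algorithm then looks for output dependencies. To identify output dependencies, each destination is compared with the destinations of the other subsequent instructions. For each of the 16 instructions, each subsequent destination that matches is marked "1_" (e.g., sometimes referred to as a red one).
The algorithm then looks for anti-dependencies. To identify anti-dependencies, for each of the 16 instructions, each destination is compared with the sources of earlier instructions to identify matches. If a match occurs, the instruction under consideration marks itself "1_" (e.g., sometimes referred to as a red one).
In this manner, the algorithm populates a dependency matrix of rows and columns for the sequence of 16 instructions. The dependency matrix comprises the marks that indicate the different types of dependencies for each of the 16 instructions. In one embodiment, the dependency matrix is populated in one cycle by using CAM matching hardware and the appropriate broadcast logic. For example, destinations can be broadcast downward through the remaining instructions to be compared with the sources of subsequent instructions (e.g., true dependence) and with the destinations of subsequent instructions (e.g., output dependence), while destinations can be broadcast upward through the previous instructions to be compared with the sources of prior instructions (e.g., anti-dependence).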
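The three comparisons described above can be sketched in software as a nested loop that fills a square matrix. This is an illustrative model under stated assumptions, not the one-cycle CAM hardware: each instruction is represented as a hypothetical `(destination, sources)` pair, the labels "true", "output", and "anti" stand in for the "_1" and "1_" marks, and the four-instruction `example` sequence is invented for the demonstration.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)


def dependency_matrix(seq: List[Instr]) -> List[List[Set[str]]]:
    """matrix[late][early] holds the dependency types of `late` on `early`."""
    n = len(seq)
    matrix: List[List[Set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for early in range(n):
        early_dst, early_srcs = seq[early]
        for late in range(early + 1, n):
            late_dst, late_srcs = seq[late]
            cell = matrix[late][early]
            if early_dst in late_srcs:
                cell.add("true")    # later instruction reads the earlier result
            if early_dst == late_dst:
                cell.add("output")  # both write the same register
            if late_dst in early_srcs:
                cell.add("anti")    # later write would clobber an earlier source
    return matrix


# Invented example: instruction 0 mirrors "R5 <- R9 + R5" from the text.
example = [
    ("R5", ("R9", "R5")),
    ("R3", ("R5",)),
    ("R5", ("R1",)),
    ("R9", ("R3",)),
]
example_matrix = dependency_matrix(example)
```

Here `example_matrix[1][0]` records a true dependence, while `example_matrix[2][0]` records both an output dependence and an anti-dependence, the two blocking kinds that renaming is meant to resolve.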
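The optimization algorithm uses the dependency matrix to choose which instructions to move together into common dependency groups. It is desired that instructions which are truly dependent upon one another be moved to the same group. Register renaming is used to eliminate anti-dependencies to allow those anti-dependent instructions to be moved. The moving is done in accordance with the rules and hazard checks described above. For example, stores cannot move past earlier loads without a dependency check. Stores cannot pass earlier stores. Loads cannot pass earlier stores without a dependency check. Loads can pass loads. Instructions can pass prior path predicted branches (e.g., dynamically constructed branches) by using a renaming technique. In the case of non-dynamically predicted branches, movements of instructions need to consider the scopes of the branches. Note the above description.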
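As an illustration of how renaming removes the blocking dependencies, the following sketch gives every destination a fresh name and rewrites later sources to the most recent name. It is a generic textbook-style renaming pass written under assumptions: the `(destination, sources)` tuples, the `P0`, `P1`, ... naming, and the example sequence are hypothetical and are not taken from the patent's figures.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)


def rename(seq: List[Instr]) -> List[Instr]:
    """Give each write a fresh register so only true dependencies remain."""
    latest: Dict[str, str] = {}  # architectural register -> newest renamed register
    renamed: List[Instr] = []
    for index, (dst, srcs) in enumerate(seq):
        # Sources read the newest version of each register (or the original
        # name if the register has not been written inside this sequence).
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        new_dst = f"P{index}"  # hypothetical fresh name
        latest[dst] = new_dst
        renamed.append((new_dst, new_srcs))
    return renamed


renamed_example = rename([
    ("R5", ("R9", "R5")),
    ("R3", ("R5",)),
    ("R5", ("R1",)),
])
```

After renaming, the third instruction writes `P2` rather than `R5`, so it no longer has an output dependence on the first instruction or an anti-dependence on the second, and it becomes free to move.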
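In one embodiment, a priority encoder can be implemented to determine which instructions get moved to be grouped with other instructions. The priority encoder would function in accordance with the information provided by the dependency matrix.
Figure 3 and Figure 4 show a multistep optimization process in accordance with one embodiment of the present invention. In one embodiment, the optimization process is iterative, in that after instructions are moved in a first pass by moving their dependency columns, the dependency matrix is repopulated and examined again for new opportunities to move instructions. In one embodiment, this dependency matrix population process is repeated three times. This is shown in Figure 4, which shows instructions that have been moved and then examined again for opportunities to move other instructions. The sequence of numbers on the right-hand side of each of the 16 instructions shows the group that the instruction was in when the process began and the group that the instruction was in when the process finished, with the intervening group numbers in between. For example, Figure 4 shows how instruction 6 was initially in group 4 but was moved to be in group 1.
In this manner, Figures 2 through 4 illustrate the operation of an optimization algorithm in accordance with one embodiment of the present invention. It should be noted that although Figures 2 through 4 illustrate an allocation/issue stage, this functionality can also be implemented in a local scheduler/dispatch stage.
Figure 5 shows a flowchart of the steps of an exemplary hardware optimization process 500 in accordance with one embodiment of the present invention. As depicted in Figure 5, the flowchart shows the operating steps of an optimization process as implemented in an allocation/issue stage of a microprocessor in accordance with one embodiment of the present invention.
Process 500 begins in step 501, where an incoming macroinstruction sequence is fetched using an instruction fetch component (e.g., fetch component 101 from Figure 1). As described above, the fetched instructions comprise a sequence that is assembled by predicting certain instruction branches.
In step 502, the fetched macroinstructions are transferred to a decoding component for decoding into microinstructions. The macroinstruction sequence is decoded into a microinstruction sequence in accordance with the branch predictions. In one embodiment, the microinstruction sequence is then stored in a microinstruction cache.
In step 503, optimization processing is then performed on the microinstruction sequence by reordering the microinstructions comprising the sequence into dependency groups. The reordering is implemented by an instruction reordering component (e.g., the instruction scheduling and optimizer component 110). This process is described in Figures 2 through 4.
In step 504, the optimized microinstruction sequence is output to the microprocessor pipeline for execution. As described above, the optimized microinstruction sequence is forwarded to the rest of the machine for execution (e.g., the remaining pipeline 105).
Subsequently, in step 505, a copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to that sequence. In this manner, the sequence cache enables access to the optimized microinstruction sequences upon subsequent hits on those sequences, thereby accelerating hot code.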
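The flow of steps 501 through 505 can be sketched as a function that consults a sequence cache before decoding and optimizing. This is a simplified software analogy only: keying the cache by the fetched macroinstruction tuple, the `decode` and `optimize` callables, and the name `run_sequence` are all assumptions made for illustration, and eviction is not modeled.

```python
from typing import Callable, Dict, List, Tuple

MacroSeq = Tuple[str, ...]


def run_sequence(
    macro_seq: MacroSeq,
    decode: Callable[[str], List[str]],
    optimize: Callable[[List[str]], List[str]],
    sequence_cache: Dict[MacroSeq, List[str]],
) -> List[str]:
    """Return the optimized microinstruction sequence handed to the pipeline."""
    cached = sequence_cache.get(macro_seq)
    if cached is not None:
        return cached  # hit: reuse the stored optimized sequence (hot code)
    # Steps 501-502: the fetched macroinstructions are decoded into microinstructions.
    micro_seq = [uop for macro in macro_seq for uop in decode(macro)]
    # Step 503: reorder into dependency groups (delegated to `optimize`).
    optimized = optimize(micro_seq)
    # Step 505: keep a copy for subsequent hits.
    sequence_cache[macro_seq] = optimized
    # Step 504: output to the pipeline for execution.
    return optimized
```

On a second call with the same macroinstruction sequence, neither `decode` nor `optimize` runs again, which is the software counterpart of the hot-code acceleration described above.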
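Figure 6 shows a flowchart of the steps of an alternative exemplary hardware optimization process 600 in accordance with one embodiment of the present invention. As depicted in Figure 6, the flowchart shows the operating steps of an optimization process as implemented in an allocation/issue stage of a microprocessor in accordance with an alternative embodiment of the present invention.
Process 600 begins in step 601, where an incoming macroinstruction sequence is fetched using an instruction fetch component (e.g., fetch component 101 from Figure 1). As described above, the fetched instructions comprise a sequence that is assembled by predicting certain instruction branches.
In step 602, the fetched macroinstructions are transferred to a decoding component for decoding into microinstructions. The macroinstruction sequence is decoded into a microinstruction sequence in accordance with the branch predictions. In one embodiment, the microinstruction sequence is then stored in a microinstruction cache.
In step 603, the decoded microinstructions are stored as sequences in a microinstruction sequence cache. Sequences in the microinstruction cache are formed to start in accordance with basic block boundaries. These sequences are not optimized at this point.
In step 604, optimization processing is then performed on the microinstruction sequence by reordering the microinstructions comprising the sequence into dependency groups. The reordering is implemented by an instruction reordering component (e.g., the instruction scheduling and optimizer component 110). This process is described in Figures 2 through 4.
In step 605, the optimized microinstruction sequence is output to the microprocessor pipeline for execution. As described above, the optimized microinstruction sequence is forwarded to the rest of the machine for execution (e.g., the remaining pipeline 105).
Subsequently, in step 606, a copy of the optimized microinstruction sequence is stored in a sequence cache for subsequent use upon a subsequent hit to that sequence. In this manner, the sequence cache enables access to the optimized microinstruction sequences upon subsequent hits on those sequences, thereby accelerating hot code.
Figure 7 shows a diagram illustrating the operation of the CAM matching hardware and the priority encoding hardware of the allocation/issue stage in accordance with one embodiment of the present invention. As depicted in Figure 7, destinations of instructions are broadcast into the CAM array from the left. Three exemplary instruction destinations are shown. The lighter shaded CAMs (e.g., green) are for true dependency matches and output dependency matches, and thus the destination is broadcast downward. The darker shaded CAMs (e.g., blue) are for anti-dependency matches, and thus the destination is broadcast upward. These matches populate a dependency matrix, as described above. Priority encoders are shown on the right, and they function by scanning the row of CAMs to find the first match, either a "_1" or a "1_". As described above in the discussions of Figures 2 through 4, the process can be implemented iteratively. For example, if a "_1" is blocked by a "1_", then that destination can be renamed and moved.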
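The priority encoder's role of finding the first mark in a row can be sketched as follows. This is a loose software analogy under assumptions: a row is modeled as a list of strings holding "_1" (true dependence), "1_" (blocking dependence), or an empty string, the scan is taken to run from index 0 upward, and the returned action labels are hypothetical.

```python
from typing import List, Optional, Tuple


def first_match(row: List[str]) -> Optional[Tuple[int, str]]:
    """Return (column, mark) for the first non-empty mark in the row, if any."""
    for column, mark in enumerate(row):
        if mark:
            return column, mark
    return None


def resolve(row: List[str]) -> str:
    """Decide what to do with the instruction that owns this row."""
    hit = first_match(row)
    if hit is None:
        return "independent"  # no dependence found in this row
    _, mark = hit
    # A blocking "1_" mark is resolved by renaming the destination; a "_1"
    # mark identifies the group the instruction should be moved toward.
    return "rename" if mark == "1_" else "group"
```

For a row such as `["", "1_", "_1"]`, the blocking mark is encountered first, so the sketch reports that a rename is needed before the true dependence can be used for grouping on a later pass.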
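Figure 8 shows a diagram illustrating optimized scheduling of instructions ahead of a branch in accordance with one embodiment of the present invention. As illustrated in Figure 8, a hardware optimized example is depicted alongside a traditional just-in-time compiler example. The left-hand side of Figure 8 shows the original un-optimized code, which includes a branch biased untaken, "Branch C to L1". The middle column of Figure 8 shows a traditional just-in-time compiler optimization, where registers are renamed and instructions are moved ahead of the branch. In this example, the just-in-time compiler inserts compensation code to account for those occasions where the branch-biased decision is wrong (e.g., where the branch is actually taken as opposed to untaken). In contrast, the right column of Figure 8 shows the hardware unrolled optimization. In this case, the registers are renamed and instructions are moved ahead of the branch. However, it should be noted that no compensation code is inserted. The hardware keeps track of whether the branch-biased decision is true or not. In the case of a wrongly predicted branch, the hardware automatically rolls back its state in order to execute the correct instruction sequence. The hardware optimizer solution is able to avoid the use of compensation code because, in those cases where the branch is mispredicted, the hardware jumps to the original code in memory and executes the correct sequence from there, while flushing the mispredicted instruction sequence.
Figure 9 shows a diagram illustrating optimized scheduling of a load ahead of a store in accordance with one embodiment of the present invention. As illustrated in Figure 9, a hardware optimized example is depicted alongside a traditional just-in-time compiler example. The left-hand side of Figure 9 shows the original un-optimized code including the store, "R3<-LD[R5]". The middle column of Figure 9 shows a traditional just-in-time compiler optimization, where registers are renamed and the load is moved ahead of the store. In this example, the just-in-time compiler inserts compensation code to account for those occasions where the address of the load instruction aliases the address of the store instruction (e.g., where moving the load ahead of the store is not appropriate). In contrast, the right column of Figure 9 shows the hardware unrolled optimization. In this case, the registers are renamed and the load is also moved ahead of the store. However, it should be noted that no compensation code is inserted. In a case where moving the load ahead of the store is wrong, the hardware automatically rolls back its state in order to execute the correct instruction sequence. The hardware optimizer solution is able to avoid the use of compensation code because, in those cases where the address alias-check branch is mispredicted, the hardware jumps to the original code in memory and executes the correct sequence from there, while flushing the mispredicted instruction sequence. In this case, the sequence assumes no aliasing. It should be noted that in one embodiment, the functionality diagrammed in Figure 9 can be implemented by the instruction scheduling and optimizer component 110 of Figure 1. Similarly, it should be noted that in one embodiment, the functionality depicted in Figure 9 can be implemented by the software optimizer 1000 described in Figure 10 below.
Additionally, with respect to dynamically unrolled sequences, it should be noted that instructions can pass prior path predicted branches (e.g., dynamically constructed branches) by using renaming. In the case of non-dynamically predicted branches, movements of instructions should consider the scopes of the branches. Loops can be unrolled to the extent desired, and optimizations can be applied across the whole sequence. For example, this can be implemented by renaming the destination registers of instructions moving across branches. One of the benefits of this feature is that no compensation code or extensive analysis of the scopes of the branches is needed. This feature thus greatly speeds up and simplifies the optimization process.
Additional information concerning branch prediction and the assembling of instruction sequences can be found in commonly assigned U.S. Patent Application No. 61/384,198, entitled "SINGLE CYCLE MULTI-BRANCH PREDICTION INCLUDING SHADOW CACHE FOR EARLY FAR BRANCH PREDICTION", filed by Mohammad A. Abdallah on September 17, 2010, which is incorporated herein in its entirety.
Figure 10 shows a diagram of an exemplary software optimization process in accordance with one embodiment of the present invention. In the Figure 10 embodiment, the instruction scheduling and optimizer component (e.g., component 110 of Figure 1) is replaced by a software-based optimizer 1000.
In the Figure 10 embodiment, the software optimizer 1000 performs the optimization processing that was performed by the hardware-based instruction scheduling and optimizer component 110. The software optimizer maintains a copy of optimized sequences in the memory hierarchy (e.g., L1, L2, system memory). This allows the software optimizer to maintain a much larger collection of optimized sequences in comparison to what is stored in the sequence cache.
It should be noted that the software optimizer 1000 can comprise code residing in the memory hierarchy as both input to the optimizer and output from the optimization process.
It should be noted that in one embodiment, the microinstruction cache can be omitted. In such an embodiment, only the optimized microinstruction sequences are cached.
Figure 11 shows a flow diagram of a SIMD software-based optimization process in accordance with one embodiment of the present invention. The top of Figure 11 shows how the software-based optimizer examines each instruction of an input instruction sequence. Figure 11 shows how a SIMD compare can be used to match one to many (e.g., a SIMD byte compare of a first source "Src1" to all second source bytes "Src2"). In one embodiment, Src1 contains the destination register of any instruction and Src2 contains one source from each other subsequent instruction. Matching is done for every destination with all subsequent instruction sources (e.g., the true dependence check). This is a pairing match that indicates a desired group for the instruction. Matching is done between each destination and every subsequent instruction destination (e.g., the output dependence check). This is a blocking match that can be resolved with renaming. Matching is done between each destination and every prior instruction source (e.g., the anti-dependence check). This is a blocking match that can be resolved by renaming. The results are used to populate the rows and columns of the dependency matrix.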
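The one-to-many match can be emulated in scalar code to show what a SIMD byte compare would produce. This is a plain Python emulation, not an actual SIMD intrinsic: the function names, the use of 0xFF for a matching lane (a common convention for byte-compare results), and the register numbers in the example are assumptions for illustration.

```python
from typing import List


def simd_compare_eq(src1: int, src2: List[int]) -> List[int]:
    """Emulate a byte-wise equality compare of one value against every lane.

    A matching lane yields 0xFF and a non-matching lane yields 0x00.
    """
    return [0xFF if lane == src1 else 0x00 for lane in src2]


def matching_lanes(mask: List[int]) -> List[int]:
    """Return the lane indices whose compare result is non-zero."""
    return [index for index, value in enumerate(mask) if value]


# Src1: destination register number of one instruction (here register 3).
# Src2: one source register number from each subsequent instruction.
true_dependence_mask = simd_compare_eq(3, [5, 3, 9, 3])
```

With these invented inputs, lanes 1 and 3 match, meaning the second and fourth subsequent instructions read the register being written and would be marked as truly dependent in the matrix.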
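Figure 12 shows a flowchart of the operating steps of an exemplary SIMD software-based optimization process 1200 in accordance with one embodiment of the present invention. Process 1200 is described in the context of the flow diagram of Figure 11.
In step 1201, an input instruction sequence is accessed by using memory instantiated by the software-based optimizer.
In step 1202, using SIMD instructions, a dependency matrix is populated with dependency information extracted from the input instruction sequence by using a sequence of SIMD compare instructions.
In step 1203, the rows of the matrix are scanned from left to right for the first match (e.g., dependency mark).
In step 1204, each of the first matches is analyzed to determine the type of the match.
In step 1205, if the first marked match is a blocking dependency, renaming is done for that destination.
In step 1206, all first matches for each row of the matrix are identified and the corresponding column for that match is moved to the given dependency group.
In step 1207, the scanning process is repeated several times to reorder the instructions comprising the input sequence to produce an optimized output sequence.
In step 1208, the optimized instruction sequence is output to the execution pipeline of the microprocessor for execution.
In step 1209, the optimized output sequence is stored in a sequence cache for subsequent consumption (e.g., to accelerate hot code).
It should be noted that the software optimization can be done serially using SIMD instructions. For example, the optimization can be implemented by processing one instruction at a time, scanning the instructions' sources and destinations (e.g., from earlier instructions to subsequent instructions in a sequence). The software uses SIMD instructions to compare, in parallel, the current instruction's sources and destinations with prior instruction sources and destinations in accordance with the optimization algorithm described above (e.g., to detect true dependencies, output dependencies, and anti-dependencies).
Figure 13 shows a software-based dependency broadcast process in accordance with one embodiment of the present invention. The Figure 13 embodiment shows a flow diagram of an exemplary software scheduling process that processes groups of instructions without the expense of a full parallel hardware implementation as described above. However, the Figure 13 embodiment can still use SIMD to process smaller groups of instructions in parallel.
The software scheduling process of Figure 13 proceeds as follows. First, the process initializes three registers. The process takes instruction numbers and loads them into a first register. The process then takes destination register numbers and loads them into a second register. The process then takes the values in the first register and broadcasts them to positions in a third result register in accordance with the position numbers in the second register. The process then overwrites going from left to right in the second register, where the leftmost value will overwrite a right value in those instances where the broadcast goes to the same position in the result register. Positions in the third register that have not been written to are bypassed. This information is used to populate a dependency matrix.
The Figure 13 embodiment also shows the manner in which an input instruction sequence can be processed as a plurality of groups. For example, a 16-instruction input sequence can be processed as a first group of 8 instructions and a second group of 8 instructions. With the first group, the instruction numbers are loaded into the first register, the instruction destination numbers are loaded into the second register, and the values in the first register are broadcast to positions in the third register (e.g., the result register) in accordance with the position numbers in the second register (e.g., a group broadcast). Positions in the third register that have not been written to are bypassed. The third register now becomes the base for the processing of the second group. For example, the result register from group 1 now becomes the result register for the processing of group 2.
With the second group, the instruction numbers are loaded into the first register, the instruction destination numbers are loaded into the second register, and the values in the first register are broadcast to positions in the third register (e.g., the result register) in accordance with the position numbers in the second register. Positions in the third register can overwrite the results that were written during the processing of the first group. Positions in the third register that have not been written to are bypassed. In this manner, the second group updates the base from the first group, and thereby produces a new base for the processing of a third group, and so on.
Instructions in the second group can inherit the dependency information generated in the processing of the first group. It should be noted that the entire second group does not need to be processed to update the dependencies in the result register. For example, the dependency for instruction 12 can be generated by processing the first group and then processing the instructions in the second group up to instruction 11. This updates the result register to the state up to instruction 12. In one embodiment, a mask can be used to prevent the updates for the remaining instructions of the second group (e.g., instructions 12 through 16). To determine the dependencies for instruction 12, the result register is examined for R2 and R5. R5 will have been updated by instruction 1, and R2 will have been updated by instruction 11. It should be noted that in a case where all of group 2 is processed, R2 will be updated by instruction 15.
Additionally, it should be noted that all the instructions of the second group (instructions 9-16) can be processed independently of one another. In such a case, the instructions of the second group depend only on the result register of the first group. Once the result register is updated from the processing of the first group, the instructions of the second group can be processed in parallel. In this manner, groups of instructions can be processed in parallel, one after another. In one embodiment, each group is processed using a SIMD instruction (e.g., a SIMD broadcast instruction), thereby processing all of the instructions of each group in parallel.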
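The group-by-group update of the result register can be sketched with a dictionary that maps each register number to the number of the last instruction that wrote it. This is a scalar sketch under assumptions: the physical left-to-right register layout is not modeled, the `before` parameter stands in for the mask, and the destination register numbers are invented so that the outcome matches the R2/R5 example in the text (R5 written by instruction 1, R2 written by instructions 11 and 15).

```python
from typing import Dict, List, Optional, Tuple

Group = List[Tuple[int, int]]  # (instruction number, destination register number)


def broadcast_group(
    result: Dict[int, int], group: Group, before: Optional[int] = None
) -> Dict[int, int]:
    """Return an updated copy of the result register after processing `group`.

    Unwritten positions are simply absent (bypassed). If `before` is given,
    instructions numbered `before` or higher are masked out.
    """
    updated = dict(result)
    for instr, dest in group:
        if before is not None and instr >= before:
            continue  # masked: this instruction must not update the result
        updated[dest] = instr  # a later writer overwrites an earlier one
    return updated


# Invented destinations for instructions 1..16.
dests = [5, 1, 6, 3, 4, 7, 8, 9, 10, 11, 2, 12, 13, 14, 2, 15]
group1 = [(n, dests[n - 1]) for n in range(1, 9)]
group2 = [(n, dests[n - 1]) for n in range(9, 17)]

base = broadcast_group({}, group1)                    # base produced by group 1
upto_12 = broadcast_group(base, group2, before=12)    # masked update for instruction 12
full = broadcast_group(base, group2)                  # all of group 2 processed
```

Looking up the sources of instruction 12 in `upto_12` gives instruction 11 for R2 and instruction 1 for R5, whereas `full` reports instruction 15 for R2, which is why the mask matters when resolving the dependencies of an instruction in the middle of a group.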
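Figure 14 shows an exemplary flow diagram illustrating how the dependency grouping of instructions can be used to build variably bounded groups of dependent instructions in accordance with one embodiment of the present invention. In the descriptions of Figures 2 through 4, the group sizes are constrained, in those cases to three instructions per group. Figure 14 shows how instructions can be reordered into variably sized groups, which can then be allocated to a plurality of computing engines. For example, Figure 14 shows 4 engines. Since the groups can be variably sized depending on their characteristics, engine 1 can be allocated a larger group than, for example, engine 2. This can occur, for example, in a case where engine 2 has an instruction that is not particularly dependent upon the other instructions in that group.
Figure 15 shows a flow diagram depicting hierarchical scheduling of instructions in accordance with one embodiment of the present invention. As described above, the dependency grouping of instructions can be used to build variably bounded groups. Figure 15 shows the feature wherein various levels of dependency exist within a dependency group. For example, instruction 1 does not depend on any other instruction within this instruction sequence, therefore making instruction 1 an L0 dependency level. However, instruction 4 depends on instruction 1, therefore making instruction 4 an L1 dependency level. In this manner, each of the instructions of an instruction sequence is assigned a dependency level, as shown.
The dependency level of each instruction is used by a second-level hierarchical scheduler to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. For example, in one embodiment, L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. The L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. This is shown by the dependency levels, from L0 to Ln, in Figure 15. The hierarchical scheduling of the schedulers 1-4 advantageously utilizes locality in time and instruction-to-instruction dependency to make scheduling decisions in an optimal way.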
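Assigning dependency levels and ordering a queue by level can be sketched as follows. This is an illustrative model, assuming each instruction is listed with the earlier instructions it depends on; the function names and the small dependency table are hypothetical, chosen to mirror the instruction 1, 4, and 7 example used in the text.

```python
from typing import Dict, List


def dependency_levels(producers: Dict[int, List[int]]) -> Dict[int, int]:
    """Assign L0 to instructions with no producers, else 1 + the deepest producer.

    `producers` maps an instruction number to the earlier instructions it
    depends on, so visiting in ascending order sees producers first.
    """
    levels: Dict[int, int] = {}
    for instr in sorted(producers):
        deps = producers[instr]
        levels[instr] = (1 + max(levels[d] for d in deps)) if deps else 0
    return levels


def load_queue(group: List[int], levels: Dict[int, int]) -> List[int]:
    """Order one dependency group so that L0 is at the front, then L1, L2, ..."""
    return sorted(group, key=lambda instr: (levels[instr], instr))


example_levels = dependency_levels({1: [], 2: [], 4: [1], 7: [4]})
example_queue = load_queue([7, 1, 4], example_levels)
```

In this example instruction 7 receives level 2 because it depends on instruction 4, which in turn depends on instruction 1, and the queue is loaded in the order 1, 4, 7.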
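In this manner, embodiments of the present invention imply dependency group slot allocation for the instructions of the instruction sequence. For example, to implement an out-of-order microarchitecture, the dispatching of the instructions of the instruction sequence is out of order. In one embodiment, instruction readiness is checked on each cycle. An instruction is ready if all instructions that it depends upon have previously been dispatched. A scheduler structure functions by checking those dependencies. In one embodiment, the scheduler is a unified scheduler, and all dependency checking is performed in the unified scheduler structure. In another embodiment, the scheduler functionality is distributed across the dispatch queues of the execution units of a plurality of engines. Hence, in one embodiment the scheduler is unified, while in another embodiment the scheduler is distributed. With both of these solutions, each instruction source is checked against the destinations of the dispatched instructions every cycle.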
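The per-cycle readiness check can be sketched as a loop that dispatches every instruction whose producers have already been dispatched. This is a simplified model under assumptions: resources are treated as unlimited, every ready instruction dispatches in the same cycle, and the function name and dependency table are hypothetical.

```python
from typing import Dict, List, Set


def dispatch_cycles(producers: Dict[int, List[int]]) -> List[List[int]]:
    """Return, per cycle, the instructions that become ready and are dispatched."""
    dispatched: Set[int] = set()
    pending: Set[int] = set(producers)
    cycles: List[List[int]] = []
    while pending:
        # An instruction is ready if everything it depends on was dispatched
        # in an earlier cycle.
        ready = sorted(
            instr for instr in pending
            if all(p in dispatched for p in producers[instr])
        )
        if not ready:
            raise ValueError("unsatisfiable or cyclic dependency")
        cycles.append(ready)
        dispatched.update(ready)
        pending.difference_update(ready)
    return cycles


example_cycles = dispatch_cycles({1: [], 2: [], 4: [1], 7: [4]})
```

The result groups instructions 1 and 2 in the first cycle, instruction 4 in the second, and instruction 7 in the third, which lines up with the L0, L1, and L2 dependency levels.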
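Thus, Figure 15 shows the hierarchical scheduling as performed by embodiments of the present invention. As described above, instructions are first grouped to form dependency chains (e.g., dependency groups). The formation of these dependency chains can be done statically or dynamically, by software or by hardware. Once these dependency chains have been formed, they can be distributed/dispatched to an engine. In this manner, dependency grouping allows for out-of-order scheduling of groups that are formed in order. Dependency grouping also distributes entire dependency groups onto a plurality of engines (e.g., cores or threads). Dependency grouping also facilitates hierarchical scheduling as described above, where dependent instructions are grouped in a first step and then scheduled in a second step.
It should be noted that the functionality diagrammed in Figures 14-19 can function independently of any method by which the instructions are grouped (e.g., whether the grouping functionality is implemented in hardware, software, etc.). Additionally, the dependency groups shown in Figures 14-19 can comprise a matrix of independent groups, where each group further comprises dependent instructions. Additionally, it should be noted that the schedulers can also be engines. In such an embodiment, each of the schedulers 1-4 can be incorporated within its respective engine (e.g., as shown in Figure 22, where each segment includes a common partition scheduler).
Figure 16 shows a flow diagram depicting hierarchical scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention. As described above, the dependency grouping of instructions can be used to build variably bounded groups. In this embodiment, the dependency groups comprise three slots. Figure 16 shows the various levels of dependency even within a three-slot dependency group. As described above, instruction 1 does not depend on any other instruction within this instruction sequence, therefore making instruction 1 an L0 dependency level. However, instruction 4 depends on instruction 1, therefore making instruction 4 an L1 dependency level. In this manner, each of the instructions of an instruction sequence is assigned a dependency level, as shown.
As described above, the dependency level of each instruction is used by a second-level hierarchical scheduler to dispatch instructions in such a manner as to ensure that resources are available for dependent instructions to execute. L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. As shown by the dependency levels, from L0 to Ln, in Figure 16, the L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. It should be noted that group number four (e.g., the fourth group from the top) begins at L2 even though it is a separate group. This is because instruction 7 depends on instruction 4, which depends on instruction 1, thereby giving instruction 7 an L2 dependency.
In this manner, Figure 16 shows how every three dependent instructions are scheduled together on a given one of the schedulers 1-4. The second-level groups are scheduled behind the first-level groups, and then the groups are rotated.
Figure 17 shows a flow diagram depicting hierarchical moving window scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention. In this embodiment, the hierarchical scheduling for the three-slot dependency groups is implemented via a unified moving window scheduler. The moving window scheduler processes the instructions in the queues to dispatch the instructions in such a manner as to ensure that resources are available for dependent instructions to execute. As described above, L0 instructions are loaded into instruction queues that are processed by the second-level schedulers 1-4. As shown by the dependency levels, from L0 to Ln, in Figure 17, the L0 instructions are loaded such that they are at the front of each of the queues, the L1 instructions are loaded such that they follow in each of the queues, the L2 instructions follow them, and so on. The moving window illustrates how L0 instructions can be dispatched from each of the queues even though there may be more of them in one queue than in another. In this manner, the moving window scheduler dispatches instructions as the queues flow from left to right, as illustrated in Figure 17.
Figure 18 shows how variably sized dependency chains (e.g., variably bounded groups) of instructions are allocated to a plurality of computing engines in accordance with one embodiment of the present invention.
As depicted in Figure 18, the processor includes an instruction scheduler component 10 and a plurality of engines 11-14. The instruction scheduler component generates code blocks and inheritance vectors to support the execution of dependent code blocks (e.g., variably bounded groups) on their respective engines. Each of the dependent code blocks can belong to the same logical core/thread or to different logical cores/threads. The instruction scheduler component will process the dependent code blocks to generate the respective inheritance vectors. These dependent code blocks and respective inheritance vectors are allocated to the particular engines 11-14 as shown. A global interconnect 30 supports the necessary communication across each of the engines 11-14. It should be noted that the functionality for the dependency grouping of instructions to build variably bounded groups of dependent instructions, as described above in the discussion of Figure 14, is implemented by the instruction scheduler component 10 of the Figure 18 embodiment.
Figure 19 shows a flow diagram depicting block allocation to the scheduling queues and the hierarchical moving window scheduling of three-slot dependency instruction groups in accordance with one embodiment of the present invention. As described above, the hierarchical scheduling for the three-slot dependency groups can be implemented via a unified moving window scheduler. Figure 19 shows how dependency groups become blocks that are loaded into the scheduling queues. In the Figure 19 embodiment, two independent groups can be loaded in each queue as half blocks. This is shown at the top of Figure 19, where group 1 forms one half block and group 4 forms another half block that is loaded into the first scheduling queue.
As described above, the moving window scheduler processes the instructions in the queues to dispatch the instructions in such a manner as to ensure that resources are available for dependent instructions to execute. The bottom of Figure 19 shows how L0 instructions are loaded into instruction queues that are processed by the second-level schedulers.
Figure 20 shows how dependent code blocks (e.g., dependency groups or dependency chains) are executed on the engines 11-14 in accordance with one embodiment of the present invention. As described above, the instruction scheduler component generates code blocks and inheritance vectors to support the execution of dependent code blocks (e.g., variably bounded groups, three-slot groups, etc.) on their respective engines. As described above in Figure 19, Figure 20 further shows how two independent groups can be loaded into each engine as code blocks. Figure 20 shows how these code blocks are dispatched to the engines 11-14, where the dependent instructions execute on the stacked (e.g., serially connected) execution units of each engine. For example, in the first dependency group, or code block, on the top left of Figure 20, the instructions are dispatched to engine 11, where they are stacked on the execution units in the order of their dependency, such that L0 is stacked on top of L1, which is further stacked on L2. In so doing, the results of L0 flow to the execution unit of L1, whose results can then flow to the execution of L2.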
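The way results flow down a stack of serially connected execution units can be sketched as a simple chain of functions. This is only an analogy, assuming each dependency level is modeled as a hypothetical single-input operation; real execution units take multiple operands, and the three lambdas are invented for the example.

```python
from typing import Callable, Sequence


def run_stacked(ops: Sequence[Callable[[int], int]], value: int) -> int:
    """Feed the result of each level into the next, in dependency order (L0 first)."""
    for op in ops:
        value = op(value)
    return value


# Hypothetical three-slot group: L0 adds 1, L1 doubles, L2 subtracts 3.
stacked_result = run_stacked(
    [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3],
    4,
)
```

Starting from 4, the L0 result is 5, the L1 result is 10, and the L2 result is 7, with no intermediate value leaving the chain, which is the locality benefit that stacking the units is meant to provide.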
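In this manner, the dependency groups shown in Figure 20 can comprise a matrix of independent groups, where each group further comprises dependent instructions. The benefit of the groups being independent is the ability to dispatch and execute them in parallel, together with the attribute of minimizing the need for communication across the interconnect between the engines. Additionally, it should be noted that the execution units shown in the engines 11-14 can comprise a CPU or a GPU.
In accordance with embodiments of the present invention, it should be appreciated that instructions are abstracted into dependency groups or blocks or instruction matrices in accordance with their dependencies. Grouping instructions in accordance with their dependencies facilitates a more simplified scheduling process with a larger window of instructions (e.g., a larger input sequence of instructions). The grouping described above removes the instruction variation and abstracts such variation uniformly, thereby allowing the implementation of simple, homogeneous, and uniform scheduling decision-making. The grouping functionality described above increases the throughput of the scheduler without increasing the complexity of the scheduler. For example, in a scheduler for four engines, the scheduler can dispatch four groups where each group has three instructions. In so doing, the scheduler only handles four lanes of superscalar complexity while dispatching 12 instructions. Furthermore, each block can contain parallel independent groups, which further increases the number of dispatched instructions.
Figure 21 shows an overview diagram of a plurality of engines and their components, including a global front end fetch and scheduler and register files, global interconnects, and a fragmented memory subsystem for a multicore processor, in accordance with one embodiment of the present invention. As depicted in Figure 21, four memory fragments 101-104 are shown. The memory fragmentation hierarchy is the same across each cache hierarchy level (e.g., L1 cache, L2 cache, and the load store buffer). Data can be exchanged between each of the L1 caches, each of the L2 caches, and each of the load store buffers through the memory global interconnect 110a.
The memory global interconnect comprises a routing matrix that allows a plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored at any point in the fragmented cache hierarchy (e.g., L1 cache, load store buffer, and L2 cache). Figure 21 also depicts the manner whereby each of the fragments 101-104 can be accessed by the address calculation and execution units 121-124 through the memory global interconnect 110a.
The execution global interconnect 110b similarly comprises a routing matrix that allows the plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored in any of the segmented register files. Thus, the cores have access to data stored in any of the fragments and to data stored in any of the segments through the memory global interconnect 110a or the execution global interconnect 110b.
Figure 21 further shows a global front end fetch and scheduler, which has a view of the entire machine and which manages the utilization of the register file segments and the fragmented memory subsystem. Address generation comprises the basis for fragment definition. The global front end fetch and scheduler functions by allocating instruction sequences to each segment.
Figure 22 shows a plurality of segments, a plurality of segmented common partition schedulers and the interconnect, and the ports into the segments in accordance with one embodiment of the present invention. As depicted in Figure 22, each segment is shown with a common partition scheduler. The common partition scheduler functions by scheduling instructions within its respective segment. These instructions were in turn received from the global front end fetch and scheduler. In this embodiment, the common partition scheduler is configured to function in cooperation with the global front end fetch and scheduler. The segments are also shown with 4 read/write ports that provide read/write access to the operand/result buffer, the threaded register file, and the common partition or scheduler.
In one embodiment, a non-centralized access process is implemented for using the interconnects, and the local interconnects employ a reservation adder and a threshold limiter to control access to each contested resource, in this case the ports into each segment. In such an embodiment, to access a resource, a core needs to reserve the necessary bus and reserve the necessary port.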
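One possible reading of the reservation adder and threshold limiter is a running count of reservations that is refused once a limit is reached. The sketch below is speculative: the text does not specify how the adder or limiter operate, so the class name `PortReservation`, the per-resource counter, and the threshold of four (chosen to echo the four read/write ports per segment) are all assumptions made for illustration.

```python
class PortReservation:
    """Counts outstanding reservations for one contested resource."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical limit, e.g. ports per segment
        self.reserved = 0           # running sum kept by the "adder"

    def reserve(self, count: int = 1) -> bool:
        """Grant the reservation only if the sum stays within the threshold."""
        if self.reserved + count > self.threshold:
            return False            # the limiter rejects the request
        self.reserved += count
        return True

    def release(self, count: int = 1) -> None:
        """Return previously reserved capacity."""
        self.reserved = max(0, self.reserved - count)
```

A core in this model would call `reserve()` on both the bus and the port it needs and proceed only if both calls succeed, releasing whichever one it obtained if the other is refused.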
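Figure 23 shows a diagram of an exemplary microprocessor pipeline 2300 in accordance with one embodiment of the present invention. The microprocessor pipeline 2300 includes a fetch module 2301 that implements the functionality of the process for identifying and extracting the instructions comprising an execution, as described above. In the Figure 23 embodiment, the fetch module is followed by a decode module 2302, an allocation module 2303, a dispatch module 2304, an execution module 2305, and a retirement module 2306. It should be noted that the microprocessor pipeline 2300 is just one example of a pipeline that implements the functionality of the embodiments of the present invention described above. One skilled in the art would recognize that other microprocessor pipelines can be implemented that include the functionality of the decode module described above.
For purposes of explanation, the foregoing description refers to specific embodiments that are not intended to be exhaustive or to limit the present invention. Many modifications and variations are possible consistent with the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and its various embodiments with various modifications as may be suited to their particular uses.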
Claims (14)
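1. A method in a microprocessor for hierarchically scheduling instructions, wherein the microprocessor includes a plurality of instruction queues, the method comprising:
accessing an input instruction sequence;
grouping the input instruction sequence into a plurality of dependency groups;
assigning a dependency level to each instruction in the input instruction sequence;
loading instructions that are grouped into a same dependency group of the plurality of dependency groups into a same instruction queue of the plurality of instruction queues in order of dependency level; and
dispatching instructions from the plurality of instruction queues.
2. The method of claim 1, characterized in that the plurality of dependency groups comprises variably sized dependency groups.
3. The method of claim 1, characterized in that the plurality of dependency groups comprises fixed-size dependency groups.
4. The method of claim 1, characterized in that a moving window scheduler processes the instructions in the plurality of instruction queues to dispatch instructions from the instruction queues.
5. The method of claim 1, characterized by further comprising:
generating inheritance vectors to support execution of the plurality of dependency groups; and
allocating the plurality of dependency groups and the inheritance vectors to a plurality of engines.
6. The method of claim 1, characterized in that a first dependency group and a second dependency group of the plurality of dependency groups are loaded into an instruction queue of the plurality of instruction queues as a plurality of half blocks.
7. The method of claim 6, characterized in that the first dependency group and the second dependency group are dispatched to an engine, wherein instructions in a dependency group are stacked on a set of execution units of the engine in order of dependency level, to allow the result of executing a given instruction to flow to the execution unit used for executing an instruction that depends on the given instruction.
8. A microprocessor for hierarchically scheduling instructions, comprising:
a plurality of instruction queues; and
a scheduler component for: accessing an input instruction sequence, grouping the input instruction sequence into a plurality of dependency groups, assigning a dependency level to each instruction in the input instruction sequence, loading instructions that are grouped into a same dependency group of the plurality of dependency groups into a same instruction queue of the plurality of instruction queues in order of dependency level, and dispatching instructions from the plurality of instruction queues.
9. The microprocessor of claim 8, characterized in that the plurality of dependency groups comprises variably sized dependency groups.
10. The microprocessor of claim 8, characterized in that the plurality of dependency groups comprises fixed-size dependency groups.
11. The microprocessor of claim 8, characterized in that a moving window scheduler processes the instructions in the plurality of instruction queues to dispatch instructions from the instruction queues.
12. The microprocessor of claim 8, characterized by further comprising:
a plurality of engines, wherein the scheduler component is configured to generate inheritance vectors to support execution of the plurality of dependency groups and to allocate the plurality of dependency groups and the inheritance vectors to the plurality of engines.
13. The microprocessor of claim 8, characterized in that a first dependency group and a second dependency group of the plurality of dependency groups are loaded into an instruction queue of the plurality of instruction queues as a plurality of half blocks.
14. The microprocessor of claim 13, characterized in that the first dependency group and the second dependency group are dispatched to an engine, wherein instructions in a dependency group are stacked on a set of execution units of the engine in order of dependency level, to allow the result of executing a given instruction to flow to the execution unit used for executing an instruction that depends on the given instruction.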
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810449173.XA CN108427574B (zh) | 2011-11-22 | 2011-11-22 | Microprocessor accelerated code optimizer |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2011/061957 WO2013077876A1 (en) | 2011-11-22 | 2011-11-22 | A microprocessor accelerated code optimizer |
CN201810449173.XA CN108427574B (zh) | 2011-11-22 | 2011-11-22 | Microprocessor accelerated code optimizer |
CN201180076248.0A CN104040491B (zh) | 2011-11-22 | 2011-11-22 | Microprocessor accelerated code optimizer |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180076248.0A Division CN104040491B (zh) | 2011-11-22 | 2011-11-22 | Microprocessor accelerated code optimizer |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108427574A CN108427574A (zh) | 2018-08-21 |
CN108427574B true CN108427574B (zh) | 2022-06-07 |
Family
ID=48470172
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201180076248.0A Active CN104040491B (zh) | 2011-11-22 | 2011-11-22 | 微处理器加速的代码优化器 |
CN201810449173.XA Active CN108427574B (zh) | 2011-11-22 | 2011-11-22 | 微处理器加速的代码优化器 |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201180076248.0A Active CN104040491B (zh) | 2011-11-22 | 2011-11-22 | 微处理器加速的代码优化器 |
Country Status (5)
Country | Link |
---|---|
US (2) | US20150039859A1 (zh) |
EP (1) | EP2783281B1 (zh) |
KR (2) | KR101703400B1 (zh) |
CN (2) | CN104040491B (zh) |
WO (1) | WO2013077876A1 (zh) |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8327115B2 (en) | 2006-04-12 | 2012-12-04 | Soft Machines, Inc. | Plural matrices of execution units for processing matrices of row dependent instructions in single clock cycle in super or separate mode |
EP2527972A3 (en) | 2006-11-14 | 2014-08-06 | Soft Machines, Inc. | Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes |
KR101685247B1 (ko) | 2010-09-17 | 2016-12-09 | Soft Machines, Inc. | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
EP2689326B1 (en) | 2011-03-25 | 2022-11-16 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
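CN108376097B (zh) | 2011-03-25 | 2022-04-15 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103547993B (zh) | 2011-03-25 | 2018-06-26 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |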
WO2012162189A1 (en) | 2011-05-20 | 2012-11-29 | Soft Machines, Inc. | An interconnect structure to support the execution of instruction sequences by a plurality of engines |
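TWI603198B (zh) | 2011-05-20 | 2017-10-21 | Intel Corporation | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |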
EP2783280B1 (en) | 2011-11-22 | 2019-09-11 | Intel Corporation | An accelerated code optimizer for a multiengine microprocessor |
WO2013077876A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | A microprocessor accelerated code optimizer |
US10270709B2 (en) | 2015-06-26 | 2019-04-23 | Microsoft Technology Licensing, Llc | Allocating acceleration component functionality for supporting services |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
EP2972845B1 (en) | 2013-03-15 | 2021-07-07 | Intel Corporation | A method for executing multithreaded instructions grouped onto blocks |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
CN105247484B (zh) | 2013-03-15 | 2021-02-23 | Intel Corporation | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
GB2514618B (en) * | 2013-05-31 | 2020-11-11 | Advanced Risc Mach Ltd | Data processing systems |
US9733909B2 (en) * | 2014-07-25 | 2017-08-15 | Intel Corporation | System converter that implements a reordering process through JIT (just in time) optimization that ensures loads do not dispatch ahead of other loads that are to the same address |
EP3172665A4 (en) | 2014-07-25 | 2018-03-28 | INTEL Corporation | Using a conversion look aside buffer to implement an instruction set agnostic runtime architecture |
US11281481B2 (en) | 2014-07-25 | 2022-03-22 | Intel Corporation | Using a plurality of conversion tables to implement an instruction set agnostic runtime architecture |
US10353680B2 (en) | 2014-07-25 | 2019-07-16 | Intel Corporation | System converter that implements a run ahead run time guest instruction conversion/decoding process and a prefetching process where guest code is pre-fetched from the target of guest branches in an instruction sequence |
US20160026486A1 (en) * | 2014-07-25 | 2016-01-28 | Soft Machines, Inc. | An allocation and issue stage for reordering a microinstruction sequence into an optimized microinstruction sequence to implement an instruction set agnostic runtime architecture |
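CN104699466B (zh) * | 2015-03-26 | 2017-07-18 | National University of Defense Technology of the Chinese People's Liberation Army | A multi-heuristic instruction selection method for VLIW architectures |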
US10198294B2 (en) | 2015-04-17 | 2019-02-05 | Microsoft Technology Licensing, LLC | Handling tenant requests in a system that uses hardware acceleration components |
US9983938B2 (en) * | 2015-04-17 | 2018-05-29 | Microsoft Technology Licensing, Llc | Locally restoring functionality at acceleration components |
US9792154B2 (en) | 2015-04-17 | 2017-10-17 | Microsoft Technology Licensing, Llc | Data processing system having a hardware acceleration plane and a software plane |
US10216555B2 (en) | 2015-06-26 | 2019-02-26 | Microsoft Technology Licensing, Llc | Partially reconfiguring acceleration components |
KR20180038793A (ko) * | 2016-10-07 | 2018-04-17 | Samsung Electronics Co., Ltd. | Method and apparatus for processing image data |
US10884751B2 (en) | 2018-07-13 | 2021-01-05 | Advanced Micro Devices, Inc. | Method and apparatus for virtualizing the micro-op cache |
GB2577738B (en) * | 2018-10-05 | 2021-02-24 | Advanced Risc Mach Ltd | An apparatus and method for providing decoded instructions |
Family Cites Families (495)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US727487A (en) | 1902-10-21 | 1903-05-05 | Swan F Swanson | Dumping-car. |
US4075704A (en) | 1976-07-02 | 1978-02-21 | Floating Point Systems, Inc. | Floating point data processor for high speed operation |
US4228496A (en) | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4245344A (en) | 1979-04-02 | 1981-01-13 | Rockwell International Corporation | Processing system with dual buses |
US4527237A (en) | 1979-10-11 | 1985-07-02 | Nanodata Computer Corporation | Data processing system |
US4414624A (en) | 1980-11-19 | 1983-11-08 | The United States Of America As Represented By The Secretary Of The Navy | Multiple-microcomputer processing |
US4524415A (en) | 1982-12-07 | 1985-06-18 | Motorola, Inc. | Virtual machine data processor |
US4597061B1 (en) | 1983-01-03 | 1998-06-09 | Texas Instruments Inc | Memory system using pipeline circuitry for improved system |
US4577273A (en) | 1983-06-06 | 1986-03-18 | Sperry Corporation | Multiple microcomputer system for digital computers |
US4682281A (en) | 1983-08-30 | 1987-07-21 | Amdahl Corporation | Data storage unit employing translation lookaside buffer pointer |
US4600986A (en) | 1984-04-02 | 1986-07-15 | Sperry Corporation | Pipelined split stack with high performance interleaved decode |
US4633434A (en) | 1984-04-02 | 1986-12-30 | Sperry Corporation | High performance storage unit |
JPS6140643A (ja) | 1984-07-31 | 1986-02-26 | Hitachi Ltd | System resource allocation control method |
US4835680A (en) | 1985-03-15 | 1989-05-30 | Xerox Corporation | Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs |
JPS6289149A (ja) | 1985-10-15 | 1987-04-23 | Agency Of Ind Science & Technol | Multi-port memory system |
JPH0658650B2 (ja) | 1986-03-14 | 1994-08-03 | Hitachi, Ltd. | Virtual computer system |
US4920477A (en) | 1987-04-20 | 1990-04-24 | Multiflow Computer, Inc. | Virtual address table look aside buffer miss recovery method and apparatus |
US4943909A (en) | 1987-07-08 | 1990-07-24 | At&T Bell Laboratories | Computational origami |
US5339398A (en) | 1989-07-31 | 1994-08-16 | North American Philips Corporation | Memory architecture and method of data organization optimized for hashing |
US5471593A (en) | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
US5197130A (en) | 1989-12-29 | 1993-03-23 | Supercomputer Systems Limited Partnership | Cluster architecture for a highly parallel scalar/vector multiprocessor system |
US5317754A (en) | 1990-10-23 | 1994-05-31 | International Business Machines Corporation | Method and apparatus for enabling an interpretive execution subset |
US5317705A (en) | 1990-10-24 | 1994-05-31 | International Business Machines Corporation | Apparatus and method for TLB purge reduction in a multi-level machine system |
US6282583B1 (en) | 1991-06-04 | 2001-08-28 | Silicon Graphics, Inc. | Method and apparatus for memory access in a matrix processor computer |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
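JPH0820949B2 (ja) | 1991-11-26 | 1996-03-04 | Matsushita Electric Industrial Co., Ltd. | Information processing device |
JPH07502358A (ja) | 1991-12-23 | 1995-03-09 | Intel Corporation | Interleaved cache for multiple accesses per microprocessor clock |
KR100309566B1 (ko) | 1992-04-29 | 2001-12-15 | 리패치 | Method and apparatus for grouping multiple instructions, issuing grouped instructions simultaneously, and executing grouped instructions in a pipelined processor |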
EP0638183B1 (en) | 1992-05-01 | 1997-03-05 | Seiko Epson Corporation | A system and method for retiring instructions in a superscalar microprocessor |
EP0576262B1 (en) | 1992-06-25 | 2000-08-23 | Canon Kabushiki Kaisha | Apparatus for multiplying integers of many figures |
JPH0637202A (ja) | 1992-07-20 | 1994-02-10 | Mitsubishi Electric Corp | Package for microwave IC |
JPH06110781A (ja) | 1992-09-30 | 1994-04-22 | Nec Corp | Cache memory device |
US5493660A (en) | 1992-10-06 | 1996-02-20 | Hewlett-Packard Company | Software assisted hardware TLB miss handler |
US5513335A (en) | 1992-11-02 | 1996-04-30 | Sgs-Thomson Microelectronics, Inc. | Cache tag memory having first and second single-port arrays and a dual-port array |
US5819088A (en) | 1993-03-25 | 1998-10-06 | Intel Corporation | Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer |
JPH0784883A (ja) | 1993-09-17 | 1995-03-31 | Hitachi Ltd | Method for purging the address translation buffer of a virtual computer system |
US6948172B1 (en) | 1993-09-21 | 2005-09-20 | Microsoft Corporation | Preemptive multi-tasking with cooperative groups of tasks |
US5469376A (en) | 1993-10-14 | 1995-11-21 | Abdallah; Mohammad A. F. F. | Digital circuit for the evaluation of mathematical expressions |
US5517651A (en) * | 1993-12-29 | 1996-05-14 | Intel Corporation | Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes |
US5761476A (en) | 1993-12-30 | 1998-06-02 | Intel Corporation | Non-clocked early read for back-to-back scheduling of instructions |
US5956753A (en) | 1993-12-30 | 1999-09-21 | Intel Corporation | Method and apparatus for handling speculative memory access operations |
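JP3048498B2 (ja) | 1994-04-13 | 2000-06-05 | Toshiba Corporation | Semiconductor memory device |
JPH07287668A (ja) | 1994-04-19 | 1995-10-31 | Hitachi Ltd | Data processing device |
CN1084005C (zh) | 1994-06-27 | 2002-05-01 | International Business Machines Corporation | Method and apparatus for dynamically controlling address space allocation |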
US5548742A (en) | 1994-08-11 | 1996-08-20 | Intel Corporation | Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory |
US5813031A (en) | 1994-09-21 | 1998-09-22 | Industrial Technology Research Institute | Caching tag for a large scale cache computer memory system |
US5640534A (en) | 1994-10-05 | 1997-06-17 | International Business Machines Corporation | Method and system for concurrent access in a data cache array utilizing multiple match line selection paths |
US5835951A (en) | 1994-10-18 | 1998-11-10 | National Semiconductor | Branch processing unit with target cache read prioritization protocol for handling multiple hits |
JP3569014B2 (ja) | 1994-11-25 | 2004-09-22 | Fujitsu Limited | Processor and processing method supporting multiple contexts |
US5724565A (en) | 1995-02-03 | 1998-03-03 | International Business Machines Corporation | Method and system for processing first and second sets of instructions by first and second types of processing systems |
US5644742A (en) | 1995-02-14 | 1997-07-01 | Hal Computer Systems, Inc. | Processor structure and method for a time-out checkpoint |
US5675759A (en) | 1995-03-03 | 1997-10-07 | Shebanow; Michael C. | Method and apparatus for register management using issue sequence prior physical register and register association validity information |
US5751982A (en) | 1995-03-31 | 1998-05-12 | Apple Computer, Inc. | Software emulation system with dynamic translation of emulated instructions for increased processing speed |
US5634068A (en) | 1995-03-31 | 1997-05-27 | Sun Microsystems, Inc. | Packet switched cache coherent multiprocessor system |
US6209085B1 (en) | 1995-05-05 | 2001-03-27 | Intel Corporation | Method and apparatus for performing process switching in multiprocessor computer systems |
US6643765B1 (en) | 1995-08-16 | 2003-11-04 | Microunity Systems Engineering, Inc. | Programmable processor with group floating point operations |
US5710902A (en) * | 1995-09-06 | 1998-01-20 | Intel Corporation | Instruction dependency chain indentifier |
US6341324B1 (en) | 1995-10-06 | 2002-01-22 | Lsi Logic Corporation | Exception processing in superscalar microprocessor |
US5864657A (en) | 1995-11-29 | 1999-01-26 | Texas Micro, Inc. | Main memory system and checkpointing protocol for fault-tolerant computer system |
US5983327A (en) | 1995-12-01 | 1999-11-09 | Nortel Networks Corporation | Data path architecture and arbitration scheme for providing access to a shared system resource |
US5793941A (en) | 1995-12-04 | 1998-08-11 | Advanced Micro Devices, Inc. | On-chip primary cache testing circuit and test method |
US5911057A (en) | 1995-12-19 | 1999-06-08 | Texas Instruments Incorporated | Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods |
US5699537A (en) * | 1995-12-22 | 1997-12-16 | Intel Corporation | Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions |
US6882177B1 (en) | 1996-01-10 | 2005-04-19 | Altera Corporation | Tristate structures for programmable logic devices |
US5752796A (en) | 1996-01-24 | 1998-05-19 | Muka; Richard S. | Vacuum integrated SMIF system |
US5754818A (en) | 1996-03-22 | 1998-05-19 | Sun Microsystems, Inc. | Architecture and method for sharing TLB entries through process IDS |
US5904892A (en) | 1996-04-01 | 1999-05-18 | Saint-Gobain/Norton Industrial Ceramics Corp. | Tape cast silicon carbide dummy wafer |
US5752260A (en) | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses |
US5806085A (en) | 1996-05-01 | 1998-09-08 | Sun Microsystems, Inc. | Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database |
US5829028A (en) | 1996-05-06 | 1998-10-27 | Advanced Micro Devices, Inc. | Data cache configured to store data in a use-once manner |
US6108769A (en) | 1996-05-17 | 2000-08-22 | Advanced Micro Devices, Inc. | Dependency table for reducing dependency checking hardware |
US5881277A (en) | 1996-06-13 | 1999-03-09 | Texas Instruments Incorporated | Pipelined microprocessor with branch misprediction cache circuits, systems and methods |
US5860146A (en) | 1996-06-25 | 1999-01-12 | Sun Microsystems, Inc. | Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces |
US5903760A (en) | 1996-06-27 | 1999-05-11 | Intel Corporation | Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA |
US5974506A (en) | 1996-06-28 | 1999-10-26 | Digital Equipment Corporation | Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system |
US6167490A (en) | 1996-09-20 | 2000-12-26 | University Of Washington | Using global memory information to manage memory in a computer network |
KR19980032776A (ko) | 1996-10-16 | 1998-07-25 | 가나이 츠토무 | Data processor and data processing system |
KR19990076967A (ko) | 1996-11-04 | 1999-10-25 | 요트.게.아. 롤페즈 | Processing device and reading of instructions in memory |
US6385715B1 (en) | 1996-11-13 | 2002-05-07 | Intel Corporation | Multi-threading for a processor utilizing a replay queue |
US6253316B1 (en) | 1996-11-19 | 2001-06-26 | Advanced Micro Devices, Inc. | Three state branch history using one bit in a branch prediction mechanism |
US5978906A (en) | 1996-11-19 | 1999-11-02 | Advanced Micro Devices, Inc. | Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions |
US5903750A (en) | 1996-11-20 | 1999-05-11 | Institute For The Development Of Emerging Architectures, L.L.P. | Dynamic branch prediction for branch instructions with multiple targets |
US6212542B1 (en) | 1996-12-16 | 2001-04-03 | International Business Machines Corporation | Method and system for executing a program within a multiscalar processor by processing linked thread descriptors |
US6134634A (en) | 1996-12-20 | 2000-10-17 | Texas Instruments Incorporated | Method and apparatus for preemptive cache write-back |
US5918251A (en) | 1996-12-23 | 1999-06-29 | Intel Corporation | Method and apparatus for preloading different default address translation attributes |
US6016540A (en) | 1997-01-08 | 2000-01-18 | Intel Corporation | Method and apparatus for scheduling instructions in waves |
US6065105A (en) * | 1997-01-08 | 2000-05-16 | Intel Corporation | Dependency matrix |
US5802602A (en) | 1997-01-17 | 1998-09-01 | Intel Corporation | Method and apparatus for performing reads of related data from a set-associative cache memory |
US6088780A (en) | 1997-03-31 | 2000-07-11 | Institute For The Development Of Emerging Architecture, L.L.C. | Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address |
US6314511B2 (en) | 1997-04-03 | 2001-11-06 | University Of Washington | Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers |
US6075938A (en) | 1997-06-10 | 2000-06-13 | The Board Of Trustees Of The Leland Stanford Junior University | Virtual machine monitors for scalable multiprocessors |
US6073230A (en) | 1997-06-11 | 2000-06-06 | Advanced Micro Devices, Inc. | Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches |
JPH1124929A (ja) | 1997-06-30 | 1999-01-29 | Sony Corp | Arithmetic processing device and method thereof |
US6128728A (en) | 1997-08-01 | 2000-10-03 | Micron Technology, Inc. | Virtual shadow registers and virtual register windows |
US6170051B1 (en) | 1997-08-01 | 2001-01-02 | Micron Technology, Inc. | Apparatus and method for program level parallelism in a VLIW processor |
US6085315A (en) | 1997-09-12 | 2000-07-04 | Siemens Aktiengesellschaft | Data processing device with loop pipeline |
US6101577A (en) | 1997-09-15 | 2000-08-08 | Advanced Micro Devices, Inc. | Pipelined instruction cache and branch prediction mechanism therefor |
US5901294A (en) | 1997-09-18 | 1999-05-04 | International Business Machines Corporation | Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access |
US6185660B1 (en) | 1997-09-23 | 2001-02-06 | Hewlett-Packard Company | Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss |
US5905509A (en) | 1997-09-30 | 1999-05-18 | Compaq Computer Corp. | Accelerated Graphics Port two level Gart cache having distributed first level caches |
US6226732B1 (en) | 1997-10-02 | 2001-05-01 | Hitachi Micro Systems, Inc. | Memory system architecture |
US5922065A (en) | 1997-10-13 | 1999-07-13 | Institute For The Development Of Emerging Architectures, L.L.C. | Processor utilizing a template field for encoding instruction sequences in a wide-word format |
US6178482B1 (en) | 1997-11-03 | 2001-01-23 | Brecis Communications | Virtual register sets |
US6021484A (en) | 1997-11-14 | 2000-02-01 | Samsung Electronics Co., Ltd. | Dual instruction set architecture |
US6256728B1 (en) | 1997-11-17 | 2001-07-03 | Advanced Micro Devices, Inc. | Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction |
US6260131B1 (en) | 1997-11-18 | 2001-07-10 | Intrinsity, Inc. | Method and apparatus for TLB memory ordering |
US6016533A (en) | 1997-12-16 | 2000-01-18 | Advanced Micro Devices, Inc. | Way prediction logic for cache array |
US6219776B1 (en) | 1998-03-10 | 2001-04-17 | Billions Of Operations Per Second | Merged array controller and processing element |
US6609189B1 (en) | 1998-03-12 | 2003-08-19 | Yale University | Cycle segmented prefix circuits |
JP3657424B2 (ja) | 1998-03-20 | 2005-06-08 | Matsushita Electric Industrial Co., Ltd. | Center device and terminal device for broadcasting program information |
US6216215B1 (en) | 1998-04-02 | 2001-04-10 | Intel Corporation | Method and apparatus for senior loads |
US6157998A (en) | 1998-04-03 | 2000-12-05 | Motorola Inc. | Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers |
US6115809A (en) | 1998-04-30 | 2000-09-05 | Hewlett-Packard Company | Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction |
US6205545B1 (en) | 1998-04-30 | 2001-03-20 | Hewlett-Packard Company | Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance |
US6256727B1 (en) | 1998-05-12 | 2001-07-03 | International Business Machines Corporation | Method and system for fetching noncontiguous instructions in a single clock cycle |
JPH11338710A (ja) | 1998-05-28 | 1999-12-10 | Toshiba Corp | Compilation method and apparatus for a processor having multiple instruction sets, and recording medium on which the method is programmed and recorded |
US6272616B1 (en) | 1998-06-17 | 2001-08-07 | Agere Systems Guardian Corp. | Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths |
US6988183B1 (en) * | 1998-06-26 | 2006-01-17 | Derek Chi-Lan Wong | Methods for increasing instruction-level parallelism in microprocessors and digital system |
US6260138B1 (en) | 1998-07-17 | 2001-07-10 | Sun Microsystems, Inc. | Method and apparatus for branch instruction processing in a processor |
US6122656A (en) | 1998-07-31 | 2000-09-19 | Advanced Micro Devices, Inc. | Processor configured to map logical register numbers to physical register numbers using virtual register numbers |
US6272662B1 (en) | 1998-08-04 | 2001-08-07 | International Business Machines Corporation | Distributed storage system using front-end and back-end locking |
JP2000057054A (ja) | 1998-08-12 | 2000-02-25 | Fujitsu Ltd | High-speed address translation system |
US6742111B2 (en) * | 1998-08-31 | 2004-05-25 | Stmicroelectronics, Inc. | Reservation stations to increase instruction level parallelism |
US8631066B2 (en) | 1998-09-10 | 2014-01-14 | Vmware, Inc. | Mechanism for providing virtual machines for use by multiple users |
US6339822B1 (en) | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6332189B1 (en) | 1998-10-16 | 2001-12-18 | Intel Corporation | Branch prediction architecture |
GB9825102D0 (en) | 1998-11-16 | 1999-01-13 | Insignia Solutions Plc | Computer system |
JP3110404B2 (ja) | 1998-11-18 | 2000-11-20 | NEC Kofu, Ltd. | Microprocessor device, method for accelerating its software instructions, and recording medium storing its control program |
US6490673B1 (en) | 1998-11-27 | 2002-12-03 | Matsushita Electric Industrial Co., Ltd | Processor, compiling apparatus, and compile program recorded on a recording medium |
US6519682B2 (en) | 1998-12-04 | 2003-02-11 | Stmicroelectronics, Inc. | Pipelined non-blocking level two cache system with inherent transaction collision-avoidance |
US7020879B1 (en) | 1998-12-16 | 2006-03-28 | Mips Technologies, Inc. | Interrupt and exception handling for multi-streaming digital processors |
US6477562B2 (en) | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US6247097B1 (en) | 1999-01-22 | 2001-06-12 | International Business Machines Corporation | Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions |
US6321298B1 (en) | 1999-01-25 | 2001-11-20 | International Business Machines Corporation | Full cache coherency across multiple raid controllers |
JP3842474B2 (ja) | 1999-02-02 | 2006-11-08 | Renesas Technology Corp. | Data processing device |
US6327650B1 (en) | 1999-02-12 | 2001-12-04 | VLSI Technology, Inc. | Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor |
US6668316B1 (en) | 1999-02-17 | 2003-12-23 | Elbrus International Limited | Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file |
US6732220B2 (en) | 1999-02-17 | 2004-05-04 | Elbrus International | Method for emulating hardware features of a foreign architecture in a host operating system environment |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6437789B1 (en) | 1999-02-19 | 2002-08-20 | Evans & Sutherland Computer Corporation | Multi-level cache controller |
US6850531B1 (en) | 1999-02-23 | 2005-02-01 | Alcatel | Multi-service network switch |
US6212613B1 (en) | 1999-03-22 | 2001-04-03 | Cisco Technology, Inc. | Methods and apparatus for reusing addresses in a computer |
US6529928B1 (en) | 1999-03-23 | 2003-03-04 | Silicon Graphics, Inc. | Floating-point adder performing floating-point and integer operations |
EP1050808B1 (en) | 1999-05-03 | 2008-04-30 | STMicroelectronics S.A. | Computer instruction scheduling |
US6449671B1 (en) | 1999-06-09 | 2002-09-10 | Ati International Srl | Method and apparatus for busing data elements |
US6473833B1 (en) | 1999-07-30 | 2002-10-29 | International Business Machines Corporation | Integrated cache and directory structure for multi-level caches |
US6643770B1 (en) | 1999-09-16 | 2003-11-04 | Intel Corporation | Branch misprediction recovery using a side memory |
US6772325B1 (en) * | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6704822B1 (en) | 1999-10-01 | 2004-03-09 | Sun Microsystems, Inc. | Arbitration protocol for a shared data cache |
US6457120B1 (en) | 1999-11-01 | 2002-09-24 | International Business Machines Corporation | Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions |
US7441110B1 (en) | 1999-12-10 | 2008-10-21 | International Business Machines Corporation | Prefetching using future branch path information derived from branch prediction |
US7107434B2 (en) | 1999-12-20 | 2006-09-12 | Board Of Regents, The University Of Texas | System, method and apparatus for allocating hardware resources using pseudorandom sequences |
JP4693326B2 (ja) | 1999-12-22 | 2011-06-01 | Ubicom, Inc. | System and method for instruction-level multithreading in an embedded processor using zero-time context switching |
US6557095B1 (en) | 1999-12-27 | 2003-04-29 | Intel Corporation | Scheduling operations using a dependency matrix |
US6542984B1 (en) | 2000-01-03 | 2003-04-01 | Advanced Micro Devices, Inc. | Scheduler capable of issuing and reissuing dependency chains |
JP2003519833A (ja) | 2000-01-03 | 2003-06-24 | Advanced Micro Devices, Inc. | Scheduler capable of issuing and reissuing dependency chains |
US6594755B1 (en) | 2000-01-04 | 2003-07-15 | National Semiconductor Corporation | System and method for interleaved execution of multiple independent threads |
US6728872B1 (en) | 2000-02-04 | 2004-04-27 | International Business Machines Corporation | Method and apparatus for verifying that instructions are pipelined in correct architectural sequence |
GB0002848D0 (en) | 2000-02-08 | 2000-03-29 | Siroyan Limited | Communicating instruction results in processors and compiling methods for processors |
GB2365661A (en) | 2000-03-10 | 2002-02-20 | British Telecomm | Allocating switch requests within a packet switch |
US6615340B1 (en) | 2000-03-22 | 2003-09-02 | Wilmot, Ii Richard Byron | Extended operand management indicator structure and method |
US6604187B1 (en) | 2000-06-19 | 2003-08-05 | Advanced Micro Devices, Inc. | Providing global translations with address space numbers |
US6557083B1 (en) | 2000-06-30 | 2003-04-29 | Intel Corporation | Memory system for multiple data types |
US6704860B1 (en) | 2000-07-26 | 2004-03-09 | International Business Machines Corporation | Data processing system and method for fetching instruction blocks in response to a detected block sequence |
US7206925B1 (en) | 2000-08-18 | 2007-04-17 | Sun Microsystems, Inc. | Backing Register File for processors |
US6728866B1 (en) * | 2000-08-31 | 2004-04-27 | International Business Machines Corporation | Partitioned issue queue and allocation strategy |
US6721874B1 (en) | 2000-10-12 | 2004-04-13 | International Business Machines Corporation | Method and system for dynamically shared completion table supporting multiple threads in a processing system |
US7757065B1 (en) | 2000-11-09 | 2010-07-13 | Intel Corporation | Instruction segment recording scheme |
JP2002185513A (ja) | 2000-12-18 | 2002-06-28 | Hitachi Ltd | Packet communication network and packet transfer control method |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US6907600B2 (en) | 2000-12-27 | 2005-06-14 | Intel Corporation | Virtual translation lookaside buffer |
US6647466B2 (en) | 2001-01-25 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy |
FR2820921A1 (fr) | 2001-02-14 | 2002-08-16 | Canon Kk | Device and method for transmission in a switch |
US6985951B2 (en) | 2001-03-08 | 2006-01-10 | International Business Machines Corporation | Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment |
US6950927B1 (en) | 2001-04-13 | 2005-09-27 | The United States Of America As Represented By The Secretary Of The Navy | System and method for instruction-level parallelism in a programmable multiple network processor environment |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US7707397B2 (en) | 2001-05-04 | 2010-04-27 | Via Technologies, Inc. | Variable group associativity branch target address cache delivering multiple target addresses per cache line |
US6658549B2 (en) | 2001-05-22 | 2003-12-02 | Hewlett-Packard Development Company, L.P. | Method and system allowing a single entity to manage memory comprising compressed and uncompressed data |
US6985591B2 (en) | 2001-06-29 | 2006-01-10 | Intel Corporation | Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media |
US7203824B2 (en) | 2001-07-03 | 2007-04-10 | Ip-First, Llc | Apparatus and method for handling BTAC branches that wrap across instruction cache lines |
US7024545B1 (en) | 2001-07-24 | 2006-04-04 | Advanced Micro Devices, Inc. | Hybrid branch prediction device with two levels of branch prediction cache |
US6954846B2 (en) | 2001-08-07 | 2005-10-11 | Sun Microsystems, Inc. | Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode |
US6718440B2 (en) | 2001-09-28 | 2004-04-06 | Intel Corporation | Memory access latency hiding with hint buffer |
US7150021B1 (en) | 2001-10-12 | 2006-12-12 | Palau Acquisition Corporation (Delaware) | Method and system to allocate resources within an interconnect device according to a resource allocation table |
US7117347B2 (en) | 2001-10-23 | 2006-10-03 | Ip-First, Llc | Processor including fallback branch prediction mechanism for far jump and far call instructions |
US7272832B2 (en) | 2001-10-25 | 2007-09-18 | Hewlett-Packard Development Company, L.P. | Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform |
US6964043B2 (en) | 2001-10-30 | 2005-11-08 | Intel Corporation | Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code |
GB2381886B (en) | 2001-11-07 | 2004-06-23 | Sun Microsystems Inc | Computer system with virtual memory and paging mechanism |
US7092869B2 (en) | 2001-11-14 | 2006-08-15 | Ronald Hilton | Memory address prediction under emulation |
US7363467B2 (en) * | 2002-01-03 | 2008-04-22 | Intel Corporation | Dependence-chain processing using trace descriptors having dependency descriptors |
US6640333B2 (en) | 2002-01-10 | 2003-10-28 | Lsi Logic Corporation | Architecture for a sea of platforms |
US7055021B2 (en) | 2002-02-05 | 2006-05-30 | Sun Microsystems, Inc. | Out-of-order processor that reduces mis-speculation using a replay scoreboard |
US7331040B2 (en) | 2002-02-06 | 2008-02-12 | Transitive Limited | Condition code flag emulation for program code conversion |
US20030154363A1 (en) | 2002-02-11 | 2003-08-14 | Soltis Donald C. | Stacked register aliasing in data hazard detection to reduce circuit |
US6839816B2 (en) | 2002-02-26 | 2005-01-04 | International Business Machines Corporation | Shared cache line update mechanism |
US6731292B2 (en) | 2002-03-06 | 2004-05-04 | Sun Microsystems, Inc. | System and method for controlling a number of outstanding data transactions within an integrated circuit |
JP3719509B2 (ja) | 2002-04-01 | 2005-11-24 | Sony Computer Entertainment Inc. | Serial arithmetic pipeline, arithmetic device, arithmetic logic circuit, and arithmetic method using a serial arithmetic pipeline |
US7565509B2 (en) | 2002-04-17 | 2009-07-21 | Microsoft Corporation | Using limits on address translation to control access to an addressable entity |
US6920530B2 (en) | 2002-04-23 | 2005-07-19 | Sun Microsystems, Inc. | Scheme for reordering instructions via an instruction caching mechanism |
US7113488B2 (en) | 2002-04-24 | 2006-09-26 | International Business Machines Corporation | Reconfigurable circular bus |
US7281055B2 (en) | 2002-05-28 | 2007-10-09 | Newisys, Inc. | Routing mechanisms in systems having multiple multi-processor clusters |
US7117346B2 (en) | 2002-05-31 | 2006-10-03 | Freescale Semiconductor, Inc. | Data processing system having multiple register contexts and method therefor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US6735747B2 (en) | 2002-06-10 | 2004-05-11 | Lsi Logic Corporation | Pre-silicon verification path coverage |
US8024735B2 (en) | 2002-06-14 | 2011-09-20 | Intel Corporation | Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution |
JP3845043B2 (ja) | 2002-06-28 | 2006-11-15 | Fujitsu Limited | Instruction fetch control device |
JP3982353B2 (ja) | 2002-07-12 | 2007-09-26 | NEC Corporation | Fault-tolerant computer device, resynchronization method therefor, and resynchronization program |
US6944744B2 (en) | 2002-08-27 | 2005-09-13 | Advanced Micro Devices, Inc. | Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor |
US6950925B1 (en) | 2002-08-28 | 2005-09-27 | Advanced Micro Devices, Inc. | Scheduler for use in a microprocessor that supports data-speculative execution |
US7546422B2 (en) | 2002-08-28 | 2009-06-09 | Intel Corporation | Method and apparatus for the synchronization of distributed caches |
TW200408242A (en) | 2002-09-06 | 2004-05-16 | Matsushita Electric Ind Co Ltd | Home terminal apparatus and communication system |
US6895491B2 (en) | 2002-09-26 | 2005-05-17 | Hewlett-Packard Development Company, L.P. | Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching |
US7334086B2 (en) | 2002-10-08 | 2008-02-19 | Rmi Corporation | Advanced processor with system on a chip interconnect technology |
US7213248B2 (en) | 2002-10-10 | 2007-05-01 | International Business Machines Corporation | High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system |
US6829698B2 (en) | 2002-10-10 | 2004-12-07 | International Business Machines Corporation | Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction |
US7222218B2 (en) | 2002-10-22 | 2007-05-22 | Sun Microsystems, Inc. | System and method for goal-based scheduling of blocks of code for concurrent execution |
US20040103251A1 (en) | 2002-11-26 | 2004-05-27 | Mitchell Alsup | Microprocessor including a first level cache and a second level cache having different cache line sizes |
EP1570334A2 (en) | 2002-12-04 | 2005-09-07 | Koninklijke Philips Electronics N.V. | Register file gating to reduce microprocessor power dissipation |
US6981083B2 (en) | 2002-12-05 | 2005-12-27 | International Business Machines Corporation | Processor virtualization mechanism via an enhanced restoration of hard architected states |
US7073042B2 (en) | 2002-12-12 | 2006-07-04 | Intel Corporation | Reclaiming existing fields in address translation data structures to extend control over memory accesses |
US20040117594A1 (en) | 2002-12-13 | 2004-06-17 | Vanderspek Julius | Memory management method |
US20040122887A1 (en) | 2002-12-20 | 2004-06-24 | Macy William W. | Efficient multiplication of small matrices using SIMD registers |
US7191349B2 (en) | 2002-12-26 | 2007-03-13 | Intel Corporation | Mechanism for processor power state aware distribution of lowest priority interrupt |
US6925421B2 (en) | 2003-01-09 | 2005-08-02 | International Business Machines Corporation | Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources |
US20040139441A1 (en) | 2003-01-09 | 2004-07-15 | Kabushiki Kaisha Toshiba | Processor, arithmetic operation processing method, and priority determination method |
US7178010B2 (en) | 2003-01-16 | 2007-02-13 | Ip-First, Llc | Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack |
US7089374B2 (en) | 2003-02-13 | 2006-08-08 | Sun Microsystems, Inc. | Selectively unmarking load-marked cache lines during transactional program execution |
US7278030B1 (en) | 2003-03-03 | 2007-10-02 | Vmware, Inc. | Virtualization system for computers having multiple protection mechanisms |
US6912644B1 (en) | 2003-03-06 | 2005-06-28 | Intel Corporation | Method and apparatus to steer memory access operations in a virtual memory system |
US7111145B1 (en) | 2003-03-25 | 2006-09-19 | Vmware, Inc. | TLB miss fault handler and method for accessing multiple page tables |
US7143273B2 (en) | 2003-03-31 | 2006-11-28 | Intel Corporation | Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history |
CN1214666C (zh) | 2003-04-07 | 2005-08-10 | Huawei Technologies Co., Ltd. | Method for limiting location information request traffic in location services |
US7058764B2 (en) | 2003-04-14 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | Method of adaptive cache partitioning to increase host I/O performance |
US7469407B2 (en) | 2003-04-24 | 2008-12-23 | International Business Machines Corporation | Method for resource balancing using dispatch flush in a simultaneous multithread processor |
US7290261B2 (en) | 2003-04-24 | 2007-10-30 | International Business Machines Corporation | Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor |
EP1471421A1 (en) | 2003-04-24 | 2004-10-27 | STMicroelectronics Limited | Speculative load instruction control |
US7139855B2 (en) | 2003-04-24 | 2006-11-21 | International Business Machines Corporation | High performance synchronization of resource allocation in a logically-partitioned system |
US7055003B2 (en) | 2003-04-25 | 2006-05-30 | International Business Machines Corporation | Data cache scrub mechanism for large L2/L3 data cache structures |
US7007108B2 (en) | 2003-04-30 | 2006-02-28 | Lsi Logic Corporation | System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address |
US7743238B2 (en) | 2003-05-09 | 2010-06-22 | Arm Limited | Accessing items of architectural state from a register cache in a data processing apparatus when performing branch prediction operations for an indirect branch instruction |
JP2007519052A (ja) | 2003-06-25 | 2007-07-12 | Koninklijke Philips Electronics N.V. | Instruction-controlled data processing apparatus |
JP2005032018A (ja) | 2003-07-04 | 2005-02-03 | Semiconductor Energy Lab Co Ltd | Microprocessor using a genetic algorithm |
US7149872B2 (en) | 2003-07-10 | 2006-12-12 | Transmeta Corporation | System and method for identifying TLB entries associated with a physical address of a specified range |
US7089398B2 (en) | 2003-07-31 | 2006-08-08 | Silicon Graphics, Inc. | Address translation using a page size tag |
US8296771B2 (en) | 2003-08-18 | 2012-10-23 | Cray Inc. | System and method for mapping between resource consumers and resource providers in a computing system |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
US7849297B2 (en) | 2003-08-28 | 2010-12-07 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US9032404B2 (en) | 2003-08-28 | 2015-05-12 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7594089B2 (en) | 2003-08-28 | 2009-09-22 | Mips Technologies, Inc. | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
JP4818919B2 (ja) | 2003-08-28 | 2011-11-16 | MIPS Technologies, Inc. | Integrated mechanism for suspending and deallocating computational threads of execution in a processor |
US7111126B2 (en) | 2003-09-24 | 2006-09-19 | Arm Limited | Apparatus and method for loading data values |
JP4057989B2 (ja) | 2003-09-26 | 2008-03-05 | Toshiba Corporation | Scheduling method and information processing system |
US7373637B2 (en) | 2003-09-30 | 2008-05-13 | International Business Machines Corporation | Method and apparatus for counting instruction and memory location ranges |
US7047322B1 (en) | 2003-09-30 | 2006-05-16 | Unisys Corporation | System and method for performing conflict resolution and flow control in a multiprocessor system |
FR2860313B1 (fr) | 2003-09-30 | 2005-11-04 | Commissariat Energie Atomique | Component with dynamically reconfigurable architecture |
TWI281121B (en) | 2003-10-06 | 2007-05-11 | Ip First Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US8407433B2 (en) | 2007-06-25 | 2013-03-26 | Sonics, Inc. | Interconnect implementing internal controls |
US7395372B2 (en) | 2003-11-14 | 2008-07-01 | International Business Machines Corporation | Method and system for providing cache set selection which is power optimized |
US7243170B2 (en) | 2003-11-24 | 2007-07-10 | International Business Machines Corporation | Method and circuit for reading and writing an instruction buffer |
US20050120191A1 (en) | 2003-12-02 | 2005-06-02 | Intel Corporation (A Delaware Corporation) | Checkpoint-based register reclamation |
US20050132145A1 (en) | 2003-12-15 | 2005-06-16 | Finisar Corporation | Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access |
US7310722B2 (en) | 2003-12-18 | 2007-12-18 | Nvidia Corporation | Across-thread out of order instruction dispatch in a multithreaded graphics processor |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US20050204118A1 (en) | 2004-02-27 | 2005-09-15 | National Chiao Tung University | Method for inter-cluster communication that employs register permutation |
US7478374B2 (en) | 2004-03-22 | 2009-01-13 | Intel Corporation | Debug system having assembler correcting register allocation errors |
US20050216920A1 (en) | 2004-03-24 | 2005-09-29 | Vijay Tewari | Use of a virtual machine to emulate a hardware device |
EP1731998A1 (en) | 2004-03-29 | 2006-12-13 | Kyoto University | Data processing device, data processing program, and recording medium containing the data processing program |
US7383427B2 (en) | 2004-04-22 | 2008-06-03 | Sony Computer Entertainment Inc. | Multi-scalar extension for SIMD instruction set processors |
US20050251649A1 (en) | 2004-04-23 | 2005-11-10 | Sony Computer Entertainment Inc. | Methods and apparatus for address map optimization on a multi-scalar extension |
US7418582B1 (en) | 2004-05-13 | 2008-08-26 | Sun Microsystems, Inc. | Versatile register file design for a multi-threaded processor utilizing different modes and register windows |
US7478198B2 (en) | 2004-05-24 | 2009-01-13 | Intel Corporation | Multithreaded clustered microarchitecture with dynamic back-end assignment |
US7594234B1 (en) | 2004-06-04 | 2009-09-22 | Sun Microsystems, Inc. | Adaptive spin-then-block mutual exclusion in multi-threaded processing |
US7284092B2 (en) | 2004-06-24 | 2007-10-16 | International Business Machines Corporation | Digital data processing apparatus having multi-level register file |
US20050289530A1 (en) | 2004-06-29 | 2005-12-29 | Robison Arch D | Scheduling of instructions in program compilation |
EP1628235A1 (en) | 2004-07-01 | 2006-02-22 | Texas Instruments Incorporated | Method and system of ensuring integrity of a secure mode entry sequence |
US8044951B1 (en) | 2004-07-02 | 2011-10-25 | Nvidia Corporation | Integer-based functionality in a graphics shading language |
US7339592B2 (en) | 2004-07-13 | 2008-03-04 | Nvidia Corporation | Simulating multiported memories using lower port count memories |
US7398347B1 (en) | 2004-07-14 | 2008-07-08 | Altera Corporation | Methods and apparatus for dynamic instruction controlled reconfigurable register file |
EP1619593A1 (en) | 2004-07-22 | 2006-01-25 | Sap Ag | Computer-Implemented method and system for performing a product availability check |
JP4064380B2 (ja) | 2004-07-29 | 2008-03-19 | Fujitsu Limited | Arithmetic processing apparatus and control method therefor |
US8443171B2 (en) | 2004-07-30 | 2013-05-14 | Hewlett-Packard Development Company, L.P. | Run-time updating of prediction hint instructions |
US7213106B1 (en) | 2004-08-09 | 2007-05-01 | Sun Microsystems, Inc. | Conservative shadow cache support in a point-to-point connected multiprocessing node |
US7318143B2 (en) | 2004-10-20 | 2008-01-08 | Arm Limited | Reuseable configuration data |
US20090150890A1 (en) * | 2007-12-10 | 2009-06-11 | Yourst Matt T | Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system |
US7707578B1 (en) | 2004-12-16 | 2010-04-27 | Vmware, Inc. | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US7257695B2 (en) | 2004-12-28 | 2007-08-14 | Intel Corporation | Register file regions for a processing system |
US7996644B2 (en) | 2004-12-29 | 2011-08-09 | Intel Corporation | Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache |
US8719819B2 (en) | 2005-06-30 | 2014-05-06 | Intel Corporation | Mechanism for instruction set based thread execution on a plurality of instruction sequencers |
US7050922B1 (en) | 2005-01-14 | 2006-05-23 | Agilent Technologies, Inc. | Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same |
US7681014B2 (en) | 2005-02-04 | 2010-03-16 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
US7657891B2 (en) | 2005-02-04 | 2010-02-02 | Mips Technologies, Inc. | Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency |
JP2008530642A (ja) | 2005-02-07 | 2008-08-07 | PACT XPP Technologies AG | Low-latency massively parallel data processing device |
US7400548B2 (en) | 2005-02-09 | 2008-07-15 | International Business Machines Corporation | Method for providing multiple reads/writes using a 2read/2write register file array |
US7343476B2 (en) | 2005-02-10 | 2008-03-11 | International Business Machines Corporation | Intelligent SMT thread hang detect taking into account shared resource contention/blocking |
US7152155B2 (en) | 2005-02-18 | 2006-12-19 | Qualcomm Incorporated | System and method of correcting a branch misprediction |
US20060200655A1 (en) | 2005-03-04 | 2006-09-07 | Smith Rodney W | Forward looking branch target address caching |
US20060212853A1 (en) | 2005-03-18 | 2006-09-21 | Marvell World Trade Ltd. | Real-time control apparatus having a multi-thread processor |
US8195922B2 (en) | 2005-03-18 | 2012-06-05 | Marvell World Trade, Ltd. | System for dynamically allocating processing time to multiple threads |
GB2424727B (en) | 2005-03-30 | 2007-08-01 | Transitive Ltd | Preparing instruction groups for a processor having multiple issue ports |
US8522253B1 (en) | 2005-03-31 | 2013-08-27 | Guillermo Rozas | Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches |
US7313775B2 (en) | 2005-04-06 | 2007-12-25 | Lsi Corporation | Integrated circuit with relocatable processor hardmac |
US20060230243A1 (en) | 2005-04-06 | 2006-10-12 | Robert Cochran | Cascaded snapshots |
US8230423B2 (en) | 2005-04-07 | 2012-07-24 | International Business Machines Corporation | Multithreaded processor architecture with operational latency hiding |
US20060230409A1 (en) | 2005-04-07 | 2006-10-12 | Matteo Frigo | Multithreaded processor architecture with implicit granularity adaptation |
US20060230253A1 (en) | 2005-04-11 | 2006-10-12 | Lucian Codrescu | Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment |
US20060236074A1 (en) | 2005-04-14 | 2006-10-19 | Arm Limited | Indicating storage locations within caches |
US7437543B2 (en) | 2005-04-19 | 2008-10-14 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US7461237B2 (en) | 2005-04-20 | 2008-12-02 | Sun Microsystems, Inc. | Method and apparatus for suppressing duplicative prefetches for branch target cache lines |
US8713286B2 (en) | 2005-04-26 | 2014-04-29 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
GB2426084A (en) | 2005-05-13 | 2006-11-15 | Agilent Technologies Inc | Updating data in a dual port memory |
US7861055B2 (en) | 2005-06-07 | 2010-12-28 | Broadcom Corporation | Method and system for on-chip configurable data ram for fast memory and pseudo associative caches |
US8010969B2 (en) | 2005-06-13 | 2011-08-30 | Intel Corporation | Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers |
GB2444455A (en) * | 2005-08-29 | 2008-06-04 | Searete Llc | Scheduling mechanism of a hierarchical processor including multiple parallel clusters |
CN101263465B (zh) | 2005-09-14 | 2011-11-09 | Koninklijke Philips Electronics N.V. | Method and system for bus arbitration |
US7350056B2 (en) | 2005-09-27 | 2008-03-25 | International Business Machines Corporation | Method and apparatus for issuing instructions from an issue queue in an information handling system |
US7606975B1 (en) | 2005-09-28 | 2009-10-20 | Sun Microsystems, Inc. | Trace cache for efficient self-modifying code processing |
US7231106B2 (en) | 2005-09-30 | 2007-06-12 | Lucent Technologies Inc. | Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host |
US7627735B2 (en) | 2005-10-21 | 2009-12-01 | Intel Corporation | Implementing vector memory operations |
US7613131B2 (en) | 2005-11-10 | 2009-11-03 | Citrix Systems, Inc. | Overlay network infrastructure |
US7681019B1 (en) | 2005-11-18 | 2010-03-16 | Sun Microsystems, Inc. | Executing functions determined via a collection of operations from translated instructions |
US7861060B1 (en) | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US7634637B1 (en) | 2005-12-16 | 2009-12-15 | Nvidia Corporation | Execution of parallel groups of threads with per-instruction serialization |
US7770161B2 (en) | 2005-12-28 | 2010-08-03 | International Business Machines Corporation | Post-register allocation profile directed instruction scheduling |
US8423682B2 (en) | 2005-12-30 | 2013-04-16 | Intel Corporation | Address space emulation |
US20070186050A1 (en) | 2006-02-03 | 2007-08-09 | International Business Machines Corporation | Self prefetching L2 cache mechanism for data lines |
GB2435362B (en) | 2006-02-20 | 2008-11-26 | Cramer Systems Ltd | Method of configuring devices in a telecommunications network |
JP4332205B2 (ja) | 2006-02-27 | 2009-09-16 | Fujitsu Limited | Cache control device and cache control method |
US7543282B2 (en) | 2006-03-24 | 2009-06-02 | Sun Microsystems, Inc. | Method and apparatus for selectively executing different executable code versions which are optimized in different ways |
US8327115B2 (en) | 2006-04-12 | 2012-12-04 | Soft Machines, Inc. | Plural matrices of execution units for processing matrices of row dependent instructions in single clock cycle in super or separate mode |
US7610571B2 (en) | 2006-04-14 | 2009-10-27 | Cadence Design Systems, Inc. | Method and system for simulating state retention of an RTL design |
US7577820B1 (en) | 2006-04-14 | 2009-08-18 | Tilera Corporation | Managing data in a parallel processing environment |
CN100485636C (zh) | 2006-04-24 | 2009-05-06 | Huawei Technologies Co., Ltd. | Debugging method and apparatus for model-driven development of carrier-grade services |
US7804076B2 (en) | 2006-05-10 | 2010-09-28 | Taiwan Semiconductor Manufacturing Co., Ltd | Insulator for high current ion implanters |
US8145882B1 (en) | 2006-05-25 | 2012-03-27 | Mips Technologies, Inc. | Apparatus and method for processing template based user defined instructions |
US20080126771A1 (en) | 2006-07-25 | 2008-05-29 | Lei Chen | Branch Target Extension for an Instruction Cache |
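CN100495324C (zh) | 2006-07-27 | 2009-06-03 | Institute of Computing Technology, Chinese Academy of Sciences | Depth-first exception handling method in a complex instruction set architecture |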
US7904704B2 (en) | 2006-08-14 | 2011-03-08 | Marvell World Trade Ltd. | Instruction dispatching method and apparatus |
US8046775B2 (en) | 2006-08-14 | 2011-10-25 | Marvell World Trade Ltd. | Event-based bandwidth allocation mode switching method and apparatus |
US7539842B2 (en) | 2006-08-15 | 2009-05-26 | International Business Machines Corporation | Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables |
US7594060B2 (en) | 2006-08-23 | 2009-09-22 | Sun Microsystems, Inc. | Data buffer allocation in a non-blocking data services platform using input/output switching fabric |
US7657494B2 (en) * | 2006-09-20 | 2010-02-02 | Chevron U.S.A. Inc. | Method for forecasting the production of a petroleum reservoir utilizing genetic programming |
US7752474B2 (en) | 2006-09-22 | 2010-07-06 | Apple Inc. | L1 cache flush when processor is entering low power mode |
US7716460B2 (en) | 2006-09-29 | 2010-05-11 | Qualcomm Incorporated | Effective use of a BHT in processor having variable length instruction set execution modes |
US7774549B2 (en) | 2006-10-11 | 2010-08-10 | Mips Technologies, Inc. | Horizontally-shared cache victims in multiple core processors |
TWI337495B (en) | 2006-10-26 | 2011-02-11 | Au Optronics Corp | System and method for operation scheduling |
US7680988B1 (en) | 2006-10-30 | 2010-03-16 | Nvidia Corporation | Single interconnect providing read and write access to a memory shared by concurrent threads |
US8108625B1 (en) | 2006-10-30 | 2012-01-31 | Nvidia Corporation | Shared memory with parallel access and access conflict resolution mechanism |
US7617384B1 (en) | 2006-11-06 | 2009-11-10 | Nvidia Corporation | Structured programming control flow using a disable mask in a SIMD architecture |
EP2527972A3 (en) | 2006-11-14 | 2014-08-06 | Soft Machines, Inc. | Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes |
US7493475B2 (en) | 2006-11-15 | 2009-02-17 | Stmicroelectronics, Inc. | Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address |
US7934179B2 (en) | 2006-11-20 | 2011-04-26 | Et International, Inc. | Systems and methods for logic verification |
US20080235500A1 (en) | 2006-11-21 | 2008-09-25 | Davis Gordon T | Structure for instruction cache trace formation |
JP2008130056A (ja) | 2006-11-27 | 2008-06-05 | Renesas Technology Corp | Semiconductor circuit |
US20080148020A1 (en) * | 2006-12-13 | 2008-06-19 | Luick David A | Low Cost Persistent Instruction Predecoded Issue and Dispatcher |
US7945763B2 (en) * | 2006-12-13 | 2011-05-17 | International Business Machines Corporation | Single shared instruction predecoder for supporting multiple processors |
US7783869B2 (en) | 2006-12-19 | 2010-08-24 | Arm Limited | Accessing branch predictions ahead of instruction fetching |
WO2008077088A2 (en) | 2006-12-19 | 2008-06-26 | The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | System and method for branch misprediction prediction using complementary branch predictors |
EP1940028B1 (en) | 2006-12-29 | 2012-02-29 | STMicroelectronics Srl | Asynchronous interconnection system for 3D inter-chip communication |
US8321849B2 (en) | 2007-01-26 | 2012-11-27 | Nvidia Corporation | Virtual architecture and instruction set for parallel thread computing |
TW200833002A (en) | 2007-01-31 | 2008-08-01 | Univ Nat Yunlin Sci & Tech | Distributed switching circuit having fairness |
US20080189501A1 (en) | 2007-02-05 | 2008-08-07 | Irish John D | Methods and Apparatus for Issuing Commands on a Bus |
US7685410B2 (en) | 2007-02-13 | 2010-03-23 | Global Foundries Inc. | Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects |
US7647483B2 (en) | 2007-02-20 | 2010-01-12 | Sony Computer Entertainment Inc. | Multi-threaded parallel processor methods and apparatus |
JP4980751B2 (ja) | 2007-03-02 | 2012-07-18 | Fujitsu Semiconductor Limited | Data processing apparatus and memory read-active control method |
US8452907B2 (en) | 2007-03-27 | 2013-05-28 | Arm Limited | Data processing apparatus and method for arbitrating access to a shared resource |
US20080250227A1 (en) | 2007-04-04 | 2008-10-09 | Linderman Michael D | General Purpose Multiprocessor Programming Apparatus And Method |
US7716183B2 (en) | 2007-04-11 | 2010-05-11 | Dot Hill Systems Corporation | Snapshot preserved data cloning |
US7941791B2 (en) | 2007-04-13 | 2011-05-10 | Perry Wang | Programming environment for heterogeneous processor resource integration |
US7769955B2 (en) | 2007-04-27 | 2010-08-03 | Arm Limited | Multiple thread instruction fetch from different cache levels |
US7711935B2 (en) | 2007-04-30 | 2010-05-04 | Netlogic Microsystems, Inc. | Universal branch identifier for invalidation of speculative instructions |
US8555039B2 (en) | 2007-05-03 | 2013-10-08 | Qualcomm Incorporated | System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor |
US8219996B1 (en) | 2007-05-09 | 2012-07-10 | Hewlett-Packard Development Company, L.P. | Computer processor with fairness monitor |
CN101344840B (zh) | 2007-07-10 | 2011-08-31 | 苏州简约纳电子有限公司 | Microprocessor and method for executing instructions in a microprocessor |
US7937568B2 (en) | 2007-07-11 | 2011-05-03 | International Business Machines Corporation | Adaptive execution cycle control method for enhanced instruction throughput |
US20090025004A1 (en) | 2007-07-16 | 2009-01-22 | Microsoft Corporation | Scheduling by Growing and Shrinking Resource Allocation |
US8108545B2 (en) | 2007-08-27 | 2012-01-31 | International Business Machines Corporation | Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture |
US7711929B2 (en) | 2007-08-30 | 2010-05-04 | International Business Machines Corporation | Method and system for tracking instruction dependency in an out-of-order processor |
US8725991B2 (en) | 2007-09-12 | 2014-05-13 | Qualcomm Incorporated | Register file system and method for pipelined processing |
US8082420B2 (en) | 2007-10-24 | 2011-12-20 | International Business Machines Corporation | Method and apparatus for executing instructions |
US7856530B1 (en) | 2007-10-31 | 2010-12-21 | Network Appliance, Inc. | System and method for implementing a dynamic cache for a data storage system |
US7877559B2 (en) | 2007-11-26 | 2011-01-25 | Globalfoundries Inc. | Mechanism to accelerate removal of store operations from a queue |
US8245232B2 (en) | 2007-11-27 | 2012-08-14 | Microsoft Corporation | Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems |
US7809925B2 (en) | 2007-12-07 | 2010-10-05 | International Business Machines Corporation | Processing unit incorporating vectorizable execution unit |
US8145844B2 (en) | 2007-12-13 | 2012-03-27 | Arm Limited | Memory controller with write data cache and read data cache |
US7870371B2 (en) | 2007-12-17 | 2011-01-11 | Microsoft Corporation | Target-frequency based indirect jump prediction for high-performance processors |
US7831813B2 (en) | 2007-12-17 | 2010-11-09 | Globalfoundries Inc. | Uses of known good code for implementing processor architectural modifications |
US20090165007A1 (en) | 2007-12-19 | 2009-06-25 | Microsoft Corporation | Task-level thread scheduling and resource allocation |
US8782384B2 (en) | 2007-12-20 | 2014-07-15 | Advanced Micro Devices, Inc. | Branch history with polymorphic indirect branch information |
US7917699B2 (en) | 2007-12-21 | 2011-03-29 | Mips Technologies, Inc. | Apparatus and method for controlling the exclusivity mode of a level-two cache |
US9244855B2 (en) | 2007-12-31 | 2016-01-26 | Intel Corporation | Method, system, and apparatus for page sizing extension |
US8645965B2 (en) | 2007-12-31 | 2014-02-04 | Intel Corporation | Supporting metered clients with manycore through time-limited partitioning |
CN101217495A (zh) * | 2008-01-11 | 2008-07-09 | Beijing University of Posts and Telecommunications | Traffic monitoring method and apparatus for a T-MPLS network environment |
US7877582B2 (en) | 2008-01-31 | 2011-01-25 | International Business Machines Corporation | Multi-addressable register file |
WO2009101563A1 (en) | 2008-02-11 | 2009-08-20 | Nxp B.V. | Multiprocessing implementing a plurality of virtual processors |
US9021240B2 (en) | 2008-02-22 | 2015-04-28 | International Business Machines Corporation | System and method for controlling restarting of instruction fetching using speculative address computations |
US7949972B2 (en) | 2008-03-19 | 2011-05-24 | International Business Machines Corporation | Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis |
US7987343B2 (en) | 2008-03-19 | 2011-07-26 | International Business Machines Corporation | Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass |
US9513905B2 (en) | 2008-03-28 | 2016-12-06 | Intel Corporation | Vector instructions to enable efficient synchronization and parallel reduction operations |
US8120608B2 (en) | 2008-04-04 | 2012-02-21 | Via Technologies, Inc. | Constant buffering for a computational core of a programmable graphics processing unit |
CR20170001A (es) * | 2008-04-28 | 2017-08-10 | Genentech Inc | Humanized anti-factor D antibodies |
TWI364703B (en) | 2008-05-26 | 2012-05-21 | Faraday Tech Corp | Processor and early execution method of data load thereof |
US8131982B2 (en) | 2008-06-13 | 2012-03-06 | International Business Machines Corporation | Branch prediction instructions having mask values involving unloading and loading branch history data |
US8145880B1 (en) | 2008-07-07 | 2012-03-27 | Ovics | Matrix processor data switch routing systems and methods |
JP5733860B2 (ja) | 2008-07-10 | 2015-06-10 | Rocketick Technologies Ltd. | Efficient parallel computation of dependency problems |
JP2010039536A (ja) | 2008-07-31 | 2010-02-18 | Panasonic Corp | Program conversion device, program conversion method, and program conversion program |
US8316435B1 (en) | 2008-08-14 | 2012-11-20 | Juniper Networks, Inc. | Routing device having integrated MPLS-aware firewall with virtual security system support |
US8135942B2 (en) | 2008-08-28 | 2012-03-13 | International Business Machines Corporation | System and method for double-issue instructions using a dependency matrix and a side issue queue |
US7769984B2 (en) | 2008-09-11 | 2010-08-03 | International Business Machines Corporation | Dual-issuance of microprocessor instructions using dual dependency matrices |
US8225048B2 (en) | 2008-10-01 | 2012-07-17 | Hewlett-Packard Development Company, L.P. | Systems and methods for resource access |
US9244732B2 (en) | 2009-08-28 | 2016-01-26 | Vmware, Inc. | Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution |
US7941616B2 (en) | 2008-10-21 | 2011-05-10 | Microsoft Corporation | System to reduce interference in concurrent programs |
US8423749B2 (en) | 2008-10-22 | 2013-04-16 | International Business Machines Corporation | Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node |
GB2464703A (en) | 2008-10-22 | 2010-04-28 | Advanced Risc Mach Ltd | An array of interconnected processors executing a cycle-based program |
KR101374452B1 (ko) | 2008-10-30 | 2014-03-17 | Nokia Corporation | Method and apparatus for interleaving data blocks |
US8032678B2 (en) | 2008-11-05 | 2011-10-04 | Mediatek Inc. | Shared resource arbitration |
US7848129B1 (en) | 2008-11-20 | 2010-12-07 | Netlogic Microsystems, Inc. | Dynamically partitioned CAM array |
US8868838B1 (en) | 2008-11-21 | 2014-10-21 | Nvidia Corporation | Multi-class data cache policies |
US8171223B2 (en) | 2008-12-03 | 2012-05-01 | Intel Corporation | Method and system to increase concurrency and control replication in a multi-core cache hierarchy |
US8200949B1 (en) | 2008-12-09 | 2012-06-12 | Nvidia Corporation | Policy based allocation of register file cache to threads in multi-threaded processor |
US8312268B2 (en) | 2008-12-12 | 2012-11-13 | International Business Machines Corporation | Virtual machine |
US8099586B2 (en) | 2008-12-30 | 2012-01-17 | Oracle America, Inc. | Branch misprediction recovery mechanism for microprocessors |
US20100169578A1 (en) | 2008-12-31 | 2010-07-01 | Texas Instruments Incorporated | Cache tag memory |
US20100205603A1 (en) | 2009-02-09 | 2010-08-12 | Unisys Corporation | Scheduling and dispatching tasks in an emulated operating system |
JP5417879B2 (ja) | 2009-02-17 | 2014-02-19 | Fujitsu Semiconductor Limited | Cache device |
US8505013B2 (en) | 2010-03-12 | 2013-08-06 | Lsi Corporation | Reducing data read latency in a network communications processor architecture |
US8805788B2 (en) | 2009-05-04 | 2014-08-12 | Moka5, Inc. | Transactional virtual disk with differential snapshots |
US8332854B2 (en) | 2009-05-19 | 2012-12-11 | Microsoft Corporation | Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups |
US8533437B2 (en) | 2009-06-01 | 2013-09-10 | Via Technologies, Inc. | Guaranteed prefetch instruction |
GB2471067B (en) | 2009-06-12 | 2011-11-30 | Graeme Roy Smith | Shared resource multi-thread array processor |
US9122487B2 (en) | 2009-06-23 | 2015-09-01 | Oracle America, Inc. | System and method for balancing instruction loads between multiple execution units using assignment history |
US8386754B2 (en) | 2009-06-24 | 2013-02-26 | Arm Limited | Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism |
CN101582025B (zh) | 2009-06-25 | 2011-05-25 | 浙江大学 | Method for implementing a global register renaming table under a chip multiprocessor architecture |
US8397049B2 (en) | 2009-07-13 | 2013-03-12 | Apple Inc. | TLB prefetching |
US8539486B2 (en) | 2009-07-17 | 2013-09-17 | International Business Machines Corporation | Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode |
JP5423217B2 (ja) | 2009-08-04 | 2014-02-19 | 富士通株式会社 | Arithmetic processing device, information processing device, and control method of arithmetic processing device |
US8127078B2 (en) | 2009-10-02 | 2012-02-28 | International Business Machines Corporation | High performance unaligned cache access |
US20110082983A1 (en) | 2009-10-06 | 2011-04-07 | Alcatel-Lucent Canada, Inc. | Cpu instruction and data cache corruption prevention system |
US8695002B2 (en) | 2009-10-20 | 2014-04-08 | Lantiq Deutschland Gmbh | Multi-threaded processors and multi-processor systems comprising shared resources |
US8364933B2 (en) | 2009-12-18 | 2013-01-29 | International Business Machines Corporation | Software assisted translation lookaside buffer search mechanism |
JP2011150397A (ja) | 2010-01-19 | 2011-08-04 | Panasonic Corp | Bus arbitration device |
KR101699910B1 (ko) | 2010-03-04 | 2017-01-26 | 삼성전자주식회사 | Reconfigurable processor and control method thereof |
US20120005462A1 (en) | 2010-07-01 | 2012-01-05 | International Business Machines Corporation | Hardware Assist for Optimizing Code During Processing |
US8312258B2 (en) | 2010-07-22 | 2012-11-13 | Intel Corporation | Providing platform independent memory logic |
CN101916180B (zh) | 2010-08-11 | 2013-05-29 | 中国科学院计算技术研究所 | Method and system for executing register-type instructions in a RISC processor |
US8751745B2 (en) | 2010-08-11 | 2014-06-10 | Advanced Micro Devices, Inc. | Method for concurrent flush of L1 and L2 caches |
US8756329B2 (en) | 2010-09-15 | 2014-06-17 | Oracle International Corporation | System and method for parallel multiplexing between servers in a cluster |
US9201801B2 (en) | 2010-09-15 | 2015-12-01 | International Business Machines Corporation | Computing device with asynchronous auxiliary execution unit |
KR101685247B1 (ko) | 2010-09-17 | 2016-12-09 | 소프트 머신즈, 인크. | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
US20120079212A1 (en) | 2010-09-23 | 2012-03-29 | International Business Machines Corporation | Architecture for sharing caches among multiple processes |
US9733944B2 (en) | 2010-10-12 | 2017-08-15 | Intel Corporation | Instruction sequence buffer to store branches having reliably predictable instruction sequences |
US9678755B2 (en) | 2010-10-12 | 2017-06-13 | Intel Corporation | Instruction sequence buffer to enhance branch prediction efficiency |
US8370553B2 (en) | 2010-10-18 | 2013-02-05 | International Business Machines Corporation | Formal verification of random priority-based arbiters using property strengthening and underapproximations |
US9047178B2 (en) | 2010-12-13 | 2015-06-02 | SanDisk Technologies, Inc. | Auto-commit memory synchronization |
US8677355B2 (en) | 2010-12-17 | 2014-03-18 | Microsoft Corporation | Virtual machine branching and parallel execution |
WO2012103245A2 (en) | 2011-01-27 | 2012-08-02 | Soft Machines Inc. | Guest instruction block with near branching and far branching sequence construction to native instruction block |
EP2689326B1 (en) | 2011-03-25 | 2022-11-16 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
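CN108376097B (zh) | 2011-03-25 | 2022-04-15 | 英特尔公司 | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103547993B (zh) | 2011-03-25 | 2018-06-26 | 英特尔公司 | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |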
US20120254592A1 (en) | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location |
US9740494B2 (en) | 2011-04-29 | 2017-08-22 | Arizona Board Of Regents For And On Behalf Of Arizona State University | Low complexity out-of-order issue logic using static circuits |
US8843690B2 (en) | 2011-07-11 | 2014-09-23 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Memory conflicts learning capability |
US8930432B2 (en) | 2011-08-04 | 2015-01-06 | International Business Machines Corporation | Floating point execution unit with fixed point functionality |
US20130046934A1 (en) | 2011-08-15 | 2013-02-21 | Robert Nychka | System caching using heterogenous memories |
US8839025B2 (en) | 2011-09-30 | 2014-09-16 | Oracle International Corporation | Systems and methods for retiring and unretiring cache lines |
WO2013077876A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | A microprocessor accelerated code optimizer |
EP2783280B1 (en) | 2011-11-22 | 2019-09-11 | Intel Corporation | An accelerated code optimizer for a multiengine microprocessor |
KR101648278B1 (ko) | 2011-11-22 | 2016-08-12 | 소프트 머신즈, 인크. | A microprocessor accelerated code optimizer and dependency reordering method |
US20130138888A1 (en) | 2011-11-30 | 2013-05-30 | Jama I. Barreh | Storing a target address of a control transfer instruction in an instruction field |
US8930674B2 (en) | 2012-03-07 | 2015-01-06 | Soft Machines, Inc. | Systems and methods for accessing a unified translation lookaside buffer |
KR20130119285A (ko) | 2012-04-23 | 2013-10-31 | 한국전자통신연구원 | Apparatus and method for resource allocation in a cluster computing environment |
US9684601B2 (en) | 2012-05-10 | 2017-06-20 | Arm Limited | Data processing apparatus having cache and translation lookaside buffer |
US9996348B2 (en) | 2012-06-14 | 2018-06-12 | Apple Inc. | Zero cycle load |
US9940247B2 (en) | 2012-06-26 | 2018-04-10 | Advanced Micro Devices, Inc. | Concurrent access to cache dirty bits |
US9710399B2 (en) | 2012-07-30 | 2017-07-18 | Intel Corporation | Systems and methods for flushing a cache with modified data |
US9916253B2 (en) | 2012-07-30 | 2018-03-13 | Intel Corporation | Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput |
US9430410B2 (en) | 2012-07-30 | 2016-08-30 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load accesses of a cache in a single cycle |
US9229873B2 (en) | 2012-07-30 | 2016-01-05 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load and store accesses of a cache |
US9740612B2 (en) | 2012-07-30 | 2017-08-22 | Intel Corporation | Systems and methods for maintaining the coherency of a store coalescing cache and a load cache |
US9678882B2 (en) | 2012-10-11 | 2017-06-13 | Intel Corporation | Systems and methods for non-blocking implementation of cache flush instructions |
US10037228B2 (en) | 2012-10-25 | 2018-07-31 | Nvidia Corporation | Efficient memory virtualization in multi-threaded processing units |
US9195506B2 (en) | 2012-12-21 | 2015-11-24 | International Business Machines Corporation | Processor provisioning by a middleware processing system for a plurality of logical processor partitions |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US9632825B2 (en) | 2013-03-15 | 2017-04-25 | Intel Corporation | Method and apparatus for efficient scheduling for asymmetrical execution units |
EP2972845B1 (en) | 2013-03-15 | 2021-07-07 | Intel Corporation | A method for executing multithreaded instructions grouped onto blocks |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
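KR101800948B1 (ko) | 2013-03-15 | 2017-11-23 | 인텔 코포레이션 | A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates |
CN105247484B (zh) | 2013-03-15 | 2021-02-23 | 英特尔公司 | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |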
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
- 2011
  - 2011-11-22 WO PCT/US2011/061957 (WO2013077876A1): active, Application Filing
  - 2011-11-22 US US14/360,282 (US20150039859A1): not active, Abandoned
  - 2011-11-22 EP EP11876314.3A (EP2783281B1): active, Active
  - 2011-11-22 KR KR1020147016763A (KR101703400B1): not active, Application Discontinuation
  - 2011-11-22 CN CN201180076248.0A (CN104040491B): active, Active
  - 2011-11-22 KR KR1020177002379A (KR101832679B1): active, IP Right Grant
  - 2011-11-22 CN CN201810449173.XA (CN108427574B): active, Active
- 2016
  - 2016-10-03 US US15/283,836 (US10521239B2): not active, Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
KR20170016012A (ko) | 2017-02-10 |
WO2013077876A1 (en) | 2013-05-30 |
KR101703400B1 (ko) | 2017-02-06 |
US20170024219A1 (en) | 2017-01-26 |
KR20140093721A (ko) | 2014-07-28 |
CN104040491B (zh) | 2018-06-12 |
EP2783281A1 (en) | 2014-10-01 |
EP2783281A4 (en) | 2016-07-13 |
CN108427574A (zh) | 2018-08-21 |
KR101832679B1 (ko) | 2018-02-26 |
US20150039859A1 (en) | 2015-02-05 |
CN104040491A (zh) | 2014-09-10 |
US10521239B2 (en) | 2019-12-31 |
EP2783281B1 (en) | 2020-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427574B (zh) | Microprocessor accelerated code optimizer | |
EP2783280B1 (en) | An accelerated code optimizer for a multiengine microprocessor | |
EP2783282B1 (en) | A microprocessor accelerated code optimizer and dependency reordering method | |
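KR101966712B1 (ko) | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines | |
KR101620676B1 (ko) | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines | |
KR101638225B1 (ko) | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines | |
TWI610224B (zh) | A microprocessor accelerated code optimizer | |
TWI512613B (zh) | An accelerated code optimizer for a multiengine microprocessor | |
TWI506548B (zh) | A microprocessor accelerated code optimizer and dependency reordering method | |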
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||