CN101449256A - Apparatus and method for processing an instruction matrix specifying parallel and dependent operations - Google Patents

Apparatus and method for processing an instruction matrix specifying parallel and dependent operations

Info

Publication number: CN101449256A
Authority: CN (China)
Prior art keywords: instruction, matrix, unit, section, time
Legal status: Granted, active (the legal status is Google's assumption and is not a legal conclusion)
Application number: CNA2007800173174A
Other languages: English (en)
Other versions: CN101449256B (zh)
Inventor: M. A. Abdallah (M·A·阿卜杜拉)
Current assignee: Intel Corp (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Soft Machines Inc
Application filed by: Soft Machines Inc
Priority to: CN201310589048.6A (CN103646009B)
Publication of CN101449256A: CN101449256A
Publication of CN101449256B (application granted): CN101449256B

Abstract

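A matrix of execution blocks forms multiple rows and columns. The rows support parallel execution of instructions, and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions that specifies parallel and dependent instructions.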

Description

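Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
Technical Field
[0001] The present invention relates generally to computer architecture. More particularly, the invention relates to a computer architecture for processing matrix instructions that specify parallel and dependent operations.
Background
[0002] Improving computer architecture performance is a difficult task. Improvements have been sought through frequency scaling, Single Instruction Multiple Data (SIMD), Very Long Instruction Word (VLIW), multi-threading and multi-processor techniques. These approaches mainly target the throughput of program execution. Many of them require software to expose parallelism explicitly. In contrast, frequency scaling improves both throughput and latency without requiring software to mark parallelism explicitly. Recently, frequency scaling has hit a power wall, so further improvement through frequency scaling is difficult. As a result, it is hard to increase throughput unless massive explicit software parallelization is expressed.
[0003] In view of the foregoing, it is desirable to improve computer architecture performance without relying on frequency scaling or on massive explicit software parallelization.
Summary of the Invention
[0004] A matrix of execution blocks forms multiple rows and columns. The rows support parallel execution of instructions, and the columns support execution of dependent instructions. The matrix of execution blocks processes a single instruction matrix that specifies parallel and dependent instructions.
Brief Description of the Drawings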
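[0005] The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
[0006] FIG. 1 illustrates an architecture that supports the execution of parallel and dependent instructions in accordance with an embodiment of the invention;
[0007] FIG. 2A illustrates the mapping of serial instructions to produce parallel and dependent operations in an execution matrix of the invention;
[0008] FIG. 2B illustrates the mapping of operations onto an execution matrix of the invention;
[0009] FIG. 3 illustrates a row of execution blocks that may be used in accordance with an embodiment of the invention;
[0010] FIG. 4A illustrates execution blocks and supporting register files and buffers that may be used in accordance with an embodiment of the invention;
[0011] FIG. 4B illustrates a register file configured in accordance with an embodiment of the invention;
[0012] FIG. 5 illustrates various data structures that may be used with a buffer of the invention;
[0013] FIG. 6 illustrates a pipeline that may be used in accordance with an embodiment of the invention;
[0014] FIG. 7 illustrates matrix instruction processing in accordance with an embodiment of the invention;
[0015] FIG. 8 illustrates the mapping of operations onto an execution block of the invention;
[0016] FIG. 9 illustrates a matrix buffer used in accordance with an embodiment of the invention;
[0017] FIG. 10A illustrates a universal unit that implements operations of the invention;
[0018] FIG. 10B illustrates a one-bit cell of the carry-lookahead adder used in the 4-bit adder of FIG. 10A;
[0019] FIG. 11 illustrates a Time-Lag Sliced Architecture (TLSA) unit configured in accordance with an embodiment of the invention;
[0020] FIG. 12 illustrates multiple TLSA units in a staggered configuration in accordance with an embodiment of the invention;
[0021] FIG. 13 illustrates a prior-art multiplication technique;
[0022] FIG. 14 illustrates a time-lag multiplication technique used in accordance with an embodiment of the invention;
[0023] FIG. 15 illustrates prior-art Booth-encoded multiplication;
[0024] FIG. 16 illustrates logic that implements the Booth-encoded multiplication of FIG. 15;
[0025] FIG. 17 illustrates Booth-encoded multiplication in accordance with an embodiment of the invention;
[0026] FIG. 18 illustrates logic that implements the Booth-encoded multiplication of FIG. 17;
[0027] FIG. 19 illustrates a memory/register file data block configured in accordance with an embodiment of the invention;
[0028] FIG. 20 illustrates a time-sliced memory configured in accordance with an embodiment of the invention;
[0029] FIG. 21 illustrates a TLSA architecture with a permute/shift unit configured in accordance with an embodiment of the invention;
[0030] FIG. 22 illustrates a floating-point multiply-accumulate unit configured in accordance with an embodiment of the invention.
Like reference numerals refer to corresponding parts throughout the several views of the drawings.
Detailed Description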
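[0031] It is highly beneficial to allow software that is not explicitly parallel to execute on single-processor hardware that can process a large number of instructions in a short-latency pipeline. The prevailing view of current solutions is that a superscalar processor can practically process 4-5 instructions per cycle at peak instants, which is similar to what a good VLIW compiler achieves at peak instants. It is widely recognized that a superscalar processor that schedules 10 instructions in a single cycle/instant is not practically achievable. Similarly, if a compiler tries to parallelize general programs for a VLIW architecture with 10 slots, a large number of slots are left unfilled.
[0032] The architecture and processor provided by the invention fetch, schedule, decode and execute many instructions with a high degree of parallelism in a short pipeline. This greatly improves execution throughput while maintaining an optimized latency for single-processor pipeline operation, with efficient timing, power, area and routing.
[0033] In this new architecture, called the Ultra Large Instruction Matrix (ULIM), a fixed-size instruction matrix template is scheduled for execution in hardware as a matrix of parallel and dependent instruction groups. Unlike VLIW, which uses a very long instruction width template to schedule only parallel instructions, the ULIM template encapsulates groups of both parallel and dependent instructions. It can be viewed as a two-dimensional template matrix in which parallel instructions are allocated horizontally while dependent instructions are allocated only vertically. The template is sent to the execution hardware as one instruction matrix, and each execution block processes one instruction matrix.
[0034] FIG. 1 illustrates such a template. If an instruction is placed in slot 101, another instruction that can execute in parallel with it may be placed in any of parallel slots 1 to N in the same row. However, an instruction that depends on the instruction in slot 101 cannot be placed in the same row; it must be placed in a subsequent row, at 102 (e.g., parallel slot 0 of serial slot 1). The slot labeled 102 receives as input the result produced by 101, or external input operands, or a combination of both. The execution model of this instruction template is that the instructions in any row execute before the instructions in the next row.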
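The placement rule of the template can be illustrated with a minimal sketch, which is not part of the patent disclosure. It assumes a hypothetical representation in which each slot is either empty or a pair (destination register, source registers), and it reports every instruction that reads a register written by another instruction in the same row. The function name `same_row_conflicts` and the register names are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical slot representation: (destination register, source registers),
# or None for an empty slot.
Slot = Optional[Tuple[str, Tuple[str, ...]]]


def same_row_conflicts(matrix: List[List[Slot]]) -> List[Tuple[int, int]]:
    """Return (row, column) positions of instructions that read a register
    written by a different instruction in the same row."""
    conflicts = []
    for r, row in enumerate(matrix):
        for c, slot in enumerate(row):
            if slot is None:
                continue
            _, sources = slot
            for c2, other in enumerate(row):
                if other is None or c2 == c:
                    continue
                if other[0] in sources:
                    # A dependent instruction must go in a later row instead.
                    conflicts.append((r, c))
                    break
    return conflicts


# Dependent instruction (writes R3) placed one row below its producers.
well_formed = [
    [("R1", ("R8", "R9")), ("R2", ("R8", "R10"))],
    [("R3", ("R1", "R2")), None],
]
# Same instruction wrongly placed next to one of its producers.
ill_formed = [
    [("R1", ("R8", "R9")), ("R3", ("R1", "R2"))],
]
```

Under these assumptions `same_row_conflicts(well_formed)` is empty, while the second matrix flags the slot at row 0, column 1.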
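[0035] ULIM instruction groups can be scheduled statically by a ULIM compiler or dynamically by a ULIM hardware composition unit. The significance of this architecture is apparent when it is compared with a traditional architecture. In a traditional architecture, a single row of N1 parallel instructions can be placed in a template such as a VLIW or SIMD template. This means that N1 instructions can execute each cycle (all of which must be guaranteed to be parallel, which is a very severe constraint for large N). The ULIM architecture, on the other hand, can execute N1*N2 instructions each cycle by pipelining execution at each row boundary, where N2 is the number of rows. Although it executes N1*N2 instructions each cycle, the ULIM architecture is subject only to the same constraint: N1 instructions must be guaranteed to execute in parallel.
[0036] The instruction matrix template can be constructed by the programmer or by a compiler/hardware formatter, and neither needs to be restricted to finding only parallel instructions to schedule in each cycle. Available parallel instructions can be extracted and placed horizontally in rows. The parallel instructions can be augmented by placing dependent instructions in subsequent rows. A matrix can execute in a single cycle or over multiple cycles. A matrix can localize the storage of operands near the execution hardware to optimize interconnect complexity, area, speed and power.
[0037] The invention is more fully appreciated with reference to the example of FIG. 2A. FIG. 2A illustrates an example of an instruction sequence 200 presented by a single serial program flow. The figure also shows how the instructions are grouped to form a single instruction matrix that uses both the serial and the parallel slots of the matrix template. The matrix template is applied to an execution block 201. The execution block 201 includes a multiplication unit 202, a floating-point add unit 204, a load unit 206, a multiply-accumulate (MAC) unit 208, a logic unit (e.g., an ALU) 210 and another logic unit 212.
[0038] Thus, an instruction matrix refers to an instruction group template of parallel and serial instructions. An execution block refers to a group of execution units that executes an instruction matrix. Execution units are the individual computing units within an execution block (e.g., both complex and simple units).
[0039] Straight arrows, such as 214, indicate dependencies between instructions. The mapping can be done by the compiler, by front-end hardware or by a run-time environment. FIG. 2A also depicts a formatted operation map showing the relative physical positions of the instructions in the matrix, as they will be executed at the corresponding hardware locations, with parallel and serial dependency indicators.
[0040] As shown in FIG. 2A, the single data stream program is reformulated into an instruction matrix that statically determines which instructions execute serially and which execute in parallel. For example, in the serial single program flow of FIG. 2A, the last instruction uses R3 as a source while the fourth instruction writes R3 as a result. This is mapped in the matrix template by placing the last instruction in the row immediately following the row that contains the fourth instruction. Line 214 illustrates this dependency.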
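The regrouping of a serial stream into rows can be sketched as follows. This is only an illustrative list-scheduling sketch under stated assumptions, not the formatter described in the patent: instructions are hypothetical (opcode, destination, sources) tuples, each instruction is placed in the first non-full row below the producers of its sources, register reuse is handled conservatively, and the number of rows is not capped even though a real ULIM template has a fixed size.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction form: (opcode, destination register, source registers).
Instr = Tuple[str, str, Tuple[str, ...]]


def build_matrix(program: List[Instr], width: int) -> List[List[Instr]]:
    """Group a serial instruction stream into rows of at most `width` slots."""
    rows: List[List[Instr]] = []
    last_write: Dict[str, int] = {}  # register -> row of its latest writer
    last_read: Dict[str, int] = {}   # register -> latest row that reads it
    for instr in program:
        _, dest, sources = instr
        earliest = 0
        for src in sources:
            # True dependence: go strictly below the producer of each source.
            if src in last_write:
                earliest = max(earliest, last_write[src] + 1)
        for table in (last_write, last_read):
            # Conservative handling of register reuse: go below earlier
            # writers and readers of the destination register as well.
            if dest in table:
                earliest = max(earliest, table[dest] + 1)
        row = earliest
        while row < len(rows) and len(rows[row]) >= width:
            row += 1  # row is full, try the next one
        while len(rows) <= row:
            rows.append([])
        rows[row].append(instr)
        last_write[dest] = row
        for src in sources:
            last_read[src] = max(last_read.get(src, -1), row)
    return rows


# Illustrative stream: the fourth instruction writes R3, the last one reads it.
program = [
    ("mul", "R1", ("R8", "R9")),
    ("add", "R2", ("R8", "R10")),
    ("load", "R4", ("R11",)),
    ("add", "R3", ("R1", "R2")),
    ("sub", "R5", ("R3", "R4")),
]
rows = build_matrix(program, width=3)
```

With a width of 3, the first three instructions share row 0, the instruction that writes R3 lands in row 1, and the final instruction that reads R3 lands in the row immediately after it.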
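[0041] FIG. 2B illustrates the encoded instructions and their sources and destinations for the template shown in FIG. 2A. The instruction matrix template includes the opcodes of the instructions and specifies the operand and result registers. Sources and destinations are kept separate from the opcodes. This simplifies the decode stage, the dependency resolution stage and the register/memory read stage.
[0042] As shown in FIG. 3, several instruction matrices can be issued simultaneously. FIG. 3 shows the instruction matrix operation map of 201, represented here as 301, together with alternatively configured instruction matrix operation maps 300, 302 and 304. Simultaneous issue can be implemented using one of the following models:
1 - MIMD or SIMD: the software/compiler merges multiple matrices into one super matrix.
2 - Threaded model: each matrix belongs to a separate software or hardware thread.
3 - Dynamic execution: matrices from a single stream can be issued dynamically by the hardware if no dependencies exist between them.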
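[0043] The instruction matrix templates represented by the operation maps in FIG. 3 execute on hardware that maps those instruction slots one-to-one onto execution units (inside the execution blocks). The granularity of the matrix allows baseline matrices to be formed and then combined into a super matrix. This is shown in FIG. 4A, where 4 baseline instruction matrices execute in parallel on four execution blocks 301, 300, 302 and 304. Each execution block consists of two rows of complex units. The first row has 3 complex units (e.g., 402, 404 and 406), and the other row has 3 complex units (e.g., 408, 410 and 412). Each complex unit is operable to perform complex operations such as multiplication, floating-point addition or multiply-accumulate. Thus, for example, complex unit 402 may correspond to 202 of FIG. 2, complex unit 404 may correspond to 204 of FIG. 2, and so on.
[0044] A complex unit is operable to perform up to 4 simple operations, such as ALU operations. Such a complex unit is therefore shown as having multiple operands and multiple outputs. The units can be arranged to compute parallel operations or to perform dependent operations sequentially. Each baseline instruction matrix can run independently of the other matrices in threaded mode, or several baseline instruction matrices can be combined into a group that runs in the same cycle as a super instruction matrix, forming a MIMD architecture. The whole matrix can execute in one cycle or can be pipelined over multiple cycles.
[0045] As an example of operating the execution engine of FIG. 4A to execute the ULIM instruction template of FIG. 3, instruction MAC 208 executes in complex unit 408, while the pair of simple Logic and ALU operations 210 executes in a pair of units of 408. FIG. 4A illustrates the similarities to and differences from a VLIW architecture. If the top 3 slots (402, 404, 406) in FIG. 4A are used to execute 3 parallel instructions, then those 3 slots in the ULIM template resemble an equivalent VLIW instruction. Using the next row of simple units (408, 410, 412) to execute 3 or more parallel instructions means executing another equivalent VLIW instruction that depends on the previous one. In this way, the ULIM architecture can be viewed as executing, in space and with a fixed lag, a fixed number of mutually dependent VLIW instructions. In addition, the architecture allows a single complex instruction or multiple simple instructions to execute in a single slot, which is not allowed in a VLIW template.
[0046] The 4 ULIM matrices represented by operation maps 301, 300, 302 and 304 in FIG. 3 can execute on the hardware of FIG. 4A. This can be done in one of 3 execution modes. In the first, the compiler/programmer groups the matrices to form a MIMD super instruction matrix. In the second, each matrix executes independently in threaded mode, with separate threads executing simultaneously on each of the four hardware sections (301, 300, 302 and 304). In the last mode, 4 different instruction matrices from a single thread execute dynamically, with hardware dependency checking ensuring that no dependencies exist between the different matrices executing simultaneously on the four hardware sections of FIG. 4A.
[0047] The register file 420 of FIG. 4A can be configured according to the execution mode. In one mode, the register file is viewed either as a MIMD sectioned register file that provides a MIMD width of 4 sections, or as 4 independent register files, each serving a separate thread. The register file can also support a dynamic execution mode in which the 4 sections form one unified register file, where data written to any register in a particular section is accessible to all units in the other sections. Switching between these modes can be seamless, because different execution modes can alternate between independent-thread baseline instruction matrices and MIMD super instruction matrix threads.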
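[0048] Each single instruction matrix is a mix of parallel and dependent instructions. In addition, each individual instruction can be a scalar or a SIMD instruction. At the finest granularity, an instruction can resemble a variable data-width SIMD operation on multiple bytes/words or on a single scalar entity.
[0049] In multi-threaded execution mode, each register file and the execution units that execute its thread are completely independent of the other register files and their threads. This is similar to each thread having its own register state. However, dependencies between these threads can be specified. Each matrix that belongs to a thread executes in the execution units of that thread's register file. If only one thread or a single non-threaded program executes on the hardware of FIG. 4A, the following method is used to allow parallel matrices that belong to that single thread/program to access results written into registers in the other sections. The way to do this is to allow any matrix to write its results into any of the 4 register file sections, producing copies of those registers in the other register file sections. Physically, this is done by extending the write ports of each section into the remaining sections. However, this is not scalable, because an efficient register file cannot be built if each memory cell needs 4 times as many write ports as a single section alone requires. A mechanism is therefore presented in which the register file is built so that it is not affected by this single-thread register-broadcast extension. This mechanism is shown in FIG. 4B.
[0050] FIG. 4B shows one section of the register file, consisting of 24 registers, where a matrix that belongs to a single thread is storing its execution results in the 24-entry register file of that section. At the same time, 3 other parallel matrices are executing on the other 3 sections of FIG. 4A, and their execution results are broadcast to this register file section.
[0051] The write ports are configured to enable single-thread register broadcast by restricting the results of each matrix to non-overlapping groups of 6 registers. This is implemented with sectioned write ports, where each write port writes into a separate group of registers 430. The write ports 440 that come from the other sections write into different, non-overlapping groups of registers.
[0052] In threaded mode or MIMD mode, all the write ports that go to those non-overlapping groups are used by the results of this section, so the full 24-entry register file is utilized and written, and no broadcast takes place because the other sections run independent code that uses independent registers (meaning that the local section needs all of its registers). On the other hand, if a single thread is in use, all sections cooperate on that single thread. The total number of registers in this case is only 24, so the registers across the remaining sections (24 entries * 3 sections) can be used to hold copies of one another. This register group assignment can be made by the compiler, which analyzes whether matrices can be parallelized and assigns non-overlapping register groups to those matrices that have an opportunity to execute in parallel.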
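The single-thread broadcast scheme can be sketched in a few lines. This is an illustrative model only: it assumes that the non-overlapping groups of 6 registers are contiguous ranges assigned by section index, which the text does not state, and all names (`write_group`, `broadcast_write`) are hypothetical.

```python
NUM_SEGMENTS = 4
REGISTERS_PER_SEGMENT = 24
GROUP_SIZE = REGISTERS_PER_SEGMENT // NUM_SEGMENTS  # 6 registers per group


def write_group(segment: int) -> range:
    """Registers that the write ports of `segment` may write.
    Assumption: groups are contiguous and ordered by section index."""
    if not 0 <= segment < NUM_SEGMENTS:
        raise ValueError("unknown segment")
    start = segment * GROUP_SIZE
    return range(start, start + GROUP_SIZE)


def broadcast_write(register_files, segment: int, register: int, value) -> None:
    """Single-thread mode: a result from `segment` is copied into the same
    register of every section's register file."""
    if register not in write_group(segment):
        raise ValueError("register is outside this segment's write group")
    for register_file in register_files:
        register_file[register] = value


# Four 24-entry register file sections that hold copies of one another.
files = [[0] * REGISTERS_PER_SEGMENT for _ in range(NUM_SEGMENTS)]
broadcast_write(files, segment=2, register=13, value=99)
```

Because each section only ever writes its own group, a memory cell needs ports for one writer even though results arrive from all four sections.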
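[0053] Even though results are written from all 4 sections, each memory cell in the register file has ports that support only one section. A traditional register file would have to support 4 sections, a 4-fold increase, as illustrated in the example shown below.
[0054] On top of the ULIM baseline format, data parallelism is implemented in these architectures. This is done by allowing each instruction in the ULIM template to be a SIMD/MIMD instruction. In the preceding figures, each parallel slot can support an internally replicated SIMD structure, while MIMD is supported by the different parallel/serial slots.
[0055] The memory/register or matrix buffer accessed by an instruction can be viewed differently depending on the intended nature of the access. For example, a data matrix can be viewed as MIMD of wide data elements, SIMD of small data elements, or MIMD of mixed data-width SIMD instructions.
[0056] FIG. 5 shows 2 views of the memory/register matrix buffer layout of the data. The right view shows orthogonal data elements in each row and each column. This supports different MIMD/SIMD data combinations. The left view shows a different element at each position of any row, but the column represents the remaining bits of a larger data-size element. For example, the right view can represent 4 MIMD instructions, each operating on 4 different SIMD bytes, each byte being one of the parallel data elements. The left view represents 4 MIMD instructions, each operating on a 32-bit element laid out vertically (the actual physical layout differs from the logical representation shown). Moreover, if the view is a MIMD view, then all of them belong to one single MIMD register of 4 sections; if the view is not a MIMD view, then those registers are 4 independent registers laid out vertically.
[0057] The significance of this memory and register file view and its corresponding execution modes is that it enables the execution unit to morph to execute a wide MIMD/SIMD instruction (merging all register sections into a 4-way MIMD/SIMD), while at the same time the 4 register file sections and their attached execution units can operate as 4 independent units on 4 different scalar registers. This allows single-threaded and multi-threaded execution within the execution unit at the finest level of granularity.
[0058] The ULIM architecture has fixed-size instruction templates similar to VLIW or MIMD templates. Unlike VLIW or MIMD templates, the ULIM template allows both parallel and dependent instructions to be specified. It follows the same von Neumann architecture in which instructions write into registers and dependencies between instructions within a matrix are communicated through register names. Another noteworthy aspect of the ULIM architecture is that each instruction in the matrix has a fixed, predetermined location in the matrix and executes with fixed timing relative to the other instructions in the matrix. The ULIM matrix width is similar to the width of a corresponding VLIW template. In fact, a serial flow of dependent VLIW instructions can be converted into a ULIM template by placing one VLIW instruction in each horizontal row of the ULIM template. It is not possible to replicate every possible ULIM template with a flow of VLIW instructions, because a ULIM template can contain, in the same row, one complex instruction in one slot and multiple simple instructions in a horizontally adjacent slot.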
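[0059] Using a matrix architecture that composes instruction matrices, as opposed to a traditional architecture that executes individual instructions, has numerous advantages. The following discussion illustrates mechanisms that enable and use the instruction matrix and execution block architecture of the invention to build and implement a massively parallel single-processor pipeline.
[0060] FIG. 6 illustrates a possible processor pipeline that takes advantage of the ULIM architecture. The invention uses a mechanism to simplify the fetch stage 600 and the branch resolution and decode stage 608. The fetch unit 600 fetches code on a matrix basis rather than an instruction basis. The program counter of this architecture is incremented by the matrix size rather than the instruction size. This means that a large number of instructions is fetched each cycle. To make this possible, the ULIM matrix does not allow a branch to exist within a matrix, but branches can exist between ULIM matrices. Branch resolution is done at two levels. Within a ULIM matrix, branches are replaced with conditional execution, conditional moves and predication. Across matrices, branches are handled by path prediction and branch coloring. This allows large numbers of instructions grouped into matrices to move forward through the fetch and branch resolution stages of the pipeline.
[0061] Executing dependent instructions alongside parallel instructions within a single matrix relieves the compiler of the difficulty of filling every parallel instruction slot. It also simplifies data dependency checking in the scoreboard hardware 602, the dispatch unit 604 or a hardware scheduling unit. This is achieved by using the matrix number, rather than individual registers or individual instructions, as the means of scoreboarding and maintaining dependencies between matrices. In the example of FIG. 7, the scoreboard summarizes the dependency precedence of matrix 8 merely by referencing matrix numbers 2, 4 and 5, which means that matrix 8 needs to read data results from those matrices. The scoreboard dependency check does not need to reference individual register or instruction information to maintain the dependency scoreboard. The matrix number carries that information and is sufficient to keep the dependency check correct. Instructions within a matrix that depend on other matrices can be issued once those matrices have executed. When a matrix depends on a delayed result (e.g., because of a cache miss), the whole matrix is prevented from being dispatched. In one embodiment of the ULIM pipeline implementation, the decode stage 608 is deferred until just before the execute stage, and it proceeds in parallel with the register read stage 606.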
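A scoreboard keyed by matrix number can be modeled with a very small data structure. The sketch below is a hypothetical illustration rather than the hardware of FIG. 7: the class name `MatrixScoreboard` and its methods are assumptions, and a matrix is treated as dispatchable once every producer matrix it references has completed.

```python
class MatrixScoreboard:
    """Tracks dependencies between matrices by matrix number only."""

    def __init__(self):
        self.pending = {}       # matrix number -> producer matrices not yet done
        self.completed = set()  # matrix numbers that have finished executing

    def add(self, matrix, depends_on):
        # Only producers that have not completed yet need to be waited for.
        self.pending[matrix] = set(depends_on) - self.completed

    def complete(self, matrix):
        self.completed.add(matrix)
        self.pending.pop(matrix, None)
        for producers in self.pending.values():
            producers.discard(matrix)

    def ready(self, matrix):
        """A matrix may be dispatched when none of its producers is outstanding."""
        return matrix in self.pending and not self.pending[matrix]


# Matrix 8 reads results produced by matrices 2, 4 and 5.
scoreboard = MatrixScoreboard()
scoreboard.add(8, depends_on=[2, 4, 5])
```

In this model, completing matrices 2 and 4 leaves matrix 8 blocked, and it becomes ready only after matrix 5 also completes. No register or instruction identifiers are consulted.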
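[0062] FIG. 8 illustrates one embodiment for encoding the ULIM architecture instruction template. The key is the encoding and organization of the operands (result and source registers). The registers that specify instruction results and source operands are given in a separate section of the ULIM template, regardless of the instruction opcodes and regardless of whether the instructions are complex operations or pairs of simple instructions. This matrix format lists sources and destinations in an explicit section of the matrix, so that extracting the source and destination registers is independent of decoding the instructions within the matrix. This makes it possible to implement a deferred decode stage, in which the actual decoding of the individual instructions is delayed until just before the execute stage and proceeds in parallel with reading the registers so that execution can start in the next cycle. It also simplifies dependency resolution and scoreboard implementation.
[0063] If a slot contains a complex instruction such as multiply-accumulate (MAC), it requires 4 sources and writes back 2 results. If the same slot contains two simple instructions, such as a Logic and an ALU instruction, each requires two sources and writes back one result, so the two together require 4 sources and produce two results. This makes the number of sources and results independent of the operation type.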
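The uniform slot format can be illustrated with a small data model. This is a sketch under stated assumptions, not the encoding of FIG. 8: the `Slot` class, the helper names and the register names are all hypothetical. It shows that the register fields have the same shape (4 sources, 2 results) whether a slot holds one complex instruction or a pair of simple ones, so they can be extracted without looking at the opcodes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Slot:
    opcodes: Tuple[str, ...]                # one complex or two simple opcodes
    sources: Tuple[str, str, str, str]      # always 4 source registers
    results: Tuple[str, str]                # always 2 result registers


def complex_slot(opcode, sources, results) -> Slot:
    """One complex instruction, e.g. MAC: 4 sources, 2 results."""
    if len(sources) != 4 or len(results) != 2:
        raise ValueError("a complex slot carries 4 sources and 2 results")
    return Slot((opcode,), tuple(sources), tuple(results))


def simple_pair(first, second) -> Slot:
    """Two simple instructions, each given as (opcode, (src, src), dest)."""
    (op1, src1, dst1), (op2, src2, dst2) = first, second
    if len(src1) != 2 or len(src2) != 2:
        raise ValueError("each simple instruction takes 2 sources")
    return Slot((op1, op2), tuple(src1) + tuple(src2), (dst1, dst2))


def register_fields(slot: Slot):
    """Extract register operands without inspecting (decoding) the opcodes."""
    return slot.sources, slot.results


mac = complex_slot("MAC", ("R1", "R2", "R3", "R4"), ("R5", "R6"))
pair = simple_pair(("LOGIC", ("R1", "R2"), "R7"), ("ALU", ("R3", "R4"), "R8"))
```

Keeping the register fields opcode-independent is what allows register read and dependency checks to proceed before the opcodes are decoded.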
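[0064] The processor execution hardware shown in FIG. 4A includes a register read and write mechanism in which a matrix operand buffer can assemble the required register sources and destinations based on the physical location at which each source will execute on the respective matrix hardware element. This reduces the number of read and write ports and the bandwidth requirement on the register file. This characteristic of the matrix architecture simplifies the bypass mechanism: buffering sources and/or destinations in matrix buffers that are close to or attached to each execution unit can provide shorter access time and greater source and result port bandwidth than a traditional register file, particularly a large register file that needs to support so many individual execution units.
[0065] FIG. 9 illustrates the concept of the matrix buffer of the invention. The figure shows an instruction matrix operand buffer 900 connected to execution units 902. In this example, the instruction matrix operand buffer 900 buffers sources and destinations for 3 different matrices. Particularly important is that the write ports 904 are arranged so that each write port writes to different memory cells. This means that the matrix operand buffer memory cells are single-ported even though 6 results are written at the same time, which is equivalent to a traditional 6-way ported register file. Moreover, each write port has a fan-out (the cells that it needs to drive) equal to the number of matrices in the matrix buffer (only 3 in this example). These characteristics bring many advantages in area, power and access speed, making this buffer structure very scalable and an attractive alternative to high-bandwidth, high-speed register files.
[0066] The following method describes how registers are written to and accessed from the matrix operand buffer. Each matrix is allocated into any available matrix space in the matrix buffer just in time, or a short time before, the matrix is ready for execution. Remote sources that the matrix needs in order to start executing (e.g., sources that reside in other matrix storage outside this matrix buffer) can be staged temporarily in this matrix storage.
[0067] After a matrix executes, its results are written into the area allocated for that matrix's storage (one of the 3 matrix locations in the matrix buffer of FIG. 9). Each result is written into the corresponding location accessed by that result's write port, regardless of the result register number. This location, together with the matrix location, is communicated to the consuming matrices through a scoreboard mechanism similar to the one shown in FIG. 7, so that each matrix that depends on this matrix annotates its register sources with the location of the matrix those sources come from and the location of each source within that matrix. The result location within the matrix can be communicated at execution time by the execution hardware, or, because the matrix template is fixed at compile time, it can be embedded by software in the matrix instruction template alongside the source register number.
[0068] The basic idea is to build a scalable structure of source and result buffers alongside the register file. Those buffers are connected to the execution units and hold data temporarily in a location-based matrix, using an identification method that establishes an intermediate medium between the conventional register file and the execution units, to allow higher bandwidth and speed. Values in those matrix buffers can be accessed using the matrix location and the location of the source within the matrix. For example, register 5 written by matrix 20 can be accessed by recording where that matrix is allocated in the matrix buffer and indexing that matrix's own storage by the entry number that the register 5 result physically occupies within the matrix. However, after the matrix is de-allocated from the matrix buffer, all the entries that hold register values within the matrix are written back to the actual register file and are accessed by register number from that moment on. The same location-based identification and access method described for the register file using matrix data buffers can be applied to memory accesses using a memory buffer cache.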
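The location-based access described above can be sketched as a small software model. It is illustrative only: the class `MatrixOperandBuffer`, its method names and the sizes (3 matrix locations, 6 result entries each) are assumptions chosen to match the numbers in the example of FIG. 9, not a definitive design.

```python
class MatrixOperandBuffer:
    """Holds results by (matrix location, entry within the matrix)."""

    def __init__(self, num_matrices=3, results_per_matrix=6):
        self.storage = [[None] * results_per_matrix for _ in range(num_matrices)]
        self.location = {}  # matrix number -> location inside the buffer

    def allocate(self, matrix_number):
        """Place a matrix in any free location of the buffer."""
        used = set(self.location.values())
        for position in range(len(self.storage)):
            if position not in used:
                self.location[matrix_number] = position
                return position
        raise RuntimeError("no free matrix location")

    def write_result(self, matrix_number, entry, value):
        # The entry is fixed by the producing slot, not by the register number.
        self.storage[self.location[matrix_number]][entry] = value

    def read(self, matrix_number, entry):
        return self.storage[self.location[matrix_number]][entry]

    def deallocate(self, matrix_number, entry_to_register, register_file):
        """Write buffered results back to the register file and free the location."""
        position = self.location.pop(matrix_number)
        for entry, register in entry_to_register.items():
            register_file[register] = self.storage[position][entry]
        self.storage[position] = [None] * len(self.storage[position])


# Matrix 20 produces the value of register 5 in its entry 2 (illustrative numbers).
buffer = MatrixOperandBuffer()
buffer.allocate(20)
buffer.write_result(20, entry=2, value=77)
buffered_value = buffer.read(20, entry=2)

register_file = {}
buffer.deallocate(20, entry_to_register={2: 5}, register_file=register_file)
```

While matrix 20 is resident, consumers read the value by matrix location and entry; after de-allocation the same value is found under register number 5 in the register file.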
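[0069] The matrix architecture can easily be targeted by the compiler if the underlying hardware is uniform and replicated. It also allows more efficient use of power and silicon area. Here we introduce the concept of a universal unit, which is constructed from basic building elements such as small adders, logic gates, multiplexers and Booth cells.
[0070] The architecture of the universal unit allows it to perform all functions/operations inside every single universal unit. This means that each unit is able to perform addition, multiplication, shift, permute and so on. It can provide such universal functionality because it is composed of the basic building elements used to perform simple ALU operations. These simple elements are, in turn, cascaded to perform the complex operations. The unit can also process floating-point and integer data. The above helps in understanding the universal unit concept, but the unit also achieves significant advantages by simplifying scheduling and using the machine throughput to the maximum. In regular architectures, a separate unit is used to perform each operation or function. These units share the same execution port, so when one unit is used the others are not utilized. Moreover, the latencies in traditional architectures vary among the units, which makes it difficult for the scheduler to schedule them. In contrast, in this universal unit the latency is unified across the whole unit, and the scheduler sees symmetric instantiations of the universal unit.
[0071] FIG. 10A shows part of this universal unit. Each universal unit can perform different execution unit functions, such as multiplier, adder, shifter, permuter and so on. This embodiment shows the structure of carry-save adders and/or generic adders 1000. The unit is composed of basic building blocks, each with 4 rows of adders (there could be more or fewer) capable of adding 8 inputs in parallel (4 parallel/serial add operations). These adders are then structured in groups. Each adder in a row is connected either to the adder at the same position in the row below (to perform a serial ALU operation) or to the adder to its right in the row below (to perform a multiply operation). The operation can be C*B+A or A OP B, forming 4 parallel/serial ALU operations in each row. In addition, these groups of adders may have different data sizes. This adder structure allows a tree to perform multiple-operand addition, multiplication, multiply-accumulate, sum of differences, shifting and rotating. Additionally, multiplexers (not shown in the figure) align/permute/shift the inputs or intermediate outputs to obtain the required operation, including shift and permute operations. Booth cells/bit-multiply cells 1002 are added to the adders to enable multiplication. Other specific logic, state, memory or LUT elements are added to provide extended functionality.
[0072] The universal unit allows the permute/shift unit to be implemented using the same structure that performs multiplication, or the same structure that performs floating-point add or floating-point multiply-accumulate. The advantage is that less routing/area is needed to implement the permute/shift logic. A shift or rotate is performed in the multiply structure by performing a multiplication by 2^x, where x is the shift count. Left shift, right shift or rotate is implemented by selecting the upper half or the lower half of the product result, respectively, or by performing an OR function between the upper and lower halves of the product result.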
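The arithmetic behind shifting with a multiplier can be checked with a short sketch. It uses one self-consistent convention that is an assumption of this example and may differ from the selection shown in the patent figures: the product of a `width`-bit value and 2^x is split into an upper and a lower half, the lower half gives a left shift, the upper half of a multiplication by 2^(width - x) gives a right shift, and the OR of both halves gives a rotate. All function names are hypothetical.

```python
def multiply_by_power_of_two(value: int, x: int, width: int = 32):
    """Return (upper half, lower half) of the 2*width-bit product value * 2**x."""
    mask = (1 << width) - 1
    product = (value & mask) * (1 << x)
    return product >> width, product & mask


def shift_left(value: int, x: int, width: int = 32) -> int:
    # Lower half of value * 2**x: bits shifted past the top are dropped.
    _, lower = multiply_by_power_of_two(value, x, width)
    return lower


def shift_right(value: int, x: int, width: int = 32) -> int:
    # Upper half of value * 2**(width - x) equals value >> x.
    upper, _ = multiply_by_power_of_two(value, width - x, width)
    return upper


def rotate_left(value: int, x: int, width: int = 32) -> int:
    # Bits that overflow into the upper half wrap around via OR.
    upper, lower = multiply_by_power_of_two(value, x, width)
    return upper | lower
```

For shift counts between 0 and `width`, these functions agree with the usual `<<`, `>>` and rotate operations on a `width`-bit word, which is why a multiplier array can double as a shifter.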
[0073]利用基础的2-输入加法器结构,每个元组成一组位。进位保存加法器还可以利用逻辑和MUX(多路复用器)来实现。例如,为了建立32*32元,基组可以由8位或4位基础加法器和MUXES构建,并且能利用修改过的超前进位加法器单元内部逻辑门执行逻辑函数。
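[0074] Figure 10B shows the modified basic one-bit cell of the carry look-ahead adder used in the 4-bit adder of Figure 10A, which produces either an adder output or a selected logic output. The modification is shown by the connections 1010 and the two 4:1 multiplexers 1012, which are not in the critical path of the adder. The original adder bit structure (marked 1014) includes the carry look-ahead and sum logic. These figures are logical representations; the actual circuits may differ.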
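[0075] The 4-bit (or 8-bit) groups facilitate the execution of SIMD widths of various sizes as well as 16-bit operations. By the same principle, those 16-bit tiles can be cascaded for wider data operations, such as 64-bit operations, while still facilitating 8-bit, 16-bit and 32-bit SIMD operations for addition, multiple-operand addition, shifting, rotating and multiplication.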
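[0076] The basic concept behind this organization is to be able to execute a combination of parallel and serial instructions on the same structure. For example, the first row of constructs can execute a single 32-bit ALU operation, which can be followed in the second row by either a dependent or an independent ALU operation, and so on. The 4 rows together can execute up to four 32-bit serial/parallel ALU operations, or a single 32-bit multiply operation. The structure can also perform partial-width SIMD operations on sub-matrices. The instructions and operands scheduled on this universal unit come as one group, particularly within the matrix data and instruction segments.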
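[0077] Instructions can be pipelined within one cycle using the instruction matrix architecture, because dependent instructions are pipelined so as to be scheduled within the same cycle or in the following cycle, depending on the required frequency. There are multiple ways to take advantage of the ULIM architecture. The system allows Ultra Large Instruction Matrix scheduling. Parallel instructions as well as dependent instructions are scheduled as one matrix (in contrast with VLIW, where only parallel instructions can be scheduled). Each instruction or dependent instruction in this matrix can be scalar or SIMD.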
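[0078] The invention may be implemented in many ways. For example, multiple dependent instructions may be staged within a clock cycle. In this specific implementation of the invention, multiple dependent and parallel instructions can be staged within one clock cycle. Multiple dependent instructions can start within one cycle, which shortens the optimal critical path of program execution. Multiple dependent instructions may be pipelined using state elements that separate each basic operation in a unit from the operation that follows. This increases the rate of pipeline execution. However, the power of the structure increases because of the clock speed and the extra state elements. Alternatively, the number of state elements may stay constant while the rate of data pumped through the structure is increased using wave pipelining.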
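[0079] The invention also includes a Time-Lag Sliced Architecture (TLSA) that reduces the latency of dependent instructions. The basic idea behind the time-lag sliced architecture is that an operation produces its result digit slice by digit slice. Each slice is produced earlier than the next slice, with a time lag between slices. Once the first slice has been produced, the next computation can start executing and then produce its own slices. The sliced architecture described here is an overall architecture that applies to computational units as well as to register files and memories. The architecture applies to all arithmetic, shift, integer and floating point operations.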
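[0080] The TLSA is used to implement an entire system architecture, including memory and computation. The digit slices are not necessarily digits of equal size. The invention can operate with both operands arriving in a digit-sliced manner. The invention can implement a Booth encoded multiplier, variable shifters, permute engines, and floating point adders and multipliers.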
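[0081] In designing execution units, the common approach is to synchronize the digits, or the bits of a digit, of the output of an arithmetic/logic or shifter unit into one single output result. This result is either latched into a storage element or staged at the same time, as one piece, to a receiving element. In this architecture, however, fine-grain execution is provided with or without coarse-grain synchronous execution. The basic idea is to formulate arithmetic or permute/shift operations in a time-lag logical slice organization that is staged in time and/or in space. The execution hardware is connected in a time-delay flow, where earlier slices execute faster and produce their slices of the output result sooner, while later slices need more time to execute and produce their results with a delay relative to the earlier slices. It is worth noting that slicing is done at the fine granularity of bits/digits even within a single execution unit, such as an adder or a permuter. This architecture can use such an organization of digit/bit logic circuit slices to optimize the logic timing critical path and/or the number of signal routing paths and/or the area needed to perform arithmetic, permute, shift, etc. for integer and/or floating point operations. The slices can have equal numbers of bits/digits or different numbers of bits/digits. One particular advantage of this architecture is the ability to start executing dependent instructions before all of the output result slices of the source instructions have been finalized.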
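[0082] Figure 11 illustrates a TLSA unit 1100 in which data flows into and out of slices 1102A-1102C. Each slice output has a time-lag delay (d) with respect to the previous slice. This time-lag property allows the unit to be pipelined such that the state elements of the different slices are not synchronized to a single time, as is typical for row/stage flip-flops.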
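[0083] The architecture of Figure 11 includes a basic adder stage structure that computes basic computational tasks, such as multiple-operand addition or sub-block multiplication. These computations are cascaded using staging elements 1104A-1104C, which are not part of the critical path of the basic task. Depending on the basic computational task being sliced, the staging elements can be adders, multiplexers or logic gates. The delay of these staging elements 1104 is minimized and equals the time "d", which is the delay between each slice output and the next slice output.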
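[0084] The staging element is chosen to have the smallest possible delay, because it also establishes the delay between the input operand slices. The basic computational task in Figure 11 can use an arbitrary number of adder levels. Those adders can be of any type, e.g., binary adders or carry save adders (CSA). The advantage of the architecture of Figure 11 is that the first slice of the result is produced earlier than the final result. The first slice is forwarded to subsequent operations.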
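The idea of forwarding the first result slice to a dependent operation before the final slice exists can be illustrated in software. The sketch below is only an analogy, assuming 8-bit digits, four digits per operand, and simple ripple-carry addition between slices; Python generators stand in for the time-lag flow, and all function names are hypothetical.

```python
def to_digits(value, digit_bits=8, num_digits=4):
    """Split value into digits, least significant digit first."""
    mask = (1 << digit_bits) - 1
    return [(value >> (i * digit_bits)) & mask for i in range(num_digits)]

def sliced_add(a_digits, b_digits, digit_bits=8):
    """Consume operand digits lazily and yield one sum digit per slice."""
    mask = (1 << digit_bits) - 1
    carry = 0
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry
        carry = total >> digit_bits          # carry passed to the next (later) slice
        yield total & mask                   # this slice's result is available now

def from_digits(digits, digit_bits=8):
    return sum(d << (i * digit_bits) for i, d in enumerate(digits))

def chained_sum(a, b, c):
    """(a + b) + c modulo 2**32, with the second adder fed slice by slice."""
    first = sliced_add(to_digits(a), to_digits(b))
    second = sliced_add(first, to_digits(c))   # starts as soon as digit 0 of `first` exists
    return from_digits(second)
```

Because `first` is a generator, the dependent adder `second` pulls digit 0 of `a + b` before digit 1 has been computed, which mirrors the early-start property described above. The final carry out of the top slice is dropped in this sketch.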
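[0085] Figure 12 illustrates an embodiment of the invention in which multiple time-lag sliced units are staggered back to back. In this embodiment, each diagonal of slices represents one instruction computation unit divided into slices. Each slice starts execution at a time lag relative to the previous slice. Each unit feeds a subsequent unit. This embodiment shows four units back to back. The label SU1_0 refers to slice number 0 of unit number 1. SU4_7 refers to slice number 7 of unit number 4.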
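The staggered schedule can be illustrated with a small timing model. It assumes, purely for illustration, that every slice takes one delay d and that a slice can start once the same slice of the previous unit and the previous slice of its own unit have both finished; the function name and the 0-based indexing are not from the source, which numbers units from 1 (SU1_0).

```python
def slice_finish_times(num_units=4, num_slices=8, d=1):
    """finish[u][s]: time at which slice s of unit u (both 0-based) produces its output."""
    finish = [[0] * num_slices for _ in range(num_units)]
    for u in range(num_units):
        for s in range(num_slices):
            prev_unit = finish[u - 1][s] if u > 0 else 0    # same slice, previous unit
            prev_slice = finish[u][s - 1] if s > 0 else 0   # previous slice, same unit
            finish[u][s] = max(prev_unit, prev_slice) + d
    return finish
```

Under these assumptions slice s of unit u finishes at (u + s + 1)·d, so the last slice of the fourth unit finishes at 11·d, whereas running four unsliced 8-slice units strictly one after another would take 32·d.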
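[0086] The architecture shown in Figure 12 allows (if desired) multiple TLSA units to process data within a single cycle (or over multiple cycles). The low slice of the first unit feeds the low slice of the second unit, which in turn feeds the third unit, which then feeds the fourth unit, and so on. Note also that, in addition to feeding the first slice of the second unit, the first slice of the first unit also feeds the second slice of its own unit (the first unit). Figure 12 illustrates the following concepts: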
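1 - Sub-cycle/multi-cycle execution in TLSA
TLSA allows arithmetic/shift/logic operations to be executed within one cycle. Figure 12 illustrates 4 units executing in one cycle, with each slice delayed within that clock cycle. By the same token, pipelining can be applied at the output of each unit slice (instead of at the output of the 4 units) to increase throughput and execute over multiple cycles.
2 - Asynchronous/synchronous/wave TLSA topologies
The TLSA can be designed in at least 3 different topologies, or combinations of them:
A - Asynchronous: the slice inputs/outputs communicate with each other within the cycle time without synchronous state elements (e.g., flip-flops). This allows the internal pipeline state elements to be removed and enables power-friendly, slower clock domains.
B - Synchronous: each slice is clocked into a state element (flip-flop/latch/domino, etc.). This allows a higher clock throughput and pipelining rate.
C - Wave: in this topology, data is fed into the unit piece by piece, with the next input data arriving faster than the normal pipeline rate. The normal pipeline rate is determined by the maximum time of the logic paths between two state elements. The wave pipeline rate is determined by the minimum time of the logic paths between two state elements.
One interesting combination of topologies is Fine-Grain Asynchronous, Coarse-Grain Synchronous (FGA-CGS). In this scheme, the TLSA is implemented using time-lag slices that are connected asynchronously, where fine-grain asynchronous execution is provided with or without coarse-grain synchronous execution. The basic idea is to formulate arithmetic or permute/shift operations in a sliced, staged processing organization in which the different slices of the execution hardware are connected asynchronously in a time-delay flow: the early slices have fewer inputs, execute faster and produce their output results sooner. Later slices have more inputs and therefore need more time to execute, producing their results with a delay relative to the earlier slices. Each slice is then clocked into a synchronous (or asynchronous) element that has a time lag with respect to the previous slice.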
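[0087] Figure 12 illustrates multiple TLSA units staggered within one cycle (four back-to-back doubling-counter units in one clock). The implementation also illustrates an FGA-CGS arrangement in which the unit slices communicate with each other asynchronously (fine-grain asynchronous), while each slice, or each set of back-to-back slices, is synchronized at its output to a state element clock. The output of each slice may be synchronized to a different clock (a delayed version).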
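[0088] The TLSA architecture supports the Ultra Large Instruction Matrix (ULIM) architecture. In this TLSA embodiment, an entire instruction group is scheduled, with multiple parallel and dependent instructions scheduled on an instance of the TLSA cluster described above. Each unit can also support SIMD data, where duplicates of the data slices are instantiated but are controlled by the same instruction excitation. Additionally, multiples of this assembled structure of SIMD universal units can be instantiated horizontally to achieve a MIMD architecture on top of a single unit. In this way, an entire instruction template containing both parallel variations of SIMD instructions and dependent instructions is scheduled in one cycle.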
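[0089] Figure 13 illustrates a traditional multiplication technique. Specifically, each B digit is multiplied by the set of A digits, and every digit must be available at the multiplication execution unit when the multiplication starts. The results are then summed. In contrast, with the present invention, as shown in Figure 14, the operands arrive digit by digit. Each row of Figure 14 represents a partial product of the multiplication, but it contains only the currently arriving digits and the previously arrived digits, according to the arrival times of the digit slices.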
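One way to picture multiplication with operands arriving digit by digit is the following Python sketch. It is an illustrative formulation, not the circuit of Figure 14: it assumes both operands arrive least significant digit first in 8-bit digits, and at each step it adds only terms built from the digits arriving now and those that arrived earlier. The function names are hypothetical.

```python
def to_digits(value, digit_bits=8, num_digits=4):
    """Split value into digits, least significant digit first."""
    mask = (1 << digit_bits) - 1
    return [(value >> (i * digit_bits)) & mask for i in range(num_digits)]

def digit_serial_multiply(a_digits, b_digits, digit_bits=8):
    """Yield the running product after each pair of digits arrives."""
    a_seen = b_seen = product = 0
    for k, (a_k, b_k) in enumerate(zip(a_digits, b_digits)):
        weight = k * digit_bits
        # Cross terms: new digit of one operand times the already-arrived part of the other.
        product += (a_k * b_seen + b_k * a_seen) << weight
        # Term formed by the two digits arriving in this step.
        product += (a_k * b_k) << (2 * weight)
        a_seen |= a_k << weight
        b_seen |= b_k << weight
        yield product                        # equals a_seen * b_seen at this point
```

After step k the low k + 1 digits of the running product no longer change, because every later term carries a weight of at least k + 1 digits, so those low digits could be forwarded to a consumer early.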
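[0090] To show how to build a logic structure that uses Booth encoded multiplication in the time-lag sliced architecture, Figure 15 shows a traditional Booth encoded 32-bit regular multiplier. The Booth encoder groups consecutive multiplier bits together to generate a digit. This grouping can reduce the maximum digit value needed to represent those bits by considering signed combinations of two consecutive digits in the number. For example, a 3-bit digit has a maximum value of 7, but by adding 1 to the digit on its left, the digit 7 becomes equivalent to -1. Using a signed representation of the digits allows the encoded digit values to reach a maximum of 1/2 of the original digit values.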
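[0091] Figure 16 implements the traditional logic structure of the regular Booth multiplication of Figure 15 using radix-4 digits. The selectors 1600A, 1600B choose which multiple of the multiplicand to use out of the possible signed values (0, 1, -1, 2, -2); the choice is determined by the Booth encoding of the multiplier bits.
[0092] Figures 17 and 18 show the new Booth encoding scheme and the new TLSA logic structure that implements it. Note that a traditional multiplier encodes the bit stream continuously, while the TLSA Booth encoder inserts 0s into the bit stream at the digit slice boundaries (8-bit digits in this example). The inserted zeros do not change regardless of the sign of the multiplication (the last 2 bits represent the sign).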
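The conventional radix-4 recoding, in which each pair of multiplier bits selects one of the multiples (0, 1, -1, 2, -2) of the multiplicand, can be sketched as follows. This shows only the standard continuous encoding, not the modified TLSA encoding with zeros inserted at slice boundaries; the function names and the 32-bit default are illustrative.

```python
def booth_radix4_digits(multiplier, bits=32):
    """Recode a two's-complement `bits`-wide multiplier into digits in {-2,-1,0,1,2}."""
    assert bits % 2 == 0
    m = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0                                   # implicit 0 to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)     # overlapping 3-bit window
        prev = b1
    return digits                              # least significant digit first

def booth_multiply(multiplicand, multiplier, bits=32):
    total = 0
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        total += (digit * multiplicand) << (2 * i)   # selector picks 0, +/-M or +/-2M
    return total
```

For a multiplier of -3 the recoded digits are 1 and -1 followed by zeros, since 1 + (-1)·4 = -3, so only two non-zero partial products are summed.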
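[0093] Figure 18 shows how to implement the new modified time-lag sliced Booth encoded 32-bit multiplier. This implementation resembles the generic TLSA structure shown in Figure 11. Figure 18 implements the sub-multiplication operations shown in Figures 14 and 17.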
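[0094] Data parallelism in the TLSA is implemented in these architectures on top of the ULIM baseline format. This is done by allowing each instruction in the SIMD/MIMD TLSA data format to be organized in a dimension orthogonal to the TLSA slices. Figure 19 shows a configuration in which a memory/register file data block is accessed by an instruction that can view it differently depending on the nature of the intended access. For example, the data block can be viewed as MIMD of wide data elements, SIMD of small data elements, MIMD of mixed-data-width SIMD instructions, and so on. Figure 19 shows an architecture that executes such combinations. The slices within a unit can operate independently to perform sub-operations. For example, each 8-bit slice can perform an independent 8-bit multiplication, while the group of slices that makes up one 32-bit multiplier unit can also operate as a 4-way SIMD byte multiplier. Furthermore, a group of units can be operated as a MIMD multiplier.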
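The 4-way SIMD byte view of a 32-bit word can be shown with a brief sketch. It assumes each 8-bit lane keeps only the low 8 bits of its product, which is one possible convention and is not specified by the source; the function name is hypothetical.

```python
def simd_byte_multiply(a, b, lanes=4):
    """Multiply two 32-bit words lane by lane, treating each as four independent bytes."""
    result = 0
    for lane in range(lanes):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= ((x * y) & 0xFF) << shift    # low byte of each lane's product (assumption)
    return result
```

The same 32 bits could instead be passed to a single 32-bit multiply, which is the sense in which one data block can be viewed differently depending on the intended access.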
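[0095] Figure 20 illustrates a memory that is accessed in a sliced manner. In a traditional memory architecture, a single operand (e.g., a 64-bit integer or floating point operand) is fetched as a whole. After address decoding is complete, all the bits are read through the read port, which must buffer the read enable across the entire width of the operand in memory. In the TLSA architecture, after address decoding has occurred, the data read and/or write follows a time-lag model in which each slice follows the previous slice with a time delay in between. The advantage of this model is a fast memory response, since the whole data width does not need to be decoded and driven at once.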
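[0096] Figure 21 illustrates a TLSA architecture with a permute/shift unit that takes advantage of the time lag between data slices. This allows a shorter time to produce the early result slices and/or less routing/area to implement the permute/shift logic. In Figure 21, a right shifter is constructed to accept the time-lagged arrival of a 32-bit operand sliced into 8-bit digits. The first slice (digit) of the 32-bit input arrives at time T0, while the last slice arrives at time T0 + 3d, where d is one MUX delay. The bits (0 to 5) shown vertically on the right represent the shift count (maximum 32; any value greater than 32 produces a zero output, since essentially all the data is shifted out). Execution starts when the first low-order digit arrives at the right side of the shifter. The unit then waits for the next input digit to arrive. Depending on the value of bit 3 of the shift count, which determines whether the number is shifted right by 8 bits, one of the digits is selected into the lower digit position. Then the next more significant digit arrives, and the next bit of the shift count is used either to select this new digit or to pass along the data selected in the multiplexer levels above, and so on. Any multiplexer position that the shift count zeroes out forces a zero override on its output. When the last digit arrives, it passes through the smallest number of multiplexer levels and therefore has a minimum delay "d" relative to the previous digit, enabling fast propagation of the last digit to the output.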
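A shifter built from one multiplexer level per shift-count bit can be modelled in software as below. The sketch captures only the logical function (each count bit either passes the data through or shifts it by a power of two, and counts of 32 or more give zero); it does not model the time-lagged arrival of the digits or the differing stack depths of Figure 21. The function name is hypothetical.

```python
def right_shift_staged(value, count, width=32):
    """Logical right shift using one multiplexer level per shift-count bit."""
    mask = (1 << width) - 1
    value &= mask
    if count >= width:                          # all data is shifted out
        return 0
    num_levels = width.bit_length() - 1         # 5 levels (bits 4..0) for width 32
    for bit in reversed(range(num_levels)):
        if (count >> bit) & 1:                  # mux select: shifted input
            value >>= 1 << bit                  # bit 3 shifts by 8, i.e. one 8-bit digit
        # otherwise the mux passes the data through unchanged
    return value
```

In this model, bit 3 of the count moves whole 8-bit digits and bit 4 moves two digits at a time, while bits 0 to 2 shift within a digit.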
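[0097] A left shifter can be constructed by mirroring the structure of Figure 21 about a vertical line, so that the left slices have larger multiplexer stacks and the right slices have smaller multiplexer stacks. The most significant digit passes through the smallest number of multiplexer levels.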
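[0098] The sliced-architecture universal unit can be a universal unit similar to the one described for Figures 10A-10B. It uses the same sliced architecture techniques (TLSA) and/or fine-grain asynchronous concepts. It performs all functions/operations inside every single universal unit. This means each universal unit is able to perform addition, multiplication, shift, permute, etc. It performs this universal functionality with low latency by processing one individual slice at a time in a time-lag manner. It can also process floating point and integer data. The universal unit concept is made clearer by the ideas described above, and it achieves an important advantage by simplifying scheduling and utilizing the machine throughput to the maximum. In regular architectures, a separate unit is used to perform each operation or function, and these units share the same execution port. Thus, when one unit is in use, the others are idle. Moreover, the latency in traditional architectures varies, making it difficult for the scheduler to schedule the units. In contrast, in this universal unit the latency is unified for the whole unit, and the scheduler sees symmetric instances of the universal unit.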
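[0099] Each universal unit can perform different execution unit functions, such as those of a multiplier, adder, shifter, etc. In addition, these slices may have different data sizes. In this particular illustration, the data width of each slice is larger than that of the previous slice. This adder structure allows a tree of slices to perform multiple-operand addition, multiplication, multiply-accumulate, sum of differences, etc. Multiplexers (not shown in the figure) align/permute/shift the inputs or intermediate outputs to obtain the required operation, including shift and permute operations, using the universal adder/multiplexer structure. Booth cells/bit multiply cells 1002 are added to the adders to enable multiplication. Other specific logic, state, memory or LUT elements are added to provide expanded functionality.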
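[0100] The ULIM architecture can be time-sliced using the disclosed TLSA techniques. Another way to build a universal unit is to configure it as a floating point multiply-accumulate unit (MAC). The functional blocks used to build this unit are a multiplier, a right shifter, an adder and a left shifter. Such a structure is disclosed in Figure 22. A universal unit can use this structure to perform any one, or any combination, of the functions that make up the FP-MAC.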
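[0101] Each of these functions has been described above and can therefore be implemented individually as a TLSA structure and then combined into a floating point multiply-accumulate TLSA structure. Such a structure can also operate as consecutive sliced ALUs, or as a multiply followed by an ALU or a shift, etc.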
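[0102] An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The medium and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those skilled in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks and magnetic tape; optical media such as CD-ROMs, DVDs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. Examples of computer code include machine code, such as that produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++ or another object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.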
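[0103] The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. The scope of the invention is defined by the appended claims and their equivalents.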

Claims (20)

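1. An apparatus, comprising:
a matrix of execution blocks forming multiple rows and multiple columns, wherein the multiple rows support parallel execution of instructions and the multiple columns support execution of dependent instructions, and wherein the matrix of execution blocks processes a single block of instructions specifying parallel instructions and dependent instructions.
2. The apparatus of claim 1, wherein the matrix of execution blocks forms part of a single processor pipeline.
3. The apparatus of claim 2, wherein the single processor pipeline includes a fetch stage, a branch processing stage, a decode stage, a scheduling stage, an execute stage and a commit stage.
4. The apparatus of claim 3, wherein the fetch stage fetches the single block of instructions and increments a program counter according to the size of the block of instructions.
5. The apparatus of claim 4, wherein the branch processing stage supports branches to other blocks of instructions.
6. The apparatus of claim 3, wherein the scheduling stage uses block register references.
7. The apparatus of claim 3, further comprising a register file that supports parallel block writes.
8. The apparatus of claim 1, wherein the execution blocks support floating point operations, integer operations, single instruction multiple data (SIMD) operations and multiple instruction multiple data (MIMD) operations.
9. The apparatus of claim 1, wherein the matrix of execution blocks forms a time-lag sliced architecture to process parallel instructions and dependent instructions within a single clock cycle.
10. The apparatus of claim 9, wherein the time-lag sliced architecture forms a time lag between execution slices.
11. The apparatus of claim 9, wherein the matrix of execution blocks starts multiple dependent instructions within a single clock cycle.
12. The apparatus of claim 11, wherein the multiple dependent instructions are pipelined using state elements that separate execution slices.
13. The apparatus of claim 9, wherein the time-lag sliced architecture produces one digit at a time.
14. The apparatus of claim 13, wherein different digits have different bit widths.
15. The apparatus of claim 9, wherein the time-lag sliced architecture includes a modified Booth encoding.
16. The apparatus of claim 15, wherein the modified Booth encoding inserts zeros at digit slice boundaries.
17. The apparatus of claim 9, wherein the time-lag sliced architecture includes a right shifter that applies a first set of slice digits to a first multiplexer stack and a second set of slice digits to a second multiplexer stack, wherein the first multiplexer stack is larger than the second multiplexer stack.
18. The apparatus of claim 9, wherein the time-lag sliced architecture includes a left shifter that applies a first set of slice digits to a first multiplexer stack and a second set of slice digits to a second multiplexer stack, wherein the second multiplexer stack is larger than the first multiplexer stack.
19. The apparatus of claim 9, further comprising a memory with a time-lag structure to produce a first data slice followed by a time-lagged second data slice.
20. The apparatus of claim 9, further comprising a floating point multiply-accumulate unit that implements multiply operations, add operations, right shift operations, left shift operations and mixed operations.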
CN2007800173174A 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations Active CN101449256B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310589048.6A CN103646009B (zh) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US79164906P 2006-04-12 2006-04-12
US79178206P 2006-04-12 2006-04-12
US60/791,782 2006-04-12
US60/791,649 2006-04-12
US79221906P 2006-04-14 2006-04-14
US60/792,219 2006-04-14
PCT/US2007/066536 WO2007143278A2 (en) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201310589048.6A Division CN103646009B (zh) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations

Publications (2)

Publication Number Publication Date
CN101449256A true CN101449256A (zh) 2009-06-03
CN101449256B CN101449256B (zh) 2013-12-25

Family

ID=38802165

Family Applications (2)

Application Number Title Priority Date Filing Date
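CN201310589048.6A Active CN103646009B (zh) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
CN2007800173174A Active CN101449256B (zh) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations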

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201310589048.6A Active CN103646009B (zh) 2006-04-12 2007-04-12 Apparatus and method for processing an instruction matrix specifying parallel and dependent operations

Country Status (4)

Country Link
US (5) US8327115B2 (zh)
EP (2) EP2011018B1 (zh)
CN (2) CN103646009B (zh)
WO (1) WO2007143278A2 (zh)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104583975A (zh) * 2012-06-15 2015-04-29 索夫特机械公司 无消歧乱序加载存储队列
CN105074657A (zh) * 2013-01-29 2015-11-18 超威半导体公司 并行管道中的发散分支的硬件和软件解决方案
CN104008216B (zh) * 2013-02-22 2017-04-26 円星科技股份有限公司 使用存储器编译器以产生最佳化存储器实例的方法
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
CN107305538A (zh) * 2016-04-22 2017-10-31 北京中科寒武纪科技有限公司 一种子矩阵运算装置及方法
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9904552B2 (en) 2012-06-15 2018-02-27 Intel Corporation Virtual load store queue having a dynamic dispatch window with a distributed structure
US9921845B2 (en) 2011-03-25 2018-03-20 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9928121B2 (en) 2012-06-15 2018-03-27 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US9965277B2 (en) 2012-06-15 2018-05-08 Intel Corporation Virtual load store queue having a dynamic dispatch window with a unified structure
CN108121688A (zh) * 2017-12-15 2018-06-05 北京中科寒武纪科技有限公司 一种计算方法及相关产品
US9990198B2 (en) 2012-06-15 2018-06-05 Intel Corporation Instruction definition to implement load store reordering and optimization
US10019263B2 (en) 2012-06-15 2018-07-10 Intel Corporation Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer
CN112596869A (zh) * 2020-12-08 2021-04-02 海光信息技术股份有限公司 延时模型的构建方法及装置、电子设备及存储介质

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8755515B1 (en) * 2008-09-29 2014-06-17 Wai Wu Parallel signal processing system and method
EP2207090B1 (en) * 2009-01-09 2018-12-19 STMicroelectronics (Research & Development) Limited Apparatus comprising a plurality of arithmetic logic units
US10698859B2 (en) 2009-09-18 2020-06-30 The Board Of Regents Of The University Of Texas System Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture
US10061618B2 (en) * 2011-06-16 2018-08-28 Imagination Technologies Limited Scheduling heterogenous computation on multithreaded processors
US20130027416A1 (en) * 2011-07-25 2013-01-31 Karthikeyan Vaithianathan Gather method and apparatus for media processing accelerators
KR101648278B1 (ko) 2011-11-22 2016-08-12 소프트 머신즈, 인크. 마이크로프로세서 가속 코드 최적화기 및 의존성 재순서화 방법
US9378181B2 (en) * 2012-11-09 2016-06-28 Intel Corporation Scalable computing array
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
KR20140126195A (ko) * 2013-04-22 2014-10-30 삼성전자주식회사 배치 쓰레드 처리 기반의 프로세서, 그 프로세서를 이용한 배치 쓰레드 처리 방법 및 배치 쓰레드 처리를 위한 코드 생성 장치
US9870204B2 (en) * 2014-11-14 2018-01-16 Cavium, Inc. Algorithm to achieve optimal layout of instruction tables for programmable network devices
US10180841B2 (en) 2014-12-22 2019-01-15 Centipede Semi Ltd. Early termination of segment monitoring in run-time code parallelization
US10296350B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences
US10296346B2 (en) 2015-03-31 2019-05-21 Centipede Semi Ltd. Parallelized execution of instruction sequences based on pre-monitoring
US9952865B2 (en) * 2015-04-04 2018-04-24 Texas Instruments Incorporated Low energy accelerator processor architecture with short parallel instruction word and non-orthogonal register data file
US9715390B2 (en) 2015-04-19 2017-07-25 Centipede Semi Ltd. Run-time parallelization of code execution based on an approximate register-access specification
US10191747B2 (en) 2015-06-26 2019-01-29 Microsoft Technology Licensing, Llc Locking operand values for groups of instructions executed atomically
US10409599B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Decoding information about a group of instructions including a size of the group of instructions
US9946548B2 (en) 2015-06-26 2018-04-17 Microsoft Technology Licensing, Llc Age-based management of instruction blocks in a processor instruction window
US9940136B2 (en) 2015-06-26 2018-04-10 Microsoft Technology Licensing, Llc Reuse of decoded instructions
US10169044B2 (en) 2015-06-26 2019-01-01 Microsoft Technology Licensing, Llc Processing an encoding format field to interpret header information regarding a group of instructions
US10409606B2 (en) 2015-06-26 2019-09-10 Microsoft Technology Licensing, Llc Verifying branch targets
US10346168B2 (en) 2015-06-26 2019-07-09 Microsoft Technology Licensing, Llc Decoupled processor instruction window and operand buffer
US11755484B2 (en) 2015-06-26 2023-09-12 Microsoft Technology Licensing, Llc Instruction block allocation
US9952867B2 (en) 2015-06-26 2018-04-24 Microsoft Technology Licensing, Llc Mapping instruction blocks based on block size
US10175988B2 (en) 2015-06-26 2019-01-08 Microsoft Technology Licensing, Llc Explicit instruction scheduler state information for a processor
CN105183433B (zh) 2015-08-24 2018-02-06 上海兆芯集成电路有限公司 指令合并方法以及具有多数据通道的装置
US10678544B2 (en) 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US10452399B2 (en) 2015-09-19 2019-10-22 Microsoft Technology Licensing, Llc Broadcast channel architectures for block-based processors
US10180840B2 (en) 2015-09-19 2019-01-15 Microsoft Technology Licensing, Llc Dynamic generation of null instructions
US10871967B2 (en) 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US11126433B2 (en) 2015-09-19 2021-09-21 Microsoft Technology Licensing, Llc Block-based processor core composition register
US11977891B2 (en) 2015-09-19 2024-05-07 Microsoft Technology Licensing, Llc Implicit program order
US11681531B2 (en) 2015-09-19 2023-06-20 Microsoft Technology Licensing, Llc Generation and use of memory access instruction order encodings
US10198263B2 (en) 2015-09-19 2019-02-05 Microsoft Technology Licensing, Llc Write nullification
US10061584B2 (en) 2015-09-19 2018-08-28 Microsoft Technology Licensing, Llc Store nullification in the target field
US10776115B2 (en) 2015-09-19 2020-09-15 Microsoft Technology Licensing, Llc Debug support for block-based processor
US10936316B2 (en) 2015-09-19 2021-03-02 Microsoft Technology Licensing, Llc Dense read encoding for dataflow ISA
US10719321B2 (en) 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
US10031756B2 (en) 2015-09-19 2018-07-24 Microsoft Technology Licensing, Llc Multi-nullification
US10095519B2 (en) 2015-09-19 2018-10-09 Microsoft Technology Licensing, Llc Instruction block address register
US10768936B2 (en) 2015-09-19 2020-09-08 Microsoft Technology Licensing, Llc Block-based processor including topology and control registers to indicate resource sharing and size of logical processor
US11016770B2 (en) 2015-09-19 2021-05-25 Microsoft Technology Licensing, Llc Distinct system registers for logical processors
US9921838B2 (en) * 2015-10-02 2018-03-20 Mediatek Inc. System and method for managing static divergence in a SIMD computing architecture
US11106467B2 (en) 2016-04-28 2021-08-31 Microsoft Technology Licensing, Llc Incremental scheduler for out-of-order block ISA processors
US11531552B2 (en) 2017-02-06 2022-12-20 Microsoft Technology Licensing, Llc Executing multiple programs simultaneously on a processor core
US10474468B2 (en) * 2017-02-22 2019-11-12 Advanced Micro Devices, Inc. Indicating instruction scheduling mode for processing wavefront portions
US11567765B2 (en) 2017-03-20 2023-01-31 Intel Corporation Systems, methods, and apparatuses for tile load
US10372456B2 (en) * 2017-05-24 2019-08-06 Microsoft Technology Licensing, Llc Tensor processor instruction set architecture
US11275588B2 (en) 2017-07-01 2022-03-15 Intel Corporation Context save with variable save state size
CN107895191B (zh) * 2017-10-30 2022-02-22 上海寒武纪信息科技有限公司 一种信息处理方法及相关产品
US10963379B2 (en) 2018-01-30 2021-03-30 Microsoft Technology Licensing, Llc Coupling wide memory interface to wide write back paths
US10664287B2 (en) * 2018-03-30 2020-05-26 Intel Corporation Systems and methods for implementing chained tile operations
CN110659069B (zh) * 2018-06-28 2022-08-19 赛灵思公司 用于执行神经网络计算的指令调度方法及相应计算系统
US20210133854A1 (en) 2018-09-13 2021-05-06 Shanghai Cambricon Information Technology Co., Ltd. Information processing method and terminal device
US10824429B2 (en) 2018-09-19 2020-11-03 Microsoft Technology Licensing, Llc Commit logic and precise exceptions in explicit dataflow graph execution architectures
CN110032407B (zh) * 2019-03-08 2020-12-22 创新先进技术有限公司 提升cpu并行性能的方法及装置和电子设备
US11520854B2 (en) * 2019-10-29 2022-12-06 Meta Platforms, Inc. Support for different matrix multiplications by selecting adder tree intermediate results
EP4133368A1 (en) * 2020-05-14 2023-02-15 Huawei Technologies Co., Ltd. Device and method for data processing
US11706039B2 (en) 2020-12-26 2023-07-18 Intel Corporation ISA accessible physical unclonable function
US11570010B2 (en) * 2020-12-26 2023-01-31 Intel Corporation ISA accessible physical unclonable function
US11700135B2 (en) * 2020-12-26 2023-07-11 Intel Corporation ISA accessible physical unclonable function
TWI792546B (zh) * 2021-09-09 2023-02-11 瑞昱半導體股份有限公司 用於管線化控制的設備以及方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4835680A (en) * 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
US4943909A (en) * 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
WO2004114128A2 (en) * 2003-06-25 2004-12-29 Koninklijke Philips Electronics N.V. Instruction controlled data processing device

Family Cites Families (508)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US727487A (en) 1902-10-21 1903-05-05 Swan F Swanson Dumping-car.
US4075704A (en) 1976-07-02 1978-02-21 Floating Point Systems, Inc. Floating point data processor for high speech operation
US4228496A (en) 1976-09-07 1980-10-14 Tandem Computers Incorporated Multiprocessor system
US4245344A (en) 1979-04-02 1981-01-13 Rockwell International Corporation Processing system with dual buses
US4527237A (en) 1979-10-11 1985-07-02 Nanodata Computer Corporation Data processing system
US4414624A (en) 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
US4524415A (en) 1982-12-07 1985-06-18 Motorola, Inc. Virtual machine data processor
US4597061B1 (en) 1983-01-03 1998-06-09 Texas Instruments Inc Memory system using pipleline circuitry for improved system
US4577273A (en) 1983-06-06 1986-03-18 Sperry Corporation Multiple microcomputer system for digital computers
US4682281A (en) 1983-08-30 1987-07-21 Amdahl Corporation Data storage unit employing translation lookaside buffer pointer
US4600986A (en) 1984-04-02 1986-07-15 Sperry Corporation Pipelined split stack with high performance interleaved decode
US4633434A (en) 1984-04-02 1986-12-30 Sperry Corporation High performance storage unit
JPS6140643A (ja) 1984-07-31 1986-02-26 Hitachi Ltd システムの資源割当て制御方式
JPS6289149A (ja) 1985-10-15 1987-04-23 Agency Of Ind Science & Technol 多ポ−トメモリシステム
JPH0658650B2 (ja) 1986-03-14 1994-08-03 株式会社日立製作所 仮想計算機システム
US4920477A (en) 1987-04-20 1990-04-24 Multiflow Computer, Inc. Virtual address table look aside buffer miss recovery method and apparatus
JP2930341B2 (ja) * 1988-10-07 1999-08-03 マーチン・マリエッタ・コーポレーション データ並列処理装置
US5339398A (en) 1989-07-31 1994-08-16 North American Philips Corporation Memory architecture and method of data organization optimized for hashing
US5471593A (en) 1989-12-11 1995-11-28 Branigin; Michael H. Computer processor with an efficient means of executing many instructions simultaneously
US5197130A (en) 1989-12-29 1993-03-23 Supercomputer Systems Limited Partnership Cluster architecture for a highly parallel scalar/vector multiprocessor system
EP0463965B1 (en) 1990-06-29 1998-09-09 Digital Equipment Corporation Branch prediction unit for high-performance processor
US5317754A (en) 1990-10-23 1994-05-31 International Business Machines Corporation Method and apparatus for enabling an interpretive execution subset
US5317705A (en) 1990-10-24 1994-05-31 International Business Machines Corporation Apparatus and method for TLB purge reduction in a multi-level machine system
US6282583B1 (en) 1991-06-04 2001-08-28 Silicon Graphics, Inc. Method and apparatus for memory access in a matrix processor computer
US5539911A (en) 1991-07-08 1996-07-23 Seiko Epson Corporation High-performance, superscalar-based computer system with out-of-order instruction execution
JPH0820949B2 (ja) 1991-11-26 1996-03-04 松下電器産業株式会社 情報処理装置
WO1993013481A1 (en) 1991-12-23 1993-07-08 Intel Corporation Interleaved cache for multiple accesses per clock in a microprocessor
JP2647327B2 (ja) * 1992-04-06 1997-08-27 インターナショナル・ビジネス・マシーンズ・コーポレイション 大規模並列コンピューティング・システム装置
KR100309566B1 (ko) 1992-04-29 2001-12-15 리패치 파이프라인프로세서에서다중명령어를무리짓고,그룹화된명령어를동시에발행하고,그룹화된명령어를실행시키는방법및장치
EP0638183B1 (en) 1992-05-01 1997-03-05 Seiko Epson Corporation A system and method for retiring instructions in a superscalar microprocessor
EP0576262B1 (en) * 1992-06-25 2000-08-23 Canon Kabushiki Kaisha Apparatus for multiplying integers of many figures
JPH0637202A (ja) 1992-07-20 1994-02-10 Mitsubishi Electric Corp マイクロ波ic用パッケージ
JPH06110781A (ja) 1992-09-30 1994-04-22 Nec Corp キャッシュメモリ装置
US5493660A (en) 1992-10-06 1996-02-20 Hewlett-Packard Company Software assisted hardware TLB miss handler
US5513335A (en) 1992-11-02 1996-04-30 Sgs-Thomson Microelectronics, Inc. Cache tag memory having first and second single-port arrays and a dual-port array
US5819088A (en) 1993-03-25 1998-10-06 Intel Corporation Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer
US5548773A (en) * 1993-03-30 1996-08-20 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Digital parallel processor array for optimum path planning
JPH0784883A (ja) 1993-09-17 1995-03-31 Hitachi Ltd 仮想計算機システムのアドレス変換バッファパージ方法
US6948172B1 (en) 1993-09-21 2005-09-20 Microsoft Corporation Preemptive multi-tasking with cooperative groups of tasks
US5469376A (en) 1993-10-14 1995-11-21 Abdallah; Mohammad A. F. F. Digital circuit for the evaluation of mathematical expressions
US5517651A (en) 1993-12-29 1996-05-14 Intel Corporation Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes
US5956753A (en) 1993-12-30 1999-09-21 Intel Corporation Method and apparatus for handling speculative memory access operations
US5761476A (en) 1993-12-30 1998-06-02 Intel Corporation Non-clocked early read for back-to-back scheduling of instructions
JP3048498B2 (ja) 1994-04-13 2000-06-05 株式会社東芝 半導体記憶装置
JPH07287668A (ja) 1994-04-19 1995-10-31 Hitachi Ltd データ処理装置
CN1084005C (zh) 1994-06-27 2002-05-01 国际商业机器公司 用于动态控制地址空间分配的方法和设备
US5548742A (en) 1994-08-11 1996-08-20 Intel Corporation Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory
US5813031A (en) 1994-09-21 1998-09-22 Industrial Technology Research Institute Caching tag for a large scale cache computer memory system
US5640534A (en) 1994-10-05 1997-06-17 International Business Machines Corporation Method and system for concurrent access in a data cache array utilizing multiple match line selection paths
US5835951A (en) 1994-10-18 1998-11-10 National Semiconductor Branch processing unit with target cache read prioritization protocol for handling multiple hits
JP3569014B2 (ja) 1994-11-25 2004-09-22 富士通株式会社 マルチコンテキストをサポートするプロセッサおよび処理方法
US5724565A (en) 1995-02-03 1998-03-03 International Business Machines Corporation Method and system for processing first and second sets of instructions by first and second types of processing systems
US5673426A (en) 1995-02-14 1997-09-30 Hal Computer Systems, Inc. Processor structure and method for tracking floating-point exceptions
US5675759A (en) 1995-03-03 1997-10-07 Shebanow; Michael C. Method and apparatus for register management using issue sequence prior physical register and register association validity information
US5634068A (en) 1995-03-31 1997-05-27 Sun Microsystems, Inc. Packet switched cache coherent multiprocessor system
US5751982A (en) 1995-03-31 1998-05-12 Apple Computer, Inc. Software emulation system with dynamic translation of emulated instructions for increased processing speed
US6209085B1 (en) 1995-05-05 2001-03-27 Intel Corporation Method and apparatus for performing process switching in multiprocessor computer systems
US6643765B1 (en) 1995-08-16 2003-11-04 Microunity Systems Engineering, Inc. Programmable processor with group floating point operations
US5710902A (en) 1995-09-06 1998-01-20 Intel Corporation Instruction dependency chain indentifier
US6341324B1 (en) 1995-10-06 2002-01-22 Lsi Logic Corporation Exception processing in superscalar microprocessor
US5864657A (en) 1995-11-29 1999-01-26 Texas Micro, Inc. Main memory system and checkpointing protocol for fault-tolerant computer system
US5983327A (en) 1995-12-01 1999-11-09 Nortel Networks Corporation Data path architecture and arbitration scheme for providing access to a shared system resource
US5793941A (en) 1995-12-04 1998-08-11 Advanced Micro Devices, Inc. On-chip primary cache testing circuit and test method
US5911057A (en) 1995-12-19 1999-06-08 Texas Instruments Incorporated Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods
US5699537A (en) 1995-12-22 1997-12-16 Intel Corporation Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions
US6882177B1 (en) 1996-01-10 2005-04-19 Altera Corporation Tristate structures for programmable logic devices
US5754818A (en) 1996-03-22 1998-05-19 Sun Microsystems, Inc. Architecture and method for sharing TLB entries through process IDS
US5904892A (en) 1996-04-01 1999-05-18 Saint-Gobain/Norton Industrial Ceramics Corp. Tape cast silicon carbide dummy wafer
US5752260A (en) 1996-04-29 1998-05-12 International Business Machines Corporation High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses
US5806085A (en) 1996-05-01 1998-09-08 Sun Microsystems, Inc. Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database
US5829028A (en) 1996-05-06 1998-10-27 Advanced Micro Devices, Inc. Data cache configured to store data in a use-once manner
US6108769A (en) 1996-05-17 2000-08-22 Advanced Micro Devices, Inc. Dependency table for reducing dependency checking hardware
US5958042A (en) 1996-06-11 1999-09-28 Sun Microsystems, Inc. Grouping logic circuit in a pipelined superscalar processor
US5881277A (en) 1996-06-13 1999-03-09 Texas Instruments Incorporated Pipelined microprocessor with branch misprediction cache circuits, systems and methods
US5860146A (en) 1996-06-25 1999-01-12 Sun Microsystems, Inc. Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces
US5903760A (en) 1996-06-27 1999-05-11 Intel Corporation Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA
US5974506A (en) 1996-06-28 1999-10-26 Digital Equipment Corporation Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system
US6167490A (en) 1996-09-20 2000-12-26 University Of Washington Using global memory information to manage memory in a computer network
KR19980032776A (ko) 1996-10-16 1998-07-25 Kanai Tsutomu Data processor and data processing system
WO1998020415A1 (en) 1996-11-04 1998-05-14 Philips Electronics N.V. Processing device, reads instructions in memory
US6385715B1 (en) 1996-11-13 2002-05-07 Intel Corporation Multi-threading for a processor utilizing a replay queue
US5978906A (en) 1996-11-19 1999-11-02 Advanced Micro Devices, Inc. Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions
US6253316B1 (en) 1996-11-19 2001-06-26 Advanced Micro Devices, Inc. Three state branch history using one bit in a branch prediction mechanism
US5903750A (en) 1996-11-20 1999-05-11 Institute For The Development Of Emerging Architectures, L.L.P. Dynamic branch prediction for branch instructions with multiple targets
US6212542B1 (en) 1996-12-16 2001-04-03 International Business Machines Corporation Method and system for executing a program within a multiscalar processor by processing linked thread descriptors
US6134634A (en) 1996-12-20 2000-10-17 Texas Instruments Incorporated Method and apparatus for preemptive cache write-back
US5918251A (en) 1996-12-23 1999-06-29 Intel Corporation Method and apparatus for preloading different default address translation attributes
US6016540A (en) 1997-01-08 2000-01-18 Intel Corporation Method and apparatus for scheduling instructions in waves
US6065105A (en) 1997-01-08 2000-05-16 Intel Corporation Dependency matrix
US5802602A (en) 1997-01-17 1998-09-01 Intel Corporation Method and apparatus for performing reads of related data from a set-associative cache memory
US6088780A (en) 1997-03-31 2000-07-11 Institute For The Development Of Emerging Architecture, L.L.C. Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address
US6314511B2 (en) 1997-04-03 2001-11-06 University Of Washington Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers
US6035120A (en) 1997-05-28 2000-03-07 Sun Microsystems, Inc. Method and apparatus for converting executable computer programs in a heterogeneous computing environment
US6075938A (en) 1997-06-10 2000-06-13 The Board Of Trustees Of The Leland Stanford Junior University Virtual machine monitors for scalable multiprocessors
US6073230A (en) 1997-06-11 2000-06-06 Advanced Micro Devices, Inc. Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches
JPH1124929A (ja) 1997-06-30 1999-01-29 Sony Corp Arithmetic processing apparatus and method thereof
US6658447B2 (en) 1997-07-08 2003-12-02 Intel Corporation Priority based simultaneous multi-threading
US6128728A (en) 1997-08-01 2000-10-03 Micron Technology, Inc. Virtual shadow registers and virtual register windows
US6170051B1 (en) 1997-08-01 2001-01-02 Micron Technology, Inc. Apparatus and method for program level parallelism in a VLIW processor
US6085315A (en) 1997-09-12 2000-07-04 Siemens Aktiengesellschaft Data processing device with loop pipeline
US6101577A (en) 1997-09-15 2000-08-08 Advanced Micro Devices, Inc. Pipelined instruction cache and branch prediction mechanism therefor
US5901294A (en) 1997-09-18 1999-05-04 International Business Machines Corporation Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access
US6185660B1 (en) 1997-09-23 2001-02-06 Hewlett-Packard Company Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss
US5905509A (en) 1997-09-30 1999-05-18 Compaq Computer Corp. Accelerated Graphics Port two level Gart cache having distributed first level caches
US6226732B1 (en) 1997-10-02 2001-05-01 Hitachi Micro Systems, Inc. Memory system architecture
US5922065A (en) 1997-10-13 1999-07-13 Institute For The Development Of Emerging Architectures, L.L.C. Processor utilizing a template field for encoding instruction sequences in a wide-word format
US6178482B1 (en) 1997-11-03 2001-01-23 Brecis Communications Virtual register sets
US6021484A (en) 1997-11-14 2000-02-01 Samsung Electronics Co., Ltd. Dual instruction set architecture
US6256728B1 (en) 1997-11-17 2001-07-03 Advanced Micro Devices, Inc. Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction
US6260131B1 (en) 1997-11-18 2001-07-10 Intrinsity, Inc. Method and apparatus for TLB memory ordering
US6016533A (en) 1997-12-16 2000-01-18 Advanced Micro Devices, Inc. Way prediction logic for cache array
US6219776B1 (en) 1998-03-10 2001-04-17 Billions Of Operations Per Second Merged array controller and processing element
US6609189B1 (en) 1998-03-12 2003-08-19 Yale University Cycle segmented prefix circuits
JP3657424B2 (ja) 1998-03-20 2005-06-08 Matsushita Electric Industrial Co., Ltd. Center device and terminal device for broadcasting program information
US6216215B1 (en) 1998-04-02 2001-04-10 Intel Corporation Method and apparatus for senior loads
US6157998A (en) 1998-04-03 2000-12-05 Motorola Inc. Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers
US6205545B1 (en) 1998-04-30 2001-03-20 Hewlett-Packard Company Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance
US6115809A (en) 1998-04-30 2000-09-05 Hewlett-Packard Company Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction
US6256727B1 (en) 1998-05-12 2001-07-03 International Business Machines Corporation Method and system for fetching noncontiguous instructions in a single clock cycle
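JPH11338710A (ja) 1998-05-28 1999-12-10 Toshiba Corp Compiling method and apparatus for a processor having multiple kinds of instruction sets, and recording medium on which the method is programmed and recorded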
US6272616B1 (en) 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6988183B1 (en) 1998-06-26 2006-01-17 Derek Chi-Lan Wong Methods for increasing instruction-level parallelism in microprocessors and digital system
US6260138B1 (en) 1998-07-17 2001-07-10 Sun Microsystems, Inc. Method and apparatus for branch instruction processing in a processor
US6122656A (en) 1998-07-31 2000-09-19 Advanced Micro Devices, Inc. Processor configured to map logical register numbers to physical register numbers using virtual register numbers
US6272662B1 (en) 1998-08-04 2001-08-07 International Business Machines Corporation Distributed storage system using front-end and back-end locking
JP2000057054A (ja) 1998-08-12 2000-02-25 Fujitsu Ltd High-speed address translation system
US8631066B2 (en) 1998-09-10 2014-01-14 Vmware, Inc. Mechanism for providing virtual machines for use by multiple users
US6339822B1 (en) 1998-10-02 2002-01-15 Advanced Micro Devices, Inc. Using padded instructions in a block-oriented cache
US6332189B1 (en) 1998-10-16 2001-12-18 Intel Corporation Branch prediction architecture
GB9825102D0 (en) 1998-11-16 1999-01-13 Insignia Solutions Plc Computer system
JP3110404B2 (ja) 1998-11-18 2000-11-20 NEC Kofu, Ltd. Microprocessor device, method for accelerating its software instructions, and recording medium recording its control program
US6490673B1 (en) 1998-11-27 2002-12-03 Matsushita Electric Industrial Co., Ltd Processor, compiling apparatus, and compile program recorded on a recording medium
US6519682B2 (en) 1998-12-04 2003-02-11 Stmicroelectronics, Inc. Pipelined non-blocking level two cache system with inherent transaction collision-avoidance
US7020879B1 (en) 1998-12-16 2006-03-28 Mips Technologies, Inc. Interrupt and exception handling for multi-streaming digital processors
US6477562B2 (en) 1998-12-16 2002-11-05 Clearwater Networks, Inc. Prioritized instruction scheduling for multi-streaming processors
US6247097B1 (en) 1999-01-22 2001-06-12 International Business Machines Corporation Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions
US6321298B1 (en) 1999-01-25 2001-11-20 International Business Machines Corporation Full cache coherency across multiple raid controllers
JP3842474B2 (ja) 1999-02-02 2006-11-08 Renesas Technology Corp. Data processing device
US6327650B1 (en) 1999-02-12 2001-12-04 VLSI Technology, Inc. Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor
US6668316B1 (en) 1999-02-17 2003-12-23 Elbrus International Limited Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file
US6732220B2 (en) 1999-02-17 2004-05-04 Elbrus International Method for emulating hardware features of a foreign architecture in a host operating system environment
US6418530B2 (en) 1999-02-18 2002-07-09 Hewlett-Packard Company Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions
US6437789B1 (en) 1999-02-19 2002-08-20 Evans & Sutherland Computer Corporation Multi-level cache controller
US6850531B1 (en) 1999-02-23 2005-02-01 Alcatel Multi-service network switch
US6212613B1 (en) 1999-03-22 2001-04-03 Cisco Technology, Inc. Methods and apparatus for reusing addresses in a computer
US6529928B1 (en) 1999-03-23 2003-03-04 Silicon Graphics, Inc. Floating-point adder performing floating-point and integer operations
EP1050808B1 (en) 1999-05-03 2008-04-30 STMicroelectronics S.A. Computer instruction scheduling
US6449671B1 (en) 1999-06-09 2002-09-10 Ati International Srl Method and apparatus for busing data elements
US6473833B1 (en) 1999-07-30 2002-10-29 International Business Machines Corporation Integrated cache and directory structure for multi-level caches
US6643770B1 (en) 1999-09-16 2003-11-04 Intel Corporation Branch misprediction recovery using a side memory
US6704822B1 (en) 1999-10-01 2004-03-09 Sun Microsystems, Inc. Arbitration protocol for a shared data cache
US6772325B1 (en) 1999-10-01 2004-08-03 Hitachi, Ltd. Processor architecture and operation for exploiting improved branch control instruction
US6457120B1 (en) 1999-11-01 2002-09-24 International Business Machines Corporation Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions
US7441110B1 (en) 1999-12-10 2008-10-21 International Business Machines Corporation Prefetching using future branch path information derived from branch prediction
US7107434B2 (en) 1999-12-20 2006-09-12 Board Of Regents, The University Of Texas System, method and apparatus for allocating hardware resources using pseudorandom sequences
EP1247195A4 (en) 1999-12-22 2005-01-05 Ubicom Inc System and method for multithreading working on command level in an embedded processor with zero-time context switching
US6557095B1 (en) 1999-12-27 2003-04-29 Intel Corporation Scheduling operations using a dependency matrix
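CN1210649C (zh) 2000-01-03 2005-07-13 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains, processor including the scheduler, and scheduling method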
US6542984B1 (en) 2000-01-03 2003-04-01 Advanced Micro Devices, Inc. Scheduler capable of issuing and reissuing dependency chains
US6594755B1 (en) 2000-01-04 2003-07-15 National Semiconductor Corporation System and method for interleaved execution of multiple independent threads
US6728872B1 (en) 2000-02-04 2004-04-27 International Business Machines Corporation Method and apparatus for verifying that instructions are pipelined in correct architectural sequence
GB0002848D0 (en) 2000-02-08 2000-03-29 Siroyan Limited Communicating instruction results in processors and compiling methods for processors
GB2365661A (en) 2000-03-10 2002-02-20 British Telecomm Allocating switch requests within a packet switch
US6615340B1 (en) 2000-03-22 2003-09-02 Wilmot, Ii Richard Byron Extended operand management indicator structure and method
US7140022B2 (en) 2000-06-02 2006-11-21 Honeywell International Inc. Method and apparatus for slack stealing with dynamic threads
US6604187B1 (en) 2000-06-19 2003-08-05 Advanced Micro Devices, Inc. Providing global translations with address space numbers
US6557083B1 (en) 2000-06-30 2003-04-29 Intel Corporation Memory system for multiple data types
US6704860B1 (en) 2000-07-26 2004-03-09 International Business Machines Corporation Data processing system and method for fetching instruction blocks in response to a detected block sequence
US7206925B1 (en) 2000-08-18 2007-04-17 Sun Microsystems, Inc. Backing Register File for processors
US6728866B1 (en) 2000-08-31 2004-04-27 International Business Machines Corporation Partitioned issue queue and allocation strategy
US6721874B1 (en) 2000-10-12 2004-04-13 International Business Machines Corporation Method and system for dynamically shared completion table supporting multiple threads in a processing system
US6639866B2 (en) 2000-11-03 2003-10-28 Broadcom Corporation Very small swing high performance asynchronous CMOS static memory (multi-port register file) with power reducing column multiplexing scheme
US7757065B1 (en) 2000-11-09 2010-07-13 Intel Corporation Instruction segment recording scheme
JP2002185513A (ja) 2000-12-18 2002-06-28 Hitachi Ltd Packet communication network and packet transfer control method
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
US6907600B2 (en) 2000-12-27 2005-06-14 Intel Corporation Virtual translation lookaside buffer
US6647466B2 (en) 2001-01-25 2003-11-11 Hewlett-Packard Development Company, L.P. Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy
FR2820921A1 (fr) 2001-02-14 2002-08-16 Canon Kk Device and method for transmission in a switch
US6985951B2 (en) 2001-03-08 2006-01-10 International Business Machines Corporation Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment
US6950927B1 (en) 2001-04-13 2005-09-27 The United States Of America As Represented By The Secretary Of The Navy System and method for instruction-level parallelism in a programmable multiple network processor environment
US7707397B2 (en) 2001-05-04 2010-04-27 Via Technologies, Inc. Variable group associativity branch target address cache delivering multiple target addresses per cache line
US7200740B2 (en) 2001-05-04 2007-04-03 Ip-First, Llc Apparatus and method for speculatively performing a return instruction in a microprocessor
US6658549B2 (en) 2001-05-22 2003-12-02 Hewlett-Packard Development Company, L.P. Method and system allowing a single entity to manage memory comprising compressed and uncompressed data
US6985591B2 (en) 2001-06-29 2006-01-10 Intel Corporation Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media
US7203824B2 (en) 2001-07-03 2007-04-10 Ip-First, Llc Apparatus and method for handling BTAC branches that wrap across instruction cache lines
US7024545B1 (en) 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US6954846B2 (en) * 2001-08-07 2005-10-11 Sun Microsystems, Inc. Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode
KR20030018999A (ko) 2001-08-31 2003-03-06 LG Electronics Inc. Apparatus and method for determining stuff time for generating virtual container pointers in a synchronous digital hierarchy
US6718440B2 (en) 2001-09-28 2004-04-06 Intel Corporation Memory access latency hiding with hint buffer
US7150021B1 (en) 2001-10-12 2006-12-12 Palau Acquisition Corporation (Delaware) Method and system to allocate resources within an interconnect device according to a resource allocation table
US7117347B2 (en) 2001-10-23 2006-10-03 Ip-First, Llc Processor including fallback branch prediction mechanism for far jump and far call instructions
US7272832B2 (en) 2001-10-25 2007-09-18 Hewlett-Packard Development Company, L.P. Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform
US6964043B2 (en) 2001-10-30 2005-11-08 Intel Corporation Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code
GB2381886B (en) 2001-11-07 2004-06-23 Sun Microsystems Inc Computer system with virtual memory and paging mechanism
US7092869B2 (en) 2001-11-14 2006-08-15 Ronald Hilton Memory address prediction under emulation
US20030126416A1 (en) 2001-12-31 2003-07-03 Marr Deborah T. Suspending execution of a thread in a multi-threaded processor
US7363467B2 (en) 2002-01-03 2008-04-22 Intel Corporation Dependence-chain processing using trace descriptors having dependency descriptors
US6640333B2 (en) 2002-01-10 2003-10-28 Lsi Logic Corporation Architecture for a sea of platforms
US7055021B2 (en) 2002-02-05 2006-05-30 Sun Microsystems, Inc. Out-of-order processor that reduces mis-speculation using a replay scoreboard
US7331040B2 (en) 2002-02-06 2008-02-12 Transitive Limited Condition code flag emulation for program code conversion
US20030154363A1 (en) 2002-02-11 2003-08-14 Soltis Donald C. Stacked register aliasing in data hazard detection to reduce circuit
US6839816B2 (en) 2002-02-26 2005-01-04 International Business Machines Corporation Shared cache line update mechanism
US6731292B2 (en) 2002-03-06 2004-05-04 Sun Microsystems, Inc. System and method for controlling a number of outstanding data transactions within an integrated circuit
JP3719509B2 (ja) * 2002-04-01 2005-11-24 Sony Computer Entertainment Inc. Serial operation pipeline, arithmetic device, arithmetic logic circuit, and operation method using a serial operation pipeline
US7565509B2 (en) 2002-04-17 2009-07-21 Microsoft Corporation Using limits on address translation to control access to an addressable entity
US6920530B2 (en) 2002-04-23 2005-07-19 Sun Microsystems, Inc. Scheme for reordering instructions via an instruction caching mechanism
US7113488B2 (en) 2002-04-24 2006-09-26 International Business Machines Corporation Reconfigurable circular bus
US6760818B2 (en) 2002-05-01 2004-07-06 Koninklijke Philips Electronics N.V. Memory region based data pre-fetching
US7281055B2 (en) 2002-05-28 2007-10-09 Newisys, Inc. Routing mechanisms in systems having multiple multi-processor clusters
US7117346B2 (en) 2002-05-31 2006-10-03 Freescale Semiconductor, Inc. Data processing system having multiple register contexts and method therefor
US6938151B2 (en) 2002-06-04 2005-08-30 International Business Machines Corporation Hybrid branch prediction using a global selection counter and a prediction method comparison table
US6735747B2 (en) 2002-06-10 2004-05-11 Lsi Logic Corporation Pre-silicon verification path coverage
US8024735B2 (en) 2002-06-14 2011-09-20 Intel Corporation Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution
JP3845043B2 (ja) 2002-06-28 2006-11-15 Fujitsu Limited Instruction fetch control device
JP3982353B2 (ja) 2002-07-12 2007-09-26 NEC Corporation Fault-tolerant computer apparatus, resynchronization method therefor, and resynchronization program
US6944744B2 (en) 2002-08-27 2005-09-13 Advanced Micro Devices, Inc. Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor
US7546422B2 (en) 2002-08-28 2009-06-09 Intel Corporation Method and apparatus for the synchronization of distributed caches
US6950925B1 (en) 2002-08-28 2005-09-27 Advanced Micro Devices, Inc. Scheduler for use in a microprocessor that supports data-speculative execution
TW200408242A (en) 2002-09-06 2004-05-16 Matsushita Electric Ind Co Ltd Home terminal apparatus and communication system
US6895491B2 (en) 2002-09-26 2005-05-17 Hewlett-Packard Development Company, L.P. Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching
US7334086B2 (en) 2002-10-08 2008-02-19 Rmi Corporation Advanced processor with system on a chip interconnect technology
US6829698B2 (en) 2002-10-10 2004-12-07 International Business Machines Corporation Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction
US7213248B2 (en) 2002-10-10 2007-05-01 International Business Machines Corporation High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system
US7222218B2 (en) 2002-10-22 2007-05-22 Sun Microsystems, Inc. System and method for goal-based scheduling of blocks of code for concurrent execution
US20040103251A1 (en) 2002-11-26 2004-05-27 Mitchell Alsup Microprocessor including a first level cache and a second level cache having different cache line sizes
CN1720494A (zh) 2002-12-04 2006-01-11 Koninklijke Philips Electronics N.V. Register file gating method for reducing microprocessor power consumption
US6981083B2 (en) 2002-12-05 2005-12-27 International Business Machines Corporation Processor virtualization mechanism via an enhanced restoration of hard architected states
US7073042B2 (en) 2002-12-12 2006-07-04 Intel Corporation Reclaiming existing fields in address translation data structures to extend control over memory accesses
US20040117594A1 (en) 2002-12-13 2004-06-17 Vanderspek Julius Memory management method
US20040122887A1 (en) * 2002-12-20 2004-06-24 Macy William W. Efficient multiplication of small matrices using SIMD registers
US7191349B2 (en) 2002-12-26 2007-03-13 Intel Corporation Mechanism for processor power state aware distribution of lowest priority interrupt
US6925421B2 (en) 2003-01-09 2005-08-02 International Business Machines Corporation Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources
US20040139441A1 (en) 2003-01-09 2004-07-15 Kabushiki Kaisha Toshiba Processor, arithmetic operation processing method, and priority determination method
US7178010B2 (en) 2003-01-16 2007-02-13 Ip-First, Llc Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack
US7089374B2 (en) 2003-02-13 2006-08-08 Sun Microsystems, Inc. Selectively unmarking load-marked cache lines during transactional program execution
US7278030B1 (en) 2003-03-03 2007-10-02 Vmware, Inc. Virtualization system for computers having multiple protection mechanisms
US6912644B1 (en) 2003-03-06 2005-06-28 Intel Corporation Method and apparatus to steer memory access operations in a virtual memory system
US7111145B1 (en) 2003-03-25 2006-09-19 Vmware, Inc. TLB miss fault handler and method for accessing multiple page tables
US7143273B2 (en) 2003-03-31 2006-11-28 Intel Corporation Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history
CN1214666C (zh) 2003-04-07 2005-08-10 Huawei Technologies Co., Ltd. Method for limiting location information request flow in location service
US7058764B2 (en) 2003-04-14 2006-06-06 Hewlett-Packard Development Company, L.P. Method of adaptive cache partitioning to increase host I/O performance
EP1471421A1 (en) 2003-04-24 2004-10-27 STMicroelectronics Limited Speculative load instruction control
US7290261B2 (en) 2003-04-24 2007-10-30 International Business Machines Corporation Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor
US7139855B2 (en) 2003-04-24 2006-11-21 International Business Machines Corporation High performance synchronization of resource allocation in a logically-partitioned system
US7469407B2 (en) 2003-04-24 2008-12-23 International Business Machines Corporation Method for resource balancing using dispatch flush in a simultaneous multithread processor
US7055003B2 (en) 2003-04-25 2006-05-30 International Business Machines Corporation Data cache scrub mechanism for large L2/L3 data cache structures
US7007108B2 (en) 2003-04-30 2006-02-28 Lsi Logic Corporation System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address
US7743238B2 (en) 2003-05-09 2010-06-22 Arm Limited Accessing items of architectural state from a register cache in a data processing apparatus when performing branch prediction operations for an indirect branch instruction
JP2005032018A (ja) 2003-07-04 2005-02-03 Semiconductor Energy Lab Co Ltd Microprocessor using a genetic algorithm
US7149872B2 (en) 2003-07-10 2006-12-12 Transmeta Corporation System and method for identifying TLB entries associated with a physical address of a specified range
US7089398B2 (en) 2003-07-31 2006-08-08 Silicon Graphics, Inc. Address translation using a page size tag
US8296771B2 (en) 2003-08-18 2012-10-23 Cray Inc. System and method for mapping between resource consumers and resource providers in a computing system
US7133950B2 (en) 2003-08-19 2006-11-07 Sun Microsystems, Inc. Request arbitration in multi-core processor
US9032404B2 (en) 2003-08-28 2015-05-12 Mips Technologies, Inc. Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor
US7694304B2 (en) 2003-08-28 2010-04-06 Mips Technologies, Inc. Mechanisms for dynamic configuration of virtual processor resources
US7594089B2 (en) 2003-08-28 2009-09-22 Mips Technologies, Inc. Smart memory based synchronization controller for a multi-threaded multiprocessor SoC
US7849297B2 (en) 2003-08-28 2010-12-07 Mips Technologies, Inc. Software emulation of directed exceptions in a multithreading processor
US7111126B2 (en) 2003-09-24 2006-09-19 Arm Limited Apparatus and method for loading data values
JP4057989B2 (ja) 2003-09-26 2008-03-05 Toshiba Corporation Scheduling method and information processing system
US7047322B1 (en) 2003-09-30 2006-05-16 Unisys Corporation System and method for performing conflict resolution and flow control in a multiprocessor system
US7373637B2 (en) 2003-09-30 2008-05-13 International Business Machines Corporation Method and apparatus for counting instruction and memory location ranges
FR2860313B1 (fr) * 2003-09-30 2005-11-04 Commissariat Energie Atomique Component with a dynamically reconfigurable architecture
TWI281121B (en) 2003-10-06 2007-05-11 Ip First Llc Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence
US7395372B2 (en) 2003-11-14 2008-07-01 International Business Machines Corporation Method and system for providing cache set selection which is power optimized
US7243170B2 (en) 2003-11-24 2007-07-10 International Business Machines Corporation Method and circuit for reading and writing an instruction buffer
US20050120191A1 (en) 2003-12-02 2005-06-02 Intel Corporation (A Delaware Corporation) Checkpoint-based register reclamation
US20050132145A1 (en) 2003-12-15 2005-06-16 Finisar Corporation Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access
US7310722B2 (en) 2003-12-18 2007-12-18 Nvidia Corporation Across-thread out of order instruction dispatch in a multithreaded graphics processor
US7293164B2 (en) 2004-01-14 2007-11-06 International Business Machines Corporation Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions
US20050204118A1 (en) 2004-02-27 2005-09-15 National Chiao Tung University Method for inter-cluster communication that employs register permutation
CA2558892A1 (en) 2004-03-13 2005-09-29 Cluster Resources, Inc. System and method for a self-optimizing reservation in time of compute resources
US7478374B2 (en) 2004-03-22 2009-01-13 Intel Corporation Debug system having assembler correcting register allocation errors
US20050216920A1 (en) 2004-03-24 2005-09-29 Vijay Tewari Use of a virtual machine to emulate a hardware device
WO2005093562A1 (ja) 2004-03-29 2005-10-06 Kyoto University Data processing device, data processing program, and recording medium storing the data processing program
US7383427B2 (en) 2004-04-22 2008-06-03 Sony Computer Entertainment Inc. Multi-scalar extension for SIMD instruction set processors
US20050251649A1 (en) 2004-04-23 2005-11-10 Sony Computer Entertainment Inc. Methods and apparatus for address map optimization on a multi-scalar extension
US7418582B1 (en) * 2004-05-13 2008-08-26 Sun Microsystems, Inc. Versatile register file design for a multi-threaded processor utilizing different modes and register windows
US7478198B2 (en) 2004-05-24 2009-01-13 Intel Corporation Multithreaded clustered microarchitecture with dynamic back-end assignment
US7594234B1 (en) 2004-06-04 2009-09-22 Sun Microsystems, Inc. Adaptive spin-then-block mutual exclusion in multi-threaded processing
US7284092B2 (en) 2004-06-24 2007-10-16 International Business Machines Corporation Digital data processing apparatus having multi-level register file
US20050289530A1 (en) 2004-06-29 2005-12-29 Robison Arch D Scheduling of instructions in program compilation
EP1628235A1 (en) 2004-07-01 2006-02-22 Texas Instruments Incorporated Method and system of ensuring integrity of a secure mode entry sequence
US8044951B1 (en) 2004-07-02 2011-10-25 Nvidia Corporation Integer-based functionality in a graphics shading language
US7339592B2 (en) 2004-07-13 2008-03-04 Nvidia Corporation Simulating multiported memories using lower port count memories
US7398347B1 (en) * 2004-07-14 2008-07-08 Altera Corporation Methods and apparatus for dynamic instruction controlled reconfigurable register file
EP1619593A1 (en) 2004-07-22 2006-01-25 Sap Ag Computer-Implemented method and system for performing a product availability check
JP4064380B2 (ja) 2004-07-29 2008-03-19 Fujitsu Limited Arithmetic processing device and control method thereof
US8443171B2 (en) 2004-07-30 2013-05-14 Hewlett-Packard Development Company, L.P. Run-time updating of prediction hint instructions
US7213106B1 (en) 2004-08-09 2007-05-01 Sun Microsystems, Inc. Conservative shadow cache support in a point-to-point connected multiprocessing node
US7318143B2 (en) * 2004-10-20 2008-01-08 Arm Limited Reuseable configuration data
US20090150890A1 (en) 2007-12-10 2009-06-11 Yourst Matt T Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system
US7707578B1 (en) 2004-12-16 2010-04-27 Vmware, Inc. Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system
US7257695B2 (en) * 2004-12-28 2007-08-14 Intel Corporation Register file regions for a processing system
US7996644B2 (en) 2004-12-29 2011-08-09 Intel Corporation Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache
US8719819B2 (en) 2005-06-30 2014-05-06 Intel Corporation Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US7050922B1 (en) 2005-01-14 2006-05-23 Agilent Technologies, Inc. Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same
US20060179277A1 (en) 2005-02-04 2006-08-10 Flachs Brian K System and method for instruction line buffer holding a branch target buffer
US7657891B2 (en) 2005-02-04 2010-02-02 Mips Technologies, Inc. Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency
US7681014B2 (en) 2005-02-04 2010-03-16 Mips Technologies, Inc. Multithreading instruction scheduler employing thread group priorities
EP1849095B1 (en) * 2005-02-07 2013-01-02 Richter, Thomas Low latency massive parallel data processing device
US7400548B2 (en) 2005-02-09 2008-07-15 International Business Machines Corporation Method for providing multiple reads/writes using a 2read/2write register file array
US7343476B2 (en) 2005-02-10 2008-03-11 International Business Machines Corporation Intelligent SMT thread hang detect taking into account shared resource contention/blocking
US7152155B2 (en) 2005-02-18 2006-12-19 Qualcomm Incorporated System and method of correcting a branch misprediction
US20060200655A1 (en) 2005-03-04 2006-09-07 Smith Rodney W Forward looking branch target address caching
US20060212853A1 (en) 2005-03-18 2006-09-21 Marvell World Trade Ltd. Real-time control apparatus having a multi-thread processor
US8195922B2 (en) 2005-03-18 2012-06-05 Marvell World Trade, Ltd. System for dynamically allocating processing time to multiple threads
GB2424727B (en) 2005-03-30 2007-08-01 Transitive Ltd Preparing instruction groups for a processor having a multiple issue ports
US8522253B1 (en) 2005-03-31 2013-08-27 Guillermo Rozas Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches
US7313775B2 (en) * 2005-04-06 2007-12-25 Lsi Corporation Integrated circuit with relocatable processor hardmac
US20060230243A1 (en) 2005-04-06 2006-10-12 Robert Cochran Cascaded snapshots
US8230423B2 (en) 2005-04-07 2012-07-24 International Business Machines Corporation Multithreaded processor architecture with operational latency hiding
US20060230409A1 (en) 2005-04-07 2006-10-12 Matteo Frigo Multithreaded processor architecture with implicit granularity adaptation
US20060230253A1 (en) 2005-04-11 2006-10-12 Lucian Codrescu Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment
US20060236074A1 (en) 2005-04-14 2006-10-19 Arm Limited Indicating storage locations within caches
US7437543B2 (en) 2005-04-19 2008-10-14 International Business Machines Corporation Reducing the fetch time of target instructions of a predicted taken branch instruction
US7461237B2 (en) 2005-04-20 2008-12-02 Sun Microsystems, Inc. Method and apparatus for suppressing duplicative prefetches for branch target cache lines
JP2008537268A (en) * 2005-04-22 2008-09-11 Altrix Logic, Inc. Array of data processing elements with variable precision interconnect
US8713286B2 (en) 2005-04-26 2014-04-29 Qualcomm Incorporated Register files for a digital signal processor operating in an interleaved multi-threaded environment
GB2426084A (en) 2005-05-13 2006-11-15 Agilent Technologies Inc Updating data in a dual port memory
US7861055B2 (en) 2005-06-07 2010-12-28 Broadcom Corporation Method and system for on-chip configurable data ram for fast memory and pseudo associative caches
US8010969B2 (en) 2005-06-13 2011-08-30 Intel Corporation Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
WO2007027671A2 (en) 2005-08-29 2007-03-08 Searete Llc Scheduling mechanism of a hierarchical processor including multiple parallel clusters
US7765350B2 (en) 2005-09-14 2010-07-27 Koninklijke Philips Electronics N.V. Method and system for bus arbitration
US7350056B2 (en) * 2005-09-27 2008-03-25 International Business Machines Corporation Method and apparatus for issuing instructions from an issue queue in an information handling system
US7606975B1 (en) 2005-09-28 2009-10-20 Sun Microsystems, Inc. Trace cache for efficient self-modifying code processing
US7231106B2 (en) 2005-09-30 2007-06-12 Lucent Technologies Inc. Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host
US7627735B2 (en) 2005-10-21 2009-12-01 Intel Corporation Implementing vector memory operations
US7613131B2 (en) 2005-11-10 2009-11-03 Citrix Systems, Inc. Overlay network infrastructure
US7681019B1 (en) 2005-11-18 2010-03-16 Sun Microsystems, Inc. Executing functions determined via a collection of operations from translated instructions
US7861060B1 (en) 2005-12-15 2010-12-28 Nvidia Corporation Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior
US7634637B1 (en) 2005-12-16 2009-12-15 Nvidia Corporation Execution of parallel groups of threads with per-instruction serialization
US7673111B2 (en) 2005-12-23 2010-03-02 Intel Corporation Memory system with both single and consolidated commands
US7770161B2 (en) 2005-12-28 2010-08-03 International Business Machines Corporation Post-register allocation profile directed instruction scheduling
US8423682B2 (en) 2005-12-30 2013-04-16 Intel Corporation Address space emulation
US20070186050A1 (en) 2006-02-03 2007-08-09 International Business Machines Corporation Self prefetching L2 cache mechanism for data lines
GB2435362B (en) 2006-02-20 2008-11-26 Cramer Systems Ltd Method of configuring devices in a telecommunications network
WO2007097019A1 (en) 2006-02-27 2007-08-30 Fujitsu Limited Cache control device and cache control method
US7543282B2 (en) 2006-03-24 2009-06-02 Sun Microsystems, Inc. Method and apparatus for selectively executing different executable code versions which are optimized in different ways
EP2011018B1 (en) 2006-04-12 2016-07-13 Soft Machines, Inc. Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US7610571B2 (en) 2006-04-14 2009-10-27 Cadence Design Systems, Inc. Method and system for simulating state retention of an RTL design
US7577820B1 (en) 2006-04-14 2009-08-18 Tilera Corporation Managing data in a parallel processing environment
CN100485636C (en) 2006-04-24 2009-05-06 Huawei Technologies Co., Ltd. Debugging method and device for model-driven development of carrier-grade telecom services
US7804076B2 (en) 2006-05-10 2010-09-28 Taiwan Semiconductor Manufacturing Co., Ltd Insulator for high current ion implanters
US8145882B1 (en) 2006-05-25 2012-03-27 Mips Technologies, Inc. Apparatus and method for processing template based user defined instructions
US20080126771A1 (en) 2006-07-25 2008-05-29 Lei Chen Branch Target Extension for an Instruction Cache
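CN100495324C (en) 2006-07-27 2009-06-03 Institute of Computing Technology, Chinese Academy of Sciences Depth-first exception handling method in a complex instruction set architecture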
US7904704B2 (en) 2006-08-14 2011-03-08 Marvell World Trade Ltd. Instruction dispatching method and apparatus
US8046775B2 (en) 2006-08-14 2011-10-25 Marvell World Trade Ltd. Event-based bandwidth allocation mode switching method and apparatus
US7539842B2 (en) 2006-08-15 2009-05-26 International Business Machines Corporation Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables
US7594060B2 (en) 2006-08-23 2009-09-22 Sun Microsystems, Inc. Data buffer allocation in a non-blocking data services platform using input/output switching fabric
US7752474B2 (en) 2006-09-22 2010-07-06 Apple Inc. L1 cache flush when processor is entering low power mode
US7716460B2 (en) 2006-09-29 2010-05-11 Qualcomm Incorporated Effective use of a BHT in processor having variable length instruction set execution modes
US7774549B2 (en) 2006-10-11 2010-08-10 Mips Technologies, Inc. Horizontally-shared cache victims in multiple core processors
TWI337495B (en) 2006-10-26 2011-02-11 Au Optronics Corp System and method for operation scheduling
US8108625B1 (en) 2006-10-30 2012-01-31 Nvidia Corporation Shared memory with parallel access and access conflict resolution mechanism
US7680988B1 (en) 2006-10-30 2010-03-16 Nvidia Corporation Single interconnect providing read and write access to a memory shared by concurrent threads
US7617384B1 (en) 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
EP2122461A4 (en) 2006-11-14 2010-03-24 Soft Machines Inc Device and method for processing communications in a multithread architecture with context changes
US7493475B2 (en) 2006-11-15 2009-02-17 Stmicroelectronics, Inc. Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address
US7934179B2 (en) 2006-11-20 2011-04-26 Et International, Inc. Systems and methods for logic verification
US20080235500A1 (en) 2006-11-21 2008-09-25 Davis Gordon T Structure for instruction cache trace formation
JP2008130056A (en) 2006-11-27 2008-06-05 Renesas Technology Corp Semiconductor circuit
WO2008077088A2 (en) 2006-12-19 2008-06-26 The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations System and method for branch misprediction prediction using complementary branch predictors
US7783869B2 (en) 2006-12-19 2010-08-24 Arm Limited Accessing branch predictions ahead of instruction fetching
EP1940028B1 (en) 2006-12-29 2012-02-29 STMicroelectronics Srl Asynchronous interconnection system for 3D inter-chip communication
US8321849B2 (en) 2007-01-26 2012-11-27 Nvidia Corporation Virtual architecture and instruction set for parallel thread computing
TW200833002A (en) 2007-01-31 2008-08-01 Univ Nat Yunlin Sci & Tech Distributed switching circuit having fairness
US20080189501A1 (en) 2007-02-05 2008-08-07 Irish John D Methods and Apparatus for Issuing Commands on a Bus
US7685410B2 (en) 2007-02-13 2010-03-23 Global Foundries Inc. Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects
US7647483B2 (en) 2007-02-20 2010-01-12 Sony Computer Entertainment Inc. Multi-threaded parallel processor methods and apparatus
US20080209190A1 (en) 2007-02-28 2008-08-28 Advanced Micro Devices, Inc. Parallel prediction of multiple branches
JP4980751B2 (en) 2007-03-02 2012-07-18 Fujitsu Semiconductor Ltd. Data processing device and memory read active control method
US8452907B2 (en) 2007-03-27 2013-05-28 Arm Limited Data processing apparatus and method for arbitrating access to a shared resource
US20080250227A1 (en) 2007-04-04 2008-10-09 Linderman Michael D General Purpose Multiprocessor Programming Apparatus And Method
US7716183B2 (en) 2007-04-11 2010-05-11 Dot Hill Systems Corporation Snapshot preserved data cloning
US7941791B2 (en) 2007-04-13 2011-05-10 Perry Wang Programming environment for heterogeneous processor resource integration
US7769955B2 (en) 2007-04-27 2010-08-03 Arm Limited Multiple thread instruction fetch from different cache levels
US7711935B2 (en) 2007-04-30 2010-05-04 Netlogic Microsystems, Inc. Universal branch identifier for invalidation of speculative instructions
US8555039B2 (en) 2007-05-03 2013-10-08 Qualcomm Incorporated System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor
US8219996B1 (en) 2007-05-09 2012-07-10 Hewlett-Packard Development Company, L.P. Computer processor with fairness monitor
US9292436B2 (en) 2007-06-25 2016-03-22 Sonics, Inc. Various methods and apparatus to support transactions whose data address sequence within that transaction crosses an interleaved channel address boundary
CN101344840B (en) 2007-07-10 2011-08-31 苏州简约纳电子有限公司 Microprocessor and method for executing instructions in a microprocessor
US7937568B2 (en) 2007-07-11 2011-05-03 International Business Machines Corporation Adaptive execution cycle control method for enhanced instruction throughput
US20090025004A1 (en) 2007-07-16 2009-01-22 Microsoft Corporation Scheduling by Growing and Shrinking Resource Allocation
US8108545B2 (en) 2007-08-27 2012-01-31 International Business Machines Corporation Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture
US7711929B2 (en) 2007-08-30 2010-05-04 International Business Machines Corporation Method and system for tracking instruction dependency in an out-of-order processor
GB2452316B (en) 2007-08-31 2009-08-19 Toshiba Res Europ Ltd Method of Allocating Resources in a Computer.
US8725991B2 (en) * 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US8082420B2 (en) 2007-10-24 2011-12-20 International Business Machines Corporation Method and apparatus for executing instructions
US7856530B1 (en) 2007-10-31 2010-12-21 Network Appliance, Inc. System and method for implementing a dynamic cache for a data storage system
US7877559B2 (en) 2007-11-26 2011-01-25 Globalfoundries Inc. Mechanism to accelerate removal of store operations from a queue
US8245232B2 (en) 2007-11-27 2012-08-14 Microsoft Corporation Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems
US7809925B2 (en) 2007-12-07 2010-10-05 International Business Machines Corporation Processing unit incorporating vectorizable execution unit
US8145844B2 (en) 2007-12-13 2012-03-27 Arm Limited Memory controller with write data cache and read data cache
US7831813B2 (en) 2007-12-17 2010-11-09 Globalfoundries Inc. Uses of known good code for implementing processor architectural modifications
US7870371B2 (en) 2007-12-17 2011-01-11 Microsoft Corporation Target-frequency based indirect jump prediction for high-performance processors
US20090165007A1 (en) 2007-12-19 2009-06-25 Microsoft Corporation Task-level thread scheduling and resource allocation
US8782384B2 (en) 2007-12-20 2014-07-15 Advanced Micro Devices, Inc. Branch history with polymorphic indirect branch information
US7917699B2 (en) 2007-12-21 2011-03-29 Mips Technologies, Inc. Apparatus and method for controlling the exclusivity mode of a level-two cache
US9244855B2 (en) 2007-12-31 2016-01-26 Intel Corporation Method, system, and apparatus for page sizing extension
US8645965B2 (en) 2007-12-31 2014-02-04 Intel Corporation Supporting metered clients with manycore through time-limited partitioning
US7877582B2 (en) * 2008-01-31 2011-01-25 International Business Machines Corporation Multi-addressable register file
WO2009101563A1 (en) 2008-02-11 2009-08-20 Nxp B.V. Multiprocessing implementing a plurality of virtual processors
US9021240B2 (en) 2008-02-22 2015-04-28 International Business Machines Corporation System and method for Controlling restarting of instruction fetching using speculative address computations
US7987343B2 (en) 2008-03-19 2011-07-26 International Business Machines Corporation Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass
US7949972B2 (en) 2008-03-19 2011-05-24 International Business Machines Corporation Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis
US9513905B2 (en) 2008-03-28 2016-12-06 Intel Corporation Vector instructions to enable efficient synchronization and parallel reduction operations
US8120608B2 (en) 2008-04-04 2012-02-21 Via Technologies, Inc. Constant buffering for a computational core of a programmable graphics processing unit
TWI364703B (en) 2008-05-26 2012-05-21 Faraday Tech Corp Processor and early execution method of data load thereof
US8131982B2 (en) 2008-06-13 2012-03-06 International Business Machines Corporation Branch prediction instructions having mask values involving unloading and loading branch history data
US8145880B1 (en) 2008-07-07 2012-03-27 Ovics Matrix processor data switch routing systems and methods
JP5733860B2 (en) 2008-07-10 2015-06-10 Rocketick Technologies Ltd. Efficient parallel computation of dependency problems
JP2010039536A (en) 2008-07-31 2010-02-18 Panasonic Corp Program conversion device, program conversion method, and program conversion program
US8316435B1 (en) 2008-08-14 2012-11-20 Juniper Networks, Inc. Routing device having integrated MPLS-aware firewall with virtual security system support
US8135942B2 (en) 2008-08-28 2012-03-13 International Business Machines Corporation System and method for double-issue instructions using a dependency matrix and a side issue queue
US7769984B2 (en) 2008-09-11 2010-08-03 International Business Machines Corporation Dual-issuance of microprocessor instructions using dual dependency matrices
US8225048B2 (en) 2008-10-01 2012-07-17 Hewlett-Packard Development Company, L.P. Systems and methods for resource access
US9244732B2 (en) 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
US7941616B2 (en) 2008-10-21 2011-05-10 Microsoft Corporation System to reduce interference in concurrent programs
US8423749B2 (en) 2008-10-22 2013-04-16 International Business Machines Corporation Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node
GB2464703A (en) 2008-10-22 2010-04-28 Advanced Risc Mach Ltd An array of interconnected processors executing a cycle-based program
EP2351325B1 (en) 2008-10-30 2018-09-26 Nokia Technologies Oy Method and apparatus for interleaving a data block
US8032678B2 (en) 2008-11-05 2011-10-04 Mediatek Inc. Shared resource arbitration
US7848129B1 (en) 2008-11-20 2010-12-07 Netlogic Microsystems, Inc. Dynamically partitioned CAM array
US8868838B1 (en) 2008-11-21 2014-10-21 Nvidia Corporation Multi-class data cache policies
US8171223B2 (en) 2008-12-03 2012-05-01 Intel Corporation Method and system to increase concurrency and control replication in a multi-core cache hierarchy
US8200949B1 (en) 2008-12-09 2012-06-12 Nvidia Corporation Policy based allocation of register file cache to threads in multi-threaded processor
US8312268B2 (en) 2008-12-12 2012-11-13 International Business Machines Corporation Virtual machine
US7870308B2 (en) 2008-12-23 2011-01-11 International Business Machines Corporation Programmable direct memory access engine
US8099586B2 (en) 2008-12-30 2012-01-17 Oracle America, Inc. Branch misprediction recovery mechanism for microprocessors
US20100169578A1 (en) 2008-12-31 2010-07-01 Texas Instruments Incorporated Cache tag memory
US20100205603A1 (en) 2009-02-09 2010-08-12 Unisys Corporation Scheduling and dispatching tasks in an emulated operating system
JP5417879B2 (en) 2009-02-17 2014-02-19 Fujitsu Semiconductor Ltd. Cache device
JP2010226275A (en) 2009-03-23 2010-10-07 Nec Corp Communication device and communication method
US8505013B2 (en) 2010-03-12 2013-08-06 Lsi Corporation Reducing data read latency in a network communications processor architecture
US8805788B2 (en) 2009-05-04 2014-08-12 Moka5, Inc. Transactional virtual disk with differential snapshots
US8332854B2 (en) 2009-05-19 2012-12-11 Microsoft Corporation Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US8533437B2 (en) 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
GB2471067B (en) 2009-06-12 2011-11-30 Graeme Roy Smith Shared resource multi-thread array processor
US9122487B2 (en) 2009-06-23 2015-09-01 Oracle America, Inc. System and method for balancing instruction loads between multiple execution units using assignment history
US8386754B2 (en) 2009-06-24 2013-02-26 Arm Limited Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism
CN101582025B (en) 2009-06-25 2011-05-25 Zhejiang University Method for implementing a global register renaming table under a chip multiprocessor architecture
US8397049B2 (en) 2009-07-13 2013-03-12 Apple Inc. TLB prefetching
US8539486B2 (en) 2009-07-17 2013-09-17 International Business Machines Corporation Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode
JP5423217B2 (en) 2009-08-04 2014-02-19 Fujitsu Limited Arithmetic processing device, information processing device, and control method of arithmetic processing device
US8127078B2 (en) 2009-10-02 2012-02-28 International Business Machines Corporation High performance unaligned cache access
US20110082983A1 (en) 2009-10-06 2011-04-07 Alcatel-Lucent Canada, Inc. Cpu instruction and data cache corruption prevention system
US8695002B2 (en) 2009-10-20 2014-04-08 Lantiq Deutschland Gmbh Multi-threaded processors and multi-processor systems comprising shared resources
US8364933B2 (en) 2009-12-18 2013-01-29 International Business Machines Corporation Software assisted translation lookaside buffer search mechanism
JP2011150397A (en) 2010-01-19 2011-08-04 Panasonic Corp Bus arbitration device
KR101699910B1 (en) 2010-03-04 2017-01-26 Samsung Electronics Co., Ltd. Reconfigurable processor and control method thereof
US20120005462A1 (en) 2010-07-01 2012-01-05 International Business Machines Corporation Hardware Assist for Optimizing Code During Processing
US8312258B2 (en) 2010-07-22 2012-11-13 Intel Corporation Providing platform independent memory logic
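CN101916180B (en) 2010-08-11 2013-05-29 Institute of Computing Technology, Chinese Academy of Sciences Method and system for executing register-type instructions in a RISC processor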
US8751745B2 (en) 2010-08-11 2014-06-10 Advanced Micro Devices, Inc. Method for concurrent flush of L1 and L2 caches
US9201801B2 (en) 2010-09-15 2015-12-01 International Business Machines Corporation Computing device with asynchronous auxiliary execution unit
US8856460B2 (en) 2010-09-15 2014-10-07 Oracle International Corporation System and method for zero buffer copying in a middleware environment
EP2616928B1 (en) 2010-09-17 2016-11-02 Soft Machines, Inc. Single cycle multi-branch prediction including shadow cache for early far branch prediction
US20120079212A1 (en) 2010-09-23 2012-03-29 International Business Machines Corporation Architecture for sharing caches among multiple processes
EP3306466B1 (en) 2010-10-12 2020-05-13 INTEL Corporation An instruction sequence buffer to store branches having reliably predictable instruction sequences
CN107092467B (en) 2010-10-12 2021-10-29 Intel Corporation Instruction sequence buffer for enhancing branch prediction efficiency
US8370553B2 (en) 2010-10-18 2013-02-05 International Business Machines Corporation Formal verification of random priority-based arbiters using property strengthening and underapproximations
US9047178B2 (en) 2010-12-13 2015-06-02 SanDisk Technologies, Inc. Auto-commit memory synchronization
US8677355B2 (en) 2010-12-17 2014-03-18 Microsoft Corporation Virtual machine branching and parallel execution
WO2012103245A2 (en) 2011-01-27 2012-08-02 Soft Machines Inc. Guest instruction block with near branching and far branching sequence construction to native instruction block
KR101638225B1 (en) 2011-03-25 2016-07-08 Soft Machines, Inc. Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
EP2689326B1 (en) 2011-03-25 2022-11-16 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN103562866B (en) 2011-03-25 2018-03-30 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US20120254592A1 (en) 2011-04-01 2012-10-04 Jesus Corbal San Adrian Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location
US9740494B2 (en) 2011-04-29 2017-08-22 Arizona Board Of Regents For And On Behalf Of Arizona State University Low complexity out-of-order issue logic using static circuits
US8843690B2 (en) 2011-07-11 2014-09-23 Avago Technologies General Ip (Singapore) Pte. Ltd. Memory conflicts learning capability
US8930432B2 (en) 2011-08-04 2015-01-06 International Business Machines Corporation Floating point execution unit with fixed point functionality
US20130046934A1 (en) 2011-08-15 2013-02-21 Robert Nychka System caching using heterogenous memories
US8839025B2 (en) 2011-09-30 2014-09-16 Oracle International Corporation Systems and methods for retiring and unretiring cache lines
WO2013077876A1 (en) 2011-11-22 2013-05-30 Soft Machines, Inc. A microprocessor accelerated code optimizer
KR101703401B1 (en) 2011-11-22 2017-02-06 Soft Machines, Inc. Accelerated code optimizer for a multiengine microprocessor
KR101648278B1 (en) 2011-11-22 2016-08-12 Soft Machines, Inc. Microprocessor accelerated code optimizer and dependency reordering method
US20130138888A1 (en) 2011-11-30 2013-05-30 Jama I. Barreh Storing a target address of a control transfer instruction in an instruction field
US8930674B2 (en) 2012-03-07 2015-01-06 Soft Machines, Inc. Systems and methods for accessing a unified translation lookaside buffer
KR20130119285A (en) 2012-04-23 2013-10-31 Electronics and Telecommunications Research Institute Apparatus and method for resource allocation in a cluster computing environment
US9684601B2 (en) 2012-05-10 2017-06-20 Arm Limited Data processing apparatus having cache and translation lookaside buffer
US9996348B2 (en) 2012-06-14 2018-06-12 Apple Inc. Zero cycle load
US9940247B2 (en) 2012-06-26 2018-04-10 Advanced Micro Devices, Inc. Concurrent access to cache dirty bits
US9710399B2 (en) 2012-07-30 2017-07-18 Intel Corporation Systems and methods for flushing a cache with modified data
US9430410B2 (en) 2012-07-30 2016-08-30 Soft Machines, Inc. Systems and methods for supporting a plurality of load accesses of a cache in a single cycle
US9916253B2 (en) 2012-07-30 2018-03-13 Intel Corporation Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput
US9229873B2 (en) 2012-07-30 2016-01-05 Soft Machines, Inc. Systems and methods for supporting a plurality of load and store accesses of a cache
US9740612B2 (en) 2012-07-30 2017-08-22 Intel Corporation Systems and methods for maintaining the coherency of a store coalescing cache and a load cache
US9678882B2 (en) 2012-10-11 2017-06-13 Intel Corporation Systems and methods for non-blocking implementation of cache flush instructions
US10037228B2 (en) 2012-10-25 2018-07-31 Nvidia Corporation Efficient memory virtualization in multi-threaded processing units
US9195506B2 (en) 2012-12-21 2015-11-24 International Business Machines Corporation Processor provisioning by a middleware processing system for a plurality of logical processor partitions
GB2514956B (en) 2013-01-21 2015-04-01 Imagination Tech Ltd Allocating resources to threads based on speculation metric
US9569216B2 (en) 2013-03-15 2017-02-14 Soft Machines, Inc. Method for populating a source view data structure by using register template snapshots
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
WO2014150971A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for dependency broadcasting through a block organized source view data structure
US9112767B2 (en) 2013-03-15 2015-08-18 Cavium, Inc. Method and an accumulator scoreboard for out-of-order rule response handling
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
WO2014150806A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for populating register view data structure by using register template snapshots
WO2014150941A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates
CN105210040B (en) 2013-03-15 2019-04-02 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
CN105247484B (en) 2013-03-15 2021-02-23 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US9632825B2 (en) 2013-03-15 2017-04-25 Intel Corporation Method and apparatus for efficient scheduling for asymmetrical execution units
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
WO2014150991A1 (en) 2013-03-15 2014-09-25 Soft Machines, Inc. A method for implementing a reduced size register view data structure in a microprocessor
US9208066B1 (en) 2015-03-04 2015-12-08 Centipede Semi Ltd. Run-time code parallelization with approximate monitoring of instruction sequences

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4835680A (en) * 1985-03-15 1989-05-30 Xerox Corporation Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs
US4943909A (en) * 1987-07-08 1990-07-24 At&T Bell Laboratories Computational origami
WO2004114128A2 (en) * 2003-06-25 2004-12-29 Koninklijke Philips Electronics N.V. Instruction controlled data processing device

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10289605B2 (en) 2006-04-12 2019-05-14 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US9886416B2 (en) 2006-04-12 2018-02-06 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US11163720B2 (en) 2006-04-12 2021-11-02 Intel Corporation Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
US10585670B2 (en) 2006-11-14 2020-03-10 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US9965281B2 (en) 2006-11-14 2018-05-08 Intel Corporation Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer
US10228949B2 (en) 2010-09-17 2019-03-12 Intel Corporation Single cycle multi-branch prediction including shadow cache for early far branch prediction
US9766893B2 (en) 2011-03-25 2017-09-19 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US10564975B2 (en) 2011-03-25 2020-02-18 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9921845B2 (en) 2011-03-25 2018-03-20 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9990200B2 (en) 2011-03-25 2018-06-05 Intel Corporation Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines
US9842005B2 (en) 2011-03-25 2017-12-12 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US9934072B2 (en) 2011-03-25 2018-04-03 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US11204769B2 (en) 2011-03-25 2021-12-21 Intel Corporation Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines
CN103562866B (en) * 2011-03-25 2018-03-30 Intel Corporation Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines
US10372454B2 (en) 2011-05-20 2019-08-06 Intel Corporation Allocation of a segmented interconnect to support the execution of instruction sequences by a plurality of engines
US10031784B2 (en) 2011-05-20 2018-07-24 Intel Corporation Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines
US9940134B2 (en) 2011-05-20 2018-04-10 Intel Corporation Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US10521239B2 (en) 2011-11-22 2019-12-31 Intel Corporation Microprocessor accelerated code optimizer
US10191746B2 (en) 2011-11-22 2019-01-29 Intel Corporation Accelerated code optimizer for a multiengine microprocessor
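CN104583975A (en) * 2012-06-15 2015-04-29 Soft Machines, Inc. Disambiguation-free out of order load store queue
CN104583975B (en) * 2012-06-15 2017-07-14 Intel Corporation Disambiguation-free out of order load store queue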
US10592300B2 (en) 2012-06-15 2020-03-17 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9928121B2 (en) 2012-06-15 2018-03-27 Intel Corporation Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization
US9965277B2 (en) 2012-06-15 2018-05-08 Intel Corporation Virtual load store queue having a dynamic dispatch window with a unified structure
US9990198B2 (en) 2012-06-15 2018-06-05 Intel Corporation Instruction definition to implement load store reordering and optimization
US9904552B2 (en) 2012-06-15 2018-02-27 Intel Corporation Virtual load store queue having a dynamic dispatch window with a distributed structure
US10019263B2 (en) 2012-06-15 2018-07-10 Intel Corporation Reordered speculative instruction sequences with a disambiguation-free out of order load store queue
US10048964B2 (en) 2012-06-15 2018-08-14 Intel Corporation Disambiguation-free out of order load store queue
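CN105074657B (zh) * 2013-01-29 2018-11-30 Advanced Micro Devices, Inc. Hardware and software solutions to divergent branches in a parallel pipeline
CN105074657A (zh) * 2013-01-29 2015-11-18 Advanced Micro Devices, Inc. Hardware and software solutions to divergent branches in a parallel pipeline
CN104008216B (zh) * 2013-02-22 2017-04-26 M31 Technology Corporation Method for using a memory compiler to generate optimized memory instances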
US10169045B2 (en) 2013-03-15 2019-01-01 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US10503514B2 (en) 2013-03-15 2019-12-10 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10146576B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US10146548B2 (en) 2013-03-15 2018-12-04 Intel Corporation Method for populating a source view data structure by using register template snapshots
US9823930B2 (en) 2013-03-15 2017-11-21 Intel Corporation Method for emulating a guest centralized flag architecture by using a native distributed flag architecture
US11656875B2 (en) 2013-03-15 2023-05-23 Intel Corporation Method and system for instruction block to execution unit grouping
US10198266B2 (en) 2013-03-15 2019-02-05 Intel Corporation Method for populating register view data structure by using register template snapshots
US9811342B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US10248570B2 (en) 2013-03-15 2019-04-02 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10255076B2 (en) 2013-03-15 2019-04-09 Intel Corporation Method for performing dual dispatch of blocks and half blocks
US10275255B2 (en) 2013-03-15 2019-04-30 Intel Corporation Method for dependency broadcasting through a source organized source view data structure
US9811377B2 (en) 2013-03-15 2017-11-07 Intel Corporation Method for executing multithreaded instructions grouped into blocks
US9858080B2 (en) 2013-03-15 2018-01-02 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US10140138B2 (en) 2013-03-15 2018-11-27 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
US9934042B2 (en) 2013-03-15 2018-04-03 Intel Corporation Method for dependency broadcasting through a block organized source view data structure
US9904625B2 (en) 2013-03-15 2018-02-27 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US9886279B2 (en) 2013-03-15 2018-02-06 Intel Corporation Method for populating and instruction view data structure by using register template snapshots
US9891924B2 (en) 2013-03-15 2018-02-13 Intel Corporation Method for implementing a reduced size register view data structure in a microprocessor
US9898412B2 (en) 2013-03-15 2018-02-20 Intel Corporation Methods, systems and apparatus for predicting the way of a set associative cache
US10740126B2 (en) 2013-03-15 2020-08-11 Intel Corporation Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation
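CN107305538B (zh) * 2016-04-22 2020-07-31 Cambricon Technologies Corporation Limited Submatrix operation apparatus and method
CN107305538A (zh) * 2016-04-22 2017-10-31 Beijing Zhongke Cambricon Technology Co., Ltd. Submatrix operation apparatus and method
CN108121688B (zh) * 2017-12-15 2020-06-23 Cambricon Technologies Corporation Limited Computing method and related product
CN108121688A (zh) * 2017-12-15 2018-06-05 Beijing Zhongke Cambricon Technology Co., Ltd. Computing method and related product
CN112596869A (zh) * 2020-12-08 2021-04-02 Hygon Information Technology Co., Ltd. Method and apparatus for constructing a latency model, electronic device, and storage medium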

Also Published As

Publication number Publication date
EP2477109A1 (en) 2012-07-18
EP2011018A2 (en) 2009-01-07
US20190227982A1 (en) 2019-07-25
US9053292B2 (en) 2015-06-09
US20180137081A1 (en) 2018-05-17
US20150269118A1 (en) 2015-09-24
US20090113170A1 (en) 2009-04-30
CN101449256B (zh) 2013-12-25
CN103646009B (zh) 2016-08-17
CN103646009A (zh) 2014-03-19
WO2007143278A2 (en) 2007-12-13
US8327115B2 (en) 2012-12-04
EP2011018B1 (en) 2016-07-13
EP2011018A4 (en) 2009-12-02
US20130091340A1 (en) 2013-04-11
US11163720B2 (en) 2021-11-02
WO2007143278A3 (en) 2008-10-30
US9886416B2 (en) 2018-02-06
EP2477109B1 (en) 2016-07-13
US10289605B2 (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN101449256B (zh) Apparatus and method for processing an instruction matrix specifying parallel and dependent operations
Weaver et al. Post-placement C-slow retiming for the Xilinx Virtex FPGA
CN100538628C (zh) System and method for processing thread groups in a SIMD architecture
US6609189B1 (en) Cycle segmented prefix circuits
US20230325195A1 (en) Replicating logic blocks to enable increased throughput with sequential enabling of input register blocks
Minervini et al. Vitruvius+: an area-efficient RISC-V decoupled vector coprocessor for high performance computing applications
Gunadi et al. CRIB: Consolidated rename, issue, and bypass
Tai et al. Accelerating matrix operations with improved deeply pipelined vector reduction
Uhrig et al. A two-dimensional superscalar processor architecture
Tai et al. Scalable matrix decompositions with multiple cores on FPGAs
Flachs et al. Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI
Mesa-Martínez et al. Seed: scalable, efficient enforcement of dependences
Vilim In-Database Machine Learning on Reconfigurable Dataflow Accelerators
Snelling The design and analysis of a Stateless Data-Flow Architecture
Tai Accelerating scientific applications on reconfigurable computing systems
Sano Microparallel processors
Schaffer et al. Using hardware multithreading to overcome broadcast/reduction latency in an associative SIMD processor
Chen et al. A Novel Highly Scalable Architecture with Partially Distributed Pipeline and Hardware/Software Instruction Encoding
Sethumadhavan Scalable hardware memory disambiguation
Brown Reducing critical path execution time by breaking critical loops
Wu et al. Fast custom instruction generation under area constraint
González et al. Execute
Yang et al. Reconfiguration support for vector operations
Zang et al. Issue mechanism for embedded simultaneous multithreading processor
Ortancil Analysis of design issues in superscalar machines

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20170317

Address after: California, USA

Patentee after: Intel Corporation

Address before: California, USA

Patentee before: Soft Machines Inc.
