CN101449256A - Apparatus and method for processing an instruction matrix specifying parallel and dependent operations - Google Patents
- Publication number
- CN101449256A CNA2007800173174A CN200780017317A
- Authority
- CN
- China
- Prior art keywords
- instruction
- matrix
- unit
- section
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F9/3858—Result writeback, i.e. updating the architectural state or memory
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F7/5338—Reduction of the number of iteration steps or stages, e.g. using the Booth algorithm, log-sum, odd-even by using multiple bit scanning, i.e. by decoding groups of successive multiplier bits in order to select an appropriate precalculated multiple of the multiplicand as a partial product overlapped, i.e. with successive bitgroups sharing one or more bits being recoded into signed digit representation, e.g. using the Modified Booth Algorithm each bitgroup having two new bits, e.g. 2nd order MBA
- G06F9/3001—Arithmetic instructions
- G06F9/30109—Register structure having multiple operands in a single register
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30123—Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
- G06F9/30141—Implementation provisions of register files, e.g. ports
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
- G06F9/3824—Operand accessing
- G06F9/3828—Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage with global bypass, e.g. between pipelines, between clusters
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
- G06F7/5318—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel with column wise addition of partial products, e.g. using Wallace tree, Dadda counters
- G06F7/5443—Sum of products
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
- Multi Processors (AREA)
Abstract
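A matrix of execution blocks forms multiple rows and columns. The rows support parallel execution of instructions, while the columns support execution of dependent instructions. The execution block matrix processes a single instruction block specifying parallel and dependent instructions.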
Description
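Technical Field
[0001] The present invention relates generally to computer architecture. More particularly, it relates to a computer architecture for processing matrix instructions that specify parallel and dependent operations.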
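Background
[0002] Improving the performance of a computer architecture is a difficult task. Improvements have been sought through frequency scaling, Single Instruction Multiple Data (SIMD), Very Long Instruction Word (VLIW), multi-threading, and multi-processor techniques. These approaches mainly target the throughput of program execution. Many of them require software to expose parallelism explicitly. In contrast, frequency scaling improves both throughput and latency without requiring software to mark parallelism explicitly. Recently, frequency scaling has hit a power wall, so further improvement through frequency scaling is difficult. Consequently, it is hard to increase throughput unless massive explicit software parallelization is expressed.
[0003] In view of the foregoing, it would be desirable to improve computer architecture performance without relying on frequency scaling or massive explicit software parallelization.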
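Summary of the Invention
[0004] A matrix of execution blocks forms multiple rows and columns. The rows support parallel execution of instructions, while the columns support execution of dependent instructions. The execution block matrix processes a single instruction matrix specifying parallel and dependent instructions.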
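Brief Description of the Drawings
[0005] The invention is more fully appreciated in connection with the following detailed description taken in conjunction with the accompanying drawings, in which:
[0006] Figure 1 illustrates an architecture that supports execution of parallel and dependent instructions in accordance with an embodiment of the invention;
[0007] Figure 2A illustrates the mapping of serial instructions to produce parallel and dependent operations in the execution matrix of the invention;
[0008] Figure 2B illustrates the mapping of operations onto the execution matrix of the invention;
[0009] Figure 3 illustrates a row of execution blocks that may be used in accordance with an embodiment of the invention;
[0010] Figure 4A illustrates execution blocks and supporting register files and buffers that may be used in accordance with an embodiment of the invention;
[0011] Figure 4B illustrates a register file configured in accordance with an embodiment of the invention;
[0012] Figure 5 illustrates various data structures that may be used with the buffers of the invention;
[0013] Figure 6 illustrates a pipeline that may be used in accordance with an embodiment of the invention;
[0014] Figure 7 illustrates matrix instruction processing in accordance with an embodiment of the invention;
[0015] Figure 8 illustrates the mapping of operations onto an execution block of the invention;
[0016] Figure 9 illustrates a matrix buffer used in accordance with an embodiment of the invention;
[0017] Figure 10A illustrates a universal unit that implements operations of the invention;
[0018] Figure 10B illustrates a one-bit cell of the carry-lookahead adder used in the 4-bit adder of Figure 10A;
[0019] Figure 11 illustrates a Time-Lag Sliced Architecture (TLSA) unit configured in accordance with an embodiment of the invention;
[0020] Figure 12 illustrates multiple TLSA units in a staggered configuration in accordance with an embodiment of the invention;
[0021] Figure 13 illustrates a prior-art multiplication technique;
[0022] Figure 14 illustrates a time-lag multiplication technique used in accordance with an embodiment of the invention;
[0023] Figure 15 illustrates prior-art Booth-encoded multiplication;
[0024] Figure 16 illustrates logic that implements the Booth-encoded multiplication of Figure 15;
[0025] Figure 17 illustrates Booth-encoded multiplication in accordance with an embodiment of the invention;
[0026] Figure 18 illustrates logic that implements the Booth-encoded multiplication of Figure 17;
[0027] Figure 19 illustrates a memory/register file data block configured in accordance with an embodiment of the invention;
[0028] Figure 20 illustrates a time-sliced memory configured in accordance with an embodiment of the invention;
[0029] Figure 21 illustrates a TLSA architecture with a permute/shift unit configured in accordance with an embodiment of the invention;
[0030] Figure 22 illustrates a floating-point multiply-accumulate unit configured in accordance with an embodiment of the invention.
Like reference numerals refer to corresponding parts throughout the drawings.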
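Detailed Description
[0031] It is highly beneficial to allow software that is not explicitly parallel to execute on single-processor hardware capable of processing a large number of instructions in a short-latency pipeline. The prevailing view of current solutions is that a superscalar processor can practically process 4-5 instructions per cycle at peak, which is similar to what a good VLIW compiler achieves at peak. It is well recognized that a superscalar processor scheduling 10 instructions in a single cycle is not practically achievable. Similarly, if a compiler tries to parallelize a general program for a VLIW architecture with 10 slots, the result is a large number of unfilled slots.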
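[0032] The architecture and processor of the invention fetch, schedule, decode, and execute many instructions in a highly parallel manner within a short pipeline. This greatly improves execution throughput while maintaining a well-optimized latency for single-processor pipeline operation, with efficient timing, power, area, and routing.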
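[0033] In this new architecture, called the Ultra Large Instruction Matrix (ULIM), fixed-size instruction matrix templates are scheduled to execute in hardware as matrices of parallel and dependent instruction groups. Unlike VLIW, which uses a very long instruction word template to schedule only parallel instructions, the ULIM template encapsulates groups of both parallel and dependent instructions. It can be viewed as a two-dimensional template matrix in which parallel instructions are allocated horizontally and dependent instructions are allocated only vertically. The template is sent to the execution hardware as one instruction matrix, and each execution block processes one instruction matrix.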
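[0034] Figure 1 illustrates such a template. If an instruction is placed in slot 101, another instruction that can execute in parallel with it may be placed in any of parallel slots 1 through N in the same row. However, an instruction that depends on the one in slot 101 cannot be placed in the same row; it must be placed in a subsequent row 102 (for example, parallel slot 0 of serial slot 1). The slot marked 102 receives as input the result produced by 101, an external input operand, or a combination of the two. The execution model of this template is that the instructions in any row execute before the instructions in the next row.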
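[0035] ULIM instruction groups can be scheduled statically by a ULIM compiler or dynamically by ULIM hardware components. The significance of this architecture becomes clear when it is compared with a conventional architecture. In a conventional architecture, a single row of N1 parallel instructions can be placed in a template such as a VLIW or SIMD template. This means N1 instructions can execute each cycle (all of them must be guaranteed to be parallel, which is a severe constraint for large N). In contrast, by pipelining execution at each row boundary, the ULIM architecture can execute N1*N2 instructions per cycle. Despite executing N1*N2 instructions per cycle, the ULIM architecture has only the same constraint: guaranteeing that N1 instructions execute in parallel.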
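The N1*N2 figure can be checked with a small occupancy count. The following is only a sketch under stated assumptions: N2 is taken to be the number of rows in the template, one fully populated matrix is issued per cycle, and each row takes one cycle. The function name and parameters are illustrative and do not come from the patent.

```python
def instructions_in_flight(cycle, n1, n2, num_matrices):
    """Count instructions executing in a given cycle of a row-pipelined matrix.

    Assumptions (not from the source): matrix m enters row 0 at cycle m,
    moves down one row per cycle, and every one of its n1 slots is filled.
    """
    busy_rows = 0
    for row in range(n2):
        matrix = cycle - row  # matrix that occupies this row in this cycle
        if 0 <= matrix < num_matrices:
            busy_rows += 1
    return busy_rows * n1


# With N1 = 3 parallel slots and N2 = 2 rows, the steady state is 3 * 2 = 6.
steady_state = instructions_in_flight(cycle=1, n1=3, n2=2, num_matrices=4)
```

Only the first and last cycles fall short of N1*N2, because the pipeline is filling or draining.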
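[0036] The instruction matrix template can be built by a programmer or by a compiler/hardware formatter, neither of which needs to be restricted to finding only parallel instructions to schedule each cycle. Available parallel instructions can be extracted and placed horizontally in rows. The parallel instructions can be augmented by placing dependent instructions in subsequent rows. A matrix can execute in a single cycle or over multiple cycles. A matrix can localize the storage of operands near the execution hardware, thereby optimizing interconnect complexity, area, speed, and power.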
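[0037] The invention is more fully appreciated with reference to the example of Figure 2A. Figure 2A shows an example of an instruction sequence 200 presented by a single serial program flow. The figure also shows how the instructions are grouped to form a single instruction matrix with both serial and parallel slots, as used in the matrix template. The matrix template is applied to execution block 201. Execution block 201 includes a multiply unit 202, a floating-point add unit 204, a load unit 206, a multiply-accumulate (MAC) unit 208, a logic unit (e.g., an ALU) 210, and another logic unit 212.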
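[0038] Thus, an instruction matrix refers to a template of a group of parallel and serial instructions. An execution block refers to a group of execution units that executes an instruction matrix. Execution units are the individual computing units within an execution block (e.g., both complex units and simple units).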
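[0039] Straight arrows such as 214 indicate dependencies between instructions. The mapping can be performed by a compiler, by front-end hardware, or by a run-time environment. Figure 2A also shows a formatted operation map, which gives the relative physical positions of the instructions in the matrix as they will execute at the corresponding hardware locations, together with indicators of parallel and serial dependencies.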
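[0040] As shown in Figure 2A, a single-stream program is reformulated into an instruction matrix that statically determines which instructions execute serially and which execute in parallel. For example, in the serial single program flow of Figure 2A, the last instruction uses R3 as a source, while the fourth instruction writes R3 as its result. This is mapped in the matrix template by placing the last instruction in the row immediately following the row that contains the fourth instruction. Line 214 represents this dependency.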
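The row-placement rule can be sketched as a small list scheduler. This is a minimal illustration, not the patent's formatter: the instruction names and all registers other than R3 are hypothetical, only read-after-write dependencies are tracked, and each register is assumed to be written at most once in the fragment.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: Tuple[str, ...]


def map_to_matrix(program, width):
    """Place each instruction in the first row after all of its producers.

    Assumption: every register is written at most once, so only
    read-after-write dependencies need to be respected.
    """
    rows = []             # rows[r] holds the instructions of serial slot r
    last_writer_row = {}  # register name -> row of the instruction writing it
    for ins in program:
        row = 0
        for src in ins.srcs:
            if src in last_writer_row:
                row = max(row, last_writer_row[src] + 1)
        # A row offers only `width` parallel slots; spill into later rows.
        while row < len(rows) and len(rows[row]) >= width:
            row += 1
        while len(rows) <= row:
            rows.append([])
        rows[row].append(ins)
        last_writer_row[ins.dest] = row
    return rows


# Hypothetical stream: the fourth instruction writes R3, the last one reads it.
program = [
    Instr("mul", "R1", ("R8", "R9")),
    Instr("fadd", "R2", ("R10", "R11")),
    Instr("load", "R4", ("R12",)),
    Instr("mac", "R3", ("R1", "R2")),
    Instr("alu", "R5", ("R3", "R4")),
]
layout = [[ins.name for ins in row] for row in map_to_matrix(program, width=3)]
```

In this sketch the three independent instructions share the first row, and the reader of R3 lands in the row directly below its producer.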
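[0041] Figure 2B illustrates the encoded instructions and their sources and destinations for the template of Figure 2A. The instruction matrix template includes the opcodes of the instructions and specifies the operand and result registers. Sources and destinations are kept separate from the opcodes. This simplifies the decode stage, the dependency resolution stage, and the register/memory read stage.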
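[0042] As shown in Figure 3, several instruction matrices can be issued simultaneously. Figure 3 shows the instruction matrix operation map of 201, denoted here as 301, together with alternately configured instruction matrix operation maps 300, 302, and 304. Simultaneous issue can be achieved with one of the following models:
1-MIMD or SIMD: the software/compiler merges multiple matrices into one super matrix.
2-Threaded model: each matrix belongs to a separate software or hardware thread.
3-Dynamic execution: matrices from a single stream can be issued dynamically by the hardware if no dependencies exist between them.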
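[0043] The instruction matrix templates represented by the operation maps of Figure 3 execute on hardware that maps the instruction slots one-to-one onto execution units (inside the execution blocks). The granularity of the matrix allows baseline matrices to be formed and merged into a super matrix. This is shown in Figure 4A, where 4 baseline instruction matrices execute in parallel on four execution blocks 301, 300, 302, and 304. Each execution block consists of two rows of complex units. The first row has 3 complex units (e.g., 402, 404, and 406) and the other row has 3 complex units (e.g., 408, 410, and 412). Each complex unit is operable to perform a complex operation such as a multiply, a floating-point add, or a multiply-accumulate. Thus, for example, complex unit 402 may correspond to 202 of Figure 2, complex unit 404 to 204 of Figure 2, and so on.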
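[0044] A complex unit is also operable to perform up to 4 simple operations, such as ALU operations. Such a complex unit is therefore shown with multiple operands and multiple outputs. The units can be arranged to compute parallel operations or to execute dependent operations sequentially. Each baseline instruction matrix can run independently of the other matrices in threaded mode, or several baseline instruction matrices can be combined into a group that runs in the same cycle as a super instruction matrix, forming a MIMD architecture. The whole matrix can execute in one cycle or can be pipelined over multiple cycles.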
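[0045] As an example of operating the execution engine of Figure 4A to execute the ULIM instruction template of Figure 3, instruction MAC 208 executes in complex unit 408, while the pair of simple logic and ALU operations 210 executes in a pair of units of 408. Figure 4A illustrates the similarities to and differences from a VLIW architecture. If the top 3 slots (402, 404, 406) in Figure 4A execute 3 parallel instructions, those 3 slots in the ULIM template resemble an equivalent VLIW instruction. Using the next row of simple units (408, 410, 412) to execute 3 or more parallel instructions means that another equivalent VLIW instruction, dependent on the previous one, is being executed. The ULIM architecture can thus be viewed as executing, in space and with a fixed latency, a fixed number of mutually dependent VLIW instructions. In addition, the architecture allows a single complex instruction or multiple simple instructions to execute in one slot, which is not permitted in a VLIW template.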
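[0046] The 4 ULIM matrices represented by operation maps 301, 300, 302, and 304 in Figure 3 can execute on the hardware of Figure 4A. This can be done in one of 3 execution modes. The matrices can be grouped by the compiler/programmer to form a MIMD super instruction matrix. Alternatively, each matrix can execute independently in threaded mode, in which separate threads execute simultaneously on each of the four hardware sections (301, 300, 302, and 304). The last possible mode dynamically executes 4 different instruction matrices from a single thread, using hardware dependency checks to ensure that no dependencies exist between the different matrices executing simultaneously on the four hardware sections of Figure 4A.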
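[0047] The register file 420 in Figure 4A can be configured according to the execution mode. In one mode, the register file is viewed either as a MIMD sectioned register file providing a 4-section MIMD width, or as 4 independent register files, each serving a separate thread. The register file can also support a dynamic execution mode in which the 4 sections form one unified register file, where data written to any register in a given section is accessible to all units in the other sections. Switching between these modes can be seamless, because execution can alternate between independent-thread baseline instruction matrices and MIMD super instruction matrix threads.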
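[0048] Each single instruction matrix is a mix of parallel and dependent instructions. Moreover, each individual instruction can be a scalar or a SIMD instruction. At the finest granularity, an instruction can resemble a variable-data-width SIMD operation on multiple bytes/words or on a single scalar entity.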
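[0049] In multi-threaded execution mode, each register file and the execution units that execute its thread are completely independent of the other register files and their threads. This is analogous to each thread having its own register state. However, dependencies between these threads can be specified. Each matrix belonging to a thread executes in the execution units of that thread's register file. If only one thread or a non-threaded single program executes on the hardware of Figure 4A, the following method allows parallel matrices belonging to the single thread/program to access results written to registers in other sections. This is done by allowing any matrix to write its results to any of the 4 register file sections, thereby producing copies of those registers in the other sections. Physically, this is done by extending the write ports of each section into the remaining sections. This is not scalable, however, because an efficient register file cannot be built in which each memory cell has 4 times the number of write ports needed by one section alone. A mechanism is therefore provided in which the register file is built so that it is not affected by this single-thread register broadcast extension. The mechanism is shown in Figure 4B.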
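[0050] Figure 4B shows one section of the register file, consisting of 24 registers, in which a matrix belonging to a single thread is storing its execution results in that section's 24-entry register file. At the same time, 3 other parallel matrices are executing on the other 3 sections of Figure 4A, and their execution results are broadcast to this register file section.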
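[0051] The write ports are configured to enable single-thread register broadcast by restricting the results of each matrix to non-overlapping groups of 6 registers. This is implemented with sectioned write ports, where each write port writes to a separate group of registers 430. The write ports 440 coming from the other sections write to different non-overlapping register groups.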
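The non-overlapping write groups can be modeled in a few lines. This is a behavioral sketch only: the source gives the sizes (24 registers, 4 sections, groups of 6) but not which registers form each group, so the contiguous grouping below and all names are assumptions.

```python
NUM_REGS = 24
NUM_SECTIONS = 4
GROUP_SIZE = NUM_REGS // NUM_SECTIONS  # 6 registers per write group


def owning_section(reg):
    # Assumption: groups are contiguous (0-5, 6-11, 12-17, 18-23).
    return reg // GROUP_SIZE


class BroadcastRegisterFile:
    """Single-thread mode: every section keeps a copy of all 24 registers."""

    def __init__(self):
        self.copies = [[0] * NUM_REGS for _ in range(NUM_SECTIONS)]

    def write(self, section, reg, value):
        # Each storage cell is reachable from exactly one section's write
        # port, so a section may only write registers in its own group.
        if owning_section(reg) != section:
            raise ValueError(f"section {section} cannot write R{reg}")
        for copy in self.copies:  # broadcast to every section's copy
            copy[reg] = value

    def read(self, section, reg):
        return self.copies[section][reg]


rf = BroadcastRegisterFile()
rf.write(section=1, reg=7, value=42)
```

Because the groups do not overlap, no cell ever needs more than one section's worth of write ports, which is the point of the partitioning.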
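[0052] In threaded mode or MIMD mode, all the write ports going to the non-overlapping groups are used by the results of the local section, so that the full 24-entry register file is used and written, and no broadcast takes place, because the other sections run independent code using independent registers (meaning the local section needs all of its registers). With a single thread, on the other hand, all sections cooperate on that one thread. The total number of registers in this case is only 24, so the registers across the remaining sections (24 entries * 3 sections) can be used to hold copies of one another. The register groups can be assigned by a compiler that analyzes whether matrices can be parallelized and assigns non-overlapping register groups to those matrices that have the opportunity to execute in parallel.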
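[0053] Although results are written from all 4 sections, each memory cell in the register file has only the ports needed to support one section. A conventional register file would have to support all 4 sections, a 4-fold increase in the example shown.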
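[0054] Data parallelism in ULIM is implemented in these architectures on top of the ULIM baseline format. This is done by allowing each instruction in the ULIM template to be a SIMD/MIMD instruction. In the preceding figures, each parallel slot can support an internally replicated SIMD structure, while MIMD is supported by the different parallel/serial slots.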
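[0055] Depending on the intended nature of the access, the memory/register or matrix buffer accessed by an instruction can be viewed differently. For example, a data matrix can be viewed as MIMD of wide data elements, SIMD of small data elements, or MIMD of mixed-data-width SIMD instructions.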
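[0056] Figure 5 shows 2 views of the memory/register matrix buffer layout of the data. The right view shows orthogonal data elements in each row and each column. This supports different MIMD/SIMD data combinations. The left view shows different elements at each position of any row, but the columns represent the remaining bits of a larger data-size element. For example, the right view can represent 4 MIMD instructions, each operating on 4 different SIMD bytes, each byte being a parallel data element. The left view represents 4 MIMD instructions, each operating on a 32-bit element laid out vertically (the actual physical layout differs from the logical representation shown). Furthermore, if the view is a MIMD view, all of them belong to one single MIMD register of 4 sections; if the view is not a MIMD view, those registers are 4 independent registers laid out vertically.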
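[0057] The significance of this memory and register file view and its corresponding execution modes is that it enables the execution units to morph to execute wide MIMD/SIMD instructions (merging all register sections into a 4-way MIMD/SIMD), while at the same time the 4 register file sections and their attached execution units can operate as 4 independent units on 4 different scalar registers, allowing single-threaded and multi-threaded execution within the execution units at the lowest level of granularity.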
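[0058] The ULIM architecture has a fixed-size instruction template similar to a VLIW or MIMD template. Unlike a VLIW or MIMD template, the ULIM template allows both parallel and dependent instructions to be specified. It follows the same von Neumann instruction architecture in which instructions write to registers and dependencies within a matrix are communicated through register names. Another noteworthy aspect of the ULIM architecture is that each instruction has a fixed, predetermined position in the matrix and executes with fixed timing relative to the other instructions in the matrix. The width of a ULIM matrix is similar to the width of a corresponding VLIW template. In fact, a serial stream of dependent VLIW instructions can be converted into a ULIM template by placing one VLIW instruction in each horizontal row of the ULIM template. The reverse is not always possible: not every ULIM template can be reproduced with a stream of VLIW instructions, because a ULIM template can include, in the same row, a complex instruction in one slot and multiple simple instructions in a horizontally adjacent slot.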
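[0059] Using a matrix architecture built from instruction matrices has many advantages over conventional architectures that execute individual instructions. The following discussion describes mechanisms that enable and exploit the instruction matrix and execution block architecture of the invention to build and implement a massively parallel single-processor pipeline.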
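[0060] Figure 6 shows a processor pipeline that can be implemented with the ULIM architecture. The invention uses a mechanism to simplify the fetch stage 600, branch resolution, and the decode stage 608. The fetch unit 600 fetches code on the basis of instruction matrices rather than individual instructions. The program counter of the architecture is incremented by the matrix size rather than the instruction size. This means a large number of instructions is fetched each cycle. To make this possible, branches are not allowed inside a ULIM matrix, but they can exist between ULIM matrices. Branch resolution is done on two levels. Inside a ULIM matrix, branches are replaced with conditional execution, conditional moves, and prediction. Between matrices, branches are handled by path estimation and branch coloring. This allows the large number of instructions grouped into matrices to move forward through the fetch and branch resolution stages of the pipeline.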
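[0061] Executing dependent instructions alongside parallel instructions inside a single matrix relieves the compiler of the difficulty of filling every slot with parallel instruction code. It also simplifies data dependency checking in the scoreboard hardware 602, the dispatch unit 604, or the hardware scheduling unit. This is achieved by using matrix numbers, rather than individual registers or individual instructions, as the means of scoreboarding and maintaining dependencies between matrices. In the example of Figure 7, the scoreboard characterizes the dependency precedence of matrix 8 merely by referencing matrix numbers 2, 4, and 5, which means that matrix 8 needs to read data results from those matrices. The scoreboard dependency check does not need to reference individual register or instruction information to maintain the dependency scoreboard. The matrix number carries that information and is sufficient to keep the dependency check correct. Instructions inside a matrix that depend on other matrices can issue once those matrices have executed. When a matrix depends on a delayed result (for example, because of a cache miss), the whole matrix is prevented from being dispatched. In one embodiment of the ULIM pipeline implementation, the decode stage 608 is delayed until just before the execute stage, and it proceeds in parallel with the register read stage 606.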
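A matrix-granular scoreboard of this kind can be sketched as follows. The class and method names are illustrative, not from the patent; the sketch only shows that tracking producer matrix numbers is enough to decide when a matrix may be dispatched.

```python
class MatrixScoreboard:
    """Tracks dependencies by matrix number only (no per-register state)."""

    def __init__(self):
        self.waiting_on = {}   # matrix number -> producer matrices still pending
        self.executed = set()  # matrix numbers that have finished executing

    def add(self, matrix, producers):
        self.waiting_on[matrix] = {p for p in producers if p not in self.executed}

    def complete(self, matrix):
        self.executed.add(matrix)
        self.waiting_on.pop(matrix, None)
        for pending in self.waiting_on.values():
            pending.discard(matrix)

    def ready(self, matrix):
        # The whole matrix is held back until every producer has executed.
        return matrix in self.waiting_on and not self.waiting_on[matrix]


sb = MatrixScoreboard()
sb.add(8, producers=[2, 4, 5])  # matrix 8 reads results of matrices 2, 4, 5
sb.complete(2)
sb.complete(4)
blocked = sb.ready(8)           # matrix 5 is still outstanding
sb.complete(5)
released = sb.ready(8)
```

A stalled producer (for example, one waiting on a cache miss) simply never calls `complete`, so its consumers stay blocked as whole matrices.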
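[0062] Figure 8 shows one embodiment of encoding the instruction template of the ULIM architecture. The key is the encoding and organization of the operands (result registers and source registers). The registers that specify instruction results and source operands are given in a separate section of the ULIM template, regardless of the opcodes of the instructions and regardless of whether the instructions are complex operations or pairs of simple instructions. This matrix format lists sources and destinations in an explicit section of the matrix, so that source and destination registers can be extracted independently of decoding the instructions inside the matrix. This enables a delayed decode stage, in which the actual decoding of individual instructions is deferred until just before the execute stage and proceeds in parallel with reading the registers, so that execution can start in the next cycle. It also simplifies dependency resolution and the scoreboard implementation.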
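[0063] If a slot contains a complex instruction such as a multiply-accumulate (MAC), it requires 4 sources and writes back 2 results. If the same slot contains two simple instructions, such as a Logic and an ALU operation, each requires two sources and writes back one result, so that together they require 4 sources and produce two results. This makes the number of sources and results independent of the operation type.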
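The fixed operand shape of a slot can be illustrated with a small record type. This is a sketch under assumptions: the field names, opcode strings, and register numbers are hypothetical, and the real template is a bit-level encoding that the source does not spell out here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Slot:
    opcodes: Tuple[str, ...]            # one complex op, or two simple ops
    sources: Tuple[int, int, int, int]  # always 4 source registers
    results: Tuple[int, int]            # always 2 result registers


def registers_to_read(slots):
    """Gather source registers without looking at any opcode."""
    regs = []
    for slot in slots:
        regs.extend(slot.sources)
    return regs


# Hypothetical register numbers, chosen only for illustration.
mac_slot = Slot(("MAC",), (1, 2, 3, 4), (10, 11))
pair_slot = Slot(("LOGIC", "ALU"), (5, 6, 7, 8), (12, 13))
reads = registers_to_read([mac_slot, pair_slot])
```

Because both kinds of slot expose the same number of sources and results, register reads can start before the opcodes are decoded.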
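[0064] The processor execution hardware shown in Figure 4A includes a register read/write mechanism in which a matrix operand buffer gathers the required register sources and destinations based on the physical position at which each source will be used on the corresponding matrix hardware element. This reduces the number of read/write ports and the bandwidth requirements of the register file. Exploiting this feature of the matrix architecture simplifies the bypass mechanism: sources and/or destinations are buffered in a matrix buffer that is close to or attached to each execution unit, and that can provide shorter access time and greater source and result port bandwidth than a conventional register file, especially a large register file that has to support so many individual execution units.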
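[0065] Figure 9 illustrates the principle of the matrix buffer of the invention. The figure shows an instruction matrix operand buffer 900 connected to execution units 902. In this example, the instruction matrix operand buffer 900 buffers the sources and destinations of 3 different matrices. Of particular importance is that the write ports 904 are arranged so that each write port writes to different memory cells. This means the memory cells of the matrix operand buffer are single-ported even though 6 results are written simultaneously, which is equivalent to a conventional register file with 6 write ports. In addition, the fan-out of each write port (the number of cells it must drive) equals the number of matrices in the matrix buffer (only 3 in this example). These features bring many advantages in area, power, and access speed, which makes this buffer structure highly scalable and attractive as an alternative to a high-bandwidth, high-speed register file.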
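[0066] The following describes how registers are written to and accessed from the matrix operand buffer. Each matrix is allocated to any available matrix space in the matrix buffer just in time for, or slightly before, the matrix being ready to execute. Remote sources that the matrix needs in order to start executing (for example, sources held in the storage of other matrices outside this matrix buffer) can be temporarily staged in this matrix storage.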
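[0067] After the matrix executes, its results are written into the area allocated for this matrix storage (one of the 3 matrix locations in the matrix buffer of FIG. 9). Each result is written into the location accessed by its result write port, regardless of the result register number. This location, along with the matrix location, is communicated to the consuming matrices in a manner similar to the scoreboard mechanism shown in FIG. 7, so that each matrix that depends on this matrix annotates its register sources with the location of the matrix those sources come from and the location of each source within that matrix. The result location within the matrix can be communicated at execution time by the execution hardware, or, because the matrix template is fixed at compile time, it can be embedded by software in the matrix instruction template alongside the source register number.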
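[0068] The basic idea is to build a scalable design of source and result buffers alongside the register files, with those buffers connected to the execution units. Holding data temporarily under a matrix-location-based identification method establishes an intermediate medium between the regular register files and the execution units, allowing higher bandwidth and speed. Values in those matrix buffers can be accessed using the matrix location and the location of the source inside the matrix. For example, register 5 written by matrix 20 can be accessed by recording where that matrix is allocated in the matrix buffer and indexing that matrix's own storage by the entry number that the register 5 result physically occupies inside the matrix. However, after the matrix is de-allocated from the matrix buffer, all entries holding register values within the matrix are written back into the actual register file and are accessed by register number from that moment onward. The same location-based identification and access methods described above for the register file using a matrix data buffer can be applied to memory accesses using a memory buffer cache.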
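The location-based access described in [0067] and [0068] can be modeled in a few lines. The sketch below is illustrative only: the class `MatrixOperandBuffer`, its method names, the 3-slot, 6-entry sizing, and the sample value 1234 are assumptions chosen to mirror the "register 5 written by matrix 20" example, not details taken from the figures.

```python
class MatrixOperandBuffer:
    """Hypothetical model of a matrix buffer addressed by (slot, entry)."""

    def __init__(self, num_slots=3, entries_per_matrix=6):
        self.entries_per_matrix = entries_per_matrix
        self.slots = [None] * num_slots   # one storage area per resident matrix
        self.register_file = {}           # architectural registers, by number

    def allocate(self, matrix_number):
        # Take any free matrix location; raises ValueError if none is free.
        slot = self.slots.index(None)
        self.slots[slot] = {"matrix": matrix_number,
                            "entries": [None] * self.entries_per_matrix}
        return slot

    def write_result(self, slot, entry, register, value):
        # The entry is fixed by the write port, not by the register number.
        self.slots[slot]["entries"][entry] = (register, value)

    def read(self, slot, entry):
        # Consumers name a source by matrix location and entry position.
        return self.slots[slot]["entries"][entry][1]

    def deallocate(self, slot):
        # On de-allocation, held values go back to the real register file.
        for item in self.slots[slot]["entries"]:
            if item is not None:
                register, value = item
                self.register_file[register] = value
        self.slots[slot] = None


buf = MatrixOperandBuffer()
slot = buf.allocate(matrix_number=20)
buf.write_result(slot, entry=2, register=5, value=1234)
via_position = buf.read(slot, 2)      # access while matrix 20 is resident
buf.deallocate(slot)
via_register = buf.register_file[5]   # access by register number afterwards
```

The two reads return the same value; what changes is the name used to reach it, which is the point of the paragraph.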
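[0069] The matrix architecture can be constructed easily by a compiler if the underlying hardware is uniform and replicated. This also allows power and silicon area to be used more efficiently. Here we introduce the concept of a universal unit, which is constructed from basic building elements such as small adders, logic gates, multiplexers, and Booth cells.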
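[0070] The architecture of the universal unit allows it to perform all functions/operations inside every single universal unit. This means each unit is able to perform addition, multiplication, shift, permute, and so on. It can provide this universal functionality because it is composed of the basic building elements used to perform simple ALU operations, and those simple elements are cascaded to perform the complex operations. It can also process both floating point and integer data. The universal unit concept is facilitated by the ideas described above, but it achieves a significant advantage by simplifying scheduling and using the machine throughput to its maximum. In regular architectures, a separate unit is used to perform each operation or function, and those units share the same execution port. Thus, when one unit is in use, the others are idle. Moreover, latency varies among the units in traditional architectures, making them difficult for the scheduler to schedule. In contrast, in this universal unit the latency is unified for the whole unit, and the scheduler sees symmetric instances of the universal unit.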
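[0071] FIG. 10A shows part of this universal unit. Each universal unit can perform the functions of different execution units, such as a multiplier, adder, shifter, permuter, and so on. This embodiment shows the structure with carry save adders and/or generic adders 1000. The unit is composed of basic constructs, each with 4 rows of adders (there could be more or fewer) capable of adding 8 inputs in parallel (4 parallel/serial add operations). These adders are then structured in groups. Each adder in a row can be connected either to the adder at the same position in the row below (to perform a serial ALU operation) or to the adder to its right in the row below (to perform a multiply operation). The operation can be C*B+A or A OP B in each row, forming 4 parallel/serial ALU operations. In addition, these groups of adders may have different data sizes. This structure of adders allows a tree to perform multiple-operand addition, multiplication, multiply-accumulate, sum of differences, shifting, and rotating. Additionally, multiplexers (not shown in the figure) align/permute/shift the inputs or intermediate outputs to obtain the required operation, including shift and permute operations. Booth cells/bit multiply cells 1002 are added to the adders to enable multiplication. Other specific logic, state, memory, or LUT elements are added to provide expanded functionality.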
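[0072] The universal unit allows the permute/shift unit to be implemented using the same structure that performs the multiply operation, or the structure that performs floating point add or floating point multiply-accumulate. The advantage is that fewer routes and less area are needed to implement the permute/shift logic. A shift or rotate is performed in the multiply structure by multiplying by 2^x, where x is the shift count. A left shift, right shift, or rotate is obtained by selecting the upper half or the lower half of the product, or by performing an OR function between the upper and lower halves of the product, respectively.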
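The arithmetic behind [0072] can be checked with a short model. This is a sketch under assumptions: a 32-bit operand width, and one particular assignment of product halves to operations (lower half for left shift, upper half of a multiply by 2^(32-x) for right shift). The patent text does not spell out that assignment, so the helper names and the mapping here are illustrative.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def product_halves(value, x):
    """Multiply by 2**x and split the double-width product into halves."""
    product = (value & MASK) * (1 << x)   # fits in 2*WIDTH bits for 0 <= x <= WIDTH
    return product >> WIDTH, product & MASK   # (upper half, lower half)


def shift_left(value, x):
    # Bits pushed past the top land in the upper half and are discarded.
    _, lower = product_halves(value, x)
    return lower


def shift_right(value, x):
    # Multiplying by 2**(WIDTH - x) moves the kept bits into the upper half.
    upper, _ = product_halves(value, WIDTH - x)
    return upper


def rotate_left(value, x):
    # OR of the two halves re-inserts the bits that overflowed the top.
    upper, lower = product_halves(value, x)
    return upper | lower


sample = 0x80000001
rotated = rotate_left(sample, 4)    # 0x00000018
shifted_l = shift_left(sample, 4)   # 0x00000010
shifted_r = shift_right(sample, 4)  # 0x08000000
```

All three operations reuse the same multiply and differ only in which part of the product is selected, which is why the multiplier array can double as the shifter.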
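[0073] Each element forms a group of bits using a basic 2-input adder structure. Carry save adders can also be built using logic and MUXes (multiplexers). For example, to build 32*32 elements, the basic groups can be constructed from 8-bit or 4-bit basic adders and MUXes, and can perform logic functions using the internal logic gates of the modified carry look-ahead adder cell.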
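[0074] FIG. 10B shows the modified basic one-bit cell of the carry look-ahead adder used in the 4-bit adder of FIG. 10A, which produces either an adder output or a selected logic output. The modification is shown by connections 1010 and two 4:1 multiplexers 1012, which are not in the critical path of the adder. The original adder bit structure (marked 1014) includes the carry look-ahead and sum logic. These figures are logical representations; the actual circuits may differ.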
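[0075] The 4-bit (or 8-bit) groups facilitate execution of SIMD widths of various sizes as well as 16-bit operations. By the same principle, those 16-bit tiles can be cascaded for wider data operations, such as 64-bit operations, while still facilitating 8-bit, 16-bit, and 32-bit SIMD operations for addition, multiple-operand addition, shifting, rotating, and multiplication.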
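[0076] The basic concept behind this organization is the ability to execute a combination of parallel and serial instructions on the same structure. For example, the first row of constructs can execute a single 32-bit ALU operation, which can be followed in the second row by either a dependent or an independent ALU operation, and so on. The 4 rows together can execute up to four 32-bit serial/parallel ALU operations or a single 32-bit multiply operation. The structure can also perform partial-width SIMD operations on sub-matrices. The instructions and operands scheduled on this universal unit come as one group, in particular within the matrix data and instruction sections.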
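[0077] The instruction matrix architecture makes it possible to pipeline instructions within one cycle, because dependent instructions are pipelined to be scheduled within the same cycle or in the following cycle, depending on the required frequency. There are multiple ways to take advantage of the ULIM architecture. The system allows Ultra Large Instruction Matrix scheduling. Parallel instructions as well as dependent instructions are scheduled as one matrix (in contrast to VLIW, where only parallel instructions can be scheduled). Each instruction or dependent instruction in this matrix can be scalar or SIMD.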
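[0078] The invention may be implemented in any number of ways. For example, multiple dependent instructions may be staged within a clock cycle. In this specific implementation of the invention, multiple dependent and parallel instructions can be staged within one clock cycle. Multiple dependent instructions can start within one cycle, which reduces the optimum critical path of program execution. Multiple dependent instructions may be pipelined with state elements that separate each basic operation in a unit from the following operation. This increases the rate of pipeline execution. However, the power of the structure increases because of the clocking speed and the extra state elements. Alternatively, the number of state elements may stay constant while wave pipelining increases the rate at which data is pumped through the design.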
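[0079] The invention also includes a Time-Lag Sliced Architecture (TLSA), which reduces the latency of dependent instructions. The basic idea behind the time-lag sliced architecture is that an operation produces its result digit slice by digit slice. Each slice is produced earlier than the next slice by an inter-slice time lag. Once the first slice is produced, the next computation can begin executing and then produce its own first slice. The sliced architecture described here is an overall architecture that applies to computational units as well as to register files and memories. The architecture applies to all arithmetic, shift, integer, and floating point operations.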
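[0080] The TLSA is used to implement an entire system architecture, including memory and computation. The digit slices are not necessarily digits of equal size. The invention can operate on two operands that both arrive in a digit-sliced manner. The invention can implement a Booth-encoded multiplier, variable shifters, permute engines, and floating point adders and multipliers.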
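[0081] In designing execution units, the common methodology is to synchronize the digits, or the bits of a digit, of the output of an arithmetic/logic or shifter unit as one single output result. This result is either latched into a storage element or staged to a receiving element synchronously, at the same time, as one piece. In this architecture, however, fine-grain execution is provided with or without coarse-grain synchronous execution. The basic idea is to formulate the arithmetic or permute/shift operation as a time-lag organization of logic slices that is staged in time and/or space. The execution hardware is connected in a time-delay flow in which early slices execute faster and produce their slices of the output result sooner, while later slices need more time to execute and produce their results with a delay relative to the earlier slices. It is worth noting that slicing is done at the fine granularity of bits/digits even within a single execution unit, such as an adder or a permuter. The architecture can use this organization of digit/bit logic circuit slices to optimize the logic timing critical paths and/or the number of signal routing paths and/or the area needed to perform arithmetic, permuting, shifting, etc., for integer and/or floating point operations. The slices can have equal or different numbers of bits/digits. One particular advantage of this architecture is the ability to start executing dependent instructions before all the output result slices of the source instructions have been finalized.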
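[0082] FIG. 11 illustrates a TLSA unit 1100 in which data flows in and out of slices 1102A-1102C. Each slice output has a time-lag delay (d) with respect to the previous slice. This time-lag nature allows the unit to be pipelined such that the state elements of the different slices are not synchronized to a single time, as is typical for row/stage flip-flops.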
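[0083] The architecture of FIG. 11 includes a basic adder stage structure that computes basic computational tasks, such as multiple-operand addition or sub-block multiplication. Those computations are cascaded using staging elements 1104A-1104C that are not part of the critical path of the basic task. Depending on the basic computational task being sliced, the staging elements can be adders, multiplexers, or logic gates. The delay of these staging elements 1104 is minimal and equal to time "d", which is the delay between each slice output and the next slice output.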
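[0084] The staging element is chosen to have the smallest possible delay, because it also establishes the delay between the input operand slices. The basic computational task in FIG. 11 can use an arbitrary number of adder levels. Those adders can be of any type, e.g., binary adders or carry save adders (CSA). The advantage of the architecture of FIG. 11 is that the first slice of the result is produced earlier than the final result. The first slice is forwarded to subsequent operations.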
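The timing behavior described for FIG. 11 (each result slice ready one lag d after the previous one, with the first slice forwarded to a dependent operation immediately) can be sketched in software. The model below is an assumption-laden illustration: it uses plain addition as the basic task, 8-bit slices of a 32-bit operand, and abstract time units where d = 1; the function names are hypothetical.

```python
def sliced_add(a, b, width=32, slice_bits=8, t0=0, d=1):
    """Yield (ready_time, slice_index, digit) for a + b, low slice first."""
    mask = (1 << slice_bits) - 1
    carry = 0
    for i in range(width // slice_bits):
        total = ((a >> (i * slice_bits)) & mask) \
              + ((b >> (i * slice_bits)) & mask) + carry
        carry = total >> slice_bits          # carry is staged to the next slice
        yield t0 + i * d, i, total & mask    # slice i lags slice i-1 by d


def chained_add(a, b, c, width=32, slice_bits=8, d=1):
    """(a + b) + c, where the second add consumes slices as they appear."""
    mask = (1 << slice_bits) - 1
    carry = 0
    out = []
    for ready, i, digit in sliced_add(a, b, width, slice_bits, 0, d):
        total = digit + ((c >> (i * slice_bits)) & mask) + carry
        carry = total >> slice_bits
        out.append((ready + d, i, total & mask))   # one lag after its input slice
    return out


def assemble(slices, slice_bits=8):
    """Rebuild the full-width value from (time, index, digit) tuples."""
    return sum(digit << (i * slice_bits) for _, i, digit in slices)


result_slices = chained_add(0x01020304, 0x000000FF, 0x00010000)
result = assemble(result_slices)
ready_times = [t for t, _, _ in result_slices]
```

In this model the dependent addition begins after a single lag rather than after the full first result, so its last slice is ready at 4d instead of after two complete back-to-back additions.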
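[0085] FIG. 12 illustrates one embodiment of the invention in which multiple time-lag sliced units are staggered back to back. In this embodiment, each diagonal slice represents one instruction computation unit divided into slices. Each slice starts executing at a time lag with respect to the previous slice. Each unit feeds a subsequent unit. This embodiment shows four units back to back. The notation SU1_0 refers to slice number 0 of unit number 1. SU4_7 refers to slice number 7 of unit number 4.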
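[0086] The architecture shown in FIG. 12 allows (if desired) multiple TLSA units to process data in a single cycle (or over multiple cycles). The low slice of the first unit feeds the low slice of the second unit, which in turn feeds the third unit, the third unit then feeds the fourth unit, and so on. Note also that the first slice of the first unit, in addition to feeding the first slice of the second unit, feeds the second slice of its own unit (the first unit). FIG. 12 illustrates the following concepts: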
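1-Sub-cycle/multi-cycle execution in TLSA
TLSA allows arithmetic/shift/logic operations to be executed within one cycle. FIG. 12 illustrates this with 4 units executing in one cycle, each slice being clocked by a delayed version of the clock. By the same token, pipelining can be done at the output of each unit slice (instead of at the output of the 4 units) to increase throughput, with execution spread over multiple cycles.
2-Asynchronous/synchronous/wave TLSA topologies
The TLSA can be designed in at least 3 different topologies, or combinations of them:
A-Asynchronous: slice inputs/outputs communicate with each other within the cycle time, without synchronous state elements (e.g., flip-flops). This allows internal pipeline state elements to be removed and enables power-friendly, slower clock domains.
B-Synchronous: each slice is clocked into a state element (flip-flop/latch/domino, etc.). This allows higher clock throughput and a higher pipelining rate.
C-Wave: in this topology, data is fed into the unit piece by piece, with the next input arriving at a rate faster than the normal pipeline rate. The normal pipeline rate is determined by the maximum time of the logic paths between two state elements. The wave pipeline rate is determined by the minimum time of the logic paths between two state elements.
One interesting combination of topologies is Fine Grain Asynchronous, Coarse Grain Synchronous (FGA-CGS). In this scheme, the TLSA is implemented using time-lag slices that are connected asynchronously, so that fine-grain asynchronous execution is provided with or without coarse-grain synchronous execution. The basic idea is to formulate the arithmetic or permute/shift operation in a sliced, staged processing organization in which the different slices of the execution hardware are connected asynchronously in a time-delay flow. Early slices have fewer inputs, execute faster, and therefore produce their output results sooner. Later slices have more inputs, so they need more time to execute and produce their results with a delay relative to the earlier slices. Each slice is then clocked into a synchronous (or asynchronous) element that has a time lag with respect to the previous slice.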
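[0087] FIG. 12 illustrates multiple TLSA units staggered within one cycle (four back-to-back units within 1 clock). The same figure also illustrates an FGA-CGS implementation, in which the unit slices communicate with each other asynchronously (fine-grain asynchronous), while each slice, or each group of back-to-back slices, is synchronized at its output to a state element clock. The output of each slice may be synchronized to a different clock (a delayed version).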
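[0088] The TLSA architecture supports the Ultra Large Instruction Matrix (ULIM) architecture. In this TLSA embodiment, a whole instruction group is scheduled, with multiple parallel and dependent instructions scheduled on an instantiation of the TLSA cluster shown above. Each unit can also support SIMD data, in which duplicates of the data slices are instantiated but are controlled by the same instruction excitation. Additionally, multiples of this assembled structure of SIMD universal units can be instantiated horizontally to implement a MIMD architecture on top of a single unit. In this way, a whole instruction template containing both parallel variations of SIMD instructions and dependent instructions is scheduled in one cycle.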
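[0089] FIG. 13 illustrates a traditional multiplication technique. Specifically, each B digit is multiplied by the set of A digits, and every one of those digits must be available at the multiplication execution unit when the multiplication starts. The results are then summed. In contrast, with the present invention, as shown in FIG. 14, the operands arrive digit by digit. Each row of FIG. 14 represents a partial product of the multiplication, but contains only the currently arriving and previously arrived digits with respect to the digit slice arrival times.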
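The digit-by-digit arrival described in [0089] can be expressed as an incremental product update. The sketch below is an arithmetic model only, under assumptions: both operands arrive low-order digit first in 8-bit digits, and the function names are hypothetical. It does not reproduce the row layout of FIG. 14; it only shows that each step needs nothing but the digits that have arrived so far.

```python
def to_digits(value, num_digits=4, digit_bits=8):
    """Split a non-negative integer into digits, low-order digit first."""
    mask = (1 << digit_bits) - 1
    return [(value >> (k * digit_bits)) & mask for k in range(num_digits)]


def digit_serial_multiply(a_digits, b_digits, digit_bits=8):
    """Return the running product after each pair of digits arrives."""
    a_seen = 0        # value of the A digits that arrived earlier
    b_seen = 0        # value of the B digits that arrived earlier
    product = 0
    running = []
    for k, (a_k, b_k) in enumerate(zip(a_digits, b_digits)):
        weight = 1 << (k * digit_bits)
        # New terms pair the arriving digits with each other and with
        # the previously arrived digits of the other operand.
        product += (a_k * b_seen + b_k * a_seen) * weight
        product += a_k * b_k * weight * weight
        a_seen += a_k * weight
        b_seen += b_k * weight
        running.append(product)
    return running


a, b = 0x12345678, 0x9ABCDEF0
steps = digit_serial_multiply(to_digits(a), to_digits(b))
final_product = steps[-1]
```

After step k the running value equals the product of the low k+1 digits of each operand, so low-order result digits are settled before the high-order operand digits have arrived.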
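[0090] To show how to build a logic structure that implements the time-lag sliced architecture using Booth-encoded multiplication, FIG. 15 illustrates a traditional Booth-encoded 32-bit regular multiplier. The Booth encoder groups consecutive bits together to generate a digit. This grouping can reduce the maximum digit value needed to represent those bits by considering signed combinations of two consecutive digits in the number. For example, a 3-bit digit has a maximum value of 7, but by adding 1 to the value of the digit to its left, the digit 7 becomes equivalent to -1. Using a signed representation of the digits allows the encoded digit values to reach a maximum of 1/2 of the original digit values.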
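[0091] FIG. 16 implements the traditional logic structure for the regular Booth multiplication of FIG. 15 using radix-4 digits. The selectors 1600A, 1600B choose which multiple of the multiplicand to use out of the possible signed values (0, 1, -1, 2, -2); the choice is determined by the Booth encoding of the multiplier bits.
[0092] FIGS. 17 and 18 show the new Booth encoding scheme and the new TLSA logic structure that implements it. Note that a traditional multiplier encodes the bit stream continuously, whereas the TLSA Booth encoder inserts zeros into the bit stream at the digit slice boundaries (8-bit digits in this example). The inserted zeros do not change regardless of the sign of the multiplication (the last 2 bits represent the sign).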
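For reference, the conventional radix-4 recoding mentioned in [0091], with digit values (0, 1, -1, 2, -2), can be sketched as follows. This is the textbook continuous encoding, shown only to make the digit set concrete; it is not the modified TLSA encoder of FIGS. 17 and 18, whose zero insertion at slice boundaries is not modeled here. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier, bits=32):
    """Recode a two's-complement multiplier into signed radix-4 digits.

    Returns digits low-order first, each in {-2, -1, 0, 1, 2}.
    """
    assert bits % 2 == 0
    m = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0                       # implicit 0 to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # overlapping 3-bit window
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits=32):
    """Sum the selected multiples of the multiplicand, one per digit."""
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_radix4_digits(multiplier, bits)))


digits_of_seven = booth_radix4_digits(7, bits=4)   # 7 = -1 + 2*4
product = booth_multiply(13, -7)
```

Each digit selects one of five easily generated multiples of the multiplicand (zero, the value itself, its negation, or either of those shifted left by one), which is what the selectors in FIG. 16 choose among.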
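[0093] FIG. 18 shows how to implement the new, modified time-lag sliced Booth-encoded 32-bit multiplier. This implementation resembles the generic TLSA structure shown in FIG. 11. FIG. 18 implements the sub-multiplication operations illustrated in FIGS. 14 and 17.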
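[0094] In these architectures, data parallelism in the TLSA is implemented on top of the ULIM baseline format. This is done by allowing each instruction in the SIMD/MIMD TLSA data format to be organized in a dimension orthogonal to the TLSA slices. FIG. 19 shows a configuration in which a memory/register file data block is accessed by an instruction and can be viewed differently depending on the nature of the intended access. For example, the data block can be viewed as MIMD of wide data elements, SIMD of small data elements, MIMD of mixed-data-width SIMD instructions, and so on. FIG. 19 illustrates an architecture that executes such combinations. The slices within a unit can operate independently to perform sub-operations. For example, each 8-bit slice can perform an independent 8-bit multiplication, while the group of slices that makes up one 32-bit multiplier unit can also operate as a 4-way SIMD byte multiplier. In addition, a group of units can operate as a MIMD multiplier.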
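[0095] FIG. 20 illustrates a memory that is accessed in a sliced manner. In traditional memory architectures, a single operand (e.g., a 64-bit integer or floating point operand) is fetched as a whole. After address decoding finishes, all the bits are read through the read port, which must buffer the read enable across the whole width of the operand in memory. In the TLSA architecture, after address decoding occurs, the data read and/or write follows a time-lag model in which each slice follows the previous slice with a time delay in between. The benefit of this model is that the memory responds quickly, because it is not necessary to decode and drive the whole data width at once.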
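[0096] FIG. 21 illustrates a TLSA architecture with a permute/shift unit that takes advantage of the time lag between data slices. The advantage is a shorter time to produce the early result slices and/or less routing/area to implement the permute/shift logic. In FIG. 21, a right shifter is constructed to accept the time-lagged arrival of a 32-bit operand sliced into 8-bit digits. The first slice (digit) of the 32-bit input arrives at time T0, while the last slice arrives at time T0+3d, where d is one MUX delay. The bits (0 to 5) shown vertically on the right side represent the shift count (a maximum of 32; any value greater than 32 produces an output of zero, since essentially all of the data is shifted out). Execution starts with the arrival of the first low-order digit at the right side of the shifter. The unit then waits for the next input digit to arrive. One of the digits is selected into the lower digit position depending on the value of bit 3 of the shift count, which determines whether the number is shifted right by 8 bits. Then the next more significant digit arrives, and the next bit of the shift count is used either to select this new digit or to pass along the data selected in the upper multiplexer levels, and so on. Any multiplexer position that the shift count zeroes out implements a zero override on its output. When the last digit arrives, it passes through the smallest number of multiplexer levels and thus has the minimum delay "d" with respect to the previous digit, enabling fast propagation of the last digit to the output.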
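The functional behavior of the multiplexer-based right shifter in [0096] can be sketched as a logarithmic shifter with one 2:1 selection level per shift-count bit. This is a simplified model under assumptions: it captures only the input/output function (including the all-zero output once everything is shifted out), applies the count bits from low to high rather than in the figure's order, and does not model digit arrival times or the per-digit multiplexer stacks. The function name is hypothetical.

```python
def mux_right_shift(value, count, width=32):
    """Right shift built from one multiplexer level per shift-count bit."""
    mask = (1 << width) - 1
    data = value & mask
    if count >= width:
        # Shift count bit 5 (or beyond) set: all data is shifted out.
        return 0
    levels = width.bit_length() - 1      # 5 levels for a 32-bit operand
    for bit in range(levels):
        if (count >> bit) & 1:
            # This level's muxes select the input 2**bit positions to the
            # left; vacated high positions are overridden with zero.
            data >>= 1 << bit
        # Otherwise the level passes its input straight through.
    return data


shifted_by_8 = mux_right_shift(0xAABBCCDD, 8)    # count bit 3 set: one digit
shifted_by_12 = mux_right_shift(0xAABBCCDD, 12)  # count bits 2 and 3 set
shifted_out = mux_right_shift(0xAABBCCDD, 32)    # everything shifted out
```

With 8-bit digits, count bit 3 corresponds to moving data by exactly one digit position, which is why the paragraph singles out that bit when the second digit arrives.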
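[0097] A left shifter can be constructed by mirroring the structure of FIG. 21 about a vertical line, so that the left slices have larger multiplexer stacks and the right slices have smaller multiplexer stacks. The most significant digit passes through the smallest number of multiplexer levels.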
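[0098] The sliced-architecture universal unit can be a universal unit similar to the one described in connection with FIGS. 10A-10B. It uses the same sliced architecture (TLSA) techniques and/or fine-grain asynchronous concepts. It performs all functions/operations inside every single universal unit. This means each universal unit is able to perform addition, multiplication, shift, permute, and so on. It performs this universal functionality with low latency by processing the individual slices one at a time in a time-lag fashion. It can also process floating point and integer data. The universal unit concept is made clearer by the ideas described above, but it achieves an important advantage by simplifying scheduling and using the machine throughput to its maximum. In regular architectures, a separate unit is used to perform each operation or function, and those units share the same execution port. Thus, when one unit is in use, the others are idle. Moreover, latency varies in traditional architectures, making the units difficult for the scheduler to schedule. In contrast, in this universal unit the latency is unified for the whole unit, and the scheduler sees symmetric instances of the universal unit.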
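[0099] Each universal unit can perform the functions of different execution units, such as a multiplier, adder, shifter, and so on. In addition, the slices may have different data sizes. In this particular example, each slice is wider in data width than the previous slice. This structure of adders allows a tree of slices to perform multiple-operand addition, multiplication, multiply-accumulate, sum of differences, and so on. Multiplexers (not shown in the figure) align/permute/shift the inputs or intermediate outputs to obtain the required operation, including shift and permute operations, using the universal adder/multiplexer structure. Booth cells/bit multiply cells 1002 are added to the adders to enable multiplication. Other specific logic, state, memory, or LUT elements are added to provide expanded functionality.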
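[0100] The ULIM architecture can be time sliced using the disclosed TLSA techniques. Another way to build a universal unit is to configure it as a floating point multiply-accumulate unit (MAC). The functions used in building this unit are a multiplier, a right shifter, an adder, and a left shifter. Such a structure is disclosed in FIG. 22. A universal unit can use this structure to perform any one of, or any combination of, the functions that make up the FP-MAC.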
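[0101] Each of those functions has been described above and can therefore be implemented individually as a TLSA structure, and the structures can then be combined into a floating point multiply-accumulate TLSA structure. Such a structure can also operate as consecutive sliced ALUs, or as a multiply followed by an ALU or a shift, and so on.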
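[0102] An embodiment of the present invention relates to a computer storage product with a computer-readable medium having computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the computer software arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs, DVDs, and holographic devices; magneto-optical media; and hardware devices that are specially configured to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs), and ROM and RAM devices. Examples of computer code include machine code, such as that produced by a compiler, and files containing higher-level code that are executed by a computer using an interpreter. For example, an embodiment of the invention may be implemented using Java, C++, or another object-oriented programming language and development tools. Another embodiment of the invention may be implemented in hardwired circuitry in place of, or in combination with, machine-executable software instructions.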
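[0103] The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the invention. Thus, the foregoing descriptions of specific embodiments of the invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, thereby enabling others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the following claims and their equivalents define the scope of the invention.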
Claims (20)
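1. An apparatus, comprising:
a matrix of execution blocks forming multiple rows and multiple columns, wherein the multiple rows support parallel execution of instructions and the multiple columns support execution of dependent instructions, and wherein the matrix of execution blocks processes a single block of instructions specifying parallel instructions and dependent instructions.
2. The apparatus of claim 1, wherein the matrix of execution blocks forms part of a single-processor pipeline.
3. The apparatus of claim 2, wherein the single-processor pipeline includes a fetch stage, a branch processing stage, a decode stage, a scheduling stage, an execution stage, and a commit stage.
4. The apparatus of claim 3, wherein the fetch stage fetches the single block of instructions and increments a program counter according to the size of the block of instructions.
5. The apparatus of claim 4, wherein the branch processing stage supports branches to other blocks of instructions.
6. The apparatus of claim 3, wherein the scheduling stage uses block register references.
7. The apparatus of claim 3, further comprising a register file that supports parallel block writes.
8. The apparatus of claim 1, wherein the execution blocks support floating point operations, integer operations, single instruction multiple data (SIMD) operations, and multiple instruction multiple data (MIMD) operations.
9. The apparatus of claim 1, wherein the matrix of execution blocks forms a time-lag sliced architecture to process parallel instructions and dependent instructions within a single clock cycle.
10. The apparatus of claim 9, wherein the time-lag sliced architecture forms a time lag between execution slices.
11. The apparatus of claim 9, wherein the matrix of execution blocks initiates multiple dependent instructions within a single clock cycle.
12. The apparatus of claim 11, wherein the multiple dependent instructions are pipelined using state elements that separate execution slices.
13. The apparatus of claim 9, wherein the time-lag sliced architecture produces one digit at a time.
14. The apparatus of claim 13, wherein different digits have different bit widths.
15. The apparatus of claim 9, wherein the time-lag sliced architecture includes a modified Booth encoding.
16. The apparatus of claim 15, wherein the modified Booth encoding inserts zeros at digit slice boundaries.
17. The apparatus of claim 9, wherein the time-lag sliced architecture includes a right shifter that applies a first set of slice digits to a first multiplexer stack and a second set of slice digits to a second multiplexer stack, wherein the first multiplexer stack is larger than the second multiplexer stack.
18. The apparatus of claim 9, wherein the time-lag sliced architecture includes a left shifter that applies a first set of slice digits to a first multiplexer stack and a second set of slice digits to a second multiplexer stack, wherein the second multiplexer stack is larger than the first multiplexer stack.
19. The apparatus of claim 9, further comprising a memory with a time-lag structure to produce a first data slice followed by a time-lagged second data slice.
20. The apparatus of claim 9, further comprising a floating point multiply-accumulate unit that implements multiplication, addition, right shift, left shift, and hybrid operations.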
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310589048.6A CN103646009B (zh) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US79164906P | 2006-04-12 | 2006-04-12 | |
US79178206P | 2006-04-12 | 2006-04-12 | |
US60/791,782 | 2006-04-12 | ||
US60/791,649 | 2006-04-12 | ||
US79221906P | 2006-04-14 | 2006-04-14 | |
US60/792,219 | 2006-04-14 | ||
PCT/US2007/066536 WO2007143278A2 (en) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310589048.6A Division CN103646009B (zh) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101449256A (zh) | 2009-06-03 |
CN101449256B (zh) | 2013-12-25 |
Family
ID=38802165
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310589048.6A Active CN103646009B (zh) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
CN2007800173174A Active CN101449256B (zh) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310589048.6A Active CN103646009B (zh) | 2006-04-12 | 2007-04-12 | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
Country Status (4)
Country | Link |
---|---|
US (5) | US8327115B2 (zh) |
EP (2) | EP2011018B1 (zh) |
CN (2) | CN103646009B (zh) |
WO (1) | WO2007143278A2 (zh) |
Family Cites Families (508)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US727487A (en) | 1902-10-21 | 1903-05-05 | Swan F Swanson | Dumping-car. |
US4075704A (en) | 1976-07-02 | 1978-02-21 | Floating Point Systems, Inc. | Floating point data processor for high speech operation |
US4228496A (en) | 1976-09-07 | 1980-10-14 | Tandem Computers Incorporated | Multiprocessor system |
US4245344A (en) | 1979-04-02 | 1981-01-13 | Rockwell International Corporation | Processing system with dual buses |
US4527237A (en) | 1979-10-11 | 1985-07-02 | Nanodata Computer Corporation | Data processing system |
US4414624A (en) | 1980-11-19 | 1983-11-08 | The United States Of America As Represented By The Secretary Of The Navy | Multiple-microcomputer processing |
US4524415A (en) | 1982-12-07 | 1985-06-18 | Motorola, Inc. | Virtual machine data processor |
US4597061B1 (en) | 1983-01-03 | 1998-06-09 | Texas Instruments Inc | Memory system using pipleline circuitry for improved system |
US4577273A (en) | 1983-06-06 | 1986-03-18 | Sperry Corporation | Multiple microcomputer system for digital computers |
US4682281A (en) | 1983-08-30 | 1987-07-21 | Amdahl Corporation | Data storage unit employing translation lookaside buffer pointer |
US4600986A (en) | 1984-04-02 | 1986-07-15 | Sperry Corporation | Pipelined split stack with high performance interleaved decode |
US4633434A (en) | 1984-04-02 | 1986-12-30 | Sperry Corporation | High performance storage unit |
JPS6140643A (ja) | 1984-07-31 | 1986-02-26 | Hitachi Ltd | システムの資源割当て制御方式 |
JPS6289149A (ja) | 1985-10-15 | 1987-04-23 | Agency Of Ind Science & Technol | 多ポ−トメモリシステム |
JPH0658650B2 (ja) | 1986-03-14 | 1994-08-03 | 株式会社日立製作所 | 仮想計算機システム |
US4920477A (en) | 1987-04-20 | 1990-04-24 | Multiflow Computer, Inc. | Virtual address table look aside buffer miss recovery method and apparatus |
JP2930341B2 (ja) * | 1988-10-07 | 1999-08-03 | マーチン・マリエッタ・コーポレーション | データ並列処理装置 |
US5339398A (en) | 1989-07-31 | 1994-08-16 | North American Philips Corporation | Memory architecture and method of data organization optimized for hashing |
US5471593A (en) | 1989-12-11 | 1995-11-28 | Branigin; Michael H. | Computer processor with an efficient means of executing many instructions simultaneously |
US5197130A (en) | 1989-12-29 | 1993-03-23 | Supercomputer Systems Limited Partnership | Cluster architecture for a highly parallel scalar/vector multiprocessor system |
EP0463965B1 (en) | 1990-06-29 | 1998-09-09 | Digital Equipment Corporation | Branch prediction unit for high-performance processor |
US5317754A (en) | 1990-10-23 | 1994-05-31 | International Business Machines Corporation | Method and apparatus for enabling an interpretive execution subset |
US5317705A (en) | 1990-10-24 | 1994-05-31 | International Business Machines Corporation | Apparatus and method for TLB purge reduction in a multi-level machine system |
US6282583B1 (en) | 1991-06-04 | 2001-08-28 | Silicon Graphics, Inc. | Method and apparatus for memory access in a matrix processor computer |
US5539911A (en) | 1991-07-08 | 1996-07-23 | Seiko Epson Corporation | High-performance, superscalar-based computer system with out-of-order instruction execution |
JPH0820949B2 (ja) | 1991-11-26 | 1996-03-04 | 松下電器産業株式会社 | 情報処理装置 |
WO1993013481A1 (en) | 1991-12-23 | 1993-07-08 | Intel Corporation | Interleaved cache for multiple accesses per clock in a microprocessor |
JP2647327B2 (ja) * | 1992-04-06 | 1997-08-27 | インターナショナル・ビジネス・マシーンズ・コーポレイション | 大規模並列コンピューティング・システム装置 |
KR100309566B1 (ko) | 1992-04-29 | 2001-12-15 | 리패치 | 파이프라인프로세서에서다중명령어를무리짓고,그룹화된명령어를동시에발행하고,그룹화된명령어를실행시키는방법및장치 |
EP0638183B1 (en) | 1992-05-01 | 1997-03-05 | Seiko Epson Corporation | A system and method for retiring instructions in a superscalar microprocessor |
EP0576262B1 (en) * | 1992-06-25 | 2000-08-23 | Canon Kabushiki Kaisha | Apparatus for multiplying integers of many figures |
JPH0637202A (ja) | 1992-07-20 | 1994-02-10 | Mitsubishi Electric Corp | マイクロ波ic用パッケージ |
JPH06110781A (ja) | 1992-09-30 | 1994-04-22 | Nec Corp | キャッシュメモリ装置 |
US5493660A (en) | 1992-10-06 | 1996-02-20 | Hewlett-Packard Company | Software assisted hardware TLB miss handler |
US5513335A (en) | 1992-11-02 | 1996-04-30 | Sgs-Thomson Microelectronics, Inc. | Cache tag memory having first and second single-port arrays and a dual-port array |
US5819088A (en) | 1993-03-25 | 1998-10-06 | Intel Corporation | Method and apparatus for scheduling instructions for execution on a multi-issue architecture computer |
US5548773A (en) * | 1993-03-30 | 1996-08-20 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Digital parallel processor array for optimum path planning |
JPH0784883A (ja) | 1993-09-17 | 1995-03-31 | Hitachi Ltd | 仮想計算機システムのアドレス変換バッファパージ方法 |
US6948172B1 (en) | 1993-09-21 | 2005-09-20 | Microsoft Corporation | Preemptive multi-tasking with cooperative groups of tasks |
US5469376A (en) | 1993-10-14 | 1995-11-21 | Abdallah; Mohammad A. F. F. | Digital circuit for the evaluation of mathematical expressions |
US5517651A (en) | 1993-12-29 | 1996-05-14 | Intel Corporation | Method and apparatus for loading a segment register in a microprocessor capable of operating in multiple modes |
US5956753A (en) | 1993-12-30 | 1999-09-21 | Intel Corporation | Method and apparatus for handling speculative memory access operations |
US5761476A (en) | 1993-12-30 | 1998-06-02 | Intel Corporation | Non-clocked early read for back-to-back scheduling of instructions |
JP3048498B2 (ja) | 1994-04-13 | 2000-06-05 | 株式会社東芝 | 半導体記憶装置 |
JPH07287668A (ja) | 1994-04-19 | 1995-10-31 | Hitachi Ltd | データ処理装置 |
CN1084005C (zh) | 1994-06-27 | 2002-05-01 | 国际商业机器公司 | 用于动态控制地址空间分配的方法和设备 |
US5548742A (en) | 1994-08-11 | 1996-08-20 | Intel Corporation | Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory |
US5813031A (en) | 1994-09-21 | 1998-09-22 | Industrial Technology Research Institute | Caching tag for a large scale cache computer memory system |
US5640534A (en) | 1994-10-05 | 1997-06-17 | International Business Machines Corporation | Method and system for concurrent access in a data cache array utilizing multiple match line selection paths |
US5835951A (en) | 1994-10-18 | 1998-11-10 | National Semiconductor | Branch processing unit with target cache read prioritization protocol for handling multiple hits |
JP3569014B2 (ja) | 1994-11-25 | 2004-09-22 | 富士通株式会社 | マルチコンテキストをサポートするプロセッサおよび処理方法 |
US5724565A (en) | 1995-02-03 | 1998-03-03 | International Business Machines Corporation | Method and system for processing first and second sets of instructions by first and second types of processing systems |
US5673426A (en) | 1995-02-14 | 1997-09-30 | Hal Computer Systems, Inc. | Processor structure and method for tracking floating-point exceptions |
US5675759A (en) | 1995-03-03 | 1997-10-07 | Shebanow; Michael C. | Method and apparatus for register management using issue sequence prior physical register and register association validity information |
US5634068A (en) | 1995-03-31 | 1997-05-27 | Sun Microsystems, Inc. | Packet switched cache coherent multiprocessor system |
US5751982A (en) | 1995-03-31 | 1998-05-12 | Apple Computer, Inc. | Software emulation system with dynamic translation of emulated instructions for increased processing speed |
US6209085B1 (en) | 1995-05-05 | 2001-03-27 | Intel Corporation | Method and apparatus for performing process switching in multiprocessor computer systems |
US6643765B1 (en) | 1995-08-16 | 2003-11-04 | Microunity Systems Engineering, Inc. | Programmable processor with group floating point operations |
US5710902A (en) | 1995-09-06 | 1998-01-20 | Intel Corporation | Instruction dependency chain indentifier |
US6341324B1 (en) | 1995-10-06 | 2002-01-22 | Lsi Logic Corporation | Exception processing in superscalar microprocessor |
US5864657A (en) | 1995-11-29 | 1999-01-26 | Texas Micro, Inc. | Main memory system and checkpointing protocol for fault-tolerant computer system |
US5983327A (en) | 1995-12-01 | 1999-11-09 | Nortel Networks Corporation | Data path architecture and arbitration scheme for providing access to a shared system resource |
US5793941A (en) | 1995-12-04 | 1998-08-11 | Advanced Micro Devices, Inc. | On-chip primary cache testing circuit and test method |
US5911057A (en) | 1995-12-19 | 1999-06-08 | Texas Instruments Incorporated | Superscalar microprocessor having combined register and memory renaming circuits, systems, and methods |
US5699537A (en) | 1995-12-22 | 1997-12-16 | Intel Corporation | Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions |
US6882177B1 (en) | 1996-01-10 | 2005-04-19 | Altera Corporation | Tristate structures for programmable logic devices |
US5754818A (en) | 1996-03-22 | 1998-05-19 | Sun Microsystems, Inc. | Architecture and method for sharing TLB entries through process IDS |
US5904892A (en) | 1996-04-01 | 1999-05-18 | Saint-Gobain/Norton Industrial Ceramics Corp. | Tape cast silicon carbide dummy wafer |
US5752260A (en) | 1996-04-29 | 1998-05-12 | International Business Machines Corporation | High-speed, multiple-port, interleaved cache with arbitration of multiple access addresses |
US5806085A (en) | 1996-05-01 | 1998-09-08 | Sun Microsystems, Inc. | Method for non-volatile caching of network and CD-ROM file accesses using a cache directory, pointers, file name conversion, a local hard disk, and separate small database |
US5829028A (en) | 1996-05-06 | 1998-10-27 | Advanced Micro Devices, Inc. | Data cache configured to store data in a use-once manner |
US6108769A (en) | 1996-05-17 | 2000-08-22 | Advanced Micro Devices, Inc. | Dependency table for reducing dependency checking hardware |
US5958042A (en) | 1996-06-11 | 1999-09-28 | Sun Microsystems, Inc. | Grouping logic circuit in a pipelined superscalar processor |
US5881277A (en) | 1996-06-13 | 1999-03-09 | Texas Instruments Incorporated | Pipelined microprocessor with branch misprediction cache circuits, systems and methods |
US5860146A (en) | 1996-06-25 | 1999-01-12 | Sun Microsystems, Inc. | Auxiliary translation lookaside buffer for assisting in accessing data in remote address spaces |
US5903760A (en) | 1996-06-27 | 1999-05-11 | Intel Corporation | Method and apparatus for translating a conditional instruction compatible with a first instruction set architecture (ISA) into a conditional instruction compatible with a second ISA |
US5974506A (en) | 1996-06-28 | 1999-10-26 | Digital Equipment Corporation | Enabling mirror, nonmirror and partial mirror cache modes in a dual cache system |
US6167490A (en) | 1996-09-20 | 2000-12-26 | University Of Washington | Using global memory information to manage memory in a computer network |
KR19980032776A (ko) | 1996-10-16 | 1998-07-25 | Kanai Tsutomu | Data processor and data processing system |
WO1998020415A1 (en) | 1996-11-04 | 1998-05-14 | Philips Electronics N.V. | Processing device, reads instructions in memory |
US6385715B1 (en) | 1996-11-13 | 2002-05-07 | Intel Corporation | Multi-threading for a processor utilizing a replay queue |
US5978906A (en) | 1996-11-19 | 1999-11-02 | Advanced Micro Devices, Inc. | Branch selectors associated with byte ranges within an instruction cache for rapidly identifying branch predictions |
US6253316B1 (en) | 1996-11-19 | 2001-06-26 | Advanced Micro Devices, Inc. | Three state branch history using one bit in a branch prediction mechanism |
US5903750A (en) | 1996-11-20 | 1999-05-11 | Institute For The Development Of Emerging Architectures, L.L.P. | Dynamic branch prediction for branch instructions with multiple targets |
US6212542B1 (en) | 1996-12-16 | 2001-04-03 | International Business Machines Corporation | Method and system for executing a program within a multiscalar processor by processing linked thread descriptors |
US6134634A (en) | 1996-12-20 | 2000-10-17 | Texas Instruments Incorporated | Method and apparatus for preemptive cache write-back |
US5918251A (en) | 1996-12-23 | 1999-06-29 | Intel Corporation | Method and apparatus for preloading different default address translation attributes |
US6016540A (en) | 1997-01-08 | 2000-01-18 | Intel Corporation | Method and apparatus for scheduling instructions in waves |
US6065105A (en) | 1997-01-08 | 2000-05-16 | Intel Corporation | Dependency matrix |
US5802602A (en) | 1997-01-17 | 1998-09-01 | Intel Corporation | Method and apparatus for performing reads of related data from a set-associative cache memory |
US6088780A (en) | 1997-03-31 | 2000-07-11 | Institute For The Development Of Emerging Architecture, L.L.C. | Page table walker that uses at least one of a default page size and a page size selected for a virtual address space to position a sliding field in a virtual address |
US6314511B2 (en) | 1997-04-03 | 2001-11-06 | University Of Washington | Mechanism for freeing registers on processors that perform dynamic out-of-order execution of instructions using renaming registers |
US6035120A (en) | 1997-05-28 | 2000-03-07 | Sun Microsystems, Inc. | Method and apparatus for converting executable computer programs in a heterogeneous computing environment |
US6075938A (en) | 1997-06-10 | 2000-06-13 | The Board Of Trustees Of The Leland Stanford Junior University | Virtual machine monitors for scalable multiprocessors |
US6073230A (en) | 1997-06-11 | 2000-06-06 | Advanced Micro Devices, Inc. | Instruction fetch unit configured to provide sequential way prediction for sequential instruction fetches |
JPH1124929A (ja) | 1997-06-30 | 1999-01-29 | Sony Corp | Arithmetic processing device and method therefor |
US6658447B2 (en) | 1997-07-08 | 2003-12-02 | Intel Corporation | Priority based simultaneous multi-threading |
US6128728A (en) | 1997-08-01 | 2000-10-03 | Micron Technology, Inc. | Virtual shadow registers and virtual register windows |
US6170051B1 (en) | 1997-08-01 | 2001-01-02 | Micron Technology, Inc. | Apparatus and method for program level parallelism in a VLIW processor |
US6085315A (en) | 1997-09-12 | 2000-07-04 | Siemens Aktiengesellschaft | Data processing device with loop pipeline |
US6101577A (en) | 1997-09-15 | 2000-08-08 | Advanced Micro Devices, Inc. | Pipelined instruction cache and branch prediction mechanism therefor |
US5901294A (en) | 1997-09-18 | 1999-05-04 | International Business Machines Corporation | Method and system for bus arbitration in a multiprocessor system utilizing simultaneous variable-width bus access |
US6185660B1 (en) | 1997-09-23 | 2001-02-06 | Hewlett-Packard Company | Pending access queue for providing data to a target register during an intermediate pipeline phase after a computer cache miss |
US5905509A (en) | 1997-09-30 | 1999-05-18 | Compaq Computer Corp. | Accelerated Graphics Port two level Gart cache having distributed first level caches |
US6226732B1 (en) | 1997-10-02 | 2001-05-01 | Hitachi Micro Systems, Inc. | Memory system architecture |
US5922065A (en) | 1997-10-13 | 1999-07-13 | Institute For The Development Of Emerging Architectures, L.L.C. | Processor utilizing a template field for encoding instruction sequences in a wide-word format |
US6178482B1 (en) | 1997-11-03 | 2001-01-23 | Brecis Communications | Virtual register sets |
US6021484A (en) | 1997-11-14 | 2000-02-01 | Samsung Electronics Co., Ltd. | Dual instruction set architecture |
US6256728B1 (en) | 1997-11-17 | 2001-07-03 | Advanced Micro Devices, Inc. | Processor configured to selectively cancel instructions from its pipeline responsive to a predicted-taken short forward branch instruction |
US6260131B1 (en) | 1997-11-18 | 2001-07-10 | Intrinsity, Inc. | Method and apparatus for TLB memory ordering |
US6016533A (en) | 1997-12-16 | 2000-01-18 | Advanced Micro Devices, Inc. | Way prediction logic for cache array |
US6219776B1 (en) | 1998-03-10 | 2001-04-17 | Billions Of Operations Per Second | Merged array controller and processing element |
US6609189B1 (en) | 1998-03-12 | 2003-08-19 | Yale University | Cycle segmented prefix circuits |
JP3657424B2 (ja) | 1998-03-20 | 2005-06-08 | Matsushita Electric Industrial Co., Ltd. | Center device and terminal device for broadcasting program information |
US6216215B1 (en) | 1998-04-02 | 2001-04-10 | Intel Corporation | Method and apparatus for senior loads |
US6157998A (en) | 1998-04-03 | 2000-12-05 | Motorola Inc. | Method for performing branch prediction and resolution of two or more branch instructions within two or more branch prediction buffers |
US6205545B1 (en) | 1998-04-30 | 2001-03-20 | Hewlett-Packard Company | Method and apparatus for using static branch predictions hints with dynamically translated code traces to improve performance |
US6115809A (en) | 1998-04-30 | 2000-09-05 | Hewlett-Packard Company | Compiling strong and weak branching behavior instruction blocks to separate caches for dynamic and static prediction |
US6256727B1 (en) | 1998-05-12 | 2001-07-03 | International Business Machines Corporation | Method and system for fetching noncontiguous instructions in a single clock cycle |
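JPH11338710A (ja) | 1998-05-28 | 1999-12-10 | Toshiba Corp | Compiling method and apparatus for a processor having multiple types of instruction sets, and recording medium on which the method is programmed and recorded |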
US6272616B1 (en) | 1998-06-17 | 2001-08-07 | Agere Systems Guardian Corp. | Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths |
US6988183B1 (en) | 1998-06-26 | 2006-01-17 | Derek Chi-Lan Wong | Methods for increasing instruction-level parallelism in microprocessors and digital system |
US6260138B1 (en) | 1998-07-17 | 2001-07-10 | Sun Microsystems, Inc. | Method and apparatus for branch instruction processing in a processor |
US6122656A (en) | 1998-07-31 | 2000-09-19 | Advanced Micro Devices, Inc. | Processor configured to map logical register numbers to physical register numbers using virtual register numbers |
US6272662B1 (en) | 1998-08-04 | 2001-08-07 | International Business Machines Corporation | Distributed storage system using front-end and back-end locking |
JP2000057054A (ja) | 1998-08-12 | 2000-02-25 | Fujitsu Ltd | High-speed address translation system |
US8631066B2 (en) | 1998-09-10 | 2014-01-14 | Vmware, Inc. | Mechanism for providing virtual machines for use by multiple users |
US6339822B1 (en) | 1998-10-02 | 2002-01-15 | Advanced Micro Devices, Inc. | Using padded instructions in a block-oriented cache |
US6332189B1 (en) | 1998-10-16 | 2001-12-18 | Intel Corporation | Branch prediction architecture |
GB9825102D0 (en) | 1998-11-16 | 1999-01-13 | Insignia Solutions Plc | Computer system |
JP3110404B2 (ja) | 1998-11-18 | 2000-11-20 | NEC Kofu, Ltd. | Microprocessor device, method for accelerating its software instructions, and recording medium recording its control program |
US6490673B1 (en) | 1998-11-27 | 2002-12-03 | Matsushita Electric Industrial Co., Ltd | Processor, compiling apparatus, and compile program recorded on a recording medium |
US6519682B2 (en) | 1998-12-04 | 2003-02-11 | Stmicroelectronics, Inc. | Pipelined non-blocking level two cache system with inherent transaction collision-avoidance |
US7020879B1 (en) | 1998-12-16 | 2006-03-28 | Mips Technologies, Inc. | Interrupt and exception handling for multi-streaming digital processors |
US6477562B2 (en) | 1998-12-16 | 2002-11-05 | Clearwater Networks, Inc. | Prioritized instruction scheduling for multi-streaming processors |
US6247097B1 (en) | 1999-01-22 | 2001-06-12 | International Business Machines Corporation | Aligned instruction cache handling of instruction fetches across multiple predicted branch instructions |
US6321298B1 (en) | 1999-01-25 | 2001-11-20 | International Business Machines Corporation | Full cache coherency across multiple raid controllers |
JP3842474B2 (ja) | 1999-02-02 | 2006-11-08 | Renesas Technology Corp. | Data processing device |
US6327650B1 (en) | 1999-02-12 | 2001-12-04 | VLSI Technology, Inc. | Pipelined multiprocessing with upstream processor concurrently writing to local register and to register of downstream processor |
US6668316B1 (en) | 1999-02-17 | 2003-12-23 | Elbrus International Limited | Method and apparatus for conflict-free execution of integer and floating-point operations with a common register file |
US6732220B2 (en) | 1999-02-17 | 2004-05-04 | Elbrus International | Method for emulating hardware features of a foreign architecture in a host operating system environment |
US6418530B2 (en) | 1999-02-18 | 2002-07-09 | Hewlett-Packard Company | Hardware/software system for instruction profiling and trace selection using branch history information for branch predictions |
US6437789B1 (en) | 1999-02-19 | 2002-08-20 | Evans & Sutherland Computer Corporation | Multi-level cache controller |
US6850531B1 (en) | 1999-02-23 | 2005-02-01 | Alcatel | Multi-service network switch |
US6212613B1 (en) | 1999-03-22 | 2001-04-03 | Cisco Technology, Inc. | Methods and apparatus for reusing addresses in a computer |
US6529928B1 (en) | 1999-03-23 | 2003-03-04 | Silicon Graphics, Inc. | Floating-point adder performing floating-point and integer operations |
EP1050808B1 (en) | 1999-05-03 | 2008-04-30 | STMicroelectronics S.A. | Computer instruction scheduling |
US6449671B1 (en) | 1999-06-09 | 2002-09-10 | Ati International Srl | Method and apparatus for busing data elements |
US6473833B1 (en) | 1999-07-30 | 2002-10-29 | International Business Machines Corporation | Integrated cache and directory structure for multi-level caches |
US6643770B1 (en) | 1999-09-16 | 2003-11-04 | Intel Corporation | Branch misprediction recovery using a side memory |
US6704822B1 (en) | 1999-10-01 | 2004-03-09 | Sun Microsystems, Inc. | Arbitration protocol for a shared data cache |
US6772325B1 (en) | 1999-10-01 | 2004-08-03 | Hitachi, Ltd. | Processor architecture and operation for exploiting improved branch control instruction |
US6457120B1 (en) | 1999-11-01 | 2002-09-24 | International Business Machines Corporation | Processor and method including a cache having confirmation bits for improving address predictable branch instruction target predictions |
US7441110B1 (en) | 1999-12-10 | 2008-10-21 | International Business Machines Corporation | Prefetching using future branch path information derived from branch prediction |
US7107434B2 (en) | 1999-12-20 | 2006-09-12 | Board Of Regents, The University Of Texas | System, method and apparatus for allocating hardware resources using pseudorandom sequences |
EP1247195A4 (en) | 1999-12-22 | 2005-01-05 | Ubicom Inc | SYSTEM AND METHOD FOR MULTITHREADING WORKING ON COMMAND LEVEL IN AN EMBEDDED PROCESSOR WITH ZERO-TIME CONTEXT SWITCHING |
US6557095B1 (en) | 1999-12-27 | 2003-04-29 | Intel Corporation | Scheduling operations using a dependency matrix |
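CN1210649C (zh) | 2000-01-03 | 2005-07-13 | Advanced Micro Devices, Inc. | Scheduler capable of issuing and reissuing dependency chains, processor including the scheduler, and scheduling method |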
US6542984B1 (en) | 2000-01-03 | 2003-04-01 | Advanced Micro Devices, Inc. | Scheduler capable of issuing and reissuing dependency chains |
US6594755B1 (en) | 2000-01-04 | 2003-07-15 | National Semiconductor Corporation | System and method for interleaved execution of multiple independent threads |
US6728872B1 (en) | 2000-02-04 | 2004-04-27 | International Business Machines Corporation | Method and apparatus for verifying that instructions are pipelined in correct architectural sequence |
GB0002848D0 (en) | 2000-02-08 | 2000-03-29 | Siroyan Limited | Communicating instruction results in processors and compiling methods for processors |
GB2365661A (en) | 2000-03-10 | 2002-02-20 | British Telecomm | Allocating switch requests within a packet switch |
US6615340B1 (en) | 2000-03-22 | 2003-09-02 | Wilmot, Ii Richard Byron | Extended operand management indicator structure and method |
US7140022B2 (en) | 2000-06-02 | 2006-11-21 | Honeywell International Inc. | Method and apparatus for slack stealing with dynamic threads |
US6604187B1 (en) | 2000-06-19 | 2003-08-05 | Advanced Micro Devices, Inc. | Providing global translations with address space numbers |
US6557083B1 (en) | 2000-06-30 | 2003-04-29 | Intel Corporation | Memory system for multiple data types |
US6704860B1 (en) | 2000-07-26 | 2004-03-09 | International Business Machines Corporation | Data processing system and method for fetching instruction blocks in response to a detected block sequence |
US7206925B1 (en) | 2000-08-18 | 2007-04-17 | Sun Microsystems, Inc. | Backing Register File for processors |
US6728866B1 (en) | 2000-08-31 | 2004-04-27 | International Business Machines Corporation | Partitioned issue queue and allocation strategy |
US6721874B1 (en) | 2000-10-12 | 2004-04-13 | International Business Machines Corporation | Method and system for dynamically shared completion table supporting multiple threads in a processing system |
US6639866B2 (en) | 2000-11-03 | 2003-10-28 | Broadcom Corporation | Very small swing high performance asynchronous CMOS static memory (multi-port register file) with power reducing column multiplexing scheme |
US7757065B1 (en) | 2000-11-09 | 2010-07-13 | Intel Corporation | Instruction segment recording scheme |
JP2002185513A (ja) | 2000-12-18 | 2002-06-28 | Hitachi Ltd | Packet communication network and packet transfer control method |
US6877089B2 (en) | 2000-12-27 | 2005-04-05 | International Business Machines Corporation | Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program |
US6907600B2 (en) | 2000-12-27 | 2005-06-14 | Intel Corporation | Virtual translation lookaside buffer |
US6647466B2 (en) | 2001-01-25 | 2003-11-11 | Hewlett-Packard Development Company, L.P. | Method and apparatus for adaptively bypassing one or more levels of a cache hierarchy |
FR2820921A1 (fr) | 2001-02-14 | 2002-08-16 | Canon Kk | Device and method for transmission in a switch |
US6985951B2 (en) | 2001-03-08 | 2006-01-10 | International Business Machines Corporation | Inter-partition message passing method, system and program product for managing workload in a partitioned processing environment |
US6950927B1 (en) | 2001-04-13 | 2005-09-27 | The United States Of America As Represented By The Secretary Of The Navy | System and method for instruction-level parallelism in a programmable multiple network processor environment |
US7707397B2 (en) | 2001-05-04 | 2010-04-27 | Via Technologies, Inc. | Variable group associativity branch target address cache delivering multiple target addresses per cache line |
US7200740B2 (en) | 2001-05-04 | 2007-04-03 | Ip-First, Llc | Apparatus and method for speculatively performing a return instruction in a microprocessor |
US6658549B2 (en) | 2001-05-22 | 2003-12-02 | Hewlett-Packard Development Company, L.P. | Method and system allowing a single entity to manage memory comprising compressed and uncompressed data |
US6985591B2 (en) | 2001-06-29 | 2006-01-10 | Intel Corporation | Method and apparatus for distributing keys for decrypting and re-encrypting publicly distributed media |
US7203824B2 (en) | 2001-07-03 | 2007-04-10 | Ip-First, Llc | Apparatus and method for handling BTAC branches that wrap across instruction cache lines |
US7024545B1 (en) | 2001-07-24 | 2006-04-04 | Advanced Micro Devices, Inc. | Hybrid branch prediction device with two levels of branch prediction cache |
US6954846B2 (en) * | 2001-08-07 | 2005-10-11 | Sun Microsystems, Inc. | Microprocessor and method for giving each thread exclusive access to one register file in a multi-threading mode and for giving an active thread access to multiple register files in a single thread mode |
KR20030018999A (ko) | 2001-08-31 | 2003-03-06 | LG Electronics Inc. | Apparatus and method for determining stuff time for generating virtual container pointers in the synchronous digital hierarchy |
US6718440B2 (en) | 2001-09-28 | 2004-04-06 | Intel Corporation | Memory access latency hiding with hint buffer |
US7150021B1 (en) | 2001-10-12 | 2006-12-12 | Palau Acquisition Corporation (Delaware) | Method and system to allocate resources within an interconnect device according to a resource allocation table |
US7117347B2 (en) | 2001-10-23 | 2006-10-03 | Ip-First, Llc | Processor including fallback branch prediction mechanism for far jump and far call instructions |
US7272832B2 (en) | 2001-10-25 | 2007-09-18 | Hewlett-Packard Development Company, L.P. | Method of protecting user process data in a secure platform inaccessible to the operating system and other tasks on top of the secure platform |
US6964043B2 (en) | 2001-10-30 | 2005-11-08 | Intel Corporation | Method, apparatus, and system to optimize frequently executed code and to use compiler transformation and hardware support to handle infrequently executed code |
GB2381886B (en) | 2001-11-07 | 2004-06-23 | Sun Microsystems Inc | Computer system with virtual memory and paging mechanism |
US7092869B2 (en) | 2001-11-14 | 2006-08-15 | Ronald Hilton | Memory address prediction under emulation |
US20030126416A1 (en) | 2001-12-31 | 2003-07-03 | Marr Deborah T. | Suspending execution of a thread in a multi-threaded processor |
US7363467B2 (en) | 2002-01-03 | 2008-04-22 | Intel Corporation | Dependence-chain processing using trace descriptors having dependency descriptors |
US6640333B2 (en) | 2002-01-10 | 2003-10-28 | Lsi Logic Corporation | Architecture for a sea of platforms |
US7055021B2 (en) | 2002-02-05 | 2006-05-30 | Sun Microsystems, Inc. | Out-of-order processor that reduces mis-speculation using a replay scoreboard |
US7331040B2 (en) | 2002-02-06 | 2008-02-12 | Transitive Limited | Condition code flag emulation for program code conversion |
US20030154363A1 (en) | 2002-02-11 | 2003-08-14 | Soltis Donald C. | Stacked register aliasing in data hazard detection to reduce circuit |
US6839816B2 (en) | 2002-02-26 | 2005-01-04 | International Business Machines Corporation | Shared cache line update mechanism |
US6731292B2 (en) | 2002-03-06 | 2004-05-04 | Sun Microsystems, Inc. | System and method for controlling a number of outstanding data transactions within an integrated circuit |
JP3719509B2 (ja) * | 2002-04-01 | 2005-11-24 | Sony Computer Entertainment Inc. | Serial arithmetic pipeline, arithmetic device, arithmetic logic circuit, and arithmetic method using a serial arithmetic pipeline |
US7565509B2 (en) | 2002-04-17 | 2009-07-21 | Microsoft Corporation | Using limits on address translation to control access to an addressable entity |
US6920530B2 (en) | 2002-04-23 | 2005-07-19 | Sun Microsystems, Inc. | Scheme for reordering instructions via an instruction caching mechanism |
US7113488B2 (en) | 2002-04-24 | 2006-09-26 | International Business Machines Corporation | Reconfigurable circular bus |
US6760818B2 (en) | 2002-05-01 | 2004-07-06 | Koninklijke Philips Electronics N.V. | Memory region based data pre-fetching |
US7281055B2 (en) | 2002-05-28 | 2007-10-09 | Newisys, Inc. | Routing mechanisms in systems having multiple multi-processor clusters |
US7117346B2 (en) | 2002-05-31 | 2006-10-03 | Freescale Semiconductor, Inc. | Data processing system having multiple register contexts and method therefor |
US6938151B2 (en) | 2002-06-04 | 2005-08-30 | International Business Machines Corporation | Hybrid branch prediction using a global selection counter and a prediction method comparison table |
US6735747B2 (en) | 2002-06-10 | 2004-05-11 | Lsi Logic Corporation | Pre-silicon verification path coverage |
US8024735B2 (en) | 2002-06-14 | 2011-09-20 | Intel Corporation | Method and apparatus for ensuring fairness and forward progress when executing multiple threads of execution |
JP3845043B2 (ja) | 2002-06-28 | 2006-11-15 | Fujitsu Limited | Instruction fetch control device |
JP3982353B2 (ja) | 2002-07-12 | 2007-09-26 | NEC Corporation | Fault-tolerant computer device, resynchronization method therefor, and resynchronization program |
US6944744B2 (en) | 2002-08-27 | 2005-09-13 | Advanced Micro Devices, Inc. | Apparatus and method for independently schedulable functional units with issue lock mechanism in a processor |
US7546422B2 (en) | 2002-08-28 | 2009-06-09 | Intel Corporation | Method and apparatus for the synchronization of distributed caches |
US6950925B1 (en) | 2002-08-28 | 2005-09-27 | Advanced Micro Devices, Inc. | Scheduler for use in a microprocessor that supports data-speculative execution |
TW200408242A (en) | 2002-09-06 | 2004-05-16 | Matsushita Electric Ind Co Ltd | Home terminal apparatus and communication system |
US6895491B2 (en) | 2002-09-26 | 2005-05-17 | Hewlett-Packard Development Company, L.P. | Memory addressing for a virtual machine implementation on a computer processor supporting virtual hash-page-table searching |
US7334086B2 (en) | 2002-10-08 | 2008-02-19 | Rmi Corporation | Advanced processor with system on a chip interconnect technology |
US6829698B2 (en) | 2002-10-10 | 2004-12-07 | International Business Machines Corporation | Method, apparatus and system for acquiring a global promotion facility utilizing a data-less transaction |
US7213248B2 (en) | 2002-10-10 | 2007-05-01 | International Business Machines Corporation | High speed promotion mechanism suitable for lock acquisition in a multiprocessor data processing system |
US7222218B2 (en) | 2002-10-22 | 2007-05-22 | Sun Microsystems, Inc. | System and method for goal-based scheduling of blocks of code for concurrent execution |
US20040103251A1 (en) | 2002-11-26 | 2004-05-27 | Mitchell Alsup | Microprocessor including a first level cache and a second level cache having different cache line sizes |
CN1720494A (zh) | 2002-12-04 | 2006-01-11 | Koninklijke Philips Electronics N.V. | Register file gating method for reducing microprocessor power consumption |
US6981083B2 (en) | 2002-12-05 | 2005-12-27 | International Business Machines Corporation | Processor virtualization mechanism via an enhanced restoration of hard architected states |
US7073042B2 (en) | 2002-12-12 | 2006-07-04 | Intel Corporation | Reclaiming existing fields in address translation data structures to extend control over memory accesses |
US20040117594A1 (en) | 2002-12-13 | 2004-06-17 | Vanderspek Julius | Memory management method |
US20040122887A1 (en) * | 2002-12-20 | 2004-06-24 | Macy William W. | Efficient multiplication of small matrices using SIMD registers |
US7191349B2 (en) | 2002-12-26 | 2007-03-13 | Intel Corporation | Mechanism for processor power state aware distribution of lowest priority interrupt |
US6925421B2 (en) | 2003-01-09 | 2005-08-02 | International Business Machines Corporation | Method, system, and computer program product for estimating the number of consumers that place a load on an individual resource in a pool of physically distributed resources |
US20040139441A1 (en) | 2003-01-09 | 2004-07-15 | Kabushiki Kaisha Toshiba | Processor, arithmetic operation processing method, and priority determination method |
US7178010B2 (en) | 2003-01-16 | 2007-02-13 | Ip-First, Llc | Method and apparatus for correcting an internal call/return stack in a microprocessor that detects from multiple pipeline stages incorrect speculative update of the call/return stack |
US7089374B2 (en) | 2003-02-13 | 2006-08-08 | Sun Microsystems, Inc. | Selectively unmarking load-marked cache lines during transactional program execution |
US7278030B1 (en) | 2003-03-03 | 2007-10-02 | Vmware, Inc. | Virtualization system for computers having multiple protection mechanisms |
US6912644B1 (en) | 2003-03-06 | 2005-06-28 | Intel Corporation | Method and apparatus to steer memory access operations in a virtual memory system |
US7111145B1 (en) | 2003-03-25 | 2006-09-19 | Vmware, Inc. | TLB miss fault handler and method for accessing multiple page tables |
US7143273B2 (en) | 2003-03-31 | 2006-11-28 | Intel Corporation | Method and apparatus for dynamic branch prediction utilizing multiple stew algorithms for indexing a global history |
CN1214666C (zh) | 2003-04-07 | 2005-08-10 | Huawei Technologies Co., Ltd. | Method for limiting location information request traffic in location services |
US7058764B2 (en) | 2003-04-14 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | Method of adaptive cache partitioning to increase host I/O performance |
EP1471421A1 (en) | 2003-04-24 | 2004-10-27 | STMicroelectronics Limited | Speculative load instruction control |
US7290261B2 (en) | 2003-04-24 | 2007-10-30 | International Business Machines Corporation | Method and logical apparatus for rename register reallocation in a simultaneous multi-threaded (SMT) processor |
US7139855B2 (en) | 2003-04-24 | 2006-11-21 | International Business Machines Corporation | High performance synchronization of resource allocation in a logically-partitioned system |
US7469407B2 (en) | 2003-04-24 | 2008-12-23 | International Business Machines Corporation | Method for resource balancing using dispatch flush in a simultaneous multithread processor |
US7055003B2 (en) | 2003-04-25 | 2006-05-30 | International Business Machines Corporation | Data cache scrub mechanism for large L2/L3 data cache structures |
US7007108B2 (en) | 2003-04-30 | 2006-02-28 | Lsi Logic Corporation | System method for use of hardware semaphores for resource release notification wherein messages comprises read-modify-write operation and address |
US7743238B2 (en) | 2003-05-09 | 2010-06-22 | Arm Limited | Accessing items of architectural state from a register cache in a data processing apparatus when performing branch prediction operations for an indirect branch instruction |
JP2005032018A (ja) | 2003-07-04 | 2005-02-03 | Semiconductor Energy Lab Co Ltd | Microprocessor using a genetic algorithm |
US7149872B2 (en) | 2003-07-10 | 2006-12-12 | Transmeta Corporation | System and method for identifying TLB entries associated with a physical address of a specified range |
US7089398B2 (en) | 2003-07-31 | 2006-08-08 | Silicon Graphics, Inc. | Address translation using a page size tag |
US8296771B2 (en) | 2003-08-18 | 2012-10-23 | Cray Inc. | System and method for mapping between resource consumers and resource providers in a computing system |
US7133950B2 (en) | 2003-08-19 | 2006-11-07 | Sun Microsystems, Inc. | Request arbitration in multi-core processor |
US9032404B2 (en) | 2003-08-28 | 2015-05-12 | Mips Technologies, Inc. | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor |
US7694304B2 (en) | 2003-08-28 | 2010-04-06 | Mips Technologies, Inc. | Mechanisms for dynamic configuration of virtual processor resources |
US7594089B2 (en) | 2003-08-28 | 2009-09-22 | Mips Technologies, Inc. | Smart memory based synchronization controller for a multi-threaded multiprocessor SoC |
US7849297B2 (en) | 2003-08-28 | 2010-12-07 | Mips Technologies, Inc. | Software emulation of directed exceptions in a multithreading processor |
US7111126B2 (en) | 2003-09-24 | 2006-09-19 | Arm Limited | Apparatus and method for loading data values |
JP4057989B2 (ja) | 2003-09-26 | 2008-03-05 | Toshiba Corporation | Scheduling method and information processing system |
US7047322B1 (en) | 2003-09-30 | 2006-05-16 | Unisys Corporation | System and method for performing conflict resolution and flow control in a multiprocessor system |
US7373637B2 (en) | 2003-09-30 | 2008-05-13 | International Business Machines Corporation | Method and apparatus for counting instruction and memory location ranges |
FR2860313B1 (fr) * | 2003-09-30 | 2005-11-04 | Commissariat Energie Atomique | Component with dynamically reconfigurable architecture |
TWI281121B (en) | 2003-10-06 | 2007-05-11 | Ip First Llc | Apparatus and method for selectively overriding return stack prediction in response to detection of non-standard return sequence |
US7395372B2 (en) | 2003-11-14 | 2008-07-01 | International Business Machines Corporation | Method and system for providing cache set selection which is power optimized |
US7243170B2 (en) | 2003-11-24 | 2007-07-10 | International Business Machines Corporation | Method and circuit for reading and writing an instruction buffer |
US20050120191A1 (en) | 2003-12-02 | 2005-06-02 | Intel Corporation (A Delaware Corporation) | Checkpoint-based register reclamation |
US20050132145A1 (en) | 2003-12-15 | 2005-06-16 | Finisar Corporation | Contingent processor time division multiple access of memory in a multi-processor system to allow supplemental memory consumer access |
US7310722B2 (en) | 2003-12-18 | 2007-12-18 | Nvidia Corporation | Across-thread out of order instruction dispatch in a multithreaded graphics processor |
US7293164B2 (en) | 2004-01-14 | 2007-11-06 | International Business Machines Corporation | Autonomic method and apparatus for counting branch instructions to generate branch statistics meant to improve branch predictions |
US20050204118A1 (en) | 2004-02-27 | 2005-09-15 | National Chiao Tung University | Method for inter-cluster communication that employs register permutation |
CA2558892A1 (en) | 2004-03-13 | 2005-09-29 | Cluster Resources, Inc. | System and method for a self-optimizing reservation in time of compute resources |
US7478374B2 (en) | 2004-03-22 | 2009-01-13 | Intel Corporation | Debug system having assembler correcting register allocation errors |
US20050216920A1 (en) | 2004-03-24 | 2005-09-29 | Vijay Tewari | Use of a virtual machine to emulate a hardware device |
WO2005093562A1 (ja) | 2004-03-29 | 2005-10-06 | Kyoto University | Data processing device, data processing program, and recording medium storing the data processing program |
US7383427B2 (en) | 2004-04-22 | 2008-06-03 | Sony Computer Entertainment Inc. | Multi-scalar extension for SIMD instruction set processors |
US20050251649A1 (en) | 2004-04-23 | 2005-11-10 | Sony Computer Entertainment Inc. | Methods and apparatus for address map optimization on a multi-scalar extension |
US7418582B1 (en) * | 2004-05-13 | 2008-08-26 | Sun Microsystems, Inc. | Versatile register file design for a multi-threaded processor utilizing different modes and register windows |
US7478198B2 (en) | 2004-05-24 | 2009-01-13 | Intel Corporation | Multithreaded clustered microarchitecture with dynamic back-end assignment |
US7594234B1 (en) | 2004-06-04 | 2009-09-22 | Sun Microsystems, Inc. | Adaptive spin-then-block mutual exclusion in multi-threaded processing |
US7284092B2 (en) | 2004-06-24 | 2007-10-16 | International Business Machines Corporation | Digital data processing apparatus having multi-level register file |
US20050289530A1 (en) | 2004-06-29 | 2005-12-29 | Robison Arch D | Scheduling of instructions in program compilation |
EP1628235A1 (en) | 2004-07-01 | 2006-02-22 | Texas Instruments Incorporated | Method and system of ensuring integrity of a secure mode entry sequence |
US8044951B1 (en) | 2004-07-02 | 2011-10-25 | Nvidia Corporation | Integer-based functionality in a graphics shading language |
US7339592B2 (en) | 2004-07-13 | 2008-03-04 | Nvidia Corporation | Simulating multiported memories using lower port count memories |
US7398347B1 (en) * | 2004-07-14 | 2008-07-08 | Altera Corporation | Methods and apparatus for dynamic instruction controlled reconfigurable register file |
EP1619593A1 (en) | 2004-07-22 | 2006-01-25 | Sap Ag | Computer-Implemented method and system for performing a product availability check |
JP4064380B2 (ja) | 2004-07-29 | 2008-03-19 | Fujitsu Limited | Arithmetic processing device and control method therefor |
US8443171B2 (en) | 2004-07-30 | 2013-05-14 | Hewlett-Packard Development Company, L.P. | Run-time updating of prediction hint instructions |
US7213106B1 (en) | 2004-08-09 | 2007-05-01 | Sun Microsystems, Inc. | Conservative shadow cache support in a point-to-point connected multiprocessing node |
US7318143B2 (en) * | 2004-10-20 | 2008-01-08 | Arm Limited | Reuseable configuration data |
US20090150890A1 (en) | 2007-12-10 | 2009-06-11 | Yourst Matt T | Strand-based computing hardware and dynamically optimizing strandware for a high performance microprocessor system |
US7707578B1 (en) | 2004-12-16 | 2010-04-27 | Vmware, Inc. | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US7257695B2 (en) * | 2004-12-28 | 2007-08-14 | Intel Corporation | Register file regions for a processing system |
US7996644B2 (en) | 2004-12-29 | 2011-08-09 | Intel Corporation | Fair sharing of a cache in a multi-core/multi-threaded processor by dynamically partitioning of the cache |
US8719819B2 (en) | 2005-06-30 | 2014-05-06 | Intel Corporation | Mechanism for instruction set based thread execution on a plurality of instruction sequencers |
US7050922B1 (en) | 2005-01-14 | 2006-05-23 | Agilent Technologies, Inc. | Method for optimizing test order, and machine-readable media storing sequences of instructions to perform same |
US20060179277A1 (en) | 2005-02-04 | 2006-08-10 | Flachs Brian K | System and method for instruction line buffer holding a branch target buffer |
US7657891B2 (en) | 2005-02-04 | 2010-02-02 | Mips Technologies, Inc. | Multithreading microprocessor with optimized thread scheduler for increasing pipeline utilization efficiency |
US7681014B2 (en) | 2005-02-04 | 2010-03-16 | Mips Technologies, Inc. | Multithreading instruction scheduler employing thread group priorities |
EP1849095B1 (en) * | 2005-02-07 | 2013-01-02 | Richter, Thomas | Low latency massive parallel data processing device |
US7400548B2 (en) | 2005-02-09 | 2008-07-15 | International Business Machines Corporation | Method for providing multiple reads/writes using a 2read/2write register file array |
US7343476B2 (en) | 2005-02-10 | 2008-03-11 | International Business Machines Corporation | Intelligent SMT thread hang detect taking into account shared resource contention/blocking |
US7152155B2 (en) | 2005-02-18 | 2006-12-19 | Qualcomm Incorporated | System and method of correcting a branch misprediction |
US20060200655A1 (en) | 2005-03-04 | 2006-09-07 | Smith Rodney W | Forward looking branch target address caching |
US20060212853A1 (en) | 2005-03-18 | 2006-09-21 | Marvell World Trade Ltd. | Real-time control apparatus having a multi-thread processor |
US8195922B2 (en) | 2005-03-18 | 2012-06-05 | Marvell World Trade, Ltd. | System for dynamically allocating processing time to multiple threads |
GB2424727B (en) | 2005-03-30 | 2007-08-01 | Transitive Ltd | Preparing instruction groups for a processor having a multiple issue ports |
US8522253B1 (en) | 2005-03-31 | 2013-08-27 | Guillermo Rozas | Hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches |
US7313775B2 (en) * | 2005-04-06 | 2007-12-25 | Lsi Corporation | Integrated circuit with relocatable processor hardmac |
US20060230243A1 (en) | 2005-04-06 | 2006-10-12 | Robert Cochran | Cascaded snapshots |
US8230423B2 (en) | 2005-04-07 | 2012-07-24 | International Business Machines Corporation | Multithreaded processor architecture with operational latency hiding |
US20060230409A1 (en) | 2005-04-07 | 2006-10-12 | Matteo Frigo | Multithreaded processor architecture with implicit granularity adaptation |
US20060230253A1 (en) | 2005-04-11 | 2006-10-12 | Lucian Codrescu | Unified non-partitioned register files for a digital signal processor operating in an interleaved multi-threaded environment |
US20060236074A1 (en) | 2005-04-14 | 2006-10-19 | Arm Limited | Indicating storage locations within caches |
US7437543B2 (en) | 2005-04-19 | 2008-10-14 | International Business Machines Corporation | Reducing the fetch time of target instructions of a predicted taken branch instruction |
US7461237B2 (en) | 2005-04-20 | 2008-12-02 | Sun Microsystems, Inc. | Method and apparatus for suppressing duplicative prefetches for branch target cache lines |
JP2008537268A (ja) * | 2005-04-22 | 2008-09-11 | Altrix Logic, Inc. | Array of data processing elements with variable-precision interconnect |
US8713286B2 (en) | 2005-04-26 | 2014-04-29 | Qualcomm Incorporated | Register files for a digital signal processor operating in an interleaved multi-threaded environment |
GB2426084A (en) | 2005-05-13 | 2006-11-15 | Agilent Technologies Inc | Updating data in a dual port memory |
US7861055B2 (en) | 2005-06-07 | 2010-12-28 | Broadcom Corporation | Method and system for on-chip configurable data ram for fast memory and pseudo associative caches |
US8010969B2 (en) | 2005-06-13 | 2011-08-30 | Intel Corporation | Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers |
WO2007027671A2 (en) | 2005-08-29 | 2007-03-08 | Searete Llc | Scheduling mechanism of a hierarchical processor including multiple parallel clusters |
US7765350B2 (en) | 2005-09-14 | 2010-07-27 | Koninklijke Philips Electronics N.V. | Method and system for bus arbitration |
US7350056B2 (en) * | 2005-09-27 | 2008-03-25 | International Business Machines Corporation | Method and apparatus for issuing instructions from an issue queue in an information handling system |
US7606975B1 (en) | 2005-09-28 | 2009-10-20 | Sun Microsystems, Inc. | Trace cache for efficient self-modifying code processing |
US7231106B2 (en) | 2005-09-30 | 2007-06-12 | Lucent Technologies Inc. | Apparatus for directing an optical signal from an input fiber to an output fiber within a high index host |
US7627735B2 (en) | 2005-10-21 | 2009-12-01 | Intel Corporation | Implementing vector memory operations |
US7613131B2 (en) | 2005-11-10 | 2009-11-03 | Citrix Systems, Inc. | Overlay network infrastructure |
US7681019B1 (en) | 2005-11-18 | 2010-03-16 | Sun Microsystems, Inc. | Executing functions determined via a collection of operations from translated instructions |
US7861060B1 (en) | 2005-12-15 | 2010-12-28 | Nvidia Corporation | Parallel data processing systems and methods using cooperative thread arrays and thread identifier values to determine processing behavior |
US7634637B1 (en) | 2005-12-16 | 2009-12-15 | Nvidia Corporation | Execution of parallel groups of threads with per-instruction serialization |
US7673111B2 (en) | 2005-12-23 | 2010-03-02 | Intel Corporation | Memory system with both single and consolidated commands |
US7770161B2 (en) | 2005-12-28 | 2010-08-03 | International Business Machines Corporation | Post-register allocation profile directed instruction scheduling |
US8423682B2 (en) | 2005-12-30 | 2013-04-16 | Intel Corporation | Address space emulation |
US20070186050A1 (en) | 2006-02-03 | 2007-08-09 | International Business Machines Corporation | Self prefetching L2 cache mechanism for data lines |
GB2435362B (en) | 2006-02-20 | 2008-11-26 | Cramer Systems Ltd | Method of configuring devices in a telecommunications network |
WO2007097019A1 (ja) | 2006-02-27 | 2007-08-30 | Fujitsu Limited | Cache control device and cache control method |
US7543282B2 (en) | 2006-03-24 | 2009-06-02 | Sun Microsystems, Inc. | Method and apparatus for selectively executing different executable code versions which are optimized in different ways |
EP2011018B1 (en) | 2006-04-12 | 2016-07-13 | Soft Machines, Inc. | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US7610571B2 (en) | 2006-04-14 | 2009-10-27 | Cadence Design Systems, Inc. | Method and system for simulating state retention of an RTL design |
US7577820B1 (en) | 2006-04-14 | 2009-08-18 | Tilera Corporation | Managing data in a parallel processing environment |
CN100485636C (zh) | 2006-04-24 | 2009-05-06 | Huawei Technologies Co., Ltd. | Debugging method and apparatus for model-driven carrier-grade service development |
US7804076B2 (en) | 2006-05-10 | 2010-09-28 | Taiwan Semiconductor Manufacturing Co., Ltd | Insulator for high current ion implanters |
US8145882B1 (en) | 2006-05-25 | 2012-03-27 | Mips Technologies, Inc. | Apparatus and method for processing template based user defined instructions |
US20080126771A1 (en) | 2006-07-25 | 2008-05-29 | Lei Chen | Branch Target Extension for an Instruction Cache |
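CN100495324C (zh) | 2006-07-27 | 2009-06-03 | Institute of Computing Technology, Chinese Academy of Sciences | Depth-first exception handling method in a complex instruction set architecture |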
US7904704B2 (en) | 2006-08-14 | 2011-03-08 | Marvell World Trade Ltd. | Instruction dispatching method and apparatus |
US8046775B2 (en) | 2006-08-14 | 2011-10-25 | Marvell World Trade Ltd. | Event-based bandwidth allocation mode switching method and apparatus |
US7539842B2 (en) | 2006-08-15 | 2009-05-26 | International Business Machines Corporation | Computer memory system for selecting memory buses according to physical memory organization information stored in virtual address translation tables |
US7594060B2 (en) | 2006-08-23 | 2009-09-22 | Sun Microsystems, Inc. | Data buffer allocation in a non-blocking data services platform using input/output switching fabric |
US7752474B2 (en) | 2006-09-22 | 2010-07-06 | Apple Inc. | L1 cache flush when processor is entering low power mode |
US7716460B2 (en) | 2006-09-29 | 2010-05-11 | Qualcomm Incorporated | Effective use of a BHT in processor having variable length instruction set execution modes |
US7774549B2 (en) | 2006-10-11 | 2010-08-10 | Mips Technologies, Inc. | Horizontally-shared cache victims in multiple core processors |
TWI337495B (en) | 2006-10-26 | 2011-02-11 | Au Optronics Corp | System and method for operation scheduling |
US8108625B1 (en) | 2006-10-30 | 2012-01-31 | Nvidia Corporation | Shared memory with parallel access and access conflict resolution mechanism |
US7680988B1 (en) | 2006-10-30 | 2010-03-16 | Nvidia Corporation | Single interconnect providing read and write access to a memory shared by concurrent threads |
US7617384B1 (en) | 2006-11-06 | 2009-11-10 | Nvidia Corporation | Structured programming control flow using a disable mask in a SIMD architecture |
EP2122461A4 (en) | 2006-11-14 | 2010-03-24 | Soft Machines Inc | Device and method for processing communications in a multithread architecture with context changes |
US7493475B2 (en) | 2006-11-15 | 2009-02-17 | Stmicroelectronics, Inc. | Instruction vector-mode processing in multi-lane processor by multiplex switch replicating instruction in one lane to select others along with updated operand address |
US7934179B2 (en) | 2006-11-20 | 2011-04-26 | Et International, Inc. | Systems and methods for logic verification |
US20080235500A1 (en) | 2006-11-21 | 2008-09-25 | Davis Gordon T | Structure for instruction cache trace formation |
JP2008130056A (ja) | 2006-11-27 | 2008-06-05 | Renesas Technology Corp | Semiconductor circuit |
WO2008077088A2 (en) | 2006-12-19 | 2008-06-26 | The Board Of Governors For Higher Education, State Of Rhode Island And Providence Plantations | System and method for branch misprediction prediction using complementary branch predictors |
US7783869B2 (en) | 2006-12-19 | 2010-08-24 | Arm Limited | Accessing branch predictions ahead of instruction fetching |
EP1940028B1 (en) | 2006-12-29 | 2012-02-29 | STMicroelectronics Srl | Asynchronous interconnection system for 3D inter-chip communication |
US8321849B2 (en) | 2007-01-26 | 2012-11-27 | Nvidia Corporation | Virtual architecture and instruction set for parallel thread computing |
TW200833002A (en) | 2007-01-31 | 2008-08-01 | Univ Nat Yunlin Sci & Tech | Distributed switching circuit having fairness |
US20080189501A1 (en) | 2007-02-05 | 2008-08-07 | Irish John D | Methods and Apparatus for Issuing Commands on a Bus |
US7685410B2 (en) | 2007-02-13 | 2010-03-23 | Global Foundries Inc. | Redirect recovery cache that receives branch misprediction redirects and caches instructions to be dispatched in response to the redirects |
US7647483B2 (en) | 2007-02-20 | 2010-01-12 | Sony Computer Entertainment Inc. | Multi-threaded parallel processor methods and apparatus |
US20080209190A1 (en) | 2007-02-28 | 2008-08-28 | Advanced Micro Devices, Inc. | Parallel prediction of multiple branches |
JP4980751B2 (ja) | 2007-03-02 | 2012-07-18 | Fujitsu Semiconductor Limited | Data processing device and memory read-active control method |
US8452907B2 (en) | 2007-03-27 | 2013-05-28 | Arm Limited | Data processing apparatus and method for arbitrating access to a shared resource |
US20080250227A1 (en) | 2007-04-04 | 2008-10-09 | Linderman Michael D | General Purpose Multiprocessor Programming Apparatus And Method |
US7716183B2 (en) | 2007-04-11 | 2010-05-11 | Dot Hill Systems Corporation | Snapshot preserved data cloning |
US7941791B2 (en) | 2007-04-13 | 2011-05-10 | Perry Wang | Programming environment for heterogeneous processor resource integration |
US7769955B2 (en) | 2007-04-27 | 2010-08-03 | Arm Limited | Multiple thread instruction fetch from different cache levels |
US7711935B2 (en) | 2007-04-30 | 2010-05-04 | Netlogic Microsystems, Inc. | Universal branch identifier for invalidation of speculative instructions |
US8555039B2 (en) | 2007-05-03 | 2013-10-08 | Qualcomm Incorporated | System and method for using a local condition code register for accelerating conditional instruction execution in a pipeline processor |
US8219996B1 (en) | 2007-05-09 | 2012-07-10 | Hewlett-Packard Development Company, L.P. | Computer processor with fairness monitor |
US9292436B2 (en) | 2007-06-25 | 2016-03-22 | Sonics, Inc. | Various methods and apparatus to support transactions whose data address sequence within that transaction crosses an interleaved channel address boundary |
CN101344840B (zh) | 2007-07-10 | 2011-08-31 | Suzhou Simplnano Electronics Co., Ltd. | Microprocessor and method for executing instructions in a microprocessor |
US7937568B2 (en) | 2007-07-11 | 2011-05-03 | International Business Machines Corporation | Adaptive execution cycle control method for enhanced instruction throughput |
US20090025004A1 (en) | 2007-07-16 | 2009-01-22 | Microsoft Corporation | Scheduling by Growing and Shrinking Resource Allocation |
US8108545B2 (en) | 2007-08-27 | 2012-01-31 | International Business Machines Corporation | Packet coalescing in virtual channels of a data processing system in a multi-tiered full-graph interconnect architecture |
US7711929B2 (en) | 2007-08-30 | 2010-05-04 | International Business Machines Corporation | Method and system for tracking instruction dependency in an out-of-order processor |
GB2452316B (en) | 2007-08-31 | 2009-08-19 | Toshiba Res Europ Ltd | Method of Allocating Resources in a Computer. |
US8725991B2 (en) * | 2007-09-12 | 2014-05-13 | Qualcomm Incorporated | Register file system and method for pipelined processing |
US8082420B2 (en) | 2007-10-24 | 2011-12-20 | International Business Machines Corporation | Method and apparatus for executing instructions |
US7856530B1 (en) | 2007-10-31 | 2010-12-21 | Network Appliance, Inc. | System and method for implementing a dynamic cache for a data storage system |
US7877559B2 (en) | 2007-11-26 | 2011-01-25 | Globalfoundries Inc. | Mechanism to accelerate removal of store operations from a queue |
US8245232B2 (en) | 2007-11-27 | 2012-08-14 | Microsoft Corporation | Software-configurable and stall-time fair memory access scheduling mechanism for shared memory systems |
US7809925B2 (en) | 2007-12-07 | 2010-10-05 | International Business Machines Corporation | Processing unit incorporating vectorizable execution unit |
US8145844B2 (en) | 2007-12-13 | 2012-03-27 | Arm Limited | Memory controller with write data cache and read data cache |
US7831813B2 (en) | 2007-12-17 | 2010-11-09 | Globalfoundries Inc. | Uses of known good code for implementing processor architectural modifications |
US7870371B2 (en) | 2007-12-17 | 2011-01-11 | Microsoft Corporation | Target-frequency based indirect jump prediction for high-performance processors |
US20090165007A1 (en) | 2007-12-19 | 2009-06-25 | Microsoft Corporation | Task-level thread scheduling and resource allocation |
US8782384B2 (en) | 2007-12-20 | 2014-07-15 | Advanced Micro Devices, Inc. | Branch history with polymorphic indirect branch information |
US7917699B2 (en) | 2007-12-21 | 2011-03-29 | Mips Technologies, Inc. | Apparatus and method for controlling the exclusivity mode of a level-two cache |
US9244855B2 (en) | 2007-12-31 | 2016-01-26 | Intel Corporation | Method, system, and apparatus for page sizing extension |
US8645965B2 (en) | 2007-12-31 | 2014-02-04 | Intel Corporation | Supporting metered clients with manycore through time-limited partitioning |
US7877582B2 (en) * | 2008-01-31 | 2011-01-25 | International Business Machines Corporation | Multi-addressable register file |
WO2009101563A1 (en) | 2008-02-11 | 2009-08-20 | Nxp B.V. | Multiprocessing implementing a plurality of virtual processors |
US9021240B2 (en) | 2008-02-22 | 2015-04-28 | International Business Machines Corporation | System and method for Controlling restarting of instruction fetching using speculative address computations |
US7987343B2 (en) | 2008-03-19 | 2011-07-26 | International Business Machines Corporation | Processor and method for synchronous load multiple fetching sequence and pipeline stage result tracking to facilitate early address generation interlock bypass |
US7949972B2 (en) | 2008-03-19 | 2011-05-24 | International Business Machines Corporation | Method, system and computer program product for exploiting orthogonal control vectors in timing driven synthesis |
US9513905B2 (en) | 2008-03-28 | 2016-12-06 | Intel Corporation | Vector instructions to enable efficient synchronization and parallel reduction operations |
US8120608B2 (en) | 2008-04-04 | 2012-02-21 | Via Technologies, Inc. | Constant buffering for a computational core of a programmable graphics processing unit |
TWI364703B (en) | 2008-05-26 | 2012-05-21 | Faraday Tech Corp | Processor and early execution method of data load thereof |
US8131982B2 (en) | 2008-06-13 | 2012-03-06 | International Business Machines Corporation | Branch prediction instructions having mask values involving unloading and loading branch history data |
US8145880B1 (en) | 2008-07-07 | 2012-03-27 | Ovics | Matrix processor data switch routing systems and methods |
JP5733860B2 (ja) | 2008-07-10 | 2015-06-10 | Rocketick Technologies Ltd. | Efficient parallel computation of dependency problems |
JP2010039536A (ja) | 2008-07-31 | 2010-02-18 | Panasonic Corp | Program conversion device, program conversion method, and program conversion program |
US8316435B1 (en) | 2008-08-14 | 2012-11-20 | Juniper Networks, Inc. | Routing device having integrated MPLS-aware firewall with virtual security system support |
US8135942B2 (en) | 2008-08-28 | 2012-03-13 | International Business Machines Corporation | System and method for double-issue instructions using a dependency matrix and a side issue queue |
US7769984B2 (en) | 2008-09-11 | 2010-08-03 | International Business Machines Corporation | Dual-issuance of microprocessor instructions using dual dependency matrices |
US8225048B2 (en) | 2008-10-01 | 2012-07-17 | Hewlett-Packard Development Company, L.P. | Systems and methods for resource access |
US9244732B2 (en) | 2009-08-28 | 2016-01-26 | Vmware, Inc. | Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution |
US7941616B2 (en) | 2008-10-21 | 2011-05-10 | Microsoft Corporation | System to reduce interference in concurrent programs |
US8423749B2 (en) | 2008-10-22 | 2013-04-16 | International Business Machines Corporation | Sequential processing in network on chip nodes by threads generating message containing payload and pointer for nanokernel to access algorithm to be executed on payload in another node |
GB2464703A (en) | 2008-10-22 | 2010-04-28 | Advanced Risc Mach Ltd | An array of interconnected processors executing a cycle-based program |
EP2351325B1 (en) | 2008-10-30 | 2018-09-26 | Nokia Technologies Oy | Method and apparatus for interleaving a data block |
US8032678B2 (en) | 2008-11-05 | 2011-10-04 | Mediatek Inc. | Shared resource arbitration |
US7848129B1 (en) | 2008-11-20 | 2010-12-07 | Netlogic Microsystems, Inc. | Dynamically partitioned CAM array |
US8868838B1 (en) | 2008-11-21 | 2014-10-21 | Nvidia Corporation | Multi-class data cache policies |
US8171223B2 (en) | 2008-12-03 | 2012-05-01 | Intel Corporation | Method and system to increase concurrency and control replication in a multi-core cache hierarchy |
US8200949B1 (en) | 2008-12-09 | 2012-06-12 | Nvidia Corporation | Policy based allocation of register file cache to threads in multi-threaded processor |
US8312268B2 (en) | 2008-12-12 | 2012-11-13 | International Business Machines Corporation | Virtual machine |
US7870308B2 (en) | 2008-12-23 | 2011-01-11 | International Business Machines Corporation | Programmable direct memory access engine |
US8099586B2 (en) | 2008-12-30 | 2012-01-17 | Oracle America, Inc. | Branch misprediction recovery mechanism for microprocessors |
US20100169578A1 (en) | 2008-12-31 | 2010-07-01 | Texas Instruments Incorporated | Cache tag memory |
US20100205603A1 (en) | 2009-02-09 | 2010-08-12 | Unisys Corporation | Scheduling and dispatching tasks in an emulated operating system |
JP5417879B2 (ja) | 2009-02-17 | 2014-02-19 | Fujitsu Semiconductor Limited | Cache device |
JP2010226275A (ja) | 2009-03-23 | 2010-10-07 | Nec Corp | Communication device and communication method |
US8505013B2 (en) | 2010-03-12 | 2013-08-06 | Lsi Corporation | Reducing data read latency in a network communications processor architecture |
US8805788B2 (en) | 2009-05-04 | 2014-08-12 | Moka5, Inc. | Transactional virtual disk with differential snapshots |
US8332854B2 (en) | 2009-05-19 | 2012-12-11 | Microsoft Corporation | Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups |
US8533437B2 (en) | 2009-06-01 | 2013-09-10 | Via Technologies, Inc. | Guaranteed prefetch instruction |
GB2471067B (en) | 2009-06-12 | 2011-11-30 | Graeme Roy Smith | Shared resource multi-thread array processor |
US9122487B2 (en) | 2009-06-23 | 2015-09-01 | Oracle America, Inc. | System and method for balancing instruction loads between multiple execution units using assignment history |
US8386754B2 (en) | 2009-06-24 | 2013-02-26 | Arm Limited | Renaming wide register source operand with plural short register source operands for select instructions to detect dependency fast with existing mechanism |
CN101582025B (zh) | 2009-06-25 | 2011-05-25 | Zhejiang University | Method for implementing a global register renaming table in a chip multiprocessor architecture |
US8397049B2 (en) | 2009-07-13 | 2013-03-12 | Apple Inc. | TLB prefetching |
US8539486B2 (en) | 2009-07-17 | 2013-09-17 | International Business Machines Corporation | Transactional block conflict resolution based on the determination of executing threads in parallel or in serial mode |
JP5423217B2 (ja) | 2009-08-04 | 2014-02-19 | Fujitsu Limited | Arithmetic processing device, information processing device, and control method for arithmetic processing device |
US8127078B2 (en) | 2009-10-02 | 2012-02-28 | International Business Machines Corporation | High performance unaligned cache access |
US20110082983A1 (en) | 2009-10-06 | 2011-04-07 | Alcatel-Lucent Canada, Inc. | Cpu instruction and data cache corruption prevention system |
US8695002B2 (en) | 2009-10-20 | 2014-04-08 | Lantiq Deutschland Gmbh | Multi-threaded processors and multi-processor systems comprising shared resources |
US8364933B2 (en) | 2009-12-18 | 2013-01-29 | International Business Machines Corporation | Software assisted translation lookaside buffer search mechanism |
JP2011150397A (ja) | 2010-01-19 | 2011-08-04 | Panasonic Corp | Bus arbitration device |
KR101699910B1 (ko) | 2010-03-04 | 2017-01-26 | Samsung Electronics Co., Ltd. | Reconfigurable processor and control method thereof |
US20120005462A1 (en) | 2010-07-01 | 2012-01-05 | International Business Machines Corporation | Hardware Assist for Optimizing Code During Processing |
US8312258B2 (en) | 2010-07-22 | 2012-11-13 | Intel Corporation | Providing platform independent memory logic |
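CN101916180B (zh) | 2010-08-11 | 2013-05-29 | Institute of Computing Technology, Chinese Academy of Sciences | Method and system for executing register-type instructions in a RISC processor |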
US8751745B2 (en) | 2010-08-11 | 2014-06-10 | Advanced Micro Devices, Inc. | Method for concurrent flush of L1 and L2 caches |
US9201801B2 (en) | 2010-09-15 | 2015-12-01 | International Business Machines Corporation | Computing device with asynchronous auxiliary execution unit |
US8856460B2 (en) | 2010-09-15 | 2014-10-07 | Oracle International Corporation | System and method for zero buffer copying in a middleware environment |
EP2616928B1 (en) | 2010-09-17 | 2016-11-02 | Soft Machines, Inc. | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
US20120079212A1 (en) | 2010-09-23 | 2012-03-29 | International Business Machines Corporation | Architecture for sharing caches among multiple processes |
EP3306466B1 (en) | 2010-10-12 | 2020-05-13 | INTEL Corporation | An instruction sequence buffer to store branches having reliably predictable instruction sequences |
CN107092467B (zh) | 2010-10-12 | 2021-10-29 | Intel Corporation | Instruction sequence buffer for enhancing branch prediction efficiency |
US8370553B2 (en) | 2010-10-18 | 2013-02-05 | International Business Machines Corporation | Formal verification of random priority-based arbiters using property strengthening and underapproximations |
US9047178B2 (en) | 2010-12-13 | 2015-06-02 | SanDisk Technologies, Inc. | Auto-commit memory synchronization |
US8677355B2 (en) | 2010-12-17 | 2014-03-18 | Microsoft Corporation | Virtual machine branching and parallel execution |
WO2012103245A2 (en) | 2011-01-27 | 2012-08-02 | Soft Machines Inc. | Guest instruction block with near branching and far branching sequence construction to native instruction block |
KR101638225B1 (ko) | 2011-03-25 | 2016-07-08 | Soft Machines, Inc. | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
EP2689326B1 (en) | 2011-03-25 | 2022-11-16 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103562866B (zh) | 2011-03-25 | 2018-03-30 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US20120254592A1 (en) | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | Systems, apparatuses, and methods for expanding a memory source into a destination register and compressing a source register into a destination memory location |
US9740494B2 (en) | 2011-04-29 | 2017-08-22 | Arizona Board Of Regents For And On Behalf Of Arizona State University | Low complexity out-of-order issue logic using static circuits |
US8843690B2 (en) | 2011-07-11 | 2014-09-23 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Memory conflicts learning capability |
US8930432B2 (en) | 2011-08-04 | 2015-01-06 | International Business Machines Corporation | Floating point execution unit with fixed point functionality |
US20130046934A1 (en) | 2011-08-15 | 2013-02-21 | Robert Nychka | System caching using heterogenous memories |
US8839025B2 (en) | 2011-09-30 | 2014-09-16 | Oracle International Corporation | Systems and methods for retiring and unretiring cache lines |
WO2013077876A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | A microprocessor accelerated code optimizer |
KR101703401B1 (ko) | 2011-11-22 | 2017-02-06 | Soft Machines, Inc. | An accelerated code optimizer for a multi-engine microprocessor |
KR101648278B1 (ko) | 2011-11-22 | 2016-08-12 | Soft Machines, Inc. | A microprocessor accelerated code optimizer and dependency reordering method |
US20130138888A1 (en) | 2011-11-30 | 2013-05-30 | Jama I. Barreh | Storing a target address of a control transfer instruction in an instruction field |
US8930674B2 (en) | 2012-03-07 | 2015-01-06 | Soft Machines, Inc. | Systems and methods for accessing a unified translation lookaside buffer |
KR20130119285A (ko) | 2012-04-23 | 2013-10-31 | Electronics and Telecommunications Research Institute | Apparatus and method for allocating resources in a cluster computing environment |
US9684601B2 (en) | 2012-05-10 | 2017-06-20 | Arm Limited | Data processing apparatus having cache and translation lookaside buffer |
US9996348B2 (en) | 2012-06-14 | 2018-06-12 | Apple Inc. | Zero cycle load |
US9940247B2 (en) | 2012-06-26 | 2018-04-10 | Advanced Micro Devices, Inc. | Concurrent access to cache dirty bits |
US9710399B2 (en) | 2012-07-30 | 2017-07-18 | Intel Corporation | Systems and methods for flushing a cache with modified data |
US9430410B2 (en) | 2012-07-30 | 2016-08-30 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load accesses of a cache in a single cycle |
US9916253B2 (en) | 2012-07-30 | 2018-03-13 | Intel Corporation | Method and apparatus for supporting a plurality of load accesses of a cache in a single cycle to maintain throughput |
US9229873B2 (en) | 2012-07-30 | 2016-01-05 | Soft Machines, Inc. | Systems and methods for supporting a plurality of load and store accesses of a cache |
US9740612B2 (en) | 2012-07-30 | 2017-08-22 | Intel Corporation | Systems and methods for maintaining the coherency of a store coalescing cache and a load cache |
US9678882B2 (en) | 2012-10-11 | 2017-06-13 | Intel Corporation | Systems and methods for non-blocking implementation of cache flush instructions |
US10037228B2 (en) | 2012-10-25 | 2018-07-31 | Nvidia Corporation | Efficient memory virtualization in multi-threaded processing units |
US9195506B2 (en) | 2012-12-21 | 2015-11-24 | International Business Machines Corporation | Processor provisioning by a middleware processing system for a plurality of logical processor partitions |
GB2514956B (en) | 2013-01-21 | 2015-04-01 | Imagination Tech Ltd | Allocating resources to threads based on speculation metric |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
US9112767B2 (en) | 2013-03-15 | 2015-08-18 | Cavium, Inc. | Method and an accumulator scoreboard for out-of-order rule response handling |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
WO2014150941A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates |
CN105210040B (zh) | 2013-03-15 | 2019-04-02 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
CN105247484B (zh) | 2013-03-15 | 2021-02-23 | Intel Corporation | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US9632825B2 (en) | 2013-03-15 | 2017-04-25 | Intel Corporation | Method and apparatus for efficient scheduling for asymmetrical execution units |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
US9208066B1 (en) | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
2007
- 2007-04-12: EP application EP07811845.2A, patent EP2011018B1 (status: Active)
- 2007-04-12: EP application EP12150513.5A, patent EP2477109B1 (status: Active)
- 2007-04-12: CN application CN201310589048.6A, patent CN103646009B (status: Active)
- 2007-04-12: US application US12/296,919, patent US8327115B2 (status: Active)
- 2007-04-12: WO application PCT/US2007/066536, publication WO2007143278A2 (status: Application Filing)
- 2007-04-12: CN application CN2007800173174A, patent CN101449256B (status: Active)
2012
- 2012-11-30: US application US13/691,609, patent US9053292B2 (status: Expired - Fee Related)
2015
- 2015-06-08: US application US14/733,827, patent US9886416B2 (status: Active)
2017
- 2017-12-22: US application US15/853,323, patent US10289605B2 (status: Active)
2019
- 2019-04-01: US application US16/371,831, patent US11163720B2 (status: Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4835680A (en) * | 1985-03-15 | 1989-05-30 | Xerox Corporation | Adaptive processor array capable of learning variable associations useful in recognizing classes of inputs |
US4943909A (en) * | 1987-07-08 | 1990-07-24 | At&T Bell Laboratories | Computational origami |
WO2004114128A2 (en) * | 2003-06-25 | 2004-12-29 | Koninklijke Philips Electronics N.V. | Instruction controlled data processing device |
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10289605B2 (en) | 2006-04-12 | 2019-05-14 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US9886416B2 (en) | 2006-04-12 | 2018-02-06 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US11163720B2 (en) | 2006-04-12 | 2021-11-02 | Intel Corporation | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations |
US10585670B2 (en) | 2006-11-14 | 2020-03-10 | Intel Corporation | Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer |
US9965281B2 (en) | 2006-11-14 | 2018-05-08 | Intel Corporation | Cache storing data fetched by address calculating load instruction with label used as associated name for consuming instruction to refer |
US10228949B2 (en) | 2010-09-17 | 2019-03-12 | Intel Corporation | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
US9766893B2 (en) | 2011-03-25 | 2017-09-19 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
US10564975B2 (en) | 2011-03-25 | 2020-02-18 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9921845B2 (en) | 2011-03-25 | 2018-03-20 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9990200B2 (en) | 2011-03-25 | 2018-06-05 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
US9842005B2 (en) | 2011-03-25 | 2017-12-12 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US9934072B2 (en) | 2011-03-25 | 2018-04-03 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US11204769B2 (en) | 2011-03-25 | 2021-12-21 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
CN103562866B (zh) * | 2011-03-25 | 2018-03-30 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
US10372454B2 (en) | 2011-05-20 | 2019-08-06 | Intel Corporation | Allocation of a segmented interconnect to support the execution of instruction sequences by a plurality of engines |
US10031784B2 (en) | 2011-05-20 | 2018-07-24 | Intel Corporation | Interconnect system to support the execution of instruction sequences by a plurality of partitionable engines |
US9940134B2 (en) | 2011-05-20 | 2018-04-10 | Intel Corporation | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
US10521239B2 (en) | 2011-11-22 | 2019-12-31 | Intel Corporation | Microprocessor accelerated code optimizer |
US10191746B2 (en) | 2011-11-22 | 2019-01-29 | Intel Corporation | Accelerated code optimizer for a multiengine microprocessor |
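CN104583975A (zh) * | 2012-06-15 | 2015-04-29 | Soft Machines, Inc. | Disambiguation-free out of order load store queue |
CN104583975B (zh) * | 2012-06-15 | 2017-07-14 | Intel Corporation | Disambiguation-free out of order load store queue |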
US10592300B2 (en) | 2012-06-15 | 2020-03-17 | Intel Corporation | Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
US9928121B2 (en) | 2012-06-15 | 2018-03-27 | Intel Corporation | Method and system for implementing recovery from speculative forwarding miss-predictions/errors resulting from load store reordering and optimization |
US9965277B2 (en) | 2012-06-15 | 2018-05-08 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a unified structure |
US9990198B2 (en) | 2012-06-15 | 2018-06-05 | Intel Corporation | Instruction definition to implement load store reordering and optimization |
US9904552B2 (en) | 2012-06-15 | 2018-02-27 | Intel Corporation | Virtual load store queue having a dynamic dispatch window with a distributed structure |
US10019263B2 (en) | 2012-06-15 | 2018-07-10 | Intel Corporation | Reordered speculative instruction sequences with a disambiguation-free out of order load store queue |
US10048964B2 (en) | 2012-06-15 | 2018-08-14 | Intel Corporation | Disambiguation-free out of order load store queue |
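CN105074657B (zh) * | 2013-01-29 | 2018-11-30 | Advanced Micro Devices, Inc. | Hardware and software solutions to divergent branches in a parallel pipeline |
CN105074657A (zh) * | 2013-01-29 | 2015-11-18 | Advanced Micro Devices, Inc. | Hardware and software solutions to divergent branches in a parallel pipeline |
CN104008216B (zh) * | 2013-02-22 | 2017-04-26 | M31 Technology Corporation | Method for using a memory compiler to generate optimized memory instances |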
US10169045B2 (en) | 2013-03-15 | 2019-01-01 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US10503514B2 (en) | 2013-03-15 | 2019-12-10 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10146576B2 (en) | 2013-03-15 | 2018-12-04 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US10146548B2 (en) | 2013-03-15 | 2018-12-04 | Intel Corporation | Method for populating a source view data structure by using register template snapshots |
US9823930B2 (en) | 2013-03-15 | 2017-11-21 | Intel Corporation | Method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US11656875B2 (en) | 2013-03-15 | 2023-05-23 | Intel Corporation | Method and system for instruction block to execution unit grouping |
US10198266B2 (en) | 2013-03-15 | 2019-02-05 | Intel Corporation | Method for populating register view data structure by using register template snapshots |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US10248570B2 (en) | 2013-03-15 | 2019-04-02 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US10255076B2 (en) | 2013-03-15 | 2019-04-09 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
US9811377B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for executing multithreaded instructions grouped into blocks |
US9858080B2 (en) | 2013-03-15 | 2018-01-02 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US9934042B2 (en) | 2013-03-15 | 2018-04-03 | Intel Corporation | Method for dependency broadcasting through a block organized source view data structure |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9898412B2 (en) | 2013-03-15 | 2018-02-20 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US10740126B2 (en) | 2013-03-15 | 2020-08-11 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
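CN107305538B (zh) * | 2016-04-22 | 2020-07-31 | Cambricon Technologies Corporation Limited | Sub-matrix operation device and method |
CN107305538A (zh) * | 2016-04-22 | 2017-10-31 | Beijing Zhongke Cambricon Technology Co., Ltd. | Sub-matrix operation device and method |
CN108121688B (zh) * | 2017-12-15 | 2020-06-23 | Cambricon Technologies Corporation Limited | Computing method and related product |
CN108121688A (zh) * | 2017-12-15 | 2018-06-05 | Beijing Zhongke Cambricon Technology Co., Ltd. | Computing method and related product |
CN112596869A (zh) * | 2020-12-08 | 2021-04-02 | Hygon Information Technology Co., Ltd. | Method and apparatus for constructing a delay model, electronic device, and storage medium |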
Also Published As
Publication number | Publication date |
---|---|
EP2477109A1 (en) | 2012-07-18 |
EP2011018A2 (en) | 2009-01-07 |
US20190227982A1 (en) | 2019-07-25 |
US9053292B2 (en) | 2015-06-09 |
US20180137081A1 (en) | 2018-05-17 |
US20150269118A1 (en) | 2015-09-24 |
US20090113170A1 (en) | 2009-04-30 |
CN101449256B (zh) | 2013-12-25 |
CN103646009B (zh) | 2016-08-17 |
CN103646009A (zh) | 2014-03-19 |
WO2007143278A2 (en) | 2007-12-13 |
US8327115B2 (en) | 2012-12-04 |
EP2011018B1 (en) | 2016-07-13 |
EP2011018A4 (en) | 2009-12-02 |
US20130091340A1 (en) | 2013-04-11 |
US11163720B2 (en) | 2021-11-02 |
WO2007143278A3 (en) | 2008-10-30 |
US9886416B2 (en) | 2018-02-06 |
EP2477109B1 (en) | 2016-07-13 |
US10289605B2 (en) | 2019-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101449256B (zh) | Apparatus and method for processing an instruction matrix specifying parallel and dependent operations | |
Weaver et al. | Post-placement C-slow retiming for the Xilinx Virtex FPGA | |
CN100538628C (zh) | System and method for processing thread groups in a SIMD architecture | |
US6609189B1 (en) | Cycle segmented prefix circuits | |
US20230325195A1 (en) | Replicating logic blocks to enable increased throughput with sequential enabling of input register blocks | |
Minervini et al. | Vitruvius+: an area-efficient RISC-V decoupled vector coprocessor for high performance computing applications | |
Gunadi et al. | CRIB: Consolidated rename, issue, and bypass | |
Tai et al. | Accelerating matrix operations with improved deeply pipelined vector reduction | |
Uhrig et al. | A two-dimensional superscalar processor architecture | |
Tai et al. | Scalable matrix decompositions with multiple cores on FPGAs | |
Flachs et al. | Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI | |
Mesa-Martínez et al. | Seed: scalable, efficient enforcement of dependences | |
Vilim | In-Database Machine Learning on Reconfigurable Dataflow Accelerators | |
Snelling | The design and analysis of a Stateless Data-Flow Architecture | |
Tai | Accelerating scientific applications on reconfigurable computing systems | |
Sano | Microparallel processors | |
Schaffer et al. | Using hardware multithreading to overcome broadcast/reduction latency in an associative SIMD processor | |
Chen et al. | A Novel Highly Scalable Architecture with Partially Distributed Pipeline and Hardware/Software Instruction Encoding | |
Sethumadhavan | Scalable hardware memory disambiguation | |
Brown | Reducing critical path execution time by breaking critical loops | |
Wu et al. | Fast custom instruction generation under area constraint | |
González et al. | Execute | |
Yang et al. | Reconfiguration support for vector operations | |
Zang et al. | Issue mechanism for embedded simultaneous multithreading processor | |
Ortancil | Analysis of design issues in superscalar machines |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2017-03-17. Address after: California, USA. Patentee after: Intel Corporation. Address before: California, USA. Patentee before: Soft Machines Inc. |