CN103221938B - Method and apparatus for moving data - Google Patents
- Publication number: CN103221938B (application number CN201180055810.1A)
- Authority: CN (China)
- Prior art keywords: simd, data, logical block, unit, processor
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications (all under G—PHYSICS > G06—COMPUTING; CALCULATING OR COUNTING > G06F—ELECTRIC DIGITAL DATA PROCESSING)
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8053—Vector processors
- G06F8/40—Transformation of program code
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30054—Unconditional branch instructions
- G06F9/30101—Special purpose registers
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/323—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for indirect branch instructions
- G06F9/355—Indexed addressing
- G06F9/3552—Indexed addressing using wraparound, e.g. modulo or circular addressing
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
- G06F9/3887—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
- G06F9/38873—Iterative single instructions for multiple data lanes [SIMD]
- G06F9/38875—Iterative single instructions for multiple data lanes [SIMD] for adaptable or variable architectural vector length
- G06F9/3888—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Multi Processors (AREA)
- Image Processing (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
- Complex Calculations (AREA)
- Debugging And Monitoring (AREA)
Abstract
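A method is provided for moving data from a first register file in a compute unit (808-i) to a second register file in a processor (1410). The state of a signal on a data move lead (risc_is_mtvvr) is changed to indicate a data move instruction from the first register file in the compute unit to the second register file in the processor (1410). A lane address is provided from the processor to the compute unit over a first address lead (risc_is_ua). A read address is provided from the processor to the compute unit over a second address lead (risc_is_ra), and data is transferred from the first register file in the compute unit to the second register file in the processor over a data interface lead (node_regf_rd).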
Description
Technical Field
The present disclosure relates generally to processors and, more particularly, to processing clusters.
Background
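FIG. 1 is a graph depicting the speedup in execution rate versus parallel overhead for multi-core systems (ranging from 2 to 16 cores), where speedup is the single-processor execution time divided by the parallel-processor execution time. As can be seen, the parallel overhead must be close to zero to obtain a significant benefit from a large number of cores. However, because the overhead tends to be very high if there is any interaction between parallel programs, it is normally difficult to use more than one or two processors efficiently for anything other than completely separate programs. Therefore, there is a need for an improved processing cluster.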
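The trend in FIG. 1 can be illustrated with a simple model. The sketch below assumes (this model is not given in the text) that the parallel execution time is the single-processor time split evenly across N cores plus a fixed overhead expressed as a fraction of the single-processor time; speedup is then the ratio defined above. The function name and the overhead parameterization are hypothetical.

```python
def speedup(cores: int, overhead: float) -> float:
    """Speedup = T_single / T_parallel under an assumed overhead model.

    T_single is normalized to 1.0 and T_parallel = 1/cores + overhead,
    where `overhead` is a hypothetical fraction of T_single spent on
    interaction between the parallel programs.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if overhead < 0:
        raise ValueError("overhead must be non-negative")
    t_parallel = 1.0 / cores + overhead
    return 1.0 / t_parallel


if __name__ == "__main__":
    # Sweep part of the 2..16-core range of FIG. 1 for a few overhead values.
    for ov in (0.0, 0.05, 0.25):
        row = [f"{speedup(n, ov):5.2f}" for n in (2, 4, 8, 16)]
        print(f"overhead={ov:.2f}:", " ".join(row))
```

Under this model, zero overhead gives a 16x speedup on 16 cores, while an overhead of 25% of the single-processor time caps 16 cores at 3.2x, which is the point the background makes about overhead needing to be near zero.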
Summary
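Accordingly, an embodiment of the present invention provides a method. The method is characterized by: changing the state of a signal on a data move lead (risc_is_mtvvr) to indicate a data move instruction from a first register file (4358-1 to 4358-8, 7902) in a compute unit (4308-1 to 4308-M, 7607-1 to 7607-P) to a second register file (5206) in a processor (4322, 7614); providing a lane address from the processor (4322, 7614) to the compute unit (4308-1 to 4308-M, 7607-1 to 7607-P) over a first address lead (risc_is_ua); providing a read address from the processor (4322, 7614) to the compute unit (4308-1 to 4308-M, 7607-1 to 7607-P) over a second address lead (risc_is_ra); and transferring data from the first register file (4358-1 to 4358-8, 7902) in the compute unit (4308-1 to 4308-M, 7607-1 to 7607-P) to the second register file (5206) in the processor (4322, 7614) over a data interface lead (node_regf_rd).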
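The four steps of the method can be sketched as a purely behavioural model. Only the lead names (risc_is_mtvvr, risc_is_ua, risc_is_ra, node_regf_rd), the 32-register SIMD register file, and the 16-entry processor register file come from the text; the class and method names, the choice of 8 lanes, and the assumption that node_regf_rd reads as zero when no move is signalled are hypothetical. No timing or pipelining is modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeUnit:
    """Hypothetical model of the compute unit's first register file.

    regs[lane][index] holds one 32-bit value per SIMD lane and register.
    """
    lanes: int = 8
    regs_per_lane: int = 32
    regs: List[List[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.regs:
            self.regs = [[0] * self.regs_per_lane for _ in range(self.lanes)]

    def drive_read_data(self, risc_is_mtvvr: bool, risc_is_ua: int, risc_is_ra: int) -> int:
        """Return the value driven onto node_regf_rd for the addressed lane/register."""
        if not risc_is_mtvvr:
            return 0  # assumption: the data lead reads as zero when no move is signalled
        return self.regs[risc_is_ua][risc_is_ra] & 0xFFFFFFFF


@dataclass
class Processor:
    """Hypothetical model of the processor's 16-entry second register file."""
    gpr: List[int] = field(default_factory=lambda: [0] * 16)

    def move_from_simd(self, unit: ComputeUnit, lane: int, src: int, dst: int) -> None:
        # Step 1: change the state of the data move lead.
        risc_is_mtvvr = True
        # Steps 2 and 3: drive the lane address and read address leads.
        risc_is_ua, risc_is_ra = lane, src
        # Step 4: capture node_regf_rd into the destination register.
        node_regf_rd = unit.drive_read_data(risc_is_mtvvr, risc_is_ua, risc_is_ra)
        self.gpr[dst] = node_regf_rd
```

For example, after `cu.regs[3][5] = 0x1234`, calling `Processor().move_from_simd(cu, lane=3, src=5, dst=2)` leaves 0x1234 in register 2 of the processor.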
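Brief Description of the Drawings
FIG. 1 is a graph of multi-core speedup parameters;
FIG. 2 is a diagram of a system in accordance with an embodiment of the present disclosure;
FIG. 3 is a diagram of an SOC in accordance with an embodiment of the present disclosure;
FIG. 4 is a diagram of a parallel processing cluster in accordance with an embodiment of the present disclosure;
FIGS. 5 and 6 are diagrams of a portion of a node or computing element in the processing cluster;
FIG. 7 is a block diagram of the shared function memory;
FIG. 8 is a diagram of the SIMD data paths of the shared function memory;
FIG. 9 is a diagram of a portion of one SIMD data path;
FIG. 10 is a more detailed diagram of a node processor or RISC processor; and
FIGS. 11 and 12 are diagrams of examples of portions of the pipeline of the node processor or RISC processor.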
Detailed Description
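An example of an application of an SOC that performs parallel processing is shown in FIG. 2. In this example, an imaging device 1250 is shown. The imaging device 1250 (which can, for example, be a mobile phone or camera) generally comprises an image sensor 1252, an SOC 1300, a dynamic random access memory (DRAM) 1315, a flash memory 1314, a display 1254, and a power management integrated circuit (PMIC) 1256. In operation, the image sensor 1252 captures image information (which can be still images or video) that can be processed by the SOC 1300 and DRAM 1315 and stored in nonvolatile memory (namely, the flash memory 1314). Additionally, image information stored in the flash memory 1314 can be displayed to the user on the display 1254 by using the SOC 1300 and DRAM 1315. Also, imaging devices 1250 are often portable and include a battery as a power supply; the PMIC 1256 (which can be controlled by the SOC 1300) can help regulate power usage to extend battery life.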
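In FIG. 3, an example of a system-on-chip or SOC 1300 is depicted in accordance with an embodiment of the present disclosure. The SOC 1300 (which is typically an integrated circuit or IC, such as an OMAP™) generally comprises a processing cluster 1400 (which generally performs the parallel processing described above) and a host processor 1316 that provides the hosted environment (described and referenced above). The host processor 1316 can be a wide (i.e., 32-bit, 64-bit, etc.) RISC processor (such as an ARM Cortex-A9) and communicates over the host processor bus or HP bus 1328 with the bus arbitrator 1310, the buffer 1306, the bus bridge 1320 (which allows the host processor 1316 to access the peripheral interface 1324 over the interface bus or Ibus 1330), the hardware application programming interface (API) 1308, and the interrupt controller 1322. The processing cluster 1400 typically communicates over the processing cluster bus or PC bus 1326 with the functional circuitry 1302 (which can, for example, be a charge-coupled device or CCD interface and which can communicate with off-chip devices), the buffer 1306, the bus arbitrator 1310, and the peripheral interface 1324. With this configuration, the host processor 1316 is able to provide information through the API 1308 (i.e., configure the processing cluster 1400 to conform to a desired parallel implementation), while both the processing cluster 1400 and the host processor 1316 can directly access the flash memory 1256 (through the flash interface 1312) and the DRAM 1254 (through the memory controller 1304). Additionally, test and boundary scan can be performed through the Joint Test Action Group (JTAG) interface 1318.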
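Turning to FIG. 4, an example of the parallel processing cluster 1400 is depicted in accordance with an embodiment of the present disclosure. The processing cluster 1400 generally includes partitions 1402-1 to 1402-R, which can include nodes 808-1 to 808-N, node wrappers 810-1 to 810-N, instruction memories (IMEM) 1404-1 to 1404-R, and bus interface units (BIUs) 4710-1 to 4710-R (which are discussed in detail below). The nodes 808-1 to 808-N are each coupled to the data interconnect 814 (through their respective BIUs 4710-1 to 4710-R and the data bus 1422), and control or messages for the partitions 1402-1 to 1402-R are provided from the control node 1406 through messages 1420. The global load/store (GLS) unit 1408 and the shared function memory 1410 also provide additional functionality for data movement (as described below). Additionally, a level 3 or L3 cache 1412, peripherals 1414 (which are generally not included within the IC), memory 1416 (which is typically the flash memory 1256 and/or DRAM 1254, as well as other memory that is not included within the SOC 1300), and hardware accelerator (HWA) units 1418 are used with the processing cluster 1400. An interface 1405 can also be provided to communicate data and addresses to the control node 1406.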
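The processing cluster 1400 generally uses a "push" model for data transfers. Transfers generally appear as posted writes rather than as request-response accesses. Compared with request-response accesses, this halves the occupancy of the global interconnect (i.e., the data interconnect 814), because data transfers are unidirectional. It is generally undesirable to route a request through the interconnect 814 and then route the response back to the requester, which results in two transits of the interconnect 814. The push model produces a single transfer. This is important for scalability, because network latency increases as network size increases, and this invariably reduces the performance of request-response transactions.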
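The push model, together with the dataflow protocol (i.e., 812-1 to 812-N), generally minimizes global data traffic to the traffic required for correctness, while also generally minimizing the effect of global dataflow on local node utilization. There is normally little or no impact on node (i.e., 808-i) performance, even with a large amount of global traffic. Sources write data into global output buffers (discussed below) and continue without requiring an acknowledgement of transfer success. The dataflow protocol (i.e., 812-1 to 812-N) generally ensures that the transfer succeeds on the first attempt to move data to the destination, with a single transfer over the interconnect 814. The global output buffers (which are discussed below) can hold up to 16 outputs (for example), making it unlikely that a node (i.e., 808-i) stalls because of insufficient instantaneous global bandwidth for output. Furthermore, the instantaneous bandwidth is not affected by request-response transactions or by retries of unsuccessful transfers.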
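The posted-write behaviour described above can be sketched as a bounded queue: the source enqueues an output and continues immediately, and it stalls only when the buffer is full. The 16-entry capacity and the absence of acknowledgements come from the text; the class name, method names, and the drain policy are hypothetical.

```python
from collections import deque


class GlobalOutputBuffer:
    """Sketch of a posted-write global output buffer (16 entries by example).

    The source node posts a write and continues without waiting for an
    acknowledgement; it only has to stall when every entry is occupied.
    """

    def __init__(self, capacity: int = 16) -> None:
        self.capacity = capacity
        self.entries = deque()

    def post(self, dest: int, data: int) -> bool:
        """Post a write; return False if the node would have to stall."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((dest, data))
        return True

    def drain(self, max_transfers: int) -> list:
        """Send up to max_transfers writes; each is a single interconnect transit."""
        sent = []
        while self.entries and len(sent) < max_transfers:
            sent.append(self.entries.popleft())
        return sent
```

Because each drained entry crosses the interconnect once, moving N outputs costs N transits, compared with 2N for a request-response scheme.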
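Finally, the push model more closely matches the programming model, namely that programs do not "fetch" their own data. Instead, their input variables and/or parameters are written before they are invoked. In the programming environment, the initialization of input variables appears as writes into memory by the source program. In the processing cluster 1400, these writes are converted into posted writes that populate the values of the variables in the node context.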
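The global input buffers (which are discussed below) are used to receive data from source nodes. Because the data memory (DMEM) for each node 808-1 to 808-N is single-ported, the write of input data might conflict with a read by the local single-instruction, multiple-data (SIMD) unit. This contention is avoided by accepting input data into the global input buffer, where it can wait for an open data memory cycle (that is, one with no bank conflict with SIMD accesses). The data memory can have 32 banks (for example), so the buffer is likely to be freed quickly. However, the node (i.e., 808-i) should have a free buffer entry, because there is no handshake to acknowledge the transfer. If desired, the global input buffer can stall the local node (i.e., 808-i) and force a write into the data memory to free a buffer location, but this event should be extremely rare. Typically, the global input buffer is implemented as two separate random access memories (RAMs), so that one can be in a state of writing global data while the other is in a state of being read into the data memory. The messaging interconnect is separate from the global data interconnect but also uses a push model.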
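A minimal sketch of the double-buffered global input buffer follows. The two-RAM split, the 32 banks, and the rule that a buffered write commits only in a cycle without a bank conflict come from the text. The bank mapping (address modulo bank count), the policy of swapping RAMs when the draining RAM empties, and all names are assumptions for illustration.

```python
class GlobalInputBuffer:
    """Sketch of a double-buffered (ping-pong) global input buffer.

    One RAM accepts incoming global writes while the other drains into the
    single-ported, banked data memory. A pending write commits only in a
    cycle where the SIMD is not accessing the same bank.
    """

    def __init__(self, banks: int = 32) -> None:
        self.banks = banks
        self.filling = []   # RAM currently receiving global data
        self.draining = []  # RAM currently being written into DMEM

    def receive(self, addr: int, data: int) -> None:
        """Accept a posted write from the interconnect (no handshake)."""
        self.filling.append((addr, data))

    def cycle(self, dmem: dict, simd_banks_busy: set) -> int:
        """Advance one cycle; return the number of writes committed (0 or 1)."""
        if not self.draining and self.filling:
            # Assumed policy: swap roles once the draining RAM is empty.
            self.filling, self.draining = self.draining, self.filling
        if not self.draining:
            return 0
        addr, data = self.draining[0]
        if addr % self.banks in simd_banks_busy:  # assumed low-order bank interleave
            return 0  # bank conflict: wait for an open data memory cycle
        dmem[addr] = data
        self.draining.pop(0)
        return 1
```

With 32 banks, a conflict on one bank rarely persists for long, which is why the text expects the buffer to be freed quickly.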
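At the system level, the nodes 808-1 to 808-N are replicated in the processing cluster 1400, analogous to SMP or symmetric multiprocessing, with the number of nodes scaled to the desired throughput. The processing cluster 1400 can scale to a very large number of nodes. The nodes 808-1 to 808-N can be grouped into partitions 1402-1 to 1402-R, with each partition having one or more nodes. The partitions 1402-1 to 1402-R assist scalability by increasing local communication between nodes and by allowing larger programs to compute larger amounts of output data, making it more likely that desired throughput requirements are met. Within a partition (i.e., 1402-i), nodes communicate using a local interconnect and do not require global resources. The nodes within a partition (i.e., 1402-i) also can share instruction memory (i.e., 1404-i) at any granularity: from each node using an exclusive instruction memory to all nodes using a common instruction memory. For example, three nodes can share three banks of instruction memory, with a fourth node having an exclusive bank of instruction memory. When nodes share instruction memory (i.e., 1404-i), the nodes generally execute the same program synchronously.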
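The processing cluster 1400 also can support a very large number of nodes (i.e., 808-i) and partitions (i.e., 1402-i). However, the number of nodes per partition is usually limited to 4, because having more than 4 nodes per partition generally resembles a non-uniform memory access (NUMA) architecture. In this case, partitions are connected through one (or more) crossbars (which are described below with respect to the interconnect 814) that have a generally constant cross-sectional bandwidth. The processing cluster 1400 is currently architected to transfer one node's width of data (for example, 64 16-bit pixels) every cycle, segmented into 4 transfers of 16 pixels per cycle over 4 cycles. The processing cluster 1400 is generally latency-tolerant, and node buffering generally prevents node stalls even when the interconnect 814 is nearly saturated (note that this condition is very difficult to achieve except with synthetic programs).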
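Typically, the processing cluster 1400 includes global resources that are shared among partitions:
(1) A control node 1406, which implements the system-wide messaging interconnect (over the message bus 1420), event processing and scheduling, and the interfaces to the host processor and debugger (all of which are described in detail below).
(2) A GLS unit 1408, which contains a programmable reduced-instruction-set (RISC) processor, enabling system data movement to be described by C++ programs that can be compiled directly into GLS data-movement threads. This enables system code to execute in cross-hosted environments without modifying the source code, and it is more general than direct memory access, because it can move data from any set of addresses (variables) in the system or SIMD data memory (described below) to any other set of addresses (variables). It is multithreaded, with (for example) 0-cycle context switches, supporting, for example, up to 16 threads.
(3) A shared function memory 1410, which is a large shared memory that provides a general lookup table (LUT) and statistics-collection facility (histogram). It also can support pixel processing that uses the large shared memory, such as resampling and distortion correction, which is not well supported by the node SIMDs (for cost reasons). This processing uses (for example) a six-issue RISC processor (i.e., SFM processor 7614, which is described in detail below), implementing scalars, vectors, and 2D arrays as native types.
(4) Hardware accelerators 1418, which can be included for functions that do not require programmability, or to optimize power and/or area. Accelerators appear to the subsystem as other nodes in the system: they participate in control and data flow, can create events and be scheduled, and are visible to the debugger. (Where applicable, hardware accelerators can have dedicated LUTs and statistics gathering.)
(5) The data interconnect 814 and the system Open Core Protocol (OCP) L3 connection 1412. These manage the movement of data over the data bus 1422 between node partitions, hardware accelerators, and system memories and peripherals. (Hardware accelerators can also have private connections to L3.)
(6) Debug interfaces. These are not shown on the diagram but are described in this document.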
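Turning to FIG. 5, an example of a node 808-i can be seen in greater detail. The node 808-i is the computing element in the processing cluster 1400, while the basic element for addressing and program flow control is the RISC processor or node processor 4322. Typically, the node processor 4322 can have a 32-bit data path with 20-bit instructions (with the possibility of a 20-bit immediate field in a 40-bit instruction). Pixel operations are performed, for example, in a set of 32 pixel functional units in a SIMD organization, in parallel with four loads (for example) from SIMD data memory to SIMD registers and two stores (for example) from SIMD registers to SIMD data memory (the instruction set architecture of the node processor 4322 is described in Section 7 below). An instruction packet describes (for example) one RISC processor core instruction, four SIMD loads, and two SIMD stores, in parallel with a 3-issue SIMD instruction that is executed by all of the SIMD functional units 4308-1 to 4308-M.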
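Typically, loads and stores (from the load/store unit 4318-i) move data between SIMD data memory locations and SIMD local registers, which can, for example, represent up to 64 16-bit pixels. SIMD loads and stores use the shared registers 4320-i for indirect addressing (direct addressing is also supported), but SIMD addressing operations only read these registers: the addressing context is managed by the core or node processor 4322. The core or node processor 4322 has a local memory 4328 for register spill/fill, addressing context, and input parameters. A partition instruction memory 1404-i is provided for each node, and multiple nodes can share the partition instruction memory 1404-i to execute larger programs on datasets that span multiple nodes.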
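The node 808-i also incorporates several features to support parallelism. The global input buffer 4316-i and global output buffer 4310-i (which, together with the Lf buffer 4314-i and Rt buffer 4312-i, generally form the input/output (IO) circuitry of the node 808-i) decouple the input and output of the node 808-i from instruction execution, making it unlikely that the node stalls because of system IO. Inputs are normally received well in advance of processing (by the SIMD data memories 4306-1 to 4306-M and functional units 4308-1 to 4308-M) and are stored in the SIMD data memories 4306-1 to 4306-M using spare cycles (which are very common). SIMD output data is written to the global output buffer 4310-i and routed from there through the processing cluster 1400, making it unlikely that a node (i.e., 808-i) stalls even when the system bandwidth approaches its limit (which is also unlikely). Each SIMD data memory 4306-1 to 4306-M together with its corresponding SIMD functional unit 4308-1 to 4308-M is collectively referred to as a "SIMD unit".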
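The SIMD data memories 4306-1 to 4306-M are organized into non-overlapping contexts of variable size, allocated either to related or to unrelated tasks. Contexts are fully shareable in both the horizontal and vertical directions. Sharing in the horizontal direction uses the read-only memories 4330-i, which are read-only for programs but writable by the write buffers 4302-i and 4304-i, the load/store (LS) unit 4318-i, or other hardware. These memories 4330-i can also be about 512×2 bits in size. Generally, these memories 4330-i correspond to pixel locations to the left and right of the central pixel locations being operated on. These memories 4330-i use a write-buffering mechanism (i.e., the write buffers 4302-i and 4304-i) to schedule writes, where side-context writes are usually not synchronized with local accesses. The buffer 4302-i generally maintains coherence with the adjacent-pixel (for example) contexts that are currently operating. Sharing in the vertical direction uses circular buffers within the SIMD data memories 4306-1 to 4306-M; circular addressing is a mode supported by the load and store instructions applied by the LS unit 4318-i. Shared data is generally kept coherent using the system-level dependency protocols described above.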
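Vertical sharing through circular buffers can be illustrated with modulo address wrapping. The sketch below assumes a hypothetical layout in which a context keeps the most recent few image lines in a contiguous region and line n maps to slot n modulo the line count; the text states only that circular addressing is a load/store mode, so the function and class names and this layout are illustrative, not the LS unit's actual encoding.

```python
def circular_address(base: int, size: int, offset: int) -> int:
    """Wrap an access offset into the circular region [base, base + size).

    Negative offsets (e.g. lines above the current one) wrap to the end.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return base + (offset % size)


class LineBuffer:
    """Hypothetical circular buffer holding the most recent `lines` image lines."""

    def __init__(self, base: int, lines: int, line_words: int) -> None:
        self.base, self.lines, self.line_words = base, lines, line_words

    def address(self, line_index: int, column: int) -> int:
        if not 0 <= column < self.line_words:
            raise IndexError("column outside line")
        region = self.lines * self.line_words
        return circular_address(self.base, region, line_index * self.line_words + column)
```

With three 64-word lines, line 5 reuses the storage of line 2, so data in the vertical frame direction stays inside the context and is reused rather than reloaded.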
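Context allocation and sharing are specified by SIMD data memory 4306-1 to 4306-M context descriptors held in the context-state memory 4326, which is associated with the node processor 4322. This memory 4326 can be, for example, a 16×16×32-bit or 2×16×256-bit RAM. These descriptors also specify how data is shared between contexts in a fully general manner, and they retain information to handle data dependencies between contexts. The context save/restore memory 4324 is used to support 0-cycle task switching (described above) by permitting the registers 4320-i to be saved and restored in parallel. SIMD data memory 4306-1 to 4306-M and processor data memory 4328 contexts are preserved using independent context areas for each task.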
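The SIMD data memories 4306-1 to 4306-M and the processor data memory 4328 are partitioned into a variable number of contexts of variable size. Data in the vertical frame direction is retained and reused within the context itself. Data in the horizontal frame direction is shared by linking contexts together into horizontal groups. It is important to note that the context organization is mostly independent of the number of nodes involved in a computation and of how they interact with each other. The primary purpose of contexts is to retain, share, and reuse image data, regardless of the organization of the nodes that operate on that data.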
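Typically, the SIMD data memories 4306-1 to 4306-M contain (for example) the pixel and intermediate context operated on by the functional units 4308-1 to 4308-M. The SIMD data memories 4306-1 to 4306-M are generally partitioned into (for example) up to 16 disjoint context areas, each with a programmable base address, plus a common area accessible from all contexts that the compiler uses for register spill/fill. The processor data memory 4328 contains input parameters, addressing context, and a spill/fill area for the registers 4320-i. The processor data memory 4328 can have (for example) up to 16 disjoint local context areas that correspond to the SIMD data memory 4306-1 to 4306-M contexts, each with a programmable base address.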
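The context partitioning can be sketched as a small address map: up to 16 disjoint areas, each with a programmable base, plus a common spill/fill area reachable from every context. The limit of 16 and the common area come from the text; the class and method names, the bounds checking, and the overlap check are assumptions made for illustration.

```python
class ContextMap:
    """Sketch of up to 16 disjoint context areas with programmable bases."""

    MAX_CONTEXTS = 16

    def __init__(self, common_base: int, common_size: int) -> None:
        self.common = (common_base, common_size)  # spill/fill area shared by all contexts
        self.contexts = {}  # context id -> (base, size)

    def allocate(self, ctx: int, base: int, size: int) -> None:
        if not 0 <= ctx < self.MAX_CONTEXTS:
            raise ValueError("context id out of range")
        new_end = base + size
        # Reject any overlap with other contexts or with the common area (id -1).
        for other, (b, s) in list(self.contexts.items()) + [(-1, self.common)]:
            if other != ctx and base < b + s and b < new_end:
                raise ValueError("context areas must not overlap")
        self.contexts[ctx] = (base, size)

    def translate(self, ctx: int, offset: int, common: bool = False) -> int:
        """Map a context-relative offset to a physical data memory address."""
        base, size = self.common if common else self.contexts[ctx]
        if not 0 <= offset < size:
            raise IndexError("offset outside context area")
        return base + offset
```

Because programs address data relative to a context base, the same code can run in any context, which fits the 0-cycle task switching described above.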
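Typically, a node (i.e., node 808-i) has, for example, three configurations: 8 SIMD registers (first configuration); 32 SIMD registers (second configuration); and 32 SIMD registers plus three extra execution units in each of the smaller functional units (third configuration).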
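As an example, FIG. 6 shows the SIMD unit (i.e., SIMD data memory 4306-1 and SIMD functional unit 4308-1), the node processor 4322, and the LS unit 4318-i in greater detail. As shown in this example, the SIMD functional unit 4308-1 is generally made up of eight smaller functional units 4338-1 to 4338-8.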
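Looking first at the processor core, the node processor 4322 generally executes all control-related instructions and holds all of the address register values and special register values for the SIMD units, shown in register files 4340 and 4342, respectively. Up to six (for example) memory instructions can be computed in one cycle. For address register values, address source operands are sent from the SIMD unit shown to the node processor 4322, and the node processor 4322 sends back the register values, which are then used by the SIMD unit for address computation. Similarly, for special register values, special register source operands are sent from the SIMD unit shown to the node processor 4322, and the node processor 4322 sends back the register values.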
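The node processor 4322 can have (for example) 15 read ports and 6 write ports for the SIMD. Typically, the 15 read ports include (for example) 12 read ports that accommodate the two operands (i.e., lssrc and lssrc2) of each of the 6 memory instructions, and 3 ports for the special register file 4342. Typically, the special register file 4342 includes two registers named RCLIPMIN and RCLIPMAX, which should be provided together and are generally restricted to the low four registers of the 16-entry register file 4342. The RCLIPMAX and RCLIPMIN registers are then specified directly in the instruction. The other special registers, RND and SCL, are specified by a 4-bit register identifier and can be located anywhere in the 16-entry register file 4342. Additionally, the node processor 4322 includes a program counter execution unit (PCEXE) 4344, which can update the instruction memory 1404-i.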
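The text names RCLIPMIN and RCLIPMAX and restricts them to the low four of the 16 special registers, but it does not spell out the clip operation itself. The sketch below assumes the conventional saturating clamp and encodes the register-placement rule as a check; both function names are hypothetical.

```python
def clip(value: int, rclipmin: int, rclipmax: int) -> int:
    """Clamp `value` into [rclipmin, rclipmax] (assumed clip semantics)."""
    if rclipmin > rclipmax:
        raise ValueError("RCLIPMIN must not exceed RCLIPMAX")
    return max(rclipmin, min(value, rclipmax))


def valid_clip_pair(min_index: int, max_index: int) -> bool:
    """Check that RCLIPMIN/RCLIPMAX sit in the low four entries (0..3) of the
    16-entry special register file and are two distinct registers."""
    return 0 <= min_index < 4 and 0 <= max_index < 4 and min_index != max_index
```

For example, with RCLIPMIN = 0 and RCLIPMAX = 255, an intermediate pixel result of 300 would be clamped to 255. Separately, the port count in the paragraph is consistent: 6 memory instructions × 2 operands = 12 ports, plus 3 special-register ports, gives 15.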
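Turning now to the LS unit 4318-i and the SIMD unit, FIG. 6 shows the general structure of each. As shown, the LS unit 4318-i generally comprises an LS decoder 4334, an LS execution unit 4336, a logic unit 4346, a multiply unit 4348, a right execution unit 4350, and an LS data memory 4339; details of the data path of the LS unit 4318-i are provided below. Each of the smaller functional units 4338-1 to 4338-8 generally (and respectively) comprises a SIMD register file 4358-1 to 4358-8 (each of which can, for example, include 32 registers), a left logic unit (LU) 4352-1 to 4352-8, a multiply unit (MU) 4354-1 to 4354-8, and a right logic unit (RU) 4356-1 to 4356-8. The left logic units 4352-1 to 4352-8, multiply units 4354-1 to 4354-8, and right logic units 4356-1 to 4356-8 are generally replicas of the left unit 4346, middle unit 4348, and right unit 4350, respectively. Additionally, as with the LS unit 4318-i, the data path of each functional unit 4338-1 to 4338-8 is described below.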
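Additionally, across the three example configurations of the node (i.e., node 808-i), the size of some components (i.e., logic unit 4352-1) or the corresponding instructions can vary, while other components can remain the same. The LS data memory 4339, lookup tables, and histograms remain relatively the same. Preferably, the LS data memory 4339 can be about 512×32 bits, with the first 16 locations holding the context base addresses and the remaining locations accessible by the contexts. The lookup tables or LUTs (which are generally within the PC execution unit 4344) can have up to 12 tables with a memory size of 16 Kb, where four bits can be used to select the table and fourteen bits can be used for addressing. The histograms (which are also generally located in the PC execution unit 4344) can have 4 tables, where the histograms share the 4-bit table-select ID with the LUTs and use 8 bits for addressing. Table 1 below shows the instruction sizes for each of the three example configurations, which can correspond to the sizes of the various components.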
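The LUT and histogram addressing fields described above (a 4-bit table ID, 14 LUT address bits, 8 histogram address bits, up to 12 LUTs) can be illustrated by packing them into one index. The bit layout (table ID in the high bits), the function names, and the dictionary-backed histogram are assumptions; the text gives only the field widths and table counts.

```python
TABLE_ID_BITS = 4
LUT_ADDR_BITS = 14
HIST_ADDR_BITS = 8
MAX_LUT_TABLES = 12


def lut_index(table_id: int, addr: int) -> int:
    """Pack a 4-bit table ID above a 14-bit LUT address (assumed layout)."""
    if not 0 <= table_id < MAX_LUT_TABLES:
        raise ValueError("LUT table ID out of range")
    if not 0 <= addr < (1 << LUT_ADDR_BITS):
        raise ValueError("LUT address must fit in 14 bits")
    return (table_id << LUT_ADDR_BITS) | addr


def hist_index(table_id: int, bin_addr: int) -> int:
    """Pack the shared 4-bit table ID above an 8-bit histogram bin address."""
    if not 0 <= table_id < (1 << TABLE_ID_BITS):
        raise ValueError("table ID must fit in 4 bits")
    if not 0 <= bin_addr < (1 << HIST_ADDR_BITS):
        raise ValueError("histogram address must fit in 8 bits")
    return (table_id << HIST_ADDR_BITS) | bin_addr


def hist_increment(hist: dict, table_id: int, bin_addr: int) -> None:
    """Count one occurrence of `bin_addr` in histogram `table_id`."""
    key = hist_index(table_id, bin_addr)
    hist[key] = hist.get(key, 0) + 1
```

A 14-bit address selects one of 2^14 = 16384 entries per table, and an 8-bit address selects one of 256 histogram bins.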
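Turning to FIG. 7, the shared function memory 1410 can be seen. The shared function memory 1410 is generally a large, centralized memory supporting operations that are not well supported by the nodes (i.e., for cost reasons). The main components of the shared function memory 1410 are two large memories: the function memory (FMEM) 7602 and the vector memory (VMEM) 7603 (each having a configurable size and organization of, for example, between 48 and 1024 Kbytes). The function memory 7602 implements a synchronous, instruction-driven implementation of high-bandwidth, vector-based lookup tables (LUTs) and histograms. The vector memory 7603 can support operations by a 6-issue processor (i.e., SFM processor 7614) that implies vector instructions (described in detail in Section 8 above), which can be used, for example, for block-based pixel processing. Typically, the SFM processor 7614 can be accessed using the messaging interface 1420 and the data bus 1422. The SFM processor 7614 can, for example, operate on wide pixel contexts (64 pixels) that can have a more general organization and a larger total memory size than the SIMD data memories in the nodes, with more general processing applied to the data. It supports scalar, vector, and array operations on standard C++ integer data types, as well as on packed pixels that are compatible with various data types. For example, and as shown, the SIMD data paths associated with the vector memory 7603 and function memory 7602 generally include ports 7605-1 to 7605-Q and functional units 7607-1 to 7607-P.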
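The function memory 7602 and vector memory 7603 are generally "shared" in the sense that all processing nodes (i.e., 808-i) can access them. Data provided to the function memory 7602 can be accessed through the SFM wrapper (typically in a write-only manner). This sharing is also generally consistent with the context management described above for the processing nodes (i.e., 808-i). Data I/O between the processing nodes and the shared function memory 1410 also uses the dataflow protocol, and processing nodes typically cannot directly access the vector memory 7603. The shared function memory 1410 can also write to the function memory 7602, but not while it is being accessed by processing nodes. Processing nodes (i.e., 808-i) can read and write common locations in the function memory 7602, but (usually) only as read-only LUT operations or write-only histogram operations. It is also possible for processing nodes to have read-write access to a region of the function memory 7602, but this access should be exclusive to a given program.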
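Turning to FIG. 8, an example of the SIMD data paths of the shared function memory 1410 is shown. For example, eight SIMD data paths can be used (each of which can be split into two 16-bit halves, because they can operate on 16-bit packed data). As shown, these SIMD data paths generally include sets of banks 7802-1 to 7802-L, associated registers 7804-1 to 7804-L, and associated sets of functional units 7806-1 to 7806-L.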
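In FIG. 9, an example of a portion of one SIMD data path (i.e., and for example, part of one of the registers 7804-1 to 7804-L and part of one of the functional units 7806-1 to 7806-L) can be seen. As shown, and for example, this SIMD data path can include a 16-entry, 32-bit register file 7902, two 16-bit multipliers 7904 and 7906, and a single 32-bit arithmetic/logic unit 7908 that can perform two 16-bit packed operations in one cycle. Also as an example, each SIMD data path can perform two independent 16-bit operations or a combined 32-bit operation. For example, a 32-bit multiply can be formed using the 16-bit multipliers combined with 32-bit adds. Additionally, the arithmetic/logic unit 7908 is able to perform addition, subtraction, logical operations (i.e., AND), comparisons, and conditional moves.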
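The two capabilities mentioned above, a 32-bit multiply built from 16-bit multiplies and 32-bit adds, and two independent 16-bit operations packed in one 32-bit word, can be sketched as follows. This uses one conventional decomposition into partial products; how the hardware schedules them across its two multipliers is not described in the text, and the function names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul32_from_16(a: int, b: int) -> int:
    """Low 32 bits of a*b from 16x16-bit partial products and 32-bit adds.

    a*b = a_hi*b_hi*2^32 + (a_hi*b_lo + a_lo*b_hi)*2^16 + a_lo*b_lo.
    The a_hi*b_hi term lies entirely above bit 31, so it is not needed, and
    only the low 16 bits of the cross terms survive the shift by 16.
    """
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    lo = a_lo * b_lo                                # 16x16 -> 32-bit partial product
    cross = (a_hi * b_lo + a_lo * b_hi) & MASK16    # cross terms, truncated
    return (lo + (cross << 16)) & MASK32


def add16x2(a: int, b: int) -> int:
    """Two independent 16-bit adds packed in one 32-bit word (no carry between halves)."""
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo
```

In `add16x2(0x0001FFFF, 0x00010001)`, the low half wraps to 0 without carrying into the high half, which yields 0x00020000; a single 32-bit add would instead produce 0x00030000.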
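Returning to FIG. 8, the SIMD data path registers 7804-1 to 7804-L can use a load/store interface to the vector memory 7603. These loads and stores can use features of the vector memory 7603 that provide parallel LUT and histogram access for the nodes (i.e., 808-i): for the nodes, half of each SIMD data path can provide an index into the function memory 7602; and, similarly, half of each SIMD data path in the SFM processor 7614 can provide an independent vector memory 7603 address. Addressing is generally organized so that adjacent data paths can perform the same operation on multiple instances of data types, such as scalars, vectors, and arrays of 8-, 16-, or 32-bit (for example) data: these are called vector-implied addressing modes (the vector is implied by the SIMD through linear vector memory 7603 addressing). Alternatively, each data path can operate on packed pixels from a frame region within the banks 7802-1 to 7802-L: these are called vector-packed addressing modes (vectors of packed pixels are implied by the SIMD through two-dimensional vector memory 7603 addressing). In both cases, as with the node processor 4322, the programming model can hide the width of the SIMD, and programs can be written as if they operate on a single pixel or element of other data types.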
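Implied-vector data types are generally SIMD-implemented vectors of 8-bit chars, 16-bit halfwords, or 32-bit words, operated on individually by each SIMD data path (i.e., FIG. 9). These vectors are generally not explicit in the program but are implied by the hardware operation. These data types can also be elements within explicit program vectors or arrays: the SIMD effectively adds a hidden second or third dimension to these program vectors or arrays. In effect, the programming view can be a single SIMD data path with a dedicated 32-bit data memory, and this memory can be accessed using conventional addressing modes. In hardware, this view is mapped so that each of the 32 SIMD data paths appears to have a private data memory, but the implementation realizes this in the shared function memory 1410 using the wide, banked organization of the vector memory 7603.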
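The SIMD of the SFM processor 7614 generally operates within vector memory 7603 contexts similar to node processor contexts, with descriptors whose base address is aligned to the set of banks 7802-1 and is large enough to address the entire vector memory 7603 (i.e., 13 bits for a size of 1024 Kbytes). Each half of a SIMD data path is numbered with a 6-bit identifier (POSN), starting with 0 for the leftmost data path. For vector-implied addressing, the LSB of this value is generally omitted, and the remaining five bits are used to align the vector memory 7603 addresses generated by the data path to the corresponding word in the vector memory 7603.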
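The POSN rule can be illustrated with some arithmetic. A 13-bit base spanning 1024 Kbytes implies a base granularity of 1024 × 1024 / 2^13 = 128 bytes, which matches one 32-bit word for each of 32 data paths. The sketch below assumes (this layout is not stated in the text) that each program-visible element occupies one 128-byte row, with data path `POSN >> 1` owning word `POSN >> 1` of that row. The function name and parameters are hypothetical.

```python
WORD_BYTES = 4
PATHS = 32                      # 64 half-datapaths (6-bit POSN) / 2
ROW_BYTES = PATHS * WORD_BYTES  # 128 bytes: one 32-bit word per data path


def implied_vector_address(base_13bit: int, element: int, posn: int) -> int:
    """Byte address in vector memory for one data path (hypothetical layout).

    base_13bit: descriptor base in units of ROW_BYTES (13 bits span 1024 KB).
    element:    index of the 32-bit element the program believes it accesses.
    posn:       6-bit half-datapath identifier; its LSB is dropped.
    """
    if not 0 <= posn < 64:
        raise ValueError("POSN is a 6-bit value")
    if not 0 <= base_13bit < (1 << 13):
        raise ValueError("base is a 13-bit value")
    lane = posn >> 1  # remaining five bits select the word within the row
    return (base_13bit + element) * ROW_BYTES + lane * WORD_BYTES
```

Under this layout both halves of a data path (POSN 2k and 2k+1) resolve to the same word, and the program sees a private linear array while the hardware interleaves 32 such arrays across the banks.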
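Within the processing cluster 1400, general-purpose RISC processors serve various purposes. For example, the node processor 4322 (which can be a RISC processor) can be used for program flow control. Examples of RISC architectures are described below.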
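Turning to FIG. 10, a more detailed example of the RISC processor 5200 (i.e., node processor 4322) can be seen. The pipeline used by the processor 5200 generally supports the execution of general high-level languages (i.e., C/C++) in the processing cluster 1400. In operation, the processor 5200 employs a three-stage pipeline of fetch, decode, and execute. Typically, the context interface 5214 and LS port 5212 provide instructions to the program cache 5208, and the instruction fetch 5204 can fetch instructions from the program cache 5208. The bus between the instruction fetch 5204 and the program cache 5208 can, for example, be 40 bits wide, allowing the processor 5200 to support dual-issue instructions (i.e., instructions can be 40 bits or 20 bits wide). Generally, the "A-side" and "B-side" functional units (within the processing unit 5202) execute the smaller instructions (i.e., 20-bit instructions), while the "B-side" functional units execute the larger instructions (i.e., 40-bit instructions). To execute the instructions provided, the processing unit can use the register file 5206 as a scratchpad; this register file 5206 can be (for example) a 16-entry, 32-bit register file that is shared between the "A-side" and "B-side". Additionally, the processor 5200 includes a control register file 5216 and a program counter 5218. The processor 5200 can also be accessed through boundary pins or leads; an example of each is described in Table 2 ("z" denotes an active-low pin).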
Table 2
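Turning to FIG. 11, the processor 5200 can be seen in greater detail. Here, the instruction fetch 5204 (which corresponds to the fetch stage 5306) is split into an A-side and a B-side, where the A-side receives the first 20 bits (i.e., [19:0]) of a "fetch packet" (which can be a 40-bit-wide instruction word holding one 40-bit instruction or two 20-bit instructions), and the B-side receives the last 20 bits (i.e., [39:20]) of the fetch packet. Typically, the instruction fetch 5204 determines the structure and size of the instructions in the fetch packet and dispatches the instructions accordingly (as discussed in Section 7.3 below).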
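The A-side/B-side split of a 40-bit fetch packet amounts to two 20-bit bit-field extractions. The sketch below shows only that split. How the fetch stage decides whether the packet holds one 40-bit instruction or two 20-bit instructions is not given here, so it is not modelled, and the function name is hypothetical.

```python
MASK20 = (1 << 20) - 1


def split_fetch_packet(packet: int) -> tuple:
    """Split a 40-bit fetch packet into its A-side and B-side halves."""
    if not 0 <= packet < (1 << 40):
        raise ValueError("fetch packet must fit in 40 bits")
    a_side = packet & MASK20            # bits [19:0]
    b_side = (packet >> 20) & MASK20    # bits [39:20]
    return a_side, b_side
```

For a packet holding two 20-bit instructions, each half goes to its own side; a single 40-bit instruction executes on the B-side, as the text notes.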
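The decoder 5221 (which is part of the decode stage 5308 and the processing unit 5202) decodes the instructions from the instruction fetch 5204. The decoder 5221 generally includes operator format circuits 5223-1 and 5223-2 (to generate intermediates) and decode circuits 5225-1 and 5225-2, for the B-side and A-side, respectively. The output of the decoder 5221 is then received by the decode-to-execution unit 5220 (which is also part of the decode stage 5308 and the processing unit 5202). The decode-to-execution unit 5220 generates commands for the execution unit 5227 that correspond to the instructions received in the fetch packet.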
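The A-side and B-side of the execution unit 5227 are also subdivided. The B-side and A-side of the execution unit 5227 each include, respectively, a multiply unit 5222-1/5222-2, a Boolean unit 5226-1/5226-2, an add/subtract unit 5228-1/5228-2, and a move unit 5330-1/5330-2. The B-side of the execution unit 5227 also includes a load/store unit 5224 and a branch unit 5232. The multiply units 5222-1/5222-2, Boolean units 5226-1/5226-2, add/subtract units 5228-1/5228-2, and move units 5330-1/5330-2 can then perform, respectively, multiply operations, logical Boolean operations, add/subtract operations, and data move operations on data loaded into the general-purpose register file 5206 (which can also include reading addresses for each of the A-side and B-side). Move operations can also be performed in the control register file 5216.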
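A RISC processor with a vector processing module is generally used with the shared function memory 1410. This RISC processor is largely the same as the RISC processor used for the processor 5200, but it includes a vector processing module to extend the computation and load/store bandwidth. This module can contain 16 vector units, each able to execute a 4-operation execute packet per cycle. A typical execute packet generally includes a data load from the vector memory array, two register-to-register operations, and a store of the result to the vector memory array. This type of RISC processor generally uses an 80-bit-wide or 120-bit-wide instruction word, which generally constitutes a "fetch packet" and can include unaligned instructions. A fetch packet can contain a mixture of 40-bit and 20-bit instructions, which can include vector unit instructions and scalar instructions similar to those used by the processor 5200. Typically, vector unit instructions can be 20 bits wide, while other instructions can be 20 bits or 40 bits wide (similar to the processor 5200). Vector instructions can also be presented on all lanes of the instruction fetch bus; however, if a fetch packet contains both scalar and vector unit instructions, the vector instructions are presented (for example) on instruction fetch bus bits [39:0] and the scalar instructions are presented (for example) on instruction fetch bus bits [79:40]. Additionally, unused instruction fetch bus lanes are padded with NOPs.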
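The placement rule for a mixed 80-bit fetch packet can be sketched by treating the bus as four 20-bit lanes: vector instructions in bits [39:0], scalar instructions in bits [79:40], and unused lanes padded with NOPs. The NOP encoding is not given in the text, so it is a parameter here. Splitting a 40-bit scalar instruction into two 20-bit chunks (low chunk first) is an assumption, and the function name is hypothetical.

```python
LANE_BITS = 20
LANE_MASK = (1 << LANE_BITS) - 1


def pack_mixed_fetch_packet(vector_chunks, scalar_chunks, nop=0):
    """Build an 80-bit fetch word: vector lanes in [39:0], scalar lanes in [79:40].

    Each argument is a list of 20-bit chunks (at most two per half of the bus);
    unused lanes are filled with `nop`, whose real encoding is not given here.
    """
    if len(vector_chunks) > 2 or len(scalar_chunks) > 2:
        raise ValueError("each half of the bus holds two 20-bit lanes")
    lanes = list(vector_chunks) + [nop] * (2 - len(vector_chunks))
    lanes += list(scalar_chunks) + [nop] * (2 - len(scalar_chunks))
    word = 0
    for i, chunk in enumerate(lanes):
        if not 0 <= chunk <= LANE_MASK:
            raise ValueError("each lane carries exactly 20 bits")
        word |= chunk << (i * LANE_BITS)  # lane i occupies bits [20*i+19 : 20*i]
    return word
```

Keeping vector and scalar instructions in fixed halves of the bus means the decoders can route each half without first parsing instruction boundaries across the whole packet.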
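An "execute packet" can then be formed from one or more fetch packets. Partial execute packets are held in the instruction queue until they are complete. Typically, complete execute packets are submitted to the execute stage (i.e., 5227). Four vector unit instructions (for example), two scalar instructions (for example), or a combination of 20-bit and 40-bit instructions (for example) can execute in a single cycle. Consecutive 20-bit instructions can also be executed serially. If bit 19 of the current 20-bit instruction is set, this indicates that the current instruction and the subsequent 20-bit instruction form an execute packet. Bit 19 can generally be referred to as the P-bit or parallel bit. If the P-bit is not set, this indicates the end of an execute packet. Consecutive 20-bit instructions whose P-bits are not set cause the 20-bit instructions to execute serially. It should also be noted that this RISC processor (with a vector processing module) can include any of the following constraints: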
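(1) It is illegal for the P-bit to be set to 1 in a 40-bit instruction (for example);
(2) Load or store instructions should appear on the B-side of the instruction fetch bus (i.e., bits 79:40 for 40-bit loads and stores, or bits 79:60 of the fetch bus for 20-bit loads or stores);
(3) A single scalar load or store is illegal;
(4) For the vector units, both a single load and a single store can be present in a fetch packet;
(5) It is illegal for a 20-bit instruction with a P-bit equal to 1 to precede a 40-bit instruction; and
(6) There is no hardware in place to detect these illegal conditions. These restrictions are expected to be imposed by the system programming tools 718.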
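Because item (6) leaves enforcement to the system programming tools, a tool-side pass might group instructions by P-bit and flag constraints (1) and (5). The sketch below represents each instruction as a (width, P-bit) pair so that it does not have to assume where the P-bit sits in a 40-bit encoding; only the bit-19 position for 20-bit instructions comes from the text. The type, function names, and error strings are hypothetical, and constraints (2) through (4) are not checked.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    width: int      # 20 or 40 bits
    p_bit: bool     # for 20-bit instructions, this is bit 19
    name: str = ""


def p_bit_of(word20: int) -> bool:
    """Extract the P-bit (bit 19) of a 20-bit instruction word."""
    return bool((word20 >> 19) & 1)


def group_execute_packets(stream: List[Instr]) -> List[List[Instr]]:
    """Group instructions into execute packets: a clear P-bit ends a packet."""
    packets, current = [], []
    for ins in stream:
        current.append(ins)
        if not ins.p_bit:
            packets.append(current)
            current = []
    if current:
        raise ValueError("stream ends inside an execute packet")
    return packets


def check_constraints(stream: List[Instr]) -> List[str]:
    """Report violations of constraints (1) and (5), which hardware does not detect."""
    errors = []
    for i, ins in enumerate(stream):
        if ins.width == 40 and ins.p_bit:
            errors.append(f"{i}: P-bit set in a 40-bit instruction")
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if ins.width == 20 and ins.p_bit and nxt is not None and nxt.width == 40:
            errors.append(f"{i}: 20-bit instruction with P=1 precedes a 40-bit instruction")
    return errors
```

For the stream a (P=1), b (P=0), c (P=0), the grouping yields two execute packets, [a, b] followed by [c]. The first runs a and b in parallel, and c runs serially afterwards.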
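Turning to FIG. 12, an example of the vector module can be seen. The vector module includes a vector decoder 5246, a decode-to-execution unit 5250, and an execution unit 5251. The vector decoder includes slot decoders 5248-1 to 5248-4, which receive instructions from the instruction fetch 5204. Typically, the slot decoders 5248-1 and 5248-2 operate in a manner similar to each other, while the slot decoders 5248-3 and 5248-4 include load/store decoding circuitry. The decode-to-execution unit 5250 can then generate instructions for the execution unit 5251 based on the decoded output of the vector decoder 5246. Each slot decoder can generate instructions for use by the multiply unit 5252, add/subtract unit 5254, move unit 5256, and Boolean unit 5258 (each of which uses data and addresses in the general-purpose registers 5206). Additionally, the slot decoders 5248-3 and 5248-4 can generate load and store instructions for the load/store units 5260 and 5262.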
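The general-purpose register file 5206 can be a 16-entry, 32-bit general-purpose register file. The width of the general-purpose registers (GPRs) can be parameterized. Generally, when the processor 5200 is used in a node (i.e., 808-i), there are 4+15 read ports (15 controlled by boundary pins) and 4+6 write ports (6 controlled by boundary pins), while the processor 5200 used for the GLS unit 1408 has 4 read ports and 4 write ports.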
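Table 3 below lists instructions that can move data between the node processor 4322 and the SIMD (i.e., the SIMD unit that includes SIMD data memory 4306-1 and functional unit 4308-1).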
Table 3
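Table 4 below shows an example of the instruction set architecture for the processor 5200, where:
(1) The unit names .SA and .SB are used to distinguish the issue slot in which a 20-bit instruction executes;
(2) 40-bit instructions execute on the B-side (.SB) by convention;
(3) The basic form is <mnemonic> <unit> <comma-separated operand list>; and
(4) The pseudocode has C++ syntax, and the appropriate libraries can be included directly in simulators or other golden models.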
Table 4
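Those skilled in the art to which the invention relates will appreciate that modifications can be made to the described embodiments, and other embodiments realized, without departing from the scope of the invention.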
Claims (3)
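1. An integrated circuit (1300) SIMD unit (808-i), comprising:
A. a SIMD data memory (4306-1) having global data input and output leads;
B. a first SIMD functional unit (4308-1) including a first SIMD register file (4358-1) coupled to the SIMD data memory (4306-1), a first left logic unit (4352-1), a first multiply unit (4354-1), and a first right logic unit (4356-1), wherein the first left logic unit (4352-1), the first multiply unit (4354-1), and the first right logic unit (4356-1) are all coupled to the first SIMD register file (4358-1);
C. a second SIMD functional unit (4308-2) including a second SIMD register file (4358-2) coupled to the SIMD data memory (4306-1), a second left logic unit (4352-2), a second multiply unit (4354-2), and a second right logic unit (4356-2), wherein the second left logic unit (4352-2), the second multiply unit (4354-2), and the second right logic unit (4356-2) are all coupled to the second SIMD register file (4358-2); and
D. a load store circuit (4318-i) including a load store left logic unit (4346), a load store multiply unit (4348), and a load store right logic unit (4350), wherein the load store left logic unit (4346), the load store multiply unit (4348), and the load store right logic unit (4350) all have inputs coupled to an instruction memory (1404-i), are all coupled to the first left logic unit (4352-1), the first multiply unit (4354-1), and the first right logic unit (4356-1), and are all coupled to the second left logic unit (4352-2), the second multiply unit (4354-2), and the second right logic unit (4356-2).
2. The integrated circuit (1300) SIMD unit (808-i) of claim 1, wherein the load store circuit (4318-i) includes a load store decoder circuit (4334) coupled to the instruction memory (1404-i) and a load store execution circuit (4336) coupled to the load store decoder circuit (4334).
3. The integrated circuit (1300) SIMD unit (808-i) of claim 1, wherein there are eight SIMD functional units (4338-1 to 4338-8), each functional unit including a SIMD register file (4358-1 to 4358-8) coupled to the SIMD data memory (4306-1), a left logic unit (4352-1 to 4352-8), a multiply unit (4354-1 to 4354-8), and a right logic unit (4356-1 to 4356-8), wherein the left logic unit (4352-1 to 4352-8), the multiply unit (4354-1 to 4354-8), and the right logic unit (4356-1 to 4356-8) are all coupled to the SIMD register file (4358-1 to 4358-8) of that SIMD functional unit.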
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US41521010P | 2010-11-18 | 2010-11-18 | |
US41520510P | 2010-11-18 | 2010-11-18 | |
US61/415,205 | 2010-11-18 | ||
US61/415,210 | 2010-11-18 | ||
US13/232,774 | 2011-09-14 | ||
US13/232,774 US9552206B2 (en) | 2010-11-18 | 2011-09-14 | Integrated circuit with control node circuitry and processing circuitry |
PCT/US2011/061474 WO2012068504A2 (en) | 2010-11-18 | 2011-11-18 | Method and apparatus for moving data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103221938A CN103221938A (zh) | 2013-07-24 |
CN103221938B true CN103221938B (zh) | 2016-01-13 |
Family
ID=46065497
Family Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180055771.5A Active CN103221935B (zh) | 2010-11-18 | 2011-11-18 | Method and apparatus for moving data from a SIMD register file to a general-purpose register file |
CN201180055828.1A Active CN103221939B (zh) | 2010-11-18 | 2011-11-18 | Method and apparatus for moving data |
CN201180055803.1A Active CN103221937B (zh) | 2010-11-18 | 2011-11-18 | Load/store circuitry for a processing cluster |
CN201180055782.3A Active CN103221936B (zh) | 2010-11-18 | 2011-11-18 | Shared function memory circuitry for a processing cluster |
CN201180055748.6A Active CN103221934B (zh) | 2010-11-18 | 2011-11-18 | Control node for a processing cluster |
CN201180055810.1A Active CN103221938B (zh) | 2010-11-18 | 2011-11-18 | Method and apparatus for moving data |
CN201180055668.0A Active CN103221933B (zh) | 2010-11-18 | 2011-11-18 | Method and apparatus for moving data from a general-purpose register file to a SIMD register file |
CN201180055694.3A Active CN103221918B (zh) | 2010-11-18 | 2011-11-18 | IC cluster processing device with separate data bus and message bus |
Country Status (4)
Country | Link |
---|---|
US (1) | US9552206B2 (zh) |
JP (9) | JP2014501007A (zh) |
CN (8) | CN103221935B (zh) |
WO (8) | WO2012068504A2 (zh) |
Families Citing this family (235)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7797367B1 (en) * | 1999-10-06 | 2010-09-14 | Gelvin David C | Apparatus for compact internetworked wireless integrated network sensors (WINS) |
US9710384B2 (en) | 2008-01-04 | 2017-07-18 | Micron Technology, Inc. | Microprocessor architecture having alternative memory access paths |
US8397088B1 (en) | 2009-07-21 | 2013-03-12 | The Research Foundation Of State University Of New York | Apparatus and method for efficient estimation of the energy dissipation of processor based systems |
US8446824B2 (en) * | 2009-12-17 | 2013-05-21 | Intel Corporation | NUMA-aware scaling for network devices |
US9003414B2 (en) * | 2010-10-08 | 2015-04-07 | Hitachi, Ltd. | Storage management computer and method for avoiding conflict by adjusting the task starting time and switching the order of task execution |
US9552206B2 (en) * | 2010-11-18 | 2017-01-24 | Texas Instruments Incorporated | Integrated circuit with control node circuitry and processing circuitry |
KR20120066305A (ko) * | 2010-12-14 | 2012-06-22 | 한국전자통신연구원 | Caching apparatus and method for video motion estimation and compensation |
DE202012013520U1 (de) * | 2011-01-26 | 2017-05-30 | Apple Inc. | External contact connector |
US8918791B1 (en) * | 2011-03-10 | 2014-12-23 | Applied Micro Circuits Corporation | Method and system for queuing a request by a processor to access a shared resource and granting access in accordance with an embedded lock ID |
US9008180B2 (en) * | 2011-04-21 | 2015-04-14 | Intellectual Discovery Co., Ltd. | Method and apparatus for encoding/decoding images using a prediction method adopting in-loop filtering |
US9086883B2 (en) | 2011-06-10 | 2015-07-21 | Qualcomm Incorporated | System and apparatus for consolidated dynamic frequency/voltage control |
US20130060555A1 (en) * | 2011-06-10 | 2013-03-07 | Qualcomm Incorporated | System and Apparatus Modeling Processor Workloads Using Virtual Pulse Chains |
US8656376B2 (en) * | 2011-09-01 | 2014-02-18 | National Tsing Hua University | Compiler for providing intrinsic supports for VLIW PAC processors with distributed register files and method thereof |
CN102331961B (zh) * | 2011-09-13 | 2014-02-19 | 华为技术有限公司 | Method, system, and scheduler for simulating multiple processors in parallel |
US20130077690A1 (en) * | 2011-09-23 | 2013-03-28 | Qualcomm Incorporated | Firmware-Based Multi-Threaded Video Decoding |
KR101859188B1 (ko) * | 2011-09-26 | 2018-06-29 | 삼성전자주식회사 | Apparatus and method for partition scheduling in a many-core system |
CA2889387C (en) * | 2011-11-22 | 2020-03-24 | Solano Labs, Inc. | System of distributed software quality improvement |
JP5915116B2 (ja) * | 2011-11-24 | 2016-05-11 | 富士通株式会社 | Storage system, storage apparatus, system control program, and system control method |
WO2013095608A1 (en) * | 2011-12-23 | 2013-06-27 | Intel Corporation | Apparatus and method for vectorization with speculation support |
US9329834B2 (en) * | 2012-01-10 | 2016-05-03 | Intel Corporation | Intelligent parametric scratchap memory architecture |
US8639894B2 (en) * | 2012-01-27 | 2014-01-28 | Comcast Cable Communications, Llc | Efficient read and write operations |
GB201204687D0 (en) * | 2012-03-16 | 2012-05-02 | Microsoft Corp | Communication privacy |
EP2831721B1 (en) * | 2012-03-30 | 2020-08-26 | Intel Corporation | Context switching mechanism for a processing core having a general purpose cpu core and a tightly coupled accelerator |
US10430190B2 (en) * | 2012-06-07 | 2019-10-01 | Micron Technology, Inc. | Systems and methods for selectively controlling multithreaded execution of executable code segments |
US10437602B2 (en) | 2012-06-15 | 2019-10-08 | International Business Machines Corporation | Program interruption filtering in transactional execution |
US9361115B2 (en) | 2012-06-15 | 2016-06-07 | International Business Machines Corporation | Saving/restoring selected registers in transactional processing |
US8682877B2 (en) | 2012-06-15 | 2014-03-25 | International Business Machines Corporation | Constrained transaction execution |
US9384004B2 (en) | 2012-06-15 | 2016-07-05 | International Business Machines Corporation | Randomized testing within transactional execution |
US9442737B2 (en) | 2012-06-15 | 2016-09-13 | International Business Machines Corporation | Restricting processing within a processor to facilitate transaction completion |
US9348642B2 (en) | 2012-06-15 | 2016-05-24 | International Business Machines Corporation | Transaction begin/end instructions |
US9336046B2 (en) | 2012-06-15 | 2016-05-10 | International Business Machines Corporation | Transaction abort processing |
US8688661B2 (en) | 2012-06-15 | 2014-04-01 | International Business Machines Corporation | Transactional processing |
US9436477B2 (en) * | 2012-06-15 | 2016-09-06 | International Business Machines Corporation | Transaction abort instruction |
US9772854B2 (en) | 2012-06-15 | 2017-09-26 | International Business Machines Corporation | Selectively controlling instruction execution in transactional processing |
US9740549B2 (en) | 2012-06-15 | 2017-08-22 | International Business Machines Corporation | Facilitating transaction completion subsequent to repeated aborts of the transaction |
US9367323B2 (en) | 2012-06-15 | 2016-06-14 | International Business Machines Corporation | Processor assist facility |
US20130339680A1 (en) | 2012-06-15 | 2013-12-19 | International Business Machines Corporation | Nontransactional store instruction |
US9448796B2 (en) | 2012-06-15 | 2016-09-20 | International Business Machines Corporation | Restricted instructions in transactional execution |
US9317460B2 (en) | 2012-06-15 | 2016-04-19 | International Business Machines Corporation | Program event recording within a transactional environment |
US10223246B2 (en) * | 2012-07-30 | 2019-03-05 | Infosys Limited | System and method for functional test case generation of end-to-end business process models |
US10154177B2 (en) * | 2012-10-04 | 2018-12-11 | Cognex Corporation | Symbology reader with multi-core processor |
US9436475B2 (en) * | 2012-11-05 | 2016-09-06 | Nvidia Corporation | System and method for executing sequential code using a group of threads and single-instruction, multiple-thread processor incorporating the same |
CN104813306B (zh) * | 2012-11-21 | 2017-07-04 | 相干逻辑公司 | Processing system with interspersed processor DMA-FIFO |
US9417873B2 (en) | 2012-12-28 | 2016-08-16 | Intel Corporation | Apparatus and method for a hybrid latency-throughput processor |
US9804839B2 (en) * | 2012-12-28 | 2017-10-31 | Intel Corporation | Instruction for determining histograms |
US10140129B2 (en) | 2012-12-28 | 2018-11-27 | Intel Corporation | Processing core having shared front end unit |
US9361116B2 (en) * | 2012-12-28 | 2016-06-07 | Intel Corporation | Apparatus and method for low-latency invocation of accelerators |
US10346195B2 (en) | 2012-12-29 | 2019-07-09 | Intel Corporation | Apparatus and method for invocation of a multi threaded accelerator |
US11163736B2 (en) * | 2013-03-04 | 2021-11-02 | Avaya Inc. | System and method for in-memory indexing of data |
US9400611B1 (en) * | 2013-03-13 | 2016-07-26 | Emc Corporation | Data migration in cluster environment using host copy and changed block tracking |
US9582320B2 (en) * | 2013-03-14 | 2017-02-28 | Nxp Usa, Inc. | Computer systems and methods with resource transfer hint instruction |
US9158698B2 (en) | 2013-03-15 | 2015-10-13 | International Business Machines Corporation | Dynamically removing entries from an executing queue |
US9471521B2 (en) * | 2013-05-15 | 2016-10-18 | Stmicroelectronics S.R.L. | Communication system for interfacing a plurality of transmission circuits with an interconnection network, and corresponding integrated circuit |
US9244810B2 (en) | 2013-05-23 | 2016-01-26 | Nvidia Corporation | Debugger graphical user interface system, method, and computer program product |
US8943448B2 (en) * | 2013-05-23 | 2015-01-27 | Nvidia Corporation | System, method, and computer program product for providing a debugger using a common hardware database |
US20140351811A1 (en) * | 2013-05-24 | 2014-11-27 | Empire Technology Development Llc | Datacenter application packages with hardware accelerators |
US20140358759A1 (en) * | 2013-05-28 | 2014-12-04 | Rivada Networks, Llc | Interfacing between a Dynamic Spectrum Policy Controller and a Dynamic Spectrum Controller |
US9910816B2 (en) * | 2013-07-22 | 2018-03-06 | Futurewei Technologies, Inc. | Scalable direct inter-node communication over peripheral component interconnect-express (PCIe) |
US9882984B2 (en) | 2013-08-02 | 2018-01-30 | International Business Machines Corporation | Cache migration management in a virtualized distributed computing system |
US10373301B2 (en) * | 2013-09-25 | 2019-08-06 | Sikorsky Aircraft Corporation | Structural hot spot and critical location monitoring system and method |
US8914757B1 (en) * | 2013-10-02 | 2014-12-16 | International Business Machines Corporation | Explaining illegal combinations in combinatorial models |
GB2519108A (en) | 2013-10-09 | 2015-04-15 | Advanced Risc Mach Ltd | A data processing apparatus and method for controlling performance of speculative vector operations |
GB2519107B (en) * | 2013-10-09 | 2020-05-13 | Advanced Risc Mach Ltd | A data processing apparatus and method for performing speculative vector access operations |
US9740854B2 (en) * | 2013-10-25 | 2017-08-22 | Red Hat, Inc. | System and method for code protection |
US10185604B2 (en) * | 2013-10-31 | 2019-01-22 | Advanced Micro Devices, Inc. | Methods and apparatus for software chaining of co-processor commands before submission to a command queue |
US9727611B2 (en) * | 2013-11-08 | 2017-08-08 | Samsung Electronics Co., Ltd. | Hybrid buffer management scheme for immutable pages |
US10191765B2 (en) | 2013-11-22 | 2019-01-29 | Sap Se | Transaction commit operations with thread decoupling and grouping of I/O requests |
US9495312B2 (en) | 2013-12-20 | 2016-11-15 | International Business Machines Corporation | Determining command rate based on dropped commands |
US9552221B1 (en) * | 2013-12-23 | 2017-01-24 | Google Inc. | Monitoring application execution using probe and profiling modules to collect timing and dependency information |
WO2015099767A1 (en) | 2013-12-27 | 2015-07-02 | Intel Corporation | Scalable input/output system and techniques |
US9307057B2 (en) * | 2014-01-08 | 2016-04-05 | Cavium, Inc. | Methods and systems for resource management in a single instruction multiple data packet parsing cluster |
US9509769B2 (en) * | 2014-02-28 | 2016-11-29 | Sap Se | Reflecting data modification requests in an offline environment |
US9720991B2 (en) | 2014-03-04 | 2017-08-01 | Microsoft Technology Licensing, Llc | Seamless data migration across databases |
US9697100B2 (en) * | 2014-03-10 | 2017-07-04 | Accenture Global Services Limited | Event correlation |
GB2524063B (en) | 2014-03-13 | 2020-07-01 | Advanced Risc Mach Ltd | Data processing apparatus for executing an access instruction for N threads |
JP6183251B2 (ja) * | 2014-03-14 | 2017-08-23 | 株式会社デンソー | Electronic control device |
US9268597B2 (en) * | 2014-04-01 | 2016-02-23 | Google Inc. | Incremental parallel processing of data |
US9607073B2 (en) * | 2014-04-17 | 2017-03-28 | Ab Initio Technology Llc | Processing data from multiple sources |
US10102210B2 (en) * | 2014-04-18 | 2018-10-16 | Oracle International Corporation | Systems and methods for multi-threaded shadow migration |
US9400654B2 (en) * | 2014-06-27 | 2016-07-26 | Freescale Semiconductor, Inc. | System on a chip with managing processor and method therefor |
CN104125283B (zh) * | 2014-07-30 | 2017-10-03 | 中国银行股份有限公司 | Message queue receiving method and system for a cluster |
US9787564B2 (en) * | 2014-08-04 | 2017-10-10 | Cisco Technology, Inc. | Algorithm for latency saving calculation in a piped message protocol on proxy caching engine |
US9692813B2 (en) * | 2014-08-08 | 2017-06-27 | Sas Institute Inc. | Dynamic assignment of transfers of blocks of data |
US9910650B2 (en) * | 2014-09-25 | 2018-03-06 | Intel Corporation | Method and apparatus for approximating detection of overlaps between memory ranges |
US9501420B2 (en) * | 2014-10-22 | 2016-11-22 | Netapp, Inc. | Cache optimization technique for large working data sets |
US20170262879A1 (en) * | 2014-11-06 | 2017-09-14 | Appriz Incorporated | Mobile application and two-way financial interaction solution with personalized alerts and notifications |
US9697151B2 (en) | 2014-11-19 | 2017-07-04 | Nxp Usa, Inc. | Message filtering in a data processing system |
US9727500B2 (en) | 2014-11-19 | 2017-08-08 | Nxp Usa, Inc. | Message filtering in a data processing system |
US9727679B2 (en) * | 2014-12-20 | 2017-08-08 | Intel Corporation | System on chip configuration metadata |
US9851970B2 (en) * | 2014-12-23 | 2017-12-26 | Intel Corporation | Method and apparatus for performing reduction operations on a set of vector elements |
US9880953B2 (en) | 2015-01-05 | 2018-01-30 | Tuxera Corporation | Systems and methods for network I/O based interrupt steering |
US9286196B1 (en) * | 2015-01-08 | 2016-03-15 | Arm Limited | Program execution optimization using uniform variable identification |
WO2016115075A1 (en) | 2015-01-13 | 2016-07-21 | Sikorsky Aircraft Corporation | Structural health monitoring employing physics models |
US20160219101A1 (en) * | 2015-01-23 | 2016-07-28 | Tieto Oyj | Migrating an application providing latency critical service |
US9547881B2 (en) * | 2015-01-29 | 2017-01-17 | Qualcomm Incorporated | Systems and methods for calculating a feature descriptor |
JP6508661B2 (ja) * | 2015-02-06 | 2019-05-08 | 華為技術有限公司 (Huawei Technologies Co., Ltd.) | Data processing system, computing node, and data processing method |
US9785413B2 (en) * | 2015-03-06 | 2017-10-10 | Intel Corporation | Methods and apparatus to eliminate partial-redundant vector loads |
JP6427053B2 (ja) * | 2015-03-31 | 2018-11-21 | 株式会社デンソー | Parallelizing compilation method and parallelizing compiler |
US10095479B2 (en) * | 2015-04-23 | 2018-10-09 | Google Llc | Virtual image processor instruction set architecture (ISA) and memory model and exemplary target hardware having a two-dimensional shift array structure |
US10372616B2 (en) | 2015-06-03 | 2019-08-06 | Renesas Electronics America Inc. | Microcontroller performing address translations using address offsets in memory where selected absolute addressing based programs are stored |
US9923965B2 (en) | 2015-06-05 | 2018-03-20 | International Business Machines Corporation | Storage mirroring over wide area network circuits with dynamic on-demand capacity |
US10409599B2 (en) | 2015-06-26 | 2019-09-10 | Microsoft Technology Licensing, Llc | Decoding information about a group of instructions including a size of the group of instructions |
US10409606B2 (en) | 2015-06-26 | 2019-09-10 | Microsoft Technology Licensing, Llc | Verifying branch targets |
US10346168B2 (en) | 2015-06-26 | 2019-07-09 | Microsoft Technology Licensing, Llc | Decoupled processor instruction window and operand buffer |
US10175988B2 (en) | 2015-06-26 | 2019-01-08 | Microsoft Technology Licensing, Llc | Explicit instruction scheduler state information for a processor |
US10191747B2 (en) | 2015-06-26 | 2019-01-29 | Microsoft Technology Licensing, Llc | Locking operand values for groups of instructions executed atomically |
CN106293893B (zh) * | 2015-06-26 | 2019-12-06 | 阿里巴巴集团控股有限公司 | Job scheduling method and apparatus, and distributed system |
US10169044B2 (en) | 2015-06-26 | 2019-01-01 | Microsoft Technology Licensing, Llc | Processing an encoding format field to interpret header information regarding a group of instructions |
US10459723B2 (en) | 2015-07-20 | 2019-10-29 | Qualcomm Incorporated | SIMD instructions for multi-stage cube networks |
US9930498B2 (en) * | 2015-07-31 | 2018-03-27 | Qualcomm Incorporated | Techniques for multimedia broadcast multicast service transmissions in unlicensed spectrum |
US20170054449A1 (en) * | 2015-08-19 | 2017-02-23 | Texas Instruments Incorporated | Method and System for Compression of Radar Signals |
EP3271820B1 (en) | 2015-09-24 | 2020-06-24 | Hewlett-Packard Enterprise Development LP | Failure indication in shared memory |
US20170104733A1 (en) * | 2015-10-09 | 2017-04-13 | Intel Corporation | Device, system and method for low speed communication of sensor information |
US9898325B2 (en) * | 2015-10-20 | 2018-02-20 | Vmware, Inc. | Configuration settings for configurable virtual components |
US20170116154A1 (en) * | 2015-10-23 | 2017-04-27 | The Intellisis Corporation | Register communication in a network-on-a-chip architecture |
CN106648563B (zh) * | 2015-10-30 | 2021-03-23 | 阿里巴巴集团控股有限公司 | Method and apparatus for decoupling dependencies of shared modules in an application |
KR102248846B1 (ko) * | 2015-11-04 | 2021-05-06 | 삼성전자주식회사 | Method for processing data in parallel and apparatus therefor |
US9977619B2 (en) * | 2015-11-06 | 2018-05-22 | Vivante Corporation | Transfer descriptor for memory access commands |
US10216441B2 (en) | 2015-11-25 | 2019-02-26 | International Business Machines Corporation | Dynamic quality of service for storage I/O port allocation |
US9923839B2 (en) * | 2015-11-25 | 2018-03-20 | International Business Machines Corporation | Configuring resources to exploit elastic network capability |
US10057327B2 (en) | 2015-11-25 | 2018-08-21 | International Business Machines Corporation | Controlled transfer of data over an elastic network |
US9923784B2 (en) | 2015-11-25 | 2018-03-20 | International Business Machines Corporation | Data transfer using flexible dynamic elastic network service provider relationships |
US10581680B2 (en) | 2015-11-25 | 2020-03-03 | International Business Machines Corporation | Dynamic configuration of network features |
US10177993B2 (en) | 2015-11-25 | 2019-01-08 | International Business Machines Corporation | Event-based data transfer scheduling using elastic network optimization criteria |
US10642617B2 (en) * | 2015-12-08 | 2020-05-05 | Via Alliance Semiconductor Co., Ltd. | Processor with an expandable instruction set architecture for dynamically configuring execution resources |
US10180829B2 (en) * | 2015-12-15 | 2019-01-15 | Nxp Usa, Inc. | System and method for modulo addressing vectorization with invariant code motion |
US20170177349A1 (en) * | 2015-12-21 | 2017-06-22 | Intel Corporation | Instructions and Logic for Load-Indices-and-Prefetch-Gathers Operations |
CN107015931A (zh) * | 2016-01-27 | 2017-08-04 | 三星电子株式会社 | Method and accelerator unit for interrupt handling |
CN105760321B (zh) * | 2016-02-29 | 2019-08-13 | 福州瑞芯微电子股份有限公司 | Debug clock domain circuit for an SoC chip |
US20210049292A1 (en) * | 2016-03-07 | 2021-02-18 | Crowdstrike, Inc. | Hypervisor-Based Interception of Memory and Register Accesses |
GB2548601B (en) * | 2016-03-23 | 2019-02-13 | Advanced Risc Mach Ltd | Processing vector instructions |
EP3226184A1 (en) * | 2016-03-30 | 2017-10-04 | Tata Consultancy Services Limited | Systems and methods for determining and rectifying events in processes |
US9967539B2 (en) * | 2016-06-03 | 2018-05-08 | Samsung Electronics Co., Ltd. | Timestamp error correction with double readout for the 3D camera with epipolar line laser point scanning |
US20170364334A1 (en) * | 2016-06-21 | 2017-12-21 | Atti Liu | Method and Apparatus of Read and Write for the Purpose of Computing |
US10797941B2 (en) * | 2016-07-13 | 2020-10-06 | Cisco Technology, Inc. | Determining network element analytics and networking recommendations based thereon |
CN107832005B (zh) * | 2016-08-29 | 2021-02-26 | 鸿富锦精密电子(天津)有限公司 | Distributed data access system and method |
US10353711B2 (en) | 2016-09-06 | 2019-07-16 | Apple Inc. | Clause chaining for clause-based instruction execution |
KR102247529B1 (ko) * | 2016-09-06 | 2021-05-03 | 삼성전자주식회사 | Electronic device, reconfigurable processor, and control methods thereof |
US10909077B2 (en) * | 2016-09-29 | 2021-02-02 | Paypal, Inc. | File slack leveraging |
US10866842B2 (en) * | 2016-10-25 | 2020-12-15 | Reconfigure.Io Limited | Synthesis path for transforming concurrent programs into hardware deployable on FPGA-based cloud infrastructures |
US10423446B2 (en) * | 2016-11-28 | 2019-09-24 | Arm Limited | Data processing |
KR102659495B1 (ko) * | 2016-12-02 | 2024-04-22 | 삼성전자주식회사 | Vector processor and control method thereof |
GB2558220B (en) | 2016-12-22 | 2019-05-15 | Advanced Risc Mach Ltd | Vector generating instruction |
CN108616905B (zh) * | 2016-12-28 | 2021-03-19 | 大唐移动通信设备有限公司 | User plane optimization method and system for cellular-based narrowband Internet of Things |
US10268558B2 (en) | 2017-01-13 | 2019-04-23 | Microsoft Technology Licensing, Llc | Efficient breakpoint detection via caches |
US10671395B2 (en) * | 2017-02-13 | 2020-06-02 | The King Abdulaziz City for Science and Technology—KACST | Application specific instruction-set processor (ASIP) for simultaneously executing a plurality of operations using a long instruction word |
US11144820B2 (en) * | 2017-02-28 | 2021-10-12 | Microsoft Technology Licensing, Llc | Hardware node with position-dependent memories for neural network processing |
US10169196B2 (en) * | 2017-03-20 | 2019-01-01 | Microsoft Technology Licensing, Llc | Enabling breakpoints on entire data structures |
US10360045B2 (en) * | 2017-04-25 | 2019-07-23 | Sandisk Technologies Llc | Event-driven schemes for determining suspend/resume periods |
US10552206B2 (en) * | 2017-05-23 | 2020-02-04 | Ge Aviation Systems Llc | Contextual awareness associated with resources |
US20180349137A1 (en) * | 2017-06-05 | 2018-12-06 | Intel Corporation | Reconfiguring a processor without a system reset |
US11021944B2 (en) | 2017-06-13 | 2021-06-01 | Schlumberger Technology Corporation | Well construction communication and control |
US11143010B2 (en) | 2017-06-13 | 2021-10-12 | Schlumberger Technology Corporation | Well construction communication and control |
US20180359130A1 (en) * | 2017-06-13 | 2018-12-13 | Schlumberger Technology Corporation | Well Construction Communication and Control |
US10599617B2 (en) * | 2017-06-29 | 2020-03-24 | Intel Corporation | Methods and apparatus to modify a binary file for scalable dependency loading on distributed computing systems |
WO2019005165A1 (en) | 2017-06-30 | 2019-01-03 | Intel Corporation | METHOD AND APPARATUS FOR VECTORIZING INDIRECT UPDATING BUCKLES |
CN118069218A (zh) | 2017-09-12 | 2024-05-24 | 恩倍科微公司 | Ultra-low-power microcontroller system |
US10713050B2 (en) | 2017-09-19 | 2020-07-14 | International Business Machines Corporation | Replacing Table of Contents (TOC)-setting instructions in code with TOC predicting instructions |
US10620955B2 (en) | 2017-09-19 | 2020-04-14 | International Business Machines Corporation | Predicting a table of contents pointer value responsive to branching to a subroutine |
US10896030B2 (en) | 2017-09-19 | 2021-01-19 | International Business Machines Corporation | Code generation relating to providing table of contents pointer values |
US11061575B2 (en) * | 2017-09-19 | 2021-07-13 | International Business Machines Corporation | Read-only table of contents register |
US10705973B2 (en) | 2017-09-19 | 2020-07-07 | International Business Machines Corporation | Initializing a data structure for use in predicting table of contents pointer values |
US10884929B2 (en) | 2017-09-19 | 2021-01-05 | International Business Machines Corporation | Set table of contents (TOC) register instruction |
US10725918B2 (en) | 2017-09-19 | 2020-07-28 | International Business Machines Corporation | Table of contents cache entry having a pointer for a range of addresses |
CN109697114B (zh) * | 2017-10-20 | 2023-07-28 | 伊姆西Ip控股有限责任公司 | Method and machine for application migration |
US10761970B2 (en) * | 2017-10-20 | 2020-09-01 | International Business Machines Corporation | Computerized method and systems for performing deferred safety check operations |
US10572302B2 (en) * | 2017-11-07 | 2020-02-25 | Oracle International Corporation | Computerized methods and systems for executing and analyzing processes |
US10705843B2 (en) * | 2017-12-21 | 2020-07-07 | International Business Machines Corporation | Method and system for detection of thread stall |
US10915317B2 (en) | 2017-12-22 | 2021-02-09 | Alibaba Group Holding Limited | Multiple-pipeline architecture with special number detection |
CN108196946B (zh) * | 2017-12-28 | 2019-08-09 | 北京翼辉信息技术有限公司 | Partitioned multi-core method for a microkernel operating system |
US10366017B2 (en) | 2018-03-30 | 2019-07-30 | Intel Corporation | Methods and apparatus to offload media streams in host devices |
WO2019191739A1 (en) * | 2018-03-31 | 2019-10-03 | Micron Technology, Inc. | Efficient loop execution for a multi-threaded, self-scheduling reconfigurable computing fabric |
US11277455B2 (en) | 2018-06-07 | 2022-03-15 | Mellanox Technologies, Ltd. | Streaming system |
US10740220B2 (en) | 2018-06-27 | 2020-08-11 | Microsoft Technology Licensing, Llc | Cache-based trace replay breakpoints using reserved tag field bits |
CN109087381B (zh) * | 2018-07-04 | 2023-01-17 | 西安邮电大学 | Unified-architecture rendering shader based on dual-issue VLIW |
CN110837414B (zh) * | 2018-08-15 | 2024-04-12 | 京东科技控股股份有限公司 | Task processing method and apparatus |
US10862485B1 (en) * | 2018-08-29 | 2020-12-08 | Verisilicon Microelectronics (Shanghai) Co., Ltd. | Lookup table index for a processor |
CN109445516A (zh) * | 2018-09-27 | 2019-03-08 | 北京中电华大电子设计有限责任公司 | Peripheral clock control method and circuit for a dual-core SoC |
US20200106828A1 (en) * | 2018-10-02 | 2020-04-02 | Mellanox Technologies, Ltd. | Parallel Computation Network Device |
US11061894B2 (en) * | 2018-10-31 | 2021-07-13 | Salesforce.Com, Inc. | Early detection and warning for system bottlenecks in an on-demand environment |
US11108675B2 (en) | 2018-10-31 | 2021-08-31 | Keysight Technologies, Inc. | Methods, systems, and computer readable media for testing effects of simulated frame preemption and deterministic fragmentation of preemptable frames in a frame-preemption-capable network |
US10776984B2 (en) | 2018-11-08 | 2020-09-15 | Insightfulvr, Inc | Compositor for decoupled rendering |
US10678693B2 (en) * | 2018-11-08 | 2020-06-09 | Insightfulvr, Inc | Logic-executing ring buffer |
US10728134B2 (en) * | 2018-11-14 | 2020-07-28 | Keysight Technologies, Inc. | Methods, systems, and computer readable media for measuring delivery latency in a frame-preemption-capable network |
CN109374935A (zh) * | 2018-11-28 | 2019-02-22 | 武汉精能电子技术有限公司 | Method and system for operating electronic loads in parallel |
US10761822B1 (en) * | 2018-12-12 | 2020-09-01 | Amazon Technologies, Inc. | Synchronization of computation engines with non-blocking instructions |
GB2580136B (en) * | 2018-12-21 | 2021-01-20 | Graphcore Ltd | Handling exceptions in a multi-tile processing arrangement |
US10671550B1 (en) * | 2019-01-03 | 2020-06-02 | International Business Machines Corporation | Memory offloading a problem using accelerators |
TWI703500B (zh) * | 2019-02-01 | 2020-09-01 | 睿寬智能科技有限公司 | Method for shortening context switch time and semiconductor device thereof |
US11625393B2 (en) | 2019-02-19 | 2023-04-11 | Mellanox Technologies, Ltd. | High performance computing system |
EP3699770A1 (en) | 2019-02-25 | 2020-08-26 | Mellanox Technologies TLV Ltd. | Collective communication system and methods |
WO2020181259A1 (en) * | 2019-03-06 | 2020-09-10 | Live Nation Entertainment, Inc. | Systems and methods for queue control based on client-specific protocols |
US10935600B2 (en) * | 2019-04-05 | 2021-03-02 | Texas Instruments Incorporated | Dynamic security protection in configurable analog signal chains |
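CN111966399B (zh) * | 2019-05-20 | 2024-06-07 | 上海寒武纪信息科技有限公司 | Instruction processing method and apparatus, and related products |
CN110177220B (zh) * | 2019-05-23 | 2020-09-01 | 上海图趣信息科技有限公司 | Camera with external time-synchronization function and control method thereof |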
WO2021026225A1 (en) * | 2019-08-08 | 2021-02-11 | Neuralmagic Inc. | System and method of accelerating execution of a neural network |
US11403110B2 (en) * | 2019-10-23 | 2022-08-02 | Texas Instruments Incorporated | Storing a result of a first instruction of an execute packet in a holding register prior to completion of a second instruction of the execute packet |
US11144483B2 (en) * | 2019-10-25 | 2021-10-12 | Micron Technology, Inc. | Apparatuses and methods for writing data to a memory |
FR3103583B1 (fr) * | 2019-11-27 | 2023-05-12 | Commissariat Energie Atomique | Shared data management system |
US10877761B1 (en) * | 2019-12-08 | 2020-12-29 | Mellanox Technologies, Ltd. | Write reordering in a multiprocessor system |
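CN111061510B (zh) * | 2019-12-12 | 2021-01-05 | 湖南毂梁微电子有限公司 | Scalable ASIP architecture platform and instruction processing method |
CN111143127B (zh) * | 2019-12-23 | 2023-09-26 | 杭州迪普科技股份有限公司 | Method, apparatus, storage medium, and device for supervising network equipment |
CN113034653B (zh) * | 2019-12-24 | 2023-08-08 | 腾讯科技(深圳)有限公司 | Animation rendering method and apparatus |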
US11750699B2 (en) | 2020-01-15 | 2023-09-05 | Mellanox Technologies, Ltd. | Small message aggregation |
US11137936B2 (en) | 2020-01-21 | 2021-10-05 | Google Llc | Data processing on memory controller |
US11360780B2 (en) * | 2020-01-22 | 2022-06-14 | Apple Inc. | Instruction-level context switch in SIMD processor |
US11252027B2 (en) | 2020-01-23 | 2022-02-15 | Mellanox Technologies, Ltd. | Network element supporting flexible data reduction operations |
US12014443B2 (en) | 2020-02-05 | 2024-06-18 | Sony Interactive Entertainment Inc. | Graphics processor and information processing system |
US11188316B2 (en) * | 2020-03-09 | 2021-11-30 | International Business Machines Corporation | Performance optimization of class instance comparisons |
US11354130B1 (en) * | 2020-03-19 | 2022-06-07 | Amazon Technologies, Inc. | Efficient race-condition detection |
US12001929B2 (en) * | 2020-04-01 | 2024-06-04 | Samsung Electronics Co., Ltd. | Mixed-precision neural processing unit (NPU) using spatial fusion with load balancing |
WO2021212074A1 (en) * | 2020-04-16 | 2021-10-21 | Tom Herbert | Parallelism in serial pipeline processing |
JP7380416B2 (ja) | 2020-05-18 | 2023-11-15 | トヨタ自動車株式会社 | Agent control device |
JP7380415B2 (ja) * | 2020-05-18 | 2023-11-15 | トヨタ自動車株式会社 | Agent control device |
SE544261C2 (en) | 2020-06-16 | 2022-03-15 | IntuiCell AB | A computer-implemented or hardware-implemented method of entity identification, a computer program product and an apparatus for entity identification |
US11876885B2 (en) | 2020-07-02 | 2024-01-16 | Mellanox Technologies, Ltd. | Clock queue with arming and/or self-arming features |
GB202010839D0 (en) * | 2020-07-14 | 2020-08-26 | Graphcore Ltd | Variable allocation |
US12021546B2 (en) * | 2020-09-03 | 2024-06-25 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and apparatus for improved belief propagation based decoding |
US11340914B2 (en) * | 2020-10-21 | 2022-05-24 | Red Hat, Inc. | Run-time identification of dependencies during dynamic linking |
JP7203799B2 (ja) | 2020-10-27 | 2023-01-13 | 昭和電線ケーブルシステム株式会社 | Method for repairing oil leak points in oil-filled power cables and their joints |
US11243773B1 (en) | 2020-12-14 | 2022-02-08 | International Business Machines Corporation | Area and power efficient mechanism to wakeup store-dependent loads according to store drain merges |
TWI768592B (zh) * | 2020-12-14 | 2022-06-21 | 瑞昱半導體股份有限公司 | Central processing unit |
US11556378B2 (en) | 2020-12-14 | 2023-01-17 | Mellanox Technologies, Ltd. | Offloading execution of a multi-task parameter-dependent operation to a network device |
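CN112924962B (zh) * | 2021-01-29 | 2023-02-21 | 上海匀羿电磁科技有限公司 | Filtering-based detection and localization method for lateral offset of underground pipelines |
CN113112393B (zh) * | 2021-03-04 | 2022-05-31 | 浙江欣奕华智能科技有限公司 | Marginalization device in a visual navigation system |
CN113438171B (zh) * | 2021-05-08 | 2022-11-15 | 清华大学 | Multi-chip connection method for a low-power compute-in-memory system |
CN113553266A (zh) * | 2021-07-23 | 2021-10-26 | 湖南大学 | Method, system, terminal, and readable storage medium for detecting parallelism in serial programs based on a parallelism detection model |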
US12086160B2 (en) * | 2021-09-23 | 2024-09-10 | Oracle International Corporation | Analyzing performance of resource systems that process requests for particular datasets |
US11770345B2 (en) * | 2021-09-30 | 2023-09-26 | US Technology International Pvt. Ltd. | Data transfer device for receiving data from a host device and method therefor |
US12118384B2 (en) * | 2021-10-29 | 2024-10-15 | Blackberry Limited | Scheduling of threads for clusters of processors |
JP2023082571A (ja) * | 2021-12-02 | 2023-06-14 | 富士通株式会社 | Arithmetic processing device and arithmetic processing method |
US20230289189A1 (en) * | 2022-03-10 | 2023-09-14 | Nvidia Corporation | Distributed Shared Memory |
WO2023214915A1 (en) * | 2022-05-06 | 2023-11-09 | IntuiCell AB | A data processing system for processing pixel data to be indicative of contrast. |
US11922237B1 (en) | 2022-09-12 | 2024-03-05 | Mellanox Technologies, Ltd. | Single-step collective operations |
DE102022003674A1 (de) * | 2022-10-05 | 2024-04-11 | Mercedes-Benz Group AG | Method for statically allocating information to memory areas, information technology system, and vehicle |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1309347A (zh) * | 1999-11-15 | 2001-08-22 | 德克萨斯仪器股份有限公司 | Microprocessor with an improved instruction set architecture |
CN101133409A (zh) * | 2005-03-03 | 2008-02-27 | Clear-Speed科技公司 | Reconfigurable logic in processors |
CN101322111A (zh) * | 2005-04-07 | 2008-12-10 | 杉桥技术公司 | Multithreaded processor with multiple concurrent pipelines per thread |
Family Cites Families (78)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4862350A (en) * | 1984-08-03 | 1989-08-29 | International Business Machines Corp. | Architecture for a distributive microprocessing system |
GB2211638A (en) * | 1987-10-27 | 1989-07-05 | Ibm | Simd array processor |
US5218709A (en) * | 1989-12-28 | 1993-06-08 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Special purpose parallel computer architecture for real-time control and simulation in robotic applications |
IL97315A (en) * | 1990-02-28 | 1994-10-07 | Hughes Aircraft Co | Multi-group signal processor |
US5815723A (en) * | 1990-11-13 | 1998-09-29 | International Business Machines Corporation | Picket autonomy on a SIMD machine |
CA2073516A1 (en) * | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Dynamic multi-mode parallel processor array architecture computer system |
US5315700A (en) * | 1992-02-18 | 1994-05-24 | Neopath, Inc. | Method and apparatus for rapidly processing data sequences |
JPH07287700A (ja) * | 1992-05-22 | 1995-10-31 | Internatl Business Mach Corp <Ibm> | Computer system |
US5315701A (en) * | 1992-08-07 | 1994-05-24 | International Business Machines Corporation | Method and system for processing graphics data streams utilizing scalable processing nodes |
US5560034A (en) * | 1993-07-06 | 1996-09-24 | Intel Corporation | Shared command list |
JPH07210545A (ja) * | 1994-01-24 | 1995-08-11 | Matsushita Electric Ind Co Ltd | Parallel processor |
US6002411A (en) * | 1994-11-16 | 1999-12-14 | Interactive Silicon, Inc. | Integrated video and memory controller with data processing and graphical processing capabilities |
JPH1049368A (ja) * | 1996-07-30 | 1998-02-20 | Mitsubishi Electric Corp | Microprocessor having conditional execution instructions |
JP3778573B2 (ja) * | 1996-09-27 | 2006-05-24 | 株式会社ルネサステクノロジ | Data processor and data processing system |
US6108775A (en) * | 1996-12-30 | 2000-08-22 | Texas Instruments Incorporated | Dynamically loadable pattern history tables in a multi-task microprocessor |
US6243499B1 (en) * | 1998-03-23 | 2001-06-05 | Xerox Corporation | Tagging of antialiased images |
JP2000207202A (ja) * | 1998-10-29 | 2000-07-28 | Pacific Design Kk | Control device and data processing device |
US8171263B2 (en) * | 1999-04-09 | 2012-05-01 | Rambus Inc. | Data processing apparatus comprising an array controller for separating an instruction stream processing instructions and data transfer instructions |
AU3829500A (en) * | 1999-04-09 | 2000-11-14 | Clearspeed Technology Limited | Parallel data processing apparatus |
US6751698B1 (en) * | 1999-09-29 | 2004-06-15 | Silicon Graphics, Inc. | Multiprocessor node controller circuit and method |
JP2001167069A (ja) * | 1999-12-13 | 2001-06-22 | Fujitsu Ltd | Multiprocessor system and data transfer method |
JP2002073329A (ja) * | 2000-08-29 | 2002-03-12 | Canon Inc | Processor |
US20020052868A1 (en) * | 2000-10-04 | 2002-05-02 | Sanjeev Mohindra | SIMD system and method |
US6959346B2 (en) * | 2000-12-22 | 2005-10-25 | Mosaid Technologies, Inc. | Method and system for packet encryption |
JP5372307B2 (ja) * | 2001-06-25 | 2013-12-18 | 株式会社ガイア・システム・ソリューション | Data processing device and control method therefor |
GB0119145D0 (en) * | 2001-08-06 | 2001-09-26 | Nokia Corp | Controlling processing networks |
JP2003099252A (ja) * | 2001-09-26 | 2003-04-04 | Pacific Design Kk | Data processing device and control method therefor |
JP3840966B2 (ja) * | 2001-12-12 | 2006-11-01 | ソニー株式会社 | Image processing device and method therefor |
US7853778B2 (en) * | 2001-12-20 | 2010-12-14 | Intel Corporation | Load/move and duplicate instructions for a processor |
US7548586B1 (en) * | 2002-02-04 | 2009-06-16 | Mimar Tibet | Audio and video processing apparatus |
US7506135B1 (en) * | 2002-06-03 | 2009-03-17 | Mimar Tibet | Histogram generation with vector operations in SIMD and VLIW processor by consolidating LUTs storing parallel update incremented count values for vector data elements |
WO2004015563A1 (en) * | 2002-08-09 | 2004-02-19 | Intel Corporation | Multimedia coprocessor control mechanism including alignment or broadcast instructions |
JP2004295494A (ja) * | 2003-03-27 | 2004-10-21 | Fujitsu Ltd | Multi-processing-node system with general-purpose and real-time capabilities |
US7107436B2 (en) * | 2003-09-08 | 2006-09-12 | Freescale Semiconductor, Inc. | Conditional next portion transferring of data stream to or from register based on subsequent instruction aspect |
US7836276B2 (en) * | 2005-12-02 | 2010-11-16 | Nvidia Corporation | System and method for processing thread groups in a SIMD architecture |
DE10353267B3 (de) * | 2003-11-14 | 2005-07-28 | Infineon Technologies Ag | Multithreaded processor architecture for triggered thread switching without cycle-time loss and without a switch program instruction |
GB2409060B (en) * | 2003-12-09 | 2006-08-09 | Advanced Risc Mach Ltd | Moving data between registers of different register data stores |
US8566828B2 (en) * | 2003-12-19 | 2013-10-22 | Stmicroelectronics, Inc. | Accelerator for multi-processing system and method |
US7206922B1 (en) * | 2003-12-30 | 2007-04-17 | Cisco Systems, Inc. | Instruction memory hierarchy for an embedded processor |
US7412587B2 (en) * | 2004-02-16 | 2008-08-12 | Matsushita Electric Industrial Co., Ltd. | Parallel operation processor utilizing SIMD data transfers |
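JP4698242B2 (ja) * | 2004-02-16 | 2011-06-08 | パナソニック株式会社 | Parallel arithmetic processor, control program and control method for controlling the operation of the parallel arithmetic processor, and image processing device equipped with the parallel arithmetic processor |
JP2005352568A (ja) * | 2004-06-08 | 2005-12-22 | Hitachi-Lg Data Storage Inc | Analog signal processing circuit, and data register rewriting method and data communication method therefor |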
US7681199B2 (en) * | 2004-08-31 | 2010-03-16 | Hewlett-Packard Development Company, L.P. | Time measurement using a context switch count, an offset, and a scale factor, received from the operating system |
US7565469B2 (en) * | 2004-11-17 | 2009-07-21 | Nokia Corporation | Multimedia card interface method, computer program product and apparatus |
US7257695B2 (en) * | 2004-12-28 | 2007-08-14 | Intel Corporation | Register file regions for a processing system |
US20060155955A1 (en) * | 2005-01-10 | 2006-07-13 | Gschwind Michael K | SIMD-RISC processor module |
GB2437836B (en) * | 2005-02-25 | 2009-01-14 | Clearspeed Technology Plc | Microprocessor architectures |
US7992144B1 (en) * | 2005-04-04 | 2011-08-02 | Oracle America, Inc. | Method and apparatus for separating and isolating control of processing entities in a network interface |
US20060259737A1 (en) * | 2005-05-10 | 2006-11-16 | Telairity Semiconductor, Inc. | Vector processor with special purpose registers and high speed memory access |
EP1883045A4 (en) * | 2005-05-20 | 2016-10-05 | Sony Corp | SIGNAL PROCESSOR |
JP2006343872A (ja) * | 2005-06-07 | 2006-12-21 | Keio Gijuku | Multithreaded central processing unit and simultaneous multithreading control method |
US20060294344A1 (en) * | 2005-06-28 | 2006-12-28 | Universal Network Machines, Inc. | Computer processor pipeline with shadow registers for context switching, and method |
US8275976B2 (en) * | 2005-08-29 | 2012-09-25 | The Invention Science Fund I, Llc | Hierarchical instruction scheduler facilitating instruction replay |
US7617363B2 (en) * | 2005-09-26 | 2009-11-10 | Intel Corporation | Low latency message passing mechanism |
US7421529B2 (en) * | 2005-10-20 | 2008-09-02 | Qualcomm Incorporated | Method and apparatus to clear semaphore reservation for exclusive access to shared memory |
JP2009519513A (ja) * | 2005-12-06 | 2009-05-14 | ボストンサーキッツ インコーポレイテッド | Multi-core arithmetic processing method and apparatus using dedicated thread management |
US7788468B1 (en) * | 2005-12-15 | 2010-08-31 | Nvidia Corporation | Synchronization of threads in a cooperative thread array |
CN2862511Y (zh) * | 2005-12-15 | 2007-01-24 | 李志刚 | Multifunction interface board for the GJB-289A bus |
US7360063B2 (en) * | 2006-03-02 | 2008-04-15 | International Business Machines Corporation | Method for SIMD-oriented management of register maps for map-based indirect register-file access |
US8560863B2 (en) * | 2006-06-27 | 2013-10-15 | Intel Corporation | Systems and techniques for datapath security in a system-on-a-chip device |
JP2008059455A (ja) * | 2006-09-01 | 2008-03-13 | Kawasaki Microelectronics Kk | Multiprocessor |
EP2523101B1 (en) * | 2006-11-14 | 2014-06-04 | Soft Machines, Inc. | Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes |
US7870400B2 (en) * | 2007-01-02 | 2011-01-11 | Freescale Semiconductor, Inc. | System having a memory voltage controller which varies an operating voltage of a memory and method therefor |
JP5079342B2 (ja) * | 2007-01-22 | 2012-11-21 | ルネサスエレクトロニクス株式会社 | Multiprocessor device |
US20080270363A1 (en) * | 2007-01-26 | 2008-10-30 | Herbert Dennis Hunt | Cluster processing of a core information matrix |
US8250550B2 (en) * | 2007-02-14 | 2012-08-21 | The Mathworks, Inc. | Parallel processing of distributed arrays and optimum data distribution |
CN101021832A (zh) * | 2007-03-19 | 2007-08-22 | 中国人民解放军国防科学技术大学 | 64-bit floating-point/integer fused arithmetic cluster supporting local registers and conditional execution |
US8132172B2 (en) * | 2007-03-26 | 2012-03-06 | Intel Corporation | Thread scheduling on multiprocessor systems |
US7627744B2 (en) * | 2007-05-10 | 2009-12-01 | Nvidia Corporation | External memory accessing DMA request scheduling in IC of parallel processing engines according to completion notification queue occupancy level |
CN100461095C (zh) * | 2007-11-20 | 2009-02-11 | 浙江大学 | Design method for a multi-mode, media-enhanced pipelined multiplication unit |
FR2925187B1 (fr) * | 2007-12-14 | 2011-04-08 | Commissariat Energie Atomique | System comprising a plurality of processing units for executing tasks in parallel by mixing control-type and dataflow-type execution modes |
CN101471810B (zh) * | 2007-12-28 | 2011-09-14 | 华为技术有限公司 | Method, apparatus, and system for executing tasks in a cluster environment |
US20090183035A1 (en) * | 2008-01-10 | 2009-07-16 | Butler Michael G | Processor including hybrid redundancy for logic error protection |
JP5461533B2 (ja) * | 2008-05-30 | 2014-04-02 | アドバンスト・マイクロ・ディバイシズ・インコーポレイテッド | Local and global data sharing |
CN101739235A (zh) * | 2008-11-26 | 2010-06-16 | 中国科学院微电子研究所 | Processor apparatus seamlessly linking a 32-bit DSP with a general-purpose RISC CPU |
CN101799750B (zh) * | 2009-02-11 | 2015-05-06 | 上海芯豪微电子有限公司 | Data processing method and apparatus |
CN101593164B (zh) * | 2009-07-13 | 2012-05-09 | 中国船舶重工集团公司第七○九研究所 | USB HID slave device based on embedded Linux, and firmware implementation method |
US9552206B2 (en) * | 2010-11-18 | 2017-01-24 | Texas Instruments Incorporated | Integrated circuit with control node circuitry and processing circuitry |
2011
- 2011-09-14 US US13/232,774 patent/US9552206B2/en active Active
- 2011-11-18 JP JP2013540065A patent/JP2014501007A/ja active Pending
- 2011-11-18 CN CN201180055771.5A patent/CN103221935B/zh active Active
- 2011-11-18 WO PCT/US2011/061474 patent/WO2012068504A2/en active Application Filing
- 2011-11-18 JP JP2013540061A patent/JP6096120B2/ja active Active
- 2011-11-18 CN CN201180055828.1A patent/CN103221939B/zh active Active
- 2011-11-18 CN CN201180055803.1A patent/CN103221937B/zh active Active
- 2011-11-18 WO PCT/US2011/061461 patent/WO2012068498A2/en active Application Filing
- 2011-11-18 JP JP2013540048A patent/JP5859017B2/ja active Active
- 2011-11-18 WO PCT/US2011/061444 patent/WO2012068486A2/en active Application Filing
- 2011-11-18 JP JP2013540059A patent/JP5989656B2/ja active Active
- 2011-11-18 WO PCT/US2011/061428 patent/WO2012068475A2/en active Application Filing
- 2011-11-18 JP JP2013540058A patent/JP2014505916A/ja active Pending
- 2011-11-18 JP JP2013540074A patent/JP2014501009A/ja active Pending
- 2011-11-18 WO PCT/US2011/061487 patent/WO2012068513A2/en active Application Filing
- 2011-11-18 JP JP2013540069A patent/JP2014501008A/ja active Pending
- 2011-11-18 CN CN201180055782.3A patent/CN103221936B/zh active Active
- 2011-11-18 CN CN201180055748.6A patent/CN103221934B/zh active Active
- 2011-11-18 CN CN201180055810.1A patent/CN103221938B/zh active Active
- 2011-11-18 CN CN201180055668.0A patent/CN103221933B/zh active Active
- 2011-11-18 CN CN201180055694.3A patent/CN103221918B/zh active Active
- 2011-11-18 WO PCT/US2011/061431 patent/WO2012068478A2/en active Application Filing
- 2011-11-18 JP JP2013540064A patent/JP2014501969A/ja active Pending
- 2011-11-18 WO PCT/US2011/061369 patent/WO2012068449A2/en active Application Filing
- 2011-11-18 WO PCT/US2011/061456 patent/WO2012068494A2/en active Application Filing
2016
- 2016-02-12 JP JP2016024486A patent/JP6243935B2/ja active Active
Similar Documents
Publication | Title |
---|---|
CN103221938B (zh) | Method and apparatus for moving data | |
US11669443B2 (en) | Data layout optimization on processing in memory architecture for executing neural network model | |
US11714780B2 (en) | Compiler flow logic for reconfigurable architectures | |
US11238334B2 (en) | System and method of input alignment for efficient vector operations in an artificial neural network | |
US12105630B2 (en) | Compile time logic for inserting a buffer between a producer operation unit and a consumer operation unit in a dataflow graph | |
US11593157B2 (en) | Full asynchronous execution queue for accelerator hardware | |
CN114327362A (zh) | Large-scale matrix reconstruction and matrix-scalar operations | |
Kuzmanov et al. | A Polymorphic Register File for matrix operations | |
EP3862874A1 (en) | Full asynchronous execution queue for accelerator hardware | |
US9542192B1 (en) | Tokenized streams for concurrent execution between asymmetric multiprocessors | |
Dworaczyk Wiltshire | CUDA Enhanced Filtering In a Pipelined Video Processing Framework | |
Wiltshire | CUDA Enhanced Filtering in a Pipelined |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |