CN1666187A - Reconfigurable streaming vector processor - Google Patents
- Publication number
- CN1666187A (application number CN03815336XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
- G06F9/3552—Indexed addressing using wraparound, e.g. modulo or circular addressing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8053—Vector processors
- G06F15/8061—Details on data memory access
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
- G06F9/3455—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results using stride
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
- G06F9/3879—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Advance Control (AREA)
- Complex Calculations (AREA)
Abstract
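A reconfigurable streaming vector processor (100) is provided. It includes a number of functional units (102), each having one or more inputs for receiving data values and an output for providing a data value, a reconfigurable interconnection switch (104), and a micro-sequencer (118). The reconfigurable interconnection switch (104) includes one or more links, each operable to couple an output of a functional unit (102) to an input of a functional unit (102), as directed by the micro-sequencer (118). The vector processor may also include one or more input stream units (122) for retrieving data from memory. Each input stream unit is directed by a host processor and has a defined interface (116) to the host processor. The vector processor also includes one or more output stream units (124) for writing data to memory or to the host processor.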
Description
Cross-Reference to Related Applications
This application is related to the following co-pending patent applications filed on the same day as this application: "INTERCONNECTION DEVICE WITH INTEGRATED STORAGE", identified by Attorney Docket No. CML00101D; "MEMORY INTERFACE WITH FRACTIONAL ADDRESSING", identified by Attorney Docket No. CML00102D; "SCHEDULER FOR STREAMING VECTOR PROCESSOR", identified by Attorney Docket No. CML00108D; and "METHOD OF PROGRAMMING LINEAR GRAPHS FOR STREAMING VECTOR COMPUTATION", identified by Attorney Docket No. CML00109D. These applications are hereby incorporated herein by reference.
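Technical Field
This invention relates generally to the field of computer processors. More particularly, it relates to a reconfigurable streaming vector processor.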
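Background
Many new applications being designed for mobile devices (multimedia, graphics, image compression/decompression, and so on) involve a high percentage of streaming vector computations. The computation rate of these applications often exceeds what the best general-purpose CPUs can deliver. It is therefore desirable to find a way to improve the performance of the computation engine present in such devices so that it meets the computational requirements of these new applications.
At the same time, the nature of these new applications is that the standards, and the best algorithms for complying with those standards, change frequently, so the solution must be programmable and easy to program. Moreover, time-to-market pressure keeps increasing. One approach to this problem is to increase the reuse of previous investments in software and hardware. Reuse of hardware across multiple products is best promoted through programmability. Reuse of software is promoted through a consistent programming model across multiple implementations of a device, so that binary compatibility is maintained.
One attempt to satisfy this need is the use of hardware accelerators. These fall short of solving the problem because their capacity for reprogramming is limited. Hardware accelerators that are not fixed in function allow changes only in the parameters of the functions they execute, not in the type or ordering of those functions.
Programmable solutions exist in the form of vector processors, digital signal processors, SIMD processors, and VLIW processors. These solutions fall short because of limitations in their programming models, which make them difficult to program and make it difficult to maintain a consistent programming model across all generations of hardware. The limitations in their programming models include programmer visibility of the data-path pipeline, memory width and latency, data alignment in memory, and explicit resource dependencies.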
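Brief Description of the Drawings
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as the preferred mode of use and further objects and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
FIG. 1 is a diagrammatic representation of an embodiment of the reconfigurable streaming vector processor of the present invention.
FIG. 2 is a data-flow graph of an exemplary iterative computation.
FIG. 3 is a flow chart of a process for programming the vector processor of the present invention.
FIG. 4 is a flow chart of a process for programming a host processor to operate with the vector processor of the present invention.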
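Detailed Description
While this invention is susceptible of embodiment in many different forms, one or more specific embodiments are shown in the drawings and described in detail herein, with the understanding that the present disclosure is to be considered exemplary of the principles of the invention and is not intended to limit the invention to the specific embodiments shown and described. In the description below, like reference numerals are used to describe the same, similar, or corresponding parts in the several views of the drawings.
The Reconfigurable Streaming Vector Processor (RSVP) of the present invention is a coprocessor that performs vector operations, that is, a set of identical operations on a sequence of data elements. It is designed to enhance the performance of an embedded general-purpose processor (the host processor) by performing high-speed vector operations. In the RSVP programming model, the specification of a vector operation is split into two parts: access and computation.
In one embodiment of the invention, a reconfigurable streaming vector processor is provided that includes a number of functional units, each having one or more inputs for receiving data values and an output for providing a data value, together with a reconfigurable interconnection switch and a micro-sequencer. The reconfigurable interconnection switch includes one or more links, each operable to couple an output of a functional unit to an input of a functional unit, as directed by the micro-sequencer. The vector processor also includes one or more input stream units for retrieving data from memory. The input stream units are directed by a host processor and have a defined interface to the host processor. The vector processor also includes one or more output stream units for writing data to memory, likewise with a defined interface to the host processor.
In a further embodiment, the reconfigurable interconnection switch includes a memory for storing intermediate data values.
In the preferred embodiment, the defined interface of the input stream units forms the first part of the programming model. The instructions that are stored in the memory of the micro-sequencer and that control the reconfigurable interconnection switch form the second part of the programming model.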
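An exemplary embodiment of the RSVP hardware 100 is shown in FIG. 1. Referring to FIG. 1, the outputs and inputs of a number of functional units 102 are interconnected via a reconfigurable interconnection switch 104. The functional units may include a multiplier 106, an adder 108, a logic unit 110, and a shifter 112. Other functional units may be included, as may multiple functional units of a particular type. The outputs of the functional units may be single registers or pipeline registers. The functional units may support multiple independent operations. For example, the multiplier may have a 128-bit input and a 128-bit output and be able to perform two 32×32-to-64-bit multiplies or four 16×16-to-32-bit multiplies, or any combination whose aggregate does not exceed 128 bits in or 128 bits out. The hardware also includes one or more accumulators 114. In the preferred embodiment, the accumulators act as both accumulators and storage registers, and they are interfaced both to the interconnection switch 104 and to an external interface 116. The external interface 116 enables the RSVP to link with a host processor and allows the host processor to access the accumulators and other parts of the RSVP.
The functional units 102 and the reconfigurable interconnection switch 104 define the data path of the RSVP. The functional units and the reconfigurable interconnection switch 104 are linked to a micro-sequencer 118 that includes a memory 120, preferably a cache, for storing a program of instructions that describes the implementation-specific form of the data-flow graph of the desired vector computation. In each cycle of the processor clock, the micro-sequencer produces control words that configure the links in the interconnection switch and drive the functional units. Storage and presentation of scalar values, and tunnel node functionality, are provided by a constant unit 120. The scalar values and tunnel initialization values may be loaded by the host processor or by the program of instructions.
In operation, input data values are provided to the interconnection switch 104 by one or more input stream units 122 (only one unit is shown in the figure). Each input stream unit 122 is controlled by a set of parameters describing the allocation of data in memory. This set of parameters is provided by the host processor, to which the input stream unit is connected via the external interface 116. Each output stream unit 124 (only one unit is shown in the figure) is similarly controlled by the host processor and is operable to transfer data from the reconfigurable interconnection switch 104 to external memory. The input stream unit 122 and the output stream unit 124 are both linked to the micro-sequencer 118, which synchronizes the flow of data.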
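The multiplier example above (128 bits in, 128 bits out, split into independent lanes) can be illustrated with a short C sketch. This is only a software model under stated assumptions: the function names `mul4x16` and `mul2x32`, the packing of the two operands into two 64-bit words, and the lane order (lane 0 in the low bits) are hypothetical and are not taken from the patent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four independent signed 16x16 -> 32-bit multiplies.
 * a and b each carry four 16-bit lanes (64 + 64 = 128 input bits);
 * out receives four 32-bit products (128 output bits).
 * Lane 0 sits in the low 16 bits (an assumption of this sketch). */
static void mul4x16(uint64_t a, uint64_t b, int32_t out[4])
{
    assert(out != NULL);
    for (int lane = 0; lane < 4; ++lane) {
        /* Reinterpret each 16-bit field as a two's-complement value. */
        int16_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * lane));
        out[lane] = (int32_t)x * (int32_t)y;
    }
}

/* Two independent signed 32x32 -> 64-bit multiplies on the same operand
 * words: again 128 bits in and 128 bits out. */
static void mul2x32(uint64_t a, uint64_t b, int64_t out[2])
{
    assert(out != NULL);
    for (int lane = 0; lane < 2; ++lane) {
        int32_t x = (int32_t)(uint32_t)(a >> (32 * lane));
        int32_t y = (int32_t)(uint32_t)(b >> (32 * lane));
        out[lane] = (int64_t)x * (int64_t)y;
    }
}
```

Both modes consume the same two operand words, which is one way to read the statement that any combination is allowed as long as the totals stay within 128 bits in and 128 bits out.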
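The architecture of the exemplary RSVP described above facilitates a separation of the operations of vector access and vector computation.
Vector access, performed by the input and output stream units, consists of describing the location, shape, and type of each input and output vector that is part of the vector operation. In the preferred embodiment, these properties are described by two or more of the following parameters:
1. Vector address: the starting address in memory of the next vector element.
2. Stride: the signed increment from one element to the next.
3. Span: the number of strides taken before a skip.
4. Skip: the signed increment applied after a span of elements has been counted.
5. Size: the size of each data element (for example, 1, 2, or 4 bytes).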
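As an illustration of how these five parameters could determine a sequence of element addresses, here is a minimal C sketch. The text does not fix the units of stride and skip or the exact moment at which the skip is applied, so the sketch assumes that both are counted in elements and that the skip replaces the stride once a span of elements has been read. The names `stream_desc` and `stream_addresses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor mirroring the five parameters listed above. */
typedef struct {
    uintptr_t address; /* address of the next vector element            */
    ptrdiff_t stride;  /* signed step between elements (in elements)    */
    size_t    span;    /* number of elements read before a skip         */
    ptrdiff_t skip;    /* signed step applied after a span (in elements) */
    size_t    size;    /* element size in bytes: 1, 2 or 4              */
} stream_desc;

/* Write the first `count` element addresses described by `d` into `out`. */
static void stream_addresses(const stream_desc *d, size_t count, uintptr_t *out)
{
    assert(d != NULL && out != NULL);
    assert(d->span > 0);
    assert(d->size == 1 || d->size == 2 || d->size == 4);

    uintptr_t addr = d->address;
    size_t taken = 0;                 /* elements read in the current span */
    for (size_t i = 0; i < count; ++i) {
        out[i] = addr;
        ptrdiff_t step = d->stride;
        if (++taken == d->span) {     /* span complete: skip instead of stride */
            step = d->skip;
            taken = 0;
        }
        /* Unsigned wrap-around makes negative steps move backwards. */
        addr += (uintptr_t)(step * (ptrdiff_t)d->size);
    }
}
```

For example, under these assumptions a 4-element-wide window in a row-major array of 16-bit values with 8 elements per row would use stride 1, span 4, skip 5, and size 2: four consecutive elements are read, then the address jumps to the start of the window in the next row.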
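In addition to the input and output vectors, the vector processor may include a number of scalar and accumulator registers whose initial values may be specified by the programmer. These registers are used within the vector computation. In the case of the accumulators, the register values are allowed to change during the computation and may be accessed for later use. The vector access portion of the programming model is described in the programming language used for the host processor and is executed on the host.
Vector computation consists of the partially ordered set of operators to be applied to the elements of the input vectors, the scalar registers, and the accumulators in order to derive each output vector element. In the programming model of the present invention, this is done via a linearized representation of the data-flow graph of the computation. In the data-flow graph, each node is denoted by a node descriptor that specifies the operation to be performed by the node and the nodes from which its input data are obtained. There is no explicit naming of registers to pass data between operators, as there is in other CPUs. An example computation has the following C language description: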
    void quant(short *out, short *in, int n, short qp)
    {
        long rq, b, c;

        rq = ((1 << 16) + qp) / (qp << 1);   /* about 2^16 / (2*qp) */
        b = qp - !(qp & 1);                  /* qp if odd, qp - 1 if even */
        while (--n > 0) {
            c = *in++;
            if (c < 0) c += b;
            else if (c > 0) c -= b;
            *out++ = (c * rq) / (1 << 16);
        }
    }
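A corresponding data-flow graph is shown in FIG. 2. Referring to FIG. 2, the vector v1 is loaded at block 202. The sign of the vector is taken at node 204. The scalar values s2 and s1 are loaded at blocks 206 and 208, respectively. The immediate shift value 16 is loaded at block 210. At node 212 the sign of the vector (the output of node 204) is multiplied by the scalar s2. The result of this multiplication is subtracted from v1 at node 214. The result of the subtraction is multiplied by the scalar s1 at node 216 and then right-shifted by 16 at node 218. The final vector result is stored as v0 at block 220.
A linearized form of the data-flow graph is given by: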
Q1:vld.s16 (v1) //c=*in++;
Q2:vsign.s16 Q1
Q3:vscalar s2 //s2 is b
Q4:vscalar s1 //s1 is rq
Q5:vimm 16
Q6:vmul.s16 Q2,Q3 //if(c<0)c+=b;
Q7:vsub.s16 Q1,Q6 //else if(c>0)c-=b;
Q8:vmul.s32 Q7,Q4 //c*=rq;
Q9:vasr0.s16 Q8,Q5 //*out++=c/(1<<16);
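This exemplary linearized flow form uses the following functional operations:
vld.s16: load the next data element from a vector of 16-bit data values
vsign.s16: compute the sign of a data value
vscalar: load a scalar value
vmul.s16: multiply two 16-bit data values
vmul.s32: multiply two 32-bit data values
vsub.s16: subtract two 16-bit data values
vasr0.s16: arithmetic shift right of a 16-bit data value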
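To make the node-descriptor idea concrete, the following C sketch stores the nine nodes Q1 to Q9 in a table and evaluates them for a single input element. It is a software illustration only, not the RSVP binary format. The types `op_t` and `node_t`, the function `eval_element`, and the table layout are hypothetical. Two semantic details are assumptions: `vsign` is taken to return -1, 0, or +1, and `vasr0` is taken to round toward zero so that it matches the C division in `quant`.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_VLD, OP_VSIGN, OP_VSCALAR, OP_VIMM,
               OP_VMUL, OP_VSUB, OP_VASR0 } op_t;

/* A node names its operation and its producer nodes; no registers are named. */
typedef struct {
    op_t    op;
    int     a, b;  /* zero-based producer indices, -1 when unused          */
    int64_t imm;   /* immediate (OP_VIMM) or scalar register (OP_VSCALAR)  */
} node_t;

/* Q1..Q9 of the linearized form above, stored at indices 0..8. */
static const node_t quant_graph[9] = {
    { OP_VLD,     -1, -1,  0 },  /* Q1: c = *in++          */
    { OP_VSIGN,    0, -1,  0 },  /* Q2: sign(Q1)           */
    { OP_VSCALAR, -1, -1,  2 },  /* Q3: s2 (b)             */
    { OP_VSCALAR, -1, -1,  1 },  /* Q4: s1 (rq)            */
    { OP_VIMM,    -1, -1, 16 },  /* Q5: immediate 16       */
    { OP_VMUL,     1,  2,  0 },  /* Q6: Q2 * Q3            */
    { OP_VSUB,     0,  5,  0 },  /* Q7: Q1 - Q6            */
    { OP_VMUL,     6,  3,  0 },  /* Q8: Q7 * Q4            */
    { OP_VASR0,    7,  4,  0 },  /* Q9: Q8 shifted right by Q5 */
};

/* Evaluate the graph for one input element; scalars[k] holds register sk. */
static int64_t eval_element(const node_t *g, int n, int64_t input,
                            const int64_t *scalars)
{
    int64_t v[256];
    assert(n > 0 && n <= 256);
    for (int i = 0; i < n; ++i) {
        assert(g[i].a < i && g[i].b < i);  /* producers precede consumers */
        int64_t x = g[i].a >= 0 ? v[g[i].a] : 0;
        int64_t y = g[i].b >= 0 ? v[g[i].b] : 0;
        switch (g[i].op) {
        case OP_VLD:     v[i] = input;                 break;
        case OP_VSIGN:   v[i] = (x > 0) - (x < 0);     break;
        case OP_VSCALAR: v[i] = scalars[g[i].imm];     break;
        case OP_VIMM:    v[i] = g[i].imm;              break;
        case OP_VMUL:    v[i] = x * y;                 break;
        case OP_VSUB:    v[i] = x - y;                 break;
        case OP_VASR0:   assert(y >= 0 && y < 62);
                         v[i] = x / ((int64_t)1 << y); break;
        default:         assert(0); v[i] = 0;          break;
        }
    }
    return v[n - 1];
}
```

With qp = 8, the C function gives rq = 4096 and b = 7, so calling `eval_element(quant_graph, 9, c, scalars)` with `scalars = {0, 4096, 7}` reproduces the loop body of `quant` for one element c.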
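The functional units preferably perform a variety of other functional operations, including vector element addition (vadd) and accumulation (vadda). Prior to execution, the linearized graph is scheduled onto the RSVP data path. The data path is reconfigurable in each clock cycle. The functional units may be aggregated, that is, subsets of them can be combined to form larger functional units. The interconnection of the functional units allows arbitrary pipelining of the functional units.
The RSVP promotes the goals of high performance and quick time to market through its programming model. Since it is a coprocessor, the RSVP preferably uses a single-core programming model. Dual-core solutions, such as those used in general-purpose CPU/DSP combinations, tend to be much more difficult to program. This is because, with these types of solutions, the programmer must employ two different sets of programming tools and must deal explicitly with synchronization between the CPU and the DSP.
Within the programming model of the present invention, the vector access description is separate from the vector computation description. The programmer therefore does not have to deal with the intermingling of the two concerns. Because vector access is described in terms of only five parameters, the programmer can avoid dealing with data alignment and padding, memory bus width, or memory latency. The underlying vector access hardware handles these issues. In this way, the vector access description remains constant regardless of the implementation of the memory subsystem or the vector access hardware. This not only simplifies the programmer's task but also promotes binary code compatibility, because the RSVP binary code does not have to be altered to reflect changes in those implementations.
The vector computation description, being in the form of a data-flow graph, contains no information specific to the implementation of the RSVP data path. Apart from the use of accumulators and vector stream units (VSUs), the RSVP is free of resource dependencies. In particular, it has no explicit naming of registers to pass data between operators. This removes a burden from the scheduler and makes it easier for the scheduler to reach an optimal schedule. As a result, the data path can change from a scalar data path to one that is superscalar, VLIW-like, or SIMD-like, transparently to the programmer and without any change to the RSVP binary code.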
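The separation of the vector access description and the vector computation description in the programming model is illustrated in FIG. 3 and FIG. 4. FIG. 3 shows a flow chart of a method for generating code for the RSVP. Following start block 302, a data-flow graph of the computation is specified at block 304. At block 306, a linear graph form of the computation is generated from the data-flow graph. The linear graph form may be generated manually or automatically by a computer program. In one embodiment, the computer program provides a graphical user interface that facilitates entry of the data-flow graph. At block 308, the linear graph form generated at block 306 is supplied to a scheduler. The scheduler is a computer program that orders the functional operations for efficient use of the RSVP resources. Once scheduling is complete, binary code for the RSVP is generated at block 310, and the process terminates at block 312. Note that this process does not take account of vector access issues such as data alignment and padding, memory bus width, or memory latency. These issues are handled by the hardware.
Data access is specified in the host processor. FIG. 4 shows the programming process for the host processor. Referring to FIG. 4, following start block 402, the structure of the data in memory is specified at block 404. At block 406, the associated data access parameters (starting address, stride, span, skip, and size) are specified. In operation, these parameters are passed to the input stream units of the RSVP. At block 408, the remainder of the host processor code is generated, and the process terminates at block 410. The host process thus specifies vector access but is independent of vector computation.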
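The scheduling step at block 308 is not detailed in this document (a related application covers the scheduler). Purely as an illustration of why a linear graph without named registers is convenient to schedule, the sketch below performs a simple as-soon-as-possible (ASAP) assignment: each node is placed one cycle after its latest producer. It assumes unit latency and unlimited functional units, and the names `deps_t` and `asap_schedule` are hypothetical. It is not the scheduler of the patent.

```c
#include <assert.h>

/* Producer indices for one node of a linear graph; -1 means "no input". */
typedef struct { int a, b; } deps_t;

/* Assign each node the earliest cycle at which all of its inputs exist,
 * assuming every operation takes one cycle and resources are unlimited.
 * Returns the number of cycles needed for one iteration of the graph. */
static int asap_schedule(const deps_t *g, int n, int *cycle)
{
    int depth = 0;
    assert(g != 0 && cycle != 0 && n > 0);
    for (int i = 0; i < n; ++i) {
        /* In a linear graph, producers always appear before consumers. */
        assert(g[i].a < i && g[i].b < i);
        int c = 0;
        if (g[i].a >= 0 && cycle[g[i].a] + 1 > c) c = cycle[g[i].a] + 1;
        if (g[i].b >= 0 && cycle[g[i].b] + 1 > c) c = cycle[g[i].b] + 1;
        cycle[i] = c;
        if (c + 1 > depth) depth = c + 1;
    }
    return depth;
}
```

Applied to the dependencies of Q1 to Q9 in the quantization example, the loads and constants land in cycle 0 and the chain Q2, Q6, Q7, Q8, Q9 occupies cycles 1 to 5. A real scheduler would also have to respect the number of available functional units and their latencies.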
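The RSVP hardware takes advantage of several aspects of the programming model in order to improve performance. Because vector access and computation are separated, the hardware for each operates asynchronously with respect to the other. As a result, the vector access hardware can run ahead of the computation, fetching data before it is needed and thereby hiding at least a portion of the memory latency.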
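The effect of letting the access hardware run ahead can be modelled in software with a small FIFO between a fetch step and a compute step. The sketch below is an assumption-laden illustration: the depth of 8, the 16-bit element type, and the names `fifo_t`, `fifo_push`, `fifo_pop`, and `prefetch` are hypothetical, and real stream units would overlap with computation in time rather than being called in turn.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8  /* hypothetical buffer depth of one input stream unit */

typedef struct {
    int16_t slot[FIFO_DEPTH];
    size_t  head, tail, count;
} fifo_t;

/* Returns 1 on success, 0 if the buffer is full. */
static int fifo_push(fifo_t *f, int16_t v)
{
    assert(f != NULL);
    if (f->count == FIFO_DEPTH) return 0;
    f->slot[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Returns 1 on success, 0 if the buffer is empty (compute would stall). */
static int fifo_pop(fifo_t *f, int16_t *v)
{
    assert(f != NULL && v != NULL);
    if (f->count == 0) return 0;
    *v = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Access side: fetch as far ahead as the buffer allows.
 * `next` is the index of the next element to fetch from `mem`.
 * Returns the number of elements fetched in this call. */
static size_t prefetch(fifo_t *f, const int16_t *mem, size_t n, size_t *next)
{
    size_t fetched = 0;
    assert(mem != NULL && next != NULL);
    while (*next < n && fifo_push(f, mem[*next])) {
        ++*next;
        ++fetched;
    }
    return fetched;
}
```

In this model the compute side only ever calls `fifo_pop`; as long as `prefetch` keeps the buffer non-empty, the consumer never waits on memory, which is the latency-hiding behaviour described above.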
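The vector access description is succinct enough that all of its information is held in a small number of registers in the access hardware. These registers are accessible to the host processor. Similarly, RSVP data-flow graphs are architecturally limited to a fixed number of nodes (256, for example), so that the RSVP micro-sequencer memory is large enough to hold the entire linearized form of the data-flow graph. The advantage of this approach is that the hardware never needs to fetch instructions to determine how to perform the address calculations or the vector computation. This removes instruction fetch from the memory bandwidth required to service the RSVP computation.
Since the vector computation is specified as a data-flow graph containing few resource dependencies, the RSVP data path is unlike the data path of other CPUs. Most DSP, SIMD, VLIW, and vector processor devices are unable to connect the functional units of their data paths in an arbitrary order. None have functional units that may be aggregated.
Those of ordinary skill in the art will recognize that the present invention has been described in terms of exemplary embodiments based on the use of a particular architecture. However, the invention should not be so limited, since it could be implemented using equivalent architectures. Those skilled in the art will further understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.
While the invention has been described in conjunction with specific embodiments, it is evident that many alternatives, modifications, permutations, and variations will become apparent to those of ordinary skill in the art in light of the foregoing description. Accordingly, it is intended that the present invention embrace all such alternatives, modifications, permutations, and variations as fall within the scope of the appended claims.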
Claims (25)
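1. A reconfigurable streaming vector processor comprising:
a plurality of functional units, each having one or more inputs for receiving a data value and an output for providing a data value;
a reconfigurable interconnection switch comprising one or more links, each link operable to couple an output of a functional unit to an input of the one or more inputs of a functional unit; and
a micro-sequencer coupled to the reconfigurable interconnection switch and operable to control the reconfigurable interconnection switch.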
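2. A reconfigurable streaming vector processor in accordance with claim 1, wherein the micro-sequencer includes a program memory for storing a program of instructions.
3. A reconfigurable streaming vector processor in accordance with claim 1, wherein the reconfigurable interconnection switch includes a switch memory for storing data values.
4. A reconfigurable streaming vector processor in accordance with claim 3, wherein the switch memory comprises at least one of a FIFO, a programmed delay, and a pipelined register file.
5. A reconfigurable streaming vector processor in accordance with claim 1, wherein a link of the reconfigurable interconnection switch is directed by the micro-sequencer to receive a data value from an output of a functional unit and to provide a data value to an input of the one or more inputs of a functional unit.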
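6. A reconfigurable streaming vector processor in accordance with claim 1, further comprising:
one or more input stream units coupled to the reconfigurable interconnection switch and operable to retrieve input data values from a data memory and to provide data values to the reconfigurable interconnection switch; and
one or more output stream units coupled to the reconfigurable interconnection switch and operable to receive data values from the reconfigurable interconnection switch and to provide output data values to a data memory.
7. A reconfigurable streaming vector processor in accordance with claim 6, wherein the input and output stream units include an interface for receiving control instructions from a host computer.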
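8. A reconfigurable streaming vector processor in accordance with claim 7, wherein the control instructions comprise at least one of:
a starting address of a vector of data values in the data memory;
a span of the vector of data values;
a stride between data values;
a number of memory addresses to skip between spans of vector data values;
and
a size of each data value in the vector of data values.
9. A reconfigurable streaming vector processor in accordance with claim 6, further comprising an external interface operable to couple the input stream units, the output stream units, and the micro-sequencer to a host computer.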
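10. A reconfigurable streaming vector processor in accordance with claim 1, wherein the functional units comprise at least one of:
a shifter;
an adder;
a logic unit; and
a multiplier.
11. A reconfigurable streaming vector processor in accordance with claim 10, wherein the functional units further comprise a passthrough functional unit.
12. A reconfigurable streaming vector processor in accordance with claim 1, wherein the output of at least one of the plurality of functional units comprises a pipeline of registers.
13. A reconfigurable streaming vector processor in accordance with claim 1, further comprising at least one accumulator coupled to the reconfigurable interconnection switch.
14. A reconfigurable streaming vector processor in accordance with claim 13, wherein the at least one accumulator is operable to be coupled to a host computer.
15. A reconfigurable streaming vector processor in accordance with claim 1, further comprising a plurality of scalar registers.
16. A reconfigurable streaming vector processor in accordance with claim 15, wherein the plurality of scalar registers provide a data tunnel.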
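17. A method for configuring a streaming vector processor comprising an interconnection switch, a micro-sequencer, and a plurality of functional units, the method comprising:
storing a program of instructions in the micro-sequencer;
retrieving an instruction of the program of instructions;
configuring the interconnection switch in accordance with the instruction retrieved from the program of instructions;
providing data stored in a first memory to a functional unit in accordance with the instruction received from the program of instructions;
the functional unit operating on the data; and
storing data from the functional unit in a second memory in accordance with the instruction received from the program of instructions.
18. A method in accordance with claim 17, wherein the streaming vector processor further comprises one or more input stream units having a buffer memory, and wherein the first memory is one or more of the buffer memory of an input stream unit and a memory in the interconnection switch.
19. A method in accordance with claim 18, further comprising each input stream unit retrieving data values from an external memory in accordance with a set of parameters received from a host computer and storing them in the buffer memory of the input stream unit.
20. A method in accordance with claim 17, wherein the streaming vector processor further comprises one or more output stream units having a buffer memory, and wherein the second memory is one or more of the buffer memory of an output stream unit and a memory in the interconnection switch.
21. A method in accordance with claim 20, further comprising each output stream unit writing data values from the buffer memory of the output stream unit to an external memory in accordance with a set of parameters received from a host computer.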
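22. A method for programming a streaming vector processor to perform an iterative computation, the streaming vector processor having a reconfigurable data path, the method comprising:
specifying a data-flow graph of an iteration of the iterative computation;
generating from the data-flow graph a linear graph that specifies a partially ordered set of operations corresponding to the data-flow graph;
scheduling the linear graph onto the data path of the streaming vector processor; and
generating binary code instructions operable to configure the data path of the streaming vector processor.
23. A method in accordance with claim 22, wherein the streaming vector processor includes a micro-sequencer having a memory, and the method further comprises storing the binary code instructions in the memory of the micro-sequencer.
24. A method in accordance with claim 23, wherein the scheduling of the linear graph and the generating of binary code instructions are both performed by a computer.
25. A method in accordance with claim 22, wherein generating the linear graph further comprises specifying the data-flow graph to a computer using a graphical user interface, and the computer automatically generating the linear graph from the data-flow graph.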
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/184,583 US7159099B2 (en) | 2002-06-28 | 2002-06-28 | Streaming vector processor with reconfigurable interconnection switch |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1666187A (zh) | 2005-09-07 |
Family
ID=29779404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN03815336XA Pending CN1666187A (zh) | 2002-06-28 | 2003-05-20 | Reconfigurable streaming vector processor |
Country Status (7)
Country | Link |
---|---|
US (2) | US7159099B2 (zh) |
EP (1) | EP1535171A4 (zh) |
JP (1) | JP2005531848A (zh) |
CN (1) | CN1666187A (zh) |
AU (1) | AU2003228247A1 (zh) |
TW (1) | TWI234738B (zh) |
WO (1) | WO2004003767A1 (zh) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010057375A1 (zh) * | 2008-11-19 | 2010-05-27 | 北京大学深圳研究生院 | 一种可配置处理器体系结构和控制方法 |
CN102122275A (zh) * | 2010-01-08 | 2011-07-13 | 上海芯豪微电子有限公司 | 一种可配置处理器 |
CN103530088A (zh) * | 2012-06-29 | 2014-01-22 | 美国亚德诺半导体公司 | 分级循环指令 |
CN105027109A (zh) * | 2013-03-13 | 2015-11-04 | 高通股份有限公司 | 具有用于提供多模向量处理的可编程数据路径配置的向量处理引擎、以及相关向量处理器、系统和方法 |
CN105723332A (zh) * | 2013-11-15 | 2016-06-29 | 高通股份有限公司 | 在执行单元与向量数据存储器之间的数据流路径中采用解扩展电路系统的向量处理引擎以及相关的方法 |
CN114579083A (zh) * | 2022-05-09 | 2022-06-03 | 上海擎昆信息科技有限公司 | 一种基于矢量处理器的数据处理装置和方法 |
Families Citing this family (80)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7415601B2 (en) * | 2002-06-28 | 2008-08-19 | Motorola, Inc. | Method and apparatus for elimination of prolog and epilog instructions in a vector processor using data validity tags and sink counters |
US7159099B2 (en) * | 2002-06-28 | 2007-01-02 | Motorola, Inc. | Streaming vector processor with reconfigurable interconnection switch |
US6961888B2 (en) * | 2002-08-20 | 2005-11-01 | Flarion Technologies, Inc. | Methods and apparatus for encoding LDPC codes |
US7290122B2 (en) * | 2003-08-29 | 2007-10-30 | Motorola, Inc. | Dataflow graph compression for power reduction in a vector processor |
US7610466B2 (en) * | 2003-09-05 | 2009-10-27 | Freescale Semiconductor, Inc. | Data processing system using independent memory and register operand size specifiers and method thereof |
US7275148B2 (en) * | 2003-09-08 | 2007-09-25 | Freescale Semiconductor, Inc. | Data processing system using multiple addressing modes for SIMD operations and method thereof |
US7315932B2 (en) * | 2003-09-08 | 2008-01-01 | Moyer William C | Data processing system having instruction specifiers for SIMD register operands and method thereof |
US7107436B2 (en) * | 2003-09-08 | 2006-09-12 | Freescale Semiconductor, Inc. | Conditional next portion transferring of data stream to or from register based on subsequent instruction aspect |
US8484441B2 (en) * | 2004-03-31 | 2013-07-09 | Icera Inc. | Apparatus and method for separate asymmetric control processing and data path processing in a configurable dual path processor that supports instructions having different bit widths |
US7949856B2 (en) * | 2004-03-31 | 2011-05-24 | Icera Inc. | Method and apparatus for separate control processing and data path processing in a dual path processor with a shared load/store unit |
US9047094B2 (en) * | 2004-03-31 | 2015-06-02 | Icera Inc. | Apparatus and method for separate asymmetric control processing and data path processing in a dual path processor |
JP2006018412A (ja) * | 2004-06-30 | 2006-01-19 | Fujitsu Ltd | Address generator and arithmetic circuit |
JP2006236106A (ja) * | 2005-02-25 | 2006-09-07 | Canon Inc | Data processing apparatus and data processing method |
US7305649B2 (en) * | 2005-04-20 | 2007-12-04 | Motorola, Inc. | Automatic generation of a streaming processor circuit |
US20070150697A1 (en) * | 2005-05-10 | 2007-06-28 | Telairity Semiconductor, Inc. | Vector processor with multi-pipe vector block matching |
US20060265485A1 (en) * | 2005-05-17 | 2006-11-23 | Chai Sek M | Method and apparatus for controlling data transfer in a processing system |
US7415595B2 (en) * | 2005-05-24 | 2008-08-19 | Coresonic Ab | Data processing without processor core intervention by chain of accelerators selectively coupled by programmable interconnect network and to memory |
EP1732306B1 (en) * | 2005-06-10 | 2012-11-21 | Agfa Graphics N.V. | Image processing method for reducing image artefacts |
US7603492B2 (en) * | 2005-09-20 | 2009-10-13 | Motorola, Inc. | Automatic generation of streaming data interface circuit |
US7502909B2 (en) * | 2005-10-11 | 2009-03-10 | Motorola, Inc. | Memory address generation with non-harmonic indexing |
GB0522957D0 (en) * | 2005-11-11 | 2005-12-21 | Portabottle Ltd | Carrier |
US7856628B2 (en) * | 2006-01-23 | 2010-12-21 | International Business Machines Corporation | Method for simplifying compiler-generated software code |
US7872657B1 (en) * | 2006-06-16 | 2011-01-18 | Nvidia Corporation | Memory addressing scheme using partition strides |
US20080120497A1 (en) * | 2006-11-20 | 2008-05-22 | Motorola, Inc. | Automated configuration of a processing system using decoupled memory access and computation |
US7634633B2 (en) * | 2006-11-30 | 2009-12-15 | Motorola, Inc. | Method and apparatus for memory address generation using dynamic stream descriptors |
US8204075B2 (en) * | 2007-01-31 | 2012-06-19 | Broadcom Corporation | Inter-device wireless communication for intra-device communications |
US20080320293A1 (en) * | 2007-01-31 | 2008-12-25 | Broadcom Corporation | Configurable processing core |
US8289944B2 (en) * | 2007-01-31 | 2012-10-16 | Broadcom Corporation | Apparatus for configuration of wireless operation |
US20090017910A1 (en) * | 2007-06-22 | 2009-01-15 | Broadcom Corporation | Position and motion tracking of an object |
US8438322B2 (en) * | 2007-01-31 | 2013-05-07 | Broadcom Corporation | Processing module with millimeter wave transceiver interconnection |
US20090011832A1 (en) * | 2007-01-31 | 2009-01-08 | Broadcom Corporation | Mobile communication device with game application for display on a remote monitor and methods for use therewith |
US9486703B2 (en) | 2007-01-31 | 2016-11-08 | Broadcom Corporation | Mobile communication device with game application for use in conjunction with a remote mobile communication device and methods for use therewith |
US8239650B2 (en) * | 2007-01-31 | 2012-08-07 | Broadcom Corporation | Wirelessly configurable memory device addressing |
US8223736B2 (en) * | 2007-01-31 | 2012-07-17 | Broadcom Corporation | Apparatus for managing frequency use |
US8238275B2 (en) * | 2007-01-31 | 2012-08-07 | Broadcom Corporation | IC with MMW transceiver communications |
US8116294B2 (en) * | 2007-01-31 | 2012-02-14 | Broadcom Corporation | RF bus controller |
US8280303B2 (en) * | 2007-01-31 | 2012-10-02 | Broadcom Corporation | Distributed digital signal processor |
US20090197641A1 (en) * | 2008-02-06 | 2009-08-06 | Broadcom Corporation | Computing device with handheld and extended computing units |
US8125950B2 (en) * | 2007-01-31 | 2012-02-28 | Broadcom Corporation | Apparatus for wirelessly managing resources |
US8200156B2 (en) * | 2007-01-31 | 2012-06-12 | Broadcom Corporation | Apparatus for allocation of wireless resources |
US8254319B2 (en) * | 2007-01-31 | 2012-08-28 | Broadcom Corporation | Wireless programmable logic device |
US8121541B2 (en) * | 2007-01-31 | 2012-02-21 | Broadcom Corporation | Integrated circuit with intra-chip and extra-chip RF communication |
US7694193B2 (en) * | 2007-03-13 | 2010-04-06 | Hewlett-Packard Development Company, L.P. | Systems and methods for implementing a stride value for accessing memory |
US7802005B2 (en) * | 2007-03-30 | 2010-09-21 | Motorola, Inc. | Method and apparatus for configuring buffers for streaming data transfer |
US8175646B2 (en) * | 2008-02-06 | 2012-05-08 | Broadcom Corporation | Networking of multiple mode handheld computing unit |
US8064952B2 (en) * | 2008-02-06 | 2011-11-22 | Broadcom Corporation | A/V control for a computing device with handheld and extended computing units |
US8195928B2 (en) * | 2008-02-06 | 2012-06-05 | Broadcom Corporation | Handheld computing unit with merged mode |
US8717974B2 (en) * | 2008-02-06 | 2014-05-06 | Broadcom Corporation | Handheld computing unit coordination of femtocell AP functions |
US8117370B2 (en) * | 2008-02-06 | 2012-02-14 | Broadcom Corporation | IC for handheld computing unit of a computing device |
US20090198798A1 (en) * | 2008-02-06 | 2009-08-06 | Broadcom Corporation | Handheld computing unit back-up system |
KR100976628B1 (ko) * | 2008-05-09 | 2010-08-18 | Electronics and Telecommunications Research Institute | Multiprocessor system and multiprocessing method in the system |
US8430750B2 (en) * | 2008-05-22 | 2013-04-30 | Broadcom Corporation | Video gaming device with image identification |
US7945768B2 (en) | 2008-06-05 | 2011-05-17 | Motorola Mobility, Inc. | Method and apparatus for nested instruction looping using implicit predicates |
US8793472B2 (en) * | 2008-08-15 | 2014-07-29 | Apple Inc. | Vector index instruction for generating a result vector with incremental values based on a start value and an increment value |
US10318291B2 (en) | 2011-11-30 | 2019-06-11 | Intel Corporation | Providing vector horizontal compare functionality within a vector register |
CN103959237B (zh) | 2011-11-30 | 2016-09-28 | Intel Corporation | Instruction and logic to provide vector horizontal compare functionality |
US9747363B1 (en) * | 2012-03-01 | 2017-08-29 | Attivio, Inc. | Efficient storage and retrieval of sparse arrays of identifier-value pairs |
US9116686B2 (en) | 2012-04-02 | 2015-08-25 | Apple Inc. | Selective suppression of branch prediction in vector partitioning loops until dependency vector is available for predicate generating instruction |
US9465620B2 (en) * | 2012-12-20 | 2016-10-11 | Intel Corporation | Scalable compute fabric |
US9275014B2 (en) | 2013-03-13 | 2016-03-01 | Qualcomm Incorporated | Vector processing engines having programmable data path configurations for providing multi-mode radix-2x butterfly vector processing circuits, and related vector processors, systems, and methods |
US9043510B2 (en) | 2013-08-06 | 2015-05-26 | Oracle International Corporation | Hardware streaming unit |
US9396113B2 (en) * | 2013-08-06 | 2016-07-19 | Oracle International Corporation | Flexible configuration hardware streaming unit |
US20150052330A1 (en) * | 2013-08-14 | 2015-02-19 | Qualcomm Incorporated | Vector arithmetic reduction |
US9390058B2 (en) | 2013-09-24 | 2016-07-12 | Apple Inc. | Dynamic attribute inference |
US9367309B2 (en) * | 2013-09-24 | 2016-06-14 | Apple Inc. | Predicate attribute tracker |
CN103544131B (zh) * | 2013-10-12 | 2017-06-16 | Shenzhen Oukelan Technology Co., Ltd. | Dynamically configurable vector processing device |
US10353708B2 (en) * | 2016-09-23 | 2019-07-16 | Advanced Micro Devices, Inc. | Strided loading of non-sequential memory locations by skipping memory locations between consecutive loads |
US10216515B2 (en) | 2016-10-18 | 2019-02-26 | Oracle International Corporation | Processor load using a bit vector to calculate effective address |
US11853244B2 (en) * | 2017-01-26 | 2023-12-26 | Wisconsin Alumni Research Foundation | Reconfigurable computer accelerator providing stream processor and dataflow processor |
US10956360B2 (en) * | 2017-03-14 | 2021-03-23 | Azurengine Technologies Zhuhai Inc. | Static shared memory access with one piece of input data to be reused for successive execution of one instruction in a reconfigurable parallel processor |
US10318306B1 (en) * | 2017-05-03 | 2019-06-11 | Ambarella, Inc. | Multidimensional vectors in a coprocessor |
US10331445B2 (en) * | 2017-05-24 | 2019-06-25 | Microsoft Technology Licensing, Llc | Multifunction vector processor circuits |
CN109032668B (zh) | 2017-06-09 | 2023-09-19 | Advanced Micro Devices, Inc. | Stream processor with high bandwidth and low power vector register file |
US11614941B2 (en) * | 2018-03-30 | 2023-03-28 | Qualcomm Incorporated | System and method for decoupling operations to accelerate processing of loop structures |
US11663001B2 (en) * | 2018-11-19 | 2023-05-30 | Advanced Micro Devices, Inc. | Family of lossy sparse load SIMD instructions |
KR20220139304A (ko) * | 2019-12-30 | 2022-10-14 | Star Ally International Limited | Processor for configurable parallel computations |
EP3937009A1 (en) * | 2020-07-09 | 2022-01-12 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | A computing unit, method to perform computations, method to generate program code for a hardware comprising a multi-stage-buffer and program code |
US20220414049A1 (en) * | 2021-06-28 | 2022-12-29 | Silicon Laboratories Inc. | Apparatus for Array Processor and Associated Methods |
US20220414050A1 (en) * | 2021-06-28 | 2022-12-29 | Silicon Laboratories Inc. | Apparatus for Memory Configuration for Array Processor and Associated Methods |
US20220413850A1 (en) * | 2021-06-28 | 2022-12-29 | Silicon Laboratories Inc. | Apparatus for Processor with Macro-Instruction and Associated Methods |
Family Cites Families (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3718912A (en) | 1970-12-22 | 1973-02-27 | Ibm | Instruction execution unit |
US4128880A (en) | 1976-06-30 | 1978-12-05 | Cray Research, Inc. | Computer vector register processing |
JPS5975365A (ja) | 1982-10-22 | 1984-04-28 | Hitachi Ltd | Vector processing device |
JPS60134974A (ja) | 1983-12-23 | 1985-07-18 | Hitachi Ltd | Vector processing device |
JPS6145354A (ja) | 1984-08-10 | 1986-03-05 | Nec Corp | Microprocessor |
US4744043A (en) * | 1985-03-25 | 1988-05-10 | Motorola, Inc. | Data processor execution unit which receives data with reduced instruction overhead |
US4807183A (en) | 1985-09-27 | 1989-02-21 | Carnegie-Mellon University | Programmable interconnection chip for computer system functional modules |
US5481740A (en) | 1986-04-14 | 1996-01-02 | National Instruments Corporation | Method and apparatus for providing autoprobe features in a graphical data flow diagram |
US5821934A (en) | 1986-04-14 | 1998-10-13 | National Instruments Corporation | Method and apparatus for providing stricter data type capabilities in a graphical data flow diagram |
US5734863A (en) | 1986-04-14 | 1998-03-31 | National Instruments Corporation | Method and apparatus for providing improved type compatibility and data structure organization in a graphical data flow diagram |
US4862407A (en) * | 1987-10-05 | 1989-08-29 | Motorola, Inc. | Digital signal processing apparatus |
US4918600A (en) | 1988-08-01 | 1990-04-17 | Board Of Regents, University Of Texas System | Dynamic address mapping for conflict-free vector access |
US5317734A (en) | 1989-08-29 | 1994-05-31 | North American Philips Corporation | Method of synchronizing parallel processors employing channels and compiling method minimizing cross-processor data dependencies |
JP2718254B2 (ja) | 1990-10-02 | 1998-02-25 | 日本電気株式会社 | Vector processing device |
US5966528A (en) | 1990-11-13 | 1999-10-12 | International Business Machines Corporation | SIMD/MIMD array processor with vector processing |
JP3532932B2 (ja) | 1991-05-20 | 2004-05-31 | モトローラ・インコーポレイテッド | Randomly accessible memory having time-overlapped memory access |
US5423040A (en) | 1991-07-24 | 1995-06-06 | International Business Machines Corporation | System and method for efficiently executing directed acyclic graphs |
US5965528A (en) * | 1991-09-27 | 1999-10-12 | Mcgill University | Recombinant human alpha-fetoprotein as an immunosuppressive agent |
US5206822A (en) | 1991-11-15 | 1993-04-27 | Regents Of The University Of California | Method and apparatus for optimized processing of sparse matrices |
EP0610727B1 (en) | 1993-02-08 | 1998-09-30 | Sony Corporation | Optical waveguide for second harmonic generation |
US5717947A (en) | 1993-03-31 | 1998-02-10 | Motorola, Inc. | Data processing system and method thereof |
US5418953A (en) | 1993-04-12 | 1995-05-23 | Loral/Rohm Mil-Spec Corp. | Method for automated deployment of a software program onto a multi-processor architecture |
US5450607A (en) * | 1993-05-17 | 1995-09-12 | Mips Technologies Inc. | Unified floating point and integer datapath for a RISC processor |
US6064819A (en) | 1993-12-08 | 2000-05-16 | Imec | Control flow and memory management optimization |
US5719988A (en) * | 1994-05-31 | 1998-02-17 | Tektronix, Inc. | Dynamically paged non-volatile random access video store |
FR2723652B1 (fr) | 1994-08-11 | 1996-09-13 | Cegelec | Method for scheduling successive tasks |
JP2660163B2 (ja) * | 1994-10-11 | 1997-10-08 | 有限会社アレフロジック | Algorithm education support system |
US5652903A (en) * | 1994-11-01 | 1997-07-29 | Motorola, Inc. | DSP co-processor for use on an integrated circuit that performs multiple communication tasks |
US5887183A (en) | 1995-01-04 | 1999-03-23 | International Business Machines Corporation | Method and system in a data processing system for loading and storing vectors in a plurality of modes |
US5495817A (en) * | 1995-05-22 | 1996-03-05 | Blough-Wagner Manufacturing Co., Inc. | Pedal mechanism for operating presser and motor in sewing machines |
US5719998A (en) * | 1995-06-12 | 1998-02-17 | S3, Incorporated | Partitioned decompression of audio data using audio decoder engine for computationally intensive processing |
JP3598589B2 (ja) | 1995-06-28 | 2004-12-08 | 株式会社日立製作所 | Processor |
JP3520611B2 (ja) | 1995-07-06 | 2004-04-19 | 株式会社日立製作所 | Processor control method |
US5742821A (en) | 1995-11-08 | 1998-04-21 | Lucent Technologies Inc. | Multiprocessor scheduling and execution |
US5764787A (en) | 1996-03-27 | 1998-06-09 | Intel Corporation | Multi-byte processing of byte-based image data |
US6571016B1 (en) | 1997-05-05 | 2003-05-27 | Microsoft Corporation | Intra compression of pixel blocks using predicted mean |
JPH09330304A (ja) | 1996-06-05 | 1997-12-22 | Internatl Business Mach Corp <Ibm> | Method for determining a communication schedule between processors |
US5805614A (en) | 1996-07-03 | 1998-09-08 | General Signal Corporation | Fault tolerant switch fabric with control and data correction by hamming codes |
US5889989A (en) | 1996-09-16 | 1999-03-30 | The Research Foundation Of State University Of New York | Load sharing controller for optimizing monetary cost |
GB2317469B (en) | 1996-09-23 | 2001-02-21 | Advanced Risc Mach Ltd | Data processing system register control |
GB2317464A (en) | 1996-09-23 | 1998-03-25 | Advanced Risc Mach Ltd | Register addressing in a data processing apparatus |
GB2317465B (en) | 1996-09-23 | 2000-11-15 | Advanced Risc Mach Ltd | Data processing apparatus registers. |
US6317774B1 (en) | 1997-01-09 | 2001-11-13 | Microsoft Corporation | Providing predictable scheduling of programs using a repeating precomputed schedule |
US6112023A (en) | 1997-02-24 | 2000-08-29 | Lucent Technologies Inc. | Scheduling-based hardware-software co-synthesis of heterogeneous distributed embedded systems |
US5999736A (en) | 1997-05-09 | 1999-12-07 | Intel Corporation | Optimizing code by exploiting speculation and predication with a cost-benefit data flow analysis based on path profiling information |
US6437804B1 (en) | 1997-10-23 | 2002-08-20 | Aprisma Management Technologies, Inc | Method for automatic partitioning of node-weighted, edge-constrained graphs |
US6173389B1 (en) | 1997-12-04 | 2001-01-09 | Billions Of Operations Per Second, Inc. | Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor |
US5936953A (en) | 1997-12-18 | 1999-08-10 | Raytheon Company | Multi-mode, multi-channel communication bus |
US6430671B1 (en) | 1998-02-10 | 2002-08-06 | Lucent Technologies, Inc. | Address generation utilizing an adder, a non-sequential counter and a latch |
US6104962A (en) | 1998-03-26 | 2000-08-15 | Rockwell Technologies, Llc | System for and method of allocating processing tasks of a control program configured to control a distributed control system |
US6202130B1 (en) * | 1998-04-17 | 2001-03-13 | Motorola, Inc. | Data processing system for processing vector data and method therefor |
US6128775A (en) | 1998-06-16 | 2000-10-03 | Silicon Graphics, Incorporated | Method, system, and computer program product for performing register promotion via load and store placement optimization within an optimizing compiler |
US6052766A (en) | 1998-07-07 | 2000-04-18 | Lucent Technologies Inc. | Pointer register indirectly addressing a second register in the processor core of a digital processor |
US6192384B1 (en) | 1998-09-14 | 2001-02-20 | The Board Of Trustees Of The Leland Stanford Junior University | System and method for performing compound vector operations |
US6629123B1 (en) | 1998-10-02 | 2003-09-30 | Microsoft Corporation | Interception of unit creation requests by an automatic distributed partitioning system |
US6442701B1 (en) | 1998-11-25 | 2002-08-27 | Texas Instruments Incorporated | Power saving by disabling memory block access for aligned NOP slots during fetch of multiple instruction words |
SE9804529L (sv) | 1998-12-23 | 2000-06-24 | Axis Ab | Flexible memory channel |
US6665749B1 (en) | 1999-08-17 | 2003-12-16 | Nec Electronics, Inc. | Bus protocol for efficiently transferring vector data |
US6513107B1 (en) | 1999-08-17 | 2003-01-28 | Nec Electronics, Inc. | Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page |
US6745160B1 (en) | 1999-10-08 | 2004-06-01 | Nec Corporation | Verification of scheduling in the presence of loops using uninterpreted symbolic simulation |
US6588009B1 (en) | 1999-11-29 | 2003-07-01 | Adelante Technologies Nv | Method and apparatus for compiling source code using symbolic execution |
US6754893B2 (en) | 1999-12-29 | 2004-06-22 | Texas Instruments Incorporated | Method for collapsing the prolog and epilog of software pipelined loops |
US6892380B2 (en) | 1999-12-30 | 2005-05-10 | Texas Instruments Incorporated | Method for software pipelining of irregular conditional control loops |
US6795908B1 (en) * | 2000-02-16 | 2004-09-21 | Freescale Semiconductor, Inc. | Method and apparatus for instruction execution in a data processing system |
JP3674515B2 (ja) * | 2000-02-25 | 2005-07-20 | 日本電気株式会社 | Array-type processor |
US6598221B1 (en) * | 2000-04-13 | 2003-07-22 | Koninklijke Philips Electronics N.V. | Assembly code performance evaluation apparatus and method |
US6647546B1 (en) | 2000-05-03 | 2003-11-11 | Sun Microsystems, Inc. | Avoiding gather and scatter when calling Fortran 77 code from Fortran 90 code |
US7010788B1 (en) | 2000-05-19 | 2006-03-07 | Hewlett-Packard Development Company, L.P. | System for computing the optimal static schedule using the stored task execution costs with recent schedule execution costs |
DE10057343A1 (de) | 2000-11-18 | 2002-05-23 | Philips Corp Intellectual Pty | Packet switching device with cascade control and bufferless cascade switching matrix |
US6898691B2 (en) * | 2001-06-06 | 2005-05-24 | Intrinsity, Inc. | Rearranging data between vector and matrix forms in a SIMD matrix processor |
JP3914771B2 (ja) | 2002-01-09 | 2007-05-16 | 株式会社日立製作所 | Packet communication device and packet data transfer control method |
US6732354B2 (en) | 2002-04-23 | 2004-05-04 | Quicksilver Technology, Inc. | Method, system and software for programming reconfigurable hardware |
US7159099B2 (en) * | 2002-06-28 | 2007-01-02 | Motorola, Inc. | Streaming vector processor with reconfigurable interconnection switch |
AT412881B (de) * | 2002-08-23 | 2005-08-25 | Wuester Heinrich | Umbrella-type clothes dryer with protective cover |
US7610466B2 (en) * | 2003-09-05 | 2009-10-27 | Freescale Semiconductor, Inc. | Data processing system using independent memory and register operand size specifiers and method thereof |
US7107436B2 (en) * | 2003-09-08 | 2006-09-12 | Freescale Semiconductor, Inc. | Conditional next portion transferring of data stream to or from register based on subsequent instruction aspect |
US7315932B2 (en) * | 2003-09-08 | 2008-01-01 | Moyer William C | Data processing system having instruction specifiers for SIMD register operands and method thereof |
2002
- 2002-06-28: US application US10/184,583, published as US7159099B2 (en); not active (Expired - Fee Related)
2003
- 2003-05-20: WO application PCT/US2003/016019, published as WO2004003767A1 (en); not active (Application Discontinuation)
- 2003-05-20: AU application AU2003228247A, published as AU2003228247A1 (en); not active (Abandoned)
- 2003-05-20: EP application EP03726946A, published as EP1535171A4 (en); not active (Withdrawn)
- 2003-05-20: JP application JP2004517568A, published as JP2005531848A (ja); active (Pending)
- 2003-05-20: CN application CN03815336XA, published as CN1666187A (zh); active (Pending)
- 2003-06-11: TW application TW092115849A, published as TWI234738B (zh); not active (IP Right Cessation)
- 2003-09-08: US application US10/657,793, published as US7100019B2 (en); not active (Expired - Fee Related)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
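WO2010057375A1 (zh) * | 2008-11-19 | 2010-05-27 | 北京大学深圳研究生院 | Configurable processor architecture and control method |
CN102122275A (zh) * | 2010-01-08 | 2011-07-13 | 上海芯豪微电子有限公司 | Configurable processor |
CN103530088A (zh) * | 2012-06-29 | 2014-01-22 | 美国亚德诺半导体公司 | Staged loop instructions |
CN103530088B (zh) * | 2012-06-29 | 2016-08-17 | 美国亚德诺半导体公司 | Staged loop instructions |
CN105027109A (zh) * | 2013-03-13 | 2015-11-04 | 高通股份有限公司 | Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods |
CN105027109B (zh) * | 2013-03-13 | 2019-03-08 | 高通股份有限公司 | Vector processing engines having programmable data path configurations for providing multi-mode vector processing, and related vector processors, systems, and methods |
CN105723332A (zh) * | 2013-11-15 | 2016-06-29 | 高通股份有限公司 | Vector processing engine employing despreading circuitry in a data flow path between an execution unit and a vector data memory, and related methods |
CN114579083A (zh) * | 2022-05-09 | 2022-06-03 | 上海擎昆信息科技有限公司 | Data processing device and method based on a vector processor |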
Also Published As
Publication number | Publication date |
---|---|
JP2005531848A (ja) | 2005-10-20 |
US7100019B2 (en) | 2006-08-29 |
AU2003228247A1 (en) | 2004-01-19 |
TWI234738B (en) | 2005-06-21 |
US20040117595A1 (en) | 2004-06-17 |
US7159099B2 (en) | 2007-01-02 |
EP1535171A4 (en) | 2007-02-28 |
EP1535171A1 (en) | 2005-06-01 |
US20040003206A1 (en) | 2004-01-01 |
TW200405981A (en) | 2004-04-16 |
WO2004003767A1 (en) | 2004-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1666187A (zh) | Reconfigurable streaming vector processor | |
JP7264955B2 (ja) | Memory network processor with programmable optimizations | |
KR102292349B1 (ko) | Processing apparatus and processing method | |
Talla et al. | Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements | |
JP3547139B2 (ja) | Processor | |
RU2263947C2 (ru) | High-order integer multiplication with rounding and shift in a single-instruction, multiple-data (SIMD) architecture | |
Rixner et al. | A bandwidth-efficient architecture for media processing | |
JP3958662B2 (ja) | Processor | |
JP3739403B2 (ja) | Method and apparatus for custom operations of a processor | |
US7305649B2 (en) | Automatic generation of a streaming processor circuit | |
Yu et al. | Vector processing as a soft-core CPU accelerator | |
US20050283589A1 (en) | Data processor | |
US11635957B2 (en) | Hardware-implemented universal floating-point instruction set architecture for computing directly with human-readable decimal character sequence floating-point representation operands | |
JP2021057004A (ja) | Apparatuses, methods, and systems for instructions of a matrix operations accelerator | |
US6934938B2 (en) | Method of programming linear graphs for streaming vector computation | |
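CN109213472A (zh) | Instructions for vector operations using constant values | |
KR100267089B1 (ko) | Single-instruction multiple-data processing with combined scalar/vector operations | |
KR19980018071A (ko) | Single-instruction multiple-data processing in a multimedia signal processor | |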
Jeannerod et al. | Simultaneous floating-point sine and cosine for VLIW integer processors | |
Chiu et al. | A multi-streaming SIMD architecture for multimedia applications | |
Talla et al. | Mediabreeze: a decoupled architecture for accelerating multimedia applications | |
Jeroen van Straten | ρ-VEX user manual | |
Vanderbauwhede et al. | MORA: High-Level FPGA Programming Using a Many-Core Framework | |
De Sutter et al. | On the use of subword parallelism in medical image processing | |
CN116414461A (zh) | Optimization method for a linear interpolation algorithm based on SIMD instructions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20050907 |
C20 | Patent right or utility model deemed to be abandoned or is abandoned | ||