CN105027075A - Processing core having shared front end unit - Google Patents
Processing core having shared front end unit
- Publication number
- CN105027075A
- Authority
- CN
- China
- Prior art keywords
- processing unit
- thread
- instruction
- microcode
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/3012—Organisation of register space, e.g. banked or distributed register file
- G06F9/30123—Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3818—Decoding for concurrent execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3888—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
- Computer Hardware Design (AREA)
Abstract
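A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective microcode and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to the front end unit, and has a respective buffer to receive and store the microcode of its assigned at least one of the threads. Each of the plurality of processing units also comprises: i) at least one set of functional units corresponding to a complete instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode; ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode; iii) data fetch circuitry to fetch input operands for the at least one set of functional units' execution of the received microcode.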
Description
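Field of the Invention
The field of the invention pertains generally to the computing sciences and, more specifically, to a processing core having a shared front end unit.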
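Background
FIG. 1 shows the architecture of an exemplary multi-core processor 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104 and an I/O hub 105. Each of the processing cores contains one or more instruction execution pipelines for executing program code instructions. The interconnection network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as to the other components 103, 104, 105. The last level caching system 103 serves as the last layer of cache in the processor before instructions and/or data are evicted to system memory 108.
The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and "I/O" devices (e.g., non-volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Power management circuitry (not shown) manages the performance and power states of the processor as a whole ("package level") as well as aspects of the performance and power states of the individual units within the processor, such as the individual cores 101_1 to 101_N, graphics processor 107, etc. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in FIG. 1 for convenience.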
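FIG. 2 shows an exemplary embodiment 200 of one of the processing cores of FIG. 1. As observed in FIG. 2, each core includes two instruction execution pipelines 250, 260. Each instruction execution pipeline 250, 260 includes its own respective: i) instruction fetch stage 201; ii) data fetch stage 202; iii) instruction execution stage 203; and iv) write back stage 204. The instruction fetch stage 201 fetches the "next" instructions in an instruction sequence from a cache, or from system memory if the desired instructions are not within the cache. Instructions typically specify operand data and an operation to be performed on the operand data. The data fetch stage 202 fetches the operand data from local operand register space, a data cache, or system memory. The instruction execution stage 203 contains a set of functional units, any one of which is called upon to perform the particular operation called out by an instruction on the operand data that is specified by the instruction and fetched by the data fetch stage 202. The write back stage 204 "commits" the result of the execution, typically by writing the result into local register space coupled to the respective pipeline.
In order to avoid unnecessary delay of an instruction that does not have any dependencies on earlier "in flight" instructions, many modern instruction execution pipelines have enhanced data fetch and write back stages to effect "out-of-order" execution. Here, the respective data fetch stage 202 of pipelines 250, 260 is enhanced to include data dependency logic 205 to recognize when an instruction does not have a dependency on an earlier in flight instruction, and to permit its issuance to the instruction execution stage 203 "ahead of", e.g., an earlier instruction whose data has not yet been fetched.
Moreover, the write back stage 204 is enhanced to include a re-order buffer 206 that re-orders the results of out-of-order executed instructions into their correct order and delays their retirement to the physical register file until a correctly ordered consecutive sequence of instruction execution results can be retired.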
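As a rough illustration of the re-ordering just described, the following Python sketch retires results strictly in program order even when they complete out of order. The function name and the use of integer sequence numbers are assumptions made for this example; the text does not say how re-order buffer 206 is actually implemented.

```python
def retire_in_order(completion_order):
    """Return, for each completion event, the batch of instructions retired.

    completion_order lists hypothetical program-order sequence numbers
    (0, 1, 2, ...) in the order in which their execution finished.
    """
    finished = set()      # results waiting in the re-order buffer
    next_to_retire = 0    # oldest instruction that has not retired yet
    batches = []
    for seq in completion_order:
        finished.add(seq)
        batch = []
        # Retire only a consecutive, correctly ordered run of results.
        while next_to_retire in finished:
            finished.remove(next_to_retire)
            batch.append(next_to_retire)
            next_to_retire += 1
        batches.append(batch)
    return batches
```

In this sketch an instruction that finishes early (here, sequence number 2) simply waits until every older instruction has finished.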
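The enhanced instruction execution pipeline is also observed to include instruction speculation logic 207 within the instruction fetch stage 201. The speculation logic 207 guesses which conditional branch direction or jump the instruction sequence will take and begins to fetch the instruction sequence that flows from that direction or jump. The speculative instructions are then processed by the remaining stages of the execution pipeline.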
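Figures
The present invention is illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like reference numerals indicate similar elements, and in which:
FIG. 1 shows a processor (prior art);
FIG. 2 shows an instruction execution pipeline (prior art);
FIG. 3 shows a processing core having a shared front end unit;
FIG. 4 shows a method performed by the processing core of FIG. 3;
FIG. 5 shows a processor whose respective cores have a shared front end unit;
FIG. 6 shows a computing system composed of processors whose respective cores have a shared front end unit.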
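Detailed Description
The number of logic transistors manufactured on a semiconductor chip can be viewed as the chip's fixed resource for processing information. A characteristic of the processor and processing core architecture discussed above with respect to FIGS. 1 and 2 is that emphasis is placed on reducing the latency of the instructions processed by the processor. Said another way, the fixed resources of the processor design of FIGS. 1 and 2, such as the out-of-order execution enhancements made to each of the pipelines, are devoted to running a thread through the pipeline with minimal delay.
The dedication of logic circuitry to the speed-up of currently active threads is achieved, however, at the expense of the total number of threads that the processor can simultaneously process at any instant of time. Said another way, if the logic circuitry of a processor were emphasized differently, the processor might be able to simultaneously process more threads than the processor of FIG. 1, whose processing cores are designed according to the architecture of FIG. 2. For example, if the logic circuitry resources of the out-of-order execution enhancements were removed, the "freed up" logic circuitry could be re-utilized to instantiate more execution units within the processor. With more execution units, the processor could simultaneously execute more instructions and therefore more threads.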
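FIG. 3 shows an embodiment of the architecture of a processing core 300 that can be instantiated multiple times within a multi-core processor (e.g., once for each processing core). The processing core architecture of FIG. 3 is designed with more execution units than is typical for a standard processing core so as to increase the overall throughput of the processing core (i.e., increase the number of threads that the processing core can simultaneously process). As observed in FIG. 3, the processing core architecture includes a shared front end unit 301 coupled to a plurality of processing units 302_1 to 302_N. In an embodiment, each of the processing units 302_1 to 302_N contains at least one set of functional units (e.g., at least one set of functional units 303) capable of supporting an entire instruction set, such as the entire x86 instruction set or another general purpose instruction set (as opposed to a more limited special purpose instruction set, such as the typical instruction set of a digital signal processor (DSP) or accelerator).
As observed in FIG. 3, the shared front end unit 301 fetches and receives the instructions to be processed by the processing core 300, decodes the received instructions, and dispatches the decoded instructions to their appropriate processing units. In an embodiment, the shared front end unit fetches all instructions for all of the threads being executed by all of the general purpose processing units of the processing core.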
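A particular thread is assigned to a particular processing unit, and each processing unit, as described in more detail below, is multi-threaded (i.e., can simultaneously and/or concurrently process more than one thread). Thus, if each processing unit can simultaneously/concurrently execute up to M hardware threads and there are N processing units, the processing core can simultaneously/concurrently execute up to MN hardware threads. Here, the product MN may be greater than the typical number of hardware threads that can be simultaneously executed in a typical processing core (e.g., greater than 8 or 16 at current densities). Referring to the shared front end unit 301, the shared front end unit contains program control logic circuitry 311 to identify and fetch the appropriate "next" instruction for each thread. Here, the program control logic circuitry 311 includes an instruction pointer 312_1 to 312_MN for each thread, and instruction fetch circuitry 313. Note that FIG. 3 indicates that there are MN instruction pointers to reflect support for MN different hardware threads. For each hardware thread, the instruction fetch circuitry 313 looks first to an instruction cache 314 for the instruction identified by the thread's instruction pointer. If the sought-for instruction is not found within the instruction cache 314, it is fetched from program memory 315. In various implementations, blocks of instructions may be stored in, and fetched from, cache and/or memory on a per hardware thread basis.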
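The fetch path described above can be sketched in Python as follows. This is only a minimal model under stated assumptions: the class name `FetchUnit`, the dictionary-based cache and program memory, and the "advance the pointer by one" rule are all hypothetical choices for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class FetchUnit:
    """Toy model of program control logic with one pointer per hardware thread."""
    program_memory: dict[int, str]   # address -> instruction (hypothetical encoding)
    num_units: int                   # N processing units
    threads_per_unit: int            # M hardware threads per processing unit
    icache: dict[int, str] = field(default_factory=dict)
    pointers: list[int] = field(init=False)

    def __post_init__(self) -> None:
        # M * N instruction pointers, one per supported hardware thread.
        self.pointers = [0] * (self.num_units * self.threads_per_unit)

    def fetch(self, thread_id: int) -> tuple[str, bool]:
        """Return (instruction, cache_hit) for the thread's next instruction."""
        addr = self.pointers[thread_id]
        hit = addr in self.icache
        if not hit:
            # Miss: fall back to program memory and fill the cache.
            self.icache[addr] = self.program_memory[addr]
        self.pointers[thread_id] = addr + 1
        return self.icache[addr], hit
```

With N = 4 and M = 2 the model holds eight instruction pointers. A second thread fetching an address already brought in by another thread hits the shared instruction cache.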
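The individual hardware threads may be serviced by the instruction fetch circuitry 313 on a time-sliced basis (e.g., a fair round robin approach). Further still, the instruction fetch circuitry 313 may be parallelized into similar or identical blocks that fetch instructions for different hardware threads in parallel (e.g., each parallel block of instruction fetch circuitry services a different subset of the instruction pointers).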
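A minimal Python sketch of both ideas follows: fair round robin servicing of threads, and splitting the instruction pointers across parallel fetch blocks. The function names and the interleaved split are assumptions for illustration; the text does not prescribe how the pointer subsets are chosen.

```python
from itertools import cycle, islice


def round_robin_schedule(thread_ids, num_slots):
    """Return which hardware thread is serviced in each of num_slots time slices."""
    return list(islice(cycle(thread_ids), num_slots))


def partition_pointers(thread_ids, num_fetch_blocks):
    """Give each parallel fetch block a disjoint subset of the threads.

    An interleaved split is used here purely as an example policy.
    """
    return [thread_ids[b::num_fetch_blocks] for b in range(num_fetch_blocks)]
```

For example, three threads over seven time slices are serviced as 0, 1, 2, 0, 1, 2, 0, and eight threads split across two fetch blocks yield the even-numbered and odd-numbered threads.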
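Because, however, the individual hardware threads may be processed more slowly than in a traditional processor (e.g., because per-thread latency reduction circuitry has not been instantiated, in favor of more processing units, as described above), it is conceivable that some implementations may not require parallel instruction fetch capability, or may at least include fewer than N parallel instruction fetch channels (e.g., N/2 parallel instruction fetch blocks). Accordingly, in any of these cases, certain components of the front end unit 301 are shared by at least two of the processing units 302_1 to 302_N.
In a further embodiment, the program control logic circuitry 311 also includes an instruction translation look-aside buffer (ITLB) circuit 316 for each hardware thread. As is understood in the art, an ITLB translates the instruction addresses received from program memory 315 into the actual addresses in physical memory where the instructions actually reside.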
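The translation step can be illustrated with the small Python sketch below. It assumes a 4096-byte page size and a dictionary standing in for the page table; both are assumptions for this example, as are the class and attribute names. It shows only the general idea of caching virtual-to-physical page mappings, not the ITLB circuit 316 itself.

```python
PAGE_SIZE = 4096  # assumed page size, not specified in the text


class InstructionTLB:
    """Toy per-thread ITLB that caches page translations on first use."""

    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical page number
        self.entries = {}              # translations cached so far

    def translate(self, virtual_addr):
        vpn, offset = divmod(virtual_addr, PAGE_SIZE)
        if vpn not in self.entries:
            # TLB miss: consult the (hypothetical) page table and cache the result.
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset
```

One such object per hardware thread would mirror the "ITLB circuit for each hardware thread" arrangement described above.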
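After an instruction has been fetched, it is decoded by an instruction decoder 317. In an embodiment there is an instruction decoder for each processing unit (i.e., there are N decoders). Again, e.g., where the number of processing units N has been increased at the expense of executing threads with lower latency, there may be more than one processing unit per instruction decoder. Conceivably there may even be a single decoder for all of the processing units.
An instruction typically specifies: i) an operation to be performed, in the form of an "opcode"; ii) the location where the input operands for the operation can be found (register and/or memory space); and iii) the location where the resultant of the operation is to be stored (register and/or memory space). In an embodiment, the instruction decoder 317 decodes an instruction not only by breaking the instruction down into its opcode and input operand/resultant storage locations, but also by converting the opcode into a sequence of micro-instructions.
As is understood in the art, micro-instructions are akin to a small software program (microcode) that an execution unit executes in order to perform the function of an instruction. Thus, an instruction opcode is converted into the microcode that corresponds to the functional operation of the instruction. Typically, the opcode is entered as a look-up parameter into a circuit 318 configured to behave like a look-up table (e.g., a read only memory (ROM) configured as a look-up table). The look-up table circuit 318 responds to the input opcode with the microcode for the opcode's instruction. Thus, in an embodiment, there is a ROM for each processing unit in the processing core (or, again, there is more than one processing unit per microcode ROM, because per-thread latency has been de-emphasized in the processing units compared to a traditional processor).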
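The decode step described above can be sketched as a table look-up in Python. The textual instruction format (`"ADD r3, r1, r2"`), the opcode names, and the micro-operation names in `MICROCODE_ROM` are all invented for this example; the patent does not define any of them.

```python
from typing import NamedTuple

# Hypothetical microcode ROM: opcode -> sequence of micro-instructions.
MICROCODE_ROM = {
    "ADD": ("read_src", "alu_add", "write_dst"),
    "LOAD": ("calc_addr", "mem_read", "write_dst"),
}


class Decoded(NamedTuple):
    microcode: tuple       # micro-instructions looked up from the ROM
    operand_addrs: tuple   # where the input operands are found
    result_addr: str       # where the resultant is to be stored


def decode(instruction: str) -> Decoded:
    """Split an instruction into opcode and locations, then look up its microcode."""
    opcode, dst, *srcs = instruction.replace(",", " ").split()
    return Decoded(MICROCODE_ROM[opcode], tuple(srcs), dst)
```

Calling `decode("ADD r3, r1, r2")` yields the three micro-instructions for `ADD` together with the operand locations `r1`, `r2` and the result location `r3`.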
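The microcode for a decoded instruction is then dispatched, along with the register/memory addresses of the decoded instruction's input operands and resultants, to the processing unit that has been assigned the hardware thread of which the decoded instruction is a component. Note that the respective microcode for two different instructions of two different hardware threads running on two different processing units may be dispatched simultaneously to their respective processing units.
In an embodiment, as discussed above, each processing unit 302_1 to 302_N can simultaneously and/or concurrently execute more than one hardware thread. For instance, each processing unit may have X sets of execution units (where X = 1 or greater), where each set of execution units is capable of supporting an entire instruction set, such as the entire x86 instruction set. Alternatively or in combination, each processing unit can concurrently (as opposed to simultaneously) execute multiple software threads. Here, concurrent execution, as opposed to simultaneous execution, corresponds to the execution of multiple software threads within a period of time by alternating processing resources amongst the software threads supported by the processing unit (e.g., servicing each of the software threads in a round robin fashion). Thus, in an embodiment, over a window of time, a single processing unit may concurrently execute multiple software threads by switching the software threads and their associated state information in and out of the processing unit as hardware threads of the processing unit.
As observed in FIG. 3, each processing unit has a microcode buffer 320 to store the microcode dispatched from the instruction decoder 317. The microcode buffer 320 may be partitioned so that separate FIFO queuing space exists for each hardware thread supported by the processing unit. The input operand and resultant addresses are also queued in an aligned fashion, or are otherwise associated with the respective microcode of their instruction.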
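A partitioned microcode buffer of this kind can be modeled in Python with one FIFO per hardware thread. The class and method names are hypothetical, and storing the operand and result addresses in the same queue entry as the microcode is just one way to keep them "aligned".

```python
from collections import deque


class MicrocodeBuffer:
    """Toy microcode buffer with a separate FIFO queue per hardware thread."""

    def __init__(self, num_hw_threads):
        self.queues = [deque() for _ in range(num_hw_threads)]

    def push(self, thread, microcode, operand_addrs, result_addr):
        # Addresses travel in the same entry as their instruction's microcode.
        self.queues[thread].append((microcode, operand_addrs, result_addr))

    def pop(self, thread):
        """Return the oldest entry for the thread, or None if its queue is empty."""
        queue = self.queues[thread]
        return queue.popleft() if queue else None
```

Because each thread has its own queue, a stalled thread does not block entries belonging to another thread on the same processing unit.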
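Each processing unit includes register space 321, coupled to its internal functional unit set(s) 303, for keeping the operand/resultant data of the thread(s) that the functional unit set(s) 303 are responsible for executing. If a single functional unit set is to concurrently execute multiple hardware threads, the register space 321 of the functional unit set 303 may be partitioned so that there is one register set partition for each hardware thread that the functional unit set 303 is to concurrently execute. As such, the functional unit set 303 "operates out of" a specific register partition for each unique hardware thread that it is concurrently executing.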
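The partitioning idea can be sketched in Python as a flat register array divided into equal per-thread slices. The equal split, the class name, and the indexing scheme are assumptions made for illustration only.

```python
class PartitionedRegisterFile:
    """Toy register space split into one equal partition per hardware thread."""

    def __init__(self, total_regs, num_hw_threads):
        if total_regs % num_hw_threads:
            raise ValueError("register space must divide evenly among threads")
        self.per_thread = total_regs // num_hw_threads
        self.cells = [0] * total_regs

    def _index(self, thread, reg):
        if not 0 <= reg < self.per_thread:
            raise IndexError("register number outside the thread's partition")
        return thread * self.per_thread + reg

    def write(self, thread, reg, value):
        self.cells[self._index(thread, reg)] = value

    def read(self, thread, reg):
        return self.cells[self._index(thread, reg)]
```

Register 0 of thread 0 and register 0 of thread 1 map to different physical cells, so the two threads cannot overwrite each other's operands.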
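As observed in FIG. 3, each processing unit 302_1 to 302_N includes register allocation logic 322 to allocate registers for the instructions of each of the respective hardware threads that the processing unit is concurrently and/or simultaneously executing. Here, for implementations having more than one functional unit set per processing unit, there may be multiple instances of the microcode buffer circuitry 320 and register allocation circuitry 322 (e.g., one instance for each functional unit set of the processing unit), or there may be one microcode buffer and register allocation circuit that feeds more than one functional unit set (i.e., one microcode buffer 320 and register allocation circuit 322 for two or more functional unit sets). The register allocation logic circuitry 322 includes data fetch logic to fetch the operands called out by the instructions from the register space 321 associated with the functional unit to which the operands' respective instructions are directed. The data fetch logic circuitry may be coupled to system memory 323 to explicitly fetch data operands from system memory 323.
In an embodiment, each functional unit set 303 includes: i) an integer functional unit cluster containing functional units for executing integer mathematical/logic instructions; ii) a floating point functional unit cluster containing functional units for executing floating point mathematical/logic instructions; iii) a SIMD functional unit cluster containing functional units for executing SIMD mathematical/logic instructions; and iv) a memory access functional unit cluster containing functional units for performing data memory accesses (for integer and/or floating point and/or SIMD operands and/or results). The memory access functional unit cluster may contain one or more data TLBs to perform virtual-to-physical address translation for its respective threads.
The microcode for a particular instruction issues from its respective microcode buffer 320 to the appropriate functional unit, along with the operand data fetched for the instruction by the fetch circuitry associated with the register allocation logic 322. The results of the functional units' execution are written back to the register space 321 associated with the execution units.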
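In a further embodiment, each processing unit contains a data cache 324 that is coupled to the functional units of the memory access cluster. The functional units of the memory access cluster are also coupled to system memory 323 so that they can fetch data from memory. Notably, each register set partition described above may be further partitioned into separate integer, floating point, and SIMD register spaces that are coupled to the corresponding functional unit clusters.
According to one scenario, operating system and/or virtual machine monitor (VMM) software assigns specific software threads to specific processing units. The shared front end logic 301 and/or the operating system/VMM is able to dynamically assign a software thread to a particular processing unit or functional unit set in order to activate the thread as a hardware thread. In various embodiments, each processing unit includes "context switching" logic (not shown) so that each processing unit can be assigned more software threads than it can simultaneously or concurrently support as hardware threads. That is, the number of software threads assigned to a processing unit can exceed the number of "active" hardware threads that the processing unit is presently capable of executing (either simultaneously or concurrently), an active thread being one whose context information is present within the register space of the processing unit.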
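Here, for instance, when a software thread becomes active as a hardware thread, its context information (e.g., the values of its various operands and control information) is located within the register space 321 that is coupled to the functional unit set 303 executing the thread's instructions. If a decision is made to transition the thread from an active to an inactive state, the context information of the thread is read out of this register space 321 and stored elsewhere (e.g., system memory 323). With the thread's register space now "freed up", the context information of another "inactive" software thread, whose context information resides, e.g., in system memory 323, can be written into the register space 321. As a consequence, the other thread converts from "inactive" to "active" and its instructions are executed as a hardware thread going forward.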
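The active/inactive transition described above can be sketched in Python as moving a thread's context between two stores. The class name, the dictionary representation of a context, and the explicit `deactivate` and `activate` calls are assumptions for illustration; the patent leaves the context switching logic unspecified ("not shown").

```python
class ContextSwitcher:
    """Toy model of context switching between register space and system memory."""

    def __init__(self, active_slots: int) -> None:
        self.active_slots = active_slots             # hardware threads supported at once
        self.register_space: dict[str, dict] = {}    # contexts of active hardware threads
        self.system_memory: dict[str, dict] = {}     # contexts of inactive software threads

    def assign(self, thread: str, context: dict) -> None:
        """Assign a software thread to this unit; it starts out inactive."""
        self.system_memory[thread] = dict(context)

    def deactivate(self, thread: str) -> None:
        # Read the context out of register space and store it elsewhere.
        self.system_memory[thread] = self.register_space.pop(thread)

    def activate(self, thread: str) -> None:
        if len(self.register_space) >= self.active_slots:
            raise RuntimeError("no free register partition; deactivate a thread first")
        # Write the saved context into the freed-up register space.
        self.register_space[thread] = self.system_memory.pop(thread)
```

With a single active slot, two assigned software threads take turns: deactivating one frees the register space that the other needs in order to become an active hardware thread.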
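As discussed above, the "room" for the logic circuitry that accommodates a large number of hardware threads may come at the expense of minimizing the latency of any one particular thread. As such, any of the mechanisms and associated logic circuitry for "speeding up" the execution of a hardware thread may be absent from the shared front end or processing unit circuitry. Such eliminated blocks may include any one or more of the following: 1) speculation logic (e.g., branch prediction logic); 2) out-of-order execution logic (e.g., register renaming logic and/or a re-order buffer and/or data dependency logic); 3) superscalar logic to dynamically effect parallel instruction issuance for a single hardware thread.
A multi-core processor built with multiple instances of the processing core architecture of FIG. 3 may include any/all of the surrounding features discussed above with respect to FIG. 1.
FIG. 4 shows a flow chart describing a methodology of the processing core described above. According to the methodology of FIG. 4, first and second instructions of different hardware threads are fetched 401 and decoded in a shared front end unit. Once the instructions are decoded, their respective microcode and operand/resultant addresses are issued from the shared front end unit to different processing units 402. The respective processing units fetch the data for their respective operands and issue the received microcode and respective operands to respective functional units 403. The functional units then execute their respective instructions 404.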
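The four steps of FIG. 4 can be walked through with the following Python sketch, in which one shared decode step serves two processing units that each keep their own registers. The opcode set, the tuple instruction format, and the function name are assumptions made for this example and do not come from the patent.

```python
import operator

# Hypothetical opcode table standing in for decode plus microcode look-up.
OPS = {"ADD": operator.add, "MUL": operator.mul}


def run_step(fetched, registers):
    """Process one instruction per processing unit.

    fetched:   {unit_id: (opcode, dst, src1, src2)}, as fetched by the shared front end (401)
    registers: {unit_id: {register_name: value}}, private to each processing unit
    """
    # 401/402: the shared front end decodes and dispatches to each unit.
    dispatched = {
        unit: (OPS[opcode], dst, (src1, src2))
        for unit, (opcode, dst, src1, src2) in fetched.items()
    }
    # 403/404: each unit fetches its own operands, then executes.
    for unit, (function, dst, srcs) in dispatched.items():
        operands = [registers[unit][s] for s in srcs]
        registers[unit][dst] = function(*operands)
    return registers
```

Each unit reads only its own register dictionary, which mirrors the way each processing unit fetches operand data from the register space associated with its own functional units.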
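FIG. 5 shows an embodiment of a processor 500 having multiple processing cores 501_1 through 501_N, each having a respective shared front end unit 511_1, 511_2, ... 511_N (with respective instruction TLBs 516_1, 516_2, ... 516_N) and respective processing units having corresponding microcode buffers (e.g., microcode buffers 520_1, 520_2, etc. within the processing units of core 501_1). Each core also includes one or more caching levels 550_1, 550_2, 550_N to cache the instructions and/or data of each processing unit individually and/or of the respective core as a whole. The cores 501_1, 501_2, ... 501_N are coupled to one another through an interconnection network 502 that also couples the cores to one or more caching levels (e.g., last level cache 503, which caches instructions and/or data for the cores 501_1, 501_2 ... 501_N) and to a memory controller 504 (e.g., one that is coupled to a "slice" of system memory). Other components, such as any of the components of FIG. 1, may also be included in FIG. 5.
FIG. 6 shows an embodiment of a computing system, such as a computer, implemented with multiple processors 600_1 through 600_z having the features discussed above with respect to FIG. 5. The multiple processors 600_1 through 600_z are connected to each other through a network that also couples the processors to a plurality of system memory units 608_1, 608_2, a non-volatile storage unit 610 (e.g., a disk drive), and an external (e.g., Internet) network interface 611.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.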
Claims (20)
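1. A processor having one or more processing cores, each of the one or more processing cores comprising:
front end logic circuitry to fetch respective instructions of threads and decode the instructions into respective microcode and input operand and resultant addresses of the instructions;
a plurality of processing units, each of the processing units being assigned at least one of the threads, each processing unit coupled to the front end unit and having a respective buffer to receive and store the microcode of its assigned at least one of the threads, each of the plurality of processing units comprising:
i) at least one set of functional units corresponding to a complete instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode;
ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode;
iii) data fetch circuitry to fetch input operands for the at least one set of functional units' execution of the received microcode.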
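2. The processor of claim 1, wherein the functional units are not coupled to any logic circuitry to perform out-of-order execution of the received microcode.
3. The processor of claim 2, wherein the processor includes N processing units.
4. The processor of claim 1, wherein the functional units are not coupled to any logic circuitry to perform speculative execution of the received microcode.
5. The processor of claim 4, wherein the processor includes N processing units.
6. The processor of claim 1, wherein the processor does not include circuitry to issue instructions in parallel for any one of the threads.
7. The processor of claim 6, wherein the processor includes N processing units.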
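8. A method performed by a processor, comprising:
with the same logic circuitry of a processing core of the processor, performing at least one of a) and b) below:
a) fetching first and second instructions of two different threads;
b) decoding the first and second instructions into respective units of microcode, input operand address information, and resultant address information;
dispatching the respective units of microcode and the address information to two different processing units; and,
at each processing unit, performing the following for its respective one of the two threads:
storing the microcode of its respective thread;
fetching input operand data with the received input operand address;
executing the received microcode on the fetched input operands with functional unit circuitry, the functional unit circuitry being part of a set of functional units that supports a complete general purpose instruction set.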
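9. The method of claim 8, wherein a first of the processing units is a first processing unit and a second of the processing units is an Nth processing unit.
10. The method of claim 9, wherein software assigns a first of the threads to the first processing unit and a second of the threads to the Nth processing unit.
11. The method of claim 8, wherein neither of the threads is processed with any speculative execution logic circuitry.
12. The method of claim 8, wherein neither of the threads is processed with any out-of-order execution logic circuitry.
13. The method of claim 8, wherein neither of the threads issues its respective instructions in parallel.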
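14. A processor, comprising:
at least two processing cores, each core having:
a front end unit to fetch all respective instructions of all threads processed by its processing core and decode the instructions into respective microcode and input operand and resultant addresses of the instructions;
the front end unit being coupled to all general purpose processing units of its processing core, each of the processing units being assigned at least one of the threads, each processing unit coupled to the front end unit to receive the microcode and the input operand and resultant addresses of its assigned at least one of the threads, each of the plurality of processing units comprising:
i) at least one set of functional units corresponding to a complete general purpose instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode;
ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode;
iii) data fetch circuitry to fetch input operands for the at least one set of functional units' execution of the received microcode;
an interconnection network coupled to the plurality of processing units;
a last level cache coupled to the interconnection network.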
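15. The processor of claim 14, wherein the functional units are not coupled to any logic circuitry to perform out-of-order execution of the received microcode.
16. The processor of claim 15, wherein the processor includes N processing units.
17. The processor of claim 14, wherein the functional units are not coupled to any logic circuitry to perform speculative execution of the received microcode.
18. The processor of claim 17, wherein the processor includes N processing units.
19. The processor of claim 14, wherein the processor does not include circuitry to issue instructions in parallel for any one of the threads.
20. The processor of claim 19, wherein the processor includes N processing units.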
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811504065.4A CN110045988B (zh) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/730,719 | 2012-12-28 | ||
US13/730,719 US10140129B2 (en) | 2012-12-28 | 2012-12-28 | Processing core having shared front end unit |
PCT/US2013/048694 WO2014105207A1 (en) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811504065.4A Division CN110045988B (zh) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105027075A (zh) | 2015-11-04 |
CN105027075B (zh) | 2019-01-29 |
Family
ID=51018681
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811504065.4A Active CN110045988B (zh) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
CN201380060918.9A Active CN105027075B (zh) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811504065.4A Active CN110045988B (zh) | 2012-12-28 | 2013-06-28 | Processing core having shared front end unit |
Country Status (3)
Country | Link |
---|---|
US (2) | US10140129B2 (zh) |
CN (2) | CN110045988B (zh) |
WO (1) | WO2014105207A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
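CN106250200A (zh) * | 2016-08-02 | 2016-12-21 | 合肥奇也信息科技有限公司 | Execution method for a computer to partition at least one software application segment |
CN110045988A (zh) * | 2012-12-28 | 2019-07-23 | Intel Corporation | Processing core having shared front end unit |
CN110771047A (zh) * | 2017-06-15 | 2020-02-07 | 交流通讯有限公司 | Polar decoder with LLR-domain computation of f-function and g-function |
CN110300989B (zh) * | 2017-05-15 | 2023-12-22 | Google LLC | Configurable and programmable image processor unit |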
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9542193B2 (en) | 2012-12-28 | 2017-01-10 | Intel Corporation | Memory address collision detection of ordered parallel threads with bloom filters |
WO2014169477A1 (zh) * | 2013-04-19 | 2014-10-23 | Institute of Automation, Chinese Academy of Sciences | Processor with polymorphic instruction set architecture |
US9747108B2 (en) | 2015-03-27 | 2017-08-29 | Intel Corporation | User-level fork and join processors, methods, systems, and instructions |
US11893392B2 (en) | 2020-12-01 | 2024-02-06 | Electronics And Telecommunications Research Institute | Multi-processor system and method for processing floating point operation thereof |
US20230100586A1 (en) * | 2021-09-24 | 2023-03-30 | Intel Corporation | Circuitry and methods for accelerating streaming data-transformation operations |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050257186A1 (en) * | 2004-05-13 | 2005-11-17 | Michael Zilbershlag | Operation system for programmable hardware |
CN1981280A (zh) * | 2004-07-02 | 2007-06-13 | Intel Corporation | Apparatus and method for heterogeneous chip multiprocessors via resource allocation and restriction |
US7234042B1 (en) * | 1999-07-14 | 2007-06-19 | Broadcom Corporation | Identification bit at a predetermined instruction location that indicates whether the instruction is one or two independent operations and indicates the nature the operations executing in two processing channels |
US7793080B2 (en) * | 2007-12-31 | 2010-09-07 | Globalfoundries Inc. | Processing pipeline having parallel dispatch and method thereof |
CN101855614A (zh) * | 2007-07-18 | 2010-10-06 | Advanced Micro Devices, Inc. | Multi-core processor with hierarchical microcode storage |
US20120023314A1 (en) * | 2010-07-21 | 2012-01-26 | Crum Matthew M | Paired execution scheduling of dependent micro-operations |
Family Cites Families (135)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4943915A (en) | 1987-09-29 | 1990-07-24 | Digital Equipment Corporation | Apparatus and method for synchronization of a coprocessor unit in a pipelined central processing unit |
US4982402A (en) | 1989-02-03 | 1991-01-01 | Digital Equipment Corporation | Method and apparatus for detecting and correcting errors in a pipelined computer system |
US5276798A (en) | 1990-09-14 | 1994-01-04 | Hughes Aircraft Company | Multifunction high performance graphics rendering processor |
US5329615A (en) | 1990-09-14 | 1994-07-12 | Hughes Aircraft Company | Concurrent general purpose and DMA processing in a graphics rendering processor |
CA2050658C (en) | 1990-09-14 | 1997-01-28 | John M. Peaslee | Dual hardware channels and hardware context switching in a graphics rendering processor |
US5444853A (en) | 1992-03-31 | 1995-08-22 | Seiko Epson Corporation | System and method for transferring data between a plurality of virtual FIFO's and a peripheral via a hardware FIFO and selectively updating control information associated with the virtual FIFO's |
US5423025A (en) | 1992-09-29 | 1995-06-06 | Amdahl Corporation | Error handling mechanism for a controller having a plurality of servers |
US5430841A (en) | 1992-10-29 | 1995-07-04 | International Business Machines Corporation | Context management in a graphics system |
JPH07219774A (ja) | 1994-02-07 | 1995-08-18 | Fujitsu Ltd | Data processing device and exception handling method |
US5550988A (en) | 1994-03-01 | 1996-08-27 | Intel Corporation | Apparatus and method for performing error correction in a multi-processor system |
US6341324B1 (en) | 1995-10-06 | 2002-01-22 | Lsi Logic Corporation | Exception processing in superscalar microprocessor |
US5778211A (en) | 1996-02-15 | 1998-07-07 | Sun Microsystems, Inc. | Emulating a delayed exception on a digital computer having a corresponding precise exception mechanism |
US6061711A (en) | 1996-08-19 | 2000-05-09 | Samsung Electronics, Inc. | Efficient context saving and restoring in a multi-tasking computing system environment |
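CN100373331C (zh) * | 1996-08-27 | 2008-03-05 | Matsushita Electric Industrial Co., Ltd. | Multithreaded processor that processes multiple instruction streams independently and flexibly controls the processing performance of each instruction stream |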
US6247040B1 (en) | 1996-09-30 | 2001-06-12 | Lsi Logic Corporation | Method and structure for automated switching between multiple contexts in a storage subsystem target device |
US6148326A (en) | 1996-09-30 | 2000-11-14 | Lsi Logic Corporation | Method and structure for independent disk and host transfer in a storage subsystem target device |
US6081849A (en) | 1996-10-01 | 2000-06-27 | Lsi Logic Corporation | Method and structure for switching multiple contexts in storage subsystem target device |
US6275497B1 (en) | 1997-02-10 | 2001-08-14 | Hybrid Networks, Inc. | Method and apparatus for controlling communication channels using contention and polling schemes |
US6075546A (en) | 1997-11-10 | 2000-06-13 | Silicon Graphics, Inc. | Packetized command interface to graphics processor |
US6272522B1 (en) * | 1998-11-17 | 2001-08-07 | Sun Microsystems, Incorporated | Computer data packet switching and load balancing system using a general-purpose multiprocessor architecture |
US6397240B1 (en) | 1999-02-18 | 2002-05-28 | Agere Systems Guardian Corp. | Programmable accelerator for a programmable processor system |
CA2383526A1 (en) * | 1999-09-01 | 2001-03-15 | Intel Corporation | Branch instruction for multithreaded processor |
US6543026B1 (en) | 1999-09-10 | 2003-04-01 | Lsi Logic Corporation | Forward error correction apparatus and methods |
JP3621315B2 (ja) | 1999-11-22 | 2005-02-16 | NEC Electronics Corporation | Microprocessor system |
US6691143B2 (en) | 2000-05-11 | 2004-02-10 | Cyberguard Corporation | Accelerated montgomery multiplication using plural multipliers |
US6742104B2 (en) | 2000-08-21 | 2004-05-25 | Texas Instruments Incorporated | Master/slave processing system with shared translation lookaside buffer |
EP1182569B8 (en) | 2000-08-21 | 2011-07-06 | Texas Instruments Incorporated | TLB lock and unlock operation |
EP1182568A3 (en) | 2000-08-21 | 2004-07-21 | Texas Instruments Incorporated | TLB operation based on task-id |
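JP3729087B2 (ja) | 2001-05-23 | 2005-12-21 | NEC Corporation | Multiprocessor system, data-dependent speculative execution control device, and method thereof |
JP2003015900A (ja) | 2001-06-28 | 2003-01-17 | Hitachi Ltd | Follow-up type multiplexing system, and data processing method for improving reliability by follow-up |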
US20030028751A1 (en) | 2001-08-03 | 2003-02-06 | Mcdonald Robert G. | Modular accelerator framework |
US7209996B2 (en) | 2001-10-22 | 2007-04-24 | Sun Microsystems, Inc. | Multi-core multi-thread processor |
US7228401B2 (en) | 2001-11-13 | 2007-06-05 | Freescale Semiconductor, Inc. | Interfacing a processor to a coprocessor in which the processor selectively broadcasts to or selectively alters an execution mode of the coprocessor |
US20030126416A1 (en) | 2001-12-31 | 2003-07-03 | Marr Deborah T. | Suspending execution of a thread in a multi-threaded processor |
US7313734B2 (en) | 2002-01-14 | 2007-12-25 | International Business Machines Corporation | Method and system for instruction tracing with enhanced interrupt avoidance |
US20030135719A1 (en) | 2002-01-14 | 2003-07-17 | International Business Machines Corporation | Method and system using hardware assistance for tracing instruction disposition information |
US20030135718A1 (en) | 2002-01-14 | 2003-07-17 | International Business Machines Corporation | Method and system using hardware assistance for instruction tracing by revealing executed opcode or instruction |
US20040215444A1 (en) | 2002-03-25 | 2004-10-28 | Patel Mukesh K. | Hardware-translator-based custom method invocation system and method |
US6944746B2 (en) | 2002-04-01 | 2005-09-13 | Broadcom Corporation | RISC processor supporting one or more uninterruptible co-processors |
US7200735B2 (en) | 2002-04-10 | 2007-04-03 | Tensilica, Inc. | High-performance hybrid processor with configurable execution units |
GB2388447B (en) | 2002-05-09 | 2005-07-27 | Sun Microsystems Inc | A computer system method and program product for performing a data access from low-level code |
US6952214B2 (en) | 2002-07-12 | 2005-10-04 | Sun Microsystems, Inc. | Method for context switching a graphics accelerator comprising multiple rendering pipelines |
US7313797B2 (en) | 2002-09-18 | 2007-12-25 | Wind River Systems, Inc. | Uniprocessor operating system design facilitating fast context switching |
US20040111594A1 (en) | 2002-12-05 | 2004-06-10 | International Business Machines Corporation | Multithreading recycle and dispatch mechanism |
US7673304B2 (en) | 2003-02-18 | 2010-03-02 | Microsoft Corporation | Multithreaded kernel for graphics processing unit |
US7079147B2 (en) | 2003-05-14 | 2006-07-18 | Lsi Logic Corporation | System and method for cooperative operation of a processor and coprocessor |
US7714870B2 (en) | 2003-06-23 | 2010-05-11 | Intel Corporation | Apparatus and method for selectable hardware accelerators in a data driven architecture |
US7082508B2 (en) | 2003-06-24 | 2006-07-25 | Intel Corporation | Dynamic TLB locking based on page usage metric |
US7765388B2 (en) | 2003-09-17 | 2010-07-27 | Broadcom Corporation | Interrupt verification support mechanism |
US8566828B2 (en) | 2003-12-19 | 2013-10-22 | Stmicroelectronics, Inc. | Accelerator for multi-processing system and method |
US7302627B1 (en) | 2004-04-05 | 2007-11-27 | Mimar Tibet | Apparatus for efficient LFSR calculation in a SIMD processor |
US7370243B1 (en) | 2004-06-30 | 2008-05-06 | Sun Microsystems, Inc. | Precise error handling in a fine grain multithreaded multicore processor |
US7388588B2 (en) | 2004-09-09 | 2008-06-17 | International Business Machines Corporation | Programmable graphics processing engine |
US7437581B2 (en) | 2004-09-28 | 2008-10-14 | Intel Corporation | Method and apparatus for varying energy per instruction according to the amount of available parallelism |
US7809982B2 (en) | 2004-10-01 | 2010-10-05 | Lockheed Martin Corporation | Reconfigurable computing machine and related systems and methods |
US7350055B2 (en) | 2004-10-20 | 2008-03-25 | Arm Limited | Tightly coupled accelerator |
US7598958B1 (en) | 2004-11-17 | 2009-10-06 | Nvidia Corporation | Multi-chip graphics processing unit apparatus, system, and method |
US8788787B2 (en) | 2005-03-02 | 2014-07-22 | The Boeing Company | Systems, methods and architecture for facilitating software access to acceleration technology |
US20060288193A1 (en) | 2005-06-03 | 2006-12-21 | Silicon Integrated System Corp. | Register-collecting mechanism for multi-threaded processors and method using the same |
US7426626B2 (en) | 2005-08-23 | 2008-09-16 | Qualcomm Incorporated | TLB lock indicator |
US7953933B1 (en) * | 2005-09-28 | 2011-05-31 | Oracle America, Inc. | Instruction cache, decoder circuit, basic block cache circuit and multi-block cache circuit |
US7545381B2 (en) | 2005-11-10 | 2009-06-09 | Via Technologies, Inc. | Interruptible GPU and method for context saving and restoring |
US7583268B2 (en) | 2005-11-10 | 2009-09-01 | Via Technologies, Inc. | Graphics pipeline precise interrupt method and apparatus |
US8212824B1 (en) | 2005-12-19 | 2012-07-03 | Nvidia Corporation | Apparatus and method for serial save and restore of graphics processing unit state information |
US7725624B2 (en) | 2005-12-30 | 2010-05-25 | Intel Corporation | System and method for cryptography processing units and multiplier |
US7509481B2 (en) * | 2006-03-03 | 2009-03-24 | Sun Microsystems, Inc. | Patchable and/or programmable pre-decode |
US7480838B1 (en) | 2006-03-23 | 2009-01-20 | Intel Corporation | Method, system and apparatus for detecting and recovering from timing errors |
US7746350B1 (en) | 2006-06-15 | 2010-06-29 | Nvidia Corporation | Cryptographic computations on general purpose graphics processing units |
US8041929B2 (en) * | 2006-06-16 | 2011-10-18 | Cisco Technology, Inc. | Techniques for hardware-assisted multi-threaded processing |
US7487341B2 (en) | 2006-06-29 | 2009-02-03 | Intel Corporation | Handling address translations and exceptions of a heterogeneous resource of a processor using another processor resource |
US8959311B2 (en) | 2006-08-25 | 2015-02-17 | Texas Instruments Incorporated | Methods and systems involving secure RAM |
US9478062B2 (en) | 2006-09-19 | 2016-10-25 | Imagination Technologies Limited | Memory allocation in distributed memories for multiprocessing |
US7949887B2 (en) | 2006-11-01 | 2011-05-24 | Intel Corporation | Independent power control of processing cores |
US8127113B1 (en) | 2006-12-01 | 2012-02-28 | Synopsys, Inc. | Generating hardware accelerators and processor offloads |
US7827383B2 (en) | 2007-03-09 | 2010-11-02 | Oracle America, Inc. | Efficient on-chip accelerator interfaces to reduce software overhead |
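CN100489830C (zh) * | 2007-03-19 | 2009-05-20 | National University of Defense Technology of the Chinese People's Liberation Army | 64-bit stream processor chip for scientific computing |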
US8015368B2 (en) * | 2007-04-20 | 2011-09-06 | Siport, Inc. | Processor extensions for accelerating spectral band replication |
US7937568B2 (en) | 2007-07-11 | 2011-05-03 | International Business Machines Corporation | Adaptive execution cycle control method for enhanced instruction throughput |
US8345052B1 (en) | 2007-11-08 | 2013-01-01 | Nvidia Corporation | Method and system for using a GPU frame buffer in a multi-GPU system as cache memory |
US8339404B2 (en) | 2007-11-29 | 2012-12-25 | Accelereyes, Llc | System for improving utilization of GPU resources |
US8140823B2 (en) | 2007-12-03 | 2012-03-20 | Qualcomm Incorporated | Multithreaded processor with lock indicator |
US7865675B2 (en) | 2007-12-06 | 2011-01-04 | Arm Limited | Controlling cleaning of data values within a hardware accelerator |
GB2455344B (en) | 2007-12-06 | 2012-06-13 | Advanced Risc Mach Ltd | Recovering from control path errors |
US8780123B2 (en) | 2007-12-17 | 2014-07-15 | Nvidia Corporation | Interrupt handling techniques in the rasterizer of a GPU |
US8086825B2 (en) * | 2007-12-31 | 2011-12-27 | Advanced Micro Devices, Inc. | Processing pipeline having stage-specific thread selection and method thereof |
US7877582B2 (en) | 2008-01-31 | 2011-01-25 | International Business Machines Corporation | Multi-addressable register file |
US8055872B2 (en) | 2008-02-21 | 2011-11-08 | Arm Limited | Data processor with hardware accelerator, accelerator interface and shared memory management unit |
US8776077B2 (en) | 2008-04-02 | 2014-07-08 | Oracle America, Inc. | Method for multithreading an application using partitioning to allocate work to threads |
US8776030B2 (en) | 2008-04-09 | 2014-07-08 | Nvidia Corporation | Partitioning CUDA code for execution by a general purpose processor |
US8141102B2 (en) | 2008-09-04 | 2012-03-20 | International Business Machines Corporation | Data processing in a hybrid computing environment |
US8230442B2 (en) | 2008-09-05 | 2012-07-24 | International Business Machines Corporation | Executing an accelerator application program in a hybrid computing environment |
US8082426B2 (en) | 2008-11-06 | 2011-12-20 | Via Technologies, Inc. | Support of a plurality of graphic processing units |
US20100274972A1 (en) | 2008-11-24 | 2010-10-28 | Boris Babayan | Systems, methods, and apparatuses for parallel computing |
US7930519B2 (en) | 2008-12-17 | 2011-04-19 | Advanced Micro Devices, Inc. | Processor with coprocessor interfacing functional unit for forwarding result from coprocessor to retirement unit |
US8281185B2 (en) | 2009-06-30 | 2012-10-02 | Oracle America, Inc. | Advice-based feedback for transactional execution |
US20110040924A1 (en) | 2009-08-11 | 2011-02-17 | Selinger Robert D | Controller and Method for Detecting a Transmission Error Over a NAND Interface Using Error Detection Code |
US8458677B2 (en) | 2009-08-20 | 2013-06-04 | International Business Machines Corporation | Generating code adapted for interlinking legacy scalar code and extended vector code |
US8719547B2 (en) | 2009-09-18 | 2014-05-06 | Intel Corporation | Providing hardware support for shared virtual memory between local and remote physical memory |
US8405666B2 (en) | 2009-10-08 | 2013-03-26 | Advanced Micro Devices, Inc. | Saving, transferring and recreating GPU context information across heterogeneous GPUs during hot migration of a virtual machine |
US8244946B2 (en) | 2009-10-16 | 2012-08-14 | Brocade Communications Systems, Inc. | Interrupt moderation |
US8095824B2 (en) | 2009-12-15 | 2012-01-10 | Intel Corporation | Performing mode switching in an unbounded transactional memory (UTM) system |
US8166437B2 (en) * | 2009-12-15 | 2012-04-24 | Apple Inc. | Automated pad ring generation for programmable logic device implementation of integrated circuit design |
US8316194B2 (en) | 2009-12-15 | 2012-11-20 | Intel Corporation | Mechanisms to accelerate transactions using buffered stores |
US9275491B2 (en) | 2010-04-05 | 2016-03-01 | Nvidia Corporation | GPU work creation and stateless graphics in OPENGL |
US9015443B2 (en) | 2010-04-30 | 2015-04-21 | International Business Machines Corporation | Reducing remote reads of memory in a hybrid computing environment |
JP4818450B1 (ja) | 2010-06-30 | 2011-11-16 | 株式会社東芝 | Graphics processing unit and information processing apparatus |
US8667253B2 (en) | 2010-08-04 | 2014-03-04 | International Business Machines Corporation | Initiating assist thread upon asynchronous event for processing simultaneously with controlling thread and updating its running status in status register |
US9552206B2 (en) | 2010-11-18 | 2017-01-24 | Texas Instruments Incorporated | Integrated circuit with control node circuitry and processing circuitry |
EP2458510B1 (en) | 2010-11-29 | 2014-05-07 | NTT DoCoMo, Inc. | Method and apparatus for performing a cross-correlation |
US20120159090A1 (en) | 2010-12-16 | 2012-06-21 | Microsoft Corporation | Scalable multimedia computer system architecture with qos guarantees |
US20120166777A1 (en) * | 2010-12-22 | 2012-06-28 | Advanced Micro Devices, Inc. | Method and apparatus for switching threads |
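CN102567556A (zh) | 2010-12-27 | 2012-07-11 | 北京国睿中数科技股份有限公司 | Debugging-oriented processor verification method and verification device |
CN102270166A (zh) | 2011-02-22 | 2011-12-07 | 清华大学 | Simulator-based processor fault injection and tracing method, and simulator |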
US8683175B2 (en) | 2011-03-15 | 2014-03-25 | International Business Machines Corporation | Seamless interface for multi-threaded core accelerators |
US8892924B2 (en) | 2011-05-31 | 2014-11-18 | Intel Corporation | Reducing power consumption of uncore circuitry of a processor |
US8793515B2 (en) | 2011-06-27 | 2014-07-29 | Intel Corporation | Increasing power efficiency of turbo mode operation in a processor |
US9003102B2 (en) | 2011-08-26 | 2015-04-07 | Sandisk Technologies Inc. | Controller with extended status register and method of use therewith |
SG11201402727WA (en) | 2011-12-01 | 2014-06-27 | Univ Singapore | Polymorphic heterogeneous multi-core architecture |
US20130159630A1 (en) | 2011-12-20 | 2013-06-20 | Ati Technologies Ulc | Selective cache for inter-operations in a processor-based environment |
US9436512B2 (en) | 2011-12-22 | 2016-09-06 | Board Of Supervisors Of Louisiana State University And Agricultural And Mechanical College | Energy efficient job scheduling in heterogeneous chip multiprocessors based on dynamic program behavior using prim model |
US9268596B2 (en) | 2012-02-02 | 2016-02-23 | Intel Corporation | Instruction and logic to test transactional execution status |
WO2013147887A1 (en) | 2012-03-30 | 2013-10-03 | Intel Corporation | Context switching mechanism for a processing core having a general purpose cpu core and a tightly coupled accelerator |
WO2013147885A1 (en) | 2012-03-30 | 2013-10-03 | Intel Corporation | Apparatus and method for accelerating operations in a processor which uses shared virtual memory |
US20130332937A1 (en) | 2012-05-29 | 2013-12-12 | Advanced Micro Devices, Inc. | Heterogeneous Parallel Primitives Programming Model |
US9753778B2 (en) | 2012-07-20 | 2017-09-05 | Microsoft Technology Licensing, Llc | Domain-agnostic resource allocation framework |
US9123128B2 (en) | 2012-12-21 | 2015-09-01 | Nvidia Corporation | Graphics processing unit employing a standard processing unit and a method of constructing a graphics processing unit |
US9417873B2 (en) | 2012-12-28 | 2016-08-16 | Intel Corporation | Apparatus and method for a hybrid latency-throughput processor |
US20140189333A1 (en) | 2012-12-28 | 2014-07-03 | Oren Ben-Kiki | Apparatus and method for task-switchable synchronous hardware accelerators |
US9053025B2 (en) | 2012-12-28 | 2015-06-09 | Intel Corporation | Apparatus and method for fast failure handling of instructions |
US10140129B2 (en) * | 2012-12-28 | 2018-11-27 | Intel Corporation | Processing core having shared front end unit |
US9361116B2 (en) | 2012-12-28 | 2016-06-07 | Intel Corporation | Apparatus and method for low-latency invocation of accelerators |
US9086813B2 (en) | 2013-03-15 | 2015-07-21 | Qualcomm Incorporated | Method and apparatus to save and restore system memory management unit (MMU) contexts |
GB2519103B (en) * | 2013-10-09 | 2020-05-06 | Advanced Risc Mach Ltd | Decoding a complex program instruction corresponding to multiple micro-operations |
US10031770B2 (en) | 2014-04-30 | 2018-07-24 | Intel Corporation | System and method of delayed context switching in processor registers |
US9703603B1 (en) | 2016-04-25 | 2017-07-11 | Nxp Usa, Inc. | System and method for executing accelerator call |
2012
- 2012-12-28: US application US13/730,719, patent US10140129B2 (status: Active)

2013
- 2013-06-28: CN application CN201811504065.4A, patent CN110045988B (status: Active)
- 2013-06-28: WO application PCT/US2013/048694, patent WO2014105207A1 (status: Application Filing)
- 2013-06-28: CN application CN201380060918.9A, patent CN105027075B (status: Active)

2018
- 2018-11-26: US application US16/200,203, patent US20190171462A1 (status: Abandoned)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7234042B1 (en) * | 1999-07-14 | 2007-06-19 | Broadcom Corporation | Identification bit at a predetermined instruction location that indicates whether the instruction is one or two independent operations and indicates the nature the operations executing in two processing channels |
US20050257186A1 (en) * | 2004-05-13 | 2005-11-17 | Michael Zilbershlag | Operation system for programmable hardware |
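CN1981280A (zh) * | 2004-07-02 | 2007-06-13 | 英特尔公司 | Apparatus and method for heterogeneous chip multiprocessors via resource allocation and restriction |
CN101855614A (zh) * | 2007-07-18 | 2010-10-06 | 先进微装置公司 | Multi-core processor with hierarchical microcode store |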
US7793080B2 (en) * | 2007-12-31 | 2010-09-07 | Globalfoundries Inc. | Processing pipeline having parallel dispatch and method thereof |
US20120023314A1 (en) * | 2010-07-21 | 2012-01-26 | Crum Matthew M | Paired execution scheduling of dependent micro-operations |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
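CN110045988A (zh) * | 2012-12-28 | 2019-07-23 | 英特尔公司 | Processing core having shared front end unit |
CN110045988B (zh) * | 2012-12-28 | 2023-08-15 | 英特尔公司 | Processing core having shared front end unit |
CN106250200A (zh) * | 2016-08-02 | 2016-12-21 | 合肥奇也信息科技有限公司 | Execution method for a computer to partition at least one software application segment |
CN110300989B (zh) * | 2017-05-15 | 2023-12-22 | 谷歌有限责任公司 | Configurable and programmable image processor unit |
CN110771047A (zh) * | 2017-06-15 | 2020-02-07 | 交流通讯有限公司 | Polar decoder with LLR-domain computation of f-function and g-function |
CN110771047B (zh) * | 2017-06-15 | 2023-08-04 | 交流通讯有限公司 | Polar decoder with LLR-domain computation of f-function and g-function |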
Also Published As
Publication number | Publication date |
---|---|
CN110045988B (zh) | 2023-08-15 |
CN105027075B (zh) | 2019-01-29 |
WO2014105207A1 (en) | 2014-07-03 |
CN110045988A (zh) | 2019-07-23 |
US20140189300A1 (en) | 2014-07-03 |
US20190171462A1 (en) | 2019-06-06 |
US10140129B2 (en) | 2018-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105027075A (zh) | Processing core having shared front end unit | |
TWI628594B (zh) | User-level fork and join processors, methods, systems, and instructions | |
US9146777B2 (en) | Parallel processing with solidarity cells by proactively retrieving from a task pool a matching task for the solidarity cell to process | |
US9606797B2 (en) | Compressing execution cycles for divergent execution in a single instruction multiple data (SIMD) processor | |
KR101275698B1 (ko) | Data processing method and apparatus | |
CN111310910A (zh) | Computing device and method | |
US10437638B2 (en) | Method and apparatus for dynamically balancing task processing while maintaining task order | |
US20080244222A1 (en) | Many-core processing using virtual processors | |
US9904553B2 (en) | Method and apparatus for implementing dynamic portbinding within a reservation station | |
CN106575220B (zh) | Multiple clustered very long instruction word processing cores | |
US11900120B2 (en) | Issuing instructions based on resource conflict constraints in microprocessor | |
US20200192676A1 (en) | Apparatus and method for a high throughput parallel co-processor and interconnect with low offload latency | |
KR20170036035A (ko) | Apparatus and method for configuring sets of interrupts | |
US10241885B2 (en) | System, apparatus and method for multi-kernel performance monitoring in a field programmable gate array | |
US9886327B2 (en) | Resource mapping in multi-threaded central processor units | |
US20210089305A1 (en) | Instruction executing method and apparatus | |
CN111078289B (zh) | Method for executing sub-threads of a multithreaded system, and multithreaded system | |
US11451241B2 (en) | Setting values of portions of registers based on bit values | |
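CN112463218B (zh) | Instruction issue control method and circuit, data processing method and circuit | |
CN115525343A (zh) | Parallel decoding method, processor, chip, and electronic device | |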
US10620958B1 (en) | Crossbar between clients and a cache | |
CN114327635A (zh) | Methods, systems, and apparatus for scalable port binding for asymmetric execution ports and allocation width of a processor | |
US8683181B2 (en) | Processor and method for distributing load among plural pipeline units | |
CN117931293B (zh) | Instruction processing method, apparatus, device, and storage medium | |
US20070226454A1 (en) | Highly scalable MIMD machine for java and .net processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |