CN103809964B - System and method for executing sequential code using a group of threads, and SIMT processor incorporating the same - Google Patents
- Publication number
- CN103809964B (application CN201310538631.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/40—Transformation of program code
- G06F9/30134—Register stacks; shift registers
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/522—Barrier synchronisation
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0853—Cache with multiport tag or data arrays
- G06F2212/451—Stack data
Abstract
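A system and method for executing sequential code using a group of threads, and a SIMT processor incorporating the same, are provided. The system and method execute sequential code in the context of a single-instruction, multiple-thread (SIMT) processor. In one embodiment, the system includes: (1) a pipeline control unit operable to create a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread and the remaining counterpart threads being slave threads, and (2) lanes operable to: (2a) execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and (2b) broadcast a branch condition in the master thread to the slave threads.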
Description
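CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Application Serial No. 61/722,661, filed by Lin, et al., on November 5, 2012, entitled "EXECUTING SEQUENTIAL CODE USING A GROUP OF THREADS," and to U.S. Application Serial No. 13/723,981, filed by Lin, et al., on December 21, 2012, entitled "SYSTEM AND METHOD FOR EXECUTING SEQUENTIAL CODE USING A GROUP OF THREADS AND SINGLE-INSTRUCTION, MULTIPLE-THREAD PROCESSOR INCORPORATING THE SAME," both of which are commonly assigned with this application and incorporated herein by reference.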
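TECHNICAL FIELD
This application is directed, in general, to parallel processors and, more specifically, to a system and method for executing sequential code using a group of threads and to a single-instruction, multiple-thread (SIMT) processor incorporating the system or method.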
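BACKGROUND
As those skilled in the pertinent art are aware, applications may be executed in parallel to increase their performance. Data-parallel applications carry out the same process concurrently on different data. Task-parallel applications carry out different processes concurrently on the same data. Static parallel applications are applications whose degree of parallelism can be determined before they execute. In contrast, the parallelism achievable by dynamic parallel applications can only be determined as they execute. Whether an application is data- or task-parallel, or statically or dynamically parallel, it may be executed in a pipeline, which is often the case for graphics applications.
A SIMT processor is particularly adept at executing data-parallel applications. A pipeline control unit in the SIMT processor creates groups of threads of execution and schedules them for execution, during which all threads in a group execute the same instruction concurrently. In one particular processor, each group has 32 threads, corresponding to 32 execution pipelines, or lanes, in the SIMT processor.
Parallel applications typically contain regions of sequential code and regions of parallel code. Sequential code cannot be executed in parallel, so it is executed in a single thread. When parallel code is encountered, the pipeline control unit splits execution, creating groups of worker threads for parallel execution of the parallel code. When sequential code is encountered again, the pipeline control unit joins the results of the parallel execution, creates another single thread for the sequential code, and execution proceeds.
It is important that the threads in a group be synchronized. Synchronization involves, in part, keeping the states of the local memories associated with the lanes consistent with one another. It has been found that synchronization can be made faster if, while the sequential code is executing, a counterpart thread of the sequential code is executed in each of the lanes. The local memory states can then be assumed to be consistent already if execution is later split.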
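SUMMARY
One aspect provides a system for executing sequential code. In one embodiment, the system includes: (1) a pipeline control unit operable to create a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread and the remaining counterpart threads being slave threads, and (2) lanes operable to: (2a) execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and (2b) broadcast a branch condition in the master thread to the slave threads.
Another aspect provides a method of executing sequential code. In one embodiment, the method includes: (1) creating a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread and the remaining counterpart threads being slave threads, (2) executing certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and (3) broadcasting a branch condition in the master thread to the slave threads.
Yet another aspect provides a SIMT processor. In one embodiment, the SIMT processor includes: (1) lanes, (2) local memories associated with corresponding ones of the lanes, (3) a memory device shared by the lanes, and (4) a pipeline control unit operable to create a group of counterpart threads of sequential code and cause the group to execute in the lanes, one of the counterpart threads being a master thread and the remaining counterpart threads being slave threads. The lanes are operable to: (1) execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and (2) broadcast a branch condition in the master thread to the slave threads.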
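BRIEF DESCRIPTION OF THE DRAWINGS
Reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram of a SIMT processor operable to contain or carry out a system or method for executing sequential code using a group of threads;
FIG. 2 is a block diagram of one embodiment of a system for executing sequential code using a group of threads; and
FIG. 3 is a flow diagram of one embodiment of a method of executing sequential code using a group of threads.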
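DETAILED DESCRIPTION
As stated above, it has been found that synchronization among the lanes, or cores, of a SIMT processor can be made faster if counterpart threads of sequential code are executed in each of the lanes. Because the counterpart threads run the same code (that is, the same instructions in the same order), and because the local memory states are consistent when the counterpart threads begin executing, it seems almost a foregone conclusion that the local memory states will remain consistent. However, it is realized herein that conditions can exist under which the memory states diverge.
As one example, assume the counterpart threads of sequential code are to execute the same load instruction. The memory location to be loaded is specified either by a register or by an address. If it is specified by a register, the register's value may vary from thread to thread, because each thread has its own copy of the register. If it is specified by an address, the address value may point to different thread-local memory locations in the system. In either case, each thread may load a different value from a different memory location, causing the thread-local memory states to diverge. If the counterpart threads then branch based on the loaded data, some of the branches taken will be correct and others will be erroneous.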
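The load hazard can be made concrete with a small Python simulation. It is a sketch under stated assumptions, not the patented hardware: the four-lane group, the per-lane `local_mem` lists, and the register names `r1` and `r2` are all hypothetical. Every lane executes the "same" load and then branches on the loaded value, yet the lanes disagree about which path to take.

```python
NUM_LANES = 4  # hypothetical group width, kept small for readability


def counterpart_load_then_branch(local_mem, r1):
    """Each lane runs `r2 = load [r1]; taken = r2 > 0` against its own state.

    local_mem[lane] is that lane's thread-local memory and r1[lane] is that
    lane's private copy of register r1, so identical instructions can read
    different memory locations and therefore different values.
    """
    r2 = [local_mem[lane][r1[lane]] for lane in range(NUM_LANES)]
    taken = [value > 0 for value in r2]
    return r2, taken


# Lanes 0 and 2 hold the same data, but their copies of r1 differ.
local_mem = [[5, 0], [0, 7], [5, 0], [0, 0]]
r1 = [0, 0, 1, 1]
r2, taken = counterpart_load_then_branch(local_mem, r1)
print(r2, taken)  # [5, 0, 0, 0] [True, False, False, False]
```

Sequential semantics would produce one loaded value and one branch decision; the simulated group produces a split decision, which is exactly the divergence the paragraph describes.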
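Similarly, assume the counterpart threads of sequential code are to execute the same store instruction. The memory being stored to varies from thread to thread, for the same reasons described above for the load instruction. Memory locations that are not modified in sequential execution will be erroneously modified in parallel execution.
As another example, assume the counterpart threads of sequential code are to store data concurrently to the same location in shared memory. As a result, the shared memory may again be overwhelmed and corrupted. The problems highlighted in these two examples are sometimes encountered in vector operations.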
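A minimal Python sketch of the shared-memory store hazard, assuming a 32-lane group and a dictionary standing in for shared memory (both illustrative, not the patent's implementation). When every counterpart thread executes the store, a location that sequential semantics writes once receives 32 colliding writes; predicating the store on the master lane restores a single write.

```python
NUM_LANES = 32  # matches the "particular processor" mentioned in the background


def counterpart_store(shared, addr, values, master_only=False, master=0):
    """Store values[lane] to shared[addr] from each enabled lane.

    Returns the number of writes that reached the shared location. Lanes are
    visited in index order here; on real hardware the order in which colliding
    stores land is not something the program can rely on.
    """
    writes = 0
    for lane in range(NUM_LANES):
        if master_only and lane != master:
            continue  # predicated off: this lane's store has no effect
        shared[addr] = values[lane]
        writes += 1
    return writes


values = [10 * lane for lane in range(NUM_LANES)]  # per-lane register contents

shared_all = {}
writes_all = counterpart_store(shared_all, "flag", values)
shared_master = {}
writes_master = counterpart_store(shared_master, "flag", values, master_only=True)
print(writes_all, shared_all["flag"])        # 32 310
print(writes_master, shared_master["flag"])  # 1 0
```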
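As yet another example, assume an exception handler is a resource shared among the various lanes. Regions of sequential code often include many instructions that can potentially cause an exception. If an exception occurs while these instructions are executed in parallel, the parallel processing may throw simultaneous exceptions and overwhelm the shared exception handler, which expects at most one exception and possibly none at all.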
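The exception case can be sketched the same way. The `SharedExceptionHandler` class, the eight-lane group, and the use of Python's `ZeroDivisionError` as the fault are assumptions for illustration only. A divide by zero executed by every counterpart thread delivers eight simultaneous exceptions to a handler that expects at most one, whereas executing the division only in the master lane delivers exactly one, matching sequential execution.

```python
class SharedExceptionHandler:
    """Toy stand-in for a handler shared by all lanes that expects <= 1 exception."""

    def __init__(self):
        self.pending = []  # (lane, exception) pairs delivered to the handler

    def deliver(self, lane, exc):
        self.pending.append((lane, exc))

    @property
    def overwhelmed(self):
        return len(self.pending) > 1


def divide_in_lanes(numerators, denominator, handler, master_only, master=0):
    """Every enabled lane executes `q = n // d`; faults go to the shared handler."""
    quotients = []
    for lane, n in enumerate(numerators):
        if master_only and lane != master:
            quotients.append(None)  # predicated off: no division, no fault
            continue
        try:
            quotients.append(n // denominator)
        except ZeroDivisionError as exc:
            handler.deliver(lane, exc)
            quotients.append(None)
    return quotients


all_lanes = SharedExceptionHandler()
divide_in_lanes([1] * 8, 0, all_lanes, master_only=False)
master_lane = SharedExceptionHandler()
divide_in_lanes([1] * 8, 0, master_lane, master_only=True)
print(len(all_lanes.pending), all_lanes.overwhelmed)      # 8 True
print(len(master_lane.pending), master_lane.overwhelmed)  # 1 False
```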
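It is therefore realized herein that the assumption that local memory states will necessarily remain consistent while counterpart threads of sequential code execute is untenable. It is further realized herein that certain operations can corrupt shared memory or cause local memory states to diverge as a "side effect"; these operations include not only loads from and stores to shared memory but also division and other instructions that can potentially cause exceptions. It is yet further realized herein that a mechanism is needed to ensure that the semantics of the sequential code are not distorted by divergent thread-local memory states.
Accordingly, introduced herein are various embodiments of a system and method for executing sequential code using a group of threads. At a very high level, the various embodiments cause the counterpart threads' execution of the sequential code to emulate execution of that code by a single master thread.
According to various embodiments, one of the counterpart threads is designated the master thread, and the other threads are designated slave threads. Certain instructions in the slave threads (typically those that can or do use shared resources) are then predicated upon the corresponding instructions in the master thread, and only the corresponding instructions in the master thread are executed. If a branch instruction is encountered in the master thread, the branch condition in the master thread is then broadcast to the slave threads.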
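The predicate-and-broadcast scheme can be sketched as a lock-step Python emulation of one sequential region. The tiny region (`x = load in; if x > 0: y = 2 else: y = -1; store out = y`), the eight-lane group, and the choice of lane 0 as master are assumptions for illustration. The load and store are predicated on `tid == MASTER`, the branch condition is computed only in the master and copied to every lane, and the side-effect-free assignment runs everywhere.

```python
NUM_LANES = 8
MASTER = 0  # assumption: lane 0 hosts the master thread


def run_sequential_region(shared, num_lanes=NUM_LANES):
    """Lock-step emulation of: x = load in; if x > 0: y = 2 else: y = -1; store out = y."""
    regs = [{} for _ in range(num_lanes)]  # one private register file per lane
    stores = 0

    # Load: predicated on tid == MASTER, so only the master reads shared memory.
    for tid in range(num_lanes):
        if tid == MASTER:
            regs[tid]["x"] = shared["in"]

    # Branch: the master evaluates the condition and broadcasts it to all lanes.
    cond = regs[MASTER]["x"] > 0
    for tid in range(num_lanes):
        regs[tid]["cond"] = cond

    # Side-effect-free work runs in every lane; all lanes follow the same path
    # because they branch on the broadcast condition, not on local data.
    for tid in range(num_lanes):
        regs[tid]["y"] = 2 if regs[tid]["cond"] else -1

    # Store: predicated again, so shared memory sees exactly one write.
    for tid in range(num_lanes):
        if tid == MASTER:
            shared["out"] = regs[tid]["y"]
            stores += 1

    return regs, stores


shared = {"in": 5}
regs, stores = run_sequential_region(shared)
print(shared["out"], stores, {r["y"] for r in regs})  # 2 1 {2}
```

In this sketch the slave lanes never hold `x`; they stay consistent only because the branch decision, not the loaded value, is broadcast. Code in which the slaves consumed the loaded value directly would need that value broadcast as well.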
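FIG. 1 is a block diagram of a SIMT processor 100 operable to contain or carry out a system or method for executing sequential code using a group of threads. SIMT processor 100 includes multiple thread processors, or cores 106, organized into thread groups 104, or "warps." SIMT processor 100 contains J thread groups 104-1 through 104-J, each having K cores 106-1 through 106-K. In certain embodiments, thread groups 104-1 through 104-J may be further organized into one or more thread blocks 102. One specific embodiment has thirty-two cores 106 per thread group 104. Other embodiments may include as few as four cores per thread group or as many as several tens of thousands of cores. Certain embodiments organize cores 106 into a single thread group 104, while other embodiments may have hundreds or even thousands of thread groups 104. Alternative embodiments of SIMT processor 100 may organize cores 106 only into thread groups 104, omitting the thread-block level of organization.
SIMT processor 100 further includes a pipeline control unit 108, a shared memory 110 and an array of local memories 112-1 through 112-J associated with thread groups 104-1 through 104-J. Pipeline control unit 108 distributes tasks to the various thread groups 104-1 through 104-J over a data bus 114. Pipeline control unit 108 creates, manages, schedules and executes thread groups 104-1 through 104-J and provides a mechanism to synchronize them. Certain embodiments of SIMT processor 100 are found within a graphics processing unit (GPU). Some GPUs provide a group synchronization instruction, such as bar.sync in GPUs manufactured by Nvidia Corporation of Santa Clara, California. Certain embodiments support the execution of divergent conditional branches by thread groups. Given a branch, some threads within a thread group 104 take the branch because the branch-condition predicate evaluates to "true," and the other threads fall through to the next instruction because the predicate evaluates to "false." Pipeline control unit 108 tracks the active threads by first executing one of the paths, either the branch taken or the branch not taken, and then executing the alternate path, enabling the appropriate threads for each path.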
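The two-pass handling of a divergent branch described above can be sketched in Python. This is an illustration under assumptions, not the pipeline control unit's actual logic: the warp is a list of per-thread values, the branch predicate is `v > 0`, the taken path is issued first (the text allows either order), and a path whose mask is empty is skipped entirely, which the text does not specify.

```python
def execute_divergent_branch(values, taken_path, not_taken_path):
    """Run `if v > 0: taken_path(v) else: not_taken_path(v)` across one warp.

    The warp issues each path in turn and uses an active mask so that only
    the threads whose predicate matches that path write results.
    """
    predicate = [v > 0 for v in values]  # per-thread branch condition
    results = [None] * len(values)
    issued = []  # which paths the warp actually issued, in order

    for path_name, body, mask in (
        ("taken", taken_path, predicate),
        ("not_taken", not_taken_path, [not p for p in predicate]),
    ):
        if not any(mask):
            continue  # no thread needs this path, so it is not issued
        issued.append(path_name)
        for tid, active in enumerate(mask):
            if active:
                results[tid] = body(values[tid])
    return results, issued


results, issued = execute_divergent_branch([3, -1, 0, 7], lambda v: v * 10, lambda v: 0)
print(results, issued)  # [30, 0, 0, 70] ['taken', 'not_taken']
```

Serializing the two paths is what makes divergence costly, and it is also why the master-broadcast scheme keeps every counterpart thread on a single path.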
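Continuing the embodiment of FIG. 1, the cores 106 within a thread group execute in parallel with one another. Thread groups 104-1 through 104-J communicate with shared memory 110 over a memory bus 116. Thread groups 104-1 through 104-J communicate with local memories 112-1 through 112-J, respectively, over local buses 118-1 through 118-J. For example, thread group 104-J uses local memory 112-J by communicating over local bus 118-J. Certain embodiments of SIMT processor 100 allocate a portion of shared memory 110 to each thread block 102 and allow all thread groups 104 within that thread block 102 to access that portion. Certain embodiments include thread groups 104 that use only local memory 112. Many other embodiments include thread groups 104 that balance their use of local memory 112 and shared memory 110.
The embodiment of FIG. 1 includes a master thread group 104-1. Each of the remaining thread groups 104-2 through 104-J is regarded as a "worker" thread group. Master thread group 104-1 includes numerous cores, one of which is a master core 106-1 that ultimately executes the master thread. Programs executed on SIMT processor 100 are structured as a sequence of kernels. Typically, each kernel completes execution before the next kernel begins. In certain embodiments, SIMT processor 100 may execute multiple kernels in parallel, depending on the size of the kernels. Each kernel is organized as a hierarchy of threads to be executed on the cores 106.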
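FIG. 2 is a block diagram of one embodiment of a system 200 for executing sequential code using a group of threads. System 200 includes a program 202 having a sequential region 204 and a parallel region 206, a memory 208, a predication module 210, a thread identifier 212, a thread launcher 214 and a thread group 104. Thread group 104 of FIG. 1 includes K cores, or lanes, 106-1 through 106-K.
Thread group 104 is coupled to memory 208, which is divided into sections associated with each of cores 106-1 through 106-K. Thread launcher 214 creates processing threads in cores 106-1 through 106-K. One core, typically the first core 106-1, is designated to execute the master thread. The remaining threads are worker threads. Conventionally, the master thread executes sequential region 204 of program 202, and parallel region 206 is executed in the worker threads. When parallel region 206 is reached, thread launcher 214 creates the worker threads necessary to carry out the parallel processing.
In the embodiment of FIG. 2, sequential region 204 of program 202 is processed by predication module 210. The predication module designates certain operations to be carried out only on the master thread. Predication is implemented by thread identifier 212, which identifies the master thread as the thread that processes those operations. The remainder of sequential region 204 is executed in all threads of thread group 104. When the worker threads reach a predicated segment of sequential region 204, they skip it and continue until they reach a branch statement. At a branch statement, the worker threads wait for direction from the master thread, because only the master thread can reliably evaluate the branch condition. Once the master thread has processed the predicated segment, reached the branch statement and evaluated the branch condition, it broadcasts the branch condition to each of the worker threads. The worker threads can then resume their progress through sequential region 204 of program 202.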
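The FIG. 2 flow, in which workers skip the predicated segment and block at the branch until the master broadcasts its condition, can be sketched with ordinary Python threads standing in for lanes. `BranchBroadcast` is a hypothetical helper built on the standard library's `threading.Barrier`; it is not a GPU primitive. The master stores the condition before arriving at the barrier, so every worker reads it only after it has been written.

```python
import threading

NUM_LANES = 4
MASTER = 0  # assumption: the thread identifier selects lane 0 as the master


class BranchBroadcast:
    """Hypothetical one-shot channel carrying a single branch condition."""

    def __init__(self, parties):
        self._barrier = threading.Barrier(parties)
        self.condition = None

    def publish(self, cond):
        # The master writes the condition before arriving at the barrier...
        self.condition = cond
        self._barrier.wait()
        return cond

    def receive(self):
        # ...and workers read it only after the barrier releases everyone.
        self._barrier.wait()
        return self.condition


def lane_body(tid, shared, bcast, log, lock):
    if tid == MASTER:
        x = shared["in"]             # predicated segment: master only
        cond = bcast.publish(x > 0)  # evaluate and broadcast the branch condition
    else:
        cond = bcast.receive()       # workers skip the segment and wait here
    with lock:
        log.append((tid, "taken" if cond else "not_taken"))


def run_group(shared):
    bcast, log, lock = BranchBroadcast(NUM_LANES), [], threading.Lock()
    threads = [
        threading.Thread(target=lane_body, args=(tid, shared, bcast, log, lock))
        for tid in range(NUM_LANES)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(log)


print(run_group({"in": 5}))
# [(0, 'taken'), (1, 'taken'), (2, 'taken'), (3, 'taken')]
```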
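FIG. 3 is a flow diagram of one embodiment of a method of executing sequential code using a group of threads. The sequential code may be part of a vector operation, part of a program developed according to the OpenMP or OpenACC programming model, or associated with another application of any type whatsoever.
The method begins in a start step 310. In a step 320, a group of counterpart threads of the sequential code is created, one of the counterpart threads being a master thread and the remaining counterpart threads being slave threads. In a step 330, certain instructions of the sequential code are executed only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions. In various embodiments, the certain instructions may be load instructions, store instructions, divide instructions, or any other instruction that produces, or may be regarded as producing, a side effect. In one embodiment, the corresponding instructions are predicated using a condition based on a thread identifier.
In a step 340, a branch condition in the master thread is broadcast to the slave threads. In one embodiment, the branch condition is broadcast before the branch instruction executes in the master thread, and the corresponding branch instructions execute in the slave threads only after the broadcast. The method ends in an end step 350.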
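Steps 330 and 340 can also be viewed as a code transformation, roughly what the predication module 210 of FIG. 2 might apply. The sketch below is an assumption-laden illustration: the input is PTX-flavored pseudo-assembly, `@is_master` stands for a predicate derived from a thread-identifier comparison, and `bcast`/`wait_bcast` are hypothetical broadcast operations rather than real instructions. Loads, stores and divides are guarded so that only the master executes them, and each branch is preceded by a master-side broadcast so that the slaves branch only after receiving the condition.

```python
SIDE_EFFECT_OPS = {"ld", "st", "div"}  # step 330's examples: loads, stores, divides


def predicate_sequential_code(lines, guard="@is_master"):
    """Rewrite a sequential region so side effects and branch decisions stay in the master.

    `guard`, `bcast` and `wait_bcast` are hypothetical pseudo-assembly, not real PTX.
    """
    out = []
    for line in lines:
        op, _, operands = line.partition(" ")
        if op in SIDE_EFFECT_OPS:
            out.append(f"{guard} {line}")        # executes only where tid == master
        elif op == "bra":
            cond = operands.split(",")[0].strip()
            out.append(f"{guard} bcast {cond}")  # master publishes its condition first
            out.append(f"wait_bcast {cond}")     # every lane overwrites its copy of cond
            out.append(line)                     # ...and only then branches on it
        else:
            out.append(line)                     # side-effect-free: all lanes execute
    return out


region = ["ld r1, [a]", "add r2, r1, 1", "div r3, r2, r4",
          "setp p, r3, 0", "bra p, L1", "st [b], r3"]
for line in predicate_sequential_code(region):
    print(line)
```

The slaves still execute `setp` on stale registers; that is harmless here only because `wait_bcast` replaces their copy of `p` with the master's value before the branch, which mirrors the ordering in step 340.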
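Those skilled in the art to which this application relates will appreciate that other and further additions, deletions, substitutions and modifications may be made to the described embodiments.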
Claims (9)
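1. A system for executing sequential code, comprising:
a pipeline control unit configured to create, in a single-instruction, multiple-thread (SIMT) processor, a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread and remaining ones of the counterpart threads being slave threads; and
lanes configured to:
execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and
broadcast a branch condition in the master thread to the slave threads.
2. The system according to claim 1, wherein local memory associated with a lane executing the slave threads is further configured to store the branch condition.
3. The system according to claim 1, wherein the certain instructions are selected from the group consisting of:
load instructions,
store instructions, and
exception-raising instructions.
4. The system according to claim 1, wherein a lane executing the master thread is further configured to broadcast the branch condition before execution of a branch instruction in the master thread, and lanes executing the slave threads are further configured to execute corresponding branch instructions in the slave threads only after the lane executing the master thread broadcasts the branch condition.
5. The system according to claim 1, wherein the pipeline control unit is further configured to predicate the corresponding instructions using a condition based on a thread identifier.
6. The system according to claim 1, wherein the sequential code is part of a vector operation.
7. A single-instruction, multiple-thread (SIMT) processor, comprising:
lanes;
local memories associated with corresponding ones of the lanes;
a memory device shared by the lanes; and
a pipeline control unit configured to create a group of counterpart threads of sequential code and cause the group to execute in the lanes, one of the counterpart threads being a master thread and remaining ones of the counterpart threads being slave threads, the lanes being configured to:
execute certain instructions of the sequential code only in the master thread, corresponding instructions in the slave threads being predicated upon the certain instructions, and
broadcast a branch condition in the master thread to the slave threads.
8. The SIMT processor according to claim 7, wherein the local memory associated with a lane executing the slave threads is further configured to store the branch condition.
9. The SIMT processor according to claim 7, wherein the certain instructions are selected from the group consisting of:
load instructions,
store instructions, and
exception-raising instructions.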
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261722661P | 2012-11-05 | 2012-11-05 | |
US61/722,661 | 2012-11-05 | ||
US13/723,981 | 2012-12-21 | ||
US13/723,981 US9436475B2 (en) | 2012-11-05 | 2012-12-21 | System and method for executing sequential code using a group of threads and single-instruction, multiple-thread processor incorporating the same |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103809964A CN103809964A (zh) | 2014-05-21 |
CN103809964B true CN103809964B (zh) | 2017-06-16 |
Family
ID=50623483
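Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310538409.4A Pending CN103885751A (zh) | 2012-11-05 | 2013-11-04 | System and method for allocating memory of differing properties to shared data objects |
CN201310538631.4A Active CN103809964B (zh) | 2012-11-05 | 2013-11-04 | System and method for executing sequential code using a group of threads, and SIMT processor incorporating the same |
CN201310538507.8A Pending CN103809963A (zh) | 2012-11-05 | 2013-11-04 | System and method for translating program functions to correctly handle local-scope variables |
CN201310538671.9A Pending CN103809936A (zh) | 2012-11-05 | 2013-11-04 | System and method for compiling or runtime-executing a fork-join data-parallel program |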
Country Status (3)
Country | Link |
---|---|
US (4) | US9710275B2 (zh) |
CN (4) | CN103885751A (zh) |
TW (4) | TW201443783A (zh) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1725176A (zh) * | 1999-12-30 | 2006-01-25 | Intel Corporation | Method and apparatus for a multithreaded pipelined instruction decoder |
US8250555B1 (en) * | 2007-02-07 | 2012-08-21 | Tilera Corporation | Compiling code for parallel processing architectures based on control flow |
2012
- 2012-12-21 US US13/724,089 patent/US9710275B2/en active Active
- 2012-12-21 US US13/723,981 patent/US9436475B2/en active Active
- 2012-12-21 US US13/724,202 patent/US9727338B2/en active Active
- 2012-12-21 US US13/724,359 patent/US9747107B2/en active Active
2013
- 2013-11-04 CN CN201310538409.4A patent/CN103885751A/zh active Pending
- 2013-11-04 CN CN201310538631.4A patent/CN103809964B/zh active Active
- 2013-11-04 CN CN201310538507.8A patent/CN103809963A/zh active Pending
- 2013-11-04 CN CN201310538671.9A patent/CN103809936A/zh active Pending
- 2013-11-05 TW TW102140064A patent/TW201443783A/zh unknown
- 2013-11-05 TW TW102140061A patent/TWI510919B/zh active
- 2013-11-05 TW TW102140062A patent/TWI494853B/zh not_active IP Right Cessation
- 2013-11-05 TW TW102140063A patent/TWI488111B/zh not_active IP Right Cessation
Also Published As
Publication number | Publication date |
---|---|
CN103809936A (zh) | 2014-05-21 |
TW201443639A (zh) | 2014-11-16 |
CN103809963A (zh) | 2014-05-21 |
CN103885751A (zh) | 2014-06-25 |
TW201439905A (zh) | 2014-10-16 |
TWI488111B (zh) | 2015-06-11 |
US20140129812A1 (en) | 2014-05-08 |
US20140129783A1 (en) | 2014-05-08 |
CN103809964A (zh) | 2014-05-21 |
US20140130021A1 (en) | 2014-05-08 |
TW201443783A (zh) | 2014-11-16 |
TWI494853B (zh) | 2015-08-01 |
US20140130052A1 (en) | 2014-05-08 |
TWI510919B (zh) | 2015-12-01 |
TW201439907A (zh) | 2014-10-16 |
US9436475B2 (en) | 2016-09-06 |
US9727338B2 (en) | 2017-08-08 |
US9747107B2 (en) | 2017-08-29 |
US9710275B2 (en) | 2017-07-18 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |