CN104603749B - 用于在单指令多数据simd处理系统中控制发散分支指令的方法和设备 - Google Patents

用于在单指令多数据simd处理系统中控制发散分支指令的方法和设备 Download PDF

Info

Publication number
CN104603749B
CN104603749B CN201380046580.1A CN201380046580A CN104603749B CN 104603749 B CN104603749 B CN 104603749B CN 201380046580 A CN201380046580 A CN 201380046580A CN 104603749 B CN104603749 B CN 104603749B
Authority
CN
China
Prior art keywords
minrc
instruction
value
program
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201380046580.1A
Other languages
English (en)
Chinese (zh)
Other versions
CN104603749A (zh
Inventor
陈琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Publication of CN104603749A publication Critical patent/CN104603749A/zh
Application granted granted Critical
Publication of CN104603749B publication Critical patent/CN104603749B/zh
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • G06F9/323Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for indirect branch instructions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
    • G06F9/38885Divergence aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
CN201380046580.1A 2012-09-10 2013-08-08 用于在单指令多数据simd处理系统中控制发散分支指令的方法和设备 Active CN104603749B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/608,668 2012-09-10
US13/608,668 US9229721B2 (en) 2012-09-10 2012-09-10 Executing subroutines in a multi-threaded processing system
PCT/US2013/054148 WO2014039206A1 (en) 2012-09-10 2013-08-08 Executing subroutines in a multi-threaded processing system

Publications (2)

Publication Number Publication Date
CN104603749A CN104603749A (zh) 2015-05-06
CN104603749B true CN104603749B (zh) 2017-09-19

Family

ID=49004016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380046580.1A Active CN104603749B (zh) 2012-09-10 2013-08-08 用于在单指令多数据simd处理系统中控制发散分支指令的方法和设备

Country Status (6)

Country Link
US (1) US9229721B2 (enExample)
EP (1) EP2893434B1 (enExample)
JP (1) JP6073479B2 (enExample)
KR (1) KR101660659B1 (enExample)
CN (1) CN104603749B (enExample)
WO (1) WO2014039206A1 (enExample)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8832417B2 (en) 2011-09-07 2014-09-09 Qualcomm Incorporated Program flow control for multiple divergent SIMD threads using a minimum resume counter
US9256429B2 (en) 2012-08-08 2016-02-09 Qualcomm Incorporated Selectively activating a resume check operation in a multi-threaded processing system
US9619230B2 (en) * 2013-06-28 2017-04-11 International Business Machines Corporation Predictive fetching and decoding for selected instructions
US9582321B2 (en) * 2013-11-08 2017-02-28 Swarm64 As System and method of data processing
FR3013869B1 (fr) * 2013-11-22 2016-01-01 Thales Sa Procede de detection des debordements de pile et processeur pour la mise en oeuvre d'un tel procede
US10133572B2 (en) * 2014-05-02 2018-11-20 Qualcomm Incorporated Techniques for serialized execution in a SIMD processing system
GB2529899B (en) * 2014-09-08 2021-06-23 Advanced Risc Mach Ltd Shared Resources in a Data Processing Apparatus for Executing a Plurality of Threads
US9928076B2 (en) * 2014-09-26 2018-03-27 Intel Corporation Method and apparatus for unstructured control flow for SIMD execution engine
US9766892B2 (en) * 2014-12-23 2017-09-19 Intel Corporation Method and apparatus for efficient execution of nested branches on a graphics processor unit
US9921838B2 (en) * 2015-10-02 2018-03-20 Mediatek Inc. System and method for managing static divergence in a SIMD computing architecture
US9928117B2 (en) 2015-12-11 2018-03-27 Vivante Corporation Hardware access counters and event generation for coordinating multithreaded processing
FR3082331B1 (fr) * 2018-06-08 2020-09-18 Commissariat Energie Atomique Processeur mimd emule sur architecture simd
US12405790B2 (en) * 2019-06-28 2025-09-02 Advanced Micro Devices, Inc. Compute unit sorting for reduced divergence
US12314760B2 (en) * 2021-09-27 2025-05-27 Advanced Micro Devices, Inc. Garbage collecting wavefront
US20250306946A1 (en) * 2024-03-27 2025-10-02 Advanced Micro Devices, Inc. Independent progress of lanes in a vector processor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5938762A (en) * 1995-10-06 1999-08-17 Denso Corporation Method and apparatus for performing exception processing routine in pipeline processing
US7353369B1 (en) * 2005-07-13 2008-04-01 Nvidia Corporation System and method for managing divergent threads in a SIMD architecture
US7477255B1 (en) * 2004-04-12 2009-01-13 Nvidia Corporation System and method for synchronizing divergent samples in a programmable graphics processing unit
US7543136B1 (en) * 2005-07-13 2009-06-02 Nvidia Corporation System and method for managing divergent threads using synchronization tokens and program instructions that include set-synchronization bits
US7617384B1 (en) * 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
US20110078690A1 (en) * 2009-09-28 2011-03-31 Brian Fahs Opcode-Specified Predicatable Warp Post-Synchronization

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4435758A (en) 1980-03-10 1984-03-06 International Business Machines Corporation Method for conditional branch execution in SIMD vector processors
US5522083A (en) * 1989-11-17 1996-05-28 Texas Instruments Incorporated Reconfigurable multi-processor operating in SIMD mode with one processor fetching instructions for use by remaining processors
GB2273377A (en) 1992-12-11 1994-06-15 Hughes Aircraft Co Multiple masks for array processors
US5689677A (en) 1995-06-05 1997-11-18 Macmillan; David C. Circuit for enhancing performance of a computer for personal use
US6272616B1 (en) * 1998-06-17 2001-08-07 Agere Systems Guardian Corp. Method and apparatus for executing multiple instruction streams in a digital processor with multiple data paths
US6889319B1 (en) 1999-12-09 2005-05-03 Intel Corporation Method and apparatus for entering and exiting multiple threads within a multithreaded processor
US7454600B2 (en) * 2001-06-22 2008-11-18 Intel Corporation Method and apparatus for assigning thread priority in a processor or the like
US6947047B1 (en) 2001-09-20 2005-09-20 Nvidia Corporation Method and system for programmable pipelined graphics processing with branching instructions
JP2004206692A (ja) * 2002-12-20 2004-07-22 Internatl Business Mach Corp <Ibm> マルチスレッド化プロセッサ・システム上での実行のために、スレッドについての優先順位値を決定する方法および装置
US20050050305A1 (en) 2003-08-28 2005-03-03 Kissell Kevin D. Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
US7890734B2 (en) * 2004-06-30 2011-02-15 Open Computing Trust I & II Mechanism for selecting instructions for execution in a multithreaded processor
US7890735B2 (en) 2004-08-30 2011-02-15 Texas Instruments Incorporated Multi-threading processors, integrated circuit devices, systems, and processes of operation and manufacture
US7765547B2 (en) * 2004-11-24 2010-07-27 Maxim Integrated Products, Inc. Hardware multithreading systems with state registers having thread profiling data
US7761697B1 (en) 2005-07-13 2010-07-20 Nvidia Corporation Processing an indirect branch instruction in a SIMD architecture
US7788468B1 (en) 2005-12-15 2010-08-31 Nvidia Corporation Synchronization of threads in a cooperative thread array
JP2007272353A (ja) 2006-03-30 2007-10-18 Nec Electronics Corp プロセッサ装置及び複合条件処理方法
US8312254B2 (en) * 2008-03-24 2012-11-13 Nvidia Corporation Indirect function call instructions in a synchronous parallel thread processor
US8438554B1 (en) * 2008-12-11 2013-05-07 Nvidia Corporation System, method, and computer program product for removing a synchronization statement
US8615646B2 (en) 2009-09-24 2013-12-24 Nvidia Corporation Unanimous branch instructions in a parallel thread processor
US8539204B2 (en) 2009-09-25 2013-09-17 Nvidia Corporation Cooperative thread array reduction and scan operations
US8832417B2 (en) * 2011-09-07 2014-09-09 Qualcomm Incorporated Program flow control for multiple divergent SIMD threads using a minimum resume counter
US9256429B2 (en) * 2012-08-08 2016-02-09 Qualcomm Incorporated Selectively activating a resume check operation in a multi-threaded processing system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5938762A (en) * 1995-10-06 1999-08-17 Denso Corporation Method and apparatus for performing exception processing routine in pipeline processing
US7477255B1 (en) * 2004-04-12 2009-01-13 Nvidia Corporation System and method for synchronizing divergent samples in a programmable graphics processing unit
US7353369B1 (en) * 2005-07-13 2008-04-01 Nvidia Corporation System and method for managing divergent threads in a SIMD architecture
US7543136B1 (en) * 2005-07-13 2009-06-02 Nvidia Corporation System and method for managing divergent threads using synchronization tokens and program instructions that include set-synchronization bits
US7617384B1 (en) * 2006-11-06 2009-11-10 Nvidia Corporation Structured programming control flow using a disable mask in a SIMD architecture
US20110078690A1 (en) * 2009-09-28 2011-03-31 Brian Fahs Opcode-Specified Predicatable Warp Post-Synchronization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Stack-less SIMT reconvergence at low cost";Sylvain Collange;《http://hal.archives-ouvertes.fr/hal-00622654》;20110912;全文 *

Also Published As

Publication number Publication date
KR20150054858A (ko) 2015-05-20
EP2893434B1 (en) 2018-03-21
US9229721B2 (en) 2016-01-05
CN104603749A (zh) 2015-05-06
JP6073479B2 (ja) 2017-02-01
EP2893434A1 (en) 2015-07-15
JP2015531945A (ja) 2015-11-05
US20140075165A1 (en) 2014-03-13
WO2014039206A1 (en) 2014-03-13
KR101660659B1 (ko) 2016-09-27

Similar Documents

Publication Publication Date Title
CN104603749B (zh) 用于在单指令多数据simd处理系统中控制发散分支指令的方法和设备
US8832417B2 (en) Program flow control for multiple divergent SIMD threads using a minimum resume counter
US9256429B2 (en) Selectively activating a resume check operation in a multi-threaded processing system
CN106233248B (zh) 用于在多线程处理器上执行发散操作的方法和设备
CN109144686B (zh) 对任务进行调度
KR20040016829A (ko) 파이프라인식 프로세서에서의 예외 취급 방법, 장치 및시스템
US10664280B2 (en) Fetch ahead branch target buffer
KR20200138439A (ko) 인터럽트들의 세트들을 구성하는 장치 및 방법
CN109144687B (zh) 对任务进行调度
CN109144684B (zh) 对任务进行调度
KR102152735B1 (ko) 그래픽 처리 장치 및 이의 동작 방법
US20150370568A1 (en) Integrated circuit processor and method of operating a integrated circuit processor
US10963260B2 (en) Branch predictor
TWI836108B (zh) 資料結構放棄
US20220035635A1 (en) Processor with multiple execution pipelines
CN108834427B (zh) 处理向量指令

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant