CN105579967B - GPU divergence barrier - Google Patents

GPU divergence barrier

Info

Publication number
CN105579967B
Authority
CN
China
Prior art keywords
thread
thread beam
threads
fence
expression formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480052983.1A
Other languages
English (en)
Chinese (zh)
Other versions
CN105579967A (zh)
Inventor
梅春惠
阿列克谢·弗拉狄米罗维奇·布尔德
陈林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of CN105579967A
Application granted
Publication of CN105579967B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3888 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple threads [SIMT] in parallel
    • G06F9/38885 Divergence aspects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/522 Barrier synchronisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)
  • Image Processing (AREA)
CN201480052983.1A 2013-10-01 2014-09-10 GPU divergence barrier Active CN105579967B (zh)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/043,562 2013-10-01
US14/043,562 US9652284B2 (en) 2013-10-01 2013-10-01 GPU divergence barrier
PCT/US2014/054966 WO2015050681A1 (en) 2013-10-01 2014-09-10 GPU divergence barrier

Publications (2)

Publication Number Publication Date
CN105579967A (zh) 2016-05-11
CN105579967B (zh) 2019-09-03

Family

ID=51619301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480052983.1A Active CN105579967B (zh) 2013-10-01 2014-09-10 GPU divergence barrier

Country Status (6)

Country Link
US (1) US9652284B2
EP (1) EP3053038B1
JP (1) JP6411477B2
KR (1) KR102253426B1
CN (1) CN105579967B
WO (1) WO2015050681A1

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150019349A (ko) * 2013-08-13 2015-02-25 삼성전자주식회사 Multi-threaded execution processor and operating method thereof
US10133572B2 (en) * 2014-05-02 2018-11-20 Qualcomm Incorporated Techniques for serialized execution in a SIMD processing system
GB2540937B (en) * 2015-07-30 2019-04-03 Advanced Risc Mach Ltd Graphics processing systems
US9921838B2 (en) * 2015-10-02 2018-03-20 Mediatek Inc. System and method for managing static divergence in a SIMD computing architecture
US10409610B2 (en) * 2016-01-29 2019-09-10 Advanced Micro Devices, Inc. Method and apparatus for inter-lane thread migration
US10115175B2 (en) * 2016-02-19 2018-10-30 Qualcomm Incorporated Uniform predicates in shaders for graphics processing units
KR20180038793A (ko) * 2016-10-07 2018-04-17 삼성전자주식회사 Method and apparatus for processing image data
US10649770B2 (en) 2017-01-31 2020-05-12 Facebook, Inc. κ-selection using parallel processing
US10474468B2 (en) * 2017-02-22 2019-11-12 Advanced Micro Devices, Inc. Indicating instruction scheduling mode for processing wavefront portions
US10310861B2 (en) 2017-04-01 2019-06-04 Intel Corporation Mechanism for scheduling threads on a multiprocessor
US10496448B2 (en) * 2017-04-01 2019-12-03 Intel Corporation De-centralized load-balancing at processors
US10325341B2 (en) 2017-04-21 2019-06-18 Intel Corporation Handling pipeline submissions across many compute units
US10620994B2 (en) 2017-05-30 2020-04-14 Advanced Micro Devices, Inc. Continuation analysis tasks for GPU task scheduling
US10866806B2 (en) * 2017-11-14 2020-12-15 Nvidia Corporation Uniform register file for improved resource utilization
CN111712793B (zh) * 2018-02-14 2023-10-20 华为技术有限公司 Thread processing method and graphics processor
US12099867B2 (en) 2018-05-30 2024-09-24 Advanced Micro Devices, Inc. Multi-kernel wavefront scheduler
WO2020186631A1 (en) * 2019-03-21 2020-09-24 Huawei Technologies Co., Ltd. Compute shader warps without ramp up
US12405790B2 (en) 2019-06-28 2025-09-02 Advanced Micro Devices, Inc. Compute unit sorting for reduced divergence
US20210132985A1 (en) * 2019-10-30 2021-05-06 Advanced Micro Devices, Inc. Shadow latches in a shadow-latch configured register file for thread storage
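KR102346601B1 (ko) 2020-02-26 2022-01-03 성균관대학교산학협력단 Method for improving the processing speed of multiple write I/Os that require guaranteed write ordering within an operating system
CN113535251A (zh) 2020-04-13 2021-10-22 华为技术有限公司 Thread management method and apparatus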
US11288765B2 (en) * 2020-04-28 2022-03-29 Sony Interactive Entertainment LLC System and method for efficient multi-GPU execution of kernels by region based dependencies
US11204774B1 (en) * 2020-08-31 2021-12-21 Apple Inc. Thread-group-scoped gate instruction
US11640647B2 (en) * 2021-03-03 2023-05-02 Qualcomm Incorporated Methods and apparatus for intra-wave texture looping
US20230153176A1 (en) * 2021-11-17 2023-05-18 Intel Corporation Forward progress guarantee using single-level synchronization at individual thread granularity
US20250068429A1 (en) * 2023-08-22 2025-02-27 Advanced Micro Devices, Inc. Streaming wave coalescer circuit

Citations (3)

Publication number Priority date Publication date Assignee Title
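CN1276890A (zh) * 1997-10-23 2000-12-13 国际商业机器公司 Method and apparatus for altering thread priorities in a multithreaded processor
CN102640131A (zh) * 2009-09-24 2012-08-15 辉达公司 Unanimous branch instructions in a parallel thread processor
CN103207774A (zh) * 2012-01-11 2013-07-17 辉达公司 Method and system for resolving thread divergence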

Family Cites Families (15)

Publication number Priority date Publication date Assignee Title
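JPS59180668A (ja) * 1983-03-31 1984-10-13 Fujitsu Ltd Run-time instruction selection method for conditional instructions
JP2004220070A (ja) 2003-01-09 2004-08-05 Japan Science & Technology Agency Context switching method and apparatus, central processing unit, context switching program, and computer-readable storage medium storing the program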
US7746347B1 (en) 2004-07-02 2010-06-29 Nvidia Corporation Methods and systems for processing a geometry shader program developed in a high-level shading language
US7639252B2 (en) 2004-08-11 2009-12-29 Ati Technologies Ulc Unified tessellation circuit and method therefor
US7480840B2 (en) 2004-10-12 2009-01-20 International Business Machines Corporation Apparatus, system, and method for facilitating port testing of a multi-port host adapter
US7522167B1 (en) 2004-12-16 2009-04-21 Nvidia Corporation Coherence of displayed images for split-frame rendering in multi-processor graphics system
US8171461B1 (en) 2006-02-24 2012-05-01 Nvidia Corporation Primitive program compilation for flat attributes with provoking vertex independence
US7696993B2 (en) 2007-02-08 2010-04-13 Via Technologies, Inc. Geometry primitive type conversion in a GPU pipeline
US8072460B2 (en) 2007-10-17 2011-12-06 Nvidia Corporation System, method, and computer program product for generating a ray tracing data structure utilizing a parallel processor architecture
US8661226B2 (en) * 2007-11-15 2014-02-25 Nvidia Corporation System, method, and computer program product for performing a scan operation on a sequence of single-bit values using a parallel processor architecture
US20100064291A1 (en) 2008-09-05 2010-03-11 Nvidia Corporation System and Method for Reducing Execution Divergence in Parallel Processing Architectures
US20110219221A1 (en) 2010-03-03 2011-09-08 Kevin Skadron Dynamic warp subdivision for integrated branch and memory latency divergence tolerance
US8732711B2 (en) 2010-09-24 2014-05-20 Nvidia Corporation Two-level scheduler for multi-threaded processing
US8595701B2 (en) 2011-02-04 2013-11-26 Fujitsu Limited Symbolic execution and test generation for GPU programs
US9153193B2 (en) 2011-09-09 2015-10-06 Microsoft Technology Licensing, Llc Primitive rendering using a single primitive type

Non-Patent Citations (3)

Title
CAPRI: Prediction of Compaction-Adequacy for Handling Control-Divergence in GPGPU Architectures; Minsoo Rhu et al.; International Symposium on Computer Architecture; 2012-06-09; 61-71 *
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow; Wilson W.L. Fung et al.; 40th IEEE/ACM International Symposium on Microarchitecture; 2007-12-01; 407-416 *
Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism; Xingxing Jin; http://ecommons.usask.ca/bitstream/handle/10388/ETD-2012-06-527/JIN-THESIS.pdf; 2012-07-31; 1-82 *

Also Published As

Publication number Publication date
US9652284B2 (en) 2017-05-16
WO2015050681A1 (en) 2015-04-09
JP6411477B2 (ja) 2018-10-24
CN105579967A (zh) 2016-05-11
KR20160065121A (ko) 2016-06-08
KR102253426B1 (ko) 2021-05-17
US20150095914A1 (en) 2015-04-02
JP2016532180A (ja) 2016-10-13
EP3053038A1 (en) 2016-08-10
EP3053038B1 (en) 2020-10-21

Similar Documents

Publication Publication Date Title
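CN105579967B (zh) GPU divergence barrier
JP6329274B2 (ja) Memory reference metadata for compiler optimization
CN102099789B (zh) Multi-dimensional thread grouping for multiple processors
JP5701487B2 (ja) Indirect function call instructions in a synchronous parallel thread processor
TWI493451B (zh) Method and apparatus for scheduling instructions using pre-decode data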
US9354892B2 (en) Creating SIMD efficient code by transferring register state through common memory
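TWI501150B (zh) Method and apparatus for scheduling instructions without instruction decode
CN104040500B (zh) Scheduling thread execution based on thread similarity
CN108369552A (zh) Backward compatibility testing of software in a mode that disrupts timing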
WO2017019287A1 (en) Backward compatibility by algorithm matching, disabling features, or throttling performance
TW201717022A (zh) Backward compatibility by restriction of hardware resources
JP4292198B2 (ja) Method for grouping execution threads
US10289418B2 (en) Cooperative thread array granularity context switch during trap handling
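CN106407063B (zh) Method for simulated generation and sorting of memory access sequences at a GPU L1 cache
KR101420592B1 (ko) Computer system
KR102210765B1 (ko) Method and apparatus for warp scheduling based on long-latency hiding
TWI428833B (zh) Multi-threaded processor, instruction execution and synchronization method thereof, and computer program product thereof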
US20250138827A1 (en) Methods and apparatus for processing instructions
US12499504B2 (en) System and method for adaptive graph-to-stream scheduling
US20240193721A1 (en) System and method for adaptive graph-to-stream scheduling
Lewis Performance and Programmability Trade-offs in the OpenCL 2.0 SVM and Memory Model
Garland NVIDIA GPU
Ng et al. Speeding up the 3D Model Rendering on Android Device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant