KR102548402B1 - System and method for determining concurrency factors for dispatch size of parallel processor kernels

System and method for determining concurrency factors for dispatch size of parallel processor kernels

Info

Publication number
KR102548402B1
KR102548402B1 KR1020177033219A KR20177033219A KR102548402B1 KR 102548402 B1 KR102548402 B1 KR 102548402B1 KR 1020177033219 A KR1020177033219 A KR 1020177033219A KR 20177033219 A KR20177033219 A KR 20177033219A KR 102548402 B1 KR102548402 B1 KR 102548402B1
Authority
KR
South Korea
Prior art keywords
kernel
mini
kernels
workgroups
parallel processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
KR1020177033219A
Other languages
English (en)
Korean (ko)
Other versions
KR20180011096A (ko)
Inventor
Rathijit Sen
Indrani Paul
Wei Huang
Original Assignee
Advanced Micro Devices, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced Micro Devices, Inc.
Publication of KR20180011096A
Application granted
Publication of KR102548402B1
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/545 Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3404 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4893 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Multi Processors (AREA)
KR1020177033219A 2015-05-13 2016-03-22 System and method for determining concurrency factors for dispatch size of parallel processor kernels Active KR102548402B1 (ko)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14/710,879 2015-05-13
US14/710,879 US9965343B2 (en) 2015-05-13 2015-05-13 System and method for determining concurrency factors for dispatch size of parallel processor kernels
PCT/US2016/023560 WO2016182636A1 (en) 2015-05-13 2016-03-22 System and method for determining concurrency factors for dispatch size of parallel processor kernels

Publications (2)

Publication Number Publication Date
KR20180011096A (ko) 2018-01-31
KR102548402B1 (ko) 2023-06-27

Family

ID=57248294

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020177033219A Active KR102548402B1 (ko) 2015-05-13 2016-03-22 System and method for determining concurrency factors for dispatch size of parallel processor kernels

Country Status (6)

Country Link
US (1) US9965343B2 (en)
EP (1) EP3295300B1 (en)
JP (1) JP6659724B2 (en)
KR (1) KR102548402B1 (en)
CN (1) CN107580698B (en)
WO (1) WO2016182636A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10324730B2 (en) * 2016-03-24 2019-06-18 Mediatek, Inc. Memory shuffle engine for efficient work execution in a parallel computing system
US20180115496A1 (en) * 2016-10-21 2018-04-26 Advanced Micro Devices, Inc. Mechanisms to improve data locality for distributed gpus
US10558499B2 (en) * 2017-10-26 2020-02-11 Advanced Micro Devices, Inc. Wave creation control with dynamic resource allocation
US12405790B2 (en) * 2019-06-28 2025-09-02 Advanced Micro Devices, Inc. Compute unit sorting for reduced divergence
US11900123B2 (en) * 2019-12-13 2024-02-13 Advanced Micro Devices, Inc. Marker-based processor instruction grouping
US11809902B2 (en) * 2020-09-24 2023-11-07 Advanced Micro Devices, Inc. Fine-grained conditional dispatching
US20220334845A1 (en) * 2021-04-15 2022-10-20 Nvidia Corporation Launching code concurrently
US20240220315A1 (en) * 2022-12-30 2024-07-04 Advanced Micro Devices, Inc. Dynamic control of work scheduling

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020116357A1 (en) 1999-12-07 2002-08-22 Paulley Glenn Norman System and methodology for join enumeration in a memory-constrained environment
US20110022817A1 (en) * 2009-07-27 2011-01-27 Advanced Micro Devices, Inc. Mapping Processing Logic Having Data-Parallel Threads Across Processors
US20120291040A1 (en) * 2011-05-11 2012-11-15 Mauricio Breternitz Automatic load balancing for heterogeneous cores
US20130160016A1 (en) * 2011-12-16 2013-06-20 Advanced Micro Devices, Inc. Allocating Compute Kernels to Processors in a Heterogeneous System

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8136104B2 (en) 2006-06-20 2012-03-13 Google Inc. Systems and methods for determining compute kernels for an application in a parallel-processing computer system
US8375368B2 (en) * 2006-06-20 2013-02-12 Google Inc. Systems and methods for profiling an application running on a parallel-processing computer system
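CN102023844B (zh) * 2009-09-18 2014-04-09 深圳中微电科技有限公司 Parallel processor and thread processing method thereof
KR101079697B1 (ko) * 2009-10-05 2011-11-03 주식회사 글로벌미디어테크 High-speed image processing method using a parallel processor of a general-purpose graphics processing unit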
US8250404B2 (en) * 2009-12-31 2012-08-21 International Business Machines Corporation Process integrity of work items in a multiple processor system
US9092267B2 (en) * 2011-06-20 2015-07-28 Qualcomm Incorporated Memory sharing in graphics processing unit
US20120331278A1 (en) 2011-06-23 2012-12-27 Mauricio Breternitz Branch removal by data shuffling
US9823927B2 (en) * 2012-11-30 2017-11-21 Intel Corporation Range selection for data parallel programming environments
CN105027089B (zh) * 2013-03-14 2018-05-22 Intel Corporation Kernel functionality checker
US9772864B2 (en) * 2013-04-16 2017-09-26 Arm Limited Methods of and apparatus for multidimensional indexing in microprocessor systems
US10297073B2 (en) * 2016-02-25 2019-05-21 Intel Corporation Method and apparatus for in-place construction of left-balanced and complete point K-D trees

Also Published As

Publication number Publication date
US9965343B2 (en) 2018-05-08
EP3295300A1 (en) 2018-03-21
EP3295300B1 (en) 2022-03-23
EP3295300A4 (en) 2019-01-09
WO2016182636A1 (en) 2016-11-17
CN107580698A (zh) 2018-01-12
KR20180011096A (ko) 2018-01-31
CN107580698B (zh) 2019-08-23
US20160335143A1 (en) 2016-11-17
JP6659724B2 (ja) 2020-03-04
JP2018514869A (ja) 2018-06-07

Similar Documents

Publication Publication Date Title
KR102548402B1 (ko) System and method for determining concurrency factors for dispatch size of parallel processor kernels
US9697176B2 (en) Efficient sparse matrix-vector multiplication on parallel processors
US8782645B2 (en) Automatic load balancing for heterogeneous cores
KR102011671B1 (ko) Method and apparatus for query processing based on heterogeneous computing devices
US9535833B2 (en) Reconfigurable processor and method for optimizing configuration memory
US20120331278A1 (en) Branch removal by data shuffling
US10073783B2 (en) Dual mode local data store
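JP2014199545A (ja) Program, parallel operation method, and information processing apparatus
CN109308191A (zh) Branch prediction method and device
KR20150101870A (ko) Method and apparatus for avoiding bank conflict in memory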
US9383981B2 (en) Method and apparatus of instruction scheduling using software pipelining
KR20240063137A (ko) Hardware accelerator-optimized group convolution-based neural network model
US9898333B2 (en) Method and apparatus for selecting preemption technique
US11893502B2 (en) Dynamic hardware selection for experts in mixture-of-experts model
EP4383084A1 (en) Memory device and operating method thereof
US12321733B2 (en) Apparatus and method with neural network computation scheduling
US20160117206A1 (en) Method and system for block scheduling control in a processor by remapping
KR101473955B1 (ko) QR decomposition operation method and recording medium
US20250208924A1 (en) Systems and Methods for Heterogeneous Model Parallelism and Adaptive Graph Partitioning
US9229875B2 (en) Method and system for extending virtual address space of process performed in operating system
Guo et al. DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies
KR102055617B1 (ko) Method and system for extending virtual address space of process performed in operating system
Srivastava Modeling Performance of Tensor Transpose using Regression Techniques

Legal Events

Date Code Title Description
PA0105 International application

St.27 status event code: A-0-1-A10-A15-nap-PA0105

E13-X000 Pre-grant limitation requested

St.27 status event code: A-2-3-E10-E13-lim-X000

P11-X000 Amendment of application requested

St.27 status event code: A-2-2-P10-P11-nap-X000

P13-X000 Application amended

St.27 status event code: A-2-2-P10-P13-nap-X000

P11-X000 Amendment of application requested

St.27 status event code: A-2-2-P10-P11-nap-X000

P13-X000 Application amended

St.27 status event code: A-2-2-P10-P13-nap-X000

PG1501 Laying open of application

St.27 status event code: A-1-1-Q10-Q12-nap-PG1501

P22-X000 Classification modified

St.27 status event code: A-2-2-P10-P22-nap-X000

PA0201 Request for examination

St.27 status event code: A-1-2-D10-D11-exm-PA0201

D13-X000 Search requested

St.27 status event code: A-1-2-D10-D13-srh-X000

D14-X000 Search report completed

St.27 status event code: A-1-2-D10-D14-srh-X000

E902 Notification of reason for refusal
PE0902 Notice of grounds for rejection

St.27 status event code: A-1-2-D10-D21-exm-PE0902

P11-X000 Amendment of application requested

St.27 status event code: A-2-2-P10-P11-nap-X000

P13-X000 Application amended

St.27 status event code: A-2-2-P10-P13-nap-X000

E701 Decision to grant or registration of patent right
PE0701 Decision of registration

St.27 status event code: A-1-2-D10-D22-exm-PE0701

PR0701 Registration of establishment

St.27 status event code: A-2-4-F10-F11-exm-PR0701

PR1002 Payment of registration fee

St.27 status event code: A-2-2-U10-U12-oth-PR1002

Fee payment year number: 1

PG1601 Publication of registration

St.27 status event code: A-4-4-Q10-Q13-nap-PG1601