KR102548402B1 - System and method for determining concurrency factors for dispatch size of parallel processor kernels - Google Patents
System and method for determining concurrency factors for dispatch size of parallel processor kernels
- Publication number
- KR102548402B1
- Authority
- KR
- South Korea
- Prior art keywords
- kernel
- mini
- kernels
- workgroups
- parallel processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/545—Interprogram communication where tasks reside in different layers, e.g. user- and kernel-space
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3404—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for parallel or distributed programming
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3433—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4893—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Debugging And Monitoring (AREA)
- Multi Processors (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US14/710,879 US9965343B2 (en) | 2015-05-13 | 2015-05-13 | System and method for determining concurrency factors for dispatch size of parallel processor kernels |
| PCT/US2016/023560 WO2016182636A1 (en) | 2015-05-13 | 2016-03-22 | System and method for determining concurrency factors for dispatch size of parallel processor kernels |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| KR20180011096A (ko) | 2018-01-31 |
| KR102548402B1 (ko) | 2023-06-27 |
Family
ID=57248294
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020177033219A (Active) KR102548402B1 (ko) | 2015-05-13 | 2016-03-22 | System and method for determining concurrency factors for dispatch size of parallel processor kernels |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US9965343B2 (en) |
| EP (1) | EP3295300B1 (en) |
| JP (1) | JP6659724B2 (en) |
| KR (1) | KR102548402B1 (en) |
| CN (1) | CN107580698B (en) |
| WO (1) | WO2016182636A1 (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10324730B2 (en) * | 2016-03-24 | 2019-06-18 | Mediatek, Inc. | Memory shuffle engine for efficient work execution in a parallel computing system |
| US20180115496A1 (en) * | 2016-10-21 | 2018-04-26 | Advanced Micro Devices, Inc. | Mechanisms to improve data locality for distributed gpus |
| US10558499B2 (en) * | 2017-10-26 | 2020-02-11 | Advanced Micro Devices, Inc. | Wave creation control with dynamic resource allocation |
| US12405790B2 (en) * | 2019-06-28 | 2025-09-02 | Advanced Micro Devices, Inc. | Compute unit sorting for reduced divergence |
| US11900123B2 (en) * | 2019-12-13 | 2024-02-13 | Advanced Micro Devices, Inc. | Marker-based processor instruction grouping |
| US11809902B2 (en) * | 2020-09-24 | 2023-11-07 | Advanced Micro Devices, Inc. | Fine-grained conditional dispatching |
| US20220334845A1 (en) * | 2021-04-15 | 2022-10-20 | Nvidia Corporation | Launching code concurrently |
| US20240220315A1 (en) * | 2022-12-30 | 2024-07-04 | Advanced Micro Devices, Inc. | Dynamic control of work scheduling |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020116357A1 (en) | 1999-12-07 | 2002-08-22 | Paulley Glenn Norman | System and methodology for join enumeration in a memory-constrained environment |
| US20110022817A1 (en) * | 2009-07-27 | 2011-01-27 | Advanced Micro Devices, Inc. | Mapping Processing Logic Having Data-Parallel Threads Across Processors |
| US20120291040A1 (en) * | 2011-05-11 | 2012-11-15 | Mauricio Breternitz | Automatic load balancing for heterogeneous cores |
| US20130160016A1 (en) * | 2011-12-16 | 2013-06-20 | Advanced Micro Devices, Inc. | Allocating Compute Kernels to Processors in a Heterogeneous System |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8136104B2 (en) | 2006-06-20 | 2012-03-13 | Google Inc. | Systems and methods for determining compute kernels for an application in a parallel-processing computer system |
| US8375368B2 (en) * | 2006-06-20 | 2013-02-12 | Google Inc. | Systems and methods for profiling an application running on a parallel-processing computer system |
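| CN102023844B (zh) * | 2009-09-18 | 2014-04-09 | 深圳中微电科技有限公司 | Parallel processor and thread processing method thereof |
| KR101079697B1 (ko) * | 2009-10-05 | 2011-11-03 | 주식회사 글로벌미디어테크 | High-speed image processing method using the parallel processors of a general-purpose graphics processing unit |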
| US8250404B2 (en) * | 2009-12-31 | 2012-08-21 | International Business Machines Corporation | Process integrity of work items in a multiple processor system |
| US9092267B2 (en) * | 2011-06-20 | 2015-07-28 | Qualcomm Incorporated | Memory sharing in graphics processing unit |
| US20120331278A1 (en) | 2011-06-23 | 2012-12-27 | Mauricio Breternitz | Branch removal by data shuffling |
| US9823927B2 (en) * | 2012-11-30 | 2017-11-21 | Intel Corporation | Range selection for data parallel programming environments |
| CN105027089B (zh) * | 2013-03-14 | 2018-05-22 | Intel Corporation | Kernel functionality checker |
| US9772864B2 (en) * | 2013-04-16 | 2017-09-26 | Arm Limited | Methods of and apparatus for multidimensional indexing in microprocessor systems |
| US10297073B2 (en) * | 2016-02-25 | 2019-05-21 | Intel Corporation | Method and apparatus for in-place construction of left-balanced and complete point K-D trees |
2015
- 2015-05-13 US US14/710,879 (US9965343B2): Active

2016
- 2016-03-22 KR KR1020177033219A (KR102548402B1): Active
- 2016-03-22 JP JP2017554900A (JP6659724B2): Active
- 2016-03-22 CN CN201680026495.2A (CN107580698B): Active
- 2016-03-22 WO PCT/US2016/023560 (WO2016182636A1): Ceased
- 2016-03-22 EP EP16793114.6A (EP3295300B1): Active
Also Published As
| Publication number | Publication date |
|---|---|
| US9965343B2 (en) | 2018-05-08 |
| EP3295300A1 (en) | 2018-03-21 |
| EP3295300B1 (en) | 2022-03-23 |
| EP3295300A4 (en) | 2019-01-09 |
| WO2016182636A1 (en) | 2016-11-17 |
| CN107580698A (zh) | 2018-01-12 |
| KR20180011096A (ko) | 2018-01-31 |
| CN107580698B (zh) | 2019-08-23 |
| US20160335143A1 (en) | 2016-11-17 |
| JP6659724B2 (ja) | 2020-03-04 |
| JP2018514869A (ja) | 2018-06-07 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| KR102548402B1 (ko) | System and method for determining concurrency factors for dispatch size of parallel processor kernels | |
| US9697176B2 (en) | Efficient sparse matrix-vector multiplication on parallel processors | |
| US8782645B2 (en) | Automatic load balancing for heterogeneous cores | |
| KR102011671B1 (ko) | Query processing method and apparatus based on heterogeneous computing devices | |
| US9535833B2 (en) | Reconfigurable processor and method for optimizing configuration memory | |
| US20120331278A1 (en) | Branch removal by data shuffling | |
| US10073783B2 (en) | Dual mode local data store | |
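| JP2014199545A (ja) | Program, parallel computing method, and information processing apparatus | |
| CN109308191A (zh) | Branch prediction method and device | |
| KR20150101870A (ko) | Method and apparatus for preventing bank conflicts in memory | |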
| US9383981B2 (en) | Method and apparatus of instruction scheduling using software pipelining | |
| KR20240063137A (ko) | Hardware accelerator-optimized group convolution-based neural network model | |
| US9898333B2 (en) | Method and apparatus for selecting preemption technique | |
| US11893502B2 (en) | Dynamic hardware selection for experts in mixture-of-experts model | |
| EP4383084A1 (en) | Memory device and operating method thereof | |
| US12321733B2 (en) | Apparatus and method with neural network computation scheduling | |
| US20160117206A1 (en) | Method and system for block scheduling control in a processor by remapping | |
| KR101473955B1 (ko) | QR decomposition computation method and recording medium | |
| US20250208924A1 (en) | Systems and Methods for Heterogeneous Model Parallelism and Adaptive Graph Partitioning | |
| US9229875B2 (en) | Method and system for extending virtual address space of process performed in operating system | |
| Guo et al. | DaCP: Accelerating Synchronization-Free SpTRSV via GPU-Friendly Data Communication and Parallelism Strategies | |
| KR102055617B1 (ko) | Method and system for extending virtual address space of process performed in operating system | |
| Srivastava | Modeling Performance of Tensor Transpose using Regression Techniques |
Legal Events
| Code | Title | Description |
|---|---|---|
| PA0105 | International application | St.27 status event code: A-0-1-A10-A15-nap-PA0105 |
| E13-X000 | Pre-grant limitation requested | St.27 status event code: A-2-3-E10-E13-lim-X000 |
| P11-X000 | Amendment of application requested | St.27 status event code: A-2-2-P10-P11-nap-X000 |
| P13-X000 | Application amended | St.27 status event code: A-2-2-P10-P13-nap-X000 |
| P11-X000 | Amendment of application requested | St.27 status event code: A-2-2-P10-P11-nap-X000 |
| P13-X000 | Application amended | St.27 status event code: A-2-2-P10-P13-nap-X000 |
| PG1501 | Laying open of application | St.27 status event code: A-1-1-Q10-Q12-nap-PG1501 |
| P22-X000 | Classification modified | St.27 status event code: A-2-2-P10-P22-nap-X000 |
| PA0201 | Request for examination | St.27 status event code: A-1-2-D10-D11-exm-PA0201 |
| D13-X000 | Search requested | St.27 status event code: A-1-2-D10-D13-srh-X000 |
| D14-X000 | Search report completed | St.27 status event code: A-1-2-D10-D14-srh-X000 |
| E902 | Notification of reason for refusal | |
| PE0902 | Notice of grounds for rejection | St.27 status event code: A-1-2-D10-D21-exm-PE0902 |
| P11-X000 | Amendment of application requested | St.27 status event code: A-2-2-P10-P11-nap-X000 |
| P13-X000 | Application amended | St.27 status event code: A-2-2-P10-P13-nap-X000 |
| E701 | Decision to grant or registration of patent right | |
| PE0701 | Decision of registration | St.27 status event code: A-1-2-D10-D22-exm-PE0701 |
| PR0701 | Registration of establishment | St.27 status event code: A-2-4-F10-F11-exm-PR0701 |
| PR1002 | Payment of registration fee | St.27 status event code: A-2-2-U10-U12-oth-PR1002; Fee payment year number: 1 |
| PG1601 | Publication of registration | St.27 status event code: A-4-4-Q10-Q13-nap-PG1601 |