KR20220104722A

KR20220104722A - Workload-Based Clock Adjustment in Processing Units

Info

Publication number: KR20220104722A
Application number: KR1020227018163A
Authority: KR
Inventors: Mangesh P. Nijasure; Michael Mantor; Ashkan Hosseinzadeh Namin; Louis Regniere
Original assignee: Advanced Micro Devices, Inc.; ATI Technologies ULC
Priority date: 2019-11-22
Filing date: 2020-11-20
Publication date: 2022-07-26
Also published as: WO2021102252A1; US11263044B2; US20210157639A1; EP4062257A1; JP2023503412A; EP4062257A4; CN114730200A; JP7418571B2

Abstract

A graphics processing unit (GPU) (102) adjusts the frequency of a clock based on identifying a program thread (104, 106) executing at the processing unit, where the program thread is detected based on a workload (116, 117) to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to the different processing demands of different program threads. Further, by identifying the program thread based on the workload, the processing unit adapts the clock frequency to the processing demands, thereby conserving processing resources.

Description

Workload-Based Clock Adjustment in Processing Units

Processing systems often employ specialized processing units to execute the specific operations for which those units are designed. For example, a processing system can employ a graphics processing unit (GPU) to execute graphics and vector processing operations for the processing system. In some cases, a processing unit concurrently executes operations on behalf of different programs executing at the processing system. For example, the processing system can implement a virtualized computing environment in which the processing system concurrently executes multiple virtual machines (VMs) at one or more central processing units (CPUs). Each of the VMs requests the GPU of the processing system to execute graphics or vector processing operations, so the GPU is tasked with concurrently executing operations on behalf of the different VMs. However, different programs can have different requirements, such as different power requirements, different maximum clock frequency requirements, and the like. The different requirements place different processing demands on the GPU (or other processing unit), thereby negatively impacting the processing system as a whole.

BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.
FIG. 1 is a block diagram of a processing system including a graphics processing unit (GPU) that adjusts a clock frequency based on detection of an executing program thread in accordance with some embodiments.
FIG. 2 is a diagram illustrating an example of the GPU of FIG. 1 adjusting a clock frequency based on changes in the workload of the GPU in accordance with some embodiments.
FIG. 3 is a block diagram of a clock frequency adjustment module of the GPU of FIG. 1 in accordance with some embodiments.
FIG. 4 is a flow diagram of a method of adjusting a clock frequency of a processing unit based on detecting an executing program thread in accordance with some embodiments.

FIGS. 1-4 illustrate techniques for adjusting the frequency of a clock at a graphics processing unit (GPU) based on identifying a program thread executing at a central processing unit (CPU), where the program thread is detected based on the workload to be executed. By adjusting the clock frequency based on the identified program thread, the processing unit adapts to the different processing demands of concurrently executing programs. Further, by identifying the program thread based on the workload, the processing unit adapts the clock frequency to the processing demands, thereby conserving processing resources.

To illustrate by way of example, in some embodiments the CPU concurrently executes two different programs (e.g., two different virtual machines). One of the programs, designated Program1, is required to execute operations at a relatively low frequency, for example to maintain compatibility with other programs or systems. The other program, designated Program2, is required to execute operations at a relatively high frequency, at least in some circumstances, in order to meet performance targets. Conventionally, the clock of a GPU that concurrently executes operations for Program1 and Program2 is set to the higher of the two clock frequencies regardless of which program's operations are executing at the GPU, thereby requiring additional software or hardware for Program1 to comply with its compatibility requirements. Using the techniques herein, the GPU detects which of Program1 and Program2 is executing, determines whether the processing demands of the executing program make an adjustment of the clock frequency desirable to meet that program's processing requirements, and adjusts the clock frequency of the GPU accordingly. The GPU thus meets the processing requirements of each of Program1 and Program2, and does so dynamically according to the changing workload, thereby conserving processing resources.

Further, by changing the clock frequency based on the detected workload, rather than solely on a static condition such as a program identifier value or specified time periods, the GPU can meet the processing specifications of each program based on the demands of a given workload, thereby conserving processor resources. For example, in some cases a higher-performance program provides a relatively light workload to the GPU, such that its processing specifications can be met at a lower clock frequency. Using the techniques herein, the GPU keeps the clock signal at the lower frequency under the relatively light workload, even though the corresponding program is associated with a relatively high specified clock frequency, thereby conserving GPU resources when they are not needed to meet the program's specifications.

Turning to the figures, FIG. 1 illustrates a block diagram of a processing system 100 including a central processing unit (CPU) 101 and a graphics processing unit (GPU) 102 in accordance with some embodiments. The processing system 100 is generally configured to execute instructions, organized as computer programs, in order to carry out tasks on behalf of an electronic device. Accordingly, in different embodiments the processing system 100 is part of an electronic device such as a desktop or laptop computer, a server, a smartphone, a tablet, a game console, and the like. The processing system 100 includes additional components and modules not illustrated in FIG. 1. For example, in some embodiments the processing system includes one or more memory controllers, input/output controllers, network interfaces, and the like, to carry out tasks on behalf of the electronic device.

The CPU 101 is generally configured to concurrently execute multiple programs and corresponding program threads. As used herein, a program thread refers to either an individual program (e.g., an operating system, an application program, and the like) or an individual thread of a multithreaded program. In the depicted example, the CPU 101 concurrently executes two program threads, designated program 103 and program 104. However, it will be appreciated that the techniques described with respect to FIG. 1 are, in other embodiments, implemented at a processing system that concurrently executes N programs, where N is an integer greater than one. Thus, in some embodiments the CPU 101 implements a virtualized computing environment by concurrently executing multiple virtual machines, where the programs 103 and 104 correspond to programs executed by different virtual machines. For example, in some embodiments the program 103 is an operating system associated with one virtual machine and the program 104 is an operating system associated with a different virtual machine executing at the processing system. For purposes of description, it is assumed that the programs 103 and 104 have different processing specifications, such as different specified processing speeds, power consumption specifications, and the like. For example, in some embodiments the program 103 is a "legacy" program that is specified to execute at a relatively low frequency in order to provide backward compatibility with other programs or hardware components of the processing system, while the program 104 is a newer program that is specified to execute at a relatively high frequency in order to meet performance targets. As described further herein, the GPU 102 is able to adjust specified parameters, and in particular the clock frequency of the GPU 102, so that each of the programs 103 and 104 complies with its processing specifications.

The GPU 102 is generally configured to perform graphics and vector processing operations for the processing system. To illustrate, in some embodiments, in the course of executing the programs 103 and 104, the CPU 101 generates specified commands, referred to herein as GPU commands (e.g., commands 105 and 107), that request the GPU 102 to perform designated operations. Examples of GPU commands include draw commands, which request that the GPU draw a designated object for display, vector commands, which request that the GPU perform a designated vector operation, and the like. The one or more CPUs provide the GPU commands to the GPU 102 for execution. Each command is issued by, and associated with, a corresponding one of the programs 103 and 104. Thus, for example, in some embodiments the command 105 is a draw command requesting that the GPU 102 draw one or more objects on behalf of the program 103, and the command 107 is a draw command requesting that the GPU 102 draw one or more objects on behalf of the program 104.

It will be appreciated that the CPU 101 executes the programs 103 and 104 concurrently. Thus, in some embodiments, the CPU 101 provides different commands, associated with different ones of the programs 103 and 104, to the GPU 102 for execution in a time-multiplexed fashion. To illustrate, in some embodiments the CPU 101 provides the command 105 to the GPU 102 for execution on behalf of the program 103, then provides the command 107 for execution on behalf of the program 104, and then provides another command (not shown) for execution on behalf of the program 103. As described further herein, in some cases different programs, and therefore different commands, have different specified processing requirements, such as different required clock frequencies, in order for the programs to meet other requirements such as a specified quality or a specified display frame rate. The GPU 102 identifies the different processing requirements by analyzing the workloads generated based on the commands, and adjusts processing parameters, such as the clock frequency, based on the identified processing requirements. The GPU 102 thereby adjusts the processing parameters dynamically, based on a combination of processing demands and specified processing requirements, rather than based on fixed processing parameter values.

To facilitate execution of the GPU commands (e.g., commands 105 and 107), the GPU 102 includes a scheduler 106 and a set of computing units 115. The set of computing units 115 includes a plurality of computing units (e.g., computing unit 118), with each computing unit including multiple processing elements, such as single instruction, multiple data (SIMD) units, stream processors, and the like, configured to perform graphics and vector processing operations. In some embodiments, each computing unit includes additional modules to support the processing elements, such as one or more branch units, scalar units, vector units, register files, and the like. The scheduler 106 is a module that schedules operations, in the form of wavefronts, at the set of computing units 115 based on commands received from the one or more CPUs. In some embodiments, the GPU 102 includes a command processor or other module that decodes the received commands into one or more operations and provides the operations to the scheduler 106 for scheduling. Further, in some embodiments the scheduler 106 is composed of different scheduling modules, with each scheduling module scheduling operations at different resources of the GPU 102. For example, in some embodiments the scheduler 106 includes a scheduling module to schedule graphics and vector processing operations at the computing units 115, a scheduling module to schedule memory operations at memory resources of the GPU 102, and the like.

To synchronize operations at the computing units 115 (as well as other modules), the GPU 102 employs a clock control module 110 to generate a clock signal CK and provides the CK signal to each of the computing units 115. In some embodiments, the clock control module 110 includes one or more control loops, such as a frequency locked loop (FLL), to lock the frequency of the clock signal CK to a designated frequency, where the designated frequency is adjustable via control signaling as described further herein. In particular, the GPU 102 sets the control signaling for the clock control module 110 so that the frequency of the clock signal CK is based on the processing demands placed on the GPU 102 by the programs 103 and 104, and so that each program complies with its processing specifications.

To illustrate, the overall use of the resources of the GPU 102 based on a received command or set of commands is referred to herein as a workload (e.g., workloads 116 and 117). A heavier or higher workload uses more of the resources of the GPU 102, while a lighter or lower workload uses fewer of the resources of the GPU 102. Thus, the workload generated by a particular one of the programs 103 and 104 is based on the commands generated by that program. Further, the workload generated by a program is generally correlated with the processing specifications for that program. Thus, for example, a program having a relatively high specified execution frequency (that is, a program that is expected or specified to execute quickly) generally generates a heavier workload (that is, a workload that requires more resources for execution). In contrast, a program having a relatively low specified execution frequency generates a lighter workload (that is, a workload that requires fewer resources for execution).

To accommodate concurrently executing programs having different specified execution frequencies, the GPU 102 includes a clock frequency adjustment module (CFAM) 108. The CFAM 108 monitors parameters indicative of the current workload of the GPU 102, thereby detecting, in effect, which of the programs 103 and 104 is currently executing at the GPU 102, and provides control signaling to the clock control module 110 to set the frequency of the clock signal CK to the specified clock frequency for the detected program. Examples of the parameters monitored by the CFAM 108 include, in some embodiments, the number of wavefronts scheduled at the computing units 115 by the scheduler 106 in a specified amount of time, the number of draw commands received by the GPU 102 in a specified amount of time, the type of draw or dispatch command, a hint provided by a compiler, and the like, or any combination thereof. In response to the monitored parameters exceeding a workload threshold, the CFAM 108 increases the frequency of the CK signal to the higher specified frequency F₂. In response to the monitored parameters falling below the workload threshold for a specified amount of time, the CFAM 108 reduces the frequency of the CK signal to the lower specified frequency F₁. In some embodiments, the higher and lower specified frequencies F₂ and F₁ are indicated by the programs 103 and 104 during initialization of the corresponding program, for example via a command provided to the GPU 102 by each program. Further, in some embodiments the workload threshold and the frequencies F₁ and F₂ are programmable values, allowing a programmer to tune the performance of the executing programs to a desired level.
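The threshold comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CfamConfig` and `target_frequency` and the example frequency values are hypothetical and do not come from the disclosure, the workload is reduced to a single integer, and the "specified amount of time" condition on the downward transition is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CfamConfig:
    """Programmable values (hypothetical names and units)."""
    f_low_mhz: float         # lower specified frequency, F1
    f_high_mhz: float        # higher specified frequency, F2
    workload_threshold: int  # workload threshold


def target_frequency(workload: int, cfg: CfamConfig) -> float:
    """Return F2 when the workload exceeds the threshold, otherwise F1."""
    if workload > cfg.workload_threshold:
        return cfg.f_high_mhz
    return cfg.f_low_mhz


# Example values are assumptions chosen only for illustration.
cfg = CfamConfig(f_low_mhz=800.0, f_high_mhz=1600.0, workload_threshold=10)
```

Because the threshold and both frequencies are plain fields, retuning the behavior amounts to constructing a different `CfamConfig`, which mirrors the programmable values mentioned in the text.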

To illustrate by way of example, in some embodiments the parameter monitored by the CFAM 108 is the number of draw commands received from each of the programs 103 and 104 and scheduled for execution by the scheduler 106. In response to the number of draw commands in a specified amount of time exceeding a specified workload threshold (e.g., 10 or more draw commands over 100 execution cycles), the CFAM 108 assumes that the program associated with the higher specified clock frequency is executing and requires a high number of resources. In response, the CFAM 108 increases the frequency of the CK signal to the higher specified frequency F₂. When the number of received draw commands falls below the specified workload threshold, the CFAM 108 assumes that the program associated with the lower specified frequency is executing. In response, the CFAM 108 reduces the frequency of the CK signal to the lower specified frequency F₁.
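A windowed draw-command count of this kind could be modeled as follows. The sketch assumes a sliding window measured in execution cycles and uses the example figures from the text (10 or more draw commands over 100 cycles) as defaults; the class name `DrawCommandWindow` and its methods are hypothetical, and a hardware implementation would more likely use counters than a queue.

```python
from collections import deque


class DrawCommandWindow:
    """Counts draw commands seen within the most recent window of cycles."""

    def __init__(self, window_cycles: int = 100, threshold: int = 10):
        self.window_cycles = window_cycles
        self.threshold = threshold
        self._cycles = deque()  # cycle numbers at which draw commands arrived

    def record_draw(self, cycle: int) -> None:
        """Record that a draw command arrived at the given cycle."""
        self._cycles.append(cycle)

    def exceeds_threshold(self, now: int) -> bool:
        """True if at least `threshold` draws fall in (now - window, now]."""
        while self._cycles and self._cycles[0] <= now - self.window_cycles:
            self._cycles.popleft()  # discard draws older than the window
        return len(self._cycles) >= self.threshold
```

For example, ten draws recorded at cycles 0 through 9 satisfy the threshold when queried at cycle 50, but not at cycle 105, by which point six of them have aged out of the window.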

As another example, in some embodiments the parameter monitored by the CFAM 108 is the number of wavefronts scheduled for execution at the computing units 115 by the scheduler 106. In response to the number of scheduled wavefronts exceeding a workload threshold (e.g., more than 100 wavefronts over 500 execution cycles), the CFAM 108 assumes that the program associated with the higher specified clock frequency is executing and requires a high number of resources. In response, the CFAM 108 increases the frequency of the CK signal to the higher specified frequency F₂. When the number of scheduled wavefronts falls below the specified workload threshold, the CFAM 108 assumes that the program associated with the lower specified frequency is executing. In response, the CFAM 108 reduces the frequency of the CK signal to the lower specified frequency F₁.

In other embodiments, the parameter monitored by the CFAM 108 is the number of draw commands of a specified type, such as the number of draw commands to draw a specified object, an object having a threshold number of vertices, and the like. The CFAM 108 determines the type of a draw command from command parameters included with the commands 105 and 107. For example, in some embodiments each command indicates the type of object to be drawn, the number of vertices of the object, and the like. In other embodiments, these parameters are identified by a command processor of the GPU 102.

As indicated by the above examples, in some embodiments the CFAM 108 identifies the workload of the GPU based on information stored or monitored at the scheduler 106. For example, in some embodiments the scheduler 106 maintains registers or other memory structures that store data indicating the number of draw commands received from the CPU 101, the types of draw commands received, the number of wavefronts scheduled for execution, and the like, or any combination thereof. Based on the stored information, the CFAM 108 identifies the overall workload for the GPU 102 and adjusts the clock frequency of the CK clock signal as described herein.

By changing the clock frequency based on the detected workload, rather than relying solely on a static condition such as a program identifier value or specified time periods, the GPU 102 is able to dynamically meet the processing specifications of each program, thereby conserving processor resources. For example, in some cases a high-performance program provides a relatively light workload to the GPU 102 (e.g., to execute a relatively simple draw command), and as a result the GPU 102 maintains the frequency of the CK signal at the lower frequency F₁. In contrast, if the GPU 102 changed the clock frequency solely based on a static condition such as a program identifier, the frequency of the CK signal would be increased to the higher frequency without a commensurate performance benefit.

FIG. 2 illustrates a diagram 200 depicting an example of the CFAM 108 adjusting the frequency of the clock signal CK based on a detected workload in accordance with some embodiments. The diagram 200 depicts an x-axis representing time and a y-axis representing the frequency of the clock signal CK. The diagram 200 further depicts a plot 201 representing an example of the frequency of the clock signal CK changing over time.

In the example of FIG. 2, it is assumed that the program 103 is associated with a lower specified frequency, designated F₁, and that the program 104 is associated with a higher specified frequency (F₂). In the illustrated example of plot 201, at an initial time 202 the workload of the GPU 102 is below the workload threshold, as indicated by one or more workload parameters. This indicates that the GPU 102 is likely executing commands on behalf of the program 103. Accordingly, and in response to the workload being below the workload threshold, the CFAM 108 sets the frequency of the clock signal CK to the lower frequency F₁.

At a time 203 after time 202, the workload at the GPU 102 increases so that it exceeds the workload threshold. The workload thus indicates that the GPU 102 is executing commands on behalf of the program 104. Accordingly, and in response to the workload increasing above the workload threshold, the CFAM 108 begins increasing the frequency of the clock signal CK until, at a time 204, the frequency reaches the higher specified frequency F₂. As illustrated, the CFAM 108 ramps the clock signal from the frequency F₁ to the frequency F₂ over time (between time 203 and time 204), rather than immediately setting the clock frequency to F₂ at time 203. In some embodiments, the time between time 203 and time 204 is 50 microseconds or less. By ramping the clock from the frequency F₁ to the frequency F₂, the GPU 102 continues to execute operations between times 203 and 204, rather than halting execution and flushing data from the computing units 115.
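The ramp between F₁ and F₂ can be pictured as a sequence of intermediate frequency setpoints. The following Python sketch assumes a linear ramp with evenly spaced steps; the text does not specify the ramp shape or step size, so the function `ramp_frequencies`, the 5-microsecond step, and the example frequencies are all hypothetical. Only the 50-microsecond upper bound on the ramp duration comes from the text.

```python
def ramp_frequencies(f_start_mhz: float, f_end_mhz: float,
                     ramp_time_us: float = 50.0,
                     step_us: float = 5.0) -> list:
    """Return the setpoints of a linear ramp, including both endpoints."""
    if ramp_time_us <= 0 or step_us <= 0:
        raise ValueError("ramp_time_us and step_us must be positive")
    steps = max(1, round(ramp_time_us / step_us))
    delta = f_end_mhz - f_start_mhz
    # Setpoint i is applied i * step_us microseconds after the ramp begins.
    return [f_start_mhz + delta * i / steps for i in range(steps + 1)]


# Hypothetical F1 = 800 MHz ramping up to F2 = 1600 MHz in ten 5 us steps.
setpoints = ramp_frequencies(800.0, 1600.0)
```

The same function describes the downward ramp when called with the endpoints swapped.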

Between time 204 and time 205, the workload of GPU 102 remains above the workload threshold, and in response the CFAM 108 maintains the frequency of the clock signal CK at the higher frequency F₂. At time 205, the workload of GPU 102 falls below the threshold. In response, the CFAM 108 begins ramping the frequency of the clock signal CK to return it to the lower frequency F₁. In some embodiments, the CFAM 108 employs hysteresis to prevent brief excursions around the workload threshold from causing frequent adjustments to the frequency of the clock signal CK. For example, in some embodiments the CFAM 108 initiates adjustment of the clock signal frequency in response to the workload of GPU 102 being above the threshold for a specified amount of time.
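One way to picture the hysteresis described above is a counter that requires the workload to stay above the threshold for a number of consecutive samples before an adjustment is triggered. The class name, the sample-based notion of time, and the values below are assumptions for illustration, not details from the description.

```python
class HysteresisDetector:
    """Signal a clock adjustment only after a sustained threshold crossing.

    Hypothetical sketch: 'hold_samples' stands in for the specified
    amount of time mentioned in the description.
    """

    def __init__(self, threshold, hold_samples):
        self.threshold = threshold
        self.hold_samples = hold_samples
        self.above_count = 0

    def update(self, workload):
        """Return True once the workload has exceeded the threshold for
        hold_samples consecutive samples; a dip resets the count."""
        if workload > self.threshold:
            self.above_count += 1
        else:
            self.above_count = 0
        return self.above_count >= self.hold_samples


detector = HysteresisDetector(threshold=100, hold_samples=3)
decisions = [detector.update(w) for w in [120, 90, 110, 130, 140, 150, 80]]
# The single sample of 120 does not trigger an adjustment because the
# workload dips back to 90 before three consecutive samples accumulate.
```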

FIG. 3 illustrates an example of the CFAM 108 in accordance with some embodiments. In the depicted example, the CFAM 108 includes a control module 320, a set of program frequency registers 322, and a set of workload threshold registers 324. The program frequency registers 322 are a set of programmable registers that store a frequency value for each program executing at the processing system of GPU 102. In some embodiments, each executing program sends commands to GPU 102 via a device driver. When a program begins execution at the processing system, the program sends the device driver frequency information indicating the specified execution frequency for the program. In response, the device driver programs a corresponding one of the program frequency registers 322 with the specified execution frequency for the program.

The workload threshold registers 324 are a set of programmable registers that store a workload threshold for each program executing at the processing system of GPU 102. In some embodiments, each executing program is associated with a workload profile that indicates the likely workloads generated by the program. In some embodiments, the workload profile is created by a software developer during development of the program. In other embodiments, the workload profile is developed by GPU 102 during the first N times the program is executed, where N is an integer. For example, the first N times the program is executed, GPU 102 employs a performance monitor (not shown) to measure the workload generated by the program, such as the number of wavefronts scheduled for execution on behalf of the program, the number of draw commands issued by CPU 101 on behalf of the program, and the like. The control module 320 generates a workload threshold indicating the average workload of the program (e.g., the average number of wavefronts generated by the program, the average number of draw commands, or a combination thereof) and stores the workload threshold at a corresponding one of the workload threshold registers 324.
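The profiling variant above, in which the threshold is the average workload over the first N executions, reduces to a simple mean. The following minimal sketch assumes the workload of each run has already been summarized as one number (for example, a wavefront count); the function name and sample values are hypothetical.

```python
def threshold_from_first_runs(run_workloads, n):
    """Average the workload measured over the first n executions.

    Hypothetical sketch: each entry of run_workloads is an assumed
    per-run summary such as the number of scheduled wavefronts.
    """
    if n < 1 or len(run_workloads) < n:
        raise ValueError("need at least n recorded runs, with n >= 1")
    return sum(run_workloads[:n]) / n


# Only the first three runs contribute; the later run is ignored.
threshold = threshold_from_first_runs([40, 60, 80, 1000], n=3)
```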

In operation, when at least two programs are executing concurrently at the processing system of GPU 102, the control module 320 receives workload information from the scheduler 106, such as the number of scheduled wavefronts or the number of draw commands received in a specified amount of time. The control module 320 compares the workload information to the workload thresholds stored at the workload threshold registers 324. In response to a workload threshold being exceeded, the control module 320 determines the program associated with the exceeded threshold and retrieves the program frequency for that program from the program frequency registers 322. The control module 320 then sends control signaling to the clock control module 110 to adjust the frequency of the CK clock signal to the retrieved program frequency.

In response to workload information indicating that the GPU workload has fallen below a workload threshold, the control module 320 determines the program associated with the next-lower workload threshold stored at the workload threshold registers 324. The control module 320 retrieves the program frequency for the identified program from the program frequency registers 322 and sends control signaling to the clock control module 110 to adjust the frequency of the CK clock signal to the retrieved program frequency.
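The register lookup performed by the control module 320 can be sketched with two dictionaries standing in for the workload threshold registers 324 and the program frequency registers 322. The dictionary representation, the function name, the program labels, and the numeric values are all assumptions made for illustration; returning `None` when no threshold is exceeded is likewise an assumed convention.

```python
def select_frequency(workload, thresholds, frequencies):
    """Pick the frequency of the program whose threshold is the highest
    one that the current workload exceeds.

    Hypothetical model: 'thresholds' and 'frequencies' map a program
    label to its register value. Returns None if no threshold is exceeded.
    """
    exceeded = [(t, prog) for prog, t in thresholds.items() if workload > t]
    if not exceeded:
        return None
    _, program = max(exceeded)
    return frequencies[program]


thresholds = {"program_103": 0, "program_104": 500}      # assumed values
frequencies = {"program_103": 800, "program_104": 1600}  # assumed MHz values

high = select_frequency(600, thresholds, frequencies)
low = select_frequency(200, thresholds, frequencies)
```

When the workload drops from 600 to 200 in this sketch, the selection moves to the program with the next-lower threshold, mirroring the fall-back behavior described above.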

FIG. 4 illustrates a flow diagram of a method 400 of setting the frequency of a clock signal at a processing unit based on identifying an executing program as indicated by a detected workload, in accordance with some embodiments. For purposes of description, the method 400 is described with respect to an example implementation at the GPU 102 of FIG. 1, but it will be appreciated that in other embodiments the method 400 is implemented at other processing units and other processing systems. Turning to the flow diagram, at block 402 the GPU 102 determines a workload threshold for each program executing at the processing system. As noted above, in some embodiments each executing program provides its workload threshold upon initialization, based on a workload profile created during development of the program. In other embodiments, the GPU 102 identifies the workload threshold for an executing program by determining the average workload generated by the program during the first N times the program is executed. In still other embodiments, the GPU 102 is configured to record the workload generated by an executing program each time the program is executed, and to determine the workload threshold as the average workload generated by the program over the last M times the program was executed, where M is an integer. The GPU 102 records the workload threshold at a corresponding register of the workload threshold registers 324.
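The last variant, averaging over the most recent M executions, behaves like a sliding-window mean. The sketch below uses a bounded deque so that older runs drop out automatically; the class name and the sample workloads are hypothetical.

```python
from collections import deque


class RollingThreshold:
    """Keep the workload of the last m runs and expose their average.

    Hypothetical sketch of the 'last M executions' variant; the stored
    values are assumed per-run workload summaries.
    """

    def __init__(self, m):
        if m < 1:
            raise ValueError("m must be at least 1")
        self.samples = deque(maxlen=m)

    def record_run(self, workload):
        """Record one run and return the updated threshold."""
        self.samples.append(workload)
        return sum(self.samples) / len(self.samples)


rolling = RollingThreshold(m=3)
history = [rolling.record_run(w) for w in [30, 60, 90, 120]]
# After the fourth run the first sample (30) has left the window.
```

Compared with averaging only the first N runs, this window lets the threshold follow a program whose behavior changes over time.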

At block 404, the GPU 102 determines the specified clock frequencies for the programs executing concurrently at the processing system. As indicated above, in some embodiments the specified clock frequency is provided to the GPU 102 by each executing program via a device driver. The GPU 102 stores the specified clock frequencies at corresponding ones of the program frequency registers 322.

At block 406, the CFAM 108 monitors the workload of the GPU 102 based on information provided by the scheduler 106, such as the number of wavefronts scheduled at the compute units 115, the number of draw commands received by the GPU 102 in a specified amount of time, and the like. At block 408, the CFAM 108 determines whether the workload has exceeded one of the workload thresholds stored at the workload threshold registers 324. If not, the method returns to block 406 and the CFAM 108 maintains the CK clock signal at its current frequency.

If, at block 408, the CFAM 108 determines that the workload has exceeded a workload threshold, the CFAM 108 identifies the program associated with the exceeded threshold and further determines the specified program frequency for that program, as stored at the program frequency registers 322. At block 410, the CFAM 108 sends control signaling to the clock control module 110 to adjust the frequency of the CK clock signal to the specified program frequency.

The method flow moves to block 412 and the CFAM 108 continues to monitor the workload of the GPU 102 based on the information provided by the scheduler 106. At block 414, the CFAM 108 determines whether the workload has fallen below the workload threshold stored at the workload threshold registers 324. If not, the method returns to block 412 and the CFAM 108 maintains the CK clock signal at its current frequency. In response to the workload falling below the workload threshold, the method flow moves to block 416 and the CFAM 108 sends control signaling to the clock control module 110 to return the frequency of the CK clock signal to the initial, lower frequency. The method flow then returns to block 406 and the CFAM 108 continues to monitor the workload of the GPU 102.
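The monitoring loop of blocks 406 through 416 can be summarized as a two-state machine driven by workload samples. The sketch below simplifies to a single threshold and two frequencies, and it omits ramping and hysteresis; the function name, state labels, and numeric values are assumptions for illustration.

```python
def run_method_400(samples, threshold, low_freq, high_freq):
    """Trace the clock frequency chosen for each workload sample.

    Hypothetical single-threshold sketch of blocks 406-416:
    state "low" covers blocks 406/408, state "high" covers blocks 412/414.
    """
    state = "low"
    trace = []
    for workload in samples:
        if state == "low" and workload > threshold:
            state = "high"   # block 408 -> 410: raise to the program frequency
        elif state == "high" and workload < threshold:
            state = "low"    # block 414 -> 416: return to the lower frequency
        trace.append(high_freq if state == "high" else low_freq)
    return trace


trace = run_method_400([10, 60, 70, 50, 20, 80], threshold=50,
                       low_freq=800, high_freq=1600)
```

Note that a sample exactly equal to the threshold leaves the state unchanged in this sketch, since the description speaks of exceeding or falling below the threshold.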

In some embodiments, a method includes: receiving, at a graphics processing unit (GPU), a plurality of commands from a central processing unit (CPU), the plurality of commands associated with a plurality of program threads concurrently executing at the CPU, each of the plurality of program threads associated with a corresponding specified clock frequency; determining, at the GPU, a first workload to be executed at the GPU based on at least one of the plurality of commands; identifying, based on the first workload, a first program thread of the plurality of program threads concurrently executing at the CPU; and, in response to identifying the first program thread, adjusting a clock signal of the GPU to the specified clock frequency associated with the first program thread. In one aspect, identifying the first program thread includes identifying the first program thread in response to the first workload exceeding a first workload threshold.

In one aspect, the method includes: determining, at the GPU, a second workload to be executed at the GPU after the first workload, based on at least one other command of the plurality of commands; identifying, based on the second workload, a second program thread of the plurality of program threads concurrently executing at the CPU; and, in response to identifying the second program thread, adjusting the clock signal of the GPU from the first frequency to the specified frequency associated with the second program thread. In another aspect, identifying the second program thread includes identifying the second program thread in response to the second workload being below a second workload threshold. In yet another aspect, the first threshold is programmable. In still another aspect, identifying the first workload includes identifying the first workload based on information received at a scheduler of the GPU.

In one aspect, identifying the first workload includes identifying the first workload based on a number of wavefronts scheduled for execution at a set of compute units of the GPU. In another aspect, identifying the first workload includes identifying the first workload based on a type of draw command received at the GPU. In another aspect, adjusting the clock includes ramping the clock from a second frequency to the first frequency.

In some embodiments, a method includes: identifying, based on a first workload to be executed at a graphics processing unit (GPU), a first program thread of a plurality of program threads, wherein the GPU executes workloads on behalf of the plurality of program threads and the plurality of program threads execute concurrently at a central processing unit (CPU); and, in response to identifying the first program thread, adjusting a clock of the GPU to a first frequency associated with the first program thread. In one aspect, identifying the first program thread includes identifying the first program thread in response to the first workload exceeding a first workload threshold. In another aspect, the method includes identifying a second program thread of the plurality of program threads based on a first workload to be executed at the GPU; and, in response to identifying the second program thread, adjusting the clock of the GPU to a second frequency associated with the second program thread. In yet another aspect, adjusting the clock includes ramping the clock from the second frequency to the first frequency.

In some embodiments, a graphics processing unit (GPU) includes: a scheduler to receive a plurality of commands from a central processing unit (CPU), the plurality of commands associated with a plurality of program threads concurrently executing at the CPU, each of the plurality of program threads associated with a corresponding specified clock frequency; a plurality of compute units configured to execute workloads based on the plurality of commands; a clock control module to generate a first clock signal for the plurality of compute units; and a clock frequency adjustment module configured to: determine a first workload to be executed at the plurality of compute units; identify, based on the first workload, a first program thread of the plurality of program threads concurrently executing at the CPU; and, in response to identifying the first program thread, adjust the clock signal to the specified clock frequency associated with the first program thread. In one aspect, the clock frequency adjustment module is configured to identify the first program thread in response to the first workload exceeding a first workload threshold.

In one aspect, the clock frequency adjustment module is configured to: determine a second workload to be executed at the plurality of compute units; identify a second program thread based on the second workload; and, in response to identifying the second program thread, adjust the clock signal from the first frequency to a second frequency associated with the second program thread. In another aspect, the clock frequency adjustment module is configured to identify the second program thread in response to the second workload being below a second workload threshold. In yet another aspect, the first threshold is programmable. In still another aspect, the clock frequency adjustment module is configured to identify the first workload based on a number of wavefronts scheduled at a set of compute units of the GPU. In another aspect, the clock frequency adjustment module is configured to identify the first workload based on a number of draw commands received at the scheduler.

A computer-readable storage medium may include any non-transitory storage medium, or combination of non-transitory storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disk, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer-readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).

In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software includes one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer-readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer-readable storage medium can include, for example, a magnetic or optical disk storage device, solid-state storage devices such as flash memory, a cache, random access memory (RAM), or other non-volatile memory devices, and the like. The executable instructions stored on the non-transitory computer-readable storage medium may be in source code, assembly language code, object code, or another instruction format that is interpreted or otherwise executable by one or more processors.

Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or further elements included, in addition to those described. Further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art will appreciate that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all of the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design shown herein, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified, and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.

Claims

1. A method comprising:
receiving, at a graphics processing unit (GPU) (102), a plurality of commands (105, 107) from a central processing unit (CPU) (101), the plurality of commands associated with a plurality of program threads (103, 104) concurrently executing at the CPU, each of the plurality of program threads associated with a corresponding specified clock frequency;
determining, at the GPU, a first workload (116) to be executed at the GPU based on at least one of the plurality of commands;
identifying, based on the first workload, a first program thread of the plurality of program threads concurrently executing at the CPU; and
in response to identifying the first program thread, adjusting a clock signal of the GPU to the specified clock frequency associated with the first program thread.

2. The method of claim 1,
wherein identifying the first program thread comprises identifying the first program thread in response to the first workload exceeding a first workload threshold.

3. The method of claim 2, further comprising:
determining, at the GPU, a second workload (117) to be executed at the GPU after the first workload, based on at least one other command of the plurality of commands;
identifying, based on the second workload, a second program thread of the plurality of program threads concurrently executing at the CPU; and
in response to identifying the second program thread, adjusting the clock signal of the GPU from the first frequency to the specified frequency associated with the second program thread.

4. The method of claim 3,
wherein identifying the second program thread comprises identifying the second program thread in response to the second workload being below a second workload threshold.

5. The method of claim 2, wherein the first threshold is programmable.

6. The method of claim 1, wherein identifying the first workload comprises identifying the first workload based on information received at a scheduler (106) of the GPU.

7. The method of claim 1, wherein identifying the first workload comprises identifying the first workload based on a number of wavefronts scheduled for execution at a set of compute units of the GPU.

8. The method of claim 1, wherein identifying the first workload comprises identifying the first workload based on a type of draw command received at the GPU.

9. The method of claim 1, wherein adjusting the clock comprises ramping the clock from a second frequency to the first frequency.

10. A method comprising:
identifying, based on a first workload (116) to be executed at a graphics processing unit (GPU) (102), a first program thread (103) of a plurality of program threads (103, 104), wherein the GPU executes workloads on behalf of the plurality of program threads and the plurality of program threads execute concurrently at a central processing unit (CPU) (101); and
in response to identifying the first program thread, adjusting a clock of the GPU to a first frequency associated with the first program thread.

11. The method of claim 10,
wherein identifying the first program thread comprises identifying the first program thread in response to the first workload exceeding a first workload threshold.

12. The method of claim 11, further comprising:
identifying a second program thread (104) of the plurality of program threads based on a first workload to be executed at the GPU; and
in response to identifying the second program thread, adjusting the clock of the GPU to a second frequency associated with the second program thread.

13. The method of claim 10, wherein adjusting the clock comprises ramping the clock from a second frequency to the first frequency.

14. A graphics processing unit (GPU) (102) comprising:
a scheduler to receive a plurality of commands (105, 107) from a central processing unit (CPU) (101), the plurality of commands associated with a plurality of program threads (104, 106) concurrently executing at the CPU, each of the plurality of program threads associated with a corresponding specified clock frequency;
a plurality of compute units (115) configured to execute workloads (116, 117) based on the plurality of commands;
a clock control module (110) to generate a first clock signal for the plurality of compute units; and
a clock frequency adjustment module (108) configured to:
determine a first workload to be executed at the plurality of compute units;
identify, based on the first workload, a first program thread of the plurality of program threads concurrently executing at the CPU; and
in response to identifying the first program thread, adjust the clock signal to the specified clock frequency associated with the first program thread.

15. The GPU of claim 14, wherein the clock frequency adjustment module is configured to
identify the first program thread in response to the first workload exceeding a first workload threshold.

16. The GPU of claim 15, wherein the clock frequency adjustment module is configured to:
determine a second workload to be executed at the plurality of compute units;
identify a second program thread based on the second workload; and
in response to identifying the second program thread, adjust the clock signal from a first frequency to a second frequency associated with the second program thread.

17. The GPU of claim 16, wherein the clock frequency adjustment module is configured to
identify the second program thread in response to the second workload being below a second workload threshold.

18. The GPU of claim 15, wherein the first threshold is programmable.

19. The GPU of claim 14, wherein the clock frequency adjustment module is configured to identify the first workload based on a number of wavefronts scheduled at a set of compute units of the GPU.

20. The GPU of claim 14, wherein the clock frequency adjustment module is configured to identify the first workload based on a number of draw commands received at the scheduler.