KR20240022968A

KR20240022968A - Integrated circuit performing dynamic voltage and frequency scaling operation based on workload and operating method thereof

Info

Publication number: KR20240022968A
Application number: KR1020220170043A
Authority: KR
Inventors: 김계형
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2022-08-12
Filing date: 2022-12-07
Publication date: 2024-02-20

Abstract

An integrated circuit, and a method of operating it, are disclosed. The integrated circuit classifies the workload of a core based on monitored data and performs a DVFS operation based on the classified workload. According to an aspect of the technical idea of the present disclosure, the integrated circuit includes:

- at least one core configured to process instructions according to a voltage-frequency level;
- a shared buffer that receives a request from the at least one core, accesses an external memory according to the request, and receives a response from the external memory;
- a monitor that monitors the shared buffer to obtain the buffer capacity of the shared buffer and the response latency of the response received from the external memory; and
- a Dynamic Voltage and Frequency Scaling (DVFS) controller that receives the buffer capacity and the response latency from the monitor, classifies the workload of the at least one core based on them, and determines a scaling factor for the voltage-frequency level based on the classified workload.

Description

Integrated circuit performing DVFS operation based on workload and operating method thereof {INTEGRATED CIRCUIT PERFORMING DYNAMIC VOLTAGE AND FREQUENCY SCALING OPERATION BASED ON WORKLOAD AND OPERATING METHOD THEREOF}

The technical idea of the present disclosure relates to integrated circuits. More specifically, it relates to an integrated circuit that classifies the workload of a core based on monitored data and performs a DVFS operation based on the classified workload, and to a method of operating the integrated circuit.

As computing systems such as mobile devices become smaller, power management has become an important issue. This is especially true for portable devices powered by energy-limited batteries. In such devices, the voltage must be lowered to reduce power consumption but raised to increase performance. There is therefore a growing need to manage power efficiently according to workload characteristics, such as the memory characteristics of the bus or of DRAM (dynamic random access memory).

For example, the application processor of a mobile device can manage power through a DVFS (Dynamic Voltage and Frequency Scaling) operation. DVFS adjusts the frequency and voltage of a processing device built into the application processor according to that device's workload.

The technical idea of the present disclosure provides an integrated circuit, a method of operating the integrated circuit, and a computing system that perform DVFS operations effectively. They classify the workload by considering not only the state of the CPU (central processing unit) but also characteristics of the bus, DRAM memory, and the like. They then determine different scaling factors based on the classified workload.

To achieve the above object, an integrated circuit according to an exemplary embodiment of the present disclosure includes:

- at least one core configured to process instructions according to a voltage-frequency level;
- a shared buffer that receives a request from the at least one core, accesses an external memory according to the request, and receives a response from the external memory;
- a monitor that monitors the shared buffer to obtain the buffer capacity of the shared buffer and the response latency of the response received from the external memory; and
- a DVFS (Dynamic Voltage and Frequency Scaling) controller that receives the buffer capacity and the response latency from the monitor, classifies the workload of the at least one core based on them, and determines a scaling factor for the voltage-frequency level based on the classified workload.

A method of operating an integrated circuit according to an exemplary embodiment of the present disclosure includes the following steps:

1. Monitor a shared buffer to obtain the buffer capacity of the shared buffer and the response latency of a response that the shared buffer receives from an external memory.
2. Classify the workload of a core based on the buffer capacity and the response latency.
3. Determine a scaling factor for the voltage-frequency level of the core based on the classified workload.

A computing system according to an exemplary embodiment of the present disclosure includes:

- a processor;
- at least one memory;
- a bus connecting the processor and the at least one memory;
- a DVFS controller that classifies the workload of at least one core based on the buffer capacity of a shared buffer and the response latency of a response received from the bus, determines a scaling factor based on the classified workload, and generates a voltage control signal and a frequency control signal based on the determined scaling factor;
- a power management unit that adjusts the magnitude of the power supply voltage provided to the at least one core in response to the voltage control signal; and
- a clock management unit that adjusts the frequency of the clock signal provided to the at least one core in response to the frequency control signal.

The processor includes:

- the at least one core, configured to process instructions according to the magnitude of the power supply voltage and the frequency of the clock signal;
- the shared buffer, which receives a request from the at least one core, accesses the bus according to the request, and receives a response from the bus; and
- a monitor that monitors the shared buffer to obtain the buffer capacity and the response latency.

According to an exemplary embodiment of the present disclosure, the workload is classified by considering not only the state of the CPU but also characteristics of the bus, DRAM memory, and the like. Different scaling factors are then determined based on the classified workload. As a result, DVFS operations can be performed efficiently and power consumption can be reduced.

In addition, the time spent waiting for data in the shared buffer can be monitored and used directly to classify the workload. This reduces overhead compared with classifying the workload in software.

The effects obtainable from the technical idea of the present disclosure are not limited to those mentioned above. Those of ordinary skill in the art to which the technical idea of the present disclosure belongs will clearly derive and understand other effects from the following description. Likewise, unintended effects of implementing the technical idea of the present disclosure may be derived from it by those of ordinary skill in the art.

Figure 1 is a block diagram illustrating an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 2 is a block diagram illustrating a shared buffer and the data monitored by a monitor according to an exemplary embodiment of the present disclosure.
Figures 3A and 3B are graphs for explaining buffer capacity and response latency according to an exemplary embodiment of the present disclosure.
Figure 4 is a block diagram showing an implementation example of an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 5 is a block diagram showing an implementation example of an integrated circuit according to an exemplary embodiment of the present disclosure.
Figures 6A and 6B are a graph and a table for explaining how a DVFS controller determines a scaling factor according to an exemplary embodiment of the present disclosure.
Figure 7 is a block diagram showing an implementation example of an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 8 is a flowchart showing a method of operating an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 9 is a flowchart showing an implementation example of a method of operating an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 10 is a flowchart showing an implementation example of a method of operating an integrated circuit according to an exemplary embodiment of the present disclosure.
Figure 11 is a block diagram showing a system according to an exemplary embodiment of the present disclosure.
Figure 12 is a block diagram showing a communication device including an application processor according to an exemplary embodiment of the present disclosure.

FIG. 1 is a block diagram illustrating an integrated circuit according to an exemplary embodiment of the present disclosure.

Referring to FIG. 1, the integrated circuit 10 may include a processor 100, a DVFS controller 200, a clock management unit (CMU) 300, a power management unit (PMU) 400, a bus 500, and a memory 600. In some embodiments, these components may be included in one chip, that is, a system-on-chip (SoC), and the integrated circuit 10 may be referred to as an application processor (AP).

The integrated circuit 10 may be included in a stationary computing system such as a desktop PC or a server. It may also be included in any of the following:

- laptop computer, mobile phone, smartphone, or tablet PC;
- personal digital assistant (PDA) or enterprise digital assistant (EDA);
- digital still camera or digital video camera;
- portable multimedia player (PMP) or personal or portable navigation device (PND);
- handheld game console or mobile internet device (MID);
- wearable computer, internet of things (IoT) device, or internet of everything (IoE) device;
- e-book reader.

The processor 100 may include at least one core 110, a shared buffer 120, and a monitor 130. In some embodiments, the processor 100 may execute a program composed of instructions. A program may include a plurality of subprograms, and a subprogram may be referred to as a subroutine, routine, procedure, function, or the like.

The core 110 can process instructions independently. Hereinafter, the core 110 is described mainly with reference to a CPU core, but exemplary embodiments of the present disclosure are not limited thereto. For example, the core 110 may be one of the following:

- a CPU core;
- a graphics processing unit (GPU) core;
- a neural processing unit (NPU) core; or
- an image signal processor (ISP) core.

Because the processor 100 may include a plurality of cores 110, the processor 100 may be referred to as a multi-core processor. In some embodiments, the core 110 may process instructions according to the clock signal clk and the power supply voltage vdd. The performance of the core 110 may therefore depend on clk and vdd. Some embodiments of this are described below with reference to FIGS. 6A and 6B.

The shared buffer 120 may be a buffer shared by a plurality of cores 110 in a multi-core processor. For example, the processor 100 may be a CPU, the cores 110 may include L2 caches, and the shared buffer 120 may be an L3 cache. The shared buffer 120 can store data not held in the core 110 and can exchange data with the core 110.

In some embodiments, the data that the core 110 needs to process an instruction may not be present inside the core 110. In that case, the core 110 may send a request Req for the data to the shared buffer 120, which handles it as follows:

- If the data corresponding to the request Req is present in the shared buffer 120, the shared buffer 120 transmits that data to the core 110.
- If the data is not present (for example, on a cache miss), the shared buffer 120 accesses (Acc) an external memory and obtains the data by receiving a response Res from it.

For example, the shared buffer 120 may access the memory 600 through the bus 500 and receive the response Res from the memory 600 through the bus 500. The shared buffer 120 may include a plurality of blocks 121, some embodiments of which are described below with reference to FIG. 2.
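The hit/miss flow above can be sketched as a small software model. This is a minimal illustration under stated assumptions, not the disclosed hardware. `SharedBuffer`, its fields, and the FIFO eviction policy are hypothetical. A plain dictionary stands in for the memory 600 reached over the bus 500.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBuffer:
    """Hypothetical toy model of shared buffer 120 (e.g. an L3 cache)."""

    num_blocks: int                               # number of blocks 121 (hypothetical size)
    external_memory: dict                         # stands in for memory 600 behind bus 500
    blocks: dict = field(default_factory=dict)    # address -> data currently held
    misses: int = 0                               # requests that had to go to external memory

    def request(self, addr):
        """Serve a request (Req) from a core; on a miss, access (Acc) external memory."""
        if addr in self.blocks:                   # hit: data already in the shared buffer
            return self.blocks[addr]
        self.misses += 1                          # cache miss
        data = self.external_memory[addr]         # Acc over the bus, then response (Res)
        if len(self.blocks) >= self.num_blocks:   # full: evict oldest entry (FIFO assumption)
            del self.blocks[next(iter(self.blocks))]
        self.blocks[addr] = data
        return data
```

FIFO eviction is used only to keep the sketch short; the text does not specify a replacement policy.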

The monitor 130 can monitor the shared buffer 120 to obtain two quantities:

- Buffer capacity: in some embodiments, the buffer capacity of the shared buffer 120 indicates how full the plurality of blocks 121 are.
- Response latency: the time from when the shared buffer 120 accesses (Acc) the memory 600 through the bus 500 until it receives the response Res from the memory 600.

Some embodiments of this are described below with reference to FIGS. 3A and 3B.
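The two quantities the monitor 130 obtains can be illustrated with a minimal sketch, assuming the monitor timestamps each external-memory access (Acc) and its response (Res) in clock cycles. The class `Monitor`, its method names, and the use of a simple mean are hypothetical choices for illustration only.

```python
class Monitor:
    """Hypothetical sketch of monitor 130: buffer fill level and Acc-to-Res latency."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.pending = {}      # request id -> cycle at which Acc was issued
        self.latencies = []    # completed Acc-to-Res latencies, in cycles

    def on_access(self, req_id, cycle):
        """Record the cycle at which the shared buffer accessed external memory."""
        self.pending[req_id] = cycle

    def on_response(self, req_id, cycle):
        """Record the response and store how long it took."""
        self.latencies.append(cycle - self.pending.pop(req_id))

    def buffer_capacity(self, occupied_blocks):
        """Fraction of blocks 121 currently filled (0.0 to 1.0)."""
        return occupied_blocks / self.num_blocks

    def response_latency(self):
        """Mean Acc-to-Res latency over completed responses (0.0 if none yet)."""
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
```

For example, accesses issued at cycles 100 and 105 that complete at cycles 180 and 225 give latencies of 80 and 120 cycles, for a mean of 100 cycles.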

In some embodiments, the monitor 130 may monitor the core 110 and obtain the time taken for the core 110 to receive, from the shared buffer 120, the data corresponding to a request Req. For example, the core 110 may be a CPU core including an L2 cache, and the shared buffer 120 may be an L3 cache. The request then proceeds as follows:

1. The data that the CPU core needs to process an instruction may not be in the L2 cache, in which case the L2 cache requests (Req) it from the L3 cache.
2. If the corresponding data is present in the L3 cache, the L3 cache transmits it to the L2 cache.
3. If the data is not present, the L3 cache accesses (Acc) the external memory, receives a response Res to obtain the data, and transmits the data to the L2 cache.

The monitor 130 can monitor the L2 cache and obtain the time from when the L2 cache requests (Req) data from the L3 cache until it receives the corresponding data.

The DVFS controller 200 may receive, from the monitor 130, the buffer capacity of the shared buffer 120 and the response latency of responses received from the external memory. It may then classify the workload of the core 110 based on these two values. In some embodiments, the DVFS controller 200 may classify the workload as a first workload or a second workload, where the first workload includes more requests that access the external memory than the second workload:

- First workload: for example, a memory-intensive workload. This is a situation in which cache misses during operation of the processor 100 cause congestion on the bus 500 or in the memory 600.
- Second workload: for example, a computing workload. This relates to instruction processing by the core 110 and is a situation in which no cache misses occur during operation of the processor 100.

In some embodiments, the DVFS controller 200 classifies the workload of the core 110 as follows:

- If the buffer capacity and the response latency are each greater than or equal to a threshold, the workload is the first workload.
- If either the buffer capacity or the response latency is less than its threshold, the workload is the second workload.

Some embodiments of this are described below with reference to FIGS. 2 to 4.
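The threshold rule above can be written as a small decision function. This is a minimal sketch assuming separate thresholds for the two metrics, since they have different units. The names `Workload`, `classify`, `cap_threshold`, and `lat_threshold` are hypothetical, and no concrete threshold values are given in the text.

```python
from enum import Enum


class Workload(Enum):
    MEMORY_INTENSIVE = 1   # "first workload": more external-memory requests
    COMPUTING = 2          # "second workload"


def classify(buffer_capacity, response_latency, cap_threshold, lat_threshold):
    """Both metrics at or above their thresholds -> first workload, else second."""
    if buffer_capacity >= cap_threshold and response_latency >= lat_threshold:
        return Workload.MEMORY_INTENSIVE
    return Workload.COMPUTING
```

Requiring both conditions means that a full buffer with fast responses, or slow responses with a mostly empty buffer, is still treated as the computing workload.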

In some embodiments, the DVFS controller 200 may receive from the monitor 130 the time taken for the core 110 to receive, from the shared buffer 120, the data corresponding to a request Req. Based on this time, it may classify the workload of the core 110 as the first workload or the second workload, where the first workload includes more requests that access the external memory than the second workload. For example, the core 110 may be a CPU core including an L2 cache, and the shared buffer 120 may be an L3 cache. Two cases arise:

- The L3 cache has free capacity. It can accept the request Req from the L2 cache immediately, access (Acc) the external memory, and receive the response Res to obtain the requested data. The time for the L2 cache to receive the data may then be shorter than a threshold time, and the DVFS controller 200 may classify the workload as the second workload.
- The L3 cache has no free capacity (for example, it is filled with data other than the data corresponding to the request Req). It cannot accept the request Req until it has finished operating on the data already filling it. Only then does it access (Acc) the external memory and receive the response Res. The time for the L2 cache to receive the data may then be longer than the threshold time, and the DVFS controller 200 may classify the workload as the first workload.
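The two cases above can be illustrated with a simplified timing model. This is a sketch under explicit assumptions, not the disclosed mechanism:

- The L3 cache holds a fixed number of outstanding entries.
- A full L3 cache accepts the new request only once its occupancy drops below capacity.
- External-memory latency is a constant.

The function names, the capacity, and the threshold values are hypothetical.

```python
def l2_wait_time(arrival, release_cycles, capacity, memory_latency):
    """Cycles from the L2 cache's Req until the requested data arrives from L3.

    release_cycles: cycles at which currently occupied L3 entries become free.
    capacity: number of entries the L3 cache can hold (hypothetical).
    memory_latency: fixed Acc-to-Res latency of external memory (hypothetical).
    """
    still_busy = sorted(t for t in release_cycles if t > arrival)
    if len(still_busy) < capacity:
        start = arrival                                  # free space: accept Req at once
    else:
        start = still_busy[len(still_busy) - capacity]   # wait until occupancy < capacity
    return (start - arrival) + memory_latency


def classify_by_wait(wait, threshold_time):
    """Wait longer than the threshold time -> first (memory-intensive) workload."""
    return "first" if wait > threshold_time else "second"
```

With a capacity of 4 and a memory latency of 80 cycles, a request arriving at an empty L3 cache waits 80 cycles. If all four entries stay busy until cycles 130 to 200, the request arriving at cycle 100 waits 30 extra cycles (110 in total), exceeding a threshold time of 100. How a wait exactly equal to the threshold is classified is not stated in the text; here it counts as the second workload.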

The DVFS controller 200 may determine a scaling factor for the voltage-frequency level based on the classified workload of the core 110. In some embodiments, the scaling factor may be determined so that the voltage-frequency level is reduced depending on the classified workload. For example, the reasoning runs as follows:

- The DVFS controller 200 classifies the workload as the first or second workload, where the first workload includes more external-memory requests than the second.
- A core 110 running the first workload may require lower performance than one running the second workload.
- The performance of the core 110 may depend on the voltage-frequency level.
- The DVFS controller 200 may therefore choose a scaling factor that reduces the voltage-frequency level when the workload is classified as the first workload.

Some embodiments of this are described below with reference to FIGS. 6A and 6B.
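A minimal sketch of applying a workload-dependent scaling factor to a requested core frequency is shown below. The factor values (0.7 and 1.0) and the name `scaled_frequency` are hypothetical, since the text gives no concrete numbers. The factor below 1.0 reflects the text's point that the first workload may require lower performance. A plausible reason is that a memory-bound core spends much of its time waiting on the bus or memory, so a lower core clock costs relatively little.

```python
# Hypothetical factors: the text does not give concrete values.
SCALING_FACTOR = {
    "first": 0.7,    # memory-intensive workload: lower the voltage-frequency level
    "second": 1.0,   # computing workload: keep the requested level
}


def scaled_frequency(requested_mhz, workload):
    """Apply the workload-dependent scaling factor to a requested core frequency."""
    return requested_mhz * SCALING_FACTOR[workload]
```

For example, a 2000 MHz request becomes roughly 1400 MHz for the first workload and stays at 2000 MHz for the second.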

The DVFS controller 200 may generate control signals based on the determined scaling factor. In some embodiments, it may generate:

- a clock control signal C_clk for adjusting the frequency of the core 110, transmitted to the clock management unit 300; and
- a voltage control signal C_vdd for adjusting the power supply voltage vdd of the core 110, transmitted to the power management unit 400.
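One way a scaled target could be turned into C_clk and C_vdd is to snap it to a discrete table of operating points. This is a sketch assuming such a table, which is common in DVFS practice but not specified here. `VF_TABLE` and its values are hypothetical, and `control_signals` rounds up to the lowest operating point that meets the target.

```python
from bisect import bisect_left

# Hypothetical operating points as (frequency in MHz, voltage in mV), sorted by frequency.
VF_TABLE = [(600, 600), (1000, 700), (1400, 800), (1800, 900), (2200, 1000)]


def control_signals(target_mhz):
    """Return C_clk and C_vdd for the lowest operating point meeting target_mhz."""
    freqs = [f for f, _ in VF_TABLE]
    i = min(bisect_left(freqs, target_mhz), len(VF_TABLE) - 1)  # clamp to top point
    f, v = VF_TABLE[i]
    return {"C_clk": f, "C_vdd": v}
```

Rounding up means the scaled target is never undershot; a target above the table is clamped to the highest point. For example, a target of 1500 MHz selects 1800 MHz at 900 mV.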

The DVFS controller 200 classifies the workload of the core 110 using the buffer capacity and response latency that the monitor 130 obtains from the shared buffer 120. It can therefore set the scaling factor for the voltage-frequency level differently depending on the characteristics of the bus 500 or the memory 600. This reduces power consumption more efficiently than scaling without workload classification. It also reduces overhead compared with classifying workloads in software, because no additional computation is required. Although the DVFS controller 200 is described as being located outside the processor 100, it may instead be located inside the processor 100.
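The standard CMOS dynamic-power relation gives a rough, first-order illustration of why lowering the voltage-frequency level saves power. Here α is the activity factor and C the switched capacitance, both assumed unchanged. The operating points below are hypothetical, and leakage power is ignored.

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V_{dd}^{2}\, f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{dyn}}'}{P_{\mathrm{dyn}}}
  = \left(\frac{V_{dd}'}{V_{dd}}\right)^{2}\frac{f'}{f}
  = \left(\frac{0.8\ \mathrm{V}}{1.0\ \mathrm{V}}\right)^{2}\frac{1.4\ \mathrm{GHz}}{2.2\ \mathrm{GHz}}
  \approx 0.64 \times 0.636 \approx 0.41
```

Suppose a core classified as running the first workload steps down from 2.2 GHz at 1.0 V to 1.4 GHz at 0.8 V. Under these assumptions, its dynamic power would fall to roughly 41% of the original. The quadratic voltage term contributes most of the saving.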

The clock management unit 300 may generate the clock signal clk and adjust its frequency based on the clock control signal C_clk. For example, the clock management unit 300 may include an oscillator that generates clk based on C_clk. The clock management unit 300 may also be referred to as a clock generator or a clock generation circuit.

The power management unit 400 may generate the power supply voltage vdd and adjust its magnitude based on the voltage control signal C_vdd. For example, the power management unit 400 may include a switching regulator that generates vdd from a voltage supplied by an external power source, based on C_vdd. The power management unit 400 may also be referred to as a power management integrated circuit (PMIC).

The bus 500 may be a system bus that uses a protocol conforming to a predetermined bus standard, and may include various IP (intellectual property) blocks connected to the system bus. For example, the system bus may use the Advanced Microcontroller Bus Architecture (AMBA) protocol of ARM (Advanced RISC Machine). AMBA bus types may include:

- Advanced High-Performance Bus (AHB);
- Advanced Peripheral Bus (APB);
- Advanced eXtensible Interface (AXI) and AXI4;
- AXI Coherency Extensions (ACE).

Other protocols may also be used, such as uNetwork from Sonics Inc., CoreConnect from IBM, or the Open Core Protocol from OCP-IP. Although the bus 500 is described as being included in the integrated circuit 10, it may be located outside the integrated circuit 10.

The memory 600 may correspond to various types of semiconductor memory devices. According to one embodiment, the memory 600 may be a dynamic random access memory (DRAM) such as double data rate synchronous DRAM (DDR SDRAM), low power double data rate (LPDDR) SDRAM, graphics double data rate (GDDR) SDRAM, or Rambus DRAM (RDRAM). The memory 600 may alternatively be any one of flash memory, phase-change RAM (PRAM), magnetoresistive RAM (MRAM), resistive RAM (ReRAM), and ferroelectric RAM (FeRAM). Although the memory 600 is described as being included in the integrated circuit 10, it may be located outside the integrated circuit 10, in which case it may be referred to as an external memory.

The integrated circuit 10 may include components other than those shown in FIG. 1. For example, the integrated circuit 10 may further include various types of functional blocks, such as an input/output (IO) interface block, a universal serial bus (USB) host block, and a USB slave block.

FIG. 2 is a block diagram illustrating a shared buffer and the data monitored by a monitor, according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 2, the shared buffer 120 of FIG. 2 may be the same as the shared buffer 120 of FIG. 1 and may include a plurality of blocks 121. Descriptions overlapping those of FIG. 1 are omitted. In some embodiments, at least one block 121 may constitute one set, and the shared buffer 120 may include a plurality of sets. The total capacity of the shared buffer 120 may be determined by the plurality of sets. For example, each block 121 may be B bytes in size (B being a positive integer), one set may include N blocks 121 (N being an integer of 1 or more), and the shared buffer 120 may include M sets (M being an integer of 1 or more). In this case, the total capacity of the shared buffer 120 is B*N*M bytes.

In some embodiments, when the filled portion of the total capacity of the shared buffer 120 is greater than or equal to a threshold, the DVFS controller 200 may classify the workload of the core 110 as a first workload. For example, when 70% or more of the B*N*M-byte total capacity of the shared buffer 120 is filled, the DVFS controller 200 may classify the workload of the core 110 as the first workload.
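The capacity arithmetic and the fill-ratio check above can be illustrated with a minimal Python sketch. Only the B*N*M formula and the 70% example threshold come from the text; the block geometry and the function names are hypothetical.

```python
def total_capacity(block_bytes: int, blocks_per_set: int, num_sets: int) -> int:
    """Total capacity of the shared buffer: B * N * M bytes."""
    return block_bytes * blocks_per_set * num_sets


def fill_exceeds_threshold(used_bytes: int, capacity_bytes: int,
                           threshold: float = 0.70) -> bool:
    """True when the filled fraction of the buffer is at or above the threshold
    (0.70 mirrors the 70% example in the text)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return used_bytes / capacity_bytes >= threshold


# Hypothetical geometry: 64-byte blocks, 16 blocks per set, 32 sets.
cap = total_capacity(64, 16, 32)                 # 32768 bytes
print(cap, fill_exceeds_threshold(24576, cap))   # prints: 32768 True (75% filled)
```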

In some embodiments, each core 110 may transmit a request (Req) only to a specific block among the plurality of blocks 121. For example, the request (Req) may include data designating the location of a specific block among the plurality of blocks 121.

In some embodiments, the memory 600 may be divided into a plurality of regions (not shown), and each block 121 may access (Acc) only a specific region among the plurality of regions of the memory 600. For example, the request (Req) that the shared buffer 120 receives from the core 110 may include an address, and the address may include data designating the location of a specific region of the memory 600.
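One way to picture the block-to-region restriction is the sketch below. The text does not say how an address selects a region. The fixed 256 MiB split by address, the `Request` fields, and the `block_region` assignment map are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict

REGION_SIZE = 1 << 28  # assumption: memory 600 split into 256 MiB regions


@dataclass(frozen=True)
class Request:
    source_id: int    # core 110 that issued the request
    block_index: int  # block 121 designated by the core (hypothetical field)
    address: int      # target address in memory 600


def region_of(address: int) -> int:
    """Region index selected by the address (fixed-size split, assumed)."""
    return address // REGION_SIZE


def block_may_access(block_region: Dict[int, int], req: Request) -> bool:
    """A block forwards a request only if the address lies in its own region."""
    return block_region.get(req.block_index) == region_of(req.address)


block_region = {0: 0, 1: 1}  # hypothetical assignment: block i -> region i
print(block_may_access(block_region, Request(0, 1, 0x1000_0000)))  # True
print(block_may_access(block_region, Request(0, 0, 0x1000_0000)))  # False
```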

FIGS. 3A and 3B are graphs illustrating buffer capacity and response waiting time according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 3A, the graph of FIG. 3A may represent the filled capacity of the blocks 121 over time. In some embodiments, when the number of blocks 121 whose filled capacity is greater than or equal to a threshold capacity is itself greater than or equal to a threshold number, the buffer capacity of the shared buffer 120 may be said to be in a full state. For example, when the number of blocks 121 filled to 70% (Th1) or more of their capacity is 50% (the threshold number) or more of the total number of blocks 121, the buffer capacity of the shared buffer 120 may be in the full state. The intervals in which the buffer capacity is full may be interval C1 and interval C2.
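The two-threshold full-state test can be sketched as follows. The example values from the text (Th1 = 70% per block, threshold number = 50% of all blocks) are used as defaults. Representing each block's occupancy as a fraction is an assumption made for the illustration.

```python
from typing import Sequence


def buffer_is_full(block_fill: Sequence[float], th1: float = 0.70,
                   count_ratio: float = 0.50) -> bool:
    """Full-state test for the shared buffer.

    block_fill holds the filled fraction (0.0 to 1.0) of each block 121.
    The buffer counts as full when the number of blocks filled to at least
    th1 is at least count_ratio times the total number of blocks.
    """
    if not block_fill:
        return False
    filled_blocks = sum(1 for fill in block_fill if fill >= th1)
    return filled_blocks >= count_ratio * len(block_fill)


print(buffer_is_full([0.8, 0.9, 0.1, 0.2]))  # 2 of 4 blocks >= 70% -> True
print(buffer_is_full([0.8, 0.1, 0.2, 0.3]))  # 1 of 4 blocks >= 70% -> False
```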

Referring further to FIG. 3B, the graph of FIG. 3B may represent the response waiting time of the shared buffer 120 for responses received from the external memory. In some embodiments, this is the time taken from when the shared buffer 120 accesses (Acc) the memory 600 through the bus 500 until it receives a response (Res) from the memory 600 (hereinafter, the response waiting time). When the shared buffer 120 accesses (Acc) the memory 600 through the bus 500, a signal may transition from a first level (e.g., logic low) to a second level (e.g., logic high). When the shared buffer 120 receives the response (Res) from the memory 600 through the bus 500, the signal may transition from the second level back to the first level. The time from the first-to-second-level transition until the following second-to-first-level transition (for example, time a1, time a2, time a3, or time b1) may be the response waiting time.
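The response waiting time is the width of each high pulse of the signal in FIG. 3B. The sketch below assumes the signal is available as periodic samples (0 = first level, 1 = second level) taken every `period_ns` nanoseconds. The text does not specify this sampling scheme.

```python
from typing import List, Optional


def response_wait_times(samples: List[int], period_ns: float) -> List[float]:
    """Width of each complete high pulse, i.e. access (Acc) to response (Res).

    A pulse still high at the end of the samples is incomplete and ignored.
    """
    waits: List[float] = []
    start: Optional[int] = None
    prev = 0  # assume the signal starts at the first level (no access pending)
    for i, level in enumerate(samples):
        if prev == 0 and level == 1:
            start = i  # access issued: first level -> second level
        elif prev == 1 and level == 0 and start is not None:
            waits.append((i - start) * period_ns)  # response received
            start = None
        prev = level
    return waits


print(response_wait_times([0, 1, 1, 1, 0, 0, 1, 1, 0], period_ns=10.0))
# prints: [30.0, 20.0]
```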

In some embodiments, when the buffer capacity of the shared buffer 120 is in the full state and the response waiting time is longer than a threshold time (Th2), the DVFS controller 200 may classify the workload of the core 110 as the first of two workload classes, a first workload and a second workload. The first workload may include more requests for access to the external memory than the second workload. For example, the buffer capacity of the shared buffer 120 may be full in intervals C1 and C2, and the response waiting time may be longer than the threshold time (Th2) at times a1, a2, and a3. At time point T1, the buffer capacity of the shared buffer 120 may be full and the response waiting time may be longer than the threshold time (Th2). At time point T2, the response waiting time may still be longer than the threshold time (Th2), but the buffer capacity of the shared buffer 120 may no longer be full. Accordingly, in the interval from time point T1 to time point T2, contention (conflicts) may be occurring in the bus 500 or the memory 600, and the DVFS controller 200 may classify the workload of the core 110 as the first workload.

In some embodiments, the monitor 130 may monitor the shared buffer 120 and obtain an identifier (source ID). The identifier (source ID) may refer to the address of the core 110 that transmits a request (Req) to the shared buffer 120, and the DVFS controller 200 may receive the identifier (source ID) from the monitor 130. In the interval from time point T1 to time point T2, the DVFS controller 200 may identify a specific core among the plurality of cores 110 according to the identifier (source ID) and classify the workload of the identified core as the first workload.
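The first-workload rule, applied per source ID, might look like the sketch below. The `MonitorSample` record is a hypothetical stand-in for what the monitor 130 reports. The rule itself (buffer full and response waiting time longer than Th2) is the one described above.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class MonitorSample:
    source_id: int      # identifier of the requesting core 110
    buffer_full: bool   # full state of the shared buffer 120
    wait_ns: float      # response waiting time for this request


def first_workload_cores(samples: Iterable[MonitorSample], th2_ns: float) -> Set[int]:
    """Source IDs whose requests saw a full buffer and a wait longer than Th2."""
    return {s.source_id for s in samples if s.buffer_full and s.wait_ns > th2_ns}


samples = [
    MonitorSample(0, True, 500.0),   # full and slow: first workload
    MonitorSample(1, True, 100.0),   # full but fast response
    MonitorSample(2, False, 900.0),  # slow but buffer not full (as at T2)
]
print(first_workload_cores(samples, th2_ns=300.0))  # {0}
```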

FIG. 4 is a block diagram illustrating an implementation of an integrated circuit according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 4, the integrated circuit 10a may include a processor 100, a DVFS controller 200, and a temperature sensor 800. In some embodiments, the processor 100 and the DVFS controller 200 of FIG. 4 may be the same as the processor 100 and the DVFS controller 200 of FIG. 1. Descriptions overlapping those of FIG. 1 are omitted.

The temperature sensor 800 may sense the temperature of the processor 100 and provide temperature information according to the sensing result to the DVFS controller 200. The temperature sensor 800 may include a thermistor and a memory (not shown) capable of storing the temperature information. In some embodiments, the temperature sensor 800 may sense the temperatures of the cores 110, provide the resulting temperature information to the DVFS controller 200, and store it in the memory.

Referring further to FIGS. 3A and 3B, in some embodiments, when the buffer capacity of the shared buffer 120 is not in the full state or the response waiting time is less than or equal to the threshold time (Th2), the DVFS controller 200 may use the temperature information to classify the workload of the core 110 as the second of the two workload classes. The first workload may include more requests for access to the external memory than the second workload. For example, the buffer capacity may not be full in the intervals other than intervals C1 and C2. In interval C1, the buffer capacity is full, but the response waiting time (for example, time b1) may not exceed the threshold time (Th2). In interval C3, the buffer capacity is full, but the response waiting time may not exceed the threshold time (Th2). Accordingly, in the intervals before time point T1 or after time point T2, either the buffer capacity of the shared buffer 120 is not full or the response waiting time is less than or equal to the threshold time (Th2). In those intervals, when the temperature detected by the temperature sensor 800 is greater than or equal to a threshold temperature, the core 110 may be operating heavily to process instructions while no cache misses occur during the operation of the processor 100. Accordingly, the DVFS controller 200 may classify the workload of the core 110 as the second workload.

In some embodiments, the monitor 130 may monitor the shared buffer 120 and obtain an identifier (source ID). The identifier (source ID) may refer to the address of the core 110 that transmits a request (Req) to the shared buffer 120, and the DVFS controller 200 may receive the identifier (source ID) from the monitor 130. In the intervals before time point T1 or after time point T2, the DVFS controller 200 may identify a specific core among the plurality of cores 110 according to the identifier (source ID) and classify the workload of the identified core as the second workload.
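The complementary case can be sketched as below. Here the DVFS controller falls back to the second workload and consults the temperature. Splitting the second workload into "hot" and "cool" labels follows the two temperature cases described later for FIG. 6B. The label strings, the threshold values, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def classify_core(buffer_full: bool, wait_ns: float, temp_c: float,
                  th2_ns: float, t_crit_c: float) -> str:
    """Return 'W1' (first workload), 'W2-hot' or 'W2-cool' (second workload)."""
    if buffer_full and wait_ns > th2_ns:
        return "W1"
    # Buffer not full, or waiting time <= Th2: second workload; temperature
    # decides how the later scaling step should treat it.
    return "W2-hot" if temp_c >= t_crit_c else "W2-cool"


def classify_by_source(observations: Dict[int, Tuple[bool, float, float]],
                       th2_ns: float, t_crit_c: float) -> Dict[int, str]:
    """observations maps source ID -> (buffer_full, wait_ns, temp_c)."""
    return {sid: classify_core(full, wait, temp, th2_ns, t_crit_c)
            for sid, (full, wait, temp) in observations.items()}


obs = {0: (True, 500.0, 70.0), 1: (False, 900.0, 92.0), 2: (True, 100.0, 60.0)}
print(classify_by_source(obs, th2_ns=300.0, t_crit_c=85.0))
# prints: {0: 'W1', 1: 'W2-hot', 2: 'W2-cool'}
```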

FIG. 5 is a block diagram illustrating an implementation of an integrated circuit according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 5, the integrated circuit 10b may include a DVFS controller 200a, a clock management unit 300, a power management unit 400, and a memory 700. In some embodiments, the clock management unit 300 and the power management unit 400 of FIG. 5 may be the same as the clock management unit 300 and the power management unit 400 of FIG. 1. Descriptions overlapping those of FIG. 1 are omitted.

The DVFS controller 200a may include workload classification logic 210, a DVFS manager module (DVFS governor module) 220, a clock management unit driver 230, and a power management unit driver 240. The workload classification logic 210 may receive from the monitor 130 the address of the core 110 that transmits a request (Req) to the shared buffer 120 (hereinafter referred to as an identifier), the buffer capacity of the shared buffer 120, and the response waiting time for responses received from the external memory. The workload classification logic 210 may identify a specific core among the plurality of cores 110 according to the identifier, classify the workload of that core based on the buffer capacity and the response waiting time, and transmit the classified workload data of that core to the DVFS manager module 220. For example, the workload classification logic 210 may classify the workload of the core 110 as the first workload or the second workload based on the buffer capacity and the response waiting time, where the first workload may include more requests for access to the external memory than the second workload. When the buffer capacity is full and the response waiting time is longer than the threshold time, the workload classification logic 210 may classify the workload of the specific core as the first workload and transmit data indicating this classification to the DVFS manager module 220.

The DVFS manager module 220 may determine a scaling factor for the voltage-frequency level based on the classified workload data of a specific core. In some embodiments, when the DVFS manager module 220 receives data indicating that the specific core has been classified as the first workload, it may obtain from the memory 700 the DVFS table 710, which contains the power supply voltage (vdd) and the clock signal (clk) frequency corresponding to the first workload. The DVFS manager module 220 may then determine the scaling factor for the voltage-frequency level based on that power supply voltage (vdd) and clock signal (clk) frequency. Hereinafter, a module may refer to hardware capable of performing the functions and operations indicated by its name, or to computer program code capable of performing specific functions and operations. However, a module is not limited thereto and may also refer to an electronic recording medium, for example a processor, loaded with computer program code capable of performing specific functions and operations. In other words, a module may refer to a functional and/or structural combination of software for carrying out the technical idea of the present invention.

The clock management unit driver 230 may generate a clock control signal (C_clk) based on the scaling factor determined by the DVFS manager module 220 and provide the clock control signal (C_clk) to the clock management unit 300.

The power management unit driver 240 may generate a power supply voltage control signal (C_vdd) based on the scaling factor determined by the DVFS manager module 220 and provide the power supply voltage control signal (C_vdd) to the power management unit 400.
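The path from the DVFS manager module's decision to the control signals C_clk and C_vdd can be illustrated as follows. The text does not define how the scaling factor or the control signals are encoded, so this sketch rests on several assumptions:

- the scaling factor multiplies the current clock frequency;
- the voltage for the resulting frequency comes from a small, hypothetical list of operating points;
- C_clk is an integer PLL divider and C_vdd a regulator step code.

All constants are placeholders.

```python
from typing import List, Tuple

PLL_MHZ = 4000.0    # hypothetical PLL output frequency
VREG_MIN_MV = 500   # hypothetical regulator floor
VREG_STEP_MV = 5    # hypothetical regulator step size

# Hypothetical operating points (freq_mhz, vdd_mv), ascending in frequency.
OPP: List[Tuple[float, int]] = [(1000.0, 650), (2000.0, 800), (4000.0, 1000)]


def target_point(cur_freq_mhz: float, scale: float) -> Tuple[float, int]:
    """Lowest operating point whose frequency covers cur_freq * scale."""
    wanted = cur_freq_mhz * scale
    for freq, vdd in OPP:
        if freq >= wanted:
            return freq, vdd
    return OPP[-1]  # clamp to the highest operating point


def clock_control(freq_mhz: float) -> int:
    """C_clk encoded as an integer PLL divider (hypothetical encoding)."""
    return max(1, round(PLL_MHZ / freq_mhz))


def voltage_control(vdd_mv: int) -> int:
    """C_vdd encoded as a regulator step code (hypothetical encoding)."""
    return max(0, round((vdd_mv - VREG_MIN_MV) / VREG_STEP_MV))


freq, vdd = target_point(2000.0, 0.5)  # governor asks to halve the frequency
print(freq, vdd, clock_control(freq), voltage_control(vdd))  # 1000.0 650 4 30
```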

The memory 700 may include the DVFS table 710. In some embodiments, the DVFS table 710 may include the power supply voltage (vdd) and the clock signal (clk) frequency corresponding to each workload. The DVFS table 710 may include values hard-coded in the memory 700 as well as soft-coded values that can be modified. The DVFS table 710 may be modified by the DVFS manager module 220. Although one DVFS table 710 is shown in FIG. 5, there may be a plurality of DVFS tables 710. For example, a plurality of DVFS tables 710 may be generated according to buffer capacity, response waiting time, and workload.
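A DVFS table that mixes hard-coded and soft-coded entries could be modeled as in the sketch below. The workload keys, the millivolt/megahertz units, and the numbers are placeholders, not values from the disclosure. A plurality of tables (for example, one per buffer-capacity or waiting-time range) could be held as several `DvfsTable` instances.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DvfsEntry:
    vdd_mv: int
    freq_mhz: int
    editable: bool  # False: hard-coded value, True: soft-coded (modifiable)


class DvfsTable:
    """Per-workload operating points; only soft-coded entries may be changed."""

    def __init__(self, entries: Dict[str, DvfsEntry]) -> None:
        self._entries = dict(entries)

    def lookup(self, workload: str) -> DvfsEntry:
        return self._entries[workload]

    def update(self, workload: str, vdd_mv: int, freq_mhz: int) -> None:
        old = self._entries[workload]
        if not old.editable:
            raise PermissionError(f"entry {workload!r} is hard-coded")
        self._entries[workload] = DvfsEntry(vdd_mv, freq_mhz, editable=True)


# Placeholder numbers standing in for (v1, f1) and (v2, f2) of FIG. 6B.
table = DvfsTable({"W1": DvfsEntry(700, 1200, editable=False),
                   "W2": DvfsEntry(800, 1500, editable=True)})
table.update("W2", 820, 1600)  # allowed: soft-coded entry
print(table.lookup("W2"))      # DvfsEntry(vdd_mv=820, freq_mhz=1600, editable=True)
```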

FIGS. 6A and 6B are, respectively, a graph and a table illustrating how a DVFS controller determines a scaling factor according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 6A, the graph of FIG. 6A may represent the relationship between the performance and the power consumption of the core 110. The power consumption (P) may satisfy the following [Equation 1].

[Equation 1]

$$P \propto V^{2} f$$

Here, V denotes the power supply voltage and f denotes the frequency of the core 110. The power consumption (P) is proportional to the square of the power supply voltage (V) and to the frequency (f) of the core 110, so it increases as either the power supply voltage (V) or the frequency (f) increases. The performance of the core 110 may depend on the voltage-frequency level. For example, as the power supply voltage (V) and the frequency (f) of the core 110 increase, the core 110 can operate faster and its performance may improve. Accordingly, the power consumption (P) and the performance of the core 110 may be proportional to each other.
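Equation 1 is a proportionality, so without the proportionality constant only ratios between operating points are meaningful. The small sketch below makes that relative comparison. The example voltages and frequencies are arbitrary.

```python
def relative_power(v_old: float, f_old: float, v_new: float, f_new: float) -> float:
    """P_new / P_old under P proportional to V^2 * f (the constant cancels out)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Example: 1.0 V at 2000 MHz scaled down to 0.8 V at 1500 MHz.
ratio = relative_power(1.0, 2000.0, 0.8, 1500.0)
print(f"{ratio:.2f}")  # prints: 0.48, i.e. about 52% less power
```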

Referring further to FIGS. 5 and 6B, the workload classification logic 210 may classify the workload of the core 110 as the first workload or the second workload, and the DVFS manager module 220 may determine a scaling factor for the voltage-frequency level based on the classified workload of the core 110. The table of FIG. 6B may be the DVFS table 710 of FIG. 5. In some embodiments, when the DVFS manager module 220 receives data (W1) indicating that the workload of the core 110 has been classified as the first workload, it may obtain from the memory 700 the DVFS table 710 containing the power supply voltage (v1) and the clock signal frequency (f1). The DVFS manager module 220 may determine the scaling factor for the voltage-frequency level based on the power supply voltage (v1) and the clock signal frequency (f1). For example, the first workload may be a memory-intensive workload. In a memory-intensive workload, cache misses occur during operation of the processor 100 and cause contention (conflicts) in the bus 500 or the memory 600. Such contention can be resolved by lowering the performance of the core 110 and thereby reducing the number of requests (Req). Because the performance of the core 110 is proportional to its power consumption, this also reduces power consumption. Accordingly, the DVFS manager module 220 may determine the scaling factor, based on the power supply voltage (v1) and the clock signal frequency (f1), such that the voltage-frequency level is decreased.

In some embodiments, referring further to FIG. 4, when the temperature detected by the temperature sensor 800 is greater than or equal to the threshold temperature, the DVFS manager module 220 may receive data (W2) indicating that the workload of the core 110 has been classified as the second workload. It may then obtain from the memory 700 the DVFS table 710 containing the power supply voltage (v2) and the clock signal frequency (f2). For example, the second workload may be a computing workload. A computing workload relates to instruction processing by the core 110 and refers to a situation in which no cache misses occur during operation of the processor 100. In that situation, unlike the first workload, there is no need to lower the performance of the core 110 to regulate the number of requests (Req). However, when the temperature of the core 110 is greater than or equal to the threshold temperature, raising the performance of the core 110 may cause a malfunction due to heat generation, so the performance of the core 110 may be lowered to reduce its temperature. Accordingly, the DVFS manager module 220 may determine the scaling factor, based on the power supply voltage (v2) and the clock signal frequency (f2), such that the voltage-frequency level is decreased.

The DVFS table 710 may also include power supply voltages and clock signal frequencies for cases other than the illustrated data (W1 or W2). For example, when the temperature detected by the temperature sensor 800 is below the threshold temperature, the DVFS manager module 220 may receive data (not shown) indicating that the workload of the core 110 has been classified as the second workload. It may then obtain from the memory 700 the DVFS table 710 containing a corresponding power supply voltage (not shown) and clock signal frequency (not shown). The second workload may be a computing workload for which the core 110 needs high performance to process instructions. Because the temperature of the core 110 is below the threshold temperature, raising the performance of the core 110 may not cause a malfunction due to heat generation. Accordingly, the DVFS manager module 220 may determine the scaling factor, based on that power supply voltage and clock signal frequency, such that the voltage-frequency level is increased.
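The three cases can be summarized as a small decision function:

- W1;
- W2 at or above the threshold temperature;
- W2 below the threshold temperature.

The disclosure does not fix the exact form of the scaling factor. Defining it here as the ratio of target to current clock frequency is an assumption for illustration, and the target frequencies are placeholders.

```python
from typing import Dict


def scaling_factor(workload: str, temp_c: float, t_crit_c: float,
                   cur_freq_mhz: float, table_mhz: Dict[str, float]) -> float:
    """Ratio of target to current clock frequency (>1 raises, <1 lowers V-f).

    table_mhz maps hypothetical keys 'W1', 'W2_hot', 'W2_cool' to target
    frequencies, standing in for entries of the DVFS table 710.
    """
    if workload == "W1":
        key = "W1"        # memory-intensive: throttle requests, save power
    elif temp_c >= t_crit_c:
        key = "W2_hot"    # compute-intensive but hot: cool the core down
    else:
        key = "W2_cool"   # compute-intensive and cool: raise performance
    return table_mhz[key] / cur_freq_mhz


targets = {"W1": 1200.0, "W2_hot": 1500.0, "W2_cool": 2400.0}
for wl, temp in [("W1", 60.0), ("W2", 90.0), ("W2", 60.0)]:
    print(wl, temp, scaling_factor(wl, temp, 85.0, 2000.0, targets))
```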

Because the power consumption (P) and the performance of the core 110 may be proportional to each other, the DVFS manager module 220 may determine the scaling factor differently depending on the workload of the core 110. This allows both the power consumption and the performance of the core 110 to be managed efficiently.

FIG. 7 is a block diagram illustrating an implementation of an integrated circuit according to an exemplary embodiment of the present disclosure.

Referring to FIGS. 1 and 7, the integrated circuit 10c may include at least one processor 100 and a DVFS controller 200b. In some embodiments, the processor 100 of FIG. 7 may be the same as the processor 100 of FIG. 1. Descriptions overlapping those of FIG. 1 are omitted.

The DVFS controller 200b may include at least one processor 280, a memory 250, an artificial intelligence (AI) accelerator 260, and a hardware accelerator 270. The at least one processor 280 may execute instructions. For example, by executing instructions stored in the memory 250, the at least one processor 280 may run an operating system as well as applications on that operating system. In some embodiments, by executing instructions, the at least one processor 280 may assign tasks to the AI accelerator 260 and/or the hardware accelerator 270 and obtain the results of those tasks from them. In some embodiments, the at least one processor 280 may be an application-specific instruction set processor (ASIP) customized for a specific purpose and may support a dedicated instruction set.

The memory 250 may have any structure capable of storing data. For example, the memory 250 may include a volatile memory device, such as dynamic random access memory (DRAM) or static random access memory (SRAM), or a non-volatile memory device, such as flash memory or resistive random access memory (RRAM).

The AI accelerator 260 may refer to hardware designed for AI applications. In some embodiments, the AI accelerator 260 may include a neural processing unit (NPU) for implementing a neuromorphic structure. It may generate output data by processing input data provided by the at least one processor 280 and/or the hardware accelerator 270, and may provide that output data to the at least one processor 280 and/or the hardware accelerator 270. In some embodiments, the AI accelerator 260 may be programmable and may be programmed by the at least one processor 280 and/or the hardware accelerator 270.

The hardware accelerator 270 may refer to hardware designed to perform a specific task at high speed. For example, the hardware accelerator 270 may be designed to perform data conversion such as demodulation, modulation, encoding, and decoding at high speed. The hardware accelerator 270 may be programmable and may be programmed by the at least one processor 280 and/or the AI accelerator 260.

In some embodiments, the AI accelerator 260 may execute an artificial neural network model. For example, referring also to FIG. 1, the memory 250 may store training data. The training data may include the buffer capacity and response latency that the monitor 130 obtains by monitoring the shared buffer 120. It may also include the scaling factor that the DVFS controller 200 determines from that buffer capacity and response latency. The AI accelerator 260 may execute the artificial neural network model, and the processor 280 may train the model using the training data. After training is complete, the processor 280 may receive a buffer capacity and a response latency from the monitor 130. When these correspond to a buffer capacity and response latency stored in the memory 250, the trained artificial neural network model may be used to determine the corresponding scaling factor. Once the model has been trained, the DVFS controller 200 can determine the scaling factor without classifying the workload of the core 110. It can therefore generate the clock control signal and the voltage control signal more quickly.

In some embodiments, the training data may include an address, and the processor 280 may train the artificial neural network model using training data that includes the address. For example, the request (Req) that the shared buffer 120 receives from the core 110 may include an address, which specifies the location of a particular region of the memory 600. After training is complete, the processor 280 may receive an address from the monitor 130 and determine the scaling factor corresponding to that address. The training data may further include data other than that described above.
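As a rough illustration of the training data described above, the following Python sketch models one training record and a bounded store standing in for the memory 250. Each record holds the buffer capacity, the response latency, the request address, and the scaling factor chosen by the rule-based controller. Several details are assumptions made for illustration, because the disclosure does not specify a data layout: the class names, the normalized field units, the oldest-first eviction policy, and encoding the address as a coarse memory-region index.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSample:
    """One hypothetical training record kept in memory 250."""
    buffer_capacity: float      # filled fraction of the shared buffer, 0.0 to 1.0
    response_latency: float     # normalized response latency, 0.0 to 1.0 (assumed)
    address: int                # target address carried by the request (Req)
    scaling_factor: float       # factor chosen by the rule-based DVFS controller

    def features(self, region_bits: int = 20) -> list[float]:
        # Assumed encoding: drop the low address bits so that requests to the
        # same memory region share one feature value.
        return [self.buffer_capacity, self.response_latency,
                float(self.address >> region_bits)]


class TrainingStore:
    """Bounded store standing in for memory 250; oldest samples are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._samples: deque[TrainingSample] = deque(maxlen=capacity)

    def record(self, sample: TrainingSample) -> None:
        self._samples.append(sample)

    def __len__(self) -> int:
        return len(self._samples)

    def dataset(self) -> tuple[list[list[float]], list[float]]:
        """Return (inputs, targets) ready for model training."""
        xs = [s.features() for s in self._samples]
        ys = [s.scaling_factor for s in self._samples]
        return xs, ys
```

A bounded store keeps memory use fixed while favouring recent behaviour. Whether the actual memory 250 retains all samples or only recent ones is not stated.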

FIG. 8 is a flowchart illustrating a method of operating an integrated circuit according to an example embodiment of the present disclosure. As shown in FIG. 8, the method 900 of operating an integrated circuit may include a plurality of steps S110 to S150.

Referring to FIGS. 1 and 8, in step S110, the shared buffer 120 may be monitored. This yields the buffer capacity of the shared buffer 120 and the response latency for a response that the shared buffer 120 receives from external memory. In an example embodiment, the buffer capacity of the shared buffer 120 may indicate how much of the capacity of the plurality of blocks 121 is filled. The response latency for a response received from external memory may be the time from when the shared buffer 120 accesses (Acc) the memory 600 through the bus 500 until it receives a response (Res) from the memory 600.
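The two quantities obtained in step S110 can be sketched in Python as follows. The model assumes equal-sized blocks 121 and counts the occupied entries in each block. It timestamps each access (Acc), so that latency is the response (Res) time minus the access time. The full-state test follows the criterion of claim 4: more than a critical number of blocks each using more than a critical capacity. The class name, the default threshold values, and the integer time unit are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBufferMonitor:
    """Hypothetical software model of monitor 130 observing shared buffer 120."""
    block_size: int                      # entries per block 121 (assumed uniform)
    block_fill: list[int]                # currently occupied entries in each block
    critical_fill: float = 0.75          # a block is "used" above this fill ratio
    critical_blocks: int = 3             # buffer is "full" above this many used blocks
    issue_time: dict[int, int] = field(default_factory=dict)  # request id -> Acc time
    latencies: list[int] = field(default_factory=list)

    def buffer_capacity(self) -> float:
        """Filled fraction of all blocks."""
        total = self.block_size * len(self.block_fill)
        return sum(self.block_fill) / total if total else 0.0

    def is_full(self) -> bool:
        """Full when more than critical_blocks blocks exceed the critical capacity."""
        limit = self.critical_fill * self.block_size
        used = sum(1 for n in self.block_fill if n > limit)
        return used > self.critical_blocks

    def on_access(self, req_id: int, now: int) -> None:
        # The shared buffer accesses (Acc) memory 600 over bus 500.
        self.issue_time[req_id] = now

    def on_response(self, req_id: int, now: int) -> int:
        # The response (Res) arrives; latency is measured from the matching access.
        latency = now - self.issue_time.pop(req_id)
        self.latencies.append(latency)
        return latency
```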

In step S130, the workload of the core 110 may be classified based on the buffer capacity and the response latency. In an example embodiment, if both the buffer capacity and the response latency are greater than or equal to their thresholds, the DVFS controller 200 may classify the workload of the core 110 as a first workload. If either the buffer capacity or the response latency is below its threshold, the DVFS controller 200 may classify the workload of the core 110 as a second workload.

In step S150, a scaling factor for the voltage-frequency level of the core 110 may be determined based on the classified workload of the core 110. In an example embodiment, when the workload of the core 110 is classified as the first workload, the scaling factor may be determined so that the voltage-frequency level is decreased. When the workload of the core 110 is classified as the second workload, the scaling factor may be determined so that the voltage-frequency level is increased.
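Steps S130 and S150 reduce to a threshold comparison followed by a choice of scaling direction. The Python sketch below treats the scaling factor as a multiplier on the current voltage-frequency level: a value below 1 lowers the level and a value above 1 raises it. That convention, the function names, and the fixed step size are assumptions, since the disclosure states only the direction of the change.

```python
from enum import Enum


class Workload(Enum):
    FIRST = "first"    # memory-bound: buffer filling up and responses slow
    SECOND = "second"  # everything else


def classify(buffer_capacity: float, latency: float,
             capacity_threshold: float, latency_threshold: float) -> Workload:
    """Step S130: first workload only if both metrics reach their thresholds."""
    if buffer_capacity >= capacity_threshold and latency >= latency_threshold:
        return Workload.FIRST
    return Workload.SECOND


def scaling_factor(workload: Workload, step: float = 0.1) -> float:
    """Step S150: lower the V-F level for the first workload, raise it otherwise."""
    return 1.0 - step if workload is Workload.FIRST else 1.0 + step
```

The text implies, but does not state, the rationale for lowering the level on the first workload. A core that mostly waits on the memory 600 gains little from a faster clock, so slowing it should save power at a modest performance cost.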

FIG. 9 is a flowchart illustrating an example implementation of a method of operating an integrated circuit according to an example embodiment of the present disclosure. Step S130 of FIG. 8 may be implemented as shown in FIG. 9. The example implementation S130 may include a plurality of steps S131 to S135.

Referring to FIGS. 1 and 9, the example implementation S130 shows the DVFS controller 200 operating in one of three modes. In step S131, it may be determined whether the buffer capacity indicates a full state and whether the response latency is longer than a threshold time th2. If the buffer is full and the response latency is longer than the threshold time th2, the DVFS controller 200 may operate in a memory-intensive workload mode in step S132. For example, in the memory-intensive workload mode, the DVFS controller 200 may classify the workload of the core 110 as a memory-intensive workload. It may then determine the scaling factor so that the voltage-frequency level is decreased.

If the buffer is not full or the response latency is shorter than the threshold time th2, the DVFS controller 200 may receive temperature information T of the core 110 from the temperature sensor of FIG. 4 in step S133. If the temperature T of the core 110 is higher than a threshold temperature th3, the DVFS controller 200 may operate in a computing workload mode in step S134. For example, in the computing workload mode, the DVFS controller 200 may classify the workload of the core 110 as a computing workload and determine the scaling factor so that the voltage-frequency level is decreased. If the temperature T of the core 110 is not higher than the threshold temperature th3, the DVFS controller 200 may operate in a normal workload mode in step S135. For example, in the normal workload mode, the DVFS controller 200 may classify the workload of the core 110 as a normal workload and determine the scaling factor so that the voltage-frequency level is increased.

The operating mode of the DVFS controller 200 depends on the buffer capacity and the response latency, and the scaling factor is determined differently for each mode. Power consumption can therefore be adjusted to suit the situation. For example, the scaling factor that the DVFS controller 200 determines in the memory-intensive workload mode may lower the voltage-frequency level further than the one it determines in the computing workload mode.
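The three-mode decision of FIG. 9 can be sketched as follows. The temperature is read through a callable, so that the sensor is consulted only when the memory-intensive test fails, as in step S133. The numeric scaling factors are hypothetical. Only their ordering reflects the text: memory-intensive below computing, both below 1, and normal above 1.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    MEMORY_INTENSIVE = "memory-intensive"   # S132
    COMPUTING = "computing"                 # S134
    NORMAL = "normal"                       # S135


# Hypothetical multipliers on the current V-F level.
SCALING = {Mode.MEMORY_INTENSIVE: 0.7, Mode.COMPUTING: 0.9, Mode.NORMAL: 1.1}


def select_mode(buffer_full: bool, latency: float, th2: float,
                read_temperature: Callable[[], float], th3: float) -> Mode:
    # S131: full buffer and slow responses -> memory-intensive workload.
    if buffer_full and latency > th2:
        return Mode.MEMORY_INTENSIVE
    # S133: only now read the core temperature T from the temperature sensor.
    if read_temperature() > th3:
        return Mode.COMPUTING
    return Mode.NORMAL
```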

FIG. 10 is a flowchart illustrating an example implementation of a method of operating an integrated circuit according to an example embodiment of the present disclosure. As shown in FIG. 10, the example implementation 1000 may include a plurality of steps S210 to S250.

Referring to FIGS. 7 and 10, in step S210, the memory 250 may store training data. In an example embodiment, the memory 250 may receive the training data from the monitor 130 and store it. The training data may include the buffer capacity and response latency that the monitor 130 obtains by monitoring the shared buffer 120. It may also include the scaling factor that the DVFS controller 200 determines from that buffer capacity and response latency.

In step S230, the artificial neural network model may be trained using the training data. In an example embodiment, the AI accelerator 260 may execute the artificial neural network model, and the processor 280 may train it using the training data.

In step S250, after training is complete, the processor 280 may determine the scaling factor corresponding to the buffer capacity and response latency received from the monitor 130. In an example embodiment, the processor 280 may receive the buffer capacity and response latency from the monitor 130. When these correspond to a buffer capacity and response latency stored in the memory 250, the trained artificial neural network model may be used to determine the corresponding scaling factor. Once the model has been trained, the DVFS controller 200 can determine the scaling factor without classifying the workload of the core 110. It can therefore generate the clock control signal and the voltage control signal more quickly.
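Steps S210 to S250 can be illustrated with a deliberately small Python sketch. The disclosure does not specify the network architecture, so a single linear neuron trained by batch gradient descent stands in for the artificial neural network model. The inputs are assumed to be buffer capacity and response latency, both normalized to [0, 1]. The text also leaves open what it means for a received observation to correspond to stored data. Here that is assumed to mean the observation lies within the range of the training inputs, with a hypothetical rule-based fallback otherwise.

```python
from __future__ import annotations

from typing import Callable


class LinearNeuronModel:
    """A single linear neuron y = w.x + b: a deliberately tiny stand-in for the ANN."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lo: list[float] = []   # per-feature minimum seen in training
        self.hi: list[float] = []   # per-feature maximum seen in training

    def predict(self, x: list[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, xs: list[list[float]], ys: list[float],
            lr: float = 0.5, epochs: int = 2000) -> None:
        """Step S230: batch gradient descent on half the mean squared error."""
        n = len(xs)
        for _ in range(epochs):
            errs = [self.predict(x) - y for x, y in zip(xs, ys)]
            grad_w = [sum(e * x[j] for e, x in zip(errs, xs)) / n
                      for j in range(len(self.w))]
            grad_b = sum(errs) / n
            self.w = [wi - lr * g for wi, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b
        # Remember the range of the stored training inputs (step S210 data).
        self.lo = [min(col) for col in zip(*xs)]
        self.hi = [max(col) for col in zip(*xs)]

    def covers(self, x: list[float]) -> bool:
        """Assumed reading of 'corresponds to stored data': inside the trained range."""
        return bool(self.lo) and all(lo <= xi <= hi
                                     for lo, xi, hi in zip(self.lo, x, self.hi))


def decide(model: LinearNeuronModel, x: list[float],
           fallback: Callable[[list[float]], float]) -> float:
    """Step S250: use the trained model when it applies, else the rule-based path."""
    return model.predict(x) if model.covers(x) else fallback(x)
```

A real model would likely be nonlinear and run on the AI accelerator 260. The point here is only the flow: store samples, train once, then replace per-sample classification with a single model evaluation.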

FIG. 11 is a block diagram illustrating a system according to an example embodiment of the present disclosure.

Referring to FIG. 11, the system 30 may be implemented as a handheld device. Examples include a mobile phone, a smartphone, a tablet computer, a personal digital assistant (PDA), an enterprise digital assistant (EDA), a digital still camera, a digital video camera, a portable multimedia player (PMP), a personal or portable navigation device (PND), a handheld game console, or an e-book reader.

The system 30 may include an SoC 3100 and a memory device 3200. The SoC 3100 may include the following components:

- a central processing unit (CPU) 3110
- a graphics processing unit (GPU) 3120
- a neural processing unit (NPU) 3130
- an image signal processor (ISP) 3140
- a memory interface (MIF) 3150
- a clock management unit (CMU) 3160
- a power management unit (PMU) 3170

The CPU 3110, GPU 3120, NPU 3130, ISP 3140, and MIF 3150 may each be an example implementation of the integrated circuit 10 described above with reference to FIGS. 1 to 10. Accordingly, each of them may include a monitor 130 and a DVFS controller 200. The DVFS controller 200 may perform a DVFS operation based on the buffer capacity and response latency monitored by the monitor 130.

The CPU 3110 may process or execute instructions and/or data stored in the memory device 3200 in response to a clock signal generated by the CMU 3160.

The GPU 3120 may obtain image data stored in the memory device 3200 in response to a clock signal generated by the CMU 3160. From the image data provided by the MIF 3150, the GPU 3120 may generate data for an image to be output through a display device (not shown). It may also encode the image data.

The NPU 3130 may refer to any device that executes a machine learning model, and may be a hardware block designed for that purpose. The machine learning model may be based on an artificial neural network, a decision tree, a support vector machine, regression analysis, a Bayesian network, a genetic algorithm, or the like. Non-limiting examples of artificial neural networks include:

- a convolutional neural network (CNN)
- a region with convolutional neural network (R-CNN)
- a region proposal network (RPN)
- a recurrent neural network (RNN)
- a stacking-based deep neural network (S-DNN)
- a state-space dynamic neural network (S-SDNN)
- a deconvolution network
- a deep belief network (DBN)
- a restricted Boltzmann machine (RBM)
- a fully convolutional network
- a long short-term memory (LSTM) network
- a classification network

The ISP 3140 may perform signal processing on raw (RAW) data received from an image sensor (not shown) located outside the SoC 3100, and may generate digital data with improved image quality.

The MIF 3150 may provide an interface to the memory device 3200 located outside the SoC 3100. The memory device 3200 may be dynamic random access memory (DRAM), phase-change random access memory (PRAM), resistive random access memory (ReRAM), or flash memory.

The CMU 3160 may generate a clock signal and provide it to the components of the SoC 3100. The CMU 3160 may include a clock generation device such as a phase-locked loop (PLL), a delay-locked loop (DLL), or a crystal oscillator. The PMU 3170 may convert external power into internal power and supply that internal power to the components of the SoC 3100.
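The Python sketch below shows how a scaling factor might become the voltage and clock control signals handled by the PMU 3170 and CMU 3160. It keeps one independent DVFS domain per block, for example the CPU 3110 or GPU 3120, since the text says each may have its own DVFS controller 200. The operating-point table, its values, and the rule of snapping to the lowest level that meets the scaled frequency are assumptions. The ordering of the two control signals follows common DVFS practice rather than anything stated in the disclosure: raise the voltage before raising the frequency, and lower the frequency before lowering the voltage.

```python
from dataclasses import dataclass

# Hypothetical operating points (voltage in mV, frequency in MHz), lowest first.
OPP_TABLE = [(600, 400), (700, 800), (800, 1200), (900, 1600), (1000, 2000)]


@dataclass
class DvfsDomain:
    """One independently scaled block, e.g. CPU 3110 or GPU 3120."""
    name: str
    level: int = 2  # index into OPP_TABLE

    def apply(self, scaling_factor: float) -> list[tuple[str, int]]:
        """Move to the lowest level whose frequency meets the scaled target.

        Returns the control steps in issue order: ("PMU", mV) stands for the
        voltage control signal and ("CMU", MHz) for the clock control signal.
        """
        old_freq = OPP_TABLE[self.level][1]
        target = old_freq * scaling_factor
        # Lowest level that still meets the target; clamp to the top level.
        self.level = next((i for i, (_, f) in enumerate(OPP_TABLE) if f >= target),
                          len(OPP_TABLE) - 1)
        volt, freq = OPP_TABLE[self.level]
        if freq > old_freq:
            # Scaling up: raise the supply voltage before speeding up the clock.
            return [("PMU", volt), ("CMU", freq)]
        # Scaling down or unchanged: slow the clock before lowering the voltage.
        return [("CMU", freq), ("PMU", volt)]
```

Because the rule snaps upward, a factor of 0.9 applied at the 1200 MHz level leaves the level unchanged: the 1080 MHz target still maps to 1200 MHz. A practical governor would probably use finer steps or accumulate requests over time.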

FIG. 12 is a block diagram illustrating a communication device including an application processor according to an example embodiment of the present disclosure.

Referring to FIG. 12, the communication device 40 may include an application processor 4010, a memory device 4020, a display 4030, an input device 4040, and a wireless transceiver 4050. The application processor 4010 may be an example implementation of the integrated circuit 10 described above with reference to FIGS. 1 to 11.

The wireless transceiver 4050 may transmit and receive wireless signals through an antenna 4060. For example, the wireless transceiver 4050 may convert a wireless signal received through the antenna 4060 into a signal that the application processor 4010 can process.

Accordingly, the application processor 4010 may process the signal output from the wireless transceiver 4050 and transmit the processed signal to the display 4030. In addition, the wireless transceiver 4050 may convert a signal output from the application processor 4010 into a wireless signal and output it to an external device through the antenna 4060.

The input device 4040 is a device for inputting control signals that control the operation of the application processor 4010, or data to be processed by the application processor 4010. It may be implemented as a pointing device such as a touch pad or a computer mouse, as a keypad, or as a keyboard.

The application processor 4010 may further include the monitor 130 and the DVFS controller 200 according to an embodiment of the present disclosure. The DVFS controller 200 may perform a DVFS operation based on the buffer capacity and response latency monitored by the monitor 130.

Although not shown in FIG. 12, the communication device 40 may further include a clock management unit and a power management unit. The clock management unit provides clock signals, and the power management unit provides a power supply voltage, to the various components of the communication device 40.

As described above, the integrated circuit 10 according to the present disclosure can classify the workload of a core and determine a scaling factor for the voltage-frequency level based on the classified workload. Power consumption can therefore be adjusted efficiently to suit the situation.

Example embodiments have been disclosed in the drawings and specification as described above. Specific terms have been used herein to describe the embodiments, but only to explain the technical idea of the present disclosure. They are not intended to limit the meaning of the disclosure or the scope of the present disclosure as set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible.

Claims

1. An integrated circuit comprising:
at least one core configured to process instructions according to a voltage-frequency level;
a shared buffer that receives a request from the at least one core, accesses an external memory according to the request, and receives a response from the external memory;
a monitor that monitors the shared buffer to obtain a buffer capacity of the shared buffer and a response waiting time for the response received from the external memory; and
a Dynamic Voltage and Frequency Scaling (DVFS) controller that receives the buffer capacity and the response waiting time from the monitor, classifies a workload of the at least one core based on the buffer capacity and the response waiting time, and determines a scaling factor for the voltage-frequency level based on the classified workload.

2. The integrated circuit of claim 1, wherein the DVFS controller classifies the workload of the at least one core as a first workload or a second workload based on the buffer capacity and the response waiting time, and
the first workload includes more requests to access the external memory than the second workload.

3. The integrated circuit of claim 2, wherein the DVFS controller classifies the workload of the at least one core as the first workload when the buffer capacity indicates a full state and the response waiting time is longer than a threshold time.

4. The integrated circuit of claim 3, wherein the shared buffer includes a plurality of blocks, and
the full state of the buffer capacity indicates a state in which more than a critical number of the plurality of blocks each use more than a critical capacity.

5. The integrated circuit of claim 2, further comprising a temperature sensor that measures a temperature of the at least one core and transmits the measured temperature to the DVFS controller,
wherein, when the buffer capacity is not full or the response waiting time is shorter than a threshold time, the DVFS controller classifies the workload of a core whose measured temperature is higher than a threshold temperature as the second workload.

6. The integrated circuit of claim 1, wherein the DVFS controller further comprises:
a memory that stores training data including the buffer capacity, the response waiting time, and the scaling factor; and
a processor that trains an artificial neural network model using the training data,
wherein, after the training is completed, the processor uses the trained artificial neural network model to determine a scaling factor corresponding to the buffer capacity and the response waiting time received from the monitor.

7. A method of operating an integrated circuit, the method comprising:
monitoring a shared buffer to obtain a buffer capacity of the shared buffer and a response waiting time for a response that the shared buffer receives from an external memory;
classifying a workload of a core based on the buffer capacity and the response waiting time; and
determining a scaling factor for a voltage-frequency level of the core based on the classified workload of the core.

8. A computing system comprising:
a processor;
at least one memory;
a bus connecting the processor and the at least one memory;
a DVFS controller that classifies a workload of at least one core based on a buffer capacity of a shared buffer and a response waiting time for a response received from the bus, determines a scaling factor based on the classified workload, and generates a voltage control signal and a clock control signal based on the determined scaling factor;
a power management unit that adjusts a magnitude of a power supply voltage provided to the at least one core in response to the voltage control signal; and
a clock management unit that adjusts a frequency of a clock signal provided to the at least one core in response to the clock control signal,
wherein the processor comprises:
the at least one core, configured to process instructions according to the magnitude of the power supply voltage and the frequency of the clock signal;
the shared buffer, which receives a request from the at least one core, accesses the bus according to the request, and receives the response from the bus; and
a monitor that monitors the shared buffer to obtain the buffer capacity and the response waiting time.

9. The computing system of claim 8, wherein the DVFS controller further comprises:
workload classification logic that classifies the workload of the at least one core;
a DVFS governor module that determines the scaling factor;
a power management unit driver that generates a power supply voltage control signal based on the determined scaling factor; and
a clock management unit driver that generates a frequency control signal based on the determined scaling factor.

10. The computing system of claim 8, wherein the monitor further monitors an identifier of the at least one core,
the DVFS controller identifies a core according to the identifier, and classifies the workload of the identified core as a first workload or a second workload based on the buffer capacity and the response waiting time, and
the first workload includes more requests to access the at least one memory than the second workload.