KR20150009823A

KR20150009823A - Method of predicting computer processor performance and method of adjusting frequency using the method

Info

Publication number: KR20150009823A
Application number: KR1020130084229A
Authority: KR
Inventors: 염헌영; 김신규; 서동유
Original assignee: 서울대학교산학협력단
Priority date: 2013-07-17
Filing date: 2013-07-17
Publication date: 2015-01-27
Also published as: KR101543074B1

Abstract

According to an embodiment of the present invention, a method for predicting a change in the performance of a computer processor under dynamic voltage and frequency scaling (DVFS) comprises the steps of: calculating a stall-state time (T_stall(f)) at a processor operating speed (f); and calculating a total execution time (T_total(f)) based on the calculated stall-state time, wherein the stall-state time (T_stall(f)) is the time during which the processor's computation is halted until a response returns from the computer's main memory.

Description

TECHNICAL FIELD The present invention relates to a method for predicting a performance change of a computer processor under dynamic voltage and frequency scaling (DVFS) and to a method for adjusting processor operating speed using the same.

The present invention relates to a method for predicting a change in processor performance and a method for adjusting processor operating speed. More particularly, it relates to a method for predicting the performance change of a computer processor under dynamic voltage and frequency scaling (DVFS), and a method for adjusting processor operating speed using the same, which can dynamically predict the execution time of a load at the current processor operating speed and accordingly set the processor to the operating speed that yields the minimum energy consumption.

Dynamic voltage and frequency scaling (DVFS) is a technique that dynamically adjusts the speed of a computer processor. In general, a processor running at a lower operating speed can be driven at a lower voltage, and lowering the voltage reduces power draw, so the voltage can be lowered together with the operating speed. DVFS is thus used to reduce energy consumption under given conditions by adjusting the processor's voltage and operating frequency (operating speed).

The relationship between a processor's operating frequency and its energy consumption can be represented as in FIG. 1. FIG. 1 is a graph showing the relationship among execution time, power, and energy consumption as a function of processor operating speed. Referring to FIG. 1(a), when the DVFS level is raised (that is, the operating frequency increases), the processor runs faster and the execution time decreases, but the power (W) drawn increases. Conversely, when the DVFS level is lowered (that is, the operating frequency decreases), the power drawn decreases but the execution time increases.

In general, a processor's energy consumption (Wh) is the product of power (W) and time (h), and is therefore defined as in Equation 1 below.

Energy consumption (Wh) = Power (W) × Time (h)   (Equation 1)

Since the processor's power and execution time vary with operating frequency as shown in FIG. 1(a), Equation 1 gives the processor's energy consumption as a function of operating frequency as shown in FIG. 1(b). That is, as the frequency increases, energy consumption gradually decreases, and then above a certain frequency it increases again.

However, this energy-consumption pattern depends on the characteristics of the workload (for example, a program) executed by the processor. For a memory-intensive workload, the bandwidth of the memory controller becomes the bottleneck, so lowering the processor's DVFS level lengthens the execution time less than it does for a CPU-intensive workload. Power increases in proportion to the DVFS level, but the change in execution time depends on the workload characteristics as described above and is therefore not always proportional to the change in DVFS level.

Therefore, there is a need for a method that can dynamically predict the execution time of a workload at a given processor operating speed and reduce energy consumption accordingly.

According to an embodiment of the present invention, there is provided a method of dynamically checking the state of the main-memory bandwidth and predicting the execution time of a load at each DVFS level.

According to an embodiment of the present invention, there is provided a method of predicting the energy consumption at each DVFS level from the predicted execution time of the load, and adjusting the processor operating speed so as to obtain the minimum energy consumption.

According to an embodiment of the present invention, there may be provided a method for predicting a change in the performance of a computer processor under dynamic voltage and frequency scaling (DVFS), the method comprising: calculating a stall-state time (T_stall(f)) at a processor operating speed (f); and calculating a total execution time (T_total(f)) based on the stall-state time, wherein the stall-state time (T_stall(f)) is the time during which the processor's computation is halted until a response returns from the computer's main memory.

According to an embodiment of the present invention, there may be provided a method for adjusting the operating speed of a computer processor under dynamic voltage and frequency scaling (DVFS), the method comprising: calculating the total execution time (T_total(f)) of the processor by the above processor performance-change prediction method; calculating the energy consumption of the processor as a function of operating speed (f) based on the total execution time (T_total(f)); and predicting, from the calculated energy consumption, the operating speed having the minimum energy consumption.

According to an embodiment of the present invention, there may be provided a computer-readable recording medium on which a program for executing the above method on a computer is recorded.

According to an embodiment of the present invention, the state of the main-memory bandwidth can be checked dynamically and the execution time of a load at each DVFS level can be predicted.

According to an embodiment of the present invention, the energy consumption at each DVFS level can be predicted from the predicted execution time of the load, and the processor operating speed can thereby be adjusted to obtain the minimum energy consumption.

FIG. 1 is a graph showing the relationship among execution time, power, and energy consumption as a function of the operating speed of a computer processor.
FIG. 2 is a flowchart illustrating a method of predicting a performance change of a computer processor under DVFS and a method of adjusting processor operating speed, according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the execution behavior of a processor.
FIG. 4 is a graph for explaining how the main-memory response time changes with main-memory traffic and processor operating speed.
FIG. 5 is a flowchart for explaining a method of obtaining the stall-state time (T_stall(f)) according to an embodiment.
FIG. 6 is a diagram for explaining the flowchart of FIG. 5.
FIG. 7 is a diagram for explaining pseudocode corresponding to the flowchart of FIG. 5.
FIG. 8 is a diagram showing experimental results of performance-change prediction when the processor operating speed is adjusted according to an embodiment of the present invention.

The above objects, other objects, features, and advantages of the present invention will be readily understood from the following preferred embodiments described with reference to the accompanying drawings. However, the present invention is not limited to the embodiments described herein and may be embodied in other forms. Rather, the embodiments introduced here are provided so that the disclosure will be thorough and complete and will fully convey the spirit of the invention to those skilled in the art.

In this specification, when an element is referred to as being on another element, it may be formed directly on the other element, or a third element may be interposed between them. In the drawings, the thicknesses of elements are exaggerated for effective description of the technical content.

Where terms such as first and second are used in this specification to describe elements, the elements should not be limited by these terms. The terms are used only to distinguish one element from another. The embodiments described and illustrated herein also include their complementary embodiments.

In this specification, singular forms include plural forms unless the wording specifically states otherwise. The terms "comprise" and/or "comprising" as used in the specification do not exclude the presence or addition of one or more elements other than those mentioned.

In this specification, the operating speed of a processor is the same concept as its operating frequency, so the processor's operating speed is referred to below as "operating speed", "operating frequency", or "frequency".

Hereinafter, the present invention is described in detail with reference to the drawings. In describing the specific embodiments below, various specific details are set forth to explain the invention more concretely and to aid understanding. However, a reader with enough knowledge of this field to understand the invention will recognize that it can be practiced without these specific details. Note also that parts that are commonly known and not closely related to the invention are omitted to avoid confusion in explaining the invention.

An embodiment of the present invention provides a method of dynamically checking the state of the main-memory bandwidth and predicting the execution time of a workload at each DVFS level. It also provides a method of predicting the energy consumption at each DVFS level from the predicted execution time of the workload, and adjusting the processor operating speed so as to obtain the minimum energy consumption.

As one embodiment for this purpose, FIG. 2 shows an exemplary flowchart of a method for predicting the performance change of a computer processor under DVFS and adjusting the processor operating speed.

Referring to FIG. 2, predicting the performance change of a computer processor under dynamic voltage and frequency scaling (DVFS) may include a step (S110) of calculating the total execution time of the processor and a step (S120) of calculating the energy consumption of the processor as a function of processor operating speed.

First, in step S110, the total execution time (T_total(f)) of the processor for executing a given workload (program) is calculated. As described later with reference to FIG. 2, the total execution time (T_total(f)) can be calculated by computing the processor's computation-state time and stall-state time separately and then adding them.

Then, in step S120, the processor's energy consumption as a function of operating speed is calculated based on the calculated total execution time (T_total(f)). That is, step S110 yields the values corresponding to the execution-time curve in FIG. 1(a), and the power at each operating frequency can be obtained by measurement, so multiplying these two values gives the energy consumption at each operating frequency, as in FIG. 1(b).

The present invention may also include a method of adjusting the operating speed of the computer processor under DVFS based on the energy consumption calculated in step S120. To this end, the method may further include a step (S130) of predicting, from the energy consumption calculated in step S120, the operating speed having the minimum energy consumption, and a step (S140) of operating the processor at the predicted operating speed having the minimum energy consumption.
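Steps S120 and S130 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the dictionary layout, and all numeric values are hypothetical, and the per-frequency power is assumed to have been measured beforehand as the text describes.

```python
from typing import Dict


def energy_by_frequency(total_time_s: Dict[float, float],
                        power_w: Dict[float, float]) -> Dict[float, float]:
    """Energy (Wh) per frequency: measured power (W) x predicted time (s) / 3600."""
    return {f: power_w[f] * total_time_s[f] / 3600.0 for f in total_time_s}


def select_min_energy_frequency(total_time_s: Dict[float, float],
                                power_w: Dict[float, float]) -> float:
    """Return the frequency whose predicted energy consumption is smallest (S130)."""
    energy = energy_by_frequency(total_time_s, power_w)
    return min(energy, key=energy.get)


# Hypothetical inputs: predicted T_total(f) in seconds and measured power in watts,
# keyed by operating frequency in GHz.
example_time_s = {1.2: 100.0, 1.8: 72.0, 2.6: 60.0}
example_power_w = {1.2: 40.0, 1.8: 50.0, 2.6: 75.0}
best_frequency = select_min_energy_frequency(example_time_s, example_power_w)
```

With these made-up values, the middle frequency wins because the lowest one runs too long and the highest one draws too much power, which mirrors the shape of FIG. 1(b).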

Hereinafter, a method of calculating the total execution time (T_total(f)) according to an embodiment is described with reference to FIGS. 3 to 7.

1. Total execution time T_total(f)

To predict the performance change of a processor according to the present invention, the total execution time (T_total(f)) that the processor needs to execute a specific workload at an arbitrary operating speed (f) must be obtained.

FIG. 3 is a diagram for explaining the execution behavior of a processor and shows the execution model of a typical processor.

Referring to FIG. 3, the execution time of a workload can be divided into two alternating states, "compute" and "stall". The processor remains in the compute state while it processes ready instructions. While processing instructions, a processor in the compute state requests the data it needs from main memory, and when all ready instructions are exhausted it switches from the compute state to the stall state. A stalled processor waits until the response returns from main memory. When the response returns, the processor immediately starts processing the next instructions and switches back to the compute state. As shown in FIG. 3, the processor therefore alternates repeatedly between the compute state and the stall state.

Based on this processor behavior, in an embodiment of the present invention the total execution time (T_total(f)) of the processor is expressed as in Equation 2 below.

T_total(f) = T_comp(f) + T_stall(f)   (Equation 2)

Here, T_comp(f) is the computation-state time at operating speed (f), that is, the time the processor spends executing the given workload, and T_stall(f) is the stall-state time at operating speed (f), that is, the time during which the processor's computation is halted until a response returns from main memory.

As Equation 2 shows, to find the total execution time (T_total(f)) at processor operating speed (f), it suffices to obtain the computation-state time (T_comp(f)) and the stall-state time (T_stall(f)) at that operating speed.

2. Calculation of the computation-state time T_comp(f)

The number of cycles the processor spends in the compute state to process instructions does not change when the processor operating speed (f) changes. The computation-state time at operating speed (f) can therefore be expressed as in Equation 3.

T_comp(f) = C_comp / f   (Equation 3)

Here, C_comp is the number of cycles needed for the computation of the given workload. This value is independent of the operating speed (f) and is easily obtained once the workload is specified.
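As a small worked sketch of sections 1 and 2, assuming Equation 3 has the form T_comp(f) = C_comp / f (cycle count fixed, time inversely proportional to frequency) and Equation 2 is the sum of the two state times: the function names and the numbers below are hypothetical, and the stall time is simply taken as a given input here.

```python
def compute_time(c_comp_cycles: float, freq_hz: float) -> float:
    """Computation-state time in seconds: a fixed cycle count divided by frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return c_comp_cycles / freq_hz


def total_time(c_comp_cycles: float, freq_hz: float, t_stall_s: float) -> float:
    """Total execution time: computation-state time plus stall-state time."""
    return compute_time(c_comp_cycles, freq_hz) + t_stall_s


# Hypothetical workload of 6e9 compute cycles: halving the frequency doubles
# the computation-state time, independently of the stall-state time.
t_fast = compute_time(6e9, 2e9)
t_slow = compute_time(6e9, 1e9)
```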

3. Calculation of the stall-state time T_stall(f)

The stall-state time is the time during which the processor's computation is halted until a response returns from the computer's main memory. The stall-state time is proportional to the memory latency (hereinafter also called the "main-memory response time").

However, the main-memory response time varies with the main-memory traffic and the processor operating speed (f), and this must be taken into account. In this regard, FIG. 4 is a graph for explaining how the main-memory response time changes with main-memory traffic and processor operating speed (f). It shows, for an Intel Xeon E5-2670 processor, how the main-memory response time varies with processor operating speed and main-memory traffic.

As the figure shows, while the main-memory traffic on the X axis increases up to a certain point, the main-memory response time on the Y axis hardly changes, but beyond that point the response time rises sharply. The figure also shows that the main-memory response time increases when the processor operating speed (f) is lowered.

Thus the stall-state time (T_stall(f)) is proportional to the main-memory response time, and the main-memory response time is a function of the main-memory traffic (Rf) and the processor operating speed (f). From this relationship, the stall-state time (T_stall(f)) can be expressed as in Equation 4 below.

T_stall(f) = T_stall(f₀) × Latency(f, Rf) / Latency(f₀, R₀)   (Equation 4)

Here, T_stall(f₀) is the stall-state time at the reference operating speed (f₀); Latency(f, Rf) is the main-memory response time, a function of the operating speed (f) and the main-memory traffic (Rf) at that speed; and Latency(f₀, R₀) is the response time at the reference operating speed (f₀). The reference operating speed (f₀) can be any operating speed for which these values are known; for example, the current operating speed can be set as the reference operating speed (f₀).
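The proportionality described above can be sketched as a one-line scaling, assuming Equation 4 has the form T_stall(f) = T_stall(f₀) × Latency(f, Rf) / Latency(f₀, R₀). The function name and the sample numbers are hypothetical.

```python
def scale_stall_time(t_stall_ref_s: float,
                     latency_at_f: float,
                     latency_ref: float) -> float:
    """Scale the reference stall-state time by the ratio of response times.

    t_stall_ref_s: stall-state time measured at the reference speed f0
    latency_at_f:  Latency(f, Rf), response time at the target speed and traffic
    latency_ref:   Latency(f0, R0), response time at the reference speed and traffic
    """
    if latency_ref <= 0:
        raise ValueError("reference latency must be positive")
    return t_stall_ref_s * latency_at_f / latency_ref


# Hypothetical numbers: a 20% longer response time gives a 20% longer stall time.
t_stall_f = scale_stall_time(4.0, 120.0, 100.0)
```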

Because Latency(f, Rf) is a function of the operating speed (f) and the main-memory traffic (Rf), it can be represented as in the graph of FIG. 4, and in one embodiment the values of this response-time function are assumed to be known in advance, either calculated or measured. That is, for each operating speed (f) in FIG. 4, the Y-axis value (response time) for each X-axis value (main-memory traffic (Rf)) is assumed to be stored in advance. To this end, in one embodiment these values may be obtained by mathematical modeling, or alternatively the response time as a function of main-memory traffic (Rf) at each operating speed (f) may be measured and stored in advance. With the latter approach, the main-memory response time is measured directly at regular intervals over the available operating speeds and main-memory traffic values to build a characteristic function, and interpolation is used for the unmeasured parts of this function to compute graph values like those in FIG. 4. This is only one exemplary method, however, and the main-memory response time for each operating speed (f) and traffic (Rf) can of course be obtained by other methods.
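The measure-then-interpolate approach can be sketched as below. This is only an illustration: the table contents are invented (they are not values from FIG. 4), the names are hypothetical, and piecewise-linear interpolation with clamping at the table ends is one assumed choice among many.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def interpolate_latency(traffic_points: List[float],
                        latency_points: List[float],
                        traffic: float) -> float:
    """Piecewise-linear interpolation over sorted traffic samples, clamped at the ends."""
    if traffic <= traffic_points[0]:
        return latency_points[0]
    if traffic >= traffic_points[-1]:
        return latency_points[-1]
    i = bisect_left(traffic_points, traffic)
    x0, x1 = traffic_points[i - 1], traffic_points[i]
    y0, y1 = latency_points[i - 1], latency_points[i]
    return y0 + (y1 - y0) * (traffic - x0) / (x1 - x0)


# Hypothetical pre-measured table: frequency (GHz) -> (traffic samples, latency samples).
LATENCY_TABLE: Dict[float, Tuple[List[float], List[float]]] = {
    2.6: ([0.0, 10.0, 20.0, 30.0], [80.0, 82.0, 90.0, 150.0]),
    1.2: ([0.0, 10.0, 20.0, 30.0], [95.0, 98.0, 110.0, 180.0]),
}


def latency(freq_ghz: float, traffic: float) -> float:
    """Look up Latency(f, Rf) for a tabulated frequency, interpolating over traffic."""
    xs, ys = LATENCY_TABLE[freq_ghz]
    return interpolate_latency(xs, ys, traffic)
```

The made-up table keeps the qualitative shape the text describes: latency is nearly flat at low traffic, rises sharply past a knee, and is higher at the lower frequency.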

To calculate the stall-state time (T_stall(f)) from Equation 4, the main-memory traffic (Rf) at operating speed (f) must be known, and this traffic value is not a constant either but a function that varies with the operating speed (f).

In an embodiment of the present invention, the main-memory traffic (Rf) is defined as the total amount of data transferred divided by the total execution time, so the main-memory traffic (Rf) at an arbitrary operating speed (f) can be expressed as in Equation 5 below.

Rf = N_data / T_total(f) = N_data / (T_comp(f) + T_stall(f))   (Equation 5)

Here, N_data is the total amount of data exchanged between main memory and the processor while the given workload runs. N_data covers every kind of data transfer to and from main memory, including data loads, writebacks, and prefetches as shown in FIG. 3, and its value does not change with the processor operating speed (f).

Referring again to Equation 4, the reference operating speed (for example, the current operating speed) (f₀), the main-memory traffic at that speed (R₀), and the stall-state time (T_stall(f₀)) can be measured easily with hardware performance monitoring counters.

However, to obtain the stall-state time (T_stall(f)) at an arbitrary operating speed (f) from Equation 4, the main-memory traffic (Rf) at that speed must be known, and according to Equation 5, obtaining this traffic (Rf) requires the stall-state time (T_stall(f)). In other words, the stall-state time (T_stall(f)) is a kind of recursive function.
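The circular dependency can be made concrete with a single pass through the two equations, assuming Equation 4 is T_stall(f) = T_stall(f₀) × Latency(f, Rf) / Latency(f₀, R₀) and Equation 5 is Rf = N_data / (T_comp(f) + T_stall(f)). The linear latency model and all numbers below are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from typing import Callable, Tuple


def one_pass(r_input: float,
             t_comp_f: float,
             t_stall_ref: float,
             latency_ref: float,
             n_data: float,
             latency_at_f: Callable[[float], float]) -> Tuple[float, float]:
    """Feed a traffic guess through Equation 4, then Equation 5."""
    t_stall = t_stall_ref * latency_at_f(r_input) / latency_ref  # Equation 4
    r_output = n_data / (t_comp_f + t_stall)                     # Equation 5
    return t_stall, r_output


def toy_latency(traffic: float) -> float:
    """Hypothetical response-time model at the target speed (not from FIG. 4)."""
    return 80.0 + 2.0 * traffic


# A wrong guess is not reproduced by the equations; a consistent one is.
_, r_from_wrong_guess = one_pass(5.0, 6.0, 4.0, 100.0, 100.0, toy_latency)
t_stall_consistent, r_from_right_guess = one_pass(10.0, 6.0, 4.0, 100.0, 100.0, toy_latency)
```

The traffic value that the equations reproduce unchanged is the self-consistent solution; finding it is what the iterative procedure described next is for.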

In an embodiment of the present invention, the nested interval theorem can be used to obtain the stall-state time (T_stall(f)) from Equations 4 and 5. The nested interval theorem is, however, only one of several ways to solve this kind of recursive-function problem, and the present invention is of course not limited to it. An exemplary method of obtaining the stall-state time (T_stall(f)) by the nested interval theorem is described below with reference to FIGS. 5 and 6.

FIG. 5 is a flowchart for explaining a method of obtaining the stall-state time (T_stall(f)) by the nested interval theorem according to an embodiment, and FIG. 6 is a diagram for explaining the method of FIG. 5.

The core of the nested-interval algorithm is to keep narrowing the range in which the variable can lie by repeating the calculation. In the illustrated embodiment, the value of the main-memory traffic (Rf) is determined by iterative calculation, and this value is then substituted into Equation 4 to obtain the final stall-state time (T_stall(f)).

To count the iterations, k is first set to 1 in step S210, where k is an integer that counts the number of iterations. Then, in step S220, the range [R_left, R_right] of values that the main-memory traffic (Rf) can take is set. As initial values for this range, R_left can be assumed to be 0 and R_right to be infinity (see FIG. 6(a)).

Next, in step S230, R_input, the initial estimate of the main-memory traffic (Rf), is selected. This initial value can be any value. In one embodiment, assuming that the main-memory response time is constant, R_input = R₀ × T_comp(f)/T_comp(f₀) can be selected as the initial value.

This initial value (R_input) is then substituted into Equation 4 to calculate T_stall(f), and the calculated T_stall(f) is substituted into Equation 5 to calculate R_output, a new estimate of the traffic (Rf) (see steps S240 and S250 and FIG. 6(b)).

Then, in step S260, R_input, which was used as the input to Equation 4 in the preceding steps (S240, S250), is compared with R_output obtained from Equation 5 to determine whether the two values are equal or differ by no more than a predetermined range (D). If the difference is within the range (D), then in step S270 the value obtained by substituting R_output into Equation 4 is taken as the final predicted stall-state time (T_stall(f)).

However, if the difference between R_input and R_output exceeds the predetermined range (D) in step S260, the calculation steps (S240, S250) are repeated.

To this end, step S280 first checks whether the iteration count has reached a predetermined number (N, where N is an integer of 2 or more). If it has not yet reached N, the process proceeds to step S300, where the range [R_left, R_right] in which the main-memory traffic (Rf) can lie is reset. Preferably, R_input and R_output obtained in the previous step are used as the new bounds of the range [R_left, R_right]. That is, the smaller of R_input and R_output becomes R_left and the larger becomes R_right (see FIG. 6(c)). Next, in step S310, R_input, the predicted value of the traffic (Rf), is selected again. In one embodiment, the midpoint of the range [R_left, R_right] newly set in step S300 can be selected as the new predicted value (R_input).

The process then returns to steps S240 and S250 to obtain a new R_output using Equations 4 and 5 (see FIG. 6(d)), and in step S260 the new R_input and R_output are compared to decide whether to repeat or end the routine. If the routine is repeated, the smaller and the larger of the current R_input and R_output are reset as the lower bound (R_left) and the upper bound (R_right) of the range, respectively, as shown in FIG. 6(e).

Because R_input is set to the mean of R_left and R_right each time the calculation is repeated, the range that the main memory traffic (Rf) can take always decreases monotonically, and the main memory traffic (Rf) therefore eventually converges to a single value. However, since convergence to a single value may take a long time, step S260 determines whether the difference between R_input and R_output is within the specified range (D). If it is, the calculation ends and the last R_output value is used to calculate the stop state time (T_stall(f)).

The operation according to the flowchart of FIG. 5 described above can be expressed as pseudocode such as that of FIG. 7, which represents the flowchart of FIG. 5 as one exemplary pseudocode.

Referring to FIG. 7, line 4 of the pseudocode corresponds to step S220 of FIG. 5 and sets the range [R_left, R_right] of values that the main memory traffic (Rf) can take. Line 5 of the pseudocode corresponds to step S230 and selects R_input, the initial estimate of the main memory traffic (Rf).

Lines 7 and 8 of the pseudocode correspond to the calculation steps (S240, S250), respectively, and line 9 corresponds to step S260, which compares the values of R_input and R_output.

Lines 12 and 13 of the pseudocode reset the range [R_left, R_right] in which the main memory traffic (Rf) can lie, and line 14 selects the midpoint of the newly set range [R_left, R_right] as the new predicted value (R_input). This pseudocode also limits the number of iterations to at most 10 and sets the error tolerance for Rf to 5%.
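The iteration of steps S240 to S310 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pseudocode of FIG. 7: the equation images are not reproduced in this text, so the forms used for Equation 4 (stop state time scaled by the ratio of response times) and Equation 5 (traffic as data volume divided by total time) are inferred from the variable definitions in the claims. The inverse scaling of computation time with operating speed, the midpoint initial guess, the relative interpretation of the 5% tolerance, the fallback when the iteration limit is reached, and all function and parameter names are likewise hypothetical.

```python
from typing import Callable


def predict_stall_time(
    f: float,
    f0: float,
    t_comp_f0: float,
    t_stall_f0: float,
    r0: float,
    n_data: float,
    latency: Callable[[float, float], float],
    r_left: float,
    r_right: float,
    max_iter: int = 10,   # iteration limit mentioned for the pseudocode
    tol: float = 0.05,    # 5% tolerance, ASSUMED to be relative to R_input
) -> float:
    """Estimate T_stall(f) by narrowing the range [r_left, r_right] of Rf."""
    # ASSUMPTION: computation time scales inversely with operating speed.
    t_comp_f = t_comp_f0 * f0 / f
    base_latency = latency(f0, r0)

    def stall_from_traffic(r: float) -> float:
        # ASSUMED form of Equation 4: reference stall time times latency ratio.
        return t_stall_f0 * latency(f, r) / base_latency

    # ASSUMPTION: the initial estimate (S230) is the midpoint of the range.
    r_input = (r_left + r_right) / 2.0
    r_output = r_input
    for _ in range(max_iter):
        t_stall = stall_from_traffic(r_input)           # S240
        # ASSUMED form of Equation 5: traffic = data volume / total time.
        r_output = n_data / (t_comp_f + t_stall)        # S250
        if abs(r_output - r_input) <= tol * r_input:    # S260
            break
        # S300: the smaller value becomes R_left, the larger becomes R_right.
        r_left, r_right = min(r_input, r_output), max(r_input, r_output)
        r_input = (r_left + r_right) / 2.0              # S310
    # S270; ASSUMPTION: the last R_output is also used if the limit is hit.
    return stall_from_traffic(r_output)
```

For example, with a made-up response time model `latency = lambda f, r: 1.0 + r`, the call `predict_stall_time(1.0, 2.0, 1.0, 1.0, 1.0, 2.0, latency, 0.0, 2.0)` stops after four passes, once R_input and R_output differ by less than 5%.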

4. Calculation of the Total Execution Time T_total(f)

The performance of the processor is limited not only by the main memory response time (Latency) but also by the maximum bandwidth of the main memory at the processor's operating speed (f). When the main memory bandwidth is saturated, the total execution time (T_total(f)) cannot be shorter than the time spent exchanging data with the main memory (T_memory(f)). In one embodiment, the minimum time required to transfer the data can be expressed as in Equation 6.

Here, Bandwidth(f) is a function giving the maximum bandwidth of the main memory when the operating speed is f (hereinafter also called the "maximum bandwidth function"). The values of this maximum bandwidth function can be assumed to be known and stored in advance. As with the response time function Latency(f, Rf) described with reference to Equation 4, the maximum bandwidth function can be obtained by mathematical modeling or, alternatively, its actual value at each operating speed (f) can be measured and stored in advance.

Finally, taking into account the lower bound on the total execution time (T_total(f)) given by Equation 6, the total execution time for processing the workload when the processor's operating speed is f can be expressed as in Equation 7.
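The bandwidth floor described in this section can be illustrated with a short Python sketch. Equations 6 and 7 are not reproduced in this text, so the forms below are assumptions inferred from the surrounding description: the minimum transfer time is taken as N_data divided by Bandwidth(f), and the total execution time as the larger of that floor and the sum of computation and stop state times. The function and parameter names are hypothetical.

```python
def total_execution_time(
    t_comp_f: float,
    t_stall_f: float,
    n_data: float,
    bandwidth_f: float,
) -> float:
    """Total execution time at operating speed f, bounded below by transfer time."""
    # ASSUMED form of Equation 6: minimum time needed to move n_data
    # at the maximum main memory bandwidth for this operating speed.
    t_memory_f = n_data / bandwidth_f
    # ASSUMED form of Equation 7: computation plus stop state time,
    # but never less than the transfer-time floor.
    return max(t_comp_f + t_stall_f, t_memory_f)
```

When the bandwidth is far from saturated the floor is inactive and the result is simply T_comp(f) + T_stall(f); the floor only takes effect for workloads that move a large amount of data relative to Bandwidth(f).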

5. Prediction of the Operating Speed with Minimum Power Consumption

Once the total execution time (T_total(f)) has been calculated as above, the change in processor performance can be predicted from it, and the optimal DVFS level in terms of energy consumption can be predicted for each workload. That is, as described with reference to FIG. 2, once the total execution time (T_total(f)) of the processor has been calculated, step S120 multiplies this total execution time by the power at operating speed (f) to obtain the processor's power consumption at that operating speed, so the performance change at each operating speed (f) can be determined.

Here, the total execution time at each operating speed (f) is the value calculated by Equation 7 above, and the power at each operating speed (f) can be a measured value known in advance. In general, the power at a given operating speed (f) is a property of the hardware rather than of the program, so it can be measured in advance regardless of which program is running.

Also, according to an embodiment of the present invention, the processor can be run at the optimal DVFS level using the predicted change in processor performance. That is, the operating speed (f) having the minimum power consumption is predicted from the per-speed power consumption calculated in step S120 of FIG. 2 (step S130), and the processor is operated at the operating speed with the lowest predicted power consumption (step S140). This operating speed is the optimal DVFS level for the processor to process the workload.
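Steps S120 and S130 can be sketched in Python as follows. This is an illustrative sketch only: the function name, the dictionary-based inputs, and the sample numbers in the usage note are hypothetical and are not taken from the patent. The product of total execution time and power is what the description calls the power consumption at each operating speed.

```python
from typing import Mapping


def pick_operating_speed(
    total_time_by_f: Mapping[float, float],
    power_by_f: Mapping[float, float],
) -> float:
    """Return the operating speed whose predicted consumption is lowest."""
    # S120: consumption at each speed = predicted total time x measured power.
    consumption_by_f = {
        f: total_time_by_f[f] * power_by_f[f] for f in total_time_by_f
    }
    # S130: choose the operating speed with the minimum consumption.
    return min(consumption_by_f, key=consumption_by_f.get)
```

With made-up inputs such as times `{1.2: 3.0, 2.0: 2.0, 2.6: 1.8}` and powers `{1.2: 40.0, 2.0: 55.0, 2.6: 80.0}`, the products are 120, 110, and 144, so the middle speed is selected: running slowest is not automatically the cheapest choice once the longer execution time is accounted for.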

6. Experimental Results

The method of predicting the change in processor performance at each DVFS level according to the above embodiment was evaluated as shown in FIG. 8. The experiments used a server with two Intel Xeon E5-2670 8-core processors and 128 GB of memory. All prefetchers on the processor were enabled and hyper-threading was disabled. The operating system was Ubuntu 12.04 with Linux kernel 3.2, and KVM was used as the virtualization environment.

Each virtual machine was allocated one virtual processor and 2 GB of memory and was assigned to a dedicated core. To evaluate the DVFS performance prediction method, the method according to an embodiment of the present invention was used to predict the points at which the performance of each program would degrade by 3%, 5%, 10%, and 20%, the performance was reduced accordingly, and the actually measured performance for each case was plotted. The operating speed of the processor could be varied from 1.2 GHz to 2.6 GHz. Execution time and processor operating speed were measured while running eight identical benchmark programs simultaneously, and the results are shown in FIGS. 8(a) and 8(b).

FIG. 8(a) shows, for each of a total of 29 programs, the performance degradation ratio compared with the case where the processor is set to its fastest speed. The rightmost group in FIG. 8(a) (labeled "g-mean") shows the mean of each value over all 29 programs to its left. FIG. 8(b) shows the average operating speed of the processor for each case.

In the graph of FIG. 8(a), the programs on the left are relatively memory-intensive and those on the right are relatively compute-intensive (CPU-intensive). The performance degradation ratios do not differ greatly between the two groups, but the average operating speed is relatively lower for the memory-intensive programs. Overall, FIGS. 8(a) and 8(b) confirm that the method predicts the change in performance with processor operating speed well.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to those embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that various modifications and variations are possible from the above description. The scope of the present invention should therefore not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

Claims

1. A method for predicting a performance change of a computer processor according to dynamic voltage and frequency scaling (DVFS), the method comprising:
calculating a stop state time (T_stall(f)) at a processor operating speed (f); and
calculating a total execution time (T_total(f)) based on the stop state time,
wherein the stop state time (T_stall(f)) is the time during which the computing operation of the processor is stopped until a response is returned from the main memory of the computer.

2. The method according to claim 1,
wherein the total execution time (T_total(f)) is the sum of a computation state time (T_comp(f)) taken to execute a given workload and the stop state time (T_stall(f)).

3. The method according to claim 1,
wherein the stop state time (T_stall(f)) is determined based on a response time (Latency) and a main memory traffic (Rf) according to the operating speed (f).

4. The method of claim 3,
wherein the stop state time (T_stall(f)) is defined by the following first equation,

where T_stall(f₀) is the stop state time at a reference operating speed (f₀), Latency(f, Rf) is the response time at the operating speed (f) and the main memory traffic (Rf), and Latency(f₀, R₀) is the response time at the reference operating speed (f₀).

5. The method of claim 4,
wherein the traffic (Rf) of the main memory is defined by the following second equation,

where N_data represents the total amount of data exchanged between the main memory and the processor during execution of a given workload.

6. The method of claim 5, wherein the stop state time (T_stall(f)) is calculated from the first and second equations using the nested interval theorem.

7. A method for adjusting the operating speed of a computer processor according to dynamic voltage and frequency scaling (DVFS), the method comprising:
calculating a total execution time (T_total(f)) of the processor by the processor performance change prediction method according to claim 1;
calculating a power consumption of the processor according to the operating speed (f) based on the total execution time (T_total(f)); and
predicting an operating speed having a minimum power consumption from the calculated power consumption.

8. The method of claim 7,
further comprising operating the processor at the predicted operating speed having the minimum power consumption.

9. A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method according to any one of claims 1 to 8.