KR102123117B1

KR102123117B1 - Apparatus and method for predicting performance and power in multi-node system

Info

Publication number: KR102123117B1
Application number: KR1020190005045A
Authority: KR
Inventors: 정의영; 한상우; 정태양
Original assignee: 연세대학교 산학협력단
Priority date: 2019-01-15
Filing date: 2019-01-15
Publication date: 2020-06-15

Abstract

The present invention relates to an apparatus for predicting performance and power in a multi-node system and a method thereof. According to an embodiment, the apparatus for predicting performance and power comprises: an interface unit which receives parameters for a multi-node application, which includes a message passing interface (MPI) command, and parameters of hardware based on a multi-node, which performs the application; and a performance and power amount predictor which applies the received parameters to at least one of a performance prediction model and a power prediction model to predict a result value of at least one of performance information and a power amount according to the application performance in the multi-node.

Description

Apparatus and method for predicting performance and power in a multi-node system

The present invention relates to an apparatus and method for predicting performance and power in a multi-node system, and more particularly to a technique for predicting performance and power through analysis of data movement in a multi-node system.

Parallel computing refers to dividing a large computation into smaller computations and executing them simultaneously in parallel.

In parallel computing, a set of computers that are connected together and operate as a single system is called a computer cluster. A cluster (multi-node) generally provides better performance and reliability than a single computer (node), and is far more cost-effective than a single computer offering comparable performance and reliability.

FIG. 1 is a diagram illustrating the operating structure of a parallel program.

Referring to FIG. 1, reference numeral 100 shows a comparison over time between a serial processing scheme, in which data is computed on a single node, and a parallel processing scheme, in which data is computed on multiple nodes.

According to reference numeral 100, the parallel processing scheme can distribute multiple pieces of data (P1 to P4) across multiple nodes (Node 1 to Node 4) and process them at the same time, and can therefore perform data operations faster than the serial processing scheme using a single node.

FIG. 2 is a diagram illustrating network topologies of multiple nodes.

Referring to FIG. 2, reference numeral 200 shows examples of network topology types that physically connect the elements (nodes) of a computer network.

According to reference numeral 200, network topologies can be classified into a ring topology, in which the nodes are connected as shown in (a); a mesh topology, connected as shown in (b); a star topology, connected as shown in (c); and a fully connected topology, connected as shown in (d).

Network topologies can further be classified into a line topology, connected as shown in (e) of reference numeral 200; a tree topology, connected as shown in (f); and a bus topology, connected as shown in (g).

Specifically, the way data is moved differs depending on the topology type, and performance and power differ accordingly.
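As a rough illustration of why topology matters for data movement, the following Python sketch builds four of the topologies named above and compares their average hop count between node pairs. The helper names (`build_topology`, `hops`, `average_hops`) and the use of average hop count as a metric are assumptions made here for illustration; the text itself does not define how topologies are compared.

```python
from collections import deque
from itertools import combinations


def build_topology(kind: str, n: int) -> dict[int, set[int]]:
    """Return an undirected adjacency map for n nodes (hypothetical helper)."""
    adj: dict[int, set[int]] = {i: set() for i in range(n)}

    def link(a: int, b: int) -> None:
        adj[a].add(b)
        adj[b].add(a)

    if kind == "ring":
        for i in range(n):
            link(i, (i + 1) % n)
    elif kind == "line":
        for i in range(n - 1):
            link(i, i + 1)
    elif kind == "star":
        for i in range(1, n):  # node 0 acts as the hub
            link(0, i)
    elif kind == "full":
        for a, b in combinations(range(n), 2):
            link(a, b)
    else:
        raise ValueError(f"unsupported topology: {kind}")
    return adj


def hops(adj: dict[int, set[int]], src: int, dst: int) -> int:
    """Shortest-path hop count between two nodes, by breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("nodes are not connected")


def average_hops(adj: dict[int, set[int]]) -> float:
    """Mean hop count over all unordered node pairs."""
    pairs = list(combinations(adj, 2))
    return sum(hops(adj, a, b) for a, b in pairs) / len(pairs)


for kind in ("full", "ring", "star", "line"):
    print(kind, round(average_hops(build_topology(kind, 4)), 3))
```

With four nodes, the fully connected topology needs one hop for every pair, while the line topology needs the most hops on average, which is one simple way to see that the same traffic costs more on some topologies than on others.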

Existing techniques for predicting the performance and power of a node predict performance and power only for a single node, and are therefore difficult to apply to the multi-node systems that are now widely used.

In addition, existing techniques predict performance and power without considering the topology type of the multi-node system, so the accuracy and reliability of the resulting performance information and power amount are low.

Korean Patent Publication No. 10-2010-7003117, "Proactive power management in parallel computers"

The present invention aims to provide a performance and power prediction apparatus and method that can readily predict the performance and power consumption of an application requiring multi-node operation.

The present invention also aims to provide a performance and power prediction apparatus and method that can predict how performance and power vary with the topology type when the same application is run.

The present invention further aims to provide a performance and power prediction apparatus and method that improve the accuracy and reliability of predicted performance and power consumption by taking the topology type into account when producing the predictions.

A performance and power prediction apparatus according to an embodiment may include: an interface unit that receives a multi-node application containing Message Passing Interface (MPI) commands and parameters of the multi-node hardware that runs the application; and a performance and power amount prediction unit that applies the received parameters to at least one of a performance prediction model and a power prediction model to predict at least one result value among performance information and power amount resulting from running the application on the multi-node.

According to one aspect, the parameters may include at least one of: information on the number of nodes constituting the multi-node, network topology information of the multi-node, hardware information of each node constituting the multi-node, and idle power consumption information.

According to one aspect, at least one of the performance prediction model and the power prediction model may be a model that receives, as input, the number of nodes constituting the multi-node and the network topology information of the multi-node, and predicts at least one result value among performance information and power amount resulting from running the application.

According to one aspect, the performance and power prediction apparatus may further include a prediction model generation unit that extracts and partitions the MPI commands contained in the application and generates the performance prediction model, which predicts performance information resulting from running the application based on the partitioned MPI commands, and the power prediction model, which predicts the power amount resulting from running the application based on the partitioned MPI commands.

According to one aspect, the prediction model generation unit may partition the MPI commands contained in the application into a serial instruction set for computing data on each node constituting the multi-node and a parallel instruction set for computing data across the multi-node.

According to one aspect, the prediction model generation unit may generate the performance prediction model, which derives the total execution time by adding the sum of the time spent on data computation at each node constituting the multi-node to the sum of the time spent on data movement between the nodes.

According to one aspect, the time spent on data computation at each node may be derived from a value reflecting the ratio between the amount of data computed at each node and the data processing speed of each node.

According to one aspect, the time spent on data movement between the nodes may be derived from a value obtained by applying routing complexity to the ratio between the total amount of computed data of the multi-node and a preset bandwidth.
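The performance model described in the three paragraphs above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names are hypothetical, routing complexity is assumed to act as a multiplicative factor, and the compute time of a parallel section is taken as that of the slowest node, following the detailed description.

```python
def single_node_time(data_amount: float, processing_speed: float) -> float:
    """Compute time of one node: computed data divided by processing speed."""
    return data_amount / processing_speed


def transfer_time(total_data: float, bandwidth: float,
                  routing_complexity: float) -> float:
    """Data-movement time: (total data / bandwidth) scaled by routing complexity.

    Treating routing complexity as a multiplier is an assumption of this sketch.
    """
    return total_data / bandwidth * routing_complexity


def total_time(per_node_data: list[float], per_node_speed: list[float],
               total_data: float, bandwidth: float,
               routing_complexity: float) -> float:
    """Total execution time for a single parallel section plus its transfers."""
    # The parallel section finishes when the slowest node finishes.
    compute = max(single_node_time(d, f)
                  for d, f in zip(per_node_data, per_node_speed))
    return compute + transfer_time(total_data, bandwidth, routing_complexity)


# Example with made-up numbers: two nodes, the second with twice the data.
print(total_time([100.0, 200.0], [50.0, 50.0],
                 total_data=300.0, bandwidth=100.0, routing_complexity=1.5))
```

A program with several parallel sections would add up one such term per section; the sketch keeps a single section for brevity.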

According to one aspect, the prediction model generation unit may generate the power prediction model, which derives the total power consumption by adding the sum of the power consumed by data computation at each node constituting the multi-node to the sum of the power consumed by data movement between the nodes.

According to one aspect, the power consumed by data computation at each node may be derived by adding two terms: a value obtained by applying the total execution time on the multi-node to a previously measured static power value, and the sum of the dynamic power consumption of each node, which reflects the number of operations in each class of the partitioned MPI commands.

According to one aspect, the power consumed by data movement between the nodes may be derived by summing the values obtained by applying the time spent on data movement between the nodes to the unit power consumption of the cables provided between the nodes constituting the multi-node.
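The power model in the three paragraphs above can be sketched in the same way. The names below are hypothetical, and two points are assumptions of this sketch rather than statements from the text: "applying" the execution time or transfer time to a power value is read as multiplication, and the dynamic term is modeled as a per-operation cost for each instruction class multiplied by its operation count.

```python
def node_energy(static_power: float, total_time: float,
                op_counts: dict[str, int],
                energy_per_op: dict[str, float]) -> float:
    """Static term (power x total time) plus dynamic term (count x cost per class)."""
    static = static_power * total_time
    dynamic = sum(count * energy_per_op[cls] for cls, count in op_counts.items())
    return static + dynamic


def link_energy(cable_unit_power: float, transfer_times: list[float]) -> float:
    """Sum over cables of unit power multiplied by the transfer time on that cable."""
    return sum(cable_unit_power * t for t in transfer_times)


def total_energy(nodes: list[dict[str, int]], static_power: float,
                 total_time: float, energy_per_op: dict[str, float],
                 cable_unit_power: float, transfer_times: list[float]) -> float:
    """Total consumption: all node terms plus all data-movement terms."""
    compute = sum(node_energy(static_power, total_time, ops, energy_per_op)
                  for ops in nodes)
    return compute + link_energy(cable_unit_power, transfer_times)


# Example with made-up numbers: two identical nodes and two cables.
ops = {"serial": 10, "parallel": 5}
cost = {"serial": 2.0, "parallel": 4.0}
print(total_energy([ops, ops], static_power=10.0, total_time=2.0,
                   energy_per_op=cost, cable_unit_power=3.0,
                   transfer_times=[1.0, 2.0]))
```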

A performance and power prediction method according to an embodiment may include: receiving, at an interface unit, a multi-node application containing Message Passing Interface (MPI) commands and parameters of the multi-node hardware that runs the application; and predicting, at a performance and power amount prediction unit, at least one result value among performance information and power amount resulting from running the application on the multi-node by applying the received parameters to any one of a performance prediction model and a power prediction model.

According to one aspect, the performance and power prediction method may further include, at a prediction model generation unit, extracting and partitioning the MPI commands contained in the application and generating the performance prediction model, which predicts performance information resulting from running the application based on the partitioned MPI commands, and the power prediction model, which predicts the power amount resulting from running the application based on the partitioned MPI commands.

According to an embodiment, the performance and power consumption of an application requiring multi-node operation can be readily predicted.

According to an embodiment, how performance and power vary with the topology type can be predicted when the same application is run.

According to an embodiment, the accuracy and reliability of predicted performance and power consumption can be improved by taking the topology type into account when producing the predictions.

FIG. 1 is a diagram illustrating the operating structure of a parallel program.
FIG. 2 is a diagram illustrating network topologies of multiple nodes.
FIG. 3 is a diagram illustrating a performance and power prediction apparatus according to an embodiment.
FIG. 4 is a diagram illustrating a performance and power prediction method according to an embodiment.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed herein are given only for the purpose of describing those embodiments; the embodiments may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be modified in various ways and may take various forms, the embodiments are illustrated in the drawings and described in detail herein. This is not intended to limit the embodiments to the specific forms disclosed; the embodiments include modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another; for example, without departing from the scope of rights according to the concept of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but intervening components may also be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening component is present. Expressions describing relationships between components, such as "between" and "directly between", or "directly adjacent to", should be interpreted in the same way.

The terminology used herein is for describing particular embodiments only and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. As used herein, terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not limited by these embodiments. Like reference numerals in the drawings denote like members.

FIG. 3 is a diagram illustrating a performance and power prediction apparatus according to an embodiment.

Referring to FIG. 3, the performance and power prediction apparatus 300 according to an embodiment can readily predict the performance and power consumption of an application requiring multi-node operation.

The performance and power prediction apparatus 300 can also predict how performance and power vary with the topology type when the same application is run.

Furthermore, by taking the topology type into account when producing predicted values of performance and power consumption, the performance and power prediction apparatus 300 can improve the accuracy and reliability of those predicted values.

To this end, the performance and power prediction apparatus 300 may include an interface unit 310, a prediction model generation unit 320, and a performance and power amount prediction unit 330.

Specifically, the interface unit 310 according to an embodiment may receive a multi-node application containing Message Passing Interface (MPI) commands and parameters of the multi-node hardware that runs the application.

Here, MPI refers to a standard that describes the exchange of information in distributed and parallel processing.

More specifically, MPI refers to a programming model in which, in a multi-node system composed of processes that each have their own local memory, communication between processes is implemented solely by sending and receiving messages.

The parameters may be hardware parameters of the multi-node system whose performance and power amount are to be predicted.

In other words, once the performance and power prediction apparatus 300 according to an embodiment has generated a prediction model from a multi-node application containing MPI commands run on multi-node A, it can apply the hardware parameters of multi-node B, the target of the performance and power prediction, as input to the generated prediction model, and thereby predict the performance and power amount of running the application on multi-node B.

According to one aspect, the parameters may include at least one of: information on the number of nodes constituting the multi-node, network topology information of the multi-node, hardware information of each node constituting the multi-node, and idle power consumption information.

The parameters may further include data efficiency information according to the network topology; however, the parameters are not limited to the above examples and may include various parameters that can be derived from a multi-node system.

For example, the network topology information of the multi-node includes information on the network topology type, and the topology type may include at least one of ring, mesh, star, fully connected, line, tree, and bus types; however, the topology type is not limited to these examples and may include various connection forms.
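One way to represent the parameters listed above is a small typed record, sketched below in Python. The class and field names (`Topology`, `NodeHardware`, `MultiNodeParams`, `processing_speed`, `idle_power`) are hypothetical choices for illustration; the text only names the kinds of information involved, not a data layout.

```python
from dataclasses import dataclass
from enum import Enum


class Topology(Enum):
    """Topology types named in the description."""
    RING = "ring"
    MESH = "mesh"
    STAR = "star"
    FULLY_CONNECTED = "fully_connected"
    LINE = "line"
    TREE = "tree"
    BUS = "bus"


@dataclass(frozen=True)
class NodeHardware:
    """Per-node hardware information (field names are assumptions)."""
    processing_speed: float  # data processed per unit time
    idle_power: float        # idle power consumption


@dataclass(frozen=True)
class MultiNodeParams:
    """Hardware parameters of the target multi-node system."""
    topology: Topology
    nodes: tuple[NodeHardware, ...]

    @property
    def node_count(self) -> int:
        # The number of nodes follows from the per-node hardware list.
        return len(self.nodes)


params = MultiNodeParams(Topology.RING, (NodeHardware(1e9, 5.0),) * 4)
print(params.node_count, params.topology.value)
```

Deriving the node count from the node list keeps the two pieces of information from drifting apart, which is a design choice of this sketch.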

Next, the performance and power amount prediction unit 330 according to an embodiment may apply the received parameters to at least one of the performance prediction model and the power prediction model to predict at least one result value among performance information and power amount resulting from running the application on the multi-node.

According to one aspect, at least one of the performance prediction model and the power prediction model may be a model that receives, as input, the number of nodes constituting the multi-node and the network topology information of the multi-node, and predicts at least one result value among performance information and power amount resulting from running the application.

In other words, the performance prediction model and the power prediction model according to an embodiment may be models that receive as input the parameters related to multi-node B, the target of the performance and power prediction, namely the number of nodes constituting multi-node B and the network topology information of multi-node B, and predict the performance and power amount of running the application on multi-node B.

More specifically, the performance prediction model and the power prediction model may be models that infer the amount of data movement and the amount of computation per node from the number of nodes, and predict the total execution time and power consumption on the multi-node from the network topology information.

In other words, the performance prediction model and the power prediction model can analyze the amount of data movement when a particular application is run on multiple nodes, and thereby predict the execution time and power consumption of the commands according to the network topology.

Meanwhile, the prediction model generation unit 320 may extract and partition the MPI commands contained in the application, and generate a performance prediction model that predicts performance information resulting from running the application based on the partitioned MPI commands and a power prediction model that predicts the power amount resulting from running the application based on the partitioned MPI commands.

More specifically, the prediction model generation unit 320 may extract MPI commands while the application is running; each MPI command specifies, among other things, which processor (node) requests data from which processor (node), and the size of the requested data.

In addition, MPI commands can request not only one-to-one communication but also one-to-many and many-to-one communication, and can be extracted in real time while the application is running; thus, when MPI commands are called during execution of the application, they can be parsed to extract the amount of data movement.
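The parsing step described above might look like the following Python sketch. The trace line format (`MPI_Send src=0 dst=1 bytes=1024`) is a hypothetical logging format invented for this example, as is the function name `data_movement`; the text does not specify how the extracted calls are recorded. One-to-many calls are modeled by a comma-separated destination list, and many-to-one traffic would appear as several records with the same destination.

```python
import re
from collections import defaultdict

# Hypothetical trace record: call name, source rank, destination rank(s), size.
RECORD = re.compile(
    r"(?P<call>MPI_\w+)\s+src=(?P<src>\d+)\s+dst=(?P<dst>[\d,]+)\s+bytes=(?P<size>\d+)"
)


def data_movement(trace_lines: list[str]) -> dict[tuple[int, int], int]:
    """Accumulate bytes moved per (source, destination) pair from trace lines."""
    moved: dict[tuple[int, int], int] = defaultdict(int)
    for line in trace_lines:
        match = RECORD.search(line)
        if match is None:
            continue  # not an MPI record, e.g. a compute step
        src = int(match["src"])
        size = int(match["size"])
        for dst in match["dst"].split(","):
            moved[(src, int(dst))] += size
    return dict(moved)


trace = [
    "MPI_Send src=0 dst=1 bytes=1024",
    "compute step",
    "MPI_Bcast src=0 dst=1,2,3 bytes=256",
]
print(data_movement(trace))
```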

According to one aspect, the prediction model generation unit 320 may partition the MPI commands contained in the application into a serial instruction set for computing data on each node constituting the multi-node and a parallel instruction set for computing data across the multi-node.

More specifically, the prediction model generation unit 320 may check the order of the executed instructions and analyze the operation at the hardware level. In doing so, the prediction model generation unit 320 may identify the instruction sections processed on multiple nodes based on the MPI commands called between instructions.

In this way, the prediction model generation unit 320 can divide the instruction sequence into serial processing sections on a single node and parallel processing sections on multiple nodes; and because, as described above, each MPI command contains information on the source node, the destination node, and the amount of data, the amount of transmitted data can be determined.

That is, the prediction model generation unit 320 can identify the types of operations using the partitioned instruction sets, and can use these operation types to estimate the amount of computation and the amount of data for each node.

According to one aspect, the prediction model generation unit 320 may partition the MPI commands not only into a serial instruction set and a parallel instruction set but also into a data movement instruction set relating to data movement between nodes; however, the partitioning is not limited to these examples, and the commands may be partitioned according to various criteria needed for predicting performance and power amount.
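The partitioning into serial, parallel, and data movement sets can be sketched as a single pass over the executed instruction sequence. The rule used below is an assumption made for illustration: instructions between an `MPI_Scatter` call and the following `MPI_Gather` call are treated as a parallel section, and everything else as serial. The text says only that MPI calls between instructions delimit the sections, not which calls open or close them, and `split_sections` is a hypothetical name.

```python
def split_sections(instructions: list[str],
                   open_call: str = "MPI_Scatter",
                   close_call: str = "MPI_Gather"
                   ) -> tuple[list[str], list[str], list[str]]:
    """Split an instruction sequence into serial, parallel, and data-movement sets."""
    serial: list[str] = []
    parallel: list[str] = []
    transfers: list[str] = []
    in_parallel = False
    for ins in instructions:
        if ins.startswith("MPI_"):
            # Every MPI call is recorded as a data-movement instruction.
            transfers.append(ins)
            if ins == open_call:
                in_parallel = True
            elif ins == close_call:
                in_parallel = False
        elif in_parallel:
            parallel.append(ins)
        else:
            serial.append(ins)
    return serial, parallel, transfers


sequence = ["load", "MPI_Scatter", "mul", "add", "MPI_Gather", "store"]
print(split_sections(sequence))
```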

According to one aspect, the prediction model generation unit 320 may generate a performance prediction model that derives the total execution time (T_total) by adding the sum of the time spent on data computation at each node constituting the multi-node (T_multi-node) to the sum of the time spent on data movement between the nodes (T_data-transfer).

In other words, the performance prediction model according to an embodiment may be expressed by Equation 1 below.

[Equation 1]
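Read from the description above, Equation 1 can be sketched as follows. The bare summations reflect that the text does not state what the sums range over, so no index is introduced here.

```latex
T_{\mathrm{total}} = \sum T_{\mathrm{multi\text{-}node}} + \sum T_{\mathrm{data\text{-}transfer}}
```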

According to one aspect, the time spent on data computation at each node (T_multi-node) in Equation 1 may be derived through Equation 2 below, which takes the largest of the single-node computation times (T_single-node).

[Equation 2]
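Based on the surrounding description, Equation 2 can be sketched as a maximum over the nodes. The node index i and the node count n are notation assumed here for illustration.

```latex
T_{\mathrm{multi\text{-}node}} = \max_{1 \le i \le n} T_{\mathrm{single\text{-}node}}^{(i)}
```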

According to one aspect, the time spent on data computation at each node (T_multi-node) may be derived from the ratio between the amount of computation data for each node (N_single-data) and the data processing speed of each node (F_single-node).

In other words, the single-node computation time (T_single-node) used in Equation 2 to derive the time spent on data computation at each node (T_multi-node) may be derived through Equation 3 below.

[Equation 3]

For example, the data processing speed of each node (F_single-node) may be determined from at least one of the hardware performance, memory structure, and operating frequency of the node. It may be taken from the node's hardware specification or obtained by measuring the target hardware. The hardware of each node may be the processor corresponding to that node.
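The bodies of Equations 1 to 3 are not reproduced in this text. As a minimal sketch of how the description reads, the code below computes T_single-node as N_single-data / F_single-node, takes the maximum over nodes as T_multi-node, and adds a given T_data-transfer to obtain T_total. The function names, the units, and the simplification to a single computation phase and a single transfer term are assumptions.

```python
from typing import Sequence, Tuple


def single_node_time(n_single_data: float, f_single_node: float) -> float:
    """Equation 3 as described: data amount divided by processing speed."""
    if f_single_node <= 0:
        raise ValueError("processing speed must be positive")
    return n_single_data / f_single_node


def multi_node_time(nodes: Sequence[Tuple[float, float]]) -> float:
    """Equation 2 as described: the slowest node bounds the computation time.

    Each entry of `nodes` is (N_single-data, F_single-node).
    """
    return max(single_node_time(n, f) for n, f in nodes)


def total_time(nodes: Sequence[Tuple[float, float]],
               t_data_transfer: float) -> float:
    """Equation 1 as described, reduced to one compute and one transfer term."""
    return multi_node_time(nodes) + t_data_transfer


# Hypothetical values: two nodes with equal data but different speeds.
nodes = [(100.0, 50.0), (100.0, 25.0)]
```

With these values the slower node needs 4.0 time units, so adding a transfer time of 1.5 gives a total of 5.5.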

Meanwhile, the number of processors that can be used varies with the number of nodes, which changes the available parallelism and may in turn affect parallel computation performance on the multi-node system.

The number of nodes is proportional to the maximum usable parallelism, and as parallelism increases, the amount of computation processed per node may decrease.

According to one aspect, the total amount of computation data of the multi-node system (N_data) may vary with the characteristics of the application and may be derived by analyzing the code of the MPI instructions.

In addition, the total amount of computation data (N_data) may consist of several parallel operations (N_op_i, where i is a natural number of 1 or more), and each operation may be divided among the nodes for execution.

That is, the amount of computation data for each node (N_single-data) may be derived through Equation 4 below, which divides the parallel operations (N_op_i) by the number of nodes (n) that perform computation in the multi-node system.

[Equation 4]
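The body of Equation 4 is not reproduced in this text. From the description, it can plausibly be written as below, where the upper limit m (the number of parallel operations) is a symbol introduced here for illustration. For instance, three parallel operations of 400, 200, and 600 data items on n = 4 nodes would give 1200 / 4 = 300 items per node under this reading.

```latex
N_{\text{single-data}} = \frac{1}{n} \sum_{i=1}^{m} N_{op_i}
```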

According to one aspect, the time spent on data movement between nodes (T_data-transfer) may be derived by applying the routing complexity (C_R) to the ratio between the total amount of computation data of the multi-node system (N_data) and a preset bandwidth (B).

In other words, the time spent on data movement between nodes (T_data-transfer) may be derived through Equation 5 below.

[Equation 5]

Here, the square brackets denote the Gauss bracket, and N_basic denotes the unit size in which data is moved.

More specifically, when the unit size (N_basic) is 64 bytes, data moves in 64-byte units whether 10 bytes or 40 bytes are sent, so Equation 5 uses the Gauss bracket to round the transferred amount to the unit size (N_basic).
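The rounding described for Equation 5 can be sketched as follows. The body of Equation 5 is not reproduced in this text, so two points are assumptions: the Gauss bracket is read as rounding up to the next whole unit (which matches the 10-byte and 40-byte example), and C_R is applied as a plain multiplier. The function names are hypothetical.

```python
def padded_size(n_data: int, n_basic: int = 64) -> int:
    """Round a payload up to a whole number of n_basic-byte units."""
    if n_data <= 0 or n_basic <= 0:
        raise ValueError("sizes must be positive")
    units = (n_data + n_basic - 1) // n_basic  # integer ceiling
    return units * n_basic


def data_transfer_time(n_data: int, bandwidth: float, c_r: float,
                       n_basic: int = 64) -> float:
    """Assumed form: (padded bytes / bandwidth) scaled by routing complexity."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return padded_size(n_data, n_basic) / bandwidth * c_r
```

Under this reading, sending 10 bytes or 40 bytes both occupy one 64-byte unit, while 65 bytes occupy two units.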

Meanwhile, the routing complexity (C_R) in Equation 5 may be determined by the routing scheme.

For example, the routing scheme may be determined based on the network topology of the multi-node system.

In addition, the routing complexity (C_R) may be derived through Equation 6 below, which computes the minimum hop count under the routing scheme.

[Equation 6]
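The body of Equation 6 is not reproduced in this text, and the description does not say whether C_R is the hop count for one node pair or an aggregate over pairs. As an illustration only, the sketch below computes the minimum hop count between two nodes by breadth-first search over an adjacency list. The helper names `ring` and `star` and the choice of topologies are assumptions.

```python
from collections import deque
from typing import Dict, List


def min_hop_count(adjacency: Dict[int, List[int]], src: int, dst: int) -> int:
    """Minimum number of links between src and dst (breadth-first search)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor == dst:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    raise ValueError("no path between the given nodes")


def ring(n: int) -> Dict[int, List[int]]:
    """Ring topology: each node links to its two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def star(n: int) -> Dict[int, List[int]]:
    """Star topology: node 0 is the hub, all others link only to it."""
    adjacency = {0: list(range(1, n))}
    for i in range(1, n):
        adjacency[i] = [0]
    return adjacency
```

On six nodes, opposite nodes of a ring are three hops apart, while any two leaf nodes of a star are two hops apart, which is one way the topology can change the value fed into the transfer-time model.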

According to one aspect, the prediction model generator 320 may generate a power prediction model that derives the total power consumption (W_total) by adding the sum of the power consumed by data computation at each node of the multi-node system (W_single-node) to the sum of the power consumed by data movement between nodes (W_data-transfer).

In other words, the power prediction model according to an embodiment may be expressed by Equation 7 below.

[Equation 7]
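The body of Equation 7 is not reproduced in this text. Based on the preceding description, a plausible reconstruction is the following; the ranges of the two sums (over nodes and over inter-node links) are an assumption.

```latex
W_{\text{total}} = \sum_{\text{nodes}} W_{\text{single-node}} + \sum_{\text{links}} W_{\text{data-transfer}}
```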

According to one aspect, the power consumed by data computation at each node (W_single-node) may be derived by adding a value obtained by applying the total execution time on the multi-node system (T_total) to the previously measured static power consumption (P_static) to the sum of the dynamic power consumption of each node (W_compute), which reflects the number of operations per class of the divided MPI instructions (N_inst).

In other words, the power consumed by data computation at each node (W_single-node) may be derived through Equation 8 below.

[Equation 8]

Here, the static power consumption (P_static) may be taken from the value specified for each node or from a value measured experimentally.
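The body of Equation 8 is not reproduced in this text. Read literally, the description combines P_static multiplied by T_total with the sum of the W_compute terms; the sketch below implements that reading. The function name is hypothetical, and the unit interpretation is an assumption: if P_static is in watts and T_total in seconds, the product is in joules, so the W_compute terms would need to be in joules as well.

```python
from typing import Iterable


def single_node_consumption(p_static: float, t_total: float,
                            w_compute_terms: Iterable[float]) -> float:
    """Static share (P_static * T_total) plus the sum of dynamic shares."""
    if t_total < 0:
        raise ValueError("execution time must be non-negative")
    return p_static * t_total + sum(w_compute_terms)
```

For example, a node with P_static = 10.0, T_total = 5.0, and dynamic terms 3.0 and 2.0 would consume 55.0 under this reading.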

Meanwhile, the dynamic power consumption of each node (W_compute) in Equation 8 reflects the difference in power consumed when each node computes data. It may be derived through Equation 9 below, which applies the dynamic power consumption of a reference node (W_stdcompute) to the ratio between the static power consumption of the target node whose power is to be estimated (W_static) and the static power consumption of the reference node (W_stdstatic).

[Equation 9]

According to one aspect, the reference node in Equation 9 may be a node (a single computer) whose power can be measured. The ratio between the static power consumption of the target node (W_static) and that of the reference node (W_stdstatic) may be used to obtain the power value of a node whose power cannot be measured.

For example, for a CPU, the static power consumption (W_static) and the static power consumption of the reference node (W_stdstatic) may be determined from the Thermal Design Power (TDP) provided by the manufacturer.

Meanwhile, the dynamic power consumption of the reference node (W_stdcompute) in Equation 9 may be derived through Equation 10 below, which applies the reference power consumption for each instruction class (P_stdInst) to the number of operations per class of the divided MPI instructions (N_inst).

[Equation 10]

Here, i may be a natural number of 1 or more.

For example, the number of operations per class (N_inst) may be the number of operations, per class, of the instructions executed at each node, and those instructions may be the MPI instructions divided by the prediction model generator 320.

The instructions may be classified as at least one of integer data operation instructions, floating-point data operation instructions, and branch control instructions.

Meanwhile, the reference power consumption (P_stdInst) is the power consumed by each instruction class and may be derived experimentally.
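The bodies of Equations 9 and 10 are not reproduced in this text. A minimal sketch of how the description reads: the reference node's dynamic consumption is the sum over instruction classes of N_inst times P_stdInst, and the target node's dynamic consumption scales that value by W_static / W_stdstatic. The function names, the class labels, and all numeric values are hypothetical.

```python
from typing import Dict


def reference_dynamic(n_inst: Dict[str, int],
                      p_std_inst: Dict[str, float]) -> float:
    """Equation 10 as described: sum of count * per-class reference cost."""
    return sum(count * p_std_inst[cls] for cls, count in n_inst.items())


def node_dynamic(w_static: float, w_std_static: float,
                 w_std_compute: float) -> float:
    """Equation 9 as described: scale by the static-consumption ratio."""
    if w_std_static <= 0:
        raise ValueError("reference static consumption must be positive")
    return w_static / w_std_static * w_std_compute


# Hypothetical instruction counts and per-class reference costs.
n_inst = {"integer": 1000, "floating_point": 500, "branch": 200}
p_std_inst = {"integer": 1.0, "floating_point": 2.0, "branch": 0.5}

w_std_compute = reference_dynamic(n_inst, p_std_inst)
w_compute = node_dynamic(65.0, 130.0, w_std_compute)
```

Here the reference node's value is 2100.0, and a target node whose static figure (for instance its TDP) is half that of the reference node is assigned 1050.0.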

According to one aspect, the power consumed by data movement between nodes (W_data-transfer) may be derived by summing values obtained by applying the time spent on data movement between nodes (T_transfer) to the unit power consumption (P_transfer) of the cables connecting the nodes of the multi-node system.

In other words, the power consumed by data movement between nodes (W_data-transfer) may be derived through Equation 11 below.

[Equation 11]

Specifically, according to Equation 11, the power consumed by data movement between nodes (W_data-transfer) may be expressed as the sum of the products of the time spent on data movement between nodes (T_transfer) and the unit power consumption of the cable (P_transfer). The unit power consumption of the cable (P_transfer) depends on the cable connecting the nodes and may be set to a value provided by the manufacturer.

According to one aspect, the time spent on data movement between nodes (T_transfer) may be derived through Equation 5 above. That is, the value of W_data-transfer derived through Equation 11 may differ depending on the network topology of the multi-node system.
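The sum of products described for Equation 11 can be sketched as follows. The body of the equation is not reproduced in this text, so iterating over one (T_transfer, P_transfer) pair per inter-node transfer is an assumption, and the function name and numeric values are hypothetical.

```python
from typing import Iterable, Tuple


def data_transfer_consumption(links: Iterable[Tuple[float, float]]) -> float:
    """Sum T_transfer * P_transfer over the inter-node transfers."""
    return sum(t_transfer * p_transfer for t_transfer, p_transfer in links)


# Hypothetical values: the same payload on two topologies, where the longer
# route doubles the transfer time on each of two links.
short_route = [(1.0, 0.5), (1.0, 0.5)]
long_route = [(2.0, 0.5), (2.0, 0.5)]
```

With identical cables, the longer route yields 2.0 against 1.0 for the shorter one, which illustrates how the topology can change the predicted value through T_transfer alone.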

FIG. 4 is a diagram illustrating a method for predicting performance and power according to an embodiment.

In other words, FIG. 4 relates to a method for predicting performance and power using the performance and power prediction apparatus according to the embodiment described with reference to FIG. 3. In the description of FIG. 4 below, content that duplicates the description of FIG. 3 is omitted.

Referring to FIG. 4, in step 410 of the performance and power prediction method according to an embodiment, the interface unit may receive a multi-node application that includes Message Passing Interface (MPI) instructions.

According to one aspect, in step 420, the prediction model generator may extract and divide the MPI instructions included in the application and, based on the divided MPI instructions, generate the performance prediction model, which predicts performance information for execution of the application, and the power prediction model, which predicts the amount of power consumed by execution of the application.

Next, in step 430, the interface unit may receive parameters of the multi-node-based hardware that executes the multi-node application.

Next, in step 440, the performance and power amount predictor may apply the received parameters to one of the performance prediction model and the power prediction model to predict at least one of the performance information and the amount of power for execution of the application on the multi-node system.

Consequently, the present invention makes it easy to predict the performance and power consumption of an application that requires multi-node operation.

In addition, when the same application is run, the differences in performance and power that result from different topology connection types can be predicted.

In addition, by providing predicted values of performance and power consumption that take the topology connection type into account, the accuracy and reliability of those predictions can be improved.

The apparatus described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, so as to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiment, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims below.

300: performance and power prediction apparatus 310: interface unit
320: prediction model generator 330: performance and power amount predictor

Claims

A multi-node performance and power prediction apparatus comprising:
an interface unit that receives parameters of a multi-node application including Message Passing Interface (MPI) instructions and of multi-node-based hardware that executes the application;
a performance and power amount predictor that applies the received parameters to at least one of a performance prediction model and a power prediction model to predict at least one of performance information and an amount of power for execution of the application on the multi-node; and
a prediction model generator that extracts and divides the MPI instructions included in the application and generates, based on the divided MPI instructions, the performance prediction model for predicting the performance information for execution of the application and the power prediction model for predicting the amount of power for execution of the application,
wherein the prediction model generator divides the MPI instructions included in the application into a serial instruction set for computing data at each node constituting the multi-node and a parallel instruction set for computing data at the multi-node.

The apparatus of claim 1,
wherein the parameters include at least one of the number of nodes constituting the multi-node, network topology information of the multi-node, hardware information of each node constituting the multi-node, and idle power consumption information.

The apparatus of claim 1,
wherein at least one of the performance prediction model and the power prediction model receives the number of nodes constituting the multi-node and the network topology information of the multi-node as input and predicts at least one of the performance information and the amount of power for execution of the application.

delete

A multi-node performance and power prediction apparatus comprising:
an interface unit that receives parameters of a multi-node application including Message Passing Interface (MPI) instructions and of multi-node-based hardware that executes the application;
a performance and power amount predictor that applies the received parameters to at least one of a performance prediction model and a power prediction model to predict at least one of performance information and an amount of power for execution of the application on the multi-node; and
a prediction model generator that extracts and divides the MPI instructions included in the application and generates, based on the divided MPI instructions, the performance prediction model for predicting the performance information for execution of the application and the power prediction model for predicting the amount of power for execution of the application,
wherein the prediction model generator generates the performance prediction model, which derives a total execution time by adding the sum of the time spent on data computation at each node constituting the multi-node to the sum of the time spent on data movement between the nodes.

The apparatus of claim 6,
wherein the time spent on data computation at each node is derived from a value reflecting the ratio between the amount of computation data for each node and the data processing speed of each node.

The apparatus of claim 6,
wherein the time spent on data movement between the nodes is derived from a value obtained by applying a routing complexity to the ratio between the total amount of computation data of the multi-node and a preset bandwidth.

A multi-node performance and power prediction apparatus comprising:
an interface unit that receives parameters of a multi-node application including Message Passing Interface (MPI) instructions and of multi-node-based hardware that executes the application;
a performance and power amount predictor that applies the received parameters to at least one of a performance prediction model and a power prediction model to predict at least one of performance information and an amount of power for execution of the application on the multi-node; and
a prediction model generator that extracts and divides the MPI instructions included in the application and generates, based on the divided MPI instructions, the performance prediction model for predicting the performance information for execution of the application and the power prediction model for predicting the amount of power for execution of the application,
wherein the prediction model generator generates the power prediction model, which derives a total amount of power consumption by adding the sum of the power consumed by data computation at each node constituting the multi-node to the sum of the power consumed by data movement between the nodes.

The apparatus of claim 9,
wherein the power consumed by data computation at each node is derived by adding a value obtained by applying the total execution time on the multi-node to a previously measured static power consumption value to the sum of the dynamic power consumption of each node, which reflects the number of operations per class of the divided MPI instructions.

The apparatus of claim 9,
wherein the power consumed by data movement between the nodes is derived by summing values obtained by applying the time spent on data movement between the nodes to the unit power consumption value of a cable provided between the nodes constituting the multi-node.

A multi-node performance and power prediction method comprising:
receiving, at an interface unit, parameters of a multi-node application including Message Passing Interface (MPI) instructions and of multi-node-based hardware that executes the application;
predicting, at a performance and power amount predictor, at least one of performance information and an amount of power for execution of the application on the multi-node by applying the received parameters to one of a performance prediction model and a power prediction model; and
extracting and dividing, at a prediction model generator, the MPI instructions included in the application and generating, based on the divided MPI instructions, the performance prediction model for predicting the performance information for execution of the application and the power prediction model for predicting the amount of power for execution of the application,
wherein the prediction model generator divides the MPI instructions included in the application into a serial instruction set for computing data at each node constituting the multi-node and a parallel instruction set for computing data at the multi-node.

delete