KR102504939B1

KR102504939B1 - Cloud-based deep learning task execution time prediction system and method

Info

Publication number: KR102504939B1
Application number: KR1020200110778A
Authority: KR
Inventors: 이경용
Original assignee: 국민대학교산학협력단
Priority date: 2020-09-01
Filing date: 2020-09-01
Publication date: 2023-03-02
Also published as: WO2022050477A1; KR20220029004A

Abstract

The present invention relates to a system and method for predicting the execution time of a cloud-based deep learning task. The system includes: a feature vector generator that generates feature vectors for a plurality of deep learning algorithms; a prediction model builder that builds a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; a candidate feature vector generator that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; a performance feature vector predictor that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and an execution time predictor that predicts the execution time of the candidate deep learning algorithm based on the performance feature vector.

Description

Execution time prediction system and method of cloud-based deep learning task {CLOUD-BASED DEEP LEARNING TASK EXECUTION TIME PREDICTION SYSTEM AND METHOD}

The present invention relates to a technology for predicting the execution time of a deep learning task and, more particularly, to a system and method for predicting the execution time of a cloud-based deep learning task that can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

Recently, deep learning algorithms have shown excellent performance in various fields and are broadening the applications of artificial intelligence. Because training a deep learning model requires a large amount of computing resources in a short time, training is mainly performed in cloud environments.

However, because cloud computing services offer so many types of resources, users have great difficulty building an optimal deep learning training environment from the available services. Prices also differ greatly between cloud instances, so selecting the instance that is most efficient in terms of performance and cost for a training task is both very important and difficult.

Meanwhile, a deep learning (artificial intelligence) platform may refer to a tool for developing products or services that let users apply artificial intelligence technologies, such as image processing, speech recognition, and natural language processing, as needed. The core artificial intelligence technologies being implemented recently are general-purpose and applicable to various fields, and artificial intelligence may correspond to the core technology of a deep learning platform.

Korean Patent Publication No. 10-2017-0078012 (2017.07.07)

An embodiment of the present invention seeks to provide a system and method for predicting the execution time of a cloud-based deep learning task that can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

An embodiment of the present invention seeks to provide a system and method for predicting the execution time of a cloud-based deep learning task that make it possible to build a cost-effective environment by accurately inferring the optimal resources required to execute user-defined code in a cloud computing environment.

Among the embodiments, a system for predicting the execution time of a cloud-based deep learning task includes: a feature vector generator that generates feature vectors for a plurality of deep learning algorithms; a prediction model builder that builds a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; a candidate feature vector generator that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; a performance feature vector predictor that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and an execution time predictor that predicts the execution time of the candidate deep learning algorithm based on the performance feature vector.

The feature vector generator may monitor the training process that results from executing deep learning training code implementing a deep learning algorithm, and may determine the performance metric generated by the monitoring as the feature vector for that deep learning algorithm.

The feature vector generator may generate a compressed feature vector by extracting only specific fields from among the plurality of fields constituting the performance metric.

The prediction model builder may generate n first feature vectors by repeatedly executing a specific deep learning algorithm n times (n being a natural number) on a first cloud instance, generate m second feature vectors by repeatedly executing the specific deep learning algorithm m times (m being a natural number) on a second cloud instance, and build the performance prediction model by including, in the plurality of training data, the n*m feature vector pairs formed by combining the first and second feature vectors.

The feature vector generator may group the plurality of feature vectors based on the distances between the vectors, and the prediction model builder may build the performance prediction model independently for each of the at least one vector group generated by the grouping.

The candidate feature vector generator may generate the candidate feature vector by executing candidate deep learning training code, which implements the candidate deep learning algorithm, on the lowest-cost cloud instance.

The performance feature vector predictor may select one of the at least one vector group based on the candidate feature vector and predict the performance feature vector using the performance prediction model corresponding to the selected vector group.
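
By way of illustration only, the group selection described above could be done by comparing the candidate feature vector with the centroid of each vector group and choosing the nearest one. The source does not specify the selection rule, so the Euclidean distance, the centroid comparison, the names `centroid` and `select_group`, and the sample values are all assumptions made for this sketch.

```python
import math
from typing import List, Sequence


def centroid(group: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the feature vectors in one group."""
    count = len(group)
    return [sum(column) / count for column in zip(*group)]


def select_group(candidate: Sequence[float],
                 groups: Sequence[Sequence[Sequence[float]]]) -> int:
    """Return the index of the group whose centroid is closest to the candidate."""
    centroids = [centroid(group) for group in groups]
    return min(range(len(groups)),
               key=lambda i: math.dist(candidate, centroids[i]))


# Hypothetical vector groups and candidate feature vector (illustrative values).
groups = [
    [[1.0, 1.0], [1.2, 0.9]],
    [[8.0, 8.0], [7.9, 8.3]],
]
chosen = select_group([7.5, 7.0], groups)
print(chosen)
```

The returned index would then be used to look up the performance prediction model that was built for that group.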

The execution time predictor may predict the execution time by applying the performance feature vector to a regression (regressor) model.
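
The source does not state which regressor is used. A minimal sketch, assuming a univariate least-squares line fitted on the sum of the entries of the performance feature vector, might look like the following; `fit_line`, `predict_time`, and the training values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_time(vector: Sequence[float], slope: float, intercept: float) -> float:
    """Assumed reduction: the regressor input is the sum of the vector entries."""
    return slope * sum(vector) + intercept


# Hypothetical performance feature vectors and measured execution times.
train_vectors = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
train_times = [7.0, 13.0, 19.0]

slope, intercept = fit_line([sum(v) for v in train_vectors], train_times)
estimate = predict_time([4.0, 8.0], slope, intercept)
print(slope, intercept, estimate)
```

A practical regressor would likely use every field of the performance feature vector rather than a single sum; the reduction here only keeps the sketch dependency-free.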

Among the embodiments, a method for predicting the execution time of a cloud-based deep learning task includes: generating feature vectors for a plurality of deep learning algorithms; building a performance prediction model by learning a plurality of training data generated by executing each of the plurality of deep learning algorithms on a plurality of cloud instances; generating a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted; predicting a performance feature vector by applying the candidate feature vector to the performance prediction model; and predicting the execution time of the candidate deep learning algorithm based on the performance feature vector.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited thereby.

The system and method for predicting the execution time of a cloud-based deep learning task according to an embodiment of the present invention can help build an effective environment by predicting the time required per unit operation when a deep learning training task is performed on various hardware resources.

The system and method for predicting the execution time of a cloud-based deep learning task according to an embodiment of the present invention can make it possible to build a cost-effective environment by accurately inferring the optimal resources required to execute user-defined code in a cloud computing environment.

FIG. 1 is a diagram illustrating a system for predicting the execution time of a cloud-based deep learning task according to the present invention.
FIG. 2 is a diagram illustrating the system configuration of the execution time prediction device of FIG. 1.
FIG. 3 is a diagram illustrating the functional configuration of the execution time prediction device of FIG. 1.
FIG. 4 is a flowchart illustrating a process of predicting the execution time of a cloud-based deep learning task according to the present invention.
FIG. 5 is an exemplary diagram illustrating a process of generating a feature vector according to the present invention.
FIG. 6 is an exemplary diagram illustrating a process of generating a performance prediction model according to the present invention.
FIG. 7 is an exemplary diagram illustrating a process of predicting the final execution time according to the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as being limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only those effects, so the scope of the present invention should not be understood as being limited thereby.

Meanwhile, the terms used in this application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When a component is described as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is described as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "immediately between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes for the steps (for example, a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

Unless defined otherwise, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related art, and should not be interpreted as having an ideal or excessively formal meaning unless explicitly so defined in this application.

FIG. 1 is a diagram illustrating a system for predicting the execution time of a cloud-based deep learning task according to the present invention.

Referring to FIG. 1, the execution time prediction system 100 may include a user terminal 110, an execution time prediction device 130, a cloud server 150, and a database 170.

The user terminal 110 may correspond to a computing device capable of using cloud services and may be implemented as a smartphone, laptop, or computer, but is not necessarily limited thereto and may also be implemented as various other devices such as a tablet PC. The user terminal 110 may be connected to the execution time prediction device 130 through a network, and a plurality of user terminals 110 may be connected to the execution time prediction device 130 simultaneously. In addition, the user terminal 110 may be directly connected to the cloud server 150 and may install and run a dedicated program or application for using cloud services.

The execution time prediction device 130 may be implemented as a system, or a corresponding server, that runs an algorithm capable of recommending an optimal environment for performing a deep learning training task in a cloud computing environment. The execution time prediction device 130 may be connected to the user terminal 110 through a network and may exchange information with it.

In addition, the execution time prediction device 130 may operate in conjunction with at least one external system. For example, the external systems may include the cloud server 150 that provides cloud services, an artificial intelligence server that performs deep learning training, and a payment server for service payment.

In one embodiment, the execution time prediction device 130 may work with the database 170 to store the data needed to predict the execution time of a deep learning task in a cloud computing environment and to recommend an optimal deep learning environment using cloud services. In addition, the execution time prediction device 130 may be implemented with a processor, a memory, a user input/output unit, and a network input/output unit, which are described in more detail with reference to FIG. 2.

The cloud server 150 may correspond to a server that provides cloud services. The cloud server 150 may be connected to the execution time prediction device 130 through a network and may be directly connected to the user terminal 110. The cloud server 150 may provide various cloud instances for the deep learning training performed by the execution time prediction device 130. In one embodiment, the cloud server 150 may serve as a server that provides a deep learning platform.

The database 170 may correspond to a storage device that stores the various information needed during the operation of the execution time prediction device 130. The database 170 may store information on deep learning algorithms and the corresponding deep learning training code, and may store information on the feature vectors and training data for the deep learning algorithms. It is not necessarily limited thereto and may store information collected or processed in various forms during the process of predicting the execution time of a cloud-based deep learning task.

FIG. 2 is a diagram illustrating the system configuration of the execution time prediction device of FIG. 1.

Referring to FIG. 2, the execution time prediction device 130 may be implemented with a processor 210, a memory 230, a user input/output unit 250, and a network input/output unit 270.

The processor 210 may execute the procedures that process each step of the operation of the execution time prediction device 130, may manage the memory 230 that is read or written throughout that process, and may schedule the synchronization time between the volatile and non-volatile memory in the memory 230. The processor 210 may control the overall operation of the execution time prediction device 130 and may be electrically connected to the memory 230, the user input/output unit 250, and the network input/output unit 270 to control the data flow between them. The processor 210 may be implemented as the CPU (Central Processing Unit) of the execution time prediction device 130.

The memory 230 may include an auxiliary storage device, implemented as non-volatile memory such as an SSD (Solid State Drive) or HDD (Hard Disk Drive), that stores all the data needed by the execution time prediction device 130, and may include a main storage device implemented as volatile memory such as RAM (Random Access Memory).

The user input/output unit 250 may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit 250 may include an input device with an adapter such as a touch pad, touch screen, on-screen keyboard, or pointing device, and an output device with an adapter such as a monitor or touch screen. In one embodiment, the user input/output unit 250 may correspond to a computing device connected through remote access, in which case the execution time prediction device 130 may operate as a server.

The network input/output unit 270 includes an environment for connecting to external devices or systems through a network and may include, for example, an adapter for communication over a LAN (Local Area Network), MAN (Metropolitan Area Network), WAN (Wide Area Network), or VAN (Value Added Network).

FIG. 3 is a diagram illustrating the functional configuration of the execution time prediction device of FIG. 1.

Referring to FIG. 3, the execution time prediction device 130 may include a feature vector generator 310, a prediction model builder 320, a candidate feature vector generator 330, a performance feature vector predictor 340, an execution time predictor 350, and a controller 360.

The feature vector generator 310 may generate feature vectors for a plurality of deep learning algorithms. That is, the feature vector corresponding to a deep learning algorithm may correspond to the feature information derived when training code implementing that algorithm is executed on a cloud instance. As a result, the feature vector generator 310 may newly define a feature vector corresponding to each deep learning algorithm in order to express its feature information, and thereby provide input data for building a highly accurate prediction model.

In one embodiment, the feature vector generator 310 may monitor the training process that results from executing deep learning training code implementing a deep learning algorithm, and may determine the performance metric generated by the monitoring as the feature vector for that deep learning algorithm. For example, in FIG. 5, the feature vector generator 310 may generate a feature vector corresponding to each deep learning algorithm using the performance metric 550 that a deep learning platform 530, such as TensorFlow or PyTorch, provides while executing the deep learning training code 510. The performance metric 550 may be provided by the deep learning platform 530 together with a visualization tool so that the user can observe the characteristics of the task and monitor its progress. More specifically, TensorFlow provides n = 2046 feature values during the modeling process, and these feature values may include both blank values and valid values depending on the characteristics of the deep learning algorithm.

In one embodiment, the feature vector generator 310 may generate a compressed feature vector by extracting only specific fields from among the plurality of fields constituting the performance metric. When TensorFlow is used, the feature vector generator 310 may construct the feature vector by extracting, from the 2046 performance metrics, only the fields closely related to the execution of the deep learning algorithm. For example, the BatchMatMul field may correspond to a metric indicating the time spent on matrix multiplication during deep learning training, and the feature vector generator 310 may generate the compressed feature vector by extracting such related fields from the performance metric.
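
As a rough illustration of the field extraction described above, the sketch below keeps a fixed, ordered subset of fields from a metric record. Only `BatchMatMul` is named in the source; the other field names, the dictionary representation of the metric, the choice to map blank values to 0.0, and the function name `compress_metric` are assumptions for this example.

```python
from typing import List, Mapping, Optional, Sequence

# Hypothetical selection; only "BatchMatMul" appears in the source text.
SELECTED_FIELDS = ("BatchMatMul", "Conv2D", "MatMul")


def compress_metric(metric: Mapping[str, Optional[float]],
                    fields: Sequence[str] = SELECTED_FIELDS) -> List[float]:
    """Keep only the selected fields, in a fixed order.

    Missing or blank (None) fields become 0.0, which is an assumption.
    """
    vector = []
    for name in fields:
        value = metric.get(name)
        vector.append(0.0 if value is None else float(value))
    return vector


# Hypothetical raw metric with one blank field and one unselected field.
raw_metric = {"BatchMatMul": 12.5, "Conv2D": None, "Relu": 0.8, "MatMul": 3.0}
compressed = compress_metric(raw_metric)
print(compressed)
```

Fixing the field order matters because every compressed vector must have the same layout before vectors can be compared or fed to a model.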

In one embodiment, the feature vector generator 310 may group a plurality of feature vectors based on the distances between the vectors. When the execution time of a deep learning algorithm is predicted, the user may have written new code of their own to implement the algorithm and thereby configured a new training model. In addition, because there are so many deep learning algorithms, it may not be easy to cover the various deep learning training codes that implement them with a single performance prediction model. The feature vector generator 310 may therefore group similar deep learning algorithms into clusters so that a performance prediction model is generated independently for each cluster, and may classify similar algorithms based on the distances between their feature vectors.
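
The source does not name a clustering algorithm. A minimal sketch of distance-based grouping, assuming a simple leader-style scheme with Euclidean distance and a fixed threshold, is given below; `group_by_distance`, the threshold, and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence


def group_by_distance(vectors: Sequence[Sequence[float]],
                      threshold: float) -> List[List[Sequence[float]]]:
    """Assign each vector to the nearest group leader within the threshold.

    A vector that is farther than the threshold from every existing
    leader starts a new group and becomes its leader.
    """
    leaders: List[Sequence[float]] = []
    groups: List[List[Sequence[float]]] = []
    for vec in vectors:
        best_index, best_dist = -1, math.inf
        for i, leader in enumerate(leaders):
            dist = math.dist(vec, leader)
            if dist < best_dist:
                best_index, best_dist = i, dist
        if best_index >= 0 and best_dist <= threshold:
            groups[best_index].append(vec)
        else:
            leaders.append(vec)
            groups.append([vec])
    return groups


# Hypothetical feature vectors forming two well-separated clusters.
vectors = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [7.9, 8.3]]
clusters = group_by_distance(vectors, threshold=1.0)
print(len(clusters))
```

Each resulting group would then receive its own independently trained performance prediction model. Other clustering methods could be substituted without changing that overall flow.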

The prediction model builder 320 may build a performance prediction model by learning a plurality of training data generated by executing each of a plurality of deep learning algorithms on a plurality of cloud instances. The input to the performance prediction model may correspond to a feature vector extracted by executing user-defined deep learning training code on a cloud instance of an arbitrary type. The instance type on which the deep learning task was executed may correspond to the anchor type. The value predicted by the performance prediction model may then correspond to the feature vector that would be generated by executing the same deep learning code on a cloud instance of an instance type other than the anchor type.

For example, in FIG. 6, if the instance type of the anchor node is g3.2xlarge, the first feature vector 610 generated by executing the user-defined code on g3.2xlarge may be the input of the performance prediction model 630. Based on that input, the performance prediction model 630 may predict the second feature vectors 650 that would result from execution on a cloud instance of a different instance type (for example, p2.xlarge). That is, to build the performance prediction model 630, feature vectors generated by execution on various instance types need to be used as training data, and the prediction model building unit 320 may learn, as training data, the feature vectors generated as a result of executing one algorithm on various cloud instances.

In one embodiment, the prediction model building unit 320 may generate n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on a first cloud instance, generate m second feature vectors as a result of repeatedly executing the specific deep learning algorithm m times (where m is a natural number) on a second cloud instance, and build the performance prediction model by including, in the plurality of training data, the n*m feature vector pairs generated by combining the first and second feature vectors. That is, the prediction model building unit 320 may increase the generality of the performance prediction model by applying a new data augmentation technique in the cloud environment to secure a large amount of training data.

More specifically, the prediction model building unit 320 may generate n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on the first cloud instance. Here, the first cloud instance may correspond to a cloud instance of the anchor type. Owing to the nature of the cloud, the n first feature vectors may have similar values, but they may differ slightly because of differences in execution time and operating state.

Next, the prediction model building unit 320 may generate m second feature vectors as a result of repeatedly executing the same deep learning algorithm m times (where m is a natural number) on the second cloud instance. Here, the second cloud instance may correspond to an instance type other than the anchor type, and the second feature vectors may likewise have similar but slightly different values.

Next, the prediction model building unit 320 may build the performance prediction model by including, in the plurality of training data, the n*m feature vector pairs generated by combining the first and second feature vectors. That is, each feature vector pair may correspond to one training sample for building the performance prediction model, with the first and second feature vectors of the pair corresponding to the input and output data, respectively.
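The pairing step above can be sketched as a Cartesian product of the two sets of runs. This is an illustrative sketch only: the function name `make_training_pairs` and the sample feature values are assumptions and do not come from the description.

```python
from itertools import product

def make_training_pairs(anchor_vectors, target_vectors):
    """Combine every anchor-type run with every target-type run.

    Each pair is (input, output): the anchor-instance feature vector is the
    model input and the other-instance feature vector is the expected output.
    """
    return list(product(anchor_vectors, target_vectors))

# Hypothetical feature vectors from repeated runs of the same algorithm.
anchor_runs = [(0.61, 0.30), (0.63, 0.29), (0.60, 0.31)]  # n = 3 runs on the first instance
target_runs = [(0.82, 0.41), (0.80, 0.43)]                # m = 2 runs on the second instance

pairs = make_training_pairs(anchor_runs, target_runs)     # n * m = 6 training samples
```

With n + m executions the training set grows to n*m samples, which is the augmentation effect the description refers to.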

In one embodiment, the prediction model building unit 320 may independently build a performance prediction model for each of at least one vector group generated as a result of the grouping. The feature vector generator 310 may group a plurality of feature vectors based on the distances between them; in this case, the prediction model building unit 320 may individually build a performance prediction model for each cluster produced by the grouping, that is, for each vector group, thereby increasing the prediction accuracy of the performance prediction model.

The candidate feature vector generator 330 may generate a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted. Once the performance prediction model has been built by the prediction model building unit 320, the candidate feature vector generator 330 may generate a feature vector as a result of executing, on a cloud instance of the anchor type, the training code implementing the candidate deep learning algorithm that is the actual target of performance prediction. In a later step, the candidate feature vector may be used as the input of the performance prediction model.

In one embodiment, the candidate feature vector generator 330 may generate the candidate feature vector as a result of executing candidate deep learning training code, which implements the candidate deep learning algorithm, on the cloud instance with the minimum cost. The candidate feature vector to be used as the input of the performance prediction model needs to be obtained by execution on a reference cloud instance, and the candidate feature vector generator 330 may generate the candidate feature vector using the cloud instance that can be configured at minimum cost.

The performance feature vector predictor 340 may predict a performance feature vector by applying the candidate feature vector to the performance prediction model. That is, based on the candidate feature vector for the deep learning training code written by the user, the performance prediction model may predict, and provide as output, the feature vector that would be generated when the code runs on another instance type.

In one embodiment, the performance feature vector predictor 340 may select one of the at least one vector group based on the candidate feature vector and predict the performance feature vector using the performance prediction model corresponding to that vector group. The performance feature vector predictor 340 may determine a specific vector group according to the distance between the candidate feature vector and the other vectors, and may select the performance prediction model built for that vector group and use it to predict the performance feature vector.
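One way to picture the group selection described above is a nearest-centroid lookup followed by a call to the model built for that group. The sketch below is illustrative only: the group names, centroid values, and the toy per-group models (simple scaling functions standing in for trained performance prediction models) are hypothetical.

```python
import math

def select_group(candidate, centroids):
    """Return the name of the vector group whose centroid is closest to the candidate."""
    return min(centroids, key=lambda name: math.dist(candidate, centroids[name]))

# Hypothetical centroids of two vector groups produced by the grouping step.
centroids = {"group_a": (0.9, 0.2), "group_b": (0.1, 0.9)}

# Hypothetical stand-ins for the per-group performance prediction models.
models = {
    "group_a": lambda v: tuple(1.3 * x for x in v),
    "group_b": lambda v: tuple(1.1 * x for x in v),
}

candidate = (0.85, 0.25)                  # candidate feature vector from the anchor instance
group = select_group(candidate, centroids)
predicted = models[group](candidate)      # predicted performance feature vector
```

Measuring distance to a group centroid is an assumption; the description only states that the group is determined from distances between vectors.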

The execution time predictor 350 may predict the execution time of the candidate deep learning algorithm based on the performance feature vector. The performance feature vector may correspond to the performance metric monitored while the deep learning training code is executed on a specific cloud instance. Using information collected during past actual executions, namely similar performance metrics and their actual execution times, the execution time of the candidate deep learning algorithm can be predicted. To this end, the execution time predictor 350 may use regression analysis, a statistical analysis methodology, to predict the execution time.

In one embodiment, the execution time predictor 350 may predict the execution time by applying the performance feature vector to a regression (regressor) model. For example, in FIG. 7, the regression model 730 may be generated in advance through regression analysis between feature vectors for deep learning algorithms and their actual execution times, and the execution time predictor 350 may predict the actual execution time by applying the performance feature vector 710, predicted by the performance prediction model, to the regression model 730. That is, the regression model 730 may correspond to an analysis model that infers the relationship between a feature vector created when the training data was generated and the training time of the step executed to generate that feature vector.
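As a minimal sketch of this regression step, the example below fits an ordinary least-squares line from a single feature to a measured execution time and then applies it to a predicted feature value. The description does not specify the form of the regression model, so the reduction to one scalar feature, the function name `fit_line`, and all sample values are simplifying assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: one field of the feature vector vs. measured time per step.
metric_values = [1.0, 2.0, 3.0, 4.0]
measured_seconds = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(metric_values, measured_seconds)

# Apply the fitted model to a feature value predicted for the target instance type.
predicted_metric = 5.0
estimated_seconds = slope * predicted_metric + intercept
```

A practical regressor would take the whole performance feature vector as input; the single-feature form is used here only to keep the fitting step visible.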

The controller 360 may manage the control flow or data flow among the feature vector generator 310, the prediction model building unit 320, the candidate feature vector generator 330, the performance feature vector predictor 340, and the execution time predictor 350.

FIG. 4 is a flowchart illustrating a process of predicting the execution time of a cloud-based deep learning task according to the present invention.

Referring to FIG. 4, the execution time prediction apparatus 130 may generate feature vectors for a plurality of deep learning algorithms through the feature vector generator 310 (step S410). The execution time prediction apparatus 130 may build a performance prediction model, through the prediction model building unit 320, by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances (step S420).

In addition, the execution time prediction apparatus 130 may generate, through the candidate feature vector generator 330, a candidate feature vector for the candidate deep learning algorithm whose execution time is to be predicted (step S430). The execution time prediction apparatus 130 may predict a performance feature vector by applying the candidate feature vector to the performance prediction model through the performance feature vector predictor 340 (step S440). The execution time prediction apparatus 130 may predict the execution time of the candidate deep learning algorithm based on the performance feature vector through the execution time predictor 350 (step S450).
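The data flow of steps S430 to S450 can be sketched as three chained calls. The sketch assumes that the models from steps S410 and S420 already exist; the function names and the three stand-in callables are hypothetical placeholders for the corresponding units, not implementations of them.

```python
def predict_execution_time(candidate_code, run_on_anchor, performance_model, regression_model):
    """Chain the three prediction-time steps of FIG. 4."""
    candidate_vector = run_on_anchor(candidate_code)          # S430: candidate feature vector
    performance_vector = performance_model(candidate_vector)  # S440: predicted feature vector
    return regression_model(performance_vector)               # S450: predicted execution time

# Hypothetical stand-ins for the anchor run and the two trained models.
def run_on_anchor(code):
    return (0.5, 0.25)

def performance_model(vector):
    return tuple(2 * x for x in vector)

def regression_model(vector):
    return 100.0 * sum(vector)

estimate = predict_execution_time("train.py", run_on_anchor, performance_model, regression_model)
```

Passing the units in as callables mirrors the way the controller 360 only manages the flow between them.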

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: execution time prediction system
110: user terminal 130: execution time prediction apparatus
150: cloud server 170: database
210: processor 230: memory
250: user input/output unit 270: network input/output unit
310: feature vector generator 320: prediction model building unit
330: candidate feature vector generator 340: performance feature vector predictor
350: execution time predictor 360: controller
510: deep learning training code 530: deep learning platform
550: performance metric
610: first feature vector 630: performance prediction model
650: second feature vector
710: performance feature vector 730: regression model

Claims

A system for predicting the execution time of a cloud-based deep learning task, the system comprising:
a feature vector generator that monitors the training process resulting from execution of deep learning training code implementing a deep learning algorithm, determines the performance metric generated as a result of the monitoring as the feature vector for the deep learning algorithm, and thereby generates feature vectors for a plurality of deep learning algorithms;
a prediction model building unit that builds a performance prediction model by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances;
a candidate feature vector generator that generates a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted;
a performance feature vector predictor that predicts a performance feature vector by applying the candidate feature vector to the performance prediction model; and
an execution time predictor that predicts an execution time of the candidate deep learning algorithm based on the performance feature vector,
wherein the performance prediction model takes as input the feature vector extracted by executing the deep learning training code on a cloud instance of an anchor type, and predicts the feature vector generated by executing the deep learning training code on a cloud instance of an instance type other than the anchor type.

(Deleted)

The system of claim 1, wherein the feature vector generator generates a compressed feature vector by extracting only specific fields from among a plurality of fields constituting the performance metric.

The system of claim 1, wherein the prediction model building unit generates n first feature vectors as a result of repeatedly executing a specific deep learning algorithm n times (where n is a natural number) on a first cloud instance, generates m second feature vectors as a result of repeatedly executing the specific deep learning algorithm m times (where m is a natural number) on a second cloud instance, and builds the performance prediction model by including, in the plurality of training data, the n*m feature vector pairs generated by combining the first and second feature vectors.

The system of claim 1, wherein the feature vector generator groups a plurality of feature vectors based on distances between the vectors, and
the prediction model building unit independently builds the performance prediction model for each of at least one vector group generated as a result of the grouping.

The system of claim 1, wherein the candidate feature vector generator generates the candidate feature vector as a result of executing candidate deep learning training code, which implements the candidate deep learning algorithm, on a cloud instance with a minimum cost.

The system of claim 5, wherein the performance feature vector predictor selects one of the at least one vector group based on the candidate feature vector and predicts the performance feature vector using the performance prediction model corresponding to the selected vector group.

The system of claim 1, wherein the execution time predictor predicts the execution time by applying the performance feature vector to a regression (regressor) model.

A method for predicting the execution time of a cloud-based deep learning task, performed in a system for predicting the execution time of a cloud-based deep learning task that includes a feature vector generator, a prediction model building unit, a candidate feature vector generator, a performance feature vector predictor, and an execution time predictor, the method comprising:
generating, through the feature vector generator, feature vectors for a plurality of deep learning algorithms by monitoring the training process resulting from execution of deep learning training code implementing a deep learning algorithm and determining the performance metric generated as a result of the monitoring as the feature vector for the deep learning algorithm;
building, through the prediction model building unit, a performance prediction model by learning a plurality of training data generated as a result of executing each of the plurality of deep learning algorithms on a plurality of cloud instances;
generating, through the candidate feature vector generator, a candidate feature vector for a candidate deep learning algorithm whose execution time is to be predicted;
predicting, through the performance feature vector predictor, a performance feature vector by applying the candidate feature vector to the performance prediction model; and
predicting, through the execution time predictor, an execution time of the candidate deep learning algorithm based on the performance feature vector,
wherein the performance prediction model takes as input the feature vector extracted by executing the deep learning training code on a cloud instance of an anchor type, and predicts the feature vector generated by executing the deep learning training code on a cloud instance of an instance type other than the anchor type.