KR20200010644A

KR20200010644A - Computer-enabled cloud-based ai computing service method

Info

Publication number: KR20200010644A
Application number: KR1020180074043A
Authority: KR
Inventors: 이경용; 김혁만
Original assignee: 국민대학교산학협력단
Priority date: 2018-06-27
Filing date: 2018-06-27
Publication date: 2020-01-31
Also published as: KR102158051B1

Abstract

The present invention relates to a computer-executable cloud-based artificial intelligence (AI) computing service method. The method comprises the steps of: (a) determining a server layout for a matrix multiplication operation based on the shape and size of each of first and second matrices and the number of servers; (b) dividing the first and second matrices into first and second block-based partitions based on the server layout; (c) calculating a unit operation time by applying distributed matrix characteristics to the first and second block-based partitions; and (d) estimating the total operation time required to multiply the first and second matrices based on the unit operation time.

Description

Computer-executable cloud-based AI computing service method {COMPUTER-ENABLED CLOUD-BASED AI COMPUTING SERVICE METHOD}

The present invention relates to computer-executable cloud-based artificial intelligence (AI) computing service technology and, more particularly, to a computer-executable cloud-based AI computing service method capable of predicting computational performance that varies with the number of servers performing matrix multiplication.

Recent advances in hardware and software systems have made it possible to process large data sets that could not be handled in the past. To accommodate the growing number of big data analytics applications, systems increasingly adopt cloud computing environments, which provide scalability and fault tolerance by reducing the overhead of operational tasks. Cloud computing services offer a variety of instances, each with its own hardware configuration, and many big data processing software platforms can use these resources in a scale-out fashion.

Matrix multiplication is an important kernel operation in many machine learning algorithms, but in a distributed cloud computing environment it can be very difficult to accurately predict how long the operation will take or how many instances are needed to complete it. Predicting the overhead of matrix multiplication is therefore essential for maintaining cost efficiency in a cloud environment.

Korean Registered Patent Publication No. 10-0909510 (2009.07.20)

An embodiment of the present invention provides a computer-executable cloud-based AI computing service method capable of predicting computational performance that varies with the number of servers performing matrix multiplication.

An embodiment of the present invention provides a computer-executable cloud-based AI computing service method capable of determining a server layout by matching at least one job server to the shapes of first and second matrices based on the number of job servers.

An embodiment of the present invention provides a computer-executable cloud-based AI computing service method capable of predicting a total operation time by applying, to a unit operation time, the ratio of the number of job servers included in the server layout to the number of job servers of the performance prediction model.

Among the embodiments, a computer-executable cloud-based AI computing service method includes: (a) determining a server layout for a matrix multiplication operation based on the shape of each of first and second matrices; (b) dividing the first and second matrices into first and second block-based partitions based on the server layout; (c) calculating a unit operation time by applying distributed matrix characteristics to the first and second block-based partitions; and (d) predicting the total operation time required to multiply the first and second matrices based on the unit operation time.

In step (a), at least one job server satisfying a specific condition of the matrix multiplication operation between the first and second matrices may be determined from among a plurality of servers, and the server layout may be determined by matching the at least one job server to the shapes of the first and second matrices based on the number of job servers.

In step (b), each of the first and second matrices may be divided into the first and second block-based partitions based on the number of job servers.

In step (c), the first and second block-based partitions may be associated with each other and assigned to the at least one job server based on the server layout, and the unit operation time may be calculated using the operation time at the at least one job server.

In step (c), the unit operation time may be calculated through a performance prediction model based on the row size and column size of the first block-based partition and the column size of the second block-based partition assigned to the at least one job server.

Step (c) may include determining the distributed matrix characteristics based on the ratio of the number of job servers included in the server layout to the number of job servers of the performance prediction model.

In step (d), the total operation time may be predicted by applying, to the unit operation time, the ratio of the number of job servers included in the server layout to the number of job servers of the performance prediction model.

Among the embodiments, a cloud-based AI computing service method includes: (a) dividing first and second matrices for a multiplication operation into first and second block-based partitions based on a number of servers that is a square number; (b) calculating a unit operation time by applying distributed matrix characteristics to the first and second block-based partitions; and (c) predicting the total operation time required to multiply the first and second matrices based on the unit operation time.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of the disclosed technology should not be understood as being limited by them.

A computer-executable cloud-based AI computing service method according to an embodiment of the present invention can determine a server layout by matching at least one job server to the shapes of first and second matrices based on the number of job servers.

A computer-executable cloud-based AI computing service method according to an embodiment of the present invention can predict the total operation time by applying, to the unit operation time, the ratio of the number of job servers included in the server layout to the number of job servers of the performance prediction model.

FIG. 1 is a diagram illustrating a computer-executable cloud-based AI computing service system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the AI computing service device of FIG. 1.
FIG. 3 is a flowchart illustrating the process of providing an AI computing service performed by the AI computing service device of FIG. 1.
FIG. 4 is an exemplary diagram illustrating how block-based distributed matrix multiplication is performed in the AI computing service device of FIG. 1.
FIG. 5 is an exemplary diagram illustrating how a performance prediction model is generated in the AI computing service device of FIG. 1.
FIG. 6 is an exemplary diagram illustrating how the total operation time for matrix multiplication is predicted based on different server layouts in the AI computing service device of FIG. 1.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of the present invention should not be construed as limited by the embodiments described herein. That is, because the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only such effects, so the scope of the present invention should not be understood as being limited by them.

The meaning of the terms used in the present application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" to another component, it should be understood that no other component exists in between. Other expressions describing the relationship between components, such as "between" and "immediately between" or "neighboring" and "directly neighboring", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In each step, reference characters (for example, a, b, c) are used for convenience of description and do not describe the order of the steps; unless a specific order is clearly stated in context, the steps may occur in an order different from the one given. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly so defined in the present application.

Performance prediction for matrix multiplication may be performed by calculating the time required for matrix multiplication in a cloud computing environment. That is, predicting the performance of multiplying arbitrary matrices may amount to generating a performance prediction model and using it to predict the time the multiplication will take. The performance prediction model may be the result of machine learning on training data in which the matrix characteristics that most strongly affect the operation time of matrix multiplication are the input and the expected time required to multiply matrices having those characteristics is the output.

Matrix multiplication performance prediction may be carried out by building the performance prediction model, which is its core component, and may consist of a training data set generation step, a feature extraction step, and a modeling step. The operations performed in each step are as follows.

1) Training data set generation step

In the training data set generation step, the matrix multiplication performance prediction may profile matrix multiplications of various shapes and sizes in order to build the performance prediction model. More specifically, it may generate matrix multiplication jobs belonging to various types of matrix multiplication in order to produce the training data. To handle matrices of all shapes and sizes, it may perform training profiling by collecting, for each matrix multiplication job, a profile that includes the expected operation time required to multiply the left and right matrices.

Depending on the shapes and sizes of the left and right matrices, matrix multiplication jobs may be broadly classified into multiplication between square matrices (square × square), multiplication of a long, thin rectangular matrix by a short, wide rectangular matrix (long-thin × short-wide), and multiplication of a short, wide rectangular matrix by a long, thin rectangular matrix (short-wide × long-thin). To measure the operation time required for matrix multiplication, the matrix multiplication performance prediction may use the Apache Spark web UI REST API, which provides various execution metrics in JSON format. It is not necessarily limited to this and may use various distributed AI computing programs.
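As an illustration of the three categories above, the following minimal sketch labels a multiplication from the dimensions of its operands. The function name `classify_multiplication`, the parameter names, and the rule that "long-thin" means more rows than columns are assumptions made for this example; the source does not specify where the boundaries between categories lie.

```python
def classify_multiplication(lr: int, lc: int, rc: int) -> str:
    """Label an (lr x lc) by (lc x rc) multiplication with one of the three categories.

    Assumption: a matrix counts as "long-thin" when it has more rows than
    columns and as "short-wide" when it has more columns than rows.
    The source gives no numeric thresholds.
    """
    if lr == lc == rc:
        return "square x square"
    if lr > lc and rc > lc:
        # Left operand is tall (lr > lc); right operand is wide (rc > lc).
        return "long-thin x short-wide"
    if lr < lc and rc < lc:
        # Left operand is wide (lc > lr); right operand is tall (lc > rc).
        return "short-wide x long-thin"
    return "other"


if __name__ == "__main__":
    for dims in [(1000, 1000, 1000), (100000, 10, 100000), (10, 100000, 10)]:
        print(dims, classify_multiplication(*dims))
```

Shapes that fit none of the three categories are labeled "other" here rather than forced into one of them.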

To obtain optimal performance across cloud computing instances of differing capacities, the matrix multiplication performance prediction may use the NVBLAS library when performing matrix multiplication on instances with GPU devices and OpenBLAS on instances with CPU devices. It may also use the netlib-java library so that Spark can interact with hardware-optimized linear algebra libraries. The matrix multiplication performance prediction is not necessarily limited to these and may use various distributed AI computing programs.

2) Feature extraction step

The overhead of matrix multiplication in a distributed computing environment may be affected by a variety of resources. To account for the various overheads, the matrix multiplication performance prediction may use the dimensions of the input matrix blocks and their products; for example, lr, lc, rc, lr*rc, lr*lc, lc*rc, lr*lc+lc*rc, and lr*lc*rc may be used as matrix characteristics for modeling matrix multiplication performance. Here, lr and lc are the numbers of rows and columns of the left matrix block, and rc is the number of columns of the right matrix block. lr*rc represents the size of the output matrix, and lr*lc and lc*rc represent the sizes of the left and right matrix blocks, respectively, which affect network overhead and disk I/O overhead. lr*lc*rc may represent the total number of multiplication operations performed in the matrix multiplication.
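The characteristics listed above can be computed directly from the three block dimensions. The sketch below is one way to assemble them into a feature record; the function name `matrix_features` and the dictionary layout are illustrative assumptions, not part of the source.

```python
def matrix_features(lr: int, lc: int, rc: int) -> dict:
    """Build the feature set for an (lr x lc) by (lc x rc) block multiplication.

    lr, lc: rows and columns of the left block; rc: columns of the right block.
    The dictionary keys mirror the expressions named in the text.
    """
    return {
        "lr": lr,
        "lc": lc,
        "rc": rc,
        "lr*rc": lr * rc,                   # size of the output block
        "lr*lc": lr * lc,                   # size of the left block
        "lc*rc": lc * rc,                   # size of the right block
        "lr*lc+lc*rc": lr * lc + lc * rc,   # combined size of both input blocks
        "lr*lc*rc": lr * lc * rc,           # number of scalar multiplications
    }


if __name__ == "__main__":
    print(matrix_features(3, 4, 5))
```

Keeping the raw dimensions alongside their products lets a regressor pick up both size-driven costs (data movement) and work-driven costs (multiplication count).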

3) Modeling step

In the modeling step, the matrix multiplication performance prediction may build a performance prediction model that can predict the performance of multiplying various matrices. The modeling step may consist of a model building step and a hyper-parameter search step. The matrix multiplication performance prediction may use a GB (Gradient Boost) regressor for the model building step and may use Bayesian optimization to find the optimal parameters for the GB method.

The GB method is a flexible, nonparametric statistical learning approach to classification and regression. Its main idea is to combine, stage by stage, several weak learners, each of which can generally capture only simple linear relationships, in order to model complex, nonlinear interactions among features. A GB model is built in a forward stage-wise fashion: at each stage a new weak learner is fitted to the residuals of the current model, so the GB model concentrates on correcting the errors of the previous iterations.
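The forward stage-wise idea can be sketched in a few lines of plain Python. This is a minimal illustration for a single feature with squared-error loss, not the regressor used in the source: decision stumps are swapped in as the weak learner, and the names `fit_stump`, `fit_gradient_boosting`, `predict`, `n_stages`, and `learning_rate` are assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    thresholds = sorted(set(xs))[:-1]  # splitting at the largest x would leave one side empty
    if not thresholds:
        raise ValueError("need at least two distinct feature values")
    best = None
    for threshold in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gradient_boosting(xs, ys, n_stages=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Each stage adds a damped correction for the previous stages' errors.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and every stage's damped correction for input x."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (left_mean if x <= threshold else right_mean)
        for threshold, left_mean, right_mean in stumps
    )


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [x * x for x in xs]  # a nonlinear target
    model = fit_gradient_boosting(xs, ys, n_stages=100)
    print([round(predict(model, x), 1) for x in xs])
```

Although each stump is only a step function, the training error shrinks as stages accumulate, which is the behavior the paragraph above describes.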

When building the performance prediction model, setting the model parameters appropriately can be very important for improving prediction quality. Many heuristic methods, such as random walk, grid-based search, and statistical inference, have been proposed for finding the hyper-parameters that give the best performance. The matrix multiplication performance prediction may use a statistical inference method based on a Bayesian model. The Bayesian optimization method searches for the next set of configuration values that can improve model quality or reduce uncertainty.

FIG. 1 is a diagram illustrating a computer-executable cloud-based AI computing service system according to an embodiment of the present invention.

Referring to FIG. 1, the cloud-based AI computing service system 100 may include a user terminal 110, an AI computing service device 130, and a database 150.

The user terminal 110 may be a computing device capable of requesting an AI computing service, such as a distributed matrix computation service, from the AI computing service device 130. The user terminal 110 may be implemented as a smartphone, a notebook, or a computer, but is not necessarily limited to these and may also be implemented as various other devices such as a tablet PC. The user terminal 110 may be connected to the AI computing service device 130 through a network, and a plurality of user terminals 110 may be connected to the AI computing service device 130 at the same time.

The AI computing service device 130 may be implemented as a server, that is, a computer or program, that receives an AI computing service request from the user terminal 110 and can provide an optimal cloud computing service by predicting the operation time required for the distributed matrix operations that are essential when implementing AI. The AI computing service device 130 may be implemented as at least one cloud server operating on a distributed computing basis. The AI computing service device 130 may be connected to the user terminal 110 through a wired network or a wireless network such as Bluetooth or WiFi, and may communicate with the user terminal 110 through that network.

The AI computing service device 130 may work with the database 150 to store resource information, including the CPU (Central Processing Unit), GPU (Graphics Processing Unit), TPU (Tensor Processing Unit), and memory, for at least one cloud server involved in distributed matrix operations. Unlike in FIG. 1, the AI computing service device 130 may also be implemented with the database 150 included inside it.

The database 150 may store the various information that the AI computing service device 130 uses to predict the operation time of various forms of distributed matrix multiplication in response to an AI computing service request received from the user terminal 110. For example, the database 150 may store profiling information for various forms of matrix multiplication and may store layout information about the job servers distributed across the cloud to perform matrix multiplication operations. It is not necessarily limited to these and may store information collected or processed in various forms while providing the AI computing service, in order to provide a cloud computing service environment optimized for the user.

FIG. 2 is a block diagram illustrating the AI computing service device of FIG. 1.

Referring to FIG. 2, the AI computing service device 130 may include a server layout determiner 210, a partition divider 230, a unit operation time calculator 250, a total operation time predictor 270, and a controller 290.

The server layout determiner 210 may determine a server layout for a matrix multiplication operation based on the shape of each of the first and second matrices. Here, the server layout may be information about those cloud computing instances that can perform the matrix multiplication operation between the first and second matrices. For example, the server layout may include the number of available instances that can perform the matrix multiplication operation, matching information between the available instances and the distributed matrix operations that make up the matrix multiplication operation, and information for combining the individual distributed matrix operations to produce the final matrix multiplication result. The server layout determiner 210 may determine the server layout in consideration of the shape and size of each of the first and second matrices and the current availability of cloud computing instances.

In one embodiment, the server layout determiner 210 may determine, from among a plurality of servers, at least one job server that satisfies a specific condition of the matrix multiplication operation between the first and second matrices, and may determine the server layout by matching the at least one job server to the shapes of the first and second matrices based on the number of job servers. Here, the plurality of servers may correspond to the individual cloud computing instances, and a job server may be an available cloud computing instance.

The server layout determiner 210 may determine the size of the result matrix produced by the matrix multiplication operation based on the sizes and shapes of the first and second matrices, and may set the number of job servers equal to the size of the result matrix, that is, the product of its number of rows and its number of columns. For example, the server layout determiner 210 may set the number of job servers to 4 when the result matrix has 2 rows × 2 columns and therefore a size of 4 (= 2 × 2), and to 15 when the result matrix has 3 rows × 5 columns and therefore a size of 15 (= 3 × 5). In another embodiment, the server layout determiner 210 may determine the number of job servers based on the shape or size of each of the first and second matrices, regardless of the size of the result matrix.

The server layout determiner 210 may determine the server layout by matching at least one worker server to the shapes of the first and second matrices based on the number of worker servers. More specifically, based on the shape of the result matrix, which is determined by the shapes of the first and second matrices, the server layout determiner 210 may match worker servers one-to-one with the plurality of distributed matrix operations that make up the result matrix. As a result, the server layout determiner 210 may determine a server layout that includes the number of worker servers required for the matrix multiplication operation, information on the distributed matrix operation performed by each worker server, and location information of each worker server on the cloud network.
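The one-to-one matching between worker servers and the distributed matrix operations of the result matrix can be sketched as follows. This is a minimal illustration, not the implementation of the server layout determiner 210: the function name `build_server_layout`, the dictionary fields, and the inner block dimension of 2 are assumptions made for the example, and network location information is omitted.

```python
from itertools import product


def build_server_layout(left_grid, right_grid):
    """Map each block of the result matrix to one worker server.

    left_grid and right_grid are (row_blocks, col_blocks) tuples describing
    how the left and right matrices are cut into blocks (hypothetical format).
    """
    left_rows, left_cols = left_grid
    right_rows, right_cols = right_grid
    if left_cols != right_rows:
        raise ValueError("inner block dimensions must match")

    layout = {}
    result_blocks = product(range(left_rows), range(right_cols))
    for worker_id, (i, j) in enumerate(result_blocks, start=1):
        layout[f"worker-{worker_id}"] = {
            "result_block": (i, j),
            # Blocks this worker must gather before multiplying.
            "left_blocks": [(i, k) for k in range(left_cols)],
            "right_blocks": [(k, j) for k in range(left_cols)],
        }
    return layout


layout_4 = build_server_layout((2, 2), (2, 2))   # 2 x 2 result grid
layout_15 = build_server_layout((3, 2), (2, 5))  # 3 x 5 result grid
print(len(layout_4), len(layout_15))
```

The number of entries in each layout equals the number of rows times the number of columns of the result grid, which matches the 4-server and 15-server cases described above.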

The partition divider 230 may divide the first and second matrices into first and second block-based partitions based on the server layout. Here, a block-based partition may correspond to a submatrix of the left or right matrix used in the matrix multiplication operation. For example, when the left or right matrix has a size of 6 rows * 6 columns and its rows and columns are each divided into two blocks, the matrix may be divided into a total of four block-based partitions. Each of the four block-based partitions may correspond to a matrix having a size of 3 rows * 3 columns. In other words, the partition divider 230 may divide the left or right matrix into two blocks along the rows and two blocks along the columns.
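The 6 * 6 example above can be reproduced with a short sketch. The helper name `split_into_blocks` and the nested-list matrix representation are assumptions chosen to keep the example dependency-free; an actual system would more likely operate on distributed matrix types.

```python
def split_into_blocks(matrix, row_blocks, col_blocks):
    """Split a matrix (list of rows) into a grid of equally sized blocks."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % row_blocks or n_cols % col_blocks:
        raise ValueError("matrix size must be divisible by the block counts")
    block_h = n_rows // row_blocks
    block_w = n_cols // col_blocks

    blocks = {}
    for bi in range(row_blocks):
        for bj in range(col_blocks):
            rows = matrix[bi * block_h:(bi + 1) * block_h]
            blocks[(bi, bj)] = [r[bj * block_w:(bj + 1) * block_w] for r in rows]
    return blocks


# A 6 x 6 matrix whose entries are 0..35 in row-major order.
matrix = [[r * 6 + c for c in range(6)] for r in range(6)]
blocks = split_into_blocks(matrix, row_blocks=2, col_blocks=2)
print(len(blocks), len(blocks[(0, 0)]), len(blocks[(0, 0)][0]))
```

Each key of `blocks` is the (row, column) index of a block-based partition, and each value is a 3 * 3 submatrix.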

In one embodiment, the partition divider 230 may divide each of the first and second matrices into first and second block-based partitions based on the number of worker servers. When the server layout includes 12 worker servers, the partition divider 230 may divide the first and second matrices into first and second block-based partitions so that the matrix multiplication operation between the first and second matrices can be distributed across all 12 worker servers. The shapes of the first and second block-based partitions produced by the partition divider 230 may be determined based on the shapes of the first and second matrices and the number of worker servers.

In one embodiment, the partition divider 230 may divide the first and second matrices for the multiplication operation into first and second block-based partitions based on a number of servers that is a perfect square. For example, when the number of servers available in the cloud computing environment is 4, 9, or 16, that is, the square of 2, 3, or 4, the partition divider 230 may divide the rows and the columns of each of the first and second matrices into 2, 3, or 4 blocks, respectively. When the first matrix has a size of 16000 * 32000, the second matrix has a size of 32000 * 16000, and four servers are available, each server may perform a distributed matrix operation between a first block-based partition corresponding to an 8000 * 16000 matrix and a second block-based partition corresponding to a 16000 * 8000 matrix. For efficient matrix distribution, the partition divider 230 may divide the first and second matrices into first and second block-based partitions on the assumption that the number of servers is a perfect square.
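The partition shapes for a perfect-square server count can be computed as in the following sketch. The function name `partition_shapes` is hypothetical, and the sketch assumes that each matrix dimension is evenly divisible by the square root of the server count.

```python
import math


def partition_shapes(left_shape, right_shape, num_servers):
    """Return the block shapes of the left and right matrices.

    Rows and columns of both matrices are each cut into sqrt(num_servers)
    blocks, so num_servers must be a perfect square.
    """
    side = math.isqrt(num_servers)
    if side * side != num_servers:
        raise ValueError("num_servers must be a perfect square")
    (lr, lc), (rr, rc) = left_shape, right_shape
    if lc != rr:
        raise ValueError("inner dimensions must match")
    return (lr // side, lc // side), (rr // side, rc // side)


first, second = partition_shapes((16000, 32000), (32000, 16000), 4)
print(first, second)
```

With four servers, the sketch yields an 8000 * 16000 first partition and a 16000 * 8000 second partition, as in the example above.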

The unit operation time calculator 250 may calculate a unit operation time by applying a distributed matrix characteristic to the first and second block-based partitions. The unit operation time calculator 250 may calculate the unit operation time in consideration of the operation time at each worker server responsible for a distributed matrix operation on the first and second block-based partitions, and the unit operation time may be calculated using a performance prediction model.

Here, the distributed matrix characteristic may correspond to a matrix characteristic that can affect the operation time required for matrix multiplication between arbitrary matrices of various sizes and shapes. For example, the distributed matrix characteristic may include the row size (LR) and column size (LC) of the left matrix, the row size (LC) and column size (RC) of the right matrix, the total size of the left matrix (LR*LC), the total size of the right matrix (LC*RC), the sum of the sizes of the left and right matrices (LR*LC+LC*RC), the size of the result matrix (LR*RC), and the number of matrix operations (LR*LC*RC).
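The characteristics listed above all follow from the three values LR, LC, and RC, which the following sketch makes explicit. The function name and the dictionary keys are illustrative assumptions, not names used by the invention.

```python
def distributed_matrix_features(lr, lc, rc):
    """Derive the listed characteristics from LR, LC and RC.

    lr, lc: row and column size of the left matrix.
    rc: column size of the right matrix (its row size equals lc).
    """
    return {
        "LR": lr,
        "LC": lc,
        "RC": rc,
        "left_size": lr * lc,
        "right_size": lc * rc,
        "input_size_sum": lr * lc + lc * rc,
        "result_size": lr * rc,
        "operation_count": lr * lc * rc,
    }


features = distributed_matrix_features(8000, 16000, 8000)
for name, value in features.items():
    print(f"{name}: {value}")
```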

In one embodiment, the unit operation time calculator 250 may associate the first and second block-based partitions with each other, assign them to at least one worker server based on the server layout, and calculate the unit operation time using the operation time at the at least one worker server. Based on the matching information included in the server layout, the unit operation time calculator 250 may determine the first and second block-based partitions associated with the distributed matrix operation handled by a specific worker server, and may calculate the unit operation time through the performance prediction model based on the shapes and sizes of those first and second block-based partitions.

In one embodiment, the unit operation time calculator 250 may calculate the unit operation time through the performance prediction model based on the row size and column size of the first block-based partition and the column size of the second block-based partition assigned to at least one worker server. The unit operation time calculator 250 may feed input data including the row size and column size of the first block-based partition and the column size of the second block-based partition into the performance prediction model to obtain the operation time at the at least one worker server, and may calculate the unit operation time based on that operation time.

In one embodiment, the unit operation time calculator 250 may determine the distributed matrix characteristic based on the ratio between the number of worker servers assumed by the performance prediction model and the number of worker servers included in the server layout. For example, when the performance prediction model assumes 4 worker servers and the server layout includes 9 worker servers, the unit operation time calculator 250 may multiply the row and column sizes of each of the first and second matrices by 2/3 to obtain corrected sizes of the first and second matrices. The unit operation time calculator 250 may then use the corrected row size and column size of the first matrix and the corrected column size of the second matrix as the distributed matrix characteristic.
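The 2/3 correction can be illustrated with the sketch below. It assumes that the factor is the ratio of the square roots of the two worker counts (2 for a 4-worker model, 3 for a 9-worker layout); the text gives only the value 2/3, so this reading, like the function name `corrected_dims`, is an assumption.

```python
import math
from fractions import Fraction


def corrected_dims(lr, lc, rc, model_workers, layout_workers):
    """Scale (LR, LC, RC) from the actual layout to the model's worker count.

    Assumes both worker counts are perfect squares and that the scale factor
    is sqrt(model_workers) / sqrt(layout_workers).
    """
    model_side = math.isqrt(model_workers)
    layout_side = math.isqrt(layout_workers)
    if model_side ** 2 != model_workers or layout_side ** 2 != layout_workers:
        raise ValueError("both worker counts must be perfect squares")
    scale = Fraction(model_side, layout_side)
    dims = tuple(int(d * scale) for d in (lr, lc, rc))
    return dims, scale


dims, scale = corrected_dims(48000, 24000, 36000,
                             model_workers=4, layout_workers=9)
print(dims, scale)
```

For a 48000 * 24000 left matrix and a 24000 * 36000 right matrix, the corrected values are 32000, 16000, and 24000.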

The total operation time predictor 270 may predict the total operation time required for the multiplication operation of the first and second matrices based on the unit operation time. The total operation time predictor 270 may obtain the unit operation time at each worker server and combine the obtained unit operation times to calculate the total operation time. When the unit operation time is calculated through the performance prediction model, the total operation time predictor 270 may calculate the total operation time from the calculated unit operation time using a linear equation.

In one embodiment, the total operation time predictor 270 may predict the total operation time by applying, to the unit operation time, the ratio between the number of worker servers included in the server layout and the number of worker servers assumed by the performance prediction model. Because the performance prediction model is generated on the assumption that only a specific number of worker servers is available, the total operation time predictor 270 may predict the total operation time by adjusting the unit operation time calculated through the performance prediction model by this ratio. This is described in more detail with reference to FIG. 6.

The controller 290 may control the overall operation of the AI operation service apparatus 130 and may manage the control flow or data flow among the server layout determiner 210, the partition divider 230, the unit operation time calculator 250, and the total operation time predictor 270.

FIG. 3 is a flowchart illustrating a process of providing an AI operation service performed by the AI operation service apparatus of FIG. 1.

Referring to FIG. 3, the AI operation service apparatus 130 may determine, through the server layout determiner 210, a server layout for the matrix multiplication operation based on the shape of each of the first and second matrices (step S310). The AI operation service apparatus 130 may divide, through the partition divider 230, the first and second matrices into first and second block-based partitions based on the server layout determined by the server layout determiner 210 (step S330).

The AI operation service apparatus 130 may calculate, through the unit operation time calculator 250, the unit operation time by applying the distributed matrix characteristic to the first and second block-based partitions (step S350). The AI operation service apparatus 130 may predict, through the total operation time predictor 270, the total operation time required for the multiplication operation of the first and second matrices based on the unit operation time calculated by the unit operation time calculator 250 (step S370).

FIG. 4 is an exemplary diagram illustrating how block-based distributed matrix multiplication is performed in the AI operation service apparatus of FIG. 1.

Referring to FIG. 4, a matrix multiplication operation between arbitrary matrices A and B is divided into a plurality of partial multiplication operations, performed on several worker servers deployed on a cloud network, and then merged to produce matrix C as the final result. More specifically, the left matrix A 410 has a size of 3 rows * 2 columns, the right matrix B 430 has a size of 2 rows * 3 columns, and the matrix multiplication between the left matrix A 410 and the right matrix B 430 yields the result matrix C 450 having a size of 3 rows * 3 columns.

The AI operation service apparatus 130 may determine, through the server layout determiner 210, a total of nine worker servers (worker-1, worker-2, ..., worker-9) and may determine a server layout in which the distributed matrix operations are matched to the nine worker servers according to the 3-row * 3-column result matrix C 450. Based on the server layout, the AI operation service apparatus 130 may divide, through the partition divider 230, the left matrix A 410 into six first block-based partitions and the right matrix B 430 into six second block-based partitions.

Each of the nine worker servers determined by the server layout determiner 210 may collect the first and second block-based partitions it needs for its assigned distributed matrix operation through the cloud network. For example, worker server 1 (worker-1) may perform the distributed matrix operation that yields the value of element (0, 0) of the result matrix C 450. More specifically, worker server 1 (worker-1) may collect A(0, 0), A(0, 1), B(0, 0), and B(1, 0) to compute C(0, 0). This process may correspond to a cogroup task, and the cogroup task time may be affected most by network performance.

When the cogroup task is complete, each worker server may perform matrix multiplication operations between the first and second block-based partitions and an addition operation on the resulting elements. For example, to compute C(0, 0), worker server 1 (worker-1) may perform the matrix multiplication operations A(0, 0) * B(0, 0) and A(0, 1) * B(1, 0) and then the addition operation (A(0, 0) * B(0, 0)) + (A(0, 1) * B(1, 0)). The matrix multiplication operation at each worker server may be affected most by disk I/O performance and computing performance, and the element-wise addition operation may be affected most by computing performance.
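The cogroup, multiplication, and addition phases carried out by a single worker server can be sketched as follows. In block-based matrix multiplication, the worker responsible for result block (i, j) gathers A(i, k) and B(k, j) for every inner index k, multiplies each pair, and sums the products. The helper names and the small 2 * 2 blocks are assumptions for illustration; a real deployment would run on a distributed processing platform rather than on in-memory lists.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matadd(a, b):
    """Add two equally sized matrices element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def compute_result_block(i, j, left_blocks, right_blocks, inner):
    """Compute result block C(i, j) on one worker."""
    # Cogroup phase: gather the matching block pairs A(i, k), B(k, j).
    pairs = [(left_blocks[(i, k)], right_blocks[(k, j)]) for k in range(inner)]
    # Multiplication phase: one block product per pair.
    products = [matmul(a, b) for a, b in pairs]
    # Addition phase: sum the block products element by element.
    total = products[0]
    for p in products[1:]:
        total = matadd(total, p)
    return total


left_blocks = {(0, 0): [[1, 2], [3, 4]], (0, 1): [[5, 6], [7, 8]]}
right_blocks = {(0, 0): [[1, 0], [0, 1]], (1, 0): [[2, 0], [0, 2]]}
c00 = compute_result_block(0, 0, left_blocks, right_blocks, inner=2)
print(c00)
```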

FIG. 5 is an exemplary diagram illustrating a process of generating a performance prediction model in the AI operation service apparatus of FIG. 1.

Referring to FIG. 5, the process of generating the performance prediction model may broadly consist of a training data set generation step, a feature extraction step, and a modeling step.

In the training data set generation (Model data generation) step, the AI operation service apparatus 130 may generate training data on matrix multiplication between matrices of various shapes and sizes. In one embodiment, the AI operation service apparatus 130 may generate the training data set by sampling from a population of various matrix multiplications.

In the feature extraction step, the AI operation service apparatus 130 may extract features from the training data set generated in the training data set generation step. More specifically, the AI operation service apparatus 130 may extract the distributed matrix characteristic of each matrix multiplication in the training data set. The distributed matrix characteristic may correspond to a matrix characteristic that affects the operation time required for matrix multiplication. For example, the distributed matrix characteristic may include the row size (LR) and column size (LC) of the left matrix and the column size (RC) of the right matrix. The AI operation service apparatus 130 may compute, through the unit operation time calculator 250, the distributed matrix characteristic of the first and second block-based partitions assigned to each worker server and use it to calculate the unit operation time.

In the modeling step, the AI operation service apparatus 130 may generate the performance prediction model. Generating the performance prediction model may broadly consist of a model building step and a hyper-parameter search step. The AI operation service apparatus 130 may use a GB (Gradient Boost) Regressor in the model building step and may use Bayesian Optimization in the hyper-parameter search step to find the optimal parameters for the GB method. Detailed descriptions of how the GB Regressor and Bayesian Optimization operate are omitted.

FIG. 6 is an exemplary diagram illustrating a process of predicting the total operation time of a matrix multiplication based on different server layouts in the AI operation service apparatus of FIG. 1.

Referring to FIG. 6, the AI operation service apparatus 130 may compute a result matrix C through matrix multiplication between a left matrix A and a right matrix B. Assuming that the left matrix A, the right matrix B, and the result matrix C have sizes of (48000, 24000), (24000, 36000), and (48000, 36000), respectively, and that the performance prediction model was generated on the premise that the distributed matrix operation is performed on four worker servers, the AI operation service apparatus 130 may predict the total operation time of the matrix multiplication based on different server layouts, as shown in FIG. 6. Although FIG. 6 describes only cases in which the number of worker servers in the server layout is a perfect square, when the number of worker servers is not a perfect square the AI operation service apparatus 130 may apply the case of the nearest perfect square or use a similar approach.

When the server layout determined by the server layout determiner 210 includes 9 worker servers, the partition divider 230 may divide the left matrix into block-based partitions of size (16000, 8000) and the right matrix into block-based partitions of size (8000, 12000). In addition, the result computed at each worker server may correspond to a submatrix of size (16000, 12000).

If block-based partitions of the same sizes as those determined for the nine-worker server layout were processed by four worker servers, the left matrix would have a size of (32000, 16000), the right matrix a size of (16000, 24000), and the result matrix a size of (32000, 24000). Therefore, to calculate the total operation time of a matrix multiplication performed on nine worker servers using a performance prediction model generated for four worker servers, the matrix characteristic information (32000, 16000, 24000) must be input to the performance prediction model, which may be expressed as f(32000, 16000, 24000).

Here, the function f denotes the performance prediction model for matrix multiplication, and f(LR, LC, RC) may correspond to the output obtained by inputting the row size and column size of the left matrix and the column size of the right matrix into the performance prediction model. That is, f(LR, LC, RC) may correspond to the operation time predicted by the performance prediction model for a matrix multiplication between a left matrix of size LR * LC and a right matrix of size LC * RC. The final total operation time may then be calculated by multiplying the value of f(32000, 16000, 24000) obtained from the performance prediction model by the ratio between the number of worker servers included in the server layout and the number of worker servers assumed by the performance prediction model, that is, 3/2.

As a result, assuming that the performance prediction model was generated on the premise that the distributed matrix operation is performed on four worker servers, the total operation time when the distributed matrix operation is performed on nine worker servers may be calculated as

f(32000, 16000, 24000) * 3/2

and the total operation time when the distributed matrix operation is performed on 16 worker servers may be calculated as

f(24000, 12000, 18000) * 4/2
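The ratio adjustment described for FIG. 6 can be put together in a short sketch. The function `predict_total_time` and the stand-in model `toy_model` are hypothetical: the actual performance prediction model f is a trained regressor whose outputs are not given in the text, so `toy_model` simply returns a value proportional to LR * LC * RC. The sketch also assumes that the scale factor is the ratio of the square roots of the two worker counts and that all dimensions divide evenly.

```python
import math


def predict_total_time(model, left_shape, right_shape,
                       model_workers, layout_workers):
    """Estimate total time on layout_workers servers using a model that was
    built for model_workers servers (both assumed to be perfect squares)."""
    model_side = math.isqrt(model_workers)
    layout_side = math.isqrt(layout_workers)
    if model_side ** 2 != model_workers or layout_side ** 2 != layout_workers:
        raise ValueError("both worker counts must be perfect squares")
    lr, lc = left_shape
    right_rows, rc = right_shape
    if lc != right_rows:
        raise ValueError("inner dimensions must match")

    # Shrink the matrices so each model worker sees the actual block size.
    scaled = tuple(d * model_side // layout_side for d in (lr, lc, rc))
    unit_time = model(*scaled)
    # Scale the predicted time back up by the inverse ratio.
    return unit_time * layout_side / model_side


def toy_model(lr, lc, rc):
    # Hypothetical stand-in for f: proportional to the operation count.
    return lr * lc * rc * 1e-12


for workers in (9, 16):
    t = predict_total_time(toy_model, (48000, 24000), (24000, 36000),
                           model_workers=4, layout_workers=workers)
    print(workers, t)
```

For nine workers the model is queried with (32000, 16000, 24000) and the result is multiplied by 3/2; for 16 workers the same rule gives (24000, 12000, 18000) and a factor of 4/2.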

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: cloud-based AI operation service system
110: user terminal 130: AI operation service apparatus
150: database
210: server layout determiner 230: partition divider
250: unit operation time calculator 270: total operation time predictor
290: controller
410: left matrix A 430: right matrix B
450: result matrix C

Claims

A computer-executable cloud-based artificial intelligence operation service method comprising:
(a) determining a server layout for a matrix multiplication operation based on the shape of each of first and second matrices;
(b) dividing the first and second matrices into first and second block-based partitions based on the server layout;
(c) calculating a unit operation time by applying a distributed matrix characteristic to the first and second block-based partitions; and
(d) predicting a total operation time required for the multiplication operation of the first and second matrices based on the unit operation time.

The method of claim 1, wherein step (a) comprises determining, among a plurality of servers, at least one worker server that satisfies a specific condition of the matrix multiplication operation between the first and second matrices, and determining the server layout by matching the at least one worker server to the shapes of the first and second matrices based on the number of worker servers.

The method of claim 2, wherein step (b) comprises dividing each of the first and second matrices into the first and second block-based partitions based on the number of worker servers.

The method of claim 2, wherein step (c) comprises associating the first and second block-based partitions with each other, assigning them to the at least one worker server based on the server layout, and calculating the unit operation time using the operation time at the at least one worker server.

The method of claim 4, wherein step (c) comprises calculating the unit operation time through a performance prediction model based on the row size and column size of the first block-based partition and the column size of the second block-based partition assigned to the at least one worker server.

The method of claim 1, wherein step (c) comprises determining the distributed matrix characteristic based on the ratio between the number of worker servers included in the server layout and the number of worker servers of the performance prediction model.

The method of claim 5, wherein step (d) comprises predicting the total operation time by applying, to the unit operation time, the ratio between the number of worker servers included in the server layout and the number of worker servers of the performance prediction model.

A computer-executable cloud-based artificial intelligence operation service method comprising:
(a) dividing first and second matrices for a multiplication operation into first and second block-based partitions based on a number of servers that is a perfect square;
(b) calculating a unit operation time by applying a distributed matrix characteristic to the first and second block-based partitions; and
(c) predicting a total operation time required for the multiplication operation of the first and second matrices based on the unit operation time.