KR20230135864A

KR20230135864A - Distributed computing orchestration method and system for resource sharing through federation among multiple cluster computing platforms

Info

Publication number: KR20230135864A
Application number: KR1020220033378A
Authority: KR
Inventors: 김기현; 석우진; 문정훈; 이상권
Original assignee: 한국과학기술정보연구원
Priority date: 2022-03-17
Filing date: 2022-03-17
Publication date: 2023-09-26

Abstract

A computing resource allocation method that can be performed in a system including a plurality of computing clusters, and an orchestration system, are provided.
A computing resource allocation method according to some embodiments of the present disclosure includes: performing mutual authentication between a first cluster and a second cluster included in the plurality of computing clusters by using a first token value of the first cluster and a second token value of the second cluster; approving, for a terminal of a first user, resource use of a federation composed of the first cluster and the second cluster; allocating some of the nodes of the first cluster and the second cluster as a manager server and worker servers by using distributed computing specification data input by the first user; and performing a distributed computing operation by the allocated manager server and the allocated worker servers.

Description

Distributed computing orchestration method and system for resource sharing through federation between multiple cluster computing platforms

The present disclosure relates to a distributed computing orchestration method and system for sharing resources through federation among a plurality of cluster computing platforms. More specifically, it relates to an orchestration method and system in which a plurality of clusters authenticate one another using token values, a user goes through an approval process, and a distributed computing controller then assigns tasks to nodes of the plurality of clusters.

A conventional cloud computing structure consists of a number of large servers, and a computing environment is configured using the computing resources that exist within those servers. Various solutions now exist for using distributed computing inside cloud computing, and these solutions make it possible to use distributed computing within a cloud computing infrastructure.

Because a cloud computing deployment is configured like a single server, the need for technologies such as distributed computing was not strongly felt. However, as data sizes have grown, large-scale data processing based on cloud computing has run into its limits. A data processing technology is therefore needed that uses large-scale distributed computing by linking multiple distributed computing infrastructures.

Existing systems allow computing clusters to share resources through federation, but they offer no way to use a large distributed computing environment that integrates the resources of multiple clusters.

Cloud computing permits large-scale computation using the resources available internally, but it does not provide a system for linking and sharing resources with other platforms.

Korean Patent Publication No. 10-2021-0108487 (2021.09.02.)

A technical problem to be solved through some embodiments of the present disclosure is to provide an orchestration system in which computing platforms constituting different clusters share resources through federation.

Another technical problem to be solved through some embodiments of the present disclosure is to provide a large distributed cluster computing environment that uses the computing resources of several linked clusters.

Another technical problem to be solved through some embodiments of the present disclosure is to provide an environment that processes large volumes of data quickly by configuring the platforms as container-based systems and then using shared resources after an authentication process in which the two platforms exchange tokens.

The technical problems of the present disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art of the present disclosure from the description below.

To solve the above technical problems, a computing resource allocation method according to an embodiment of the present disclosure is performed in a system including a plurality of computing clusters and may include: generating a first token value of a first cluster included in the plurality of computing clusters and a second token value of a second cluster included in the plurality of computing clusters; performing mutual authentication between the first cluster and the second cluster using the first token value and the second token value; approving, for a terminal of a first user, resource use of a federation composed of the first cluster and the second cluster; allocating, by a distributed computing controller included in the system and using distributed computing specification data input by the first user, any one of the nodes of the first cluster and the nodes of the second cluster as a manager server, and some of the nodes of the first cluster and the nodes of the second cluster as worker servers; and performing a distributed computing operation by the allocated manager server and the allocated worker servers.

In some embodiments, the first token value may be generated using a first service account and a first cluster name of the first cluster, and the second token value may be generated using a second service account and a second cluster name of the second cluster.

In some embodiments, performing the mutual authentication may include: exchanging the first token value and the second token value between the first cluster and the second cluster; and performing, between the first cluster and the second cluster, authentication that verifies whether the exchanged token values are valid.

In some embodiments, generating the distributed computing specification data may include adding, to the distributed computing specification data, data on the number of manager servers, a manager server image, and the resources used by the manager server.

In some embodiments, generating the distributed computing specification data may include adding, to the distributed computing specification data, data on the number of worker servers, a worker server image, and the resources used by the worker servers.

In some embodiments, performing the distributed computing operation may include: distributing, by the manager server, data requiring distributed processing to the worker servers; processing, by the worker servers, the distributed data and transmitting the results to the manager server; and processing, by the manager server, the operation on the data by collectively computing the parameters of the data processed by the worker servers.

A distributed computing orchestration system according to another embodiment of the present disclosure includes a first cluster including a plurality of nodes, a second cluster including a plurality of nodes, and a distributed computing controller. Using distributed computing specification data, the distributed computing controller allocates any one of the nodes of the first cluster and the nodes of the second cluster as a manager server, and allocates some of the nodes of the first cluster and the nodes of the second cluster as worker servers. The manager server distributes data requiring distributed processing to the worker servers, the worker servers process the distributed data and transmit the results to the manager server, and the manager server processes the operation on the data by collectively computing the parameters of the data processed by the worker servers. The first cluster and the second cluster may perform authentication by exchanging a first token value and a second token value with each other.

In some embodiments, the distributed computing controller may mediate the exchange of the first token value and the second token value between the first cluster and the second cluster.

In some embodiments, the distributed computing specification data may have added to it data on the number of manager servers, a manager server image, and the resources used by the manager server.

In some embodiments, the distributed computing specification data may have added to it data on the number of worker servers, a worker server image, and the resources used by the worker servers.

A distributed computing control device according to yet another embodiment of the present disclosure includes: a mediation unit that mediates the exchange of a first token value and a second token value between a first cluster and a second cluster; a resource allocation unit that, using distributed computing specification data, allocates any one of the nodes of the first cluster and the nodes of the second cluster as a manager server and allocates some of the nodes of the first cluster and the nodes of the second cluster as worker servers; and a user interface providing unit that provides a user interface for receiving user authentication information and the resources to be used for a computation task. The manager server distributes data requiring distributed processing to the worker servers and processes the operation on the data by collectively computing the parameters of the data processed by the worker servers, and the worker servers may process the distributed data and transmit the results to the manager server.

In some embodiments, the mediation unit may allocate, as the worker servers, at least part of a worker server pool composed of the GPU-equipped nodes of the first cluster and the GPU-equipped nodes of the second cluster.

FIG. 1 is a configuration diagram of a distributed computing orchestration system according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of a distributed computing controller device according to an embodiment of the present disclosure.
FIG. 3 is a flowchart of a computing resource allocation method according to an embodiment of the present disclosure.
FIG. 4 is a detailed flowchart explaining, in more detail, the step of performing mutual authentication between the clusters in the method described with reference to FIG. 3.
FIG. 5 is a detailed flowchart explaining the step of allocating a manager server and worker servers to nodes in the method described with reference to FIG. 3.
FIG. 6 is a detailed flowchart explaining the step of performing a distributed computing operation in the method described with reference to FIG. 3.
FIG. 7 is a hardware configuration diagram of an example computing device that can implement devices and/or systems according to various embodiments of the present disclosure.

Hereinafter, preferred embodiments of the present disclosure are described in detail with reference to the attached drawings. The advantages and features of the present invention, and the methods for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present invention is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to make the technical idea of the present invention complete and to fully inform those of ordinary skill in the art to which the present invention pertains of the scope of the invention, and the technical idea of the present invention is defined only by the scope of the claims.

In describing the present disclosure, where a detailed description of a related known configuration or function is judged likely to obscure the gist of the present invention, that detailed description is omitted.

In this specification, a container may mean a modularized and isolated computing space or computing environment that allows software to run reliably even when it is moved from its current computing environment to another, in other words, a space that isolates the environment in which an application runs.

In this specification, a service account is an account for a computing process running in a container environment, and is a concept distinct from a user account for a person.

In addition, in this specification, a federation may mean an association of organizations that use technologies and standards that provide users with multiple services through authentication of a single set of account information and that relieve service providers of the burden of managing user account information. In one embodiment, a federation may mean an association that provides the technical means for user accounts to be extended, on the basis of trust, across systems, networks, and domains so that they can be used safely.

Hereinafter, various embodiments of the present disclosure are described in detail with reference to the attached drawings.

FIG. 1 is a configuration diagram of a distributed computing orchestration system according to an embodiment of the present disclosure.

Referring to FIG. 1, a distributed computing orchestration system including a plurality of computing clusters according to an embodiment of the present disclosure may include a first cluster 2000, a second cluster 3000, and a distributed computing controller 4000. Here, the first cluster and the second cluster may be different clusters belonging to different countries or different organizations, and the number of different clusters may be two or more.

The first cluster includes a plurality of nodes 2100, 2200, 2300, and 2400. The first node 2100 may include a manager server 2110 and/or a worker server 2120, and the worker server 2120 of the first node may include a GPU (Graphics Processing Unit) 2121. Here, the manager server may mean a parameter server that, in distributed learning, sums the learning results of each worker and manages the parameters of the learning model.

The other nodes 2200, 2300, and 2400 may also include a plurality of worker servers 2210, 2220, 2310, 2320, 2410, and 2420. Here, the GPUs 2121, 2211, 2221, 2311, 2321, 2411, and 2421 of the worker servers 2120, 2210, 2220, 2310, 2320, 2410, and 2420 may differ in performance, and each may instead be implemented by another data processing device such as a CPU (Central Processing Unit).

When one or more nodes of the first cluster host a manager server, the second cluster may have no manager server. The number of nodes, manager servers, and worker servers of the first cluster is not limited to the numbers shown in FIG. 1, which is merely one embodiment of the present disclosure. The above description of the system of FIG. 1 applies equally to the second cluster.

The distributed computing controller 4000 is a component that uses the distributed computing specification data input by the first user to allocate some of the nodes of the first cluster and the second cluster as manager servers and some of the nodes of the first cluster and the second cluster as worker servers. The distributed computing controller 4000 may be located in a space separate from the first cluster and the second cluster. Here, the distributed computing specification data is in a data format used when exchanging data between different systems; for example, it may be a YAML file that the first user, once approved to use the resources, creates using the distributed computing controller 4000.

The federation process 5000 may include a federation system 5100 of the first cluster 2000 and a federation system 5200 of the second cluster. This process is described in more detail with reference to FIG. 3.

FIG. 2 is a block diagram of the distributed computing controller 4000 device according to an embodiment of the present disclosure. Referring to FIG. 2, the distributed computing controller 4000 device may be composed of a mediation unit 4100, a resource allocation unit 4200, and a user interface providing unit 4300.

The mediation unit 4100 mediates the exchange of the first token value and the second token value between the first cluster 2000 and the second cluster 3000. Here, the mediation unit 4100 may allocate, as the worker servers, at least part of a worker server pool composed of the GPU-equipped nodes of the first cluster 2000 and the GPU-equipped nodes of the second cluster 3000.

Using the distributed computing specification data, the resource allocation unit 4200 may allocate any one of the nodes of the first cluster 2000 and the nodes of the second cluster 3000 as the manager server, and may allocate some of the nodes of the first cluster 2000 and the nodes of the second cluster 3000 as worker servers.

The user interface providing unit 4300 may provide, on a user terminal (not shown), an authentication screen for entering the user information needed to obtain approval for resource use, such as a user ID and password. When the user passes authentication on that screen, it may further provide an editing window for entering the resources to be used when assigning a computation task through the distributed computing orchestration system.

FIG. 3 is a flowchart of a computing resource allocation method according to an embodiment of the present disclosure. Hereinafter, embodiments of the present invention are described for the case in which a first token value of the first cluster and a second token value of the second cluster are generated. First, the first token value may be generated using the first cluster name and the first service account of the first cluster, and the second token value may be generated using the second cluster name and the second service account of the second cluster (S100). Here, the first token value may be generated by the first cluster 2000, and the second token value may be generated by the second cluster 3000.
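Step S100 can be illustrated with a minimal sketch. The disclosure states only that each token value is generated from a cluster name and a service account; the HMAC-SHA256 derivation, the per-cluster secret, and every name and value below are assumptions made for illustration, not the method of the disclosure.

```python
import hashlib
import hmac


def generate_token(cluster_name: str, service_account: str, secret: bytes) -> str:
    """Derive a token value from a cluster name and a service account.

    Assumption: HMAC-SHA256 keyed with a per-cluster secret. The source
    does not specify how the token value is derived.
    """
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = f"{cluster_name}\x00{service_account}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# Hypothetical cluster names, service accounts, and secrets.
first_token = generate_token("cluster-1", "federation-sa-1", b"secret-of-cluster-1")
second_token = generate_token("cluster-2", "federation-sa-2", b"secret-of-cluster-2")
print(len(first_token), first_token != second_token)
```

Each cluster generates its own token here, which matches the statement that the first and second clusters may be the respective generators.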

After the token values of the first cluster and the second cluster have been generated, mutual authentication may be performed between the first cluster and the second cluster using the generated first token value and second token value (S200). Clusters that have authenticated each other form a federation. The first user may request from the distributed computing controller 4000 as many resources as the user needs. For example, when the first user requests resource use, an administrator may approve the resource use by adding the first user's permission to the federation composed of the plurality of clusters (S300).
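The request-and-approve flow of step S300 might be modeled as follows. This is a sketch under assumptions: the `Federation` class, its method names, and the idea of tracking a single integer resource amount per user ID are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Federation:
    """Hypothetical record of a federation and its per-user approvals."""
    clusters: tuple
    pending: dict = field(default_factory=dict)    # user ID -> requested amount
    approved: dict = field(default_factory=dict)   # user ID -> approved amount

    def request(self, user_id: str, amount: int) -> None:
        """The user asks the controller for an amount of resources."""
        if amount < 1:
            raise ValueError("requested amount must be positive")
        self.pending[user_id] = amount

    def approve(self, user_id: str) -> None:
        """An administrator adds the user's permission (S300).

        Raises KeyError if the user has no pending request.
        """
        self.approved[user_id] = self.pending.pop(user_id)

    def may_use(self, user_id: str, amount: int) -> bool:
        return amount <= self.approved.get(user_id, 0)


federation = Federation(clusters=("cluster-1", "cluster-2"))
federation.request("user-1", 4)
before = federation.may_use("user-1", 4)   # not yet approved
federation.approve("user-1")
after = federation.may_use("user-1", 4)    # approved for up to 4
print(before, after)
```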

In one embodiment, the first user may request from the distributed computing controller 4000 the number of resources the user needs. Also, in one embodiment, a first user who wishes to use the federation system may, once the first user's user ID has been approved, use the resources of the different clusters of the federation system.

When resource use has been approved, the distributed computing controller 4000 may allocate any one of the nodes as the manager server and some of the nodes as worker servers (S400). A distributed computing operation may then be performed using the plurality of clusters that form the federation and include the defined manager server and worker servers (S500).
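The allocation in step S400 can be sketched as below. The disclosure does not say how the controller chooses nodes, so the policy here (interleave the nodes of both clusters, take the first as manager, take the next ones as workers, one role per node) is an assumption, as are the function name and node names.

```python
from itertools import chain, zip_longest


def allocate(first_nodes, second_nodes, worker_count):
    """Pick one manager node and `worker_count` worker nodes from two clusters.

    Assumed policy: alternate between the clusters so that workers are
    spread across the federation, and give each node a single role.
    """
    interleaved = [
        node
        for node in chain.from_iterable(zip_longest(first_nodes, second_nodes))
        if node is not None
    ]
    if worker_count < 1 or worker_count + 1 > len(interleaved):
        raise ValueError("not enough nodes for one manager and the requested workers")
    manager, candidates = interleaved[0], interleaved[1:]
    return {"manager": manager, "workers": candidates[:worker_count]}


# Hypothetical node names for the first and second clusters.
plan = allocate(["c1-n1", "c1-n2"], ["c2-n1", "c2-n2", "c2-n3"], worker_count=3)
print(plan)
```

With this policy the manager lands in the first cluster while the workers come from both clusters, which is consistent with the note that the second cluster may have no manager server.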

By configuring the computing platforms that make up different clusters as container-based systems and then enabling resource sharing through federation, a large distributed cluster computing environment can be provided that uses the computing resources of the different linked clusters.

Referring to FIGS. 3 and 4, in step S200 the first cluster and the second cluster may exchange the first token value and the second token value with each other (S210). The first cluster and the second cluster may then perform authentication that verifies whether the exchanged first token value and second token value are valid (S220). Here, the distributed computing controller 4000 may mediate the exchange of the first token value and the second token value between the first cluster and the second cluster.
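Steps S210 and S220 can be sketched as follows. The disclosure does not describe how validity is checked; this sketch assumes that each cluster validates a token by comparing it with the one it issued, and that a mediating function (standing in for the controller) carries out the exchange. The `Cluster` class and all names are hypothetical.

```python
import hmac
import secrets


class Cluster:
    """Hypothetical cluster that issues one token and validates it later."""

    def __init__(self, name: str):
        self.name = name
        self._issued = secrets.token_hex(16)   # stand-in for the cluster's token value
        self.peer_tokens = {}                  # peer cluster name -> received token

    def issue_token(self) -> str:
        return self._issued

    def receive_token(self, peer_name: str, token: str) -> None:
        self.peer_tokens[peer_name] = token

    def validate(self, presented: str) -> bool:
        # Constant-time comparison against the token this cluster issued.
        return hmac.compare_digest(presented, self._issued)


def mutual_authenticate(a: Cluster, b: Cluster) -> bool:
    """Mediate the exchange (S210), then have each issuer validate (S220)."""
    a.receive_token(b.name, b.issue_token())
    b.receive_token(a.name, a.issue_token())
    return b.validate(a.peer_tokens[b.name]) and a.validate(b.peer_tokens[a.name])


first, second = Cluster("cluster-1"), Cluster("cluster-2")
federated = mutual_authenticate(first, second)
print(federated)
```

Only when both checks succeed would the two clusters be treated as forming a federation.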

FIG. 5 is a detailed flowchart explaining the step of allocating the manager server and the worker servers to nodes. When the first user's use of the resources of the federation composed of the first cluster and the second cluster has been approved, the approved first user may generate distributed computing specification data using the distributed computing controller 4000. Here, the distributed computing specification data may be a YAML file. When the distributed computing specification data file is created, the number of manager servers, the manager server image, and the resources used by the manager server may be defined, as may the number of worker servers, the worker server image, and the resources used by the worker servers.
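The fields named above can be pictured with a small sketch that builds a specification and prints it as YAML text. The key names (`manager`, `worker`, `count`, `image`, `resources`), the image names, and the resource values are all assumptions; the disclosure lists only which kinds of data the file carries, not its schema. The emitter below handles only nested mappings of scalars and is not a general YAML library.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RoleSpec:
    count: int        # number of servers for this role
    image: str        # container image (placeholder values below)
    resources: dict   # resources used by each server


@dataclass
class DistributedComputingSpec:
    manager: RoleSpec
    worker: RoleSpec

    def validate(self) -> None:
        for role, spec in (("manager", self.manager), ("worker", self.worker)):
            if spec.count < 1:
                raise ValueError(f"{role}.count must be at least 1")
            if not spec.image:
                raise ValueError(f"{role}.image must not be empty")


def to_yaml_lines(mapping: dict, indent: int = 0) -> list:
    """Emit block-style YAML for nested dicts of strings and integers."""
    lines = []
    for key, value in mapping.items():
        pad = " " * indent
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_lines(value, indent + 2))
        else:
            # JSON scalars are valid YAML scalars, so strings come out quoted.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


spec = DistributedComputingSpec(
    manager=RoleSpec(1, "registry.example/manager:latest", {"cpu": 4, "memory_gib": 16}),
    worker=RoleSpec(3, "registry.example/worker:latest", {"cpu": 8, "memory_gib": 32, "gpu": 1}),
)
spec.validate()
yaml_text = "\n".join(to_yaml_lines(asdict(spec)))
print(yaml_text)
```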

FIG. 6 is a detailed flowchart explaining the step of performing a distributed computing operation. The number of manager servers and worker servers, their images, and the resources they use are determined as defined by the distributed computing specification data. The manager server may distribute data requiring distributed processing to the worker servers (S510), and the worker servers may process the distributed data and transmit the results to the manager server (S520). The manager server may then process the operation on the data by collectively computing the parameters of the data processed by the worker servers (S530). Here, the operation on the data may be an AI computation. The result of the operation may be provided through a web UI.
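Steps S510 to S530 follow the familiar parameter-server pattern, which can be sketched on a toy problem. Everything concrete here is an assumption chosen for illustration: the one-parameter model y = w * x, the squared-error gradient, the learning rate, and the function names. The workers run sequentially in this sketch, whereas in the described system they would run on separate nodes.

```python
def split_data(data, n_workers):
    """S510: the manager distributes the data as one shard per worker."""
    if n_workers < 1 or n_workers > len(data):
        raise ValueError("need between 1 and len(data) workers")
    return [data[i::n_workers] for i in range(n_workers)]


def worker_gradient(w, shard):
    """S520: a worker computes the mean squared-error gradient on its shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def manager_step(w, data, n_workers, lr=0.01):
    """S530: the manager combines the worker results and updates the parameter."""
    shards = split_data(data, n_workers)
    grads = [worker_gradient(w, shard) for shard in shards]
    total = sum(len(shard) for shard in shards)
    # Weight each worker's gradient by its shard size.
    grad = sum(g * len(shard) for g, shard in zip(grads, shards)) / total
    return w - lr * grad


# Toy data generated from y = 3x, so the parameter should approach 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(50):
    w = manager_step(w, data, n_workers=4)
print(round(w, 6))
```

Weighting by shard size makes the combined gradient equal to the gradient over the full data set, so the number of workers does not change the result in this toy case.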

FIG. 7 is a hardware configuration diagram of a computing device according to some embodiments of the present disclosure. The computing device 1500 shown in FIG. 7 may refer to, for example, the first node 2100 of the first cluster 2000 described with reference to FIG. 1. The computing device 1500 may include one or more processors 1510, a system bus 1550, a communication interface 1200, a memory 1530 into which a computer program 1591 executed by the processor 1510 is loaded, and a storage 1590 that stores the computer program 1591.

The processor 1510 controls the overall operation of each component of the computing device 1500. The processor 1510 may perform computation for at least one application or program for executing the methods/operations according to various embodiments of the present disclosure. The memory 1530 stores various data, commands, and/or information. The memory 1530 may load one or more computer programs 1591 from the storage 1590 in order to execute the methods/operations according to various embodiments of the present disclosure. The bus 1550 provides a communication function between the components of the computing device 1500. The network interface 1570 supports Internet communication of the computing device 1500. The storage 1590 may store one or more computer programs 1591 in a non-transitory manner. The computer program 1591 may include one or more instructions in which the methods/operations according to various embodiments of the present disclosure are implemented. When the computer program 1591 is loaded into the memory 1530, the processor 1510 may perform the methods/operations according to various embodiments of the present disclosure by executing the one or more instructions.

The computer program 1591 may include instructions for performing an operation of distributing data requiring distributed processing, an operation of processing the distributed data, an operation of processing the computation on the data by aggregating the parameters of the processed data, and the like.

Various embodiments of the present disclosure, and the effects of those embodiments, have been described so far with reference to FIGS. 1 to 7. The effects according to the technical idea of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

The technical idea of the present disclosure described so far may be implemented as computer-readable code on a computer-readable medium. The computer program recorded on the computer-readable recording medium may be transmitted to another computing device through a network such as the Internet and installed on that computing device, and may thereby be used on that computing device.

Although operations are shown in the drawings in a specific order, this should not be understood to mean that the operations must be performed in the specific order shown or in sequential order, or that all illustrated operations must be performed, to obtain the desired results. In certain situations, multitasking and parallel processing may be advantageous. Although embodiments of the present disclosure have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present disclosure pertains will understand that the present invention may be implemented in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the technical idea defined by the present disclosure.

Claims

A computing resource allocation method performed in a system including a plurality of computing clusters, the method comprising:
generating a first token value of a first cluster included in the plurality of computing clusters and generating a second token value of a second cluster included in the plurality of computing clusters;
performing mutual authentication between the first cluster and the second cluster using the first token value and the second token value;
approving, for a terminal of a first user, use of resources of a federation consisting of the first cluster and the second cluster;
allocating, by a distributed computing controller included in the system and using distributed computing specification data input by the first user, any one of the nodes of the first cluster and the nodes of the second cluster as a manager server, and allocating some of the nodes of the first cluster and the nodes of the second cluster as worker servers; and
performing a distributed computing operation by the allocated manager server and the allocated worker servers.

The computing resource allocation method of claim 1,
wherein the first token value is generated using a first service account and a first cluster name of the first cluster, and the second token value is generated using a second service account and a second cluster name of the second cluster.

The computing resource allocation method of claim 1,
wherein performing the mutual authentication comprises:
exchanging the first token value and the second token value between the first cluster and the second cluster; and
performing authentication that verifies, between the first cluster and the second cluster, whether the exchanged token values are valid.

The computing platform resource sharing method of claim 1,
wherein generating the distributed computing specification data
includes adding, to the distributed computing specification data, data on the number of manager servers, the manager server image, and the resources used by the manager server.

The computing platform resource sharing method of claim 1,
wherein generating the distributed computing specification data
includes adding, to the distributed computing specification data, data on the number of worker servers, the worker server image, and the resources used by the worker servers.

The computing platform resource sharing method of claim 1,
wherein performing the distributed computing operation comprises:
distributing, by the manager server, data requiring distributed processing to the worker servers;
processing, by the worker servers, the distributed data and transmitting the results to the manager server; and
processing, by the manager server, the computation on the data by aggregating the parameters of the data processed by the worker servers.

A distributed computing orchestration system for resource sharing through federation among a plurality of cluster computing platforms, the system comprising:
a first cluster including a plurality of nodes;
a second cluster including a plurality of nodes; and
a distributed computing controller,
wherein the distributed computing controller,
using distributed computing specification data, allocates any one of the nodes of the first cluster and the nodes of the second cluster as a manager server, and allocates some of the nodes of the first cluster and the nodes of the second cluster as worker servers,
wherein the manager server distributes data requiring distributed processing to the worker servers,
wherein the worker servers process the distributed data and transmit the results to the manager server,
wherein the manager server processes the computation on the data by aggregating the parameters of the data processed by the worker servers, and
wherein authentication is performed by exchanging a first token value and a second token value between the first cluster and the second cluster.

The distributed computing orchestration system of claim 7,
wherein the distributed computing controller
mediates the exchange of the first token value and the second token value between the first cluster and the second cluster.

The distributed computing orchestration system of claim 7,
wherein the distributed computing specification data
is data to which data on the number of manager servers, the manager server image, and the resources used by the manager server has been added.

The distributed computing orchestration system of claim 7,
wherein the distributed computing specification data
is data to which data on the number of worker servers, the worker server image, and the resources used by the worker servers has been added.

A distributed computing control device for resource sharing through federation among a plurality of cluster computing platforms, the device comprising:
a mediation unit that mediates the exchange of a first token value and a second token value between a first cluster and a second cluster;
a resource allocation unit that, using distributed computing specification data, allocates any one of the nodes of the first cluster and the nodes of the second cluster as a manager server, and allocates some of the nodes of the first cluster and the nodes of the second cluster as worker servers; and
a user interface providing unit that provides a user interface for inputting user authentication information and the resources to be used for a computation task,
wherein the manager server distributes data requiring distributed processing to the worker servers and processes the computation on the data by aggregating the parameters of the data processed by the worker servers, and
wherein the worker servers process the distributed data and transmit the results to the manager server.

The distributed computing control device of claim 11,
wherein the mediation unit
allocates, to the worker servers, at least part of a worker server pool consisting of GPU-equipped nodes among the nodes of the first cluster and GPU-equipped nodes among the nodes of the second cluster.
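The token generation and mutual authentication recited in the claims above could be sketched as follows. Everything beyond "a token derived from a service account and a cluster name, exchanged and then checked for validity" is an assumption made for illustration: the use of HMAC-SHA256, the shared `federation_secret`, the function names, and the premise that each cluster already knows its peer's service account and cluster name are not stated in the disclosure.

```python
import hashlib
import hmac
from typing import Tuple

# (service account, cluster name); hypothetical representation of a cluster identity
ClusterIdentity = Tuple[str, str]


def generate_token(service_account: str, cluster_name: str, secret: bytes) -> str:
    """Derive a token value from a service account and a cluster name."""
    message = f"{service_account}@{cluster_name}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_token(token: str, service_account: str, cluster_name: str, secret: bytes) -> bool:
    """Check a received token against the value expected for the peer."""
    expected = generate_token(service_account, cluster_name, secret)
    return hmac.compare_digest(token, expected)


def mutual_authenticate(first: ClusterIdentity, second: ClusterIdentity, secret: bytes) -> bool:
    """Exchange both token values and succeed only if each side accepts the other's."""
    first_token = generate_token(*first, secret)
    second_token = generate_token(*second, secret)
    # The second cluster checks the first cluster's token, and vice versa.
    second_accepts_first = verify_token(first_token, *first, secret)
    first_accepts_second = verify_token(second_token, *second, secret)
    return second_accepts_first and first_accepts_second


federation_secret = b"example-only-secret"  # made-up value for the sketch
cluster_a = ("svc-account-a", "cluster-a")  # made-up identities
cluster_b = ("svc-account-b", "cluster-b")
print(mutual_authenticate(cluster_a, cluster_b, federation_secret))  # True
```

`hmac.compare_digest` is used for the comparison so that the check takes the same time regardless of where a forged token first differs from the expected value.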