KR20220052508A

KR20220052508A - Service Configuration Method with Partitioning and Aggregation of GPU Resources in Microservice Environment

Info

Publication number: KR20220052508A
Application number: KR1020200136532A
Authority: KR
Inventors: 김동민; 손재기; 전기만; 황동현
Original assignee: Korea Electronics Technology Institute (한국전자기술연구원)
Priority date: 2020-10-21
Filing date: 2020-10-21
Publication date: 2022-04-28

Abstract

As a way to maximize the utilization of the GPU resources needed to run artificial intelligence models deployed in container units, a method is provided that offers N:M matching between GPU resources and the containers implementing the AI models in a microservice architecture environment, together with channels for fast data access. According to an embodiment of the present invention, a container service configuration method comprises: a step in which a container requests GPU allocation from a CPU; a step in which the CPU forwards the received GPU allocation request to a GPU virtualization module; a step in which the GPU virtualization module generates a virtualized GPU by using at least one of a plurality of GPUs; and a step of allocating the generated virtualized GPU to the container.

Description

Service Configuration Method with Partitioning and Aggregation of GPU Resources in Microservice Environment

The present invention relates to a microservice architecture in which applications run in containers, and more particularly, to a service configuration method that supports the aggregation and partitioning of limited GPU resources in such a microservice environment.

FIG. 1 is a diagram provided to explain a conventional method of allocating GPU resources to containers for artificial intelligence models. As shown, in a microservice architecture environment where applications are deployed in units of containers, each AI model is matched 1:1 with a GPU resource to run the container application.

Specifically, GPU #1 is matched to container #1, which requires 2 GPU cores; GPU #2 to container #2, which requires 1 GPU core; GPU #3 to container #3, which requires 2 GPU cores; and GPU #M to container #N, which requires 3 GPU cores.

Containers #1, #2, and #3 are matched with GPUs that meet their requirements, but GPU #M, matched to container #N, does not: container #N requires 3 GPU cores, whereas GPU #M has only 1 core.

Although the containers require 8 cores in total and the GPUs provide 10 cores in total, the QoS of the service provided through container #N degrades. This is inefficient, and an improvement is needed.
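The arithmetic behind this inefficiency can be checked with a short Python sketch. The container demands come from the text; the per-GPU core counts (3, 2, 4, and 1) are an assumption inferred from FIG. 1 and FIG. 2 so that they total 10, and the dictionary names are illustrative only.

```python
# Core demands per container, as stated in the description.
demands = {"container#1": 2, "container#2": 1, "container#3": 2, "container#N": 3}
# Assumed per-GPU core counts: only GPU #M = 1 and the 10-core total are stated.
gpu_cores = {"GPU#1": 3, "GPU#2": 2, "GPU#3": 4, "GPU#M": 1}
# Conventional 1:1 matching drawn in FIG. 1.
one_to_one = {"container#1": "GPU#1", "container#2": "GPU#2",
              "container#3": "GPU#3", "container#N": "GPU#M"}

# Containers whose single matched GPU is too small.
unsatisfied = [c for c, g in one_to_one.items() if gpu_cores[g] < demands[c]]
# Cores left idle on GPUs that were large enough for their container.
stranded = sum(gpu_cores[g] - demands[c] for c, g in one_to_one.items()
               if gpu_cores[g] >= demands[c])

print(sum(demands.values()), sum(gpu_cores.values()))  # 8 10
print(unsatisfied, stranded)                           # ['container#N'] 4
```

Under these assumptions, four cores sit idle on other GPUs while container #N is two cores short, which is exactly the gap that partitioning and aggregation are meant to close.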

The present invention has been devised to solve the above problems. An object of the present invention is to maximize the utilization of the GPU resources needed to run AI models deployed in container units by providing, in a microservice architecture environment, N:M matching between the containers implementing the AI models and GPU resources, together with a channel for fast data access.

To achieve the above object, a container service configuration method according to an embodiment of the present invention includes: a step in which a container requests GPU allocation from a CPU; a step in which the CPU forwards the received GPU allocation request to a GPU virtualization module; a step in which the GPU virtualization module generates a virtualized GPU using at least one of a plurality of GPUs; and a step of allocating the generated virtualized GPU to the container.

The generating step may generate a virtualized GPU to be allocated to a first container by partitioning off some of the cores of a first GPU.

The generating step may generate a virtualized GPU to be allocated to a second container by partitioning off another portion of the cores of the first GPU.

The generating step may generate a virtualized GPU to be allocated to a third container by aggregating yet another portion of the cores of the first GPU with some of the cores of a second GPU.
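A minimal Python sketch of how partitioning and aggregation can both fall out of one allocation routine is shown below. The greedy first-fit strategy and the function name `generate_vgpu` are assumptions for illustration; the description does not specify how the virtualization module selects cores. A result with one slice corresponds to partitioning a single GPU, and a result with several slices corresponds to aggregation across GPUs.

```python
def generate_vgpu(free, cores):
    """Build a virtualized GPU of `cores` cores by greedy first-fit (assumed policy).

    free: physical GPU id -> number of currently free cores (updated on success).
    Returns a list of (gpu_id, cores) slices backing the virtualized GPU.
    """
    if cores < 1:
        raise ValueError("cores must be positive")
    if sum(free.values()) < cores:
        raise RuntimeError("not enough free GPU cores")
    slices, needed = [], cores
    for gpu, available in free.items():
        take = min(available, needed)
        if take > 0:
            slices.append((gpu, take))
            needed -= take
        if needed == 0:
            break
    # Commit only after a full set of slices has been found.
    for gpu, take in slices:
        free[gpu] -= take
    return slices


pool = {"GPU1": 4, "GPU2": 4}    # assumed: two GPUs with four cores each
vgpu1 = generate_vgpu(pool, 2)   # [('GPU1', 2)]                partition
vgpu2 = generate_vgpu(pool, 1)   # [('GPU1', 1)]                partition
vgpu3 = generate_vgpu(pool, 2)   # [('GPU1', 1), ('GPU2', 1)]   aggregation
print(pool)                      # {'GPU1': 0, 'GPU2': 3}
```

The three requests reproduce the three cases above: two partitions of the first GPU, then an aggregation of its last free core with one core of the second GPU. A best-fit policy that prefers a single GPU would avoid the aggregation here; which policy to use is a design choice the text leaves open.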

The container service configuration method according to the present invention may further include a step in which the GPU virtualization module changes the configuration of the virtualized GPU allocated to the container.

The container service configuration method according to the present invention may further include a step of establishing a data channel between the container and the virtualized GPU.

The container may be an application that runs an artificial intelligence model to provide a service to a user.

According to another aspect of the present invention, there is provided a cloud system including: a container that requests GPU allocation; a CPU that forwards the container's GPU allocation request to a GPU virtualization module; and the GPU virtualization module, which generates a virtualized GPU using at least one of a plurality of GPUs and allocates the generated virtualized GPU to the container.

As described above, according to embodiments of the present invention, enabling N:M matching between the containers implementing AI models and GPU resources in a microservice architecture environment maximizes the utilization of GPU resources for containerized AI models.

In addition, according to embodiments of the present invention, a direct data channel between the container and the GPU resource enables fast data access, and the vGPU service layer enables effective management of GPU resources.

FIG. 1 is a diagram provided to explain a conventional method of allocating GPU resources to containers for artificial intelligence models;
FIG. 2 is a diagram provided to explain a microservice configuration method according to an embodiment of the present invention;
FIG. 3 is a block diagram showing the structure of the vGPU layer 130; and
FIG. 4 is a diagram provided to explain a container service configuration method according to another embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

An embodiment of the present invention presents an N:M matching technique between containers and GPU resources for running artificial intelligence models in a microservice architecture environment where applications run in containers. Specifically, the embodiment configures an application service by partitioning and aggregating limited GPU resources and allocating them to containers.

In addition, the embodiment establishes a channel for fast data access between containers and GPU resources, and presents a service layer for the efficient allocation and effective management of GPU resources.

FIG. 2 is a diagram illustrating the configuration of a cloud system to which a microservice configuration method according to an embodiment of the present invention is applied.

The embodiment introduces a virtualization GPU (vGPU) layer 130 to allocate the GPUs 140-1, 140-2, 140-3, ..., 140-M to the containers 110-1, 110-2, 110-3, ..., 110-N and to manage them.

The containers 110-1, 110-2, 110-3, ..., 110-N provide AI application services to users. Because they must run AI models, the containers 110-1, 110-2, 110-3, ..., 110-N need GPU resources, and they request the CPU 120 to allocate the GPU resources they need (①).

The CPU 120 forwards the GPU resource allocation requests of the containers 110-1, 110-2, 110-3, ..., 110-N to the vGPU layer 130 (②).

The vGPU layer 130 is a GPU virtualization module that partitions or aggregates the GPUs 140-1, 140-2, 140-3, ..., 140-M. In this way, the vGPU layer 130 virtualizes GPU resources that meet each request and allocates them to the containers 110-1, 110-2, 110-3, ..., 110-N.

In FIG. 2, the vGPU layer 130 is shown as having:

1) partitioned off two cores of GPU #1 (140-1) and allocated them to container #1 (110-1), which requires two GPU cores;

2) partitioned off the remaining core of GPU #1 (140-1) and allocated it to container #2 (110-2), which requires one GPU core;

3) partitioned off two cores of GPU #3 (140-3) and allocated them to container #3 (110-3), which requires two GPU cores; and

4) aggregated two cores of GPU #3 (140-3) and one core of GPU #M (140-M) and allocated them to container #N (110-N), which requires three GPU cores.
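The FIG. 2 assignment can be written down as data and checked for the properties the description relies on. In this Python sketch, the capacities of GPU #1, GPU #2, and GPU #3 are assumptions inferred from the figure and the 10-core total; only GPU #M's single core is stated in the text.

```python
# FIG. 2 allocation: container -> {physical GPU: cores taken from it}
allocation = {
    "container#1": {"GPU#1": 2},
    "container#2": {"GPU#1": 1},
    "container#3": {"GPU#3": 2},
    "container#N": {"GPU#3": 2, "GPU#M": 1},
}
demands = {"container#1": 2, "container#2": 1, "container#3": 2, "container#N": 3}
# Assumed per-GPU core counts (not all stated explicitly in the text).
capacity = {"GPU#1": 3, "GPU#2": 2, "GPU#3": 4, "GPU#M": 1}

# Every container receives exactly the number of cores it requested.
satisfied = all(sum(s.values()) == demands[c] for c, s in allocation.items())

# No physical GPU hands out more cores than it has.
used = {g: 0 for g in capacity}
for slices in allocation.values():
    for g, n in slices.items():
        used[g] += n
within_capacity = all(used[g] <= capacity[g] for g in capacity)

# N:M structure: GPUs shared by several containers (partitioning) and
# containers backed by several GPUs (aggregation).
shared_gpus = sorted(g for g in capacity
                     if sum(g in s for s in allocation.values()) > 1)
aggregated = sorted(c for c, s in allocation.items() if len(s) > 1)

print(satisfied, within_capacity)  # True True
print(shared_gpus, aggregated)     # ['GPU#1', 'GPU#3'] ['container#N']
```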

In this way, the vGPU layer 130 enables N:M matching between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M.

The containers 110-1, 110-2, 110-3, ..., 110-N run their AI models using the GPUs 140-1, 140-2, 140-3, ..., 140-M allocated to them (③).

Meanwhile, as shown on the right side of FIG. 2, data channels are established between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M. Through these channels, the containers 110-1, 110-2, 110-3, ..., 110-N can quickly access the GPUs 140-1, 140-2, 140-3, ..., 140-M to store and retrieve data.

The containers 110-1, 110-2, 110-3, ..., 110-N run their AI models to obtain computation/inference results (④) and provide the results to the users who requested the service.

Hereinafter, the vGPU layer 130 shown in FIG. 2 will be described in detail with reference to FIG. 3. FIG. 3 is a block diagram illustrating the structure of the vGPU layer 130.

As shown in FIG. 3, the vGPU layer 130 includes a GPU resource management module 131, a GPU-container connection management module 132, a GPU shared memory 133, and a GPU data channel management module 134.

The GPU resource management module 131 allocates GPU resources that match the requests of the containers 110-1, 110-2, 110-3, ..., 110-N. To this end, the GPU resource management module 131 can generate virtualized GPUs by partitioning or aggregating the GPUs 140-1, 140-2, 140-3, ..., 140-M.

The GPU resource management module 131 also manages the GPU resources of the containers 110-1, 110-2, 110-3, ..., 110-N after they have been allocated. To this end, it monitors GPU resource usage by the containers 110-1, 110-2, 110-3, ..., 110-N.

Based on the monitoring results, the GPU resource management module 131 may allocate additional GPU cores to a container that needs more GPU resources, and may reduce the GPU cores of a container that leaves much of its GPU resources idle. Furthermore, the GPU resource management module 131 may replace some of the GPU cores allocated to a container with other GPU cores.
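The three adjustments named above (adding cores, removing idle cores, and replacing cores with cores on another GPU) can be sketched as operations on a per-container map from physical GPU to core count. This is a minimal Python illustration under assumed data structures; the function names are hypothetical, and the monitoring policy that decides when to call them is left out.

```python
def add_core(vgpu, free, gpu):
    """Move one free core of physical `gpu` into a container's vGPU."""
    if free.get(gpu, 0) < 1:
        raise ValueError(f"{gpu} has no free core")
    free[gpu] -= 1
    vgpu[gpu] = vgpu.get(gpu, 0) + 1


def remove_core(vgpu, free, gpu):
    """Return one core on `gpu` from a container's vGPU to the free pool."""
    if vgpu.get(gpu, 0) < 1:
        raise ValueError(f"vGPU holds no core on {gpu}")
    vgpu[gpu] -= 1
    if vgpu[gpu] == 0:
        del vgpu[gpu]  # drop empty slices so the map lists only backing GPUs
    free[gpu] = free.get(gpu, 0) + 1


def replace_core(vgpu, free, src, dst):
    """Swap one core on `src` for a free core on `dst`, keeping the vGPU size."""
    if vgpu.get(src, 0) < 1:
        raise ValueError(f"vGPU holds no core on {src}")
    add_core(vgpu, free, dst)  # checked before any change, so a full dst leaves state intact
    remove_core(vgpu, free, src)


# Assumed state: container #N's vGPU from FIG. 2 and two idle GPU #2 cores.
vgpu_n = {"GPU#3": 2, "GPU#M": 1}
pool = {"GPU#2": 2}
add_core(vgpu_n, pool, "GPU#2")               # container needs more: grow to 4 cores
replace_core(vgpu_n, pool, "GPU#M", "GPU#2")  # move the GPU #M core onto GPU #2
print(vgpu_n, pool)  # {'GPU#3': 2, 'GPU#2': 2} {'GPU#2': 0, 'GPU#M': 1}
```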

The GPU-container connection management module 132 manages session connections between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M.

The GPU data channel management module 134 manages the data channels between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M. In doing so, the GPU data channel management module 134 establishes the data channels that are needed, can change (increase or decrease) their bandwidth, and discards/deletes data channels that are no longer needed.
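A bookkeeping-only sketch of the channel lifecycle handled by module 134 follows. The class name, the abstract bandwidth units, and the shared capacity limit are assumptions; the description does not say how channels are transported or how bandwidth is bounded, so this models only establish, resize, and discard.

```python
class DataChannelManager:
    """Hypothetical bookkeeping for container-GPU data channels (module 134)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed total bandwidth shared by all channels
        self.channels = {}        # (container id, GPU id) -> bandwidth

    def _used(self, exclude=None):
        return sum(bw for key, bw in self.channels.items() if key != exclude)

    def establish(self, container, gpu, bandwidth):
        """Set up a new channel if enough bandwidth remains."""
        key = (container, gpu)
        if key in self.channels:
            raise KeyError(f"channel {key} already exists")
        if self._used() + bandwidth > self.capacity:
            raise ValueError("not enough bandwidth left")
        self.channels[key] = bandwidth

    def resize(self, container, gpu, bandwidth):
        """Increase or decrease the bandwidth of an existing channel."""
        key = (container, gpu)
        if key not in self.channels:
            raise KeyError(f"no channel {key}")
        if self._used(exclude=key) + bandwidth > self.capacity:
            raise ValueError("not enough bandwidth left")
        self.channels[key] = bandwidth

    def discard(self, container, gpu):
        """Delete a channel that is no longer needed (KeyError if absent)."""
        self.channels.pop((container, gpu))


mgr = DataChannelManager(capacity=100)
mgr.establish("container#1", "GPU#1", 40)
mgr.establish("container#N", "GPU#3", 40)
mgr.resize("container#1", "GPU#1", 60)   # increase bandwidth
mgr.discard("container#N", "GPU#3")      # channel no longer needed
print(mgr.channels)  # {('container#1', 'GPU#1'): 60}
```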

FIG. 4 is a diagram provided to explain a container service configuration method according to another embodiment of the present invention.

To configure a service, the containers 110-1, 110-2, 110-3, ..., 110-N first request the CPU 120 to allocate the GPU resources they need, and the CPU 120 forwards these GPU resource allocation requests to the vGPU layer 130 (S210).

The vGPU layer 130 virtualizes GPU resources to generate GPU resources that match the requests received in step S210 (S220), and allocates the generated GPU resources to the containers 110-1, 110-2, 110-3, ..., 110-N (S230).

In step S220, GPU cores may be partitioned or aggregated, which enables N:M matching between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M.

The vGPU layer 130 then establishes data channels between the containers 110-1, 110-2, 110-3, ..., 110-N and the GPUs 140-1, 140-2, 140-3, ..., 140-M (S240).

Through the established data channels, the containers 110-1, 110-2, 110-3, ..., 110-N access the GPUs 140-1, 140-2, 140-3, ..., 140-M allocated to them to run the AI models (S250), and provide the computation/inference results to the users who requested the service (S260).

Meanwhile, the vGPU layer 130 monitors GPU resource usage by the containers 110-1, 110-2, 110-3, ..., 110-N and determines whether the GPU resources need to be changed (S270).

If a change is determined to be necessary (S270-Y), the vGPU layer 130 changes the GPU resources allocated to the containers 110-1, 110-2, 110-3, ..., 110-N (S280).
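The control flow of FIG. 4 can be traced with a small, count-only Python model. The class `VgpuLayerModel`, the resize rule (resize a container that uses all of its cores or less than half of them, leaving one core of headroom over observed use), and the collapsing of S220 into a plain core count are assumptions made for illustration, not the method described.

```python
from dataclasses import dataclass, field


@dataclass
class VgpuLayerModel:
    """Hypothetical, count-only model of the vGPU layer 130 running the FIG. 4 steps."""
    allocation: dict = field(default_factory=dict)  # container id -> allocated cores
    channels: set = field(default_factory=set)      # containers with a data channel
    trace: list = field(default_factory=list)       # step labels in execution order

    def configure(self, requests):
        self.trace.append("S210")        # CPU forwards the containers' requests
        self.trace.append("S220")        # vGPUs generated (partition/aggregate elided)
        self.allocation.update(requests)
        self.trace.append("S230")        # vGPUs allocated to the containers
        self.channels.update(requests)   # adds the container ids
        self.trace.append("S240")        # container-GPU data channels established

    def serve(self):
        self.trace += ["S250", "S260"]   # run the AI models, return results to users

    def monitor(self, usage, low=0.5):
        """S270, plus S280 when needed, under an assumed resize rule."""
        self.trace.append("S270")
        changes = {c: used + 1 for c, used in usage.items()
                   if used >= self.allocation[c] or used < low * self.allocation[c]}
        if changes:
            self.allocation.update(changes)  # S280: one core of headroom over use
            self.trace.append("S280")
        return bool(changes)


layer = VgpuLayerModel()
layer.configure({"c1": 2, "c2": 4})
layer.serve()
layer.monitor({"c1": 2, "c2": 1})  # c1 saturated, c2 mostly idle: S280 runs
print(layer.allocation)            # {'c1': 3, 'c2': 2}
```

Calling `monitor` again with the same usage leaves the allocation unchanged (the S270-N branch), which mirrors the loop in FIG. 4 where monitoring repeats and S280 runs only when a change is needed.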

So far, a service configuration method that aggregates and partitions GPU resources in a microservice environment, and a cloud system applying it, have been described in detail with reference to preferred embodiments.

The embodiments of the present invention present a way to maximize the utilization of the GPU resources needed to run AI models deployed in container units, by providing, in a microservice architecture environment for AI models, N:M matching between the containers implementing the AI models and GPU resources, together with a channel for fast data access.

This maximizes the utilization of GPU resources for containerized AI models, enables fast data access through a direct data channel between containers and GPU resources, and enables GPU resource management through the vGPU service layer.

It goes without saying that the technical idea of the present invention can also be applied to a computer-readable recording medium storing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical ideas according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on such a medium may also be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described above. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the present invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

110-1, 110-2, 110-3, ..., 110-N: container
120: CPU
130: virtualization GPU (vGPU) layer
131: GPU resource management module
132: GPU-container connection management module
133: GPU shared memory
134: GPU data channel management module
140-1, 140-2, 140-3, ..., 140-M: GPU

Claims

A container service configuration method comprising:
requesting, by a container, GPU allocation from a CPU;
forwarding, by the CPU, the received GPU allocation request to a GPU virtualization module;
generating, by the GPU virtualization module, a virtualized GPU using at least one of a plurality of GPUs; and
allocating the generated virtualized GPU to the container.

The method according to claim 1,
wherein, in the generating step,
a virtualized GPU to be allocated to a first container is generated by partitioning off some of the cores of a first GPU.

The method according to claim 2,
wherein, in the generating step,
a virtualized GPU to be allocated to a second container is generated by partitioning off another portion of the cores of the first GPU.

The method according to claim 2,
wherein, in the generating step,
a virtualized GPU to be allocated to a third container is generated by aggregating another portion of the cores of the first GPU with some of the cores of a second GPU.

The method according to claim 1,
further comprising changing, by the GPU virtualization module, the configuration of the virtualized GPU allocated to the container.

The method according to claim 1,
further comprising establishing a data channel between the container and the virtualized GPU.

The method according to claim 1,
wherein the container
is an application that runs an artificial intelligence model to provide a service to a user.

A cloud system comprising:
a container that requests GPU allocation;
a CPU that forwards the GPU allocation request of the container to a GPU virtualization module; and
the GPU virtualization module, which generates a virtualized GPU using at least one of a plurality of GPUs and allocates the generated virtualized GPU to the container.