KR102662498B1

KR102662498B1 - Method for dynamically switching deep learning models according to inference response time to user queries

Info

Publication number: KR102662498B1
Application number: KR1020230122960A
Authority: KR
Inventors: 조창희; 고형석; 이홍재
Original assignee: (주)유알피
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2024-05-03

Abstract

The present invention relates to a method of dynamically switching deep learning models according to the inference response time to user queries. When inference is performed with a deep learning model in response to inference requests from multiple users, the method dynamically switches among deep learning models of different parameter sizes to secure an optimal inference response speed.

Description

Method for dynamically switching deep learning models according to inference response time to user queries

The present invention relates to a method of dynamically switching deep learning models according to the inference response time to user queries. More specifically, it relates to a method that, when inference is performed with a deep learning model in response to inference requests from multiple users, dynamically switches among deep learning models of different parameter sizes to secure an optimal inference response speed.

Because deep learning models require many complex computations, performing inference on a GPU rather than a CPU helps shorten inference time.

For example, an image recognition model running on a GPU can perform inference in one-tenth or less of the time required on a CPU. A GPU is hardware originally designed for graphics processing; it is specialized for parallel computation and, in addition to shortening inference time, offers high power efficiency and low cost.

Deep learning inference on GPUs is actively used in fields such as autonomous driving, image analysis, and natural language processing.

However, many deep learning models very frequently receive inference requests from multiple users at the same time, which can leave the GPU resources required for inference insufficient and slow the response speed.

Therefore, the present invention proposes a way to improve response speed by dynamically switching among deep learning models of different parameter sizes, so as to secure an optimal inference response speed for requests from multiple users and to prevent a shortage of the GPU resources needed for inference.

Next, prior art in the technical field of the present invention is briefly described, followed by the technical matters that distinguish the present invention from that prior art.

Korean Patent No. 10-2374530 (2022.03.16.) is a prior invention concerning an optimal question answering system and method. It extracts answer content through a plurality of inference engines that derive intents from a query message according to different criteria, sorts the content by priority according to the inference rate, and provides it to the user, thereby giving answers that better match the user's intent.

The present invention, however, secures inference response speed for user requests by dynamically switching among deep learning models of different parameter sizes. It therefore differs markedly in configuration from Korean Patent No. 10-2374530 (2022.03.16.), which derives and returns optimal responses to various forms of user queries in, for example, a machine-learning-based interactive messenger program.

The present invention was devised to solve the problems described above. Its object is to provide a method that, when inference is performed in response to queries (that is, inference requests) from multiple users, dynamically switches among deep learning models of different parameter sizes to secure an optimal inference response speed.

Another object of the present invention is to provide a method that prevents the response speed from degrading due to a shortage of the GPU resources required for inference.

Yet another object of the present invention is to provide a method that uses the inference response time of the currently applied deep learning model together with a preset deep learning model map by response time to switch dynamically to a deep learning model with an optimized inference response speed.

Yet another object of the present invention is to provide a method that trains each deep learning model on a data set refined for its domain characteristics, so that the deep learning models achieve similar performance.

However, the technical problems to be addressed by the present embodiment are not limited to those described above, and other technical problems may exist.

A method of dynamically switching deep learning models according to the inference response time to user queries, according to an embodiment of the present invention, is performed in a system that dynamically switches deep learning models and comprises: an inference request receiving step of receiving inference requests from a plurality of user terminals; an inference performing step of inputting the received inference requests into the currently applied deep learning model to perform inference; an inference response time measuring step of measuring the inference response time taken for each user's inference by the currently applied deep learning model and averaging the measured per-user times to obtain an average inference response time; and a deep learning model dynamic switching step of, in order to prevent a decrease in response speed due to a shortage of the GPU resources required for inference, referring to the measured inference response time and switching the currently applied deep learning model to whichever of a plurality of deep learning models optimizes the inference response time. The deep learning model dynamic switching step comprises: looking up a preset deep learning model map by response time once the average inference response time has been measured in the inference response time measuring step; comparing the parameter size of the currently applied deep learning model with the parameter size that the map matches to an inference response time equal to the average inference response time; determining whether the result of the comparison meets a preset dynamic switching condition; if the dynamic switching condition is not satisfied, performing inference for the inference requests of the user terminals with the currently applied deep learning model as is; and if the dynamic switching condition is satisfied, switching the currently applied deep learning model to the deep learning model whose parameter size the map matches to an inference response time equal to the average inference response time.


Further, the dynamic switching condition is that the parameter size of the currently applied deep learning model differs from the parameter size that the deep learning model map by response time matches to an inference response time equal to the average inference response time, and that this difference occurs consecutively at least a preset number of times.

Further, the inference performing step further includes performing inference for the inference requests of the user terminals with the currently applied deep learning model until the deep learning model dynamic switching step has finished switching the currently applied deep learning model to the deep learning model whose inference response time in the deep learning model map by response time equals the average inference response time.

Further, the deep learning model map by response time is set in advance through testing, by matching deep learning models of different parameter sizes to the minimum and maximum inference response times observed for inference on a given GPU.

Further, each deep learning model is generated by training on a data set refined for its domain characteristics, so that the quality of the inference response does not differ with parameter size.

Further, the domain is a characteristic of the data on which the deep learning model is trained, and can be divided into data domains, including image, text, speech, and video, and task domains, including classification, regression, generation, recommendation, and robot control.

As described above, the method of dynamically switching deep learning models according to the inference response time to user queries secures an optimal inference response speed by dynamically switching among deep learning models of different parameter sizes in response to inference requests from multiple users, and thereby prevents the response speed from degrading due to a shortage of the GPU resources required for inference.

However, the effects of the present invention are not limited to those described above, and effects not mentioned will be clearly understood by those of ordinary skill in the art to which the present invention belongs from this specification and the attached drawings.

Figure 1 is a diagram schematically showing the overall configuration, including a system for dynamically switching deep learning models according to the inference response time to user queries, according to an embodiment of the present invention.
Figure 2 is a block diagram showing in more detail the configuration of a system for dynamically switching deep learning models according to an embodiment of the present invention.
Figure 3 is a diagram showing an example of the deep learning model map by response time applied in the present invention.
Figure 4 is a diagram showing the hardware structure of a system for dynamically switching deep learning models according to an embodiment of the present invention.
Figure 5 is a flowchart showing in detail the operation of a method of dynamically switching deep learning models according to the inference response time to user queries according to an embodiment of the present invention.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the drawings. However, the idea of the present invention is not limited to the embodiments presented. A person skilled in the art who understands the idea of the present invention could readily propose other, retrogressive inventions or other embodiments falling within the scope of that idea by adding, changing, or deleting components, and these too fall within the scope of the idea of the present invention.

In addition, components that appear in the drawings of the embodiments and have the same function within the scope of the same idea are described using the same reference numerals.

Figure 1 is a diagram schematically showing the overall configuration, including a system for dynamically switching deep learning models according to the inference response time to user queries, according to an embodiment of the present invention.

As shown in Figure 1, the present invention comprises a system 100 for dynamically switching deep learning models according to the inference response time to user queries (hereinafter, the system for dynamically switching deep learning models), a plurality of user terminals 200, a database 300, and the like.

The system 100 for dynamically switching deep learning models consists of a server computer, a platform, or the like. It receives inference requests that use a deep learning model from a plurality of user terminals 200 connected over a network, and provides responses to those requests.

The inference requests from the user terminals 200 frequently arrive at the same time, which can leave the GPU resources needed for inference insufficient and slow the responses to the inference requests.

To solve this, when the system 100 for dynamically switching deep learning models performs inference on the inference requests received from the plurality of user terminals 200, it dynamically switches among deep learning models of different parameter sizes to secure an optimal inference response speed.

In particular, the system 100 uses the average inference response time of the currently applied deep learning model across the inference requests of multiple users, together with a preset deep learning model map by response time, to switch dynamically to a deep learning model with an optimized inference response speed. This prevents the response speed from degrading due to a shortage of the GPU resources required for inference and allows inference results to be provided to users at an optimal response speed.

Here, the deep learning model map by response time is information that matches deep learning models of different parameter sizes to minimum and maximum inference response times (see Figure 3).
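The map described above could be represented as a list of response-time ranges, each paired with a parameter size. The following is a minimal sketch under stated assumptions: the names `MapEntry`, `RESPONSE_TIME_MAP`, and `lookup_parameter_size` are hypothetical, the ranges and sizes are illustrative values rather than the contents of Figure 3, and the choice to pair slower ranges with smaller models is an assumption based on the stated goal of relieving GPU load.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MapEntry:
    min_seconds: float   # inclusive lower bound of the response-time range
    max_seconds: float   # exclusive upper bound of the response-time range
    parameter_size: str  # label of the model matched to this range


# Hypothetical values for illustration only; not taken from Figure 3.
RESPONSE_TIME_MAP = [
    MapEntry(0.0, 2.0, "13B"),
    MapEntry(2.0, 5.0, "7B"),
    MapEntry(5.0, float("inf"), "3B"),
]


def lookup_parameter_size(avg_seconds: float,
                          table=RESPONSE_TIME_MAP) -> Optional[str]:
    """Return the parameter size whose range contains avg_seconds, or None."""
    for entry in table:
        if entry.min_seconds <= avg_seconds < entry.max_seconds:
            return entry.parameter_size
    return None


matched = lookup_parameter_size(3.4)  # falls in the hypothetical 2.0-5.0 s range
```

Half-open ranges are used here so that a boundary value such as 2.0 seconds matches exactly one entry; the source does not say how boundaries are handled.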

In addition, each deep learning model matched in the deep learning model map by response time needs to be generated by training on a data set refined for its domain characteristics, so that response quality does not differ with parameter size.

Meanwhile, the parameter size of a deep learning model is the number of weights and biases that the neural network must learn. Weights connect inputs to outputs, and a bias is a value added to a neuron's output; the larger the parameter size, the more complex the relationships the network can learn.
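As an illustration of counting weights and biases, the sketch below computes the parameter size of a small fully connected network. The function names and the layer sizes are hypothetical examples, not part of the described system; a fully connected layer is assumed because its count is simply one weight per input-output pair plus one bias per output.

```python
from typing import Sequence


def dense_layer_parameters(n_inputs: int, n_outputs: int) -> int:
    """Weights (one per input-output pair) plus biases (one per output)."""
    return n_inputs * n_outputs + n_outputs


def mlp_parameters(layer_sizes: Sequence[int]) -> int:
    """Sum the parameters of consecutive fully connected layers."""
    return sum(
        dense_layer_parameters(a, b)
        for a, b in zip(layer_sizes, layer_sizes[1:])
    )


# Example: 784 inputs, one hidden layer of 128 units, 10 outputs.
total = mlp_parameters([784, 128, 10])
print(total)  # 101770
```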

For example, GPT-3 is a language model with 175 billion parameters; a parameter size of this scale means the model can learn the complex relationships needed to understand and generate text.
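To relate a parameter size of this scale to GPU resources, the following back-of-the-envelope sketch estimates the memory needed just to hold the weights. It assumes 2 bytes per parameter (16-bit weights), which is an assumption made here for illustration and is not stated in the source; activations and other runtime memory are ignored.

```python
GPT3_PARAMETERS = 175_000_000_000
BYTES_PER_PARAMETER = 2  # assumption: 16-bit weights


def weight_memory_gb(n_parameters: int, bytes_per_parameter: int) -> float:
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


print(weight_memory_gb(GPT3_PARAMETERS, BYTES_PER_PARAMETER))  # 350.0
```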

In addition, the number of parameters required depends on the type of model, such as CNN, RNN, or LSTM; a model with more layers requires more parameters, and larger data requires more parameters.

The parameter size thus affects the model's performance, so choosing an appropriate parameter size is very important when designing a deep learning model.

This is because a larger parameter size generally improves model performance, but a parameter size that is too large makes the model harder to train and can raise the risk of overfitting.

The user terminal 200 may be any digital device owned by an individual that has a processor, memory, and computing capability, such as a laptop computer, notebook computer, desktop computer, web pad, or mobile phone. It can also run the various functions provided by a server or system through the web or through separate software or applications.

The user terminal 200 connects over the network to the system 100 for dynamically switching deep learning models and requests inference using a deep learning model. In other words, the user submits a query about something the user wants to know.

The network may be a core network integrated with a wired public network, a wireless mobile communication network, or the mobile Internet, or may refer to the worldwide open computer network structure that provides the TCP/IP protocol and the services above it, such as HTTP (Hyper Text Transfer Protocol), HTTPS (Hyper Text Transfer Protocol Secure), Telnet, and FTP (File Transfer Protocol). It is not limited to these examples and broadly denotes any data communication network capable of transmitting and receiving data in various forms.

The user terminal 200 also receives the inference result for its inference request from the system 100 for dynamically switching deep learning models.

The database 300 stores and manages the various operating programs used by the system 100 for dynamically switching deep learning models, as well as the deep learning model map by response time and the deep learning models used for inference.

The database 300 may also store and manage information on the inference requests made by each user and the inference results that were produced by a deep learning model for those requests and provided to the user.

Figure 2 is a block diagram showing in more detail the configuration of a system for dynamically switching deep learning models according to an embodiment of the present invention.

As shown in Figure 2, the system 100 for dynamically switching deep learning models comprises an inference request receiving unit 110, an inference performing unit 120, an inference response time measuring unit 130, a deep learning model dynamic switching unit 140, an inference result providing unit 150, and the like.

In addition, although not shown in Figure 2, the system 100 may further include a component that creates, updates, and manages deep learning models, including open conversational language models such as the GPT (Generative Pre-trained Transformer) series.

The inference request receiving unit 110 receives inference requests from the user terminals 200 connected over the network and passes the received requests to the inference performing unit 120.

The inference performing unit 120 inputs each user's inference request, received through the inference request receiving unit 110, into the currently applied deep learning model to perform inference. The inference result produced by the inference performing unit 120 can be provided to the corresponding user terminal 200 through the inference result providing unit 150 regardless of any dynamic switching of the deep learning model in the deep learning model dynamic switching unit 140.

In addition, when the deep learning model dynamic switching unit 140 determines that the dynamic switching condition is not satisfied and the currently applied deep learning model is therefore to be kept, the inference performing unit 120 can continue to use the currently applied deep learning model to perform inference for the inference requests of the user terminals 200.

Furthermore, even when the deep learning model dynamic switching unit 140 determines that the dynamic switching condition is satisfied and a switch to another deep learning model is initiated, the inference performing unit 120 can continue to use the currently applied deep learning model for the inference requests of the user terminals 200 until the deep learning model to be switched to has finally been loaded into the system.
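The behavior of serving requests with the current model until the replacement has finished loading could be sketched as follows. This is an illustrative sketch only: the class `ModelHolder`, the stub models, and the loader are hypothetical names, a background thread is assumed as the loading mechanism, and a `threading.Event` stands in for the time spent loading weights so that the example runs deterministically.

```python
import threading


class ModelHolder:
    """Holds the active model and swaps it only after the new one is loaded."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def infer(self, query):
        with self._lock:
            model = self._model  # snapshot whichever model is active now
        return model(query)

    def switch_async(self, loader):
        def _load_and_swap():
            new_model = loader()  # slow step: the old model keeps serving
            with self._lock:
                self._model = new_model

        thread = threading.Thread(target=_load_and_swap)
        thread.start()
        return thread


def large_model(query):
    return f"large:{query}"


def small_model(query):
    return f"small:{query}"


loading_done = threading.Event()


def load_small_model():
    loading_done.wait()  # stands in for loading weights onto the GPU
    return small_model


holder = ModelHolder(large_model)
switch_thread = holder.switch_async(load_small_model)
during = holder.infer("q1")  # loading not finished: served by the large model
loading_done.set()           # let the simulated load complete
switch_thread.join()
after = holder.infer("q2")   # served by the small model after the swap
```

The lock guards only the reference swap, so inference itself is never blocked by a model load in this sketch.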

The inference response time measuring unit 130 measures the inference response time taken by the inference performed in the inference performing unit 120.

Specifically, the inference response time measuring unit 130 measures the inference response time taken for each user's inference by the currently applied deep learning model, averages the measured per-user times to obtain an average inference response time, and passes the average inference response time to the deep learning model dynamic switching unit 140.
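The measurement described above could be sketched as timing each request and averaging the results. The helper names `timed_inference` and `average_response_time`, and the stub model, are hypothetical; the source does not specify the clock or the averaging window, so a monotonic clock and a simple arithmetic mean over a batch of requests are assumed.

```python
import time
from statistics import mean


def timed_inference(model, query, clock=time.perf_counter):
    """Run one inference and return (result, elapsed seconds)."""
    start = clock()
    result = model(query)
    return result, clock() - start


def average_response_time(durations):
    """Arithmetic mean of the per-user inference response times."""
    return mean(durations)


def echo_model(query):
    # Stand-in for the currently applied deep learning model.
    return query.upper()


durations = []
for user_query in ["first question", "second question", "third question"]:
    answer, elapsed = timed_inference(echo_model, user_query)
    durations.append(elapsed)

average = average_response_time(durations)
```

The `clock` parameter is injectable so that the timing logic can be checked with a fake clock rather than real elapsed time.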

The deep learning model dynamic switching unit 140 refers to the inference response time measured by the inference response time measuring unit 130 and decides whether to keep the currently applied deep learning model or to switch to another of the plurality of deep learning models that optimizes the inference response time.

The deep learning model dynamic switching unit 140 consists of a lookup unit 141, a parameter size comparison unit 142, a dynamic switching condition determination unit 143, and a dynamic switching processing unit 144.

When the average inference response time is delivered from the inference response time measuring unit 130, the lookup unit 141 looks up the deep learning model map by response time stored in the database 300. The deep learning model map by response time is described in detail with reference to Figure 3.

The parameter size comparison unit 142 identifies the parameter size of the currently applied deep learning model.

The parameter size comparison unit 142 also refers to the deep learning model map by response time retrieved by the lookup unit 141 and identifies the parameter size of the deep learning model whose inference response time equals the average inference response time, measured by the inference response time measuring unit 130, for the users' inference requests.

The parameter size comparison unit 142 then compares the two parameter sizes (that is, the parameter size of the currently applied deep learning model and the parameter size of the deep learning model that the map matches to the average inference response time) and passes the result of the comparison to the dynamic switching condition determination unit 143.

상기 동적 전환 조건 판단부(143)는 상기 매개변수 크기 비교부(142)에서 비교한 결과를 토대로 기 설정된 동적 전환 조건에 해당하는지의 여부를 판단하고, 상기 판단한 결과를 상기 동적 전환 처리부(144)로 전달한다.The dynamic conversion condition determination unit 143 determines whether a preset dynamic conversion condition is met based on the result compared by the parameter size comparison unit 142, and the determined result is sent to the dynamic conversion processor 144. Pass it to

Here, the dynamic switching condition is that the parameter size of the deep learning model currently being applied differs from the parameter size of the deep learning model that the deep learning model map for each response time matches to the average inference response time for the users' inference requests, and that this difference between the two parameter sizes occurs consecutively a preset number of times (e.g., three) or more.

In the present invention the number of times is preferably set to three, but it is not limited to this value and may be decreased or increased depending on the usage environment.
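The dynamic switching condition described above can be illustrated with a minimal sketch. The class name `SwitchConditionChecker`, its fields, and the use of plain integers for parameter counts are hypothetical choices made for illustration; the description only specifies that the two parameter sizes must differ consecutively a preset number of times (preferably three).

```python
from dataclasses import dataclass, field


@dataclass
class SwitchConditionChecker:
    """Hypothetical helper: counts consecutive parameter-size mismatches."""

    # Preset number of consecutive mismatches; 3 is the preferred value in the description.
    required_consecutive: int = 3
    # Current run of consecutive mismatches (not settable through the constructor).
    streak: int = field(default=0, init=False)

    def observe(self, current_params: int, matched_params: int) -> bool:
        """Record one comparison result and report whether the condition is now met."""
        if current_params != matched_params:
            self.streak += 1
        else:
            # The sizes agree, so the run of consecutive mismatches is broken.
            self.streak = 0
        return self.streak >= self.required_consecutive


checker = SwitchConditionChecker()
# Three consecutive comparisons in which the active model (13B) differs from the matched one (3B).
results = [checker.observe(13_000_000_000, 3_000_000_000) for _ in range(3)]
```

Under these assumptions, `results` is `[False, False, True]`: the condition is reported as met only on the third consecutive mismatch, and a single matching comparison in between would restart the count.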

When the dynamic switching condition determination unit 143 determines that the dynamic switching condition is satisfied, the dynamic switching processing unit 144 switches the deep learning model currently being applied to the deep learning model that the deep learning model map for each response time matches to the average inference response time.

In other words, if the dynamic switching condition is satisfied, the deep learning model currently being applied is dynamically switched to another deep learning model whose response speed is best suited to inference at the current time.

The inference result providing unit 150 provides the result of the inference performed by the inference performing unit 120 to the user terminal 200.

The inference result may be generated as text data, or as graphics or tables according to a predetermined template. The inference result may also be generated in various file formats and, at the user's request, converted to the desired file format before being provided.

FIG. 3 is a diagram showing an example of the deep learning model map for each response time applied to the present invention.

As shown in FIG. 3, the system 100 for dynamically switching the deep learning model can build the deep learning model map for each response time in advance and store and manage it in the database 300.

The deep learning model map for each response time is table data that matches deep learning models of different parameter sizes to ranges bounded by a minimum and a maximum inference response time obtained by inference on a given GPU, and it can be set in advance through testing.

For example, if the response time obtained by inference on a given GPU is at least 60 seconds and less than 120 seconds, a deep learning model with a parameter size of 1.3B (1.3 billion) is matched; if the response time is at least 30 seconds and less than 60 seconds, a deep learning model with a parameter size of 3B (3 billion) is matched; if the response time is at least 5 seconds and less than 30 seconds, a deep learning model with a parameter size of 13B (13 billion) is matched; and if the response time is at least 0 seconds and less than 5 seconds, a deep learning model with a parameter size of 40B (40 billion) is matched. In this way an appropriate model map for each response time can be set. The response times and parameter sizes may be changed according to test results.
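The example map above can be written down as a small lookup table. The following is a minimal sketch, assuming half-open ranges as stated in the example (lower bound inclusive, upper bound exclusive); the names `MapEntry`, `RESPONSE_TIME_MODEL_MAP`, and `lookup_param_count` are hypothetical, and the description does not say what should happen for a response time outside every range, so the sketch simply returns `None` in that case.

```python
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    min_seconds: float  # inclusive lower bound of the response-time range
    max_seconds: float  # exclusive upper bound of the response-time range
    param_count: int    # parameter size of the matched deep learning model


# Values taken from the example in the description; in practice they come from testing.
RESPONSE_TIME_MODEL_MAP = [
    MapEntry(0, 5, 40_000_000_000),     # 40B
    MapEntry(5, 30, 13_000_000_000),    # 13B
    MapEntry(30, 60, 3_000_000_000),    # 3B
    MapEntry(60, 120, 1_300_000_000),   # 1.3B
]


def lookup_param_count(avg_seconds: float) -> Optional[int]:
    """Return the parameter size matched to the average inference response time."""
    for entry in RESPONSE_TIME_MODEL_MAP:
        if entry.min_seconds <= avg_seconds < entry.max_seconds:
            return entry.param_count
    # Assumption: times outside every range have no matched model.
    return None
```

With this table, an average response time of 45 seconds falls in the 30 to 60 second range and is matched to the 3B model, while exactly 5 seconds belongs to the 13B range because the upper bound of each range is exclusive.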

In this case, so that the quality of the inference response does not differ with parameter size, the deep learning models must be created by training on a data set refined to suit the characteristics of the domain.

The domain refers to the characteristics of the data that the deep learning model learns, and can be divided into data domains, including image, text, voice, and video, and task domains, including classification, regression, generation, recommendation, and robot control.

Here, the data domain includes resolution, color, scene, lighting, camera type, etc. for images; language, grammar, style, topic, etc. for text; sound quality, direction, background noise, etc. for voice; and frame rate, resolution, color, scene, etc. for video.

In addition, the task domain includes images, text, voice, video, etc. for classification; numerical prediction, time series prediction, etc. for regression; images, text, voice, video, etc. for generation; products, movies, music, etc. for recommendation; and object recognition, movement, interaction, etc. for robot control.

Meanwhile, deep learning models with hundreds of millions to billions of parameters take a long time to train because they must be trained with large amounts of data and computational resources. However, by learning complex patterns with their many parameters, they can solve complex problems, can provide more accurate and efficient results than existing models, and can be used in a wide variety of fields such as natural language processing, computer vision, speech recognition, and natural language generation.

For example, natural language processing deep learning models are used to understand and generate text for tasks such as question answering, translation, summarization, and generation, and include GPT-3, RoBERTa, LaMDA, Megatron-Turing NLG, and Jurassic-1 Jumbo. Computer vision deep learning models are used to understand and process images and video for tasks such as object recognition, face recognition, and natural image classification, and include ResNet, VGG, YOLO, SSD, Swin Transformer, and ConvMixer. Speech recognition deep learning models are used to understand and process speech for tasks such as voice commands and speech synthesis, and include Transformer, WaveNet, DNN, Megatron-Turing NLG, and WuDao 2.0. Natural language generation deep learning models are used to generate text such as news articles, web pages, poems, and code, and include GPT-3, T5, Megatron-Turing NLG, Jurassic-1 Jumbo, and WuDao 2.0. In addition, there are deep learning models for games such as AlphaGo, AlphaZero, Dota 2 AI, StarCraft II AI, and WuDao 2.0.

FIG. 4 is a diagram showing the hardware structure of a system for dynamically switching deep learning models according to an embodiment of the present invention.

As shown in FIG. 4, the hardware structure of the system 100 for dynamically switching the deep learning model includes a central processing unit 1000, a memory 2000, a user interface 3000, a database interface 4000, a network interface 5000, a web server 6000, a graphics processing unit 7000, and the like.

The user interface 3000 provides input and output interfaces to the user by using a graphical user interface (GUI).

The database interface 4000 provides an interface between the database and the hardware structure. The network interface 5000 provides network connections between devices owned by users.

The web server 6000 provides a means for users to access the hardware structure through a network. Most users can connect to the web server remotely and use the system 100 for dynamically switching the deep learning model.

Each step of the configuration or method described above may be implemented as computer-readable code on a computer-readable recording medium or transmitted through a transmission medium. A computer-readable recording medium is a data storage device that can store data readable by a computer system.

Examples of computer-readable recording media include, but are not limited to, databases, ROM, RAM, CD-ROM, DVD, magnetic tape, floppy disks, and optical data storage devices. Transmission media may include carrier waves transmitted through the Internet or various types of communication channels. In addition, the computer-readable recording medium may be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed manner.

In addition, at least one component applied to the present invention may include, or be implemented by, a processor such as a central processing unit (CPU) or a microprocessor that performs the respective function, and two or more of the components may be combined into a single component that performs all operations or functions of the combined components. Part of the function of at least one component applied to the present invention may also be performed by another of these components. Communication between the components may be performed through a bus (not shown).

Next, an embodiment of the method of dynamically switching a deep learning model according to the inference response time to a user query, according to the present invention configured as described above, is described in detail with reference to FIG. 5. The order of the steps of the method may be changed depending on the usage environment or by a person skilled in the art.

FIG. 5 is a flowchart showing in detail the operation of a method of dynamically switching a deep learning model according to the inference response time to a user query according to an embodiment of the present invention.

As shown in FIG. 5, the system 100 for dynamically switching the deep learning model receives an inference request (i.e., a query) from the user terminal 200 connected through the network (S100).

To receive the inference request, the system 100 for dynamically switching the deep learning model may provide a user interface such as a web page or an app screen.

The inference request received from the user terminal 200 may be a query such as a conversational question, command, or instruction of the kind used in everyday life.

When an inference request is received from the user terminal 200 in step S100, the system 100 for dynamically switching the deep learning model performs inference on the request using the deep learning model currently being applied (S200).

Afterwards, the system 100 for dynamically switching the deep learning model measures the inference response time required for the inference (S300). That is, it measures the inference response time required for each user's inference performed by the deep learning model currently being applied, and then averages the measured inference response times to obtain the average inference response time.
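The measurement in step S300 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the function names `timed_inference` and `average_response_time` are hypothetical, the model is represented as any callable from a query string to an answer string, and the description does not state over which window of requests the average is taken.

```python
import time
from statistics import fmean
from typing import Callable, Sequence, Tuple


def timed_inference(model: Callable[[str], str], query: str) -> Tuple[str, float]:
    """Run one inference and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic clock intended for measuring short durations
    result = model(query)
    return result, time.perf_counter() - start


def average_response_time(per_user_seconds: Sequence[float]) -> float:
    """Average the per-user inference response times (step S300)."""
    if not per_user_seconds:
        raise ValueError("no response times were measured")
    return fmean(per_user_seconds)
```

For instance, per-user response times of 2.0, 4.0, and 6.0 seconds give an average inference response time of 4.0 seconds, which is the value that is then looked up in the deep learning model map for each response time.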

Subsequently, based on the inference response time measured in step S300, the system 100 for dynamically switching the deep learning model switches the deep learning model currently being applied to another deep learning model that optimizes the inference response time.

More specifically, after the average inference response time has been measured in step S300, the system 100 for dynamically switching the deep learning model looks up the deep learning model map for each response time stored and managed in the database 300 (S400).

It then compares the parameter size of the deep learning model currently being applied with the parameter size of the deep learning model that the deep learning model map for each response time, looked up in step S400, matches to the average inference response time measured in step S300, and determines from the comparison result whether the dynamic switching condition is met (S500).

As described above, the dynamic switching condition is that the parameter size of the deep learning model currently being applied differs from the parameter size of the specific deep learning model identified in the deep learning model map for each response time on the basis of the average inference response time measured in step S300, and that this difference in parameter sizes occurs three or more times in succession.

If the determination in step S500 is that the dynamic switching condition is satisfied, the system 100 for dynamically switching the deep learning model switches the deep learning model currently being applied to the deep learning model that the deep learning model map for each response time matches to the average inference response time measured in step S300 (S600).

Until the switch from the deep learning model currently being applied to the other deep learning model in step S600 is complete, the system 100 for dynamically switching the deep learning model continues to use the deep learning model currently being applied to perform inference on users' inference requests.
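One way to keep serving requests with the current model until the switch is complete is to load the new model first and replace the active reference only afterwards. The sketch below assumes this approach; the description does not specify the mechanism, and `ModelHolder`, `infer`, and `switch_to` are hypothetical names, with a model again represented as a callable from query to answer.

```python
import threading
from typing import Callable

# Hypothetical representation: a model is any callable from a query to an answer.
Model = Callable[[str], str]


class ModelHolder:
    """Holds the active model and swaps it only after the new one is ready."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def infer(self, query: str) -> str:
        with self._lock:
            model = self._model  # snapshot of the model that is active right now
        # Inference runs outside the lock so a pending swap never blocks on it.
        return model(query)

    def switch_to(self, load_new_model: Callable[[], Model]) -> None:
        # Loading may be slow; the old model keeps serving requests meanwhile.
        new_model = load_new_model()
        with self._lock:
            self._model = new_model  # the switch takes effect only once loading is done


holder = ModelHolder(lambda q: "old:" + q)
answers_during_load = []


def load_smaller_model() -> Model:
    # A request arriving while the new model is loading is still answered by the old model.
    answers_during_load.append(holder.infer("x"))
    return lambda q: "new:" + q


holder.switch_to(load_smaller_model)
```

In this usage, the request issued during loading is answered with `"old:x"`, and a request issued after `switch_to` returns is answered by the new model.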

If the determination in step S500 is that the dynamic switching condition is not satisfied, the system 100 for dynamically switching the deep learning model continues to use the deep learning model currently being applied to perform inference according to the inference request of the user terminal (S700).
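The branch between steps S600 and S700 can be summarized as a single decision function that is called once per measured average. This is a sketch under stated assumptions: `MODEL_MAP` reuses the example values from the description, `decide_next_model` is a hypothetical name, and two behaviors are assumptions not stated in the description, namely that the consecutive-mismatch count restarts after a switch and that an average outside every range counts as no mismatch.

```python
from typing import List, Tuple

# (min seconds inclusive, max seconds exclusive, parameter count); example values from the description.
MODEL_MAP: List[Tuple[float, float, int]] = [
    (0, 5, 40_000_000_000),
    (5, 30, 13_000_000_000),
    (30, 60, 3_000_000_000),
    (60, 120, 1_300_000_000),
]


def decide_next_model(
    avg_seconds: float, current_params: int, streak: int, threshold: int = 3
) -> Tuple[int, int]:
    """Return (parameter size to use next, updated consecutive-mismatch count)."""
    # S400: look up the parameter size matched to the average response time.
    matched = next((p for lo, hi, p in MODEL_MAP if lo <= avg_seconds < hi), None)
    # S500: compare with the currently applied model.
    if matched is None or matched == current_params:
        return current_params, 0  # S700: keep the current model, mismatch run broken
    streak += 1
    if streak >= threshold:
        return matched, 0  # S600: switch (assumption: the count restarts after a switch)
    return current_params, streak  # S700: mismatch seen, but not yet often enough


# Example run: a 13B model whose average response time stays in the 30 to 60 second range.
params, streak = 13_000_000_000, 0
history = []
for avg in (45.0, 50.0, 40.0):
    params, streak = decide_next_model(avg, params, streak)
    history.append(params)
```

In this run the 13B model is kept for the first two measurements and replaced by the 3B model on the third, so `history` ends with `3_000_000_000`.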

As described above, the present invention secures an optimal inference response speed for inference requests from multiple users by dynamically switching between deep learning models of various parameter sizes, and can therefore prevent the response speed from degrading due to a lack of the GPU resources required for inference.

In the accompanying drawings, components that are unrelated or only loosely related to the technical idea of the present invention are simplified or omitted so that the technical idea can be expressed more clearly.

The configuration and features of the present invention have been described above on the basis of embodiments, but the present invention is not limited to them. It will be apparent to those skilled in the art to which the present invention pertains that various changes or modifications may be made within the spirit and scope of the present invention, and such changes or modifications therefore fall within the scope of the appended claims.

100: System for dynamically switching deep learning models according to the inference response time to user queries
110: Inference request receiving unit
120: Inference performing unit
130: Inference response time measurement unit
140: Deep learning model dynamic switching unit
141: Inquiry unit
142: Parameter size comparison unit
143: Dynamic switching condition determination unit
144: Dynamic switching processing unit
150: Inference result providing unit
200: User terminal
300: Database

Claims

A method of dynamically switching a deep learning model, performed in a system that dynamically switches deep learning models, the method comprising:
an inference request receiving step of receiving inference requests from a plurality of user terminals;
an inference performing step of performing inference by inputting the received inference requests into a deep learning model currently being applied;
an inference response time measurement step of measuring the inference response time required for each user's inference performed by the currently applied deep learning model, and averaging the measured inference response times to obtain an average inference response time; and
a deep learning model dynamic switching step of, in order to prevent a decrease in response speed due to a lack of GPU resources required for inference, switching the currently applied deep learning model, with reference to the measured inference response time, to one of a plurality of deep learning models that optimizes the inference response time,
wherein the deep learning model dynamic switching step comprises:
when the average inference response time has been measured in the inference response time measurement step, looking up a preset deep learning model map for each response time;
comparing the parameter size of the currently applied deep learning model with the parameter size that the deep learning model map for each response time matches to the same inference response time as the average inference response time;
determining whether the comparison result meets a preset dynamic switching condition;
if the determination is that the dynamic switching condition is not satisfied, performing inference according to the inference request of the user terminal by using the currently applied deep learning model as is; and
if the determination is that the dynamic switching condition is satisfied, switching the currently applied deep learning model to the deep learning model having the parameter size that the deep learning model map for each response time matches to the same inference response time as the average inference response time.

delete

The method of claim 1,
wherein the dynamic switching condition is a condition in which the parameter size of the currently applied deep learning model differs from the parameter size that the deep learning model map for each response time matches to the same inference response time as the average inference response time, and this difference in parameter sizes occurs consecutively a preset number of times or more.

delete

The method of claim 1,
wherein the deep learning model map for each response time is set in advance through testing, by matching deep learning models with different parameter sizes to the minimum and maximum inference response times obtained by inference on a given GPU.

delete