KR102613367B1

KR102613367B1 - Method and apparatus for automatically reducing model weight for deep learning model serving optimization, and a method for providing cloud inference services using the same

Info

Publication number: KR102613367B1
Application number: KR1020200185894A
Authority: KR
Inventors: 이경용; 손태선
Original assignee: 국민대학교산학협력단
Priority date: 2020-12-29
Filing date: 2020-12-29
Publication date: 2023-12-13
Also published as: KR20220094564A; WO2022145564A1

Abstract

The present invention relates to a method and device for automatic model lightweighting for optimizing deep learning model serving, and to a method for providing a cloud inference service using the same. The device performs the steps of: receiving a deep learning algorithm for building a deep learning model; dividing the deep learning algorithm into a plurality of operation steps; determining at least one branch point that exists between the plurality of operation steps in a learning process according to the deep learning algorithm; generating at least one intermediate deep learning model that branches off from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm; and completing the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.

Description

Method and device for automatically reducing model weight for optimizing deep learning model serving, and method for providing cloud inference services using the same

The present invention relates to deep learning model generation technology and, more specifically, to a method and device for automatic model lightweighting for optimizing deep learning model serving, which automatically generate deep learning models at various levels so that a deep learning model inference service can be provided smoothly, and to a method for providing a cloud inference service using the same.

As deep learning techniques have recently been applied to a wide range of applications, they are increasingly used in embedded environments as well. In general, the more accurate a deep learning model is, the longer it may take to produce a result. Embedded environments, however, impose constraints on time and energy, and existing neural networks cannot cope with these constraints dynamically.

The same situation applies to cloud computing environments. Because deep learning workloads require large-scale computing resources, much of the work for both training and inference services is currently carried out in cloud computing environments. A deep learning inference service uses a trained model to provide predictions in response to user requests.

Unlike training services, inference services must scale with the number of user requests. Among deep learning inference models, a model with a complex structure takes longer to run inference but is more accurate, whereas a model with a simple structure runs inference quickly but is less accurate. Developers of deep learning algorithms generally devote most of their effort to building highly accurate models.

However, when an inference service is actually in operation and user requests surge, a model that offers fast inference at a slight cost in accuracy may be more effective for the application service than a highly accurate model that takes a long time to compute.

Korean Patent No. 10-0820723 (2008.04.02)

An embodiment of the present invention aims to provide a method and device for automatic model lightweighting for optimizing deep learning model serving, which automatically generate deep learning models at various levels so that a deep learning model inference service can be provided smoothly, and a method for providing a cloud inference service using the same.

An embodiment of the present invention aims to provide a method and device for automatic model lightweighting for optimizing deep learning model serving, which allow a deep learning model researcher who is developing a complex, highly accurate model to generate from that model a variety of deep learning prediction models with different prediction accuracies and model complexities, and a method for providing a cloud inference service using the same.

Among the embodiments, a method for automatic model lightweighting for optimizing deep learning model serving includes: receiving a deep learning algorithm for building a deep learning model; dividing the deep learning algorithm into a plurality of operation steps; determining at least one branch point that exists between the plurality of operation steps in a learning process according to the deep learning algorithm; generating at least one intermediate deep learning model that branches off from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm; and completing the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.
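The relationship between the full model and the intermediate models can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `ModelPlan` and `plan_models`, the use of plain strings for operation steps, and the convention that branch point `i` means "branch after step `i`" are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPlan:
    """Ordered operation steps that make up one model (hypothetical type)."""

    name: str
    steps: tuple[str, ...]


def plan_models(steps: list[str], branch_points: list[int]) -> list[ModelPlan]:
    """Return one intermediate model per branch point, followed by the full model.

    Assumption: branch point i means "branch after steps[i]". The intermediate
    model keeps steps[0..i] and then proceeds directly to the final step.
    """
    if len(steps) < 2:
        raise ValueError("need at least one body step and a final step")
    final_step = steps[-1]
    plans = []
    for i in sorted(set(branch_points)):
        # A branch must skip at least one step before the final step.
        if not 0 <= i < len(steps) - 2:
            raise ValueError(f"branch point {i} is not between two body steps")
        plans.append(
            ModelPlan(f"intermediate@{i}", tuple(steps[: i + 1]) + (final_step,))
        )
    plans.append(ModelPlan("full", tuple(steps)))
    return plans


if __name__ == "__main__":
    for plan in plan_models(["block1", "block2", "block3", "classifier"], [0, 1]):
        print(plan.name, "->", " > ".join(plan.steps))
```

In a real network the final step of each intermediate model would need its own weights sized to the shortened feature path; the sketch only captures which steps each model contains.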

The deep learning algorithm may include a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN).

The deep learning model and the at least one intermediate deep learning model may each have a different prediction accuracy and computation speed.

The dividing into the plurality of operation steps may include: dividing the operations of the deep learning algorithm into a plurality of layers; and determining operation steps corresponding to each of the plurality of layers.

The dividing into the plurality of operation steps may include: determining a repetition section that is performed repeatedly during the operation of the deep learning algorithm; determining a pre-repetition section and a post-repetition section relative to the repetition section; determining at least one unit section for the repetition section; and arranging the pre-repetition section, the at least one unit section, and the post-repetition section in order to determine the plurality of operation steps.
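As a rough illustration of this division, the sketch below splits a flat list of layer names into a pre-repetition section, one unit section per repetition, and a post-repetition section. The function name `split_into_steps`, the string layer labels, and the assumption that the repeating unit is known in advance are all hypothetical; the patent does not specify how the repetition section is detected.

```python
def split_into_steps(layers, unit):
    """Split layers into [pre-repetition, unit, unit, ..., post-repetition].

    Assumption: `unit` is the known repeating pattern (for example
    ["conv", "pool"]) and only its first consecutive run is treated as
    the repetition section.
    """
    if not unit:
        raise ValueError("unit must contain at least one layer")
    n, k = len(layers), len(unit)
    # Locate the first occurrence of the repeating unit.
    start = next((i for i in range(n - k + 1) if layers[i:i + k] == unit), None)
    if start is None:
        return [list(layers)] if layers else []
    # Collect every consecutive repetition of the unit.
    end, units = start, []
    while layers[end:end + k] == unit:
        units.append(list(unit))
        end += k
    steps = []
    if start > 0:
        steps.append(list(layers[:start]))   # pre-repetition section
    steps.extend(units)                      # one step per unit section
    if end < n:
        steps.append(list(layers[end:]))     # post-repetition section
    return steps


if __name__ == "__main__":
    layers = ["input", "conv", "pool", "conv", "pool", "flatten", "dense", "softmax"]
    for step in split_into_steps(layers, ["conv", "pool"]):
        print(step)
```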

The determining of the at least one branch point may include determining each point at which the repetition section ends as a branch point.

When the deep learning algorithm is a CNN, the determining of the at least one branch point may include determining, as the branch point, the end point of a repetition section in which at least one convolution layer and a pooling layer are performed in sequence.
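For the CNN case, a minimal sketch of this rule might look as follows. The function name `cnn_branch_points`, the layer labels, and the choice to skip over activation layers inside a block are assumptions for illustration only.

```python
# Layer kinds that may sit between a convolution and its pooling layer
# without ending the block (an assumption of this sketch).
IGNORED_KINDS = {"relu", "batchnorm"}


def cnn_branch_points(layer_kinds):
    """Return indices of pooling layers that close a conv(+conv...)+pool block."""
    points = []
    convs_seen = 0
    for index, kind in enumerate(layer_kinds):
        if kind == "conv":
            convs_seen += 1
        elif kind in IGNORED_KINDS:
            continue
        elif kind == "pool":
            # A pooling layer ends a repetition only if a convolution preceded it.
            if convs_seen > 0:
                points.append(index)
            convs_seen = 0
        else:
            convs_seen = 0
    return points


if __name__ == "__main__":
    kinds = ["conv", "relu", "conv", "pool", "conv", "pool", "flatten", "dense"]
    print(cnn_branch_points(kinds))
```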

The generating of the at least one intermediate deep learning model may include: defining a candidate intermediate deep learning model according to the branch; calculating the adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model; and determining the candidate intermediate deep learning model as an intermediate deep learning model when the adequacy of the branch point satisfies a preset condition.

The calculating of the adequacy may include calculating the adequacy as the product (A*L) of the prediction accuracy (A) and the number of layers (L), and the determining as the intermediate deep learning model may include determining the candidate intermediate deep learning model as the intermediate deep learning model when its adequacy has increased twofold relative to that of the intermediate deep learning model generated in the previous step.
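The adequacy rule can be sketched in a few lines of Python. The text defines only the product A*L and the twofold condition; the function names, the `(name, L, A)` tuple layout, and the decision to accept the first candidate as a baseline are assumptions of this sketch.

```python
def adequacy(accuracy, num_layers):
    """Adequacy of a branch point: prediction accuracy (A) times layer count (L)."""
    return accuracy * num_layers


def select_intermediate_models(candidates, factor=2.0):
    """Keep candidates whose adequacy is at least `factor` times the last kept one.

    `candidates` is an ordered list of (name, num_layers, accuracy) tuples.
    Assumption: the first candidate is always kept, since there is no earlier
    intermediate model to compare against.
    """
    selected = []
    last_score = None
    for name, num_layers, accuracy in candidates:
        score = adequacy(accuracy, num_layers)
        if last_score is None or score >= factor * last_score:
            selected.append(name)
            last_score = score
    return selected


if __name__ == "__main__":
    candidates = [("b1", 2, 0.50), ("b2", 4, 0.60), ("b3", 6, 0.65), ("b4", 8, 0.70)]
    print(select_intermediate_models(candidates))
```

With these made-up numbers the adequacy values are roughly 1.0, 2.4, 3.9, and 5.6, so the third candidate is skipped because 3.9 is less than twice 2.4.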

Among the embodiments, a device for automatic model lightweighting for optimizing deep learning model serving includes: an algorithm receiving unit that receives a deep learning algorithm for building a deep learning model; an algorithm analysis unit that divides the deep learning algorithm into a plurality of operation steps; a branch point determination unit that determines at least one branch point that exists between the plurality of operation steps in a learning process according to the deep learning algorithm; an intermediate model generation unit that generates at least one intermediate deep learning model that branches off from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm; and a deep learning model construction unit that completes the deep learning model and the at least one intermediate deep learning model upon completion of the learning process.

The intermediate model generation unit may define a candidate intermediate deep learning model according to the branch, calculate the adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, and determine the candidate intermediate deep learning model as an intermediate deep learning model when the adequacy of the branch point satisfies a preset condition.

The intermediate model generation unit may calculate the adequacy as the product (A*L) of the prediction accuracy (A) and the number of layers (L), and may determine the candidate intermediate deep learning model as the intermediate deep learning model when its adequacy has increased twofold relative to that of the intermediate deep learning model generated in the previous step.

Among the embodiments, a method for providing a cloud inference service using the automatic model lightweighting method for optimizing deep learning model serving includes: receiving a request for an inference service from a user terminal; determining the status of available cloud resources as of the time the request is received, predicting the response generation time for the request, and selecting one of a plurality of deep learning models according to that response generation time; and generating a response to the request using the selected deep learning model and providing the response to the user terminal.
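One way to picture the model selection step is the sketch below. The text does not say how the response generation time is predicted or what rule maps it to a model, so everything here is an assumption: the `ServedModel` type, the simple latency formula based on queue length and available resource fraction, the `target_ms` response target, and the fallback to the fastest model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServedModel:
    """A deployable model with measured characteristics (hypothetical type)."""

    name: str
    accuracy: float
    base_latency_ms: float  # latency for one request with all resources free


def predict_response_ms(model, available_fraction, queued_requests):
    """Toy estimate: latency grows with the queue and shrinks with free resources."""
    if not 0 < available_fraction <= 1:
        raise ValueError("available_fraction must be in (0, 1]")
    return model.base_latency_ms * (queued_requests + 1) / available_fraction


def choose_model(models, available_fraction, queued_requests, target_ms):
    """Pick the most accurate model predicted to respond within target_ms.

    If no model meets the target, fall back to the fastest one.
    """
    feasible = [
        m for m in models
        if predict_response_ms(m, available_fraction, queued_requests) <= target_ms
    ]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    return min(models, key=lambda m: m.base_latency_ms)


if __name__ == "__main__":
    models = [
        ServedModel("full", 0.92, 80.0),
        ServedModel("mid", 0.88, 40.0),
        ServedModel("small", 0.80, 10.0),
    ]
    print(choose_model(models, 1.0, 0, 100.0).name)
    print(choose_model(models, 0.5, 1, 200.0).name)
```

The design choice is to trade accuracy for speed only when the predicted response time would otherwise exceed the target, which matches the motivation that a faster, slightly less accurate model is preferable when requests surge.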

The plurality of deep learning models may be generated, based on the automatic model lightweighting method for optimizing deep learning model serving, so that each has a different prediction accuracy and computation speed.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

The method and device for automatic model lightweighting for optimizing deep learning model serving according to an embodiment of the present invention, and the method for providing a cloud inference service using the same, can automatically generate deep learning models at various levels so that a deep learning model inference service can be provided smoothly.

The method and device for automatic model lightweighting for optimizing deep learning model serving according to an embodiment of the present invention, and the method for providing a cloud inference service using the same, allow a deep learning model researcher who is developing a complex, highly accurate model to generate from that model a variety of deep learning prediction models with different prediction accuracies and model complexities.

FIG. 1 is a diagram illustrating an automatic model lightweighting system according to the present invention.
FIG. 2 is a diagram illustrating the system configuration of the automatic model lightweighting device of FIG. 1.
FIG. 3 is a diagram illustrating the functional configuration of the automatic model lightweighting device of FIG. 1.
FIG. 4 is a flowchart illustrating the automatic model lightweighting method for optimizing deep learning model serving according to the present invention.
FIG. 5 is a flowchart illustrating the method for providing a cloud inference service according to the present invention.
FIG. 6 is a diagram illustrating the layer configuration of a CNN.
FIGS. 7 and 8 are diagrams illustrating an embodiment of the automatic model lightweighting method according to the present invention.

Since the description of the present invention is merely an embodiment for structural or functional explanation, the scope of rights of the present invention should not be construed as being limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of rights of the present invention should be understood to include equivalents capable of realizing the technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only such effects, so the scope of rights of the present invention should not be understood as being limited thereby.

Meanwhile, the meaning of the terms used in this application should be understood as follows.

Terms such as "first" and "second" are used to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is referred to as being "connected" to another component, it should be understood that it may be connected to the other component directly, but that other components may exist in between. On the other hand, when a component is referred to as being "directly connected" to another component, it should be understood that no other components exist in between. Other expressions that describe the relationship between components, such as "between" and "immediately between" or "neighboring" and "directly neighboring", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of an implemented feature, number, step, operation, component, part, or combination thereof, and should be understood as not excluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Identification codes for the steps (for example, a, b, c) are used for convenience of explanation and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed across computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the field to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as consistent with their meaning in the context of the related technology, and should not be interpreted as having an ideal or excessively formal meaning unless clearly defined in this application.

FIG. 1 is a diagram illustrating an automatic model lightweighting system according to the present invention.

Referring to FIG. 1, the automatic model lightweighting system (100) may include a user terminal (110), an automatic model lightweighting device (130), and a database (150).

The user terminal (110) may correspond to a computing device operated by a user. That is, the user may request an inference service from the automatic model lightweighting device (130) through the user terminal (110) for applications such as image classification and natural language processing. The user terminal (110) may be implemented as a smartphone, a laptop, or a computer, but is not limited thereto and may also be implemented as various other devices such as a tablet PC.

The user terminal (110) may be connected to the automatic model lightweighting device (130) through a network, and a plurality of user terminals (110) may be connected to the automatic model lightweighting device (130) at the same time. The user terminal (110) may also install and run a dedicated program or application for interworking with the automatic model lightweighting device (130).

The automatic model lightweighting device (130) may be implemented as a server corresponding to a computer or program that can build a deep learning model by performing training based on a deep learning algorithm set by the user. In the process of building the deep learning model corresponding to the deep learning algorithm requested by the user, the automatic model lightweighting device (130) may automatically generate simple, lower-accuracy models from the complex, high-accuracy model without user intervention. The automatic model lightweighting device (130) may be connected to the user terminal (110) through a wired network or a wireless network such as Bluetooth, WiFi, or LTE, and may exchange data with the user terminal (110) over the network.

The automatic model lightweighting device (130) may also operate in conjunction with an external system (not shown in FIG. 1) to collect data or provide additional functions. For example, the external system may include a cloud server that provides cloud services. In this case, the automatic model lightweighting device (130) may process the inference service requested by the user in conjunction with the cloud server. That is, the automatic model lightweighting device (130) may forward the user's inference service request to the cloud server as a query, receive the response, and provide it to the user.

Alternatively, the automatic model lightweighting device (130) may be implemented as part of a cloud server. In this case, when providing a cloud inference service, the automatic model lightweighting device (130) may build a number of deep learning models with varying accuracy and complexity from a single deep learning model and, based on these, efficiently perform inference on user requests and provide service responses.

The database (150) may correspond to a storage device that stores the various information required during the operation of the automatic model lightweighting device (130). The database (150) may store deep learning algorithms and training data, and may store the deep learning models built through training. It is not limited thereto and may store information collected or processed in various forms while the automatic model lightweighting device (130) performs the automatic model lightweighting method for optimizing deep learning model serving.

FIG. 2 is a diagram illustrating the system configuration of the automatic model lightweighting device of FIG. 1.

Referring to FIG. 2, the automatic model lightweighting device (130) may be implemented to include a processor (210), a memory (230), a user input/output unit (250), and a network input/output unit (270).

The processor (210) may execute procedures that process each step in the operation of the automatic model lightweighting device (130), may manage the memory (230) that is read or written throughout that process, and may schedule the synchronization time between the volatile and non-volatile memory in the memory (230). The processor (210) may control the overall operation of the automatic model lightweighting device (130) and is electrically connected to the memory (230), the user input/output unit (250), and the network input/output unit (270) to control the data flow among them. The processor (210) may be implemented as the CPU (Central Processing Unit) of the automatic model lightweighting device (130).

The memory (230) may include auxiliary storage, implemented as non-volatile memory such as an SSD (Solid State Drive) or HDD (Hard Disk Drive) and used to store all data required by the automatic model lightweighting device (130), and may include main memory implemented as volatile memory such as RAM (Random Access Memory).

The user input/output unit (250) may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit (250) may include an input device with an adapter such as a touch pad, touch screen, on-screen keyboard, or pointing device, and an output device with an adapter such as a monitor or touch screen. In one embodiment, the user input/output unit (250) may correspond to a computing device accessed through a remote connection, in which case the automatic model lightweighting device (130) may operate as an independent server.

The network input/output unit (270) includes an environment for connecting to external devices or systems through a network and may include, for example, an adapter for communication over a LAN (Local Area Network), MAN (Metropolitan Area Network), WAN (Wide Area Network), or VAN (Value Added Network).

FIG. 3 is a diagram illustrating the functional configuration of the automatic model lightweighting device of FIG. 1.

Referring to FIG. 3, the automatic model lightweighting device (130) may include an algorithm receiving unit (310), an algorithm analysis unit (320), a branch point determination unit (330), an intermediate model generation unit (340), a deep learning model construction unit (350), and a control unit (360).

The algorithm receiving unit (310) may receive a deep learning algorithm for building a deep learning model. The deep learning algorithm may include a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and the like. The algorithm receiving unit (310) may store deep learning algorithms in the database (150), and may handle the receiving operation for a given deep learning algorithm by receiving only selection information about that algorithm from the user terminal (110). Because numerous variants of these representative algorithms exist, the algorithm receiving unit (310) may also receive configuration details for the deep learning algorithm from the user terminal (110) in order to specify its operation steps and their order concretely.

The algorithm analysis unit 320 may divide the deep learning algorithm into a plurality of operation steps. A deep learning algorithm may be defined so that training proceeds by passing training data sequentially through a plurality of operation steps while cumulatively updating the weights of each step. The algorithm analysis unit 320 may analyze the deep learning algorithm selected or entered by the user and define independent operation steps based on the points at which unit operations end as the algorithm proceeds.

In one embodiment, the algorithm analysis unit 320 may divide the operations of the deep learning algorithm into a plurality of layers and determine an operation step corresponding to each layer, thereby dividing the algorithm into a plurality of operation steps. A deep learning algorithm may be defined as sets of unit operations grouped into layers. For example, a CNN, one type of deep learning model for image classification, may consist of convolution layers for image processing, fully-connected layers, activation layers, a softmax layer, and the like. The algorithm analysis unit 320 may divide the operations of the deep learning algorithm into a plurality of layers based on the basic layers of the algorithm and the specific parameters set by the user, and may determine the plurality of operation steps accordingly. The connection points between independent operation steps may then be used in a later stage as branch points for generating intermediate deep learning models.

In one embodiment, the algorithm analysis unit 320 may determine a repetition section that is performed repeatedly during the operation of the deep learning algorithm, and may determine a pre-repetition section and a post-repetition section relative to that repetition section. The algorithm analysis unit 320 may also divide the repetition section into at least one unit section, and may arrange the pre-repetition section, the at least one unit section, and the post-repetition section in the operation order of the deep learning algorithm to obtain the plurality of operation steps. The performance (e.g., accuracy) of the finally built deep learning model may depend on a learning process in which various layers are stacked repeatedly. Accordingly, the algorithm analysis unit 320 may determine the plurality of operation steps of the deep learning algorithm based on the start and end points of the repetition section.

Meanwhile, depending on its configuration, the repetition section of a deep learning algorithm may be implemented so that identical unit sections run consecutively. For example, in the VGG model (VGGNet), a first unit section consisting of two convolution operations and one max-pooling operation may be performed repeatedly, and a second unit section consisting of three convolution operations and one max-pooling operation may also be performed repeatedly. The algorithm analysis unit 320 may further divide the repetition section into unit sections and treat each unit section as an independent operation step.
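The division into a pre-repetition section, unit sections, and a post-repetition section can be sketched as follows. This is only an illustrative sketch, not the implementation described in the patent: the `Stage` class, the `split_into_stages` function, and the string layer names are hypothetical, and a unit section is assumed to be a run of convolutions closed by one pooling operation, as in the VGG example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    kind: str            # "pre", "unit", or "post"
    layers: List[str]    # layer names in execution order


def split_into_stages(layers: List[str]) -> List[Stage]:
    """Split a flat layer list into pre-repetition, unit, and post-repetition stages.

    Assumption: the repetition section starts at the first conv/pool layer
    and ends at the last pooling layer; each unit section ends at a pool.
    """
    first = next(
        (i for i, name in enumerate(layers) if name in ("conv", "pool")),
        len(layers),
    )
    last = max((i for i, name in enumerate(layers) if name == "pool"),
               default=first - 1)

    stages: List[Stage] = []
    if first > 0:
        stages.append(Stage("pre", layers[:first]))

    unit: List[str] = []
    for name in layers[first:last + 1]:
        unit.append(name)
        if name == "pool":           # a pooling layer closes the unit section
            stages.append(Stage("unit", unit))
            unit = []

    if last + 1 < len(layers):
        stages.append(Stage("post", layers[last + 1:]))
    return stages


# VGG-like example: one two-convolution unit followed by one three-convolution unit.
vgg_like = ["input", "conv", "conv", "pool",
            "conv", "conv", "conv", "pool",
            "flatten", "dense", "softmax"]
stages = split_into_stages(vgg_like)
```

Each `Stage` of kind `"unit"` corresponds to one independent operation step, so the boundaries between stages are the candidate branch points.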

The branch point determination unit 330 may determine at least one branch point existing between the plurality of operation steps in the learning process according to the deep learning algorithm. Here, a branch point is a point between the operation steps that make up the deep learning algorithm, at which the process branches independently from the direction in which the deep learning model proceeds in order to generate an intermediate deep learning model. An intermediate deep learning model generated by branching at a branch point is therefore less accurate than the original deep learning model, but its reduced complexity may make it faster to compute. The branch point determination unit 330 may determine the branch points so that the accuracies of the intermediate deep learning models generated alongside the original deep learning model are evenly distributed.

In one embodiment, the branch point determination unit 330 may determine each point at which the repetition section ends as a branch point. Each time the repetition section of the deep learning algorithm is repeated, model performance increases through cumulative learning, but complexity increases as well, so computation speed may decrease. Accordingly, the branch point determination unit 330 may determine the branch points based on the end points of the repetition section. Note, however, that even though identical operation steps accumulate at each end point of the repetition section, the performance of the resulting intermediate deep learning models may not change linearly.

In one embodiment, when the deep learning algorithm is a CNN, the branch point determination unit 330 may determine, as a branch point, the end point of a repetition section in which at least one convolution layer and a pooling layer are performed in sequence. A CNN may consist of convolution layers, pooling layers, and fully-connected layers; the convolution and pooling layers form the feature extraction stage and may be repeated a predetermined number of times, while the fully-connected layers form the classification stage and may be performed as a single operation. For a CNN, the branch point determination unit 330 may determine the branch points based on the end points of the pooling layers. Because the fully-connected layers are then simply attached at each branch point, a number of intermediate deep learning models can be generated that share the same operation structure and differ only in the number of repetitions.
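The CNN case described above, in which branch points fall at the end of each pooling layer and the classification stage is attached directly at each branch point, can be sketched as follows. The function names and the string layer names are hypothetical and serve only as an illustration; in particular, excluding the last branch point (which would reproduce the original model) is an assumption made for this sketch.

```python
from typing import List


def branch_points(feature_layers: List[str]) -> List[int]:
    """Return the index just past each pooling layer (end of a conv/pool section)."""
    return [i + 1 for i, name in enumerate(feature_layers) if name == "pool"]


def intermediate_models(feature_layers: List[str],
                        head: List[str]) -> List[List[str]]:
    """Build one layer list per branch point by attaching the classification head.

    Assumption: the final branch point is skipped because attaching the head
    there yields the original, full-depth model.
    """
    points = branch_points(feature_layers)
    return [feature_layers[:p] + head for p in points[:-1]]


features = ["conv", "conv", "pool",
            "conv", "conv", "pool",
            "conv", "conv", "conv", "pool"]
head = ["flatten", "dense", "softmax"]

models = intermediate_models(features, head)
full_model = features + head
```

Every model in `models` has the same structure as `full_model` and differs only in how many conv/pool sections precede the head.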

The intermediate model generator 340 may generate at least one intermediate deep learning model that branches from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm. In other words, an intermediate deep learning model is a deep learning model derived from the original deep learning model by omitting part of the learning process. Compared with the original deep learning model, an intermediate deep learning model is therefore less complex and faster to compute, while its prediction performance (i.e., accuracy) may be lower. The intermediate model generator 340 may generate an intermediate deep learning model that proceeds directly from the branch point to the final operation step without performing any further learning steps. The final operation step may be configured identically to that of the original deep learning model or, if necessary, may consist of only part of it.

In one embodiment, the intermediate model generator 340 may define a candidate intermediate deep learning model for a branch, calculate the adequacy of the corresponding branch point based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, and determine the candidate as an intermediate deep learning model when the adequacy of the branch point satisfies a preset condition. Here, the number of layers (L) is the total number of layers constituting the deep learning model, and the prediction accuracy (A) is a performance metric of the generated deep learning model, which may be expressed relative to the performance of the original deep learning model taken as 100%.

That is, instead of generating an intermediate deep learning model at every branch point, the intermediate model generator 340 may select suitable branch points and generate intermediate deep learning models only there. If too many intermediate deep learning models are generated, many models with similar performance must be managed; if too few are generated, the gaps in computation speed and accuracy between adjacent intermediate deep learning models grow large, which can constrain an inference service built on them.

In one embodiment, the intermediate model generator 340 may calculate the adequacy of a branch point as the product (A*L) of the number of layers (L) and the prediction accuracy (A), and may determine the candidate intermediate deep learning model as an intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step. That is, the intermediate model generator 340 may use the A*L value as the adequacy of a branch point and generate an intermediate deep learning model each time that value increases by a factor of two from its initial value. As a result, more intermediate deep learning models are generated early in modeling, where accuracy rises quickly, and relatively few are generated later in modeling, where layers grow deeper but accuracy improves only slightly.
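The A*L selection rule can be sketched as follows. This is a hedged illustration rather than the patented implementation: the function name `select_branches` is hypothetical, the candidate values are made-up numbers for demonstration only, and accepting the first candidate as the baseline is an assumption, since the text does not state how the initial value is chosen.

```python
from typing import List, Tuple


def select_branches(candidates: List[Tuple[int, float]],
                    factor: float = 2.0) -> List[int]:
    """Select branch points whose adequacy A*L has grown by `factor`.

    candidates: (layer_count L, relative_accuracy A) per branch point,
                ordered from shallowest to deepest; A is relative to the
                original model (1.0 == 100%).
    Returns the indices of the candidates kept as intermediate models.
    """
    selected: List[int] = []
    last_score = None
    for idx, (num_layers, accuracy) in enumerate(candidates):
        score = accuracy * num_layers          # adequacy of this branch point
        # Assumption: the first candidate is always kept as the baseline.
        if last_score is None or score >= factor * last_score:
            selected.append(idx)
            last_score = score
    return selected


# Illustrative values only (not measurements from the patent).
candidates = [(3, 0.40), (6, 0.70), (10, 0.85), (13, 0.95), (16, 1.00)]
chosen = select_branches(candidates)
```

With these illustrative numbers the early branch points are kept while the later ones are skipped, which matches the behavior described above: accuracy gains flatten out as depth increases, so A*L doubles less often.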

The deep learning model building unit 350 may complete the deep learning model and the at least one intermediate deep learning model upon completion of the learning process. The deep learning model building unit 350 may train on predetermined training data according to the deep learning algorithm and, once training on all of the training data is finished, may complete the original deep learning model together with the at least one intermediate deep learning model derived from it. That is, the deep learning model building unit 350 may automatically build a plurality of deep learning models from a single deep learning algorithm and a single training data set.

The deep learning models generated in this way may differ from one another in prediction accuracy and computation speed, and an inference service that uses them may apply a model selectively according to the operating conditions of the system. For example, when service requests surge while the inference service is running, a lightweight, fast-computing deep learning model can be used to respond more quickly, and when service requests are few, a more complex deep learning model can be used to respond more accurately.

The control unit 360 may control the overall operation of the automatic model lightweighting device 130 and may manage the control flow and data flow among the algorithm receiving unit 310, the algorithm analysis unit 320, the branch point determination unit 330, the intermediate model generator 340, and the deep learning model building unit 350.

FIG. 4 is a flowchart illustrating the automatic model lightweighting method for optimizing deep learning model serving according to the present invention.

Referring to FIG. 4, the automatic model lightweighting device 130 may receive, through the algorithm receiving unit 310, a deep learning algorithm for building a deep learning model from the user terminal 110 (step S410). The automatic model lightweighting device 130 may divide the deep learning algorithm into a plurality of operation steps through the algorithm analysis unit 320 (step S430). The automatic model lightweighting device 130 may determine, through the branch point determination unit 330, at least one branch point existing between the plurality of operation steps in the learning process according to the deep learning algorithm (step S450).

In addition, the automatic model lightweighting device 130 may generate, through the intermediate model generator 340, at least one intermediate deep learning model that branches from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm (step S470). The automatic model lightweighting device 130 may complete, through the deep learning model building unit 350, the deep learning model and the at least one intermediate deep learning model upon completion of the learning process (step S490).

FIG. 5 is a flowchart illustrating a method for providing a cloud inference service according to the present invention.

Referring to FIG. 5, the automatic model lightweighting method for optimizing deep learning model serving according to the present invention can generate a plurality of deep learning models that differ in prediction accuracy and computation speed, and these models can be used to provide a cloud inference service.

More specifically, in step S510, the method for providing a cloud inference service according to the present invention may receive a request for an inference service from the user terminal 110. If the cloud inference service providing device is implemented separately from the automatic model lightweighting device according to the present invention, the automatic model lightweighting device may receive the request and forward it to the cloud inference service providing device.

In step S530, the method for providing a cloud inference service may select one of the plurality of previously built deep learning models. Here, the method may determine the status of available cloud resources as of the time the request is received, predict the time needed to generate a response to the request, and select one of the plurality of deep learning models according to the predicted response generation time.

For example, if, given the current status of available cloud resources, generating a response to a service request with the original deep learning model is predicted to take 10 seconds, a less complex deep learning model may be selected to generate the response in under 10 seconds, even though its accuracy is somewhat lower.
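The selection in step S530 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the `ModelInfo` fields, the latency estimate (idle latency divided by the fraction of resources currently available), the deadline parameter, and all numeric values are hypothetical, since the text does not specify how the response generation time is predicted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    name: str
    base_latency_s: float   # assumed response time on an idle system
    accuracy: float         # relative to the original model (1.0 == 100%)


def predict_latency(model: ModelInfo, available_fraction: float) -> float:
    """Hypothetical estimate: latency grows as available resources shrink."""
    return model.base_latency_s / max(available_fraction, 1e-6)


def choose_model(models: List[ModelInfo],
                 available_fraction: float,
                 deadline_s: float) -> ModelInfo:
    """Pick the most accurate model predicted to meet the deadline.

    If no model is predicted to meet it, fall back to the fastest one.
    """
    feasible = [m for m in models
                if predict_latency(m, available_fraction) <= deadline_s]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    return min(models, key=lambda m: m.base_latency_s)


# Illustrative values only.
models = [
    ModelInfo("original", base_latency_s=2.0, accuracy=1.00),
    ModelInfo("model_b", base_latency_s=1.0, accuracy=0.90),
    ModelInfo("model_a", base_latency_s=0.4, accuracy=0.75),
]

idle_choice = choose_model(models, available_fraction=1.0, deadline_s=6.0)
busy_choice = choose_model(models, available_fraction=0.2, deadline_s=6.0)
overloaded_choice = choose_model(models, available_fraction=0.05, deadline_s=6.0)
```

In this sketch the original model is served while resources are plentiful, and progressively lighter intermediate models are served as the available resources shrink.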

In step S550, the method for providing a cloud inference service may generate a response to the user's service request using the previously selected deep learning model and provide it to the user through the user terminal 110.

FIG. 6 is a diagram illustrating the layer configuration of a CNN.

Referring to FIG. 6, a Convolutional Neural Network (CNN) may consist of convolution layers for image processing, fully-connected layers, activation layers, and the like.

The automatic model lightweighting device 130 may analyze the deep learning algorithm and divide it into a plurality of operation steps, and may do so based on the repetition section defined in the deep learning algorithm.

In FIG. 6, for a CNN, step S630, in which the convolution layers are performed repeatedly, may correspond to the repetition section. The automatic model lightweighting device 130 may determine the pre-repetition section (S610) and the post-repetition section (S630) relative to the repetition section (S630). For a CNN, the pre-repetition section (S610) may correspond to the input reception stage, and the post-repetition section (S610) may correspond to the output generation stage. The output generation stage may include a fully-connected (Dense) layer and an activation layer, which may be repeated as necessary.

In addition, the automatic model lightweighting device 130 may divide the repetition section (S630) into a plurality of unit sections and determine the end points of the unit sections as branch points for generating intermediate deep learning models. That is, in FIG. 6, the points between the convolution layers may be determined as branch points.

FIGS. 7 and 8 are diagrams illustrating an embodiment of the automatic model lightweighting method according to the present invention.

Referring to FIG. 7, the automatic model lightweighting device 130 may receive a single deep learning algorithm from the user and build a deep learning model along the original learning direction 750. During the learning process, the automatic model lightweighting device 130 may determine at least one branch point 740 and define a branched learning direction 760 that diverges from the original learning direction 750 at that branch point. That is, the automatic model lightweighting device 130 may generate an intermediate deep learning model that, following the branched learning direction 760, proceeds directly from the branch point 740 to the learning end step (S730). An intermediate deep learning model generated in this way has a simpler structure than the model built along the original learning direction 750, so its inference time is shorter but its accuracy may be lower.

Meanwhile, for the VGG model, the learning end step (S730) may consist of a Flatten layer, a Dense layer, a Dropout layer, and an Activation layer. It is not necessarily limited to these, and the layers may of course be applied selectively as needed. The Dense and Dropout layers may also be performed repeatedly.

In addition, to generate an intermediate deep learning model, the automatic model lightweighting device 130 may determine a branch point 740 between operation steps of the original learning process. Here, the automatic model lightweighting device 130 may determine a repetition section in which predetermined operation steps are performed repeatedly and determine the end point of that repetition section as a branch point. In FIG. 7, steps S710 and S720 each perform two convolution operations (Conv1 and Conv2) and one pooling operation (Pool) in sequence, and may therefore correspond to repetition sections. That is, the automatic model lightweighting device 130 may determine the point between step S710 and step S720 as the branch point 740 and generate an intermediate deep learning model that branches from it.

Referring to FIG. 8, the automatic model lightweighting device 130 may generate an intermediate deep learning model (Model A or Model B) that branches off during the learning process of the original deep learning model. A deep learning model can improve its performance by performing the same operations repeatedly, but the added complexity also increases computational overhead. By generating intermediate deep learning models (Model A and Model B) in which part of the original learning process is omitted, the automatic model lightweighting device 130 can therefore automatically obtain deep learning models with a range of accuracies and computational overheads.

Accordingly, the plurality of deep learning models built through the automatic model lightweighting method according to the present invention can be applied to services that perform training in the cloud, and a deep learning model developer can have intermediate deep learning models generated automatically while the training job runs, without any separate work to create them. Although the intermediate deep learning models are less accurate, they offer a broad spectrum of computational overheads and can thereby enable smooth service provision even in particular service environments. For example, when requests surge while an inference service is running, the service can be provided with a lightweight, fast-computing model, which can improve user satisfaction with the service.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes may be made to the invention without departing from the spirit and scope of the invention as set forth in the claims below.

100: Automatic model lightweighting system
110: User terminal 130: Automatic model lightweighting device
150: Database
210: Processor 230: Memory
250: User input/output unit 270: Network input/output unit
310: Algorithm receiving unit 320: Algorithm analysis unit
330: Branch point determination unit 340: Intermediate model generator
350: Deep learning model building unit 360: Control unit
740: Branch point 750: Original learning direction
760: Branched learning direction

Claims

An automatic model lightweighting method for optimizing deep learning model serving, performed in an automatic model lightweighting device, the method comprising:
receiving, through an algorithm receiving unit, a deep learning algorithm for building a deep learning model;
dividing, through an algorithm analysis unit, the deep learning algorithm into a plurality of operation steps based on a repetition section that is performed repeatedly;
determining, through a branch point determination unit, at least one branch point existing between the plurality of operation steps in a learning process according to the deep learning algorithm;
generating, through an intermediate model generator, at least one intermediate deep learning model that branches from the direction of the learning process at the at least one branch point and proceeds to the final operation step of the deep learning algorithm; and
completing, through a deep learning model building unit, the deep learning model and the at least one intermediate deep learning model upon completion of the learning process,
wherein generating the at least one intermediate deep learning model comprises: defining a candidate intermediate deep learning model according to the branch; calculating the adequacy of the corresponding branch point, based on the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, through a product operation (A*L) between the number of layers (L) and the prediction accuracy (A); and determining the candidate intermediate deep learning model as an intermediate deep learning model when the adequacy of the corresponding branch point satisfies a preset condition.

The method of claim 1, wherein the deep learning algorithm includes a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN).

The method of claim 1, wherein the deep learning model and the at least one intermediate deep learning model differ from one another in prediction accuracy and computation speed.

The method of claim 1, wherein dividing into the plurality of operation steps comprises:
dividing operations of the deep learning algorithm into a plurality of layers; and
determining operation steps corresponding to each of the plurality of layers.

The method of claim 1, wherein dividing into the plurality of operation steps comprises:
determining a repetition section that is performed repeatedly during operation of the deep learning algorithm;
determining a pre-repetition section and a post-repetition section based on the repetition section;
determining at least one unit section for the repetition section; and
determining the plurality of operation steps by arranging the pre-repetition section, the at least one unit section, and the post-repetition section in order.
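
The division described in this claim can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `split_into_operation_steps`, the representation of a network as a list of layer-type strings, and the assumption that the repeating unit is already known are all hypothetical choices made for illustration.

```python
from typing import List


def split_into_operation_steps(layers: List[str], unit: List[str]) -> List[List[str]]:
    """Arrange a layer sequence as: pre-repetition section, unit sections, post-repetition section.

    `layers` and `unit` are hypothetical inputs; the claim does not say how the
    repetition section is detected, so the repeating unit is passed in here.
    """
    if not unit:
        raise ValueError("unit must contain at least one layer")
    n, k = len(layers), len(unit)
    # Locate the first position where the repeating unit begins.
    start = next((i for i in range(n - k + 1) if layers[i:i + k] == unit), None)
    if start is None:
        return [layers]  # no repetition section: the whole sequence is one step
    # Consume consecutive repetitions of the unit; each becomes one unit section.
    end = start
    unit_sections = []
    while layers[end:end + k] == unit:
        unit_sections.append(layers[end:end + k])
        end += k
    steps = []
    if start > 0:
        steps.append(layers[:start])   # pre-repetition section
    steps.extend(unit_sections)        # unit sections, in order
    if end < n:
        steps.append(layers[end:])     # post-repetition section
    return steps


example_layers = ["input", "conv", "pool", "conv", "pool", "conv", "pool", "flatten", "dense"]
example_steps = split_into_operation_steps(example_layers, ["conv", "pool"])
```

In this sketch each returned sub-list is one operation step, so the boundaries between consecutive sub-lists are the places where a branch point could later be considered.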

The method of claim 5, wherein determining the at least one branch point comprises
determining each point at which the repetition section ends as a branch point.

The method of claim 6, wherein determining the at least one branch point comprises,
when the deep learning algorithm is a CNN, determining as the branch point the end point of a repetition section that proceeds sequentially through at least one convolution layer and a pooling layer.
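
As a rough illustration of the CNN case, the following Python sketch marks a branch point after every pooling layer that closes a run of one or more convolution layers. The function name `cnn_branch_points` and the `"conv"` / `"pool"` labels are assumptions for this example, and activation layers are assumed to be folded into the convolution entries.

```python
from typing import List


def cnn_branch_points(layer_types: List[str]) -> List[int]:
    """Return indices i such that a branch may leave the network after layer i - 1.

    A repetition section is taken here to be one or more "conv" layers followed
    by a "pool" layer; its end point is the position right after that pooling layer.
    """
    points = []
    seen_conv = False
    for i, layer_type in enumerate(layer_types):
        if layer_type == "conv":
            seen_conv = True
        elif layer_type == "pool" and seen_conv:
            points.append(i + 1)   # the repetition section ends after this pooling layer
            seen_conv = False
        else:
            seen_conv = False      # any other layer interrupts the conv/pool pattern
    return points


example_cnn = ["conv", "conv", "pool", "conv", "pool", "flatten", "dense"]
example_points = cnn_branch_points(example_cnn)
```

For the example list, the two repetition sections end after positions 2 and 4, so the sketch reports branch points at indices 3 and 5.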

(deleted)

The method of claim 1, wherein determining the candidate as the intermediate deep learning model comprises
determining the candidate intermediate deep learning model as the intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step.
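
The adequacy test can be sketched in Python as follows. The claims define adequacy as the product A*L and give "twice the previous intermediate model" as the condition; everything else here is an assumption made for illustration: the `Candidate` class, reading "twice" as "at least twice", accepting the first candidate as a baseline, and comparing against the most recently accepted model. The accuracy values are invented sample numbers, not results from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    num_layers: int    # L: number of layers in the candidate intermediate model
    accuracy: float    # A: prediction accuracy of the candidate, in [0, 1]


def adequacy(candidate: Candidate) -> float:
    """Adequacy of a branch point, computed as the product A * L."""
    return candidate.accuracy * candidate.num_layers


def select_intermediate_models(candidates: List[Candidate], factor: float = 2.0) -> List[Candidate]:
    """Keep a candidate when its adequacy is at least `factor` times that of the
    previously accepted intermediate model (assumed reading of the condition)."""
    selected: List[Candidate] = []
    last_score: Optional[float] = None
    for candidate in candidates:      # candidates are ordered from shallow to deep
        score = adequacy(candidate)
        if last_score is None or score >= factor * last_score:
            selected.append(candidate)
            last_score = score
    return selected


example_candidates = [
    Candidate("branch-1", num_layers=3, accuracy=0.60),   # A*L = 1.8, accepted as baseline
    Candidate("branch-2", num_layers=5, accuracy=0.70),   # A*L = 3.5, below 2 * 1.8
    Candidate("branch-3", num_layers=7, accuracy=0.80),   # A*L = 5.6, at least 2 * 1.8
    Candidate("branch-4", num_layers=9, accuracy=0.85),   # A*L = 7.65, below 2 * 5.6
]
example_selected = [c.name for c in select_intermediate_models(example_candidates)]
```

Under these assumptions the condition thins out the candidates, so that each retained intermediate model differs noticeably from the one before it.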

An algorithm receiving unit that receives a deep learning algorithm for building a deep learning model;
an algorithm analysis unit that divides the deep learning algorithm into a plurality of operation steps based on a repetition section that is repeatedly performed;
a branch point determination unit that determines at least one branch point that exists between the plurality of operation steps in a learning process according to the deep learning algorithm;
an intermediate model generator that generates at least one intermediate deep learning model that branches off from the direction of the learning process based on the at least one branch point and proceeds to a final operation step of the deep learning algorithm; and
A deep learning model construction unit that completes the deep learning model and the at least one intermediate deep learning model upon completion of the learning process,
wherein the intermediate model generator defines a candidate intermediate deep learning model according to the branch, calculates the adequacy of the corresponding branch point as the product (A*L) of the number of layers (L) and the prediction accuracy (A) of the candidate intermediate deep learning model, and determines the candidate intermediate deep learning model as the intermediate deep learning model when the adequacy of the branch point satisfies a preset condition. An automatic model lightweighting device for optimizing deep learning model serving.

(deleted)

The device of claim 10, wherein the intermediate model generator
determines the candidate intermediate deep learning model as the intermediate deep learning model when its adequacy is twice that of the intermediate deep learning model generated in the previous step.

Receiving a request for an inference service from a user terminal;
Determining the status of available cloud resources based on the time of receipt of the request, predicting a response generation time for the request, and determining one of a plurality of deep learning models according to the response generation time; and
Generating a response to the request using the determined deep learning model and providing it to the user terminal,
wherein the plurality of deep learning models include a deep learning model and at least one intermediate deep learning model that are generated, by an automatic model lightweighting method for optimizing deep learning model serving, to have different prediction accuracies and computation speeds,
and wherein the at least one intermediate deep learning model is a candidate intermediate deep learning model that is defined at a branch point existing between a plurality of operation steps divided based on a repetition section of the deep learning model, and for which the adequacy of the branch point, calculated as the product (A*L) of the number of layers (L) and the prediction accuracy (A) of the candidate, satisfies a preset condition. A method of providing a cloud inference service using an automatic model lightweighting method for optimizing deep learning model serving.
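
One possible way to pick a model per request is sketched below in Python. The claim only states that a model is chosen according to the predicted response generation time; the latency budget, the utilization-based latency formula, the fallback rule, and all names (`ServedModel`, `predict_response_time_ms`, `choose_model`) are hypothetical, and the accuracy and latency figures are invented sample values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServedModel:
    name: str
    accuracy: float          # prediction accuracy of this model
    base_latency_ms: float   # assumed latency on an idle cloud resource


def predict_response_time_ms(model: ServedModel, utilization: float) -> float:
    """Toy latency model: response time grows as free capacity shrinks.

    `utilization` is the fraction of cloud resources in use when the request arrives.
    """
    free_capacity = max(1.0 - utilization, 0.05)   # floor avoids division by zero
    return model.base_latency_ms / free_capacity


def choose_model(models: List[ServedModel], utilization: float, budget_ms: float) -> ServedModel:
    """Pick the most accurate model whose predicted response time fits the budget;
    if none fits, fall back to the fastest model."""
    fitting = [m for m in models if predict_response_time_ms(m, utilization) <= budget_ms]
    if fitting:
        return max(fitting, key=lambda m: m.accuracy)
    return min(models, key=lambda m: m.base_latency_ms)


example_models = [
    ServedModel("full", accuracy=0.92, base_latency_ms=80.0),
    ServedModel("intermediate-2", accuracy=0.85, base_latency_ms=40.0),
    ServedModel("intermediate-1", accuracy=0.75, base_latency_ms=15.0),
]
idle_choice = choose_model(example_models, utilization=0.0, budget_ms=100.0).name
busy_choice = choose_model(example_models, utilization=0.5, budget_ms=100.0).name
overloaded_choice = choose_model(example_models, utilization=0.95, budget_ms=100.0).name
```

With these sample values the full model serves requests while resources are idle, and progressively lighter intermediate models take over as utilization rises.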