KR20220143276A

KR20220143276A - Training method and training apparatus of deep learning model

Info

Publication number: KR20220143276A
Application number: KR1020210049588A
Authority: KR
Inventors: 이영민
Original assignee: 서울시립대학교 산학협력단
Priority date: 2021-04-16
Filing date: 2021-04-16
Publication date: 2022-10-25
Also published as: KR102508635B1

Abstract

The present invention relates to a method and an apparatus for training a deep learning model. The method comprises: selecting N layers of a trained deep learning model; setting, for each of the N layers preceding the selected N layers, a threshold corresponding to a sparsification parameter; and tuning the deep learning model using the thresholds set for the preceding N layers. The invention can thereby improve inference speed.

Description

TRAINING METHOD AND TRAINING APPARATUS OF DEEP LEARNING MODEL

The present invention relates to a method and a learner for training a deep learning model, and more specifically to a deep learning model training method and learner that can increase inference speed on a mobile platform by increasing the sparsity of activation maps without reducing inference accuracy.

As deep learning becomes more common, the need to run it directly on the device is growing. Devices in many categories, such as autonomous vehicles, smart cameras, robots, and smart IoT devices, favor running deep learning on the device itself over sending large amounts of data to the cloud and receiving inference results back.

On-device deep learning removes the need for a network connection, avoids response latency and modem power consumption, and avoids sending sensitive data to cloud servers, so more and more devices perform deep learning (for example, inference) directly on the device itself.

Research on efficient on-device inference on mobile platforms falls mainly into two directions: algorithm optimization and system optimization. On the algorithm side, lightweight CNN architectures such as MobileNet have been proposed. Lightweight CNN architectures use lightweight convolution operations such as depthwise 1*1 convolution, or techniques such as pruning and quantization.

On the system side, research on neural processing units (NPUs) and dedicated deep learning accelerators is ongoing. An NPU is fast and energy efficient but has only a small SRAM, so it loses much of its efficiency unless DRAM latency is hidden effectively. In addition, unlike CPUs and graphics processing units (GPUs), NPUs offer users an extremely limited development environment for custom layers, or none at all.

As the need for on-device deep learning grows, multiple deep neural networks (or models) are expected to run concurrently on a single device. Deep learning inference is therefore expected to continue to run on GPUs as well as NPUs.

As deep learning networks (or models) become lighter, running them on mobile platforms (for example, smartphones and tablet PCs) becomes increasingly practical. Sparsity-aware inference enables efficient inference on mobile platforms. In sparsity-aware inference, the many zero elements that weight pruning leaves in the weights (or filters) can be skipped in the convolution operation, saving time and energy.

According to recent studies, pruned sparse models are known to achieve the same or higher accuracy than dense models, which do not take sparsity into account in their computation.

Unlike NPUs dedicated to deep learning models, GPUs are affected mainly by uncoalesced memory accesses and divergent branches. GPU performance degrades severely when the GPU's processing elements (PEs) each access arbitrary locations. This is also why sparsity-aware inference has not been actively attempted on GPUs.

In addition to weight pruning, considerable sparsity is found in the per-layer feature maps (also called activation maps), because activation functions such as ReLU change negative outputs of the convolution or pooling function to zero.

It is therefore very important to exploit the sparsity of deep learning models efficiently on the mobile GPUs that make up mobile platforms. A typical GPU adopts a SIMD (Single Instruction, Multiple Data) architecture to implement massive fine-grained data parallelism, which makes it difficult to use unstructured sparsity efficiently.

Studies are known in which the sparsity of the layer weights, which is determined in the training phase of a deep learning model, is reused efficiently in the inference phase.

By contrast, no study is known that efficiently handles the activation sparsity produced by activation functions such as ReLU. Unlike weights, which are fixed in the training phase, the feature maps (activation maps) produced by ReLU change dynamically with the input data of each inference. Even though ReLU converts roughly 50% or more of the elements to zero, this irregularity makes the sparsity difficult to exploit efficiently.

A deep learning model inference scheme is therefore needed that efficiently handles the sparsity of the model's activation maps.

The present invention has been devised to solve the problems described above. Its object is to provide a deep learning model training method and learner that can improve the inference speed of a deep learning model running on a mobile platform without lowering inference accuracy.

Another object of the present invention is to provide a deep learning model training method and learner that can improve inference speed on the GPU or NPU of a mobile platform by maximizing the sparsity of the deep learning model's activation maps.

A further object of the present invention is to provide a deep learning model training method and learner that can improve inference speed on a mobile platform by using the correlation between per-layer execution time and sparsity ratio to raise the sparsity ratio of multiple layers without lowering accuracy.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

A deep learning model training method according to one aspect of the present invention includes (a) selecting N layers of a trained deep learning model, (b) setting, for each of the N layers preceding the selected N layers, a threshold corresponding to a sparsification parameter, and (c) tuning the deep learning model using the thresholds set for the preceding N layers, where N is an integer greater than or equal to 1.

The training method further includes calculating the accuracy of the tuned deep learning model and determining whether the calculated accuracy is greater than or equal to a threshold accuracy. When the calculated accuracy is less than the threshold accuracy, the method decreases the sparsification parameter and repeats steps (b) and (c).

In the training method, when the calculated accuracy is greater than or equal to the threshold accuracy, the method increases N by 1 and repeats steps (a) to (c), starting from the sparsification parameter that corresponds to the calculated accuracy.
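The accept-or-back-off control flow described in the two preceding paragraphs can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `tune_and_score`, `accuracy_floor`, `start_percent`, `step_percent`, and `max_layers` are hypothetical names, the sparsification parameter is assumed to be an integer percentage, and the stopping rule (stop when `max_layers` is exceeded or the percentage reaches zero) is an assumption that the text does not state.

```python
from typing import Callable, Dict


def search_sparsity(
    tune_and_score: Callable[[int, int], float],
    accuracy_floor: float,
    start_percent: int = 90,
    step_percent: int = 10,
    max_layers: int = 4,
) -> Dict[int, int]:
    """Return {N: accepted sparsification percentage} for every N that passed.

    tune_and_score(n, percent) is a hypothetical callback that performs
    steps (a) to (c) for n layers at the given percentage and returns the
    accuracy of the tuned model.
    """
    accepted: Dict[int, int] = {}
    n, percent = 1, start_percent
    # Assumed stopping rule: layer budget used up or nothing left to sparsify.
    while n <= max_layers and percent > 0:
        accuracy = tune_and_score(n, percent)
        if accuracy >= accuracy_floor:
            accepted[n] = percent     # accuracy held: keep this setting
            n += 1                    # add one layer, continue from the same percentage
        else:
            percent -= step_percent   # accuracy dropped: back off, redo steps (b) and (c)
    return accepted


def toy_score(n: int, percent: int) -> float:
    # Toy stand-in for real tuning: accuracy holds only while n * percent <= 100.
    return 1.0 if n * percent <= 100 else 0.0


result = search_sparsity(toy_score, accuracy_floor=0.9)
```

With the toy scoring function, each additional layer forces the percentage lower before accuracy recovers, which mirrors the trade-off the method searches over.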

In the training method, after steps (a) to (d) are performed, the tuned deep learning model and the corresponding tuning parameters are stored. The tuned deep learning model, which performs inference on input data using the tuning parameters, executes the activation functions of the preceding N layers using the thresholds set through the tuning parameters.

In the training method, step (a) selects the N layers based on the execution times of all layers obtained by profiling the trained deep learning model.

In the training method, step (a) either orders all layers by execution time and selects the N layers with the longest execution times, or orders all layers by the execution-time reduction rate that each achieves under sparsification and selects the N layers with the highest reduction rates.
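The two selection criteria in step (a) can be sketched as a ranking over per-layer measurements. The function and parameter names below (`select_layers`, `preceding_layers`, `exec_times`, `reduction_rates`) are hypothetical, and the sketch assumes that layers are identified by their index in execution order, so that the layer preceding layer i is layer i - 1.

```python
from typing import List, Optional, Sequence


def select_layers(
    exec_times: Sequence[float],
    n: int,
    reduction_rates: Optional[Sequence[float]] = None,
) -> List[int]:
    """Return indices of the N layers ranked highest by the chosen criterion.

    Ranks by execution time by default, or by execution-time reduction rate
    when reduction_rates is supplied.
    """
    score = exec_times if reduction_rates is None else reduction_rates
    ranked = sorted(range(len(score)), key=lambda i: score[i], reverse=True)
    return ranked[:n]


def preceding_layers(selected: Sequence[int]) -> List[int]:
    """Return the layers whose activation thresholds would be adjusted."""
    # Assumption: layer 0 has no preceding layer inside the model, so it is skipped.
    return [i - 1 for i in selected if i > 0]


exec_times = [1.0, 5.0, 3.0, 4.0]            # made-up profiling numbers
slowest = select_layers(exec_times, 2)
by_rate = select_layers(exec_times, 2, reduction_rates=[0.1, 0.2, 0.9, 0.3])
targets = preceding_layers(slowest)
```

Here `targets` names the layers whose outputs feed the slowest layers, since sparsifying a layer's output speeds up the convolution of the layer that follows it.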

In the training method, step (b) sets, as the threshold of each of the preceding N layers, the input value that corresponds to the sparsification parameter in that layer's activation-function input distribution obtained by profiling.
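One way to read step (b) is as a percentile lookup in the profiled input distribution. The sketch below assumes that the sparsification parameter is an integer percentage of elements to be zeroed, that an activation input at or below the threshold becomes zero, and that the threshold is never set below 0 (the point at which plain ReLU already cuts). The name `threshold_for_sparsity` and these conventions are illustrative assumptions, not details given in the text.

```python
from typing import Sequence


def threshold_for_sparsity(activation_inputs: Sequence[float], sparsity_percent: int) -> float:
    """Pick a threshold so that about sparsity_percent of inputs fall at or below it."""
    if not activation_inputs:
        raise ValueError("activation_inputs must not be empty")
    if not 0 <= sparsity_percent <= 100:
        raise ValueError("sparsity_percent must be between 0 and 100")
    ordered = sorted(activation_inputs)
    # ceil(sparsity_percent * len / 100), done in integer arithmetic
    count = -(-sparsity_percent * len(ordered) // 100)
    if count == 0:
        return 0.0  # nothing extra to zero: behaves like plain ReLU
    # Assumption: never go below 0, where ReLU already zeroes the input.
    return max(float(ordered[count - 1]), 0.0)


profiled = [-2.0, -1.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # made-up inputs
cut = threshold_for_sparsity(profiled, 70)
zeroed = sum(1 for x in profiled if x <= cut)
```

In this toy distribution a 70% target yields a threshold of 4.0, and seven of the ten profiled inputs fall at or below it.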

A deep learning model learner according to one aspect of the present invention includes a control unit that executes program instructions and a storage unit that stores a deep learning model training program and a trained deep learning model. The control unit, executing the training program, selects N layers of the trained deep learning model, sets, for each of the N layers preceding the selected N layers, a threshold corresponding to a sparsification parameter, and tunes the deep learning model using the thresholds set for the preceding N layers, where N is an integer greater than or equal to 1.

In the learner, the control unit calculates the accuracy of the tuned deep learning model and determines whether the calculated accuracy is greater than or equal to a threshold accuracy. When the calculated accuracy is less than the threshold accuracy, the control unit decreases the sparsification parameter, sets the corresponding threshold for each of the preceding N layers according to the decreased sparsification parameter, and repeats the tuning of the deep learning model using the lowered thresholds.

In the learner, when the calculated accuracy is greater than or equal to the threshold accuracy, the control unit increases N by 1 and selects that increased number of layers of the trained deep learning model. It then sets, for each of the N layers preceding the selected layers, a threshold corresponding to the sparsification parameter that corresponds to the calculated accuracy, and repeats the tuning of the deep learning model using the thresholds set for the preceding N layers.

In the learner, the control unit stores the tuned deep learning model and the corresponding tuning parameters in the storage unit. The tuned deep learning model, which performs inference on input data using the tuning parameters, executes the activation functions of the preceding N layers using the thresholds set through the tuning parameters.

In the learner, based on profiling of the trained deep learning model, the control unit either orders all layers by execution time and selects the N layers with the longest execution times, or orders all layers by the execution-time reduction rate that each achieves under sparsification and selects the N layers with the highest reduction rates.

In the learner, the control unit sets, as the threshold of each of the preceding N layers, the input value that corresponds to the sparsification parameter in that layer's activation-function input distribution obtained by profiling.

The deep learning model training method and learner according to the present invention described above can improve the inference speed of a deep learning model running on a mobile platform without lowering inference accuracy.

The training method and learner according to the present invention can also improve inference speed on the GPU or NPU of a mobile platform by maximizing the sparsity of the deep learning model's activation maps.

The training method and learner according to the present invention can further improve inference speed on a mobile platform by using the correlation between per-layer execution time and sparsity ratio to raise the sparsity ratio of multiple layers without lowering accuracy.

The effects obtainable from the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is an exemplary block diagram of a deep learning model learner.
FIG. 2 is a diagram illustrating an exemplary internal structure of a deep learning model.
FIG. 3 is a diagram illustrating the main process for tuning a deep learning model.
FIG. 4 is a diagram illustrating the control flow for tuning a deep learning model.
FIG. 5 is a diagram illustrating exemplary pseudocode for tuning a deep learning model.
FIG. 6 is a diagram illustrating examples of input distributions at specific activation functions.
FIG. 7 is an exemplary block diagram of a deep learning model inference device.
FIG. 8 is a diagram illustrating an example of an algorithm for generating an auxiliary data structure.
FIG. 9 is a diagram illustrating an example of performing a convolution operation using the auxiliary data structure.

The objects, features, and advantages described above will become clearer through the detailed description given below with reference to the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the invention. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is an exemplary block diagram of the deep learning model learner 100.

Referring to FIG. 1, the deep learning model learner 100 includes a communication unit 110, a storage unit 150, and a control unit 190. The block diagram of FIG. 1 is preferably a functional block diagram, and each functional block has a corresponding hardware block. For example, the deep learning model learner 100 may be implemented as a personal computer, a laptop computer, or a server. The deep learning model learner 100 may further include other blocks not shown in FIG. 1.

Looking at the deep learning model learner 100 in FIG. 1, the communication unit 110 includes a communication chipset, an antenna, and the like for wireless LAN, wired LAN, optical LAN, or similar communication, and transmits and receives various data over a local network or a wide-area network such as the Internet. For example, the communication unit 110 may receive a (pre-)trained deep learning model 300 from another device over the Internet or a local network, or may transmit the deep learning model 300 obtained by tuning the (pre-)trained model to another device. The communication unit 110 may transmit the tuned deep learning model 300 to a mobile device such as a smartphone or a tablet PC.

The storage unit 150 stores various data and programs. The storage unit 150 includes volatile memory, non-volatile memory, and/or a mass storage medium such as a hard disk, and stores at least a deep learning model training program for tuning the deep learning model 300, the pre-trained deep learning model, and the deep learning model 300 tuned from the trained model.

The control unit 190 controls the deep learning model learner 100. The control unit 190 is configured to execute program instructions and, by executing the deep learning model training program in the storage unit 150, to tune the pre-trained deep learning model 300 according to the present invention. By executing the training program, the control unit 190 may generate the tuned deep learning model 300 and its tuning parameters and store them in the storage unit 150.

The control unit 190, which executes program instructions, includes a first processing unit 191 and a second processing unit 195. The first processing unit 191 may be, for example, a CPU, an NPU, or another processor, and the second processing unit 195 may be a GPU or the like.

The control unit 190, executing the deep learning model training program, runs the trained deep learning model 300 (its program) on the first processing unit 191 and the second processing unit 195 to obtain various profiling information, and tunes the deep learning model according to the obtained information. It generates the tuned deep learning model and the tuning parameters and stores them in the storage unit 150. The tuned deep learning model (and the tuning parameters) can afterwards be transmitted to, or installed on, a mobile device or the like.

At a minimum, the control unit 190 tunes the model so as to increase the sparsity of the tensors, of three or more dimensions, generated in the layers 310 of the initially trained deep learning model 300. During tuning, the output tensor of a particular layer 310 is set to contain zeros at or above a set sparsity ratio. The control unit 190 also tunes the weights (or filters) of each layer 310 so that, even though sparsity is forcibly increased in the layer 310, there is no accuracy loss or the loss stays within a set ratio.

As the deep learning model 300 becomes sparser through this tuning, the deep learning model inference device 200 that runs the model can improve inference speed by exploiting sparse tensors, which change dynamically with the input data for inference.

The specific control flow for tuning that is carried out in the deep learning model learner 100 (its control unit 190) is examined in more detail with reference to FIGS. 2 to 5.

FIG. 2 is a diagram illustrating an exemplary internal structure of the deep learning model 300.

Referring to FIG. 2, the deep learning model 300 contains a plurality of layers 310. Each layer 310 includes a convolution operation, an activation operation, and a pooling operation, and turns its input tensor into an output tensor. One or more particular layers 310 may be configured without the pooling operation.

The deep learning model 300 may be any deep learning model that is known or will become known, for example MobileNet, ResNet, or Inception.

Each deep learning model may have a different number of layers 310 depending on its design. Each layer 310 produces an output tensor through a convolution between its input tensor and its weights, a subsequent activation operation 319, and optionally a pooling operation 315, and provides that output tensor as the input tensor of the subsequent layer 310.

The number of weights (or filters) may differ from layer to layer (see L, M, and N in FIG. 2). By performing the convolution, activation, and pooling functions for each weight, a layer 310 can output to the subsequent layer 310 as many two-dimensional arrays (together forming a tensor) as it has weights.

The execution time of the deep learning model 300 is spent mainly in the convolution operation 311. The convolution operation 311 consists of multiplying data elements of the input tensor by data elements of the weights and accumulating those products as inner products, so it takes a long time. The weights are fixed when used in the inference device 200, whereas the input and output tensors are generated from the input data and therefore vary during inference.

When most elements of the input tensor fed to the convolution operation 311 of a layer 310 are zero, the corresponding multiplications can be skipped in the convolution operation 311, which improves inference speed.
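The saving from skipping zero inputs can be illustrated with a one-dimensional toy convolution. This is only a sketch of the idea, assuming a "valid" sliding-window convolution without kernel flipping and a plain Python loop; it is not the GPU scheme of the invention, and `sparse_conv1d` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def sparse_conv1d(inputs: Sequence[float], weights: Sequence[float]) -> Tuple[List[float], int]:
    """Valid 1-D convolution that skips zero inputs; also counts multiplications."""
    out_len = len(inputs) - len(weights) + 1
    outputs = [0.0] * out_len
    multiplies = 0
    for i, x in enumerate(inputs):
        if x == 0:
            continue  # every product involving this element would be zero
        for k, w in enumerate(weights):
            o = i - k  # output position that pairs inputs[i] with weights[k]
            if 0 <= o < out_len:
                outputs[o] += x * w
                multiplies += 1
    return outputs, multiplies


inputs = [0.0, 0.0, 3.0, 0.0, 1.0]   # 60% of the elements are zero
weights = [1.0, 2.0]
outputs, multiplies = sparse_conv1d(inputs, weights)
dense_multiplies = (len(inputs) - len(weights) + 1) * len(weights)
```

For this input the sparse loop performs 3 multiplications where a dense loop would perform 8, while producing the same output values.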

The input tensor provided to a particular layer 310 (see ⓐ in FIG. 2) is the output tensor of the immediately preceding layer 310 (see ⓑ in FIG. 2), and that output tensor is produced through the activation function (operation) of the preceding layer. More precisely, the output tensor of the preceding layer 310 is produced either by the activation function alone or by the activation function followed by the pooling function.

The activation function of the activation operation 319 filters the convolution data produced by the convolution operation 311 and may be, for example, a ReLU function or a sigmoid function. The ReLU function outputs 0 when its convolution data input is less than 0. Profiling shows that the output tensors produced by the activation function contain around 50% zeros, and output tensors with 90% or more zeros are also found.
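The effect of the activation function on sparsity can be checked with a few lines of code. The sketch below compares plain ReLU with a thresholded variant that also zeroes small positive inputs. The rule "output 0 when the input is at or below the threshold" and the names `thresholded_relu` and `zero_ratio` are assumptions made for illustration, and the input values are invented.

```python
from typing import List, Sequence


def relu(values: Sequence[float]) -> List[float]:
    """Plain ReLU: negative inputs become zero."""
    return [v if v > 0 else 0.0 for v in values]


def thresholded_relu(values: Sequence[float], threshold: float) -> List[float]:
    """Assumed variant: inputs at or below the threshold become zero."""
    return [v if v > threshold else 0.0 for v in values]


def zero_ratio(values: Sequence[float]) -> float:
    """Fraction of elements that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


conv_out = [-1.5, 0.2, 0.8, -0.3, 2.0, 1.1, -0.7, 3.0]   # made-up convolution outputs
plain = zero_ratio(relu(conv_out))
tuned = zero_ratio(thresholded_relu(conv_out, 1.0))
```

On these eight values plain ReLU zeroes three elements, while a threshold of 1.0 zeroes five, which is the kind of increase in sparsity that the tuning aims for.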

When the output tensor produced by the activation function (and, where present, the pooling function) has a high ratio of zeros, the convolution operation 311 of the subsequent layer 310 can skip the multiplications between the zero elements of its input tensor and the weight elements. This can provide high-speed inference on the second processing unit 293 and/or the third processing unit 295, which include an NPU or a GPU. It is all the more desirable if this speed is achieved without a loss of accuracy.

FIG. 3 is a diagram illustrating the main process for tuning the deep learning model 300.

The main tuning process of FIG. 3 is performed by the deep learning model learner 100, preferably through the control unit 190 executing a deep learning model learning program or another program.

First, the deep learning model learner 100 (specifically, its control unit 190) profiles (S1) the deep learning model 300, which has already undergone primary training. The control unit 190, executing the deep learning model learning program, runs the deep learning model on a test set on the first processing unit 191 and the second processing unit 195 according to the model's program, and measures the execution time of each layer 310 of the model. The control unit 190 also measures the distribution of the inputs to the activation function of each layer 310.

The control unit 190 may run the deep learning model on a plurality of test sets (for example, test sets in units of 256) and measure the average execution time of each layer 310. The control unit 190 also measures the distribution of the inputs to the activation function of each layer 310 and builds an input distribution table.

For each test set, the control unit 190 may construct an input distribution by ordering, by magnitude, the input data fed to the activation function of each layer 310. The control unit 190 may then determine the input distribution by averaging the per-test-set input distributions.

The input distribution table may consist of lower-N% points of the distribution and their corresponding input values: for example, the input value corresponding to the lower 50% of the overall input distribution (the ordering of all input data used in the test), the input value corresponding to the lower 51%, and so on up to the input value corresponding to the lower N% (N being an integer of 99 or less). Owing to the filtering characteristics of each layer 310, the input distribution of each layer 310 has been found to be approximately the same, or extremely similar, during real-time inference as well.
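One possible way to build such a table is sketched below in Python. The function name build_distribution_table, the rank formula, and the synthetic samples are assumptions for illustration; the sketch also pools all samples into one list instead of averaging per-test-set distributions.

```python
def build_distribution_table(samples, low=50, high=99):
    """Map each lower-N% point (N = low..high) to the input value at that rank."""
    ordered = sorted(samples)
    table = {}
    for n in range(low, high + 1):
        # Index of the value below which roughly n% of the samples fall.
        idx = min(len(ordered) - 1, (n * len(ordered)) // 100)
        table[n] = ordered[idx]
    return table


# 200 synthetic pre-activation values: -100.0, -99.0, ..., 99.0.
samples = [float(v) for v in range(-100, 100)]
table = build_distribution_table(samples)
threshold_90 = table[90]
```

With these samples, 180 of the 200 values lie below threshold_90, so zeroing every input smaller than that value would give an output with 90% zeros.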

The input distribution of the activation function of each layer 310 may vary with the position of the layer 310, the characteristics of the preceding and following layers 310, and so on. For example, one layer 310 may take values between -150 and +250 (see FIG. 6(a)) while another layer 310 takes values between -5 and 5 (see FIG. 6(b)), and the shapes of the distributions may differ as well. Following profiling, the control unit 190 may store the execution time and the distribution table of each layer 310 in the storage unit 150.

Furthermore, the control unit 190 may use the previously generated distribution table of each layer 310 to sparsify the output of the activation function (activation operation 319) of each layer 310 and measure the correspondingly reduced execution time.

For example, according to the distribution table of each layer 310, the control unit 190 sets the threshold of the activation function of each layer 310 not to 0 but to the threshold that yields a specified sparsification ratio. After setting, as the threshold, the input value that corresponds to the specified sparsification ratio (for example, 90%) in the distribution table of each layer 310, the control unit 190 runs the deep learning model 300 and measures the execution time of the layer 310 that follows the activation function with that threshold.

The control unit 190 may match to each layer 310 the execution-time reduction rate between the execution time before sparsification and the execution time after sparsification of the activation function, and store it in the storage unit 150 in addition to, or instead of, the execution time.
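The specification does not give a formula for the execution-time reduction rate. One plausible definition, assumed here, is the fraction of the original execution time that is saved. The layer names and timings in this Python sketch are hypothetical.

```python
def reduction_rate(time_before, time_after):
    """Fraction of execution time saved by sparsification (assumed definition)."""
    if time_before <= 0:
        raise ValueError("time_before must be positive")
    return (time_before - time_after) / time_before


# Hypothetical per-layer timings in milliseconds: (before, after).
layer_times = {"conv3": (12.0, 7.5), "conv4": (20.0, 9.0)}
rates = {name: reduction_rate(before, after)
         for name, (before, after) in layer_times.items()}
```

Under this definition, a layer that drops from 20.0 ms to 9.0 ms has a reduction rate of 0.55.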

After profiling, the deep learning model learner 100 tunes (S2) the deep learning model that has undergone primary training, so that sparsification improves the execution speed of the deep learning model 300 while preserving its accuracy.

The control unit 190 fine-tunes the deep learning model, pre-trained over a plurality of epochs with the test data set, taking the application of sparsification into account. Fine-tuning (fine tuning or transfer learning) is performed by the control unit 190 executing the deep learning model learning program.

FIG. 4 is a diagram illustrating the control flow for tuning (S2) the deep learning model 300, and FIG. 5 is a diagram illustrating exemplary pseudocode for tuning the deep learning model 300.

The pseudocode of FIG. 5 expresses the control flow of FIG. 4 in code form, and some of the code may differ slightly from the corresponding control flow. The control flow of FIG. 4 and the pseudocode of FIG. 5 are performed by the deep learning model learner 100, preferably by the control unit 190 executing the deep learning model learning program.

First, the deep learning model learner 100 (its control unit 190) initializes (S201) the sparsification parameter and N, the number of layers 310 selected for sparsification. The control unit 190 initializes the sparsification parameter to 99% and N to 1 (see ① in FIG. 5). The control unit 190 may also compute in advance the (reference) accuracy of the trained deep learning model 300 (base_acc in FIG. 5).

Next, the deep learning model learner 100 (its control unit 190) selects (S203) (see ② in FIG. 5) N layers 310 (N being an integer of 1 or more) from among all the layers 310 of the trained deep learning model 300. The control unit 190 selects the N layers 310 based on the execution times of all the layers 310 obtained by profiling the trained deep learning model (see S1 in FIG. 3).

The control unit 190 may sort the execution times of the layers 310, stored in the storage unit 150 by profiling, from longest to shortest and select the N layers 310 with the longest execution times. Alternatively, the control unit 190 may sort the execution-time reduction rates of the layers 310, stored in the storage unit 150 by profiling, from highest to lowest and select the N layers 310 with the highest reduction rates.
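The selection by longest execution time can be sketched as follows. This Python example assumes that layers are identified by consecutive integer IDs, so that the layer preceding layer i is layer i - 1; the function name select_layers and the timings are illustrative.

```python
def select_layers(exec_times, n):
    """Return the IDs of the n layers with the longest execution time."""
    ranked = sorted(exec_times, key=exec_times.get, reverse=True)
    return ranked[:n]


# Hypothetical profiled execution times in milliseconds, keyed by layer ID.
exec_times = {0: 1.2, 1: 8.4, 2: 3.1, 3: 6.7, 4: 0.9}

selected = select_layers(exec_times, 2)
# Layers whose activation thresholds would be raised (assumes sequential IDs).
preceding = [layer_id - 1 for layer_id in selected if layer_id > 0]
```

Selecting by reduction rate instead would only require passing a dictionary of reduction rates in place of exec_times.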

The deep learning model learner 100 (its control unit 190) sets (S205) (see ③ in FIG. 5), for each of the N layers 310 that precede the selected N layers 310 (for example, in FIG. 2, for layer ⓑ 310 preceding layer ⓐ 310 when layer ⓐ 310 is selected), a threshold corresponding to the currently set sparsification parameter.

From the profiled activation-function input distribution of each of the N preceding layers 310, the control unit 190 takes the input value corresponding to the currently set sparsification parameter and sets it as the threshold of that preceding layer 310.

The control unit 190 sets, as each threshold, the threshold (input value) that matches the value of the sparsification parameter in the distribution table stored in the storage unit 150 for each of the N preceding layers 310. Accordingly, the thresholds of the preceding layers 310 differ from one another according to the input distribution of each preceding layer 310.
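The following Python sketch shows how one sparsification parameter can map to different thresholds per layer. The table contents are hypothetical values chosen to mirror a wide-range layer and a narrow-range layer; they are not taken from the specification.

```python
# Hypothetical distribution tables: sparsification percent -> input value.
distribution_tables = {
    "layer_b": {90: 120.0, 95: 180.0},  # wide input range
    "layer_c": {90: 2.1, 95: 3.4},      # narrow input range
}


def thresholds_for(tables, sparsity_percent):
    """Look up each layer's threshold for the same sparsification parameter."""
    return {name: table[sparsity_percent] for name, table in tables.items()}


thresholds = thresholds_for(distribution_tables, 90)
```

The same 90% parameter yields a threshold of 120.0 for one layer and 2.1 for the other.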

The deep learning model learner 100 (its control unit 190) tunes (S207) (see ④ in FIG. 5) the deep learning model 300 using the different thresholds that correspond to the same sparsification parameter under the respective input distributions of the N preceding layers 310.

During tuning with the different thresholds, the control unit 190 trains the deep learning model 300 with the sparsification of the specific N layers 310 applied, and updates the weights of all the layers 310 using the feedback from that training.

The deep learning model learner 100 (its control unit 190) calculates (S209) (see ⑤ in FIG. 5) the accuracy of the tuned deep learning model and compares (S211) (see ⑥ in FIG. 5) the calculated accuracy with a threshold accuracy.

When the calculated accuracy is below the threshold accuracy, which is derived from the base accuracy of the deep learning model 300 before tuning (as a ratio of, or a difference from, that base accuracy), the deep learning model learner 100 (its control unit 190) decreases (S213) the set sparsification parameter and may repeat step S205 onward with the decreased sparsification parameter.

For example, the control unit 190 may decrease the currently set sparsification parameter by a specified offset (for example, 1%), tune the deep learning model 300 again with the preceding layers 310 sparsified according to the decreased parameter, and calculate the resulting accuracy.

When the calculated accuracy is equal to or greater than the threshold accuracy, the deep learning model learner 100 (its control unit 190) stores (S217) (see ⑦ in FIG. 5) the tuned deep learning model 300 and the tuning parameters in the storage unit 150. The control unit 190 may store, as tuning parameters or as part of the deep learning model 300, the weight arrays of the layers 310 changed by tuning, the IDs of the layers selected according to N for which sparsification was applied, and the activation-function thresholds (thresholds according to the input distribution) of the N layers 310 immediately preceding the selected layers 310.

In addition, the deep learning model may further include modified layer program modules used in the layers 310 whose speed is improved by sparsification. For the input tensor of a selected layer 310 and the output tensor produced by the selected layer 310, the layer program module is either modified to take into account sparsification according to the specified sparsification parameter or configured so that the sparsified elements can be identified.

When the calculated accuracy is equal to or greater than the threshold accuracy, after storing the tuned deep learning model 300 and the tuning parameters, the deep learning model learner 100 (its control unit 190) increments N by 1 (S219) and repeats step S203 onward, starting from the sparsification parameter (see ⑦ in FIG. 5) that produced the accuracy at or above the threshold accuracy.

When the decreased sparsification parameter reaches the minimum parameter (for example, 50%) (S215), the deep learning model learner 100 (its control unit 190) ends the control flow of FIG. 4 (S250), stores the finally tuned deep learning model 300 and the corresponding final tuning parameters in the storage unit 150, and may provide the stored final deep learning model 300 and final tuning parameters to a mobile device or the like.
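The control flow of steps S201 to S219 can be summarized in the following Python sketch. It is not the pseudocode of FIG. 5. The function names, the max_drop parameter (threshold accuracy taken as base accuracy minus an allowed drop), the stop condition when N exceeds the number of layers, and the toy evaluation function standing in for fine-tuning are all assumptions for illustration.

```python
def tune_with_sparsification(layer_order, tables, base_acc, tune_and_eval,
                             max_drop=1.0, start=99, minimum=50, step=1):
    """Greedy search over the sparsification parameter and the layer count N.

    layer_order: layer IDs sorted by profiled execution time, longest first.
    tables: distribution table {percent: threshold} per preceding layer ID.
    tune_and_eval: callable(thresholds) -> accuracy after fine-tuning.
    """
    sparsity, n = start, 1                                    # S201
    best = None
    while n <= len(layer_order):                              # assumed stop condition
        selected = layer_order[:n]                            # S203
        preceding = [layer_id - 1 for layer_id in selected]   # assumes sequential IDs
        thresholds = {lid: tables[lid][sparsity] for lid in preceding}  # S205
        acc = tune_and_eval(thresholds)                       # S207, S209
        if acc >= base_acc - max_drop:                        # S211
            best = {"n": n, "sparsity": sparsity,
                    "thresholds": thresholds}                 # S217
            n += 1                                            # S219
        else:
            sparsity -= step                                  # S213
            if sparsity <= minimum:                           # S215
                break
    return best


def toy_tune_and_eval(thresholds):
    """Hypothetical stand-in for fine-tuning: accuracy (in %) falls as thresholds rise."""
    penalty = 0.5 * sum(max(0.0, t - 90.0) for t in thresholds.values())
    return 90.0 - penalty


# Toy tables in which the threshold simply equals the percent value.
tables = {lid: {p: float(p) for p in range(50, 100)} for lid in range(3)}
result = tune_with_sparsification([1, 3, 2], tables, 90.0, toy_tune_and_eval)
```

With the toy evaluation function, the search settles on three selected layers at a sparsification parameter of 90%. Because the parameter is carried over when N is incremented, each additional layer can only keep or lower it.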

The tuned deep learning model 300 executed by the control unit 290 of a mobile device or the like is configured to sparsify the activation functions of the N layers 310 preceding the N selected layers 310, using the different thresholds, set according to the input distributions, that are supplied through the tuning parameters.

The finally stored tuning parameters raise the sparsification rate to at least the specified ratio (the sparsification parameter) in the activation functions of the layers 310 immediately preceding the maximum number N of selected layers 310 determined according to FIGS. 4 and 5, so that the execution speed of the N selected layers 310, which bear most heavily on execution speed, can be improved through sparsification. The tuned final deep learning model 300 is examined further in the description of FIGS. 7 to 9 concerning the reasoner 200.

FIG. 7 is an exemplary block diagram of the deep learning model reasoner 200.

According to FIG. 7, the deep learning model reasoner 200 includes an input unit 210, a communication unit 230, a storage unit 250, an output unit 270, and a control unit 290. Depending on the design, the deep learning model reasoner 200 may further include other blocks or omit some of the blocks of FIG. 7.

The deep learning model reasoner 200 of FIG. 7 is exemplified by a mobile device such as a smartphone or a tablet PC. The deep learning model reasoner 200 contains a GPU and/or an NPU and is configured to infer directly according to the tuned deep learning model 300 using the internal GPU and/or NPU, instead of sending an inference request to a server over a communication network and receiving the inference response.

The deep learning model reasoner 200 according to the present invention sparsifies the input tensor (or output tensor) itself, which changes dynamically for each layer 310 according to the input data, and can thereby improve inference performance without reducing accuracy.

Turning to the deep learning model reasoner 200 of FIG. 7, the input unit 210 receives user input to the deep learning model reasoner 200. The input unit 210 includes buttons, a touch screen, a touch pad, a microphone, and the like, and receives various kinds of user input.

The communication unit 230 includes a communication chipset, an antenna, and the like for connecting to a wireless LAN or a mobile communication network, and transmits and receives data over the Internet or the broadband network of a mobile communication network. The communication unit 230 may receive the tuned deep learning model 300 and the tuning parameters from the deep learning model learner 100 or another server.

The storage unit 250 stores various data and programs. The storage unit 250 includes volatile memory, non-volatile memory, and/or a cache, and stores at least a deep learning program that runs the tuned deep learning model 300 using the tuning parameters. The deep learning program may be the deep learning model itself or a program capable of running that deep learning model.

The deep learning program may be built into the deep learning model reasoner 200 (the mobile device) at production or received from the deep learning model learner 100 or another server through the communication unit 230. The deep learning program may run in the background or be launched by user input through the input unit 210.

The storage unit 250 may additionally store data (for example, images) used as input data to the deep learning program.

The output unit 270 outputs various audio and/or video signals. The output unit 270 includes a display, a speaker, LEDs, and the like, and can output audio or video signals generated by the control unit 290. The output unit 270 may convert an inference result generated by the control unit 290 into an audio or video signal and output it.

The control unit 290 controls the deep learning model reasoner 200. The control unit 290 is configured to execute program instructions and, by executing the deep learning program in the storage unit 250, to perform inference on input data according to the tuned deep learning model 300 and the tuning parameters.

The tuned deep learning model and the tuning parameters sparsify the output (tensor) of at least the layer 310 preceding a selected specific layer 310 to at least a certain ratio, that is, so that it has zero elements at or above a certain ratio (for example, 90%). This improves the execution speed of the selected specific layer 310 and thus the overall inference speed.

The control unit 290, which executes program instructions, includes a first processing unit 291 and further includes a second processing unit 293 and/or a third processing unit 295. The first processing unit 291 may be a CPU, an MPU, a processor, or the like. The second processing unit 293 may be a GPU or the like. The second processing unit 293 is a mobile GPU used in mobile devices and may be, for example, a Mali GPU or an Adreno GPU.

Mali and Adreno GPUs have an internal SIMD architecture in which threads are grouped into scheduling units called warps or wavefronts. They achieve high performance when all the threads in a warp execute the same instruction at the same time, and performance degrades when a divergent branch exists because threads must wait or synchronize.

In addition, when adjacent threads reference adjacent memory locations, a mobile GPU can perform a single memory access of a fixed unit size (coalesced memory access) instead of individual memory accesses, which increases efficiency. The deep learning model 300 needs to be configured with these characteristics of the mobile GPU in mind.

The third processing unit 295 may be an NPU. The control unit 290 executing the deep learning program may perform inference on input data by dynamically or statically assigning specific layers 310 to the second processing unit 293 and/or the third processing unit 295 according to the control procedure of the tuned deep learning model.

Looking in more detail, with reference to FIG. 2, at how the deep learning program runs on the deep learning model reasoner 200, the internal scheduler of the deep learning program (control unit 290) performs layer processing sequentially on the input data for inference. The internal scheduler processes however many layers the deep learning model comprises, and the deep learning program may output the inference result to the output unit 270. The control unit 290 executing the deep learning program can thus perform the layer processing and output its result through the deep learning program.

The deep learning program of the tuned deep learning model 300 is configured to increase inference speed by exploiting the sparsity of the tensor input to a layer 310, and in particular can improve inference speed further by raising the sparsity ratio of the output tensor of the preceding layer 310.

The internal scheduler of the deep learning program runs the layers 310 sequentially and provides the output tensor generated by the preceding layer 310 as the input tensor of the subsequent layer 310. For example, the deep learning program provides the tensor output by layer ⓑ 310 of FIG. 2, obtained by applying the weight array (L), as the input tensor of the next layer, ⓐ of FIG. 2. Likewise, as the layers 310 run in sequence, the tensor output by layer ⓒ 310 of FIG. 2, obtained by applying the weight array (N), is provided as the input tensor of the next layer, ⓓ of FIG. 2.

Hereinafter, a layer 310 of the tuned deep learning model, run through the deep learning program of the control unit 290, that generates an auxiliary data structure indicating which data elements of its output tensor are 0 is referred to as a 'generation layer' (producing layer). A layer 310 that uses the auxiliary data structure of the preceding generation layer 310 to perform its layer operation with sparsity taken into account is referred to as a 'consumption layer' (consumer layer).

In FIG. 2, layer ⓑ 310 is a generation layer with respect to layer ⓐ 310, and layer ⓐ 310 is a consumption layer that consumes the tensor output by layer ⓑ 310. Likewise, layer ⓒ 310 is a generation layer with respect to layer ⓓ 310, and layer ⓓ 310 is a consumption layer that consumes the tensor output by layer ⓒ 310.

In the tuned deep learning model, every layer 310 except the first and the last may be both a generation layer 310 and a consumption layer 310. Alternatively, one or more specific layers 310 selected from among all the layers 310 may be consumption layers 310, and the one or more layers 310 preceding the selected layers 310 may be generation layers 310.

The consumption layers 310 of the tuned deep learning model 300 are the layers 310 that the deep learning model learner 100 selects from among the layers 310 of the deep learning model according to the execution times measured by profiling (see FIG. 4).

As described with reference to FIG. 4, the threshold of the generation layer 310 preceding a selected consumption layer 310 is changed so that its output has at least a certain ratio of sparsity, and the resulting tuning parameters are stored and provided.

In addition, the deep learning model learner 100 modifies the program (module) of the generation layer 310 so that the generation layer 310 of the deep learning model 300 generates the auxiliary data structure, and modifies the program (module) of the consumption layer 310 so that the consumption layer 310 uses the auxiliary data structure to speed up layer execution.

Looking more specifically at the example of FIG. 2, the tuned deep learning model of FIG. 2 has, for example, two consumption layers 310 (ⓐ, ⓓ) and two generation layers 310 (ⓑ, ⓒ). The deep learning model reasoner 200 (its control unit 290), executing the deep learning program according to the tuned deep learning model, generates in one preceding generation layer 310 (ⓑ) an output tensor and an auxiliary data structure indicating whether each data element of the output tensor is 0 (see ① in FIG. 2).

The generation layer 310 (ⓑ) is configured to perform internally, in sequence, the convolution operation 311, the activation operation 319 and, further, a pooling operation. The convolution operation 311 uses the weight array (L), which is tuned for that generation layer 310 with the sparsity of the input and output tensors taken into account. The activation operation 319 applies an activation function whose threshold is set according to the input distribution after the convolution operation 311 of that generation layer 310 (ⓑ), so that the output tensor of the generation layer 310 has at least the set sparsity ratio of zeros. The threshold applied to the activation function of the generation layer 310 (ⓑ) is set differently from the threshold applied to the activation function of another generation layer 310 (for example, ⓒ), according to their respective input distributions.

The activation function loads (uses) the threshold corresponding to the generation layer 310 (ⓑ) from the tuning parameters and uses it to write the convolution data elements output from the convolution operation 311 into the output tensor. For example, the activation function sets convolution data elements at or below (smaller than) the threshold to zero in the output tensor. The activation function, which is a ReLU function, filters the convolution data with a threshold of zero or greater chosen according to its input distribution, and thereby generates an output tensor with a high proportion of zeros.
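As an illustration only, the thresholded activation described above might be sketched in Python as follows. The function name `threshold_relu`, the flat-list representation of the convolution output, and the choice to zero elements that are exactly equal to the threshold are assumptions made for this sketch and are not taken from the figures.

```python
def threshold_relu(values, threshold):
    """ReLU-like activation: zero every element that is <= threshold.

    With threshold == 0.0 this reduces to an ordinary ReLU. A larger
    threshold zeroes more elements and so raises output sparsity.
    (Hypothetical helper; the boundary handling is an assumption.)
    """
    if threshold < 0:
        raise ValueError("threshold is assumed to be zero or greater")
    return [v if v > threshold else 0.0 for v in values]


# Example convolution output for one layer (made-up numbers).
conv_out = [-1.2, 0.05, 0.4, 0.9, 0.0, 2.5]

plain = threshold_relu(conv_out, 0.0)   # ordinary ReLU
sparse = threshold_relu(conv_out, 0.5)  # raised, layer-specific threshold

zeros_plain = plain.count(0.0)
zeros_sparse = sparse.count(0.0)
```

In this toy input the raised threshold doubles the number of zeros relative to the ordinary ReLU, which is the effect the tuned threshold is meant to produce.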

In addition, the activation function, or a specific function that follows it (for example, a pooling function or any other function), stores, as the auxiliary data structure, data indicating whether each data element is zero in the tensor output from the activation function or in the tensor output from the pooling function (when a pooling operation exists in the layer 310).

FIG. 8 illustrates an example of an algorithm for generating the auxiliary data structure. The algorithm of FIG. 8 is performed in the activation function or after it (for example, in a pooling function or any other function).

FIG. 8(a) shows an example of generating a bit-mask array of the same size as the output tensor, and FIG. 8(b) shows an example of an index table that stores indices indicating the positions of non-zero data elements.

In FIG. 8(a), the activation function or the subsequent function writes 0 or 1 to each position of a bit-mask array having the same dimensions as the sparsified output tensor (nz_mask in FIG. 8(a)), according to whether the data element at the corresponding position of the output tensor is zero (see lines 10 and 11 of FIG. 8(a)).

In FIG. 8(b), the activation function or the subsequent function checks whether the data element at each position of the output tensor is zero and records the positions of the non-zero data elements in an index table (nz_idx in FIG. 8(b); see lines 10 and 11).
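Because FIG. 8 is not reproduced in the text, the two auxiliary data structures can be approximated by the following minimal Python sketch. The names `nz_mask` and `nz_idx` come from the description; the helper functions `build_nz_mask` and `build_nz_idx` and the flattened one-dimensional tensor are assumptions for illustration.

```python
def build_nz_mask(tensor):
    """Bit mask with the same length as the tensor: 1 = non-zero, 0 = zero."""
    return [1 if v != 0 else 0 for v in tensor]


def build_nz_idx(tensor):
    """Index table holding only the positions of the non-zero elements."""
    return [i for i, v in enumerate(tensor) if v != 0]


# Sparsified output tensor of a generation layer, flattened (made-up values).
out_tensor = [0.0, 0.9, 0.0, 0.0, 2.5]

nz_mask = build_nz_mask(out_tensor)
nz_idx = build_nz_idx(out_tensor)
```

The mask always has one entry per tensor element, whereas the index table shrinks as sparsity grows.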

The control unit 290 stores, in the storage unit 250, the output tensor generated in the generation layer 310 (ⓑ) and the auxiliary data structure indicating whether each data element is zero.

Following generation of the output tensor and the auxiliary data structure in the generation layer 310 (ⓑ), the deep learning model inference device 200 (its control unit 290), executing the deep learning program, performs, in the consumption layer 310 (ⓐ) that follows the generation layer 310 (ⓑ), an operation on the tensor from the preceding generation layer 310 using the auxiliary data structure generated in that layer (see ② in FIG. 2).

The consumption layer 310 (ⓐ) is configured to internally perform, in sequence, a convolution operation 311, an activation operation 319 and, additionally, a pooling operation 315. The convolution operation 311 between the tensor output from the generation layer 310 with the sparsity ratio applied and the weight array M of the consumption layer 310 (ⓐ) is performed using the auxiliary data structure that is dynamically output from the generation layer 310.

The consumption layer 310 (ⓐ) is configured to perform the convolution operation 311 in consideration of the high proportion of zero data elements in its input tensor.

FIG. 9 illustrates an example in which the consumption layer 310 (ⓐ) performs the convolution operation 311 using the auxiliary data structure provided by the generation layer 310.

FIG. 9(a) shows an example of performing the convolution operation 311 according to the mask value at each data element position of the bit-mask array provided by the generation layer 310. When the mask value is 1, the consumption layer 310 (ⓐ) performs the convolution operation 311; when the mask value is 0, it skips the convolution operation 311. The approach of FIG. 9(a) has the advantage that the existing matrix representation can be used as-is between layers 310, without conversion to a sparse-matrix representation; on the other hand, conditional branches may cause some performance degradation on a GPU or the like.

FIG. 9(b) shows an example of performing the convolution operation 311 between the non-zero data elements of the input tensor and the weight array using the index table provided by the generation layer 310. The example of FIG. 9(b) executes only as many loop iterations as are needed; however, because the index 'c' of the index table (see line 7) is stored in random order, the stride of memory accesses across loop iterations can be large.
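The two consumption-side variants can be compared with a small Python sketch. It is deliberately reduced to a 1x1 convolution at a single spatial position (a dot product over input channels per output channel), which is an assumption made to keep the example short; the function names are hypothetical and the code is not the algorithm of FIG. 9 itself.

```python
def conv1x1_dense(x, weights):
    """Reference: every input channel is multiplied, zeros included."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def conv1x1_masked(x, weights, nz_mask):
    """Bit-mask variant: visit every channel, branch on the mask value."""
    out = [0.0] * len(weights)
    for c, bit in enumerate(nz_mask):
        if bit == 0:
            continue  # skip the multiply-accumulate for a zero input
        for k, row in enumerate(weights):
            out[k] += row[c] * x[c]
    return out


def conv1x1_indexed(x, weights, nz_idx):
    """Index-table variant: loop only over positions of non-zero inputs."""
    out = [0.0] * len(weights)
    for c in nz_idx:
        for k, row in enumerate(weights):
            out[k] += row[c] * x[c]
    return out


x = [0.0, 2.0, 0.0, 0.0, 1.0, 0.0]              # sparse input channels
weights = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],      # output channel 0
           [0.5, -1.0, 0.0, 2.0, 4.0, 1.0]]     # output channel 1
nz_mask = [1 if v != 0 else 0 for v in x]
nz_idx = [i for i, v in enumerate(x) if v != 0]

dense = conv1x1_dense(x, weights)
masked = conv1x1_masked(x, weights, nz_mask)
indexed = conv1x1_indexed(x, weights, nz_idx)
```

All three functions return the same result; they differ only in how many iterations run and whether a branch is taken inside the loop, which mirrors the trade-off described for FIG. 9(a) and FIG. 9(b).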

The tensors generated by the generation layers 310 of the tuned deep learning model and consumed by the consumption layers 310 are data produced from the input data for inference on the mobile device. Unlike the weight arrays, these tensors change dynamically as the input data to be inferred changes.

On the other hand, the output tensor output by a generation layer 310 is a sparse matrix containing zeros at or above a certain ratio, and because the deep learning model 300 is tuned in advance with sparsification taken into account, its accuracy does not degrade.

FIGS. 8 and 9 show examples in which an auxiliary data structure is generated for each individual data element of the tensor and the convolution operation 311 is performed according to whether each data element is zero, as indicated by the auxiliary data structure.

Alternatively, the generation layer 310 generates an auxiliary data structure that indicates, per unit of a specified size, whether all of the data elements in a run of that size are zero. For example, the generation layer 310 groups the output tensor by a specified size (for example, 4 or 8) along the channel direction or the width direction (preferably the width direction) among the width, height and channel dimensions, and generates an auxiliary data structure according to whether all of the consecutive data elements in each group are zero.

The consumption layer 310 performs the convolution operation 311 using the auxiliary data structure, which is grouped by the specified size and indicates whether the data elements of each group are zero. The consumption layer 310 performs the convolution operation 311 through vector operations of the specified size.

For example, when the data elements of a group of the specified size are not all zero, the consumption layer 310 reads the consecutive data elements of that group from contiguous memory and performs the convolution operation 311, even if some of them are zero. When the group bit mask or group index of the auxiliary data structure indicates that all data elements of a group are zero, the consumption layer 310 can skip the convolution operation 311 for that group.

Processing group data of the specified size with vector operations allows data in adjacent memory to be loaded efficiently and computed in units of the specified size, which can improve the execution speed of the consumption layer 310.
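The grouped variant can be sketched in Python as below. The helper names `build_group_mask` and `dot_grouped`, the group size of 4, and the use of a plain dot product in place of a full convolution are assumptions for illustration; a real implementation would map each group to a hardware vector instruction rather than a Python slice.

```python
def build_group_mask(tensor, group):
    """One bit per group of `group` consecutive elements.

    The bit is 0 only when every element of the group is zero.
    """
    return [1 if any(v != 0 for v in tensor[g:g + group]) else 0
            for g in range(0, len(tensor), group)]


def dot_grouped(x, w, group_mask, group):
    """Dot product that skips whole groups whose mask bit is 0."""
    total = 0.0
    for gi, bit in enumerate(group_mask):
        if bit == 0:
            continue  # all-zero group: skip load and multiply
        start = gi * group
        # The whole group is processed, including any zeros inside it.
        total += sum(a * b for a, b in zip(x[start:start + group],
                                           w[start:start + group]))
    return total


GROUP = 4
x = [0.0, 0.0, 0.0, 0.0,   1.0, 0.0, 2.0, 0.0,   0.0, 0.0, 0.0, 0.0]
w = [float(i) for i in range(1, 13)]  # weights 1.0 .. 12.0

group_mask = build_group_mask(x, GROUP)
result = dot_grouped(x, w, group_mask, GROUP)
```

Here 10 of 12 elements are zero, but only 2 of 3 groups are all-zero, which illustrates how grouping lowers the effective sparsity ratio while still allowing contiguous, fixed-width processing.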

In particular, the generation layers 310 and consumption layers 310, which have high sparsity as a result of sparsification by the deep learning model learner 100, achieve a large improvement in execution speed even when grouping for vector operations lowers the sparsity ratio.

The deep learning model inference device 200 (its control unit 290) performs the operations of the other layers 310 that follow the generation layer 310 (ⓑ) and the consumption layer 310 (ⓐ), and, according to the execution order of the layers 310, performs the operations of the other generation layer 310 (ⓒ) and the other consumption layer 310 (ⓓ) (see ③ and ④ in FIG. 2).

The subsequent generation layer 310 (ⓒ) and the subsequent consumption layer 310 (ⓓ) dynamically generate and process tensors with a threshold different from that of the earlier generation layer 310 (ⓑ) and consumption layer 310 (ⓐ).

Since the generation layer 310 (ⓑ) and the consumption layer 310 (ⓐ) have already been described in detail, the description here focuses on the differences.

In the generation layer 310 (ⓒ), the deep learning model inference device 200 (its control unit 290) generates an output tensor using a threshold different from the one used in the earlier generation layer 310 (ⓑ), and generates an auxiliary data structure indicating whether each data element of the output tensor is zero.

The deep learning model inference device 200 (its control unit 290) sets the threshold of the activation function of the generation layer 310 (ⓒ) according to the provided tuning parameters and, using the set threshold, sets convolution output data at or below the threshold to zero.

The threshold of the generation layer 310 (ⓒ) differs from that of the generation layer 310 (ⓑ). The threshold of the generation layer 310 (ⓒ) is set according to the input distribution obtained by profiling the generation layer 310 (ⓒ), so that an input tensor with at least the specified sparsity ratio is provided to the consumption layer 310 (ⓓ).

Likewise, the threshold of the generation layer 310 (ⓑ) is set according to the input distribution obtained by profiling the generation layer 310 (ⓑ), so that an input tensor with at least the specified sparsity ratio is provided to the consumption layer 310 (ⓐ).
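One way to read "a threshold set according to the profiled input distribution" is as a quantile of the activation-function inputs collected during profiling. The following Python sketch takes that reading; the function name `threshold_for_sparsity`, the use of an empirical quantile, and the clamping to zero are assumptions, and the document does not specify the exact estimator.

```python
import math


def threshold_for_sparsity(samples, sparsity):
    """Pick a threshold so that at least `sparsity` of the profiled
    activation inputs are <= threshold (and would therefore be zeroed).

    samples  : activation-function inputs collected by profiling one layer
    sparsity : target ratio of zeros in the layer output, between 0 and 1
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    ordered = sorted(samples)
    k = math.ceil(sparsity * len(ordered))  # elements that must become zero
    if k == 0:
        return 0.0
    # Clamp at zero so the activation never passes negative values.
    return max(0.0, ordered[k - 1])


# Made-up profiled inputs for two different generation layers.
layer_b_inputs = [-2.0, -1.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
layer_c_inputs = [-0.5, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

thr_b = threshold_for_sparsity(layer_b_inputs, 0.75)
thr_c = threshold_for_sparsity(layer_c_inputs, 0.75)
thr_low = threshold_for_sparsity(layer_b_inputs, 0.25)
```

The same target ratio yields different thresholds for the two layers because their input distributions differ, which matches the statement that each generation layer has its own threshold.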

In the consumption layer 310 (ⓓ) that follows the generation layer 310 (ⓒ), the deep learning model inference device 200 (its control unit 290) performs an operation on the input tensor using the auxiliary data structure generated in the generation layer 310 (ⓒ).

Inside the consumption layer 310 (ⓓ), the deep learning model inference device 200 (its control unit 290) performs, in sequence, a convolution operation 311, an activation operation 319 and, additionally, a pooling operation 315. The consumption layer 310 (ⓓ) performs the convolution operation 311 between the tensor output from the generation layer 310 (ⓒ) with the sparsity ratio applied and the weight array N of the consumption layer 310 (ⓓ), using the auxiliary data structure that is dynamically output from the generation layer 310 (ⓒ).

A deep learning model 300 tuned in this way has one or more (preferably two or more) generation layers 310 and consumption layers 310. Each generation layer 310 sparsifies its output tensor (or the output tensor of the activation operation 319) through the activation function or the like, according to the input distribution, so that the tensor contains zero data elements at or above a certain ratio, and each consumption layer 310 performs the convolution operation 311 on the sparsified tensor, which can improve inference performance.

The weights of the tuned deep learning model 300 are tuned through training that takes the sparsification of each layer 310 of the deep learning model 300 into account. The tuned deep learning model 300 is configured so that its accuracy is maintained. In addition, overall inference speed can be improved by selecting consumption layers 310 in consideration of execution time and sparsifying the generation layers 310 that precede them.
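The accuracy-preserving tuning procedure recited in the claims (set thresholds for N layers, tune, check accuracy, then either lower the sparsification parameter or increase N) can be outlined with the following Python sketch. The callback `tune_and_eval`, the integer-percent representation of the sparsification parameter, the fixed step size and the stopping conditions are all assumptions; actual fine-tuning and evaluation are replaced by a toy function.

```python
def search_sparsity(tune_and_eval, max_layers, start_sparsity_pct,
                    step_pct, min_accuracy):
    """Outline of the tuning loop.

    tune_and_eval(n, sparsity_pct) is a hypothetical callback that sets
    thresholds for n layers, fine-tunes the model, and returns its accuracy.
    Returns the (n, sparsity_pct) pairs that met the accuracy requirement.
    """
    n = 1
    sparsity_pct = start_sparsity_pct
    accepted = []
    while n <= max_layers and sparsity_pct > 0:
        accuracy = tune_and_eval(n, sparsity_pct)
        if accuracy >= min_accuracy:
            accepted.append((n, sparsity_pct))
            n += 1                    # accuracy holds: sparsify one more layer
        else:
            sparsity_pct -= step_pct  # accuracy dropped: relax the sparsity
    return accepted


def toy_tune_and_eval(n, sparsity_pct):
    """Stand-in for real training: accuracy (in percent) falls as more
    layers are sparsified more aggressively. Purely illustrative."""
    return 100 - (n * sparsity_pct) // 10


history = search_sparsity(toy_tune_and_eval, max_layers=2,
                          start_sparsity_pct=80, step_pct=10,
                          min_accuracy=90)
```

With the toy accuracy model, one layer tolerates 80% sparsity, while two layers meet the accuracy requirement only after the parameter has been lowered to 50%.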

Experiments confirmed that inference speed improved by a factor of 1.7 or more compared with the existing deep learning model, without loss of inference accuracy. The inference function using artificial intelligence can therefore be applied advantageously to mobile devices that include a mobile GPU or a mobile NPU.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications and changes without departing from the technical idea of the present invention.

100: deep learning model learner
110: communication unit 150: storage unit
190: control unit
191: first processing unit 195: second processing unit
200: deep learning model inference device
210: input unit 230: communication unit
250: storage unit 270: output unit
290: control unit
291: first processing unit 293: second processing unit
295: third processing unit
300: deep learning model
310: layer
311: convolution operation
315: pooling operation
319: activation operation

Claims

1. A method of training a deep learning model, the method comprising:
(a) selecting N layers of a trained deep learning model;
(b) setting, for each of the N layers preceding the selected N layers, a threshold corresponding to a sparsification parameter; and
(c) tuning the deep learning model using the thresholds set for the preceding N layers,
wherein N is an integer of 1 or more.

2. The method of claim 1, further comprising:
(d) calculating an accuracy of the tuned deep learning model and determining whether the calculated accuracy is greater than or equal to a threshold accuracy,
wherein, when the calculated accuracy is less than the threshold accuracy, the sparsification parameter is reduced and steps (b) and (c) are repeated.

3. The method of claim 2,
wherein, when the calculated accuracy is greater than or equal to the threshold accuracy, N is increased by 1 and steps (a) to (c) are repeated starting from the sparsification parameter corresponding to the calculated accuracy.

4. The method of claim 3, further comprising:
storing, after performing steps (a) to (d), the tuned deep learning model and the corresponding tuning parameters,
wherein the tuned deep learning model, when performing inference on input data using the tuning parameters, executes the activation functions of the preceding N layers using the thresholds set through the tuning parameters.

5. The method of claim 1,
wherein step (a) selects the N layers based on the execution times of all layers obtained by profiling the trained deep learning model.

6. The method of claim 5,
wherein step (a) either orders all layers by execution time and selects the N layers with the longest execution times, or orders all layers by the rate of execution-time reduction achieved through sparsification of each layer and selects the N layers with the highest reduction rates.

7. The method of claim 1,
wherein step (b) sets, as the threshold of each of the preceding N layers, the input value that corresponds to the sparsification parameter in the activation-function input distribution obtained by profiling that layer.

8. A deep learning model learner comprising:
a control unit that executes program instructions; and
a storage unit that stores a deep learning model training program and a trained deep learning model,
wherein the control unit, executing the deep learning model training program, selects N layers of the trained deep learning model, sets, for each of the N layers preceding the selected N layers, a threshold corresponding to a sparsification parameter, and tunes the deep learning model using the thresholds set for the preceding N layers, and
wherein N is an integer of 1 or more.

9. The deep learning model learner of claim 8,
wherein the control unit calculates an accuracy of the tuned deep learning model, determines whether the calculated accuracy is greater than or equal to a threshold accuracy, and, when the calculated accuracy is less than the threshold accuracy, reduces the sparsification parameter, sets for each of the preceding N layers a threshold corresponding to the reduced sparsification parameter, and repeats the tuning of the deep learning model using the thresholds so set for the preceding N layers.

10. The deep learning model learner of claim 9,
wherein, when the calculated accuracy is greater than or equal to the threshold accuracy, the control unit increases N by 1, selects the increased number N of layers of the trained deep learning model, sets, for each of the N layers preceding the selected N layers, a threshold corresponding to the sparsification parameter that corresponds to the calculated accuracy, and repeats the tuning of the deep learning model using the thresholds set for the preceding N layers.

11. The deep learning model learner of claim 10,
wherein the control unit stores the tuned deep learning model and the corresponding tuning parameters in the storage unit, and
wherein the tuned deep learning model, when performing inference on input data using the tuning parameters, executes the activation functions of the preceding N layers using the thresholds set through the tuning parameters.

12. The deep learning model learner of claim 8,
wherein the control unit, based on profiling of the trained deep learning model, either orders all layers by execution time and selects the N layers with the longest execution times, or orders all layers by the rate of execution-time reduction achieved through sparsification of each layer and selects the N layers with the highest reduction rates.

13. The deep learning model learner of claim 8,
wherein the control unit sets, as the threshold of each of the preceding N layers, the input value that corresponds to the sparsification parameter in the activation-function input distribution obtained by profiling that layer.