KR102422774B1

KR102422774B1 - Method and Apparatus for Splitting and Assembling Neural Networks for Model Generation Queries

Info

Publication number: KR102422774B1
Application number: KR1020200063418A
Authority: KR
Inventors: 최동완; 진영화; 이건호
Original assignee: 인하대학교 산학협력단
Priority date: 2020-05-27
Filing date: 2020-05-27
Publication date: 2022-07-19
Also published as: KR20210146534A

Abstract

A method and apparatus for splitting and merging a trained neural network model according to a user's query are presented. The proposed method includes a model preprocessing step that stores the importance of the convolutional neural network (CNN) channels for every task the neural network model has learned, and a query step that, when a query for a partial task is received from the user, calculates the importance required for that query from the CNN channel importance stored in the preprocessing step and makes the model lightweight by deleting the unnecessary parts.

Description

A method and apparatus for splitting and merging a trained neural network model according to a user's query {Method and Apparatus for Splitting and Assembling Neural Networks for Model Generation Queries}

The present invention relates to a method and apparatus for splitting and merging a trained neural network model according to a user's query.

Weight reduction and compression of neural network models is an actively researched field. Representative approaches include quantizing the parameters to reduce the model's size and speed up computation [1], reducing the model through matrix decomposition of the parameters [2], and pruning, which finds and deletes unnecessary parts [3, 4]. Many of these methods incur little additional training cost, and some require no additional training at all.

The knowledge distillation technique first proposed in the prior art [5] treats an existing pre-trained large neural network as a teacher model and uses the teacher model's predictions to train a relatively smaller neural network, the student model. Because a student model trained this way is known to reach performance close to the teacher's at a smaller size, the technique has become one of the established ways to build lightweight neural networks. However, this approach still requires training a model from a completely initial state, and it assumes that enough data samples for the target task are available for training to succeed.

Model specialization [6, 7] has recently emerged as one way to reduce the long inference time of deep neural network models. It narrows the tasks a model handles and builds a model for that restricted set of tasks. Whereas typical machine learning algorithms aim to build a model that generalizes over all data, model specialization gives up generalization and focuses instead on building a reliable model for a partial task. The goal of the method proposed in the present invention is closest to this approach. Existing model specialization methods, however, used a simple procedure: small neural networks of different structures and sizes are treated as candidate models, each is trained with data samples for the target task, and the best network is selected among them.

Existing model specialization methods first prepare several small neural networks with different structures and sizes, train each on data samples for the target task to produce candidate models, and then compare the candidates' performance to select the best network. Because a structure can be chosen only from the candidate group specified in advance by the user, the selected structure is likely not optimal, and small neural networks with random initial parameter values must still be trained again. The present invention aims to overcome this simple approach and proposes a more principled method for finding an optimal specialized model. The technical problem to be solved by the present invention is to provide a method and apparatus that specialize a previously trained convolutional neural network (CNN) model to a given partial task (subset task), thereby making the model lightweight and improving its inference speed in use.

In one aspect, the method proposed in the present invention for splitting and merging a trained neural network model according to a user's query includes a model preprocessing step that stores the importance of the convolutional neural network (CNN) channels for every task the neural network model has learned, and a query step that, when a query for a partial task is received from the user, calculates the importance required for that query from the CNN channel importance stored in the preprocessing step and makes the model lightweight by deleting the unnecessary parts.

The model preprocessing step, which stores the CNN channel importance for every task learned by the neural network model, uses the data of each task to calculate the degree of activation of the CNN channels, calculates the importance of the original filters for each task based on the calculated activation, and stores the calculated importance as a list so that it can be used in the query step.

In the query step, when a primitive task, which is the smallest unit of task, is given as the query, the previously stored channel importance list for that task is selected, channel pruning is performed, and the final specialized model is generated through fine-tuning.

In the query step, when an arbitrary task larger than a primitive task is given as the query, the channel importance for that task is recalculated by summing the channel importance lists that were precomputed in the model preprocessing step for the primitive tasks making up the requested task, which yields the channel importance for the task as a whole.

In another aspect, the apparatus proposed in the present invention for splitting and merging a trained neural network model according to a user's query includes a model preprocessing unit that stores the importance of the convolutional neural network (CNN) channels for every task the neural network model has learned, and a query execution unit that, when a query for a partial task is received from the user, calculates the importance required for that query from the CNN channel importance stored in the preprocessing step and makes the model lightweight by deleting the unnecessary parts.

In evaluating the classification performance of models produced by the specialization technique according to embodiments of the present invention, the specialized model matched the original model's performance on the partial task given by the query while improving both model size and inference time.

FIG. 1 is a flowchart illustrating a method of splitting and merging a trained neural network model according to a user's query, according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the process of calculating channel importance in the model preprocessing step according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the channel pruning process according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the query step according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the configuration of an apparatus for splitting and merging a trained neural network model according to a user's query, according to an embodiment of the present invention.
FIG. 6 is a graph comparing performance in terms of information density according to an embodiment of the present invention with that of the prior art.

The prior art shares the goal of the proposed method in that it reduces model size, and the algorithm used in the proposed method is likewise based on channel pruning. However, the final goal of the present invention is not to preserve performance on all tasks but to build a specialized model optimized for the restricted task the user requires, which distinguishes it from general model weight-reduction approaches. The proposed method is further distinguished in that, because it performs no additional training, the resources needed for the training process are kept to a minimum and a lightweight neural network can be produced successfully even when data is scarce. In addition, unlike the prior art, the present invention seeks to achieve model specialization through a channel-pruning-based method without training a neural network.

The present invention proposes a method that specializes a previously trained convolutional neural network (CNN) model to a given partial task (subset task), making the model lightweight and improving its inference speed in use. To this end, the invention uses data-dependent channel pruning to delete the channels that are unnecessary for the partial task, and then fine-tunes the lightened model so that it is specialized to that task. It also stores the importance of the channels for each task and, in response to a user's query, merges the per-task importance so that a specialized model can be generated quickly for whatever task the user wants. When the classification performance of models produced by the proposed specialization technique was evaluated, the specialized model matched the original model on the partial task given by the query while improving both model size and inference time. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

As used in the present invention, the term 'convolutional neural network (CNN)' refers to a type of neural network model that extracts information from input data through a convolution algorithm. The term 'partial task (sub-task)' refers to the subset of tasks the user wants, out of all the tasks the original model has learned. The term 'channel pruning' refers to a family of methods that make a CNN model lightweight by finding and deleting the filters that generate unnecessary channels among the channels in the CNN.
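The definition of channel pruning above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a convolution layer's weights are held as nested lists indexed by output filter and then input channel; the helper name `prune_filters` and the string labels standing in for kernels are hypothetical and do not come from the patent. The point it shows is that deleting a filter removes one output channel, so the following layer loses the matching input channel.

```python
# Illustrative only: weights are nested lists, not a real framework's tensors.

def prune_filters(conv_weights, next_conv_weights, keep):
    """Drop output filters of one conv layer and the matching input
    channels of the following conv layer.

    conv_weights[o][i]      -> kernel of filter o applied to input channel i
    next_conv_weights[o][i] -> same layout for the following layer
    keep                    -> indices of the channels to retain
    """
    kept = sorted(keep)
    # Removing filter o removes output channel o of this layer ...
    pruned = [conv_weights[o] for o in kept]
    # ... so the next layer no longer receives that channel as input.
    pruned_next = [[row[i] for i in kept] for row in next_conv_weights]
    return pruned, pruned_next


# Layer 1: 3 filters over 2 input channels; layer 2: 2 filters over 3 input channels.
# Each kernel is abbreviated to a string label for readability.
layer1 = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
layer2 = [["p0", "p1", "p2"], ["q0", "q1", "q2"]]

new1, new2 = prune_filters(layer1, layer2, keep=[0, 2])
print(new1)  # filters 0 and 2 remain
print(new2)  # each layer-2 filter now has two input channels
```

Both layers shrink, which is why filter-level pruning reduces parameter count and computation without requiring sparse kernels.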

FIG. 1 is a flowchart illustrating a method of splitting and merging a trained neural network model according to a user's query, according to an embodiment of the present invention.

The proposed method of splitting and merging a trained neural network model according to a user's query includes a model preprocessing step (110) that stores the importance of the convolutional neural network (CNN) channels for every task the neural network model has learned, and a query step (120) that, when a query for a partial task is received from the user, calculates the importance required for that query from the CNN channel importance stored in the preprocessing step and makes the model lightweight by deleting the unnecessary parts.

In the model preprocessing step (110), the importance of the CNN channels is stored for every task the neural network model has learned. In other words, the CNN channel importance is recorded for each task the model was trained on.

For every task the neural network model has learned, the data of that task is used to calculate the degree of activation of the CNN channels, and the importance of the original filters for each task is calculated based on the calculated activation. The calculated importance is then stored as a list so that it can be used in the query step.

In the query step (120), when a query for a partial task is received from the user, the importance required for that query is calculated from the CNN channel importance stored in the preprocessing step, and the model is made lightweight by deleting the unnecessary parts.

When a primitive task, which is the smallest unit of task, is given as the query, the previously stored channel importance list for that task is selected, channel pruning is performed, and the final specialized model is generated through fine-tuning.

When an arbitrary task larger than a primitive task is given as the query, the channel importance for that task is recalculated by summing the channel importance lists that were precomputed in the model preprocessing step for the primitive tasks making up the requested task, which yields the channel importance for the task as a whole.

FIG. 2 is a diagram illustrating the process of calculating channel importance in the model preprocessing step according to an embodiment of the present invention.

The first step in specializing a model is model preprocessing. In this step, the data for each task the model has learned is used to calculate the degree of channel activation, and from this the importance of the original filters for each task is calculated. For example, suppose preprocessing is performed on an original model trained to distinguish 10 objects. Each of classes 1 through 10 can be regarded as a single task; these are defined as primitive tasks, and the channel importance is calculated separately for each primitive task.

Given the data belonging to each primitive task (210), the data is input to the original model (220), and for every layer the output channels and the gradient of the final output value with respect to each channel are obtained. The task-aware channel saliency is then calculated from the computed channel activation values and gradients as follows.
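The equation and its symbols do not appear in this text. One form that agrees with the explanation that follows (a per-pixel activation combined with a per-pixel contribution to the output, averaged over the channel) is sketched here. Every symbol is an assumption introduced for illustration: $A^{l,c}$ is the $c$-th output channel of layer $l$ for one input sample of primitive task $T$, $H$ and $W$ are its height and width, and $y_T$ is the final output value for that task. Whether the patent applies an absolute value, a rectification, or a particular aggregation over samples cannot be confirmed from the text.

```latex
S_T^{l,c} \;=\; \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W}
  A^{l,c}_{ij} \, \frac{\partial y_T}{\partial A^{l,c}_{ij}}
```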

The above equation takes each channel's pixel activations and each pixel's contribution to the task output and averages them over the whole channel. The two size terms in the equation are the width and height of each channel, so their product is the number of pixels in the channel. In other words, a channel is judged more important the more strongly it is activated and the more it contributes to the final output. FIG. 2 schematically shows this method of calculating channel importance in the preprocessing step.
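A minimal sketch of this per-channel average follows. It assumes that activation and contribution are combined by a pixel-wise product, which the text does not state explicitly, and it takes the activation map and gradient map as plain 2-D lists rather than obtaining them from a real model. The function name `channel_saliency` is hypothetical.

```python
def channel_saliency(activation, gradient):
    """Average of activation * gradient over all pixels of one channel.

    activation, gradient: 2-D lists (height x width) for a single channel.
    The pixel-wise product is an assumed way of combining the two terms.
    """
    height = len(activation)
    width = len(activation[0])
    total = 0.0
    for i in range(height):
        for j in range(width):
            total += activation[i][j] * gradient[i][j]
    # height * width is the number of pixels in the channel
    return total / (height * width)


act = [[1.0, 0.0], [2.0, 1.0]]
grad = [[0.5, 0.5], [0.25, 1.0]]
score = channel_saliency(act, grad)
print(score)  # 0.5
```

A channel that is never activated, or whose pixels have zero gradient, scores zero under this sketch, matching the statement that importance grows with both activation and contribution.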

The channel importance is calculated in this way for every primitive task the model has learned, and the results are stored as a list so that they can be used in the query step.
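The stored list can be pictured as one saliency vector per primitive task. The sketch below assumes that per-sample scores are averaged within each task and that the result is serialized as JSON; both choices, along with the task names and the function name `build_saliency_table`, are illustrative assumptions and are not specified in the text.

```python
import json


def build_saliency_table(per_task_scores):
    """Average per-sample channel scores for each primitive task.

    per_task_scores: {task_name: [[score per channel] per sample]}
    returns:         {task_name: [mean score per channel]}
    """
    table = {}
    for task, samples in per_task_scores.items():
        n_channels = len(samples[0])
        table[task] = [
            sum(sample[c] for sample in samples) / len(samples)
            for c in range(n_channels)
        ]
    return table


# Hypothetical scores: two samples per task, three channels in the layer.
scores = {
    "cat": [[0.5, 0.0, 1.0], [1.5, 0.0, 1.0]],
    "dog": [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]],
}
table = build_saliency_table(scores)
serialized = json.dumps(table)      # could be written to disk for the query step
restored = json.loads(serialized)
print(restored["cat"])  # [1.0, 0.0, 1.0]
```

Because the table is computed once per trained model, the query step only has to read it back.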

FIG. 3 is a diagram for explaining the channel pruning process according to an embodiment of the present invention.

When a query for the partial task to be performed is received from the user, the filters that generate unnecessary channels are deleted based on this importance. In typical pruning schemes the user sets the number of channels to delete as a ratio. The present invention, considering that the optimal number of channels may differ from task to task, instead cuts the channels of insufficient importance relative to the average channel importance rather than by an absolute ratio. The pruning strength applied to the model is controlled by a pruning-strength weight chosen by the user. For example, a threshold may be computed using the weight, and channels at or below the threshold may be deleted.
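The thresholding rule can be sketched as follows. The text says only that the threshold is derived from the weight and that the cut is made relative to the average importance, so taking the threshold as the weight multiplied by the mean saliency is an assumption, as is the function name `select_channels`.

```python
def select_channels(saliency, strength):
    """Return indices of channels to keep.

    A channel is deleted when its saliency is at or below
    strength * mean(saliency); that product is an assumed threshold form.
    """
    mean = sum(saliency) / len(saliency)
    threshold = strength * mean
    return [c for c, s in enumerate(saliency) if s > threshold]


saliency = [1.0, 0.0, 3.0, 4.0]          # mean is 2.0
print(select_channels(saliency, 1.0))    # [2, 3]
print(select_channels(saliency, 0.25))   # [0, 2, 3]
```

A larger weight prunes more aggressively, and the number of surviving channels follows from the saliency distribution of the task instead of being fixed in advance.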

FIG. 4 is a diagram for explaining the query step according to an embodiment of the present invention.

The query step generates a matching model when a query (410) for the partial task the user wants processed is input to the model (420). If a primitive task, the smallest unit of task, is given as the query, the previously stored channel importance list for that task is selected directly, pruning is performed, and the final specialized model is generated through fine-tuning.

If the user requests an arbitrary task larger than a primitive task, a stored channel importance list cannot be used directly, and the channel importance for that task must be recalculated. Because the task given by the user is ultimately composed of n primitive tasks, the channel importance for the whole task can be obtained by summing the channel importance lists that were precomputed for those primitive tasks in the preprocessing step. This amounts to merging the information for each primitive task to produce a larger specialized model, and it generates the specialized model much faster than recomputing the channel importance for the arbitrary task from scratch, as is done in the preprocessing step.
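The merge described above is an element-wise sum of the stored lists. A minimal sketch, with hypothetical task names and a hypothetical helper `merge_saliency`:

```python
def merge_saliency(table, query_tasks):
    """Element-wise sum of the stored per-channel saliency lists
    for the primitive tasks named in the query."""
    lists = [table[t] for t in query_tasks]
    return [sum(values) for values in zip(*lists)]


# Hypothetical precomputed table: one list per primitive task, three channels.
table = {
    "cat": [1.0, 0.0, 1.0],
    "dog": [0.0, 3.0, 1.0],
    "car": [2.0, 0.0, 0.5],
}
print(merge_saliency(table, ["cat"]))          # [1.0, 0.0, 1.0]
print(merge_saliency(table, ["cat", "dog"]))   # [1.0, 3.0, 2.0]
```

A query for a single primitive task reduces to a lookup, and a composite query costs one addition per channel per primitive task, with no forward or backward pass through the model.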

FIG. 5 is a diagram illustrating the configuration of an apparatus for splitting and merging a trained neural network model according to a user's query, according to an embodiment of the present invention.

The proposed apparatus for splitting and merging a trained neural network model according to a user's query includes a model preprocessing unit (510) and a query execution unit (520).

The model preprocessing unit (510) stores the importance of the CNN channels for every task the neural network model has learned. In other words, the CNN channel importance is recorded for each task the model was trained on.

When a query for a partial task is received from the user, the query execution unit (520) calculates the importance required for that query from the CNN channel importance stored in the preprocessing step and makes the model lightweight by deleting the unnecessary parts.

FIG. 6 is a graph comparing performance in terms of information density according to an embodiment of the present invention with that of the prior art.

The graph of FIG. 6 compares, in terms of average information density, the results of applying existing channel pruning methods [3, 4] and the channel pruning method proposed in the present invention, which considers task-based importance, to two neural network models. Here, information density is performance divided by the number of model parameters, an index of how much performance a model delivers relative to its size. The experiments show that information density is higher when channels are pruned using the task-aware channel saliency, which supports the validity of the task-aware channel saliency proposed in the present invention.
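The metric itself is a single division. The sketch below uses made-up numbers chosen only to show how the index behaves; they are not results from the patent, and expressing the parameter count in millions is an arbitrary choice for the example.

```python
def information_density(performance, num_parameters):
    """Performance divided by the number of model parameters."""
    return performance / num_parameters


# Hypothetical values: same accuracy, parameter counts in millions.
original = information_density(0.75, 3.0)
pruned = information_density(0.75, 1.5)
print(original, pruned)  # 0.25 0.5
```

With accuracy held constant, halving the parameter count doubles the information density, which is why the index favors a pruned model that preserves performance.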

Table 1 compares the results of an experiment, based on the VGG 16 neural network model, in which a model specialized for a partial task is generated when a query for that task is given by the user.

The specialized model generated by the method proposed in the present invention performs better than training on the given data from scratch (joint training) or the knowledge-distillation-based methods (UHC, KD) [5, 8]. In addition, the time needed to generate the specialized model is shorter than for the other methods.

References

[1] S. Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, in ICLR, 2016.

[2] M. Jaderberg et al., Speeding up Convolutional Neural Networks with Low Rank Expansions, in Proc. BMVC, 2014.

[3] J.H. Luo et al., ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression, in ICCV, 2017.

[4] Y. He et al., Channel Pruning for Accelerating Very Deep Neural Networks, in ICCV, 2017.

[5] G. Hinton et al., Distilling the Knowledge in a Neural Network, in NIPS, 2014.

[6] D. Kang et al., NoScope: Optimizing Neural Network Queries over Video at Scale, in VLDB, 2017.

[7] K. Hsieh et al., Focus: Querying Large Video Datasets with Low Latency and Low Cost, in OSDI, 2018.

[8] J. Vongkulbhisal et al., Unifying Heterogeneous Classifiers with Distillation, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 16-20, 2019, pp. 3175-3184.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A method of splitting and merging a trained neural network model, the method comprising:
a model preprocessing step of storing the importance of convolutional neural network (CNN) channels for all tasks learned by the neural network model; and
a query step of, when a query for a partial task to be processed is input from a user, reducing the weight of the model by calculating the importance required for the input query and deleting unnecessary parts, wherein, depending on whether a raw task is given as the input query, the query step either selects, from the importance lists of the CNN channels stored in the model preprocessing step, the importance list of the CNN channels for the task corresponding to the input query and proceeds with channel pruning, or adds all of the stored importance lists of the CNN channels for the tasks corresponding to the input query to calculate the importance required for the input query,
wherein the model preprocessing step of storing the importance of the CNN channels for all tasks learned by the neural network model
calculates, for all tasks learned by the neural network model, the activation degree of the CNN channels using the respective data, calculates the importance of the original filter for each task based on the calculated activation degree, and stores the calculated importance as a list for use in the query step.

(deleted)

The method according to claim 1,
wherein the query step of reducing the weight of the model,
when a raw task, which is the smallest task, is input as the query, selects the previously stored importance list of the CNN channels for that task, performs channel pruning, and then generates a final specialized model through fine-tuning.

The method according to claim 1,
wherein the query step of reducing the weight of the model,
when an arbitrary task larger than a raw task is input as the query, recalculates the channel importance for that task by adding all of the importance lists of the CNN channels for the raw tasks precomputed in the model preprocessing step, thereby obtaining the channel importance for the whole task.

An apparatus for splitting and merging a trained neural network model, the apparatus comprising:
a model preprocessor that stores the importance of convolutional neural network (CNN) channels for all tasks learned by the neural network model; and
a query execution unit that, when a query for a partial task to be processed is input from a user, reduces the weight of the model by calculating the importance required for the input query and deleting unnecessary parts, wherein, depending on whether a raw task is given as the input query, the query execution unit either selects, from the importance lists of the CNN channels stored in the model preprocessor, the importance list of the CNN channels for the task corresponding to the input query and proceeds with channel pruning, or adds all of the stored importance lists of the CNN channels for the tasks corresponding to the input query to calculate the importance required for the input query,
wherein the model preprocessor
calculates, for all tasks learned by the neural network model, the activation degree of the CNN channels using the respective data, calculates the importance of the original filter for each task based on the calculated activation degree, and stores the calculated importance as a list for use in the query execution unit.

(deleted)

The apparatus according to claim 5,
wherein the query execution unit,
when a raw task, which is the smallest task, is input as the query, selects the previously stored importance list of the CNN channels for that task, performs channel pruning, and then generates a final specialized model through fine-tuning.

The apparatus according to claim 5,
wherein the query execution unit,
when an arbitrary task larger than a raw task is input as the query, recalculates the channel importance for that task by adding all of the importance lists of the CNN channels for the raw tasks precomputed in the model preprocessor, thereby obtaining the channel importance for the whole task.
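
The preprocessing and query steps recited in the claims can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names (`preprocess`, `query_importance`, `channels_to_keep`), the toy task names, and the use of mean absolute activation as the importance score are all hypothetical, since the claims only state that importance is derived from the activation degree of each channel. Fine-tuning after pruning is omitted.

```python
from typing import Dict, List, Sequence

ImportanceList = List[float]


def preprocess(
    activations: Dict[str, Sequence[Sequence[float]]]
) -> Dict[str, ImportanceList]:
    """Preprocessing step: store one channel-importance list per raw task.

    activations[task] holds one activation vector (one value per channel)
    for each data sample of that task.
    ASSUMPTION: importance = mean absolute activation of the channel.
    """
    store: Dict[str, ImportanceList] = {}
    for task, samples in activations.items():
        n_channels = len(samples[0])
        store[task] = [
            sum(abs(sample[c]) for sample in samples) / len(samples)
            for c in range(n_channels)
        ]
    return store


def query_importance(
    store: Dict[str, ImportanceList], query: Sequence[str]
) -> ImportanceList:
    """Query step: importance required for the queried task.

    A single raw task selects its stored list directly; a larger task
    adds the stored lists of its raw tasks channel by channel.
    """
    if len(query) == 1:
        return list(store[query[0]])
    return [sum(values) for values in zip(*(store[t] for t in query))]


def channels_to_keep(importance: ImportanceList, keep: int) -> List[int]:
    """Channel pruning: indices of the `keep` most important channels."""
    ranked = sorted(
        range(len(importance)), key=lambda c: importance[c], reverse=True
    )
    return sorted(ranked[:keep])


# Toy data: two raw tasks, three channels, two samples per task.
toy_activations = {
    "cat": [[1.0, 0.0, 0.2], [0.8, 0.0, 0.4]],
    "dog": [[0.0, 0.9, 0.3], [0.0, 0.7, 0.1]],
}
store = preprocess(toy_activations)
print(channels_to_keep(query_importance(store, ["cat"]), keep=1))
print(channels_to_keep(query_importance(store, ["cat", "dog"]), keep=2))
```

Because the per-task lists are computed once and only added at query time, a query over any combination of raw tasks needs no new pass over the training data in this sketch; only the pruning (and, per the dependent claims, fine-tuning) runs per query.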