KR20210038027A

KR20210038027A - Method for Training to Compress Neural Network and Method for Using Compressed Neural Network

Info

Publication number: KR20210038027A
Application number: KR1020190120627A
Authority: KR
Inventors: 최현철; 김민성
Original assignee: 영남대학교 산학협력단
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2021-04-07
Also published as: KR102305981B1

Abstract

Disclosed are a neural network compression training method and a method for using a compressed neural network. One aspect of the present invention provides a method for compressing a neural network, and more particularly a compression method that reduces the number of parameters the neural network must learn, thereby reducing its computational load and computation time. The neural network compression training method comprises: obtaining a plurality of batches and providing the plurality of batches to a neural network; and updating parameters of the neural network for each batch.

Description

Neural Network Compression Training Method and Method for Using Compressed Neural Network {Method for Training to Compress Neural Network and Method for Using Compressed Neural Network}

Embodiments of the present invention relate to a method of compressing a neural network, and in particular to a neural network compression method that reduces the number of parameters the neural network must learn, thereby reducing its computational load and computation time.

The content described in this section merely provides background information on the present invention and does not constitute prior art.

Recent deep-learning-based image classification and image style transfer methods use neural networks having a large number of channels. Such a neural network must learn a large number of parameters to perform a typical image classification or image style transfer task. As the number of parameters to be learned grows, the computational load and power consumption of the processor increase exponentially. It is therefore necessary to compress the neural network in view of the processor's computational load, processing speed, and power consumption.

Channel pruning was devised to compress neural networks. In channel pruning, the parameters are first learned until the loss function of the neural network falls to a given criterion, and the neural network is then trained so that parameters (weights) of small magnitude are removed from it.

In particular, the conventional channel pruning method used in a convolutional neural network is performed in two steps. The first step compresses the neural network by removing filter parameters of small magnitude from the trained network; the second step retrains the filter parameters of the compressed network.

However, the existing channel pruning method removes small parameters by considering only the magnitude of the filter parameters, without considering the magnitude of the elements in the input feature map fed to the convolutional neural network. Even if a filter's parameters are small, the filter can produce a feature map containing large elements when the elements of the input feature map are large. If all small filter parameters are removed regardless of the magnitude of the input feature map, output feature maps with large values that those small parameters could have produced are removed as well, which degrades the performance of the neural network.

In addition, after compressing the neural network, the conventional channel pruning method must additionally retrain the filter parameters of the compressed network in order to compensate for the performance degradation caused by removing filter parameters. Because it requires both a compression pass and a retraining pass, the conventional channel pruning method increases the time needed to train the neural network.

A main object of embodiments of the present invention is to reduce the number of parameters to be learned by a neural network by removing filters that generate zero feature maps, where the decision is based on the feature maps the filters generate and is independent of the magnitude of the input feature maps fed to the filters.

An object of another embodiment of the present invention is to reduce the number of learned parameters while maintaining the performance of the neural network, by training the filters so that each filter outputs either a zero feature map for every image or a non-zero feature map for every image, and then removing the filters that generate zero feature maps.

According to one aspect of the present invention, there is provided a training method for compressing a neural network that generates feature maps, using a plurality of filters, for a plurality of batches each including a plurality of images, the method comprising: obtaining the plurality of batches and providing the plurality of batches to the neural network; and updating parameters of the neural network for each batch, wherein the updating comprises: obtaining, using the neural network, a plurality of feature maps corresponding to each image included in the batch; classifying the plurality of feature maps corresponding to each image into zero feature maps and non-zero feature maps; and modifying the parameters of the neural network using the number of zero feature maps and the number of filters that output, for two different images among the plurality of images, either zero feature maps for both or non-zero feature maps for both, wherein a zero feature map is a feature map in which all elements are 0, and a non-zero feature map is a feature map in which at least one element is not 0.

According to another aspect of the present embodiment, there is provided an image processing method comprising: obtaining an input image; providing the input image to a first neural network, the first neural network generating feature maps for the input image using a plurality of filters; and obtaining a result of a second neural network processing the image using the feature maps received from the first neural network, wherein the first neural network has been trained together with the second neural network on a plurality of batches each including a plurality of images, and the training comprises, for each batch: obtaining, using the first neural network, a plurality of feature maps corresponding to each image included in the batch; classifying the plurality of feature maps corresponding to each image into zero feature maps and non-zero feature maps; and modifying the parameters of the first neural network using the number of zero feature maps and the number of filters that output, for two different images among the plurality of images, either zero feature maps for both or non-zero feature maps for both, wherein a zero feature map is a feature map in which all elements are 0, and a non-zero feature map is a feature map in which at least one element is not 0.

As described above, according to an embodiment of the present invention, filters that generate zero feature maps are removed based on the feature maps the filters generate, regardless of the magnitude of the input feature maps fed to the filters. This reduces the number of parameters to be learned and thereby the computational load and computation time of the neural network.

According to another embodiment of the present invention, the filters are trained so that each outputs either a zero feature map for every image or a non-zero feature map for every image, and the filters that generate zero feature maps are then removed. This reduces the number of learned parameters, and thereby the computational load and computation time, while maintaining the performance of the neural network.

FIGS. 1A and 1B are diagrams for explaining a channel pruning method, one of the methods of compressing a convolutional neural network.
FIG. 2 is a diagram illustrating a process of compressing a neural network according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a neural network compression method according to an embodiment of the present invention in terms of equations.
FIG. 4 is a flowchart illustrating a compression training method of a neural network according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of using a compressed neural network according to an embodiment of the present invention.

Hereinafter, some embodiments of the present invention will be described in detail with reference to exemplary drawings. In assigning reference numerals to the elements of each drawing, the same elements are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they could obscure the subject matter of the invention.

In describing the elements of the present invention, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one element from another and do not limit the nature, sequence, or order of the elements. Throughout the specification, when a part is said to 'include' or 'comprise' an element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise. Terms such as 'unit' and 'module' used in the specification mean a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

FIGS. 1A and 1B are diagrams for explaining a channel pruning method, one of the methods of compressing a convolutional neural network.

Referring to FIG. 1A, input feature maps 100, kernels, that is, first filters 102, intermediate feature maps 104, second filters 106, and output feature maps 108 are illustrated.

The input feature maps 100 are extracted from an input image and are described here as having three channels. The three input feature maps 100 are input to five first filters 102. Each filter generates one intermediate feature map 104 from the three channels, so the number of intermediate feature maps 104 is determined by the number of first filters. The five intermediate feature maps 104 are input to two second filters 106, and the two second filters together generate two output feature maps 108 from the five intermediate feature maps 104, one per filter.

To apply channel pruning, filters whose parameters have small magnitude are removed. When the parameters of two of the five first filters 102 are small, those two filters are judged not to affect the performance of the neural network, and the two intermediate feature maps they generate are likewise judged not to affect its performance.

Referring to FIG. 1B, the process of retraining the neural network after removing the two filters composed of small parameters is shown.

The three input feature maps 100 are input to three first filters 112, and the three first filters 112 generate three intermediate feature maps 114. The three intermediate feature maps 114 are input to two second filters 116, and the two second filters 116 generate two output feature maps 118.
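The effect of the pruning in FIGS. 1A and 1B on the parameter count can be worked through with a little arithmetic. The sketch below assumes square 3x3 kernels and no bias terms; neither is stated in the text, and `conv_params` is a hypothetical helper introduced only for illustration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Number of weights in a convolution layer without bias (assumed layout)."""
    return in_channels * out_channels * kernel_size * kernel_size

K = 3  # assumed kernel size; the text does not specify one

# FIG. 1A: 3 input channels -> 5 first filters -> 2 second filters
before = conv_params(3, 5, K) + conv_params(5, 2, K)

# FIG. 1B: two first filters removed, so 3 input channels -> 3 -> 2
after = conv_params(3, 3, K) + conv_params(3, 2, K)

print(before, after)  # 225 135
```

Removing two first filters shrinks both layers, because each second filter also loses the two input channels that those filters used to produce.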

When the two filters are removed, the two intermediate feature maps they generated before removal are removed along with them. Even if the parameters of the two filters are small, the two intermediate feature maps may contain elements of large magnitude. Removing the two filters therefore also removes feature maps that can affect the performance of the neural network, so the performance may be degraded.

To compensate for the performance degradation caused by removing two of the five first filters, the neural network retrains the parameters of the three remaining first filters 112.
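The weakness of magnitude-based pruning described above can be illustrated with a toy example. The sketch below reduces each filter to a single 1x1 weight, and all numeric values are hypothetical; it only shows that a small weight applied to a large input can yield a larger output than a large weight applied to a small input.

```python
def conv1x1(feature_map, weight):
    """Apply a single-channel 1x1 filter: scale every element by `weight`."""
    return [[weight * v for v in row] for row in feature_map]

def max_abs(feature_map):
    """Largest absolute element of a feature map."""
    return max(abs(v) for row in feature_map for v in row)

# Hypothetical inputs and weights, chosen only for illustration.
large_input = [[800.0, 950.0], [700.0, 1000.0]]
small_input = [[0.1, 0.2], [0.05, 0.1]]

small_filter = 0.01  # a magnitude criterion would prune this filter
large_filter = 0.5   # a magnitude criterion would keep this filter

out_from_small_filter = conv1x1(large_input, small_filter)
out_from_large_filter = conv1x1(small_input, large_filter)

# The filter that would be pruned produces the larger output here.
print(max_abs(out_from_small_filter) > max_abs(out_from_large_filter))
```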

FIG. 2 is a diagram illustrating a process of compressing a neural network according to an embodiment of the present invention.

Referring to FIG. 2, a first image 210, a second image 220, a neural network 230, four filters 232, 234, 236, and 238, four first feature maps 212, 214, 216, and 218 for the first image, and four second feature maps 222, 224, 226, and 228 for the second image are shown.

The first image 210 and the second image 220 are training data for the neural network. Images are only one example of training data; the inputs may instead be data representing a plurality of characters as a multidimensional matrix.

The neural network 230 is shown with four filters 232, 234, 236, and 238, but this is only one embodiment and the network may include any number of filters. The neural network 230 receives the first image 210 and the second image 220 and generates feature maps using the four filters 232, 234, 236, and 238. Each of the first feature maps 212, 214, 216, and 218 and the second feature maps 222, 224, 226, and 228 includes a plurality of elements.

The first feature maps 212, 214, 216, and 218 are the feature maps that the neural network 230 generates for the first image 210, and the second feature maps 222, 224, 226, and 228 are those it generates for the second image 220. Specifically, the first filter 232 generates the feature map 212 for the first image and the feature map 222 for the second image. Likewise, the second filter 234, the third filter 236, and the fourth filter 238 generate the feature maps 214, 216, and 218 for the first image and the feature maps 224, 226, and 228 for the second image, respectively.

The first feature maps 212, 214, 216, and 218 and the second feature maps 222, 224, 226, and 228 are classified into zero feature maps and non-zero feature maps. A zero feature map is a feature map in which every element is 0, and a non-zero feature map is a feature map in which at least one element is not 0. Among the first feature maps, 212 and 214 are non-zero feature maps and 216 and 218 are zero feature maps. Among the second feature maps, 224 and 226 are non-zero feature maps and 222 and 228 are zero feature maps.
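The zero/non-zero classification can be sketched in a few lines. The example below keys each feature map by its reference numeral in FIG. 2 and follows the zero pattern given for that figure (zero feature maps 216, 218, 222, and 228); the element values and the 2x2 map size are hypothetical.

```python
def is_zero_feature_map(feature_map):
    """True if every element of the feature map is 0."""
    return all(v == 0 for row in feature_map for v in row)

# Hypothetical 2x2 feature maps; only the zero pattern matters here.
ZERO = [[0.0, 0.0], [0.0, 0.0]]
NONZERO = [[0.0, 1.5], [0.0, 0.0]]  # a single non-zero element is enough

first_maps = {212: NONZERO, 214: NONZERO, 216: ZERO, 218: ZERO}
second_maps = {222: ZERO, 224: NONZERO, 226: NONZERO, 228: ZERO}

zero_ids = sorted(
    ref
    for maps in (first_maps, second_maps)
    for ref, fm in maps.items()
    if is_zero_feature_map(fm)
)
print(zero_ids)  # [216, 218, 222, 228]
```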

The parameters of the neural network 230 may be modified based in part on the number of zero feature maps (216, 218, 222, 228) and on the number of filters (238) that output a zero feature map for both the first image 210 and the second image 220.

A training method for compressing a neural network according to an embodiment of the present invention modifies the parameters of the neural network 230 so as to increase the number of filters that generate either zero feature maps for both the first image 210 and the second image 220 or non-zero feature maps for both.

A training method for compressing a neural network according to another embodiment of the present invention modifies the parameters of the neural network 230 so as to increase the number of zero feature maps among the feature maps generated for the first image 210 and the second image 220.

A training method for compressing a neural network according to an embodiment of the present invention may update the parameters of the neural network batch by batch over a plurality of batches. Here, a batch is a unit containing a plurality of images that serve as training data. After the parameters have been updated over the plurality of batches, two different images among the plurality of images are provided to the neural network, and those filters among the plurality of filters that generate zero feature maps for both images are removed.
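The filter-removal step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: filters are keyed by their FIG. 2 reference numerals, the feature-map values and filter weights are hypothetical, and `removable_filters` and `prune` are names introduced only for this example.

```python
def is_zero_feature_map(feature_map):
    """True if every element of the feature map is 0."""
    return all(v == 0 for row in feature_map for v in row)

def removable_filters(maps_by_filter):
    """Filters whose feature maps are zero for every image provided."""
    return [
        fid
        for fid, maps in maps_by_filter.items()
        if all(is_zero_feature_map(m) for m in maps)
    ]

def prune(filters, to_remove):
    """Return a copy of the filter table without the removed filters."""
    return {fid: w for fid, w in filters.items() if fid not in to_remove}

ZERO = [[0.0, 0.0], [0.0, 0.0]]
NONZERO = [[0.0, 1.5], [0.0, 0.0]]

# filter id -> [feature map for first image, feature map for second image]
maps_by_filter = {
    232: [NONZERO, ZERO],
    234: [NONZERO, NONZERO],
    236: [ZERO, NONZERO],
    238: [ZERO, ZERO],
}
filters = {232: [0.3], 234: [0.7], 236: [0.2], 238: [0.1]}  # placeholder weights

to_remove = removable_filters(maps_by_filter)
compressed = prune(filters, to_remove)
print(to_remove, sorted(compressed))  # [238] [232, 234, 236]
```

Only filter 238 is removed, because it is the only filter that outputs a zero feature map for both images.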

Specifically, the training method for compressing a neural network provides one of the plurality of batches to the neural network and uses the neural network to obtain a plurality of feature maps corresponding to each image in that batch. The feature maps corresponding to each image are classified into zero feature maps and non-zero feature maps. The parameters of the neural network may then be modified based on the number of zero feature maps and on the number of filters that output, for two different images in the batch, either zero feature maps for both or non-zero feature maps for both. The neural network is trained by repeating this process for each batch. For example, when two batches are given and each batch includes eight images, the training method may update the parameters of the neural network a total of 28 times for one batch. The figure 28 is C(8, 2), the number of combinations of two different images chosen from the eight.
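The count of 28 updates per batch follows from the number of unordered image pairs. A quick check in Python, where the image indices 0 to 7 are placeholders for the eight images of one batch:

```python
from itertools import combinations
from math import comb

batch_size = 8
image_indices = range(batch_size)  # placeholder indices for the 8 images

# Every unordered pair (i, j) of distinct images in the batch.
pairs = list(combinations(image_indices, 2))

print(len(pairs), comb(batch_size, 2))  # 28 28
```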

Through the process described with reference to FIG. 2, the neural network is trained and compressed at the same time. Because training and compression of the parameters occur together, the number of parameters the neural network learns is reduced and its performance is maintained without a separate retraining step.

FIG. 3 is a diagram for explaining a neural network compression method according to an embodiment of the present invention in terms of equations.

The neural network compression training method according to an embodiment of the present invention uses a loss function containing two losses: a channel loss and an xor (exclusive-or) loss.

The channel loss is a term that increases the number of zero feature maps among the feature maps output from a convolution layer. That is, the smaller the channel loss, the more zero feature maps there are among the output feature maps.

The xor loss is a term for identifying filters that generate a zero feature map even when the input image changes, so that they can be removed. Using the xor loss increases the number of filters that generate either zero feature maps for both input images or non-zero feature maps for both. That is, the smaller the xor loss, the more filters there are that always output a zero feature map, or always output a non-zero feature map, for any input image.

Using these two losses, an embodiment of the present invention removes the filters that always generate a zero feature map for any image, thereby reducing the number of learned parameters while maintaining the performance of the neural network.

Referring to FIG. 3, B is a batch, the unit over which the parameters of the neural network are updated, and a convolution layer is a component of the neural network that includes a plurality of filters and outputs a plurality of feature maps. FIG. 3 also denotes the feature maps output from the l-th convolution layer and the number of channels of that layer, that is, the number of feature maps output from the l-th layer.

The training method according to an embodiment of the present invention provides image i and image j included in batch B to the neural network and obtains feature maps from convolution layer l of the neural network. The plurality of feature maps for image i and the plurality of feature maps for image j are then classified into zero feature maps and non-zero feature maps through L0-normalization. The xor loss is calculated based in part on the sum of the number of filters that generate either zero feature maps for both image i and image j or non-zero feature maps for both. The channel loss is calculated based in part on the sum of the number of filters that generate non-zero feature maps for image i and for image j.
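The counts that the two losses are said to be based on can be computed directly from per-filter zero/non-zero indicators. The sketch below is not Equation 1 or Equation 2; it only computes the raw counts named in the text, plus an exclusive-or count of filters whose outputs differ between the two images, which is an assumption suggested by the name of the loss. All function names and feature-map values are hypothetical.

```python
def l0_indicator(feature_map):
    """Return 0 if every element is 0, otherwise 1."""
    return 0 if all(v == 0 for row in feature_map for v in row) else 1

def nonzero_count(ind_i, ind_j):
    """Non-zero feature maps for image i plus those for image j."""
    return sum(ind_i) + sum(ind_j)

def agreement_count(ind_i, ind_j):
    """Filters that are zero for both images or non-zero for both."""
    return sum(1 for a, b in zip(ind_i, ind_j) if a == b)

def xor_count(ind_i, ind_j):
    """Filters that are zero for one image and non-zero for the other."""
    return sum(a ^ b for a, b in zip(ind_i, ind_j))

ZERO = [[0.0, 0.0], [0.0, 0.0]]
NONZERO = [[0.0, 2.0], [0.0, 0.0]]

# One feature map per filter, following the zero pattern of FIG. 2.
maps_i = [NONZERO, NONZERO, ZERO, ZERO]
maps_j = [ZERO, NONZERO, NONZERO, ZERO]

ind_i = [l0_indicator(m) for m in maps_i]
ind_j = [l0_indicator(m) for m in maps_j]

print(nonzero_count(ind_i, ind_j))    # 4
print(agreement_count(ind_i, ind_j))  # 2
print(xor_count(ind_i, ind_j))        # 2
```

For any pair of images the agreement count and the exclusive-or count add up to the number of filters, so driving one down drives the other up.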

Specifically, the channel loss and the xor loss are calculated by Equation 1 and Equation 2, respectively. In Equations 1 and 2, the L0-norm of a feature map x returns 0 if all elements of x are 0 and returns 1 if at least one element is not 0. In Equation 2, t denotes the channel index.

According to an embodiment of the present invention, the norm term used in Equations 1 and 2 may be modified as in Equation 3, which uses the L2-norm (L2 normalization). This is done so that the loss function containing the channel loss and the xor loss is differentiable.
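Why an L2-norm is a workable stand-in for the zero/non-zero indicator can be seen in a small sketch. Equation 3 itself is not reproduced here; the code below only assumes the plain Euclidean norm over all elements of a feature map, and the sample values are hypothetical.

```python
import math

def l0_indicator(feature_map):
    """Return 0 if every element is 0, otherwise 1 (a step function)."""
    return 0 if all(v == 0 for row in feature_map for v in row) else 1

def l2_norm(feature_map):
    """Euclidean norm over all elements of the feature map."""
    return math.sqrt(sum(v * v for row in feature_map for v in row))

zero_map = [[0.0, 0.0], [0.0, 0.0]]
tiny_map = [[0.0, 1e-3], [0.0, 0.0]]
big_map = [[3.0, 4.0], [0.0, 0.0]]

for fm in (zero_map, tiny_map, big_map):
    print(l0_indicator(fm), l2_norm(fm))
```

Both quantities are 0 exactly when the feature map is all zeros, but the indicator jumps straight to 1 for any non-zero element and so gives no useful gradient, whereas the L2-norm grows smoothly with the element values.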

FIG. 4 is a flowchart illustrating a compression training method of a neural network according to an embodiment of the present invention.

The processes described below may be performed using one or more memories and one or more processors.

First, a plurality of images is obtained from a memory that stores the images (S400). The images may be obtained in units of batches, each batch including a plurality of images.

The plurality of batches is provided to the neural network, and the feature maps that the neural network generates, using a plurality of filters, for each image included in one batch are received (S402).

The feature maps corresponding to each image are classified into zero feature maps and non-zero feature maps (S404).

The parameters of the neural network are modified so as to increase the number of filters that generate, for two different images among the plurality of images, either zero feature maps for both or non-zero feature maps for both (S406).

The parameters of the neural network are modified so as to increase the number of zero feature maps among the plurality of feature maps (S408).

The neural network training method according to an embodiment of the present invention may repeat steps S402 to S408 for each batch. Steps S406 and S408 may also be performed simultaneously.

도 5는 본 발명의 일 실시예에 따른 압축된 신경망을 이용하는 방법을 설명하기 위한 순서도이다.5 is a flowchart illustrating a method of using a compressed neural network according to an embodiment of the present invention.

압축된 신경망을 이용하여 이미지를 처리하는 방법으로서, 훈련된 제1 신경망과 훈련된 제2 신경망을 이용하는 방법을 설명한다. As a method of processing an image using a compressed neural network, a method of using the trained first neural network and the trained second neural network will be described.

Referring to FIG. 5, a plurality of input images is acquired from a memory (S500).

The acquired input images are provided to the first neural network (S502). The first neural network generates feature maps for each of the input images using a plurality of filters. The feature maps generated by the first neural network are passed to the second neural network.

A result of the second neural network processing the images using the feature maps received from the first neural network is obtained (S504).

Here, when the second neural network is an image classifier, a result of the second neural network classifying the plurality of images may be obtained.

On the other hand, when the second neural network is an image style converter, a composite image may be obtained in which image style transfer has been applied to two different images among the plurality of images. Image style transfer refers to a technique that extracts content features from one of the two images, extracts style features from the other image, and outputs an image combining the content features and the style features.

Meanwhile, the first neural network and the second neural network may be trained together, as described below.

In the training method, for each batch among a plurality of batches, a plurality of feature maps corresponding to each image included in the batch is first obtained using the first neural network. The feature maps generated by the first neural network are passed to the second neural network. The feature maps corresponding to each image are then divided into zero feature maps and non-zero feature maps. The parameters of the first neural network are modified using the number of zero feature maps and the number of filters that output, for two different images among the plurality of images, either zero feature maps for both images or non-zero feature maps for both images.

According to an embodiment of the present invention, when the first neural network and the second neural network are trained to classify a plurality of images, the second neural network may be an image classifier. In the training method, a plurality of images is first provided to the first neural network. The first neural network generates a plurality of feature maps for the images and passes them to the second neural network, and the second neural network labels each of the images using the feature maps. A loss function is derived from the labeled images and preset ground-truth data. The channel loss and the xor loss of the first neural network are added to the derived loss function. Finally, the first neural network and the second neural network may be trained by jointly modifying the parameters of the first neural network and the parameters of the second neural network so that the loss function including the channel loss and the xor loss decreases to a preset value.
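The way the three loss terms are combined can be written down in a few lines. This is only a sketch: how the channel loss and the xor loss are computed is not covered in this passage, so they are treated here as already-computed scalars, and the plain unweighted sum is an assumption based on the wording that the terms are "added". The names `total_loss` and `reached_target` are hypothetical.

```python
def total_loss(task_loss: float, channel_loss: float, xor_loss: float) -> float:
    """Sum the task loss with the two compression terms (unweighted, assumed)."""
    return task_loss + channel_loss + xor_loss


def reached_target(loss: float, preset_value: float) -> bool:
    """Stopping check: training ends once the combined loss is at or below the preset value."""
    return loss <= preset_value


# Toy scalar values (assumed) standing in for one training step.
loss = total_loss(task_loss=0.5, channel_loss=0.25, xor_loss=0.125)
done = reached_target(loss, preset_value=1.0)
```

Because the same combined loss drives the updates of both networks, the second network can adapt to the sparser feature maps the first network learns to produce.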

On the other hand, according to an embodiment of the present invention, when the first neural network and the second neural network are trained to convert image style, the first neural network may be an encoder and the second neural network may be a decoder.

In the training method, two different images are first provided to the first neural network. The first neural network extracts a content-related feature map from one image and a style-related feature map from the other image, and passes both feature maps to the second neural network. The second neural network uses the content-related feature map and the style-related feature map to generate an image in which the content and the style are combined. A loss function is derived from the synthesized image and preset ground-truth data. The channel loss and the xor loss of the first neural network are added to the derived loss function. Finally, the first neural network and the second neural network may be trained by jointly modifying the parameters of the first neural network and the parameters of the second neural network so that the loss function including the channel loss and the xor loss decreases to a preset value.

Although FIGS. 4 and 5 describe steps S400 to S504 as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present invention. In other words, a person of ordinary skill in the art to which an embodiment of the present invention pertains could, without departing from the essential characteristics of the embodiment, change the order described in FIGS. 4 and 5 or execute one or more of steps S400 to S504 in parallel. FIGS. 4 and 5 are therefore not limited to a time-series order.

Meanwhile, the processes shown in FIGS. 4 and 5 can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices that store data readable by a computer system. That is, the computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optically readable media (e.g., CD-ROM, DVD). The computer-readable recording medium may also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. In addition, a system may include one or more processors and one or more memories storing instructions, wherein the instructions, when executed by the one or more processors, cause the one or more processors to perform steps S400 to S408 or steps S500 to S504.

The above description merely illustrates the technical idea of the present embodiment, and those of ordinary skill in the art to which the present embodiment pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the present embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present embodiment.

100: input feature map 108: output feature map
210: first image 220: second image
230: neural network

Claims

In a training method for compressing a neural network that generates feature maps using a plurality of filters for a plurality of batches including a plurality of images,
Obtaining the plurality of batches and providing the plurality of batches to the neural network; and
Including the process of updating the parameters of the neural network for each batch,
The updating process,
Obtaining a plurality of feature maps corresponding to the images included in the batch using the neural network;
Dividing the plurality of feature maps corresponding to the respective images into a zero feature map and a non-zero feature map; and
A process of modifying a parameter of the neural network using the number of zero feature maps and the number of filters that output a zero feature map or a non-zero feature map for two different images among the plurality of images,
The zero feature map is a feature map in which all elements are 0, and the non-zero feature map is a feature map in which at least one of the elements is non-zero.

The method of claim 1,
The training method wherein the neural network includes at least one convolution layer, and the plurality of filters are included in the at least one convolution layer.

The method of claim 1,
Wherein the modification process comprises modifying the parameter so as to increase the number of zero feature maps and the number of filters.

As an image processing method,
Obtaining an input image;
Providing the input image to a first neural network, wherein the first neural network generates feature maps for the input image using a plurality of filters; And
A second neural network obtaining a result of processing the image using the feature maps received from the first neural network,
The first neural network is trained with the second neural network for a plurality of batches including a plurality of images, and the training method includes, for each batch,
Acquiring a plurality of feature maps corresponding to the images included in the batch using the first neural network;
Dividing the plurality of feature maps corresponding to each image into a zero feature map and a non-zero feature map; And
Modifying parameters of the first neural network using the number of zero feature maps and the number of filters outputting a zero feature map or a non-zero feature map for two different images among the plurality of images;
Including,
The zero feature map is a feature map in which all elements are 0, and the non-zero feature map is a feature map in which at least one of the elements is non-zero.

The method of claim 4,
The first neural network includes at least one convolutional layer, and the plurality of filters are included in the at least one convolutional layer.

The method of claim 4,
Wherein the modification process comprises modifying the parameter so as to increase the number of zero feature maps and the number of filters.

The method of claim 4,
The first neural network is an encoder, and the second neural network is a decoder.

The method of claim 4,
The first neural network is an encoder, and the second neural network is an image classifier.

In a system for compressing a neural network,
One or more processors; And
Including one or more memories for storing instructions,
When the instruction is executed by the one or more processors, the instruction causes the one or more processors to perform a neural network training method,
The training method,
A training method for compressing the neural network generating feature maps using a plurality of filters for a plurality of batches including a plurality of images,
Acquiring the plurality of batches and providing the plurality of batches to the neural network; and
Including the process of updating the parameters of the neural network for each batch,
The updating process,
Obtaining a plurality of feature maps corresponding to the images included in the batch using the neural network;
Dividing the plurality of feature maps corresponding to each of the images into a zero feature map and a non-zero feature map; and
A process of modifying a parameter of the neural network using the number of zero feature maps and the number of filters that output a zero feature map or a non-zero feature map for two different images among the plurality of images,
The zero feature map is a feature map in which all elements are 0, and the non-zero feature map is a feature map in which at least one of the elements is not zero.

The system of claim 9,
Wherein the neural network includes at least one convolutional layer, and the plurality of filters are included in the at least one convolutional layer.

The system of claim 9,
Wherein the modification process comprises modifying the parameter so as to increase the number of zero feature maps and the number of filters.

In a system for processing images,
One or more processors; And
Including one or more memories for storing instructions,
When the instruction is executed by the one or more processors, the instruction causes the one or more processors to perform an image processing method,
The image processing method,
Obtaining an input image;
Providing the input image to a first neural network, wherein the first neural network generates feature maps for the input image using a plurality of filters; And
A second neural network obtaining a result of processing the image using the feature maps received from the first neural network,
The first neural network is trained with the second neural network for a plurality of batches including a plurality of images, and the training method includes, for each batch,
Acquiring a plurality of feature maps corresponding to the images included in the batch using the first neural network;
Dividing the plurality of feature maps corresponding to each image into a zero feature map and a non-zero feature map; And
Modifying parameters of the first neural network using the number of zero feature maps and the number of filters outputting a zero feature map or a non-zero feature map for two different images among the plurality of images;
Including,
The zero feature map is a feature map in which all elements are 0, and the non-zero feature map is a feature map in which at least one of the elements is not zero.

The system of claim 12,
Wherein the first neural network includes at least one convolutional layer, and the plurality of filters are included in the at least one convolutional layer.

The system of claim 12,
Wherein the modification process comprises modifying the parameter so as to increase the number of zero feature maps and the number of filters.

The system of claim 12,
The first neural network is an encoder, and the second neural network is a decoder.

The system of claim 12,
The first neural network is an encoder, and the second neural network is an image classifier.