KR102597079B1

KR102597079B1 - Method and device for compression of convolution neural network using n-mode tensor product operation

Info

Publication number: KR102597079B1
Application number: KR1020210117675A
Authority: KR
Inventors: 최윤식; 김태현
Original assignee: 연세대학교 산학협력단
Priority date: 2021-09-03
Filing date: 2021-09-03
Publication date: 2023-10-31
Also published as: KR20230034655A

Abstract

The disclosed technology relates to a method and device for compressing a convolutional neural network using an n-mode tensor product operation. The method comprises: selecting, by a computing device, a specific layer from among a plurality of layers included in a convolutional neural network (CNN); calculating, by the computing device, an importance for each of a plurality of weight filters included in the specific layer by performing an n-mode tensor product operation on each of the weight filters; and removing, by the computing device, those weight filters whose importance is below a threshold according to the calculation result.

Description

Method and device for compression of convolutional neural network using n-mode tensor product operation

The disclosed technology relates to a method and device for compressing a convolutional neural network by using an n-mode tensor product operation to calculate the influence that a weight filter of a specific layer has on the next layer.

Recently, various machine learning and computer vision techniques have achieved very high performance by using deep convolutional neural networks. However, because of the structure of such networks, many existing techniques based on them unavoidably demand a high level of computing power. To address the high memory and computational complexity of deep convolutional neural networks, compression techniques based on filter selection have been proposed. These techniques identify and remove redundant weight filters, among those making up a convolutional layer, that have little or no effect on the performance of the network.

Existing filter selection methods measure the norm of each weight filter in a given convolutional layer, remove the filters with low norms, and keep the filters with high norms. Because such techniques process each convolutional layer independently, the evaluation of a filter's importance does not take into account the effect of that filter on later convolutional layers. In other words, existing filter selection methods suffer from a local optimum problem from the standpoint of network compression.

Korean Laid-open Patent Publication No. 10-2020-0052200

The disclosed technology aims to provide a method and device for compressing a convolutional neural network by using an n-mode tensor product operation to calculate the influence that a weight filter of a specific layer has on the next layer.

To achieve the above technical objective, a first aspect of the disclosed technology provides a convolutional neural network compression method comprising: selecting, by a computing device, a specific layer from among a plurality of layers included in a convolutional neural network (CNN); calculating, by the computing device, an importance for each of a plurality of weight filters included in the specific layer by performing an n-mode tensor product operation on each of the weight filters; and removing, by the computing device, those weight filters whose importance is below a threshold according to the calculation result.

To achieve the above technical objective, a second aspect of the disclosed technology provides a convolutional neural network compression device comprising: a storage device that stores a convolutional neural network (CNN) including a plurality of layers; and a computing device that selects a specific layer from among the plurality of layers, calculates the importance of a weight filter included in the specific layer by performing an n-mode tensor product operation on the weight filter, and removes the weight filter according to the calculation result.

Embodiments of the disclosed technology may have effects including the following advantages. However, this does not mean that every embodiment must include all of them, so the scope of rights of the disclosed technology should not be understood as being limited by them.

The convolutional neural network compression method and device using an n-mode tensor product operation according to an embodiment of the disclosed technology compress a convolutional neural network by performing an n-mode tensor product operation on specific weight filters.

In addition, compressing the convolutional neural network efficiently reduces memory consumption and computational complexity.

In addition, calculating the influence of a specific weight filter on the next layer prevents the local optimum problem from arising in the neural network.

Figure 1 is a diagram illustrating a process of compressing a convolutional neural network using an n-mode tensor product operation according to an embodiment of the disclosed technology.
Figure 2 is a flowchart of a convolutional neural network compression method using an n-mode tensor product operation according to an embodiment of the disclosed technology.
Figure 3 is a block diagram of a convolutional neural network compression device using an n-mode tensor product operation according to an embodiment of the disclosed technology.
Figure 4 is a diagram showing the structure of conventional convolutional neural networks.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all changes, equivalents, and substitutes that fall within its spirit and technical scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms serve only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of them.

As used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise" indicate that the stated features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood as not excluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before the drawings are described in detail, it should be made clear that the division of components in this specification is merely a division according to the main function each component is responsible for. That is, two or more of the components described below may be combined into one component, or one component may be divided into two or more components according to more finely divided functions.

In addition to its own main function, each component described below may additionally perform some or all of the functions of other components, and some of the main functions of each component may of course be carried out exclusively by another component. Therefore, the presence or absence of each component described in this specification should be interpreted functionally.

Figure 1 is a diagram illustrating a process of compressing a convolutional neural network (CNN) using an n-mode tensor product operation according to an embodiment of the disclosed technology. Referring to Figure 1, the computing device stores a convolutional neural network for image analysis. The convolutional neural network is trained in advance for image classification, and networks of various architectures may be used. The computing device feeds an image supplied by the user into the convolutional neural network and classifies the image according to the network's output. Besides image classification, the network may also receive text or speech as input and extract features from it. Image classification is used as the example below.

A convolutional neural network is a stack of multiple layers. Increasing the depth of the network can improve feature extraction performance. Because of this structure, however, a computing device running a convolutional neural network may face high memory complexity and computational complexity when analyzing an input image. To address this, the computing device can compress the convolutional neural network by calculating the importance of the weight filters in the layers that make up the network and removing the weight filters whose importance is below a threshold.

In conventional approaches to compressing a convolutional neural network, the importance of the weight filters in a given layer was judged independently of other layers. For example, the network was compressed by measuring the L1 norm or L2 norm of each weight filter, keeping the filters with high norms, and removing the filters with low norms. Because this approach judges importance only within the layer to which a weight filter belongs and ignores the filter's influence on subsequent layers, it suffers from the local optimum problem. The present technology therefore discloses a technique that compresses a convolutional neural network while also taking into account the influence of each weight filter on the subsequent layer.

To perform the compression described above, the computing device selects a specific layer from among the plurality of layers included in the convolutional neural network. The layer may be selected at random from among the layers that make up the network. After calculating the importance of the weight filters in that layer, the computing device may select another layer. Alternatively, the layers may be selected sequentially starting from the first layer, or both the order of the layers and the order of the weight filters may be chosen at random. For example, the computing device may select the j-th weight filter of the i-th layer, calculate its importance, and then select the s-th weight filter of the k-th layer.

The computing device performs an n-mode tensor product operation on each of the weight filters included in the selected layer to calculate the importance of each weight filter. Before the importance is calculated, the memory complexity and the computational complexity of the convolutional neural network are defined according to Equation 1 below.

[Equation 1]

Here, the terms of Equation 1 are the width and height of the weight filters constituting the i-th layer of the convolutional neural network, the number of weight filters, and the width and height of the (i+1)-th feature map. A compression technique that removes specific weight filters aims to reduce the number of weight filters. Accordingly, the memory complexity and the computational complexity of the compressed convolutional neural network are defined as follows.

[Equation 2]

The memory compression ratio of a neural network can be defined as the memory complexity of the compressed network relative to the memory complexity of the original network. Likewise, the computational complexity compression ratio can be defined as the computational complexity of the compressed network relative to that of the original network. The memory compression ratio and the computational complexity compression ratio of the compressed network are defined by Equation 3.

[Equation 3]
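The definitions above can be made concrete with a small sketch. Equations 1 to 3 are not reproduced in this text, so the formulas below are the conventional weight count and multiply-accumulate count for convolutional layers, assumed here rather than taken from the patent. The `ConvLayer` class, its field names, and the layer sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    k: int       # filter width and height (square kernel assumed)
    c_in: int    # number of input channels
    c_out: int   # number of weight filters (output channels)
    h_out: int   # height of the output feature map
    w_out: int   # width of the output feature map


def memory_complexity(layers):
    """Total number of convolution weights (biases ignored; an assumption)."""
    return sum(l.k * l.k * l.c_in * l.c_out for l in layers)


def compute_complexity(layers):
    """Total multiply-accumulate count for one forward pass (an assumption)."""
    return sum(l.k * l.k * l.c_in * l.c_out * l.h_out * l.w_out for l in layers)


def compression_ratio(original, compressed):
    """Complexity of the compressed network relative to the original."""
    return compressed / original


# Hypothetical two-layer network.
original = [ConvLayer(3, 3, 64, 32, 32), ConvLayer(3, 64, 128, 16, 16)]
# Hypothetical pruning: keep 32 of the 64 filters in the first layer.
# The second layer loses the matching 32 input channels.
pruned = [ConvLayer(3, 3, 32, 32, 32), ConvLayer(3, 32, 128, 16, 16)]

mem_ratio = compression_ratio(memory_complexity(original), memory_complexity(pruned))
mac_ratio = compression_ratio(compute_complexity(original), compute_complexity(pruned))
print(f"memory ratio: {mem_ratio:.2f}, compute ratio: {mac_ratio:.2f}")
```

Removing a filter from one layer also removes one input channel from the next layer, which is why both layers shrink in this example.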

In a norm-based filter selection technique, the importance of the j-th weight filter included in the i-th layer can be expressed as Equation 4 below.

[Equation 4]

Here, the importance of the j-th weight filter included in the i-th layer is obtained by applying a norm operator to that filter. Because the conventional technique only checks whether the norm of a given weight filter is high or low, it judges importance independently within the layer to which the filter belongs. In other words, the local optimum problem arises because the technique does not consider what influence the weight filter will have on subsequent layers.
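A minimal sketch of norm-based filter scoring is given below for comparison. Equation 4 is not reproduced in this text, so the code simply applies the usual Lp norm to each filter. The `(c_out, c_in, k, k)` weight layout, the function name `norm_importance`, and the toy weights are assumptions for illustration.

```python
import numpy as np


def norm_importance(weights, p=1):
    """Lp norm of each filter in a conv weight tensor of shape (c_out, c_in, k, k)."""
    flat = np.abs(weights.reshape(weights.shape[0], -1))
    return (flat ** p).sum(axis=1) ** (1.0 / p)


# Three hypothetical filters with constant entries 0.1, 1.0 and 0.5.
weights = np.zeros((3, 2, 3, 3))
weights[0] += 0.1
weights[1] += 1.0
weights[2] += 0.5

l1 = norm_importance(weights, p=1)
l2 = norm_importance(weights, p=2)

# Filters sorted from most to least important under the L1 criterion.
keep_order = np.argsort(-l1)
print(l1, l2, keep_order)
```

The score of each filter depends only on its own weights, which is the layer-local behaviour the paragraph above identifies as the source of the local optimum problem.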

To solve this problem, the computing device can calculate the importance of a weight filter by performing an n-mode tensor product operation according to Equation 5 below.

[Equation 5]

Here, the product in Equation 5 is the n-mode multiplication between the i-th layer and the (i+1)-th layer, and the second operand is the i-th layer with its j-th weight filter replaced by zeros. The n-mode multiplication is computed by Equation 6 below.

[Equation 6]
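For reference, the n-mode product as it is commonly defined in the tensor literature is shown below. This is the textbook definition with generic symbols (the tensor X, the matrix U, and the sizes I_n and J are not taken from the patent), and Equation 6 of the patent may use different notation or an additional reshaping step.

```latex
\[
\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}, \qquad
U \in \mathbb{R}^{J \times I_n}
\]
\[
(\mathcal{X} \times_n U)_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N}
  = \sum_{i_n = 1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j\, i_n}
\]
```

The result has size I_1 x ... x I_{n-1} x J x I_{n+1} x ... x I_N: mode n of the tensor is contracted against the columns of the matrix and replaced by a mode of size J.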

Here, the dimension conversion function in Equation 6 reshapes the i-th layer into the form required for the multiplication. The importance proposed by the disclosed technology is defined as a reconstruction loss on the new higher-order tensor computed by the n-mode tensor product operation. The tensor obtained from the n-mode tensor product expresses the influence of a specific layer on the subsequent layer, and the reconstruction loss between this tensor and the one obtained with a specific weight filter replaced by zeros quantifies the importance of that filter.
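The idea in this paragraph can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patent's exact Equations 5 and 6, which are not reproduced in this text. The assumptions are: weights are laid out as `(c_out, c_in, k, k)`; the dimension conversion (here the hypothetical `unfold_filters`) flattens each filter into one column of a matrix; the product is taken along the input-channel mode of the next layer; and the reconstruction loss is the Frobenius norm of the difference.

```python
import numpy as np


def mode_n_product(tensor, matrix, mode):
    """n-mode product: contract axis `mode` of `tensor` with the columns of `matrix` (J x I_n)."""
    out = np.tensordot(tensor, matrix, axes=([mode], [1]))  # new J axis is appended last
    return np.moveaxis(out, -1, mode)                       # move it back to position `mode`


def unfold_filters(w):
    """Hypothetical dimension conversion: (c_out, c_in, k, k) -> (c_in * k * k, c_out)."""
    return w.reshape(w.shape[0], -1).T


def tensor_importance(w_cur, w_next):
    """Reconstruction loss of the product tensor when each filter of w_cur is zeroed in turn."""
    full = mode_n_product(w_next, unfold_filters(w_cur), mode=1)
    scores = np.empty(w_cur.shape[0])
    for j in range(w_cur.shape[0]):
        zeroed = w_cur.copy()
        zeroed[j] = 0.0                                     # replace the j-th filter with zeros
        ablated = mode_n_product(w_next, unfold_filters(zeroed), mode=1)
        scores[j] = np.linalg.norm(full - ablated)          # Frobenius norm of the difference
    return scores


rng = np.random.default_rng(0)
w_cur = rng.normal(size=(4, 3, 3, 3))    # layer i: 4 filters
w_next = rng.normal(size=(5, 4, 3, 3))   # layer i+1: reads 4 input channels
w_cur[1] *= 10.0                         # filter 1 has by far the largest norm ...
w_next[:, 1] = 0.0                       # ... but the next layer ignores its output

scores = tensor_importance(w_cur, w_next)
print(scores)
```

In this toy setup filter 1 would rank highest under a norm criterion, yet its score here is zero because the next layer never uses its output. Under these particular assumptions the score also factorizes into the norm of the filter times the norm of the next layer's weights on that channel; whether the patent's Equation 5 behaves the same way cannot be confirmed from this text.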

Following the calculation described above, the computing device can calculate the importance of each weight filter in a specific layer and, based on the results, remove the weight filters whose importance is below the threshold. By applying this process to every layer of the convolutional neural network, the computing device removes the low-importance weight filters of each layer and thereby compresses the network. In this process, redundant parameters among the many parameters of the convolutional neural network are removed, which reduces both the computational complexity and the memory complexity of inference.

Figure 2 is a flowchart of a convolutional neural network compression method using an n-mode tensor product operation according to an embodiment of the disclosed technology. Referring to Figure 2, the compression method includes steps 210 to 230. The steps may be performed sequentially by the computing device.

In step 210, the computing device selects a specific layer from among the plurality of layers included in the convolutional neural network (CNN). The layer may be selected at random from among the layers that make up the network. Because the computing device does not know the importance of the weight filters of each layer in advance, it may also select every layer of the network in sequence and perform the n-mode tensor product operation for each layer.

In step 220, the computing device performs an n-mode tensor product operation on each of the weight filters included in the specific layer to calculate the importance of each weight filter. As described above with reference to Figure 1, the importance of a weight filter is not calculated independently for each layer; it is calculated by taking into account the filter's influence on the subsequent layer.

In step 230, the computing device removes, according to the calculation result, those weight filters whose importance is below the threshold. The computing device may repeat steps 210 to 230 a certain number of times to remove low-importance weight filters from the convolutional neural network.
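Steps 210 to 230 can be sketched as a simple loop. This is a hedged illustration: the layers are visited sequentially (one of the orders the description allows), the threshold value is arbitrary because the source does not specify one, and `toy_score` is a stand-in scoring function, not the patent's Equation 5. All function names are hypothetical.

```python
import numpy as np


def toy_score(w_cur, w_next):
    """Stand-in importance (NOT the patent's Equation 5): the norm of each filter
    times the norm of the next layer's weights that read that filter's output."""
    cur = np.linalg.norm(w_cur.reshape(w_cur.shape[0], -1), axis=1)
    nxt = np.linalg.norm(np.moveaxis(w_next, 1, 0).reshape(w_next.shape[1], -1), axis=1)
    return cur * nxt


def prune_pair(w_cur, w_next, scores, threshold):
    """Drop low-scoring filters of w_cur and the matching input channels of w_next."""
    keep = np.flatnonzero(scores >= threshold)
    return w_cur[keep], w_next[:, keep]


def compress(weights, score_fn, threshold):
    """One sequential pass over all layers that have a successor."""
    weights = [w.copy() for w in weights]
    for i in range(len(weights) - 1):                   # step 210: select layer i
        scores = score_fn(weights[i], weights[i + 1])   # step 220: score its filters
        weights[i], weights[i + 1] = prune_pair(        # step 230: remove weak filters
            weights[i], weights[i + 1], scores, threshold
        )
    return weights


# Hypothetical two-layer network whose first layer contains one all-zero filter.
layers = [np.ones((4, 3, 3, 3)), np.ones((2, 4, 3, 3))]
layers[0][2] = 0.0

pruned = compress(layers, toy_score, threshold=1e-6)
print([w.shape for w in pruned])
```

The last layer is never scored in this sketch because it has no successor, and networks with skip connections need extra bookkeeping to keep channel counts consistent; neither case is handled here.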

Figure 3 is a block diagram of a convolutional neural network compression device using an n-mode tensor product operation according to an embodiment of the disclosed technology. Referring to Figure 3, the convolutional neural network compression device 300 includes a storage device 310 and a computing device 320.

The storage device 310 stores the convolutional neural network. Because a convolutional neural network is made deep by increasing the number of layers, the storage device 310 may use a memory with enough capacity to store such a network.

The computing device 320 selects a specific layer from among the plurality of layers included in the convolutional neural network, performs an n-mode tensor product operation on a weight filter included in the specific layer to calculate the importance of the weight filter, and removes the weight filter according to the calculation result. The computing device 320 may be a processor or an application processor (AP) of the convolutional neural network compression device 300. The computing device 320 may remove a weight filter if the difference between the tensor data computed by the n-mode tensor product operation for the weight filters and the tensor data computed with that weight filter replaced by zeros is below the threshold. The computing device may apply this calculation to each of the remaining weight filters in the specific layer and remove at least one weight filter whose importance is below the threshold. Repeating this calculation compresses the convolutional neural network stored in the storage device 310 as a whole.

The convolutional neural network compression device 300 described above may also be implemented as a program (or application) containing an executable algorithm that can be run on a computer. The program may be stored and provided in a transitory or non-transitory computer readable medium.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only for a short time, such as a register, cache, or memory. Specifically, the applications or programs described above may be stored and provided in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.

A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

Figure 4 is a diagram showing the structure of conventional convolutional neural networks. Referring to Figure 4, VGG-16 and ResNet-18, two types of convolutional neural network, have different structures. VGG-16 forms a network by stacking 3x3 weight filters, while ResNet-18 is basically the same as VGG-16 with the added structural feature of skip connections. As Table 1 below shows, applying the disclosed technology to these two networks improved compression performance compared with conventional norm-based filter removal. Table 1 reports, for image classification on the public CIFAR-100 dataset, the performance of the original VGG-16 and ResNet-18, of the networks compressed with L1 norm filter selection, of the networks compressed with L2 norm filter selection, and of the networks compressed with the disclosed technology.

Table 1

Model       Method     # of params          Accuracy (%)
VGG-16      Original   14723136             72.48
VGG-16      L1 Norm    1833621 (12.45%)     63.94 (-8.54)
VGG-16      L2 Norm    1833621 (12.45%)     62.59 (-9.89)
VGG-16      Ours       1811760 (12.31%)     65.73 (-6.75)
ResNet-18   Original   6448192              76.38
ResNet-18   L1 Norm    929660 (14.4%)       68.17 (-8.21)
ResNet-18   L2 Norm    929660 (14.4%)       67.29 (-9.09)
ResNet-18   Ours       914923 (14.2%)       71.45 (-4.93)

Referring to Table 1, the original VGG-16 network has 14723136 parameters (# of params) and an accuracy of 72.48%. Applying L1 norm-based filter selection reduced the number of parameters to 1833621 and gave an accuracy of 63.94%. Applying L2 norm-based filter selection also gave 1833621 parameters, with an accuracy of 62.59%. The two techniques performed similarly. Applying the disclosed technology to VGG-16 reduced the number of parameters to 1811760, fewer than with the conventional techniques, while the accuracy of 65.73% was higher than with either of them.

The original ResNet-18 network has 6448192 parameters (# of params) and an accuracy of 76.38%. Applying L1 norm-based filter selection reduced the number of parameters to 929660 and gave an accuracy of 68.17%. Applying L2 norm-based filter selection gave 929660 parameters and an accuracy of 67.29%. Again, the two techniques performed similarly. Applying the disclosed technology to ResNet-18 reduced the number of parameters to 914923, fewer than with the conventional techniques, while the accuracy of 71.45% was higher than with either of them. That is, for both network structures, compressing the network with the disclosed technology gave better performance than conventional L1 and L2 norm-based filter selection. Therefore, the disclosed technology can be applied regardless of the type or structure of the convolutional neural network to compress the network efficiently and reduce memory complexity and computational complexity.
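The percentages and accuracy changes reported in Table 1 can be re-derived from the raw figures. The short script below is only an arithmetic check on the numbers quoted in the text; the dictionary layout and function names are hypothetical.

```python
# Raw figures from Table 1: (parameter count, accuracy in %).
TABLE_1 = {
    "VGG-16": {
        "Original": (14723136, 72.48),
        "L1 Norm": (1833621, 63.94),
        "L2 Norm": (1833621, 62.59),
        "Ours": (1811760, 65.73),
    },
    "ResNet-18": {
        "Original": (6448192, 76.38),
        "L1 Norm": (929660, 68.17),
        "L2 Norm": (929660, 67.29),
        "Ours": (914923, 71.45),
    },
}


def percent_kept(model, method):
    """Parameters remaining after compression, as a percentage of the original."""
    original_params, _ = TABLE_1[model]["Original"]
    params, _ = TABLE_1[model][method]
    return 100.0 * params / original_params


def accuracy_change(model, method):
    """Accuracy of the compressed network minus accuracy of the original."""
    _, original_acc = TABLE_1[model]["Original"]
    _, acc = TABLE_1[model][method]
    return acc - original_acc


for model, rows in TABLE_1.items():
    for method in rows:
        if method != "Original":
            print(f"{model:10s} {method:8s} "
                  f"{percent_kept(model, method):6.2f}% kept, "
                  f"{accuracy_change(model, method):+.2f} points")
```

In both networks the disclosed method keeps slightly fewer parameters than the norm-based baselines while losing fewer accuracy points, which matches the comparison drawn in the text.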

The convolutional neural network compression method and device using an n-dimensional tensor product operation according to an embodiment of the disclosed technology have been described with reference to the embodiments shown in the drawings to aid understanding, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true scope of technical protection of the disclosed technology should be determined by the appended claims.

Claims

A convolutional neural network compression method comprising:
selecting, by a computing device, a specific layer from among a plurality of layers included in a convolutional neural network (CNN);
calculating, by the computing device, an importance for each of a plurality of weight filters included in the specific layer by performing an n-dimensional tensor product operation on each of the plurality of weight filters; and
removing, by the computing device, according to the calculation result, some weight filters whose importance is less than a reference value from among the plurality of weight filters,
wherein the computing device removes a weight filter when the difference between the tensor data of the plurality of weight filters calculated according to the n-dimensional tensor product operation and the tensor data obtained by replacing that weight filter with 0 is less than the reference value.

The method of claim 1,
wherein the selecting of the specific layer comprises randomly selecting one of the plurality of layers included in the convolutional neural network.

(Deleted)

The method of claim 1,
wherein the importance is calculated according to Equation 5 below.
[Equation 5]

(Here, the terms of Equation 5 denote the importance of the j-th weight filter of the i-th layer; the n-dimensional multiplication between the i-th layer and the (i+1)-th layer; the width and height of the weight filters included in the i-th layer; the number of weight filters in the i-th layer; and the replacement of the j-th weight filter of the i-th layer with a weight filter consisting of zeros.)

The method of claim 4,
wherein the n-dimensional multiplication is calculated according to Equation 6 below.
[Equation 6]

(Here, the terms of Equation 6 denote the i-th layer and a dimension conversion function that changes its shape.)

A convolutional neural network compression device comprising:
a storage device that stores a convolutional neural network (CNN) including a plurality of layers; and
a computing device that selects a specific layer from among the plurality of layers included in the convolutional neural network, calculates an importance of a weight filter included in the specific layer by performing an n-dimensional tensor product operation on the weight filter, and removes the weight filter according to the calculation result,
wherein the computing device removes the weight filter when the difference between the tensor data of the weight filters calculated according to the n-dimensional tensor product operation and the tensor data obtained by replacing the weight filter with 0 is less than a reference value.

(Deleted)

The device of claim 6,
wherein the computing device calculates the importance according to Equation 5 below.
[Equation 5]

(Here, the terms of Equation 5 denote the importance of the j-th weight filter of the i-th layer; the n-dimensional multiplication between the i-th layer and the (i+1)-th layer; the width and height of the weight filters included in the i-th layer; the number of weight filters in the i-th layer; and the replacement of the j-th weight filter of the i-th layer with a weight filter consisting of zeros.)

The device of claim 8,
wherein the computing device calculates the n-dimensional multiplication according to Equation 6 below.
[Equation 6]

(Here, the terms of Equation 6 denote the i-th layer and a dimension conversion function that changes its shape.)

The device of claim 6,
wherein the computing device calculates the importance of each of the remaining weight filters included in the specific layer and removes at least one weight filter whose importance is less than the reference value.