KR20200049422A

KR20200049422A - Effective Network Compression using Simulation-guided Iterative Pruning

Info

Publication number: KR20200049422A
Application number: KR1020180156750A
Authority: KR
Inventors: 정대웅; 김재헌; 김영석; 채명수
Original assignee: 주식회사 노타
Priority date: 2018-10-31
Filing date: 2018-12-07
Publication date: 2020-05-08
Also published as: KR102153192B1

Abstract

According to various embodiments of the present invention, an efficient network compression method using simulation-guided iterative pruning, performed by an electronic device, comprises: generating a second neural network by pruning a first neural network based on a threshold value; calculating a gradient for each weight of the second neural network; and obtaining a third neural network by applying the gradient to the first neural network.

Description

Effective Network Compression using Simulation-guided Iterative Pruning

Various embodiments relate to efficient network compression using simulation-guided iterative pruning.

The development of deep neural networks has contributed greatly to the recent popularity of artificial intelligence. Most algorithms that achieve state-of-the-art performance in various fields are based on deep neural networks. However, because of their complex, large-scale structure, deep neural networks are difficult to use without high-end computing. To obtain the necessary computing power, most existing products based on deep neural networks run on high-end servers, which leads to three important limitations: latency, network cost, and privacy. Deep neural networks therefore need to run on independent clients rather than on servers, and network compression technology is essential to achieve this.

Network compression has been studied intensively through various approaches. Among network compression methods, iterative pruning is one of the most popular and has proven effective in several previous studies, including state-of-the-art methods. In the iterative pruning process, the importance of the weights is first evaluated in the original network, low-importance weights are removed, and the remaining weights are then retrained through fine-tuning. This pruning process is repeated until a stopping condition is met.
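For contrast, the conventional procedure described above can be sketched in Python. This is a minimal toy sketch, not the method of this document or of any cited study: it assumes that weight importance is the absolute value |w|, models fine-tuning as plain gradient descent on a hypothetical quadratic loss 0.5 * (w - target)^2, and removes a linearly growing fraction of the weights at each step. All function names are hypothetical.

```python
def prune_mask(weights, mask, fraction):
    """Permanently remove the lowest-|w| `fraction` of all weights.

    Importance is measured on the current network only (assumption: |w|).
    new_mask starts from the old mask, so earlier removals are permanent.
    """
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in order[:k]:
        new_mask[i] = 0
    return new_mask


def fine_tune(weights, mask, target, lr=0.1, steps=50):
    """Retrain the surviving weights on a toy loss 0.5 * sum((w - target)^2)."""
    w = list(weights)
    for _ in range(steps):
        for i in range(len(w)):
            grad = w[i] - target[i]                # derivative of 0.5 * (w - t)^2
            w[i] = (w[i] - lr * grad) * mask[i]    # pruned weights stay at zero
    return w


def iterative_pruning(weights, target, final_sparsity=0.5, n_steps=5):
    """Evaluate importance, prune, fine-tune, and repeat for n_steps rounds."""
    w, mask = list(weights), [1] * len(weights)
    for step in range(1, n_steps + 1):
        mask = prune_mask(w, mask, final_sparsity * step / n_steps)
        w = [wi * m for wi, m in zip(w, mask)]
        w = fine_tune(w, mask, target)
    return w, mask
```

The key point is that importance is always measured on the network that is about to be pruned, never on the network that results from pruning.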

However, because importance in this process is determined from the original network, a pruned weight may in fact be sufficiently important in the pruned network. Accordingly, various embodiments propose a more efficient and sophisticated weight pruning method based on a simulation of the reduced network.

A simulation-guided iterative pruning method for effective network compression according to various embodiments may include: generating a second neural network by pruning a first neural network based on a threshold value; calculating a gradient for each weight of the second neural network; and obtaining a third neural network by applying the gradient to the first neural network.

An electronic device that performs simulation-guided iterative pruning for effective network compression according to various embodiments may include a memory that stores the weights of an original network to be compressed, and a processor configured to compress the original network. According to various embodiments, the processor may be configured to generate a second neural network by pruning a first neural network based on a threshold value, calculate a gradient for each weight of the second neural network, and obtain a third neural network by applying the gradient to the first neural network.

According to various embodiments, a new method for compressing deep neural networks is proposed. By simulating a temporarily reduced network, iterative pruning can be performed more effectively. At the same time, optimal weights can be learned jointly with a structure better suited to the reduced network. This makes it possible to deploy high-performance deep learning models on embedded systems with limited resources.

FIG. 1 is a block diagram of an electronic device that performs simulation-guided iterative pruning according to various embodiments.
FIG. 2 is a diagram for describing an operation algorithm of an electronic device that performs simulation-guided iterative pruning according to various embodiments.
FIG. 3 is a diagram for conceptually explaining simulation-guided iterative pruning according to various embodiments.
FIG. 4 is a flowchart of a method for performing simulation-guided iterative pruning according to various embodiments.
FIG. 5 is a diagram for describing the performance of simulation-guided iterative pruning according to various embodiments.

Hereinafter, various embodiments of this document are described with reference to the accompanying drawings. In describing the various embodiments, detailed descriptions of related known functions or configurations are omitted when they could unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions and may vary according to the intention or practice of a user or operator.

According to various embodiments, simulation-guided iterative pruning for effective network compression is proposed.

FIG. 1 is a block diagram of an electronic device that performs simulation-guided iterative pruning according to various embodiments. FIG. 2 is a diagram for describing an operation algorithm of an electronic device that performs simulation-guided iterative pruning according to various embodiments. FIG. 3 is a diagram for conceptually explaining simulation-guided iterative pruning according to various embodiments.

Referring to FIG. 1, the electronic device 100 according to various embodiments may include a memory 110 and a processor 120 as components. In some embodiments, the electronic device 100 may further include at least one other component. According to various embodiments, the electronic device 100 is implemented to use a deep neural network and may perform a simulation-guided iterative pruning process.

The memory 110 may store various data used by the components of the electronic device 100. For example, the data may include input data or output data for software (e.g., a program) and related commands. The memory 110 may include volatile memory or nonvolatile memory.

The processor 120 may perform various data processing and operations. To this end, the processor 120 may control at least one other component of the electronic device 100 connected to the processor 120. The processor 120 may also run software to perform various data processing and operations and store the resulting data in the memory 110.

According to various embodiments, the electronic device 100 may perform simulation-guided iterative pruning as illustrated in FIG. 2, using a temporarily reduced network for the simulation. As the first step of the simulation, the electronic device 100 may calculate the importance of the weights in the original network and store it separately in the memory 110. The electronic device 100 may then generate a temporarily reduced network by temporarily setting to zero the weights of the original network whose importance is below a certain threshold. A predetermined percentile is used as the threshold so that consistency is maintained throughout the iterative simulation process. Using a set of training data, the electronic device 100 may calculate a gradient for each weight of the temporarily reduced network, including the weights temporarily set to zero. The electronic device 100 may then apply these gradients to the stored original network rather than to the temporarily reduced network. Starting again from the first step with the updated original network, this simulation process may be repeated until the network has been sufficiently simulated.
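One simulation step of the procedure above can be sketched as follows. This is a hedged illustration, not the patented implementation: importance is assumed to be |w|, the percentile threshold r is taken as the fraction of weights temporarily zeroed, and the gradient comes from a hypothetical toy linear model y_hat = sum(w_i * x_i) with a squared-error loss. All function names are hypothetical.

```python
def percentile_threshold(importance, r):
    """Importance value below which roughly a fraction r of the weights fall."""
    ranked = sorted(importance)
    return ranked[min(int(r * len(ranked)), len(ranked) - 1)]


def temporarily_reduce(weights, r):
    """Build the temporarily reduced network T; the input list is not modified."""
    importance = [abs(w) for w in weights]          # assumption: magnitude importance
    thr = percentile_threshold(importance, r)
    return [0.0 if imp < thr else w for w, imp in zip(weights, importance)]


def gradient(weights, batch):
    """Gradient of 0.5 * mean squared error for a toy linear model."""
    g = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g


def simulation_step(original, batch, r, lr=0.1):
    """Compute gradients on T, then apply them to the stored original weights."""
    reduced = temporarily_reduce(original, r)       # T exists only for this step
    g = gradient(reduced, batch)                    # includes weights zeroed in T
    return [w - lr * gi for w, gi in zip(original, g)]
```

Because the update is applied to `original` rather than to `reduced`, a weight that was zeroed in T still moves, so a weight whose gradient keeps pushing it away from zero can rise back above the threshold in a later step.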

Through this simulation process, the importance of the weights is calculated, and the weights below the threshold may be removed through iterative pruning. The pruned weights are then permanently fixed, and the entire process may be repeated with a higher threshold without a retraining process.
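The permanent-fixing step between simulation rounds can be sketched as below. It is a minimal sketch under assumptions not stated in the source: importance is |w|, each round's threshold prunes a fraction r of the weights, and `fixed` is a hypothetical Boolean list recording permanently pruned positions. The simulation steps that update the weights between rounds are omitted.

```python
def fix_pruned(weights, fixed, r):
    """Permanently prune weights whose |w| falls below the r-percentile.

    Positions already marked in `fixed` stay pruned and are never retrained.
    Returns the new weights and the updated `fixed` list.
    """
    ranked = sorted(abs(w) for w in weights)
    thr = ranked[min(int(r * len(ranked)), len(ranked) - 1)]
    new_fixed = [f or abs(w) < thr for w, f in zip(weights, fixed)]
    return [0.0 if f else w for w, f in zip(weights, new_fixed)], new_fixed


# Usage: the threshold rises from round to round and the pruned set only grows.
w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
fixed = [False] * len(w)
for r in (0.2, 0.4, 0.6):
    w, fixed = fix_pruned(w, fixed, r)
    print(r, sum(fixed))   # prints 0.2 1, then 0.4 2, then 0.6 3
```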

According to various embodiments, the electronic device 100 may perform simulation-guided iterative pruning based on the gradients of the weights, as illustrated in FIG. 3. The x-axis 310 corresponds to weight values, and the first dotted line 320 may indicate the pruning threshold. The second dotted line 330 may indicate the ideal value of a weight that lies close to zero and is, strictly speaking, meaningless, and the third dotted line 340 may indicate the ideal value of another weight that is sufficiently important. Because the learning process of the network uses stochastically selected data, the value of an important weight may also be stochastically distributed around its absolute ideal value. In this case, even if the absolute ideal value of an important weight is greater than the threshold, the weight's actual value may fall within the cutoff range. Pruning such a weight may cause unnecessary loss of information. It is therefore important to distinguish unimportant weights from important weights that have fallen into the cutoff range. When a weight is set to zero during the simulation, the gradient of an unimportant weight may point in a random direction, because its absolute ideal value is close to zero. In contrast, the gradient of an important weight may keep a consistent direction until the weight is recovered through the iterative simulation. Because this screening process is based on a simulation of the pruned network rather than on the original network, it may enable more sophisticated pruning.
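The screening argument can be checked on a one-weight toy problem. This is an illustrative sketch only: each mini-batch is assumed to pull a weight toward its ideal value plus noise through the loss 0.5 * (w - (ideal + noise))^2, and, to keep the result reproducible, the noise is a hypothetical deterministic sequence alternating between +0.2 and -0.2. Both weights start inside the cutoff range, below the assumed threshold of 0.1.

```python
def simulate(w0, ideal, threshold=0.1, lr=0.05, iters=100, noise=0.2):
    """Return the final weight and the gradients seen while it was zeroed in T."""
    w, grads_while_zeroed = w0, []
    for t in range(iters):
        n = noise if t % 2 == 0 else -noise        # hypothetical mini-batch noise
        zeroed = abs(w) < threshold                 # temporarily pruned in T?
        w_sim = 0.0 if zeroed else w                # value used in the reduced network
        g = w_sim - (ideal + n)                     # d/dw of 0.5 * (w - (ideal + n))^2
        if zeroed:
            grads_while_zeroed.append(g)
        w -= lr * g                                 # applied to the original weight
    return w, grads_while_zeroed


important, g_imp = simulate(w0=0.05, ideal=0.5)     # important weight in the cutoff range
unimportant, g_unimp = simulate(w0=0.04, ideal=0.0) # weight whose ideal value is zero
```

Every gradient in `g_imp` is negative, so the important weight is pushed consistently outward and ends above the threshold, whereas `g_unimp` alternates in sign and the unimportant weight stays below it.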

The electronic device 100 that performs simulation-guided iterative pruning for effective network compression according to various embodiments may include the memory 110, which stores the weights of the original network to be compressed, and the processor 120, which is configured to compress the original network.

According to various embodiments, the processor 120 may be configured to generate a second neural network by pruning a first neural network based on a threshold value (r), calculate a gradient (g) for each weight of the second neural network, and obtain a third neural network by applying the gradient (g) to the first neural network.

According to various embodiments, the processor 120 may be configured to set to zero at least one of the weights of the first neural network whose importance is below the threshold value (r).

According to various embodiments, the processor 120 may determine the original network to be compressed as the first neural network.

According to various embodiments, the processor 120 may determine the third neural network as the first neural network and repeat the operations a predetermined number of times. For example, the predetermined number of times may correspond to the number of pruning steps (n).

According to various embodiments, after the operations have been repeated the predetermined number of times, the processor 120 may obtain the third neural network as the compressed network.

For example, the first neural network may represent the pre-trained neural network model (M_a), the second neural network may represent the temporarily reduced network (T), and the third neural network may represent the network reduced through each pruning step (M_{a+1}) or the network reduced through all pruning steps (R).

FIG. 4 is a flowchart of a method for performing simulation-guided iterative pruning according to various embodiments.

Referring to FIG. 4, the electronic device 100 may initialize a zero position matrix (Z) in operation 411. The electronic device 100 may store the weights of the original network in the zero position matrix (Z). The electronic device 100 may calculate the importance of the weights in the original network and store it in the memory 110. In operation 413, the electronic device 100 may determine a pre-trained neural network model (M). For example, the electronic device 100 may determine the pre-trained neural network model (M) based on the pruning step (a). In operation 415, the electronic device 100 may set the pruning step (a). For example, when the first pruning is performed on the original network, the electronic device 100 may set the pruning step (a) to 1, and the pre-trained neural network model (M) may be the original network.

In operation 417, the electronic device 100 may determine whether the pruning step (a) has reached the predetermined number of pruning steps (n), that is, whether the current pruning step (a) is less than n. If it is determined in operation 417 that the pruning step (a) has not reached n, the electronic device 100 may, in operation 419, generate the temporarily reduced network (T) by pruning the pre-trained neural network model (M) based on the threshold value (pruning ratio; r). The electronic device 100 may use a predetermined percentile as the threshold value (r). As a result, at least one of the weights of the pre-trained neural network model (M) whose importance is below the threshold is set to zero, producing the temporarily reduced network (T). In operation 421, the electronic device 100 may use the temporarily reduced network (T) to calculate the gradient (g) for the weights of the pre-trained neural network model (M).

In operation 423, the electronic device 100 may compare the weights with the zero position matrix (Z), that is, determine whether the weights correspond to the zero position matrix (Z). If it is determined in operation 423 that the weights do not correspond to the zero position matrix (Z), the electronic device 100 may, in operation 425, update the weights of the pre-trained neural network model (M) using the gradient (g). In other words, the electronic device 100 applies the gradient (g) to the pre-trained neural network model (M) rather than to the temporarily reduced network (T). Then, in operation 427, the electronic device 100 may store weights in the zero position matrix (Z) based on the threshold value (r). If it is determined in operation 423 that the weights correspond to the zero position matrix (Z), the electronic device 100 proceeds to operation 427 and stores weights in the zero position matrix (Z) based on the threshold value (r). In this way, the electronic device 100 may change the original network. In operation 429, the electronic device 100 may change the pruning step (a), for example by incrementing the current pruning step (a) by one.

In operation 431, the electronic device 100 may determine whether the pruning step (a) has reached the number of pruning steps (n), that is, whether the current pruning step (a) equals n. For example, the electronic device 100 may perform operation 431 after changing the pruning step (a) in operation 429. Alternatively, if it is determined in operation 417 that the current pruning step (a) is not less than n, the electronic device 100 may proceed to operation 431. If it is determined in operation 431 that the current pruning step (a) does not equal n, the electronic device 100 may return to operation 417. In this way, the electronic device 100 may perform iterative pruning for the number of pruning steps (n), so that low-importance weights are removed. If it is determined in operation 431 that the pruning step (a) has reached n, the electronic device 100 may, in operation 433, obtain the pre-trained neural network model (M) as the reduced network (R).
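The flowchart of FIG. 4 can be sketched as a single loop. This is a hedged reading, not the patented code, and it rests on several assumptions: the zero position matrix Z is modelled as a hypothetical Boolean list of permanently pruned positions; importance is |w|; the ratio used at each step grows linearly up to r; each pruning step runs `sim_iters` simulation updates (an assumed parameter); the step counter is 0-based so that exactly n pruning steps run; and the gradient comes from a toy linear model with a squared-error loss. Comments map lines to the operation numbers of FIG. 4.

```python
def gradient(weights, batch):
    """Gradient of 0.5 * mean squared error for a toy linear model."""
    g = [0.0] * len(weights)
    for x, y in batch:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(batch)
    return g


def threshold_at(values, ratio):
    """Value below which roughly a fraction `ratio` of `values` fall."""
    ranked = sorted(values)
    return ranked[min(int(ratio * len(ranked)), len(ranked) - 1)]


def simulation_guided_pruning(M, batch, n=5, r=0.5, sim_iters=20, lr=0.1):
    M = list(M)                                     # 413: pre-trained model M
    Z = [False] * len(M)                            # 411: zero position matrix Z
    for a in range(n):                              # 415/417/429/431: n pruning steps
        ratio = r * (a + 1) / n                     # assumed linear schedule up to r
        for _ in range(sim_iters):
            thr = threshold_at([abs(w) for w in M], ratio)
            T = [0.0 if abs(w) < thr else w for w in M]    # 419: reduced network T
            g = gradient(T, batch)                         # 421: gradients from T
            M = [w if z else w - lr * gi                   # 423/425: skip Z, update M
                 for w, gi, z in zip(M, g, Z)]
        thr = threshold_at([abs(w) for w in M], ratio)
        Z = [z or abs(w) < thr for w, z in zip(M, Z)]      # 427: record zero positions
        M = [0.0 if z else w for w, z in zip(M, Z)]
    return M, Z                                     # 433: M becomes the reduced network R
```

For instance, with a one-hot toy batch whose targets are [2, 0, -1, 0] and an initial M of [1.5, 0.3, -0.5, -0.2], running with n=5 and r=0.5 leaves Z marking exactly the two weights whose ideal value is zero.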

A simulation-guided iterative pruning method for effective network compression according to various embodiments may include: generating a second neural network by pruning a first neural network based on a threshold value (r); calculating a gradient (g) for each weight of the second neural network; and obtaining a third neural network by applying the gradient (g) to the first neural network.

According to various embodiments, generating the second neural network may include setting to zero at least one of the weights of the first neural network whose importance is below the threshold value (r).

According to various embodiments, the method may further include determining the original network to be compressed as the first neural network.

According to various embodiments, the method may further include determining the third neural network as the first neural network, and may be repeated a predetermined number of times. For example, the predetermined number of times may correspond to the number of pruning steps (n).

According to various embodiments, the method may further include obtaining the third neural network as the compressed network after the method has been repeated the predetermined number of times.

For example, the first neural network may represent the pre-trained neural network model (M_a), the second neural network may represent the temporarily reduced network (T), and the third neural network may represent the network reduced through each pruning step (M_{a+1}) or the network reduced through all pruning steps (R).

FIG. 5 is a diagram for describing the performance of simulation-guided iterative pruning according to various embodiments. FIG. 5 shows experimental results for the algorithm according to various embodiments and for a conventional algorithm, plotting classification accuracy as a function of the ratio of weights remaining (not removed) after pruning. FIG. 5(b) is an enlarged view of the region of FIG. 5(a) in which the performance difference between the two algorithms is relatively large.

Referring to FIG. 5, there is no significant performance difference between the algorithm according to various embodiments and the conventional algorithm while the remaining weights are reduced to as little as 1%. However, as the remaining weights are reduced below 1%, the performance of the conventional algorithm drops sharply, whereas the algorithm according to various embodiments maintains relatively high performance. This indicates that the algorithm according to various embodiments compresses networks more effectively than the conventional algorithm. For example, while maintaining a classification accuracy of 90%, the algorithm according to various embodiments compresses the network to 0.114% of its original size, whereas the conventional algorithm compresses it only to 0.138%.
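The reported sizes at 90% classification accuracy translate into compression factors and a relative weight count by simple arithmetic:

```latex
\frac{100\%}{0.114\%} \approx 877\times,
\qquad
\frac{100\%}{0.138\%} \approx 725\times,
\qquad
\frac{0.138\%}{0.114\%} \approx 1.21
```

In other words, at the same accuracy the conventional algorithm keeps roughly 21% more weights than the algorithm according to various embodiments.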

According to various embodiments, a new method for compressing deep neural networks is proposed. By simulating a temporarily reduced network, iterative pruning can be performed more effectively. At the same time, optimal weights can be learned jointly with a structure better suited to the reduced network. As the experimental results in FIG. 5 show, this may outperform the conventional algorithm. This makes it possible to deploy high-performance deep learning models on embedded systems with limited resources.

Although various embodiments of this document have been described, various modifications are possible without departing from the scope of the various embodiments of this document. Therefore, the scope of the various embodiments of this document should not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

Claims

1. A simulation-guided iterative pruning method for effective network compression, the method comprising:
generating a second neural network by pruning a first neural network based on a threshold;
calculating a gradient for each weight of the second neural network; and
obtaining a third neural network by applying the gradient to the first neural network.

2. The method of claim 1, wherein generating the second neural network comprises:
setting to zero at least one of the weights of the first neural network whose importance is below the threshold.

3. The method of claim 1, further comprising:
determining the original network to be compressed as the first neural network.

4. The method of claim 1, further comprising:
determining the third neural network as the first neural network,
wherein the method is repeated a predetermined number of times.

5. The method of claim 4, further comprising:
after the method has been repeated the predetermined number of times, obtaining the third neural network as a compressed network.

6. An electronic device performing simulation-guided iterative pruning for effective network compression, the electronic device comprising:
a memory storing weights of an original network to be compressed; and
a processor configured to compress the original network,
wherein the processor is configured to:
generate a second neural network by pruning a first neural network based on a threshold value;
calculate a gradient for each weight of the second neural network; and
obtain a third neural network by applying the gradient to the first neural network.

7. The electronic device of claim 6, wherein the processor is configured to:
set to zero at least one of the weights of the first neural network whose importance is below the threshold.