KR102316528B1 - Hardware-friendly neural architecture search (NAS)-based neural network quantization method

Info

Publication number: KR102316528B1
Application number: KR1020210005891A
Authority: KR
Inventors: 최정욱; 박성민; 권범석; 임지은; 심규영
Original assignee: 주식회사 노타; 한양대학교 산학협력단
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2021-10-25

Abstract

Provided are a neural network data quantization method and system based on a new neural architecture search technique called neural channel expansion. Disclosed is a hardware-friendly neural architecture search (NAS)-based neural network data quantization method. The data quantization method in a neural network may include: a step of searching for a neural network architecture in which channel information is adjusted, in order to derive a neural network model robust to quantization errors; and a step of deriving a quantized neural network model through training on the searched neural network architecture.

Description

HARDWARE FRIENDLY NEURAL ARCHITECTURE SEARCH (NAS) BASED NEURAL NETWORK QUANTIZATION METHOD

The following description relates to a neural network data quantization technique based on neural architecture search.

With the recent development of artificial intelligence (AI) technology, various industries are applying and adopting it. In line with this trend, demand is also increasing for real-time hardware implementations of artificial neural networks such as convolutional neural networks.

Training and inference with a multi-layer deep neural network such as a convolutional neural network require a large amount of computation and memory. One way to reduce both is bit quantization, which reduces, in units of bits, the size of the data representation of the parameters used in the computations of the artificial neural network. Conventional bit quantization uses uniform bit quantization, which quantizes all parameters of the artificial neural network with the same number of bits. However, conventional uniform bit quantization does not accurately reflect how changing the number of bits for each individual parameter affects overall performance.
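As a minimal illustration of uniform bit quantization, the sketch below maps every value in a list onto the same number of evenly spaced levels. The function name `uniform_quantize`, the min/max range choice, and the sample weights are assumptions made for illustration; the publication does not specify a particular quantizer.

```python
def uniform_quantize(values, num_bits):
    """Quantize floats to 2**num_bits evenly spaced levels (illustrative only)."""
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1  # number of steps between the lowest and highest level
    if hi == lo or levels == 0:
        return [lo for _ in values]
    step = (hi - lo) / levels
    # Snap each value to the nearest level; every value uses the same bit width.
    return [lo + round((v - lo) / step) * step for v in values]


weights = [-1.0, -0.4, 0.1, 0.35, 1.0]  # hypothetical weights
quantized = uniform_quantize(weights, num_bits=2)
print(quantized)
```

With 2 bits, the five weights collapse onto at most four distinct values, which is the source of the quantization error discussed in the rest of the description.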

Korean Patent Laid-Open Publication No. 10-2018-0120967 (published November 7, 2018) discloses a data quantization method and apparatus for a neural network.

A neural network data quantization method and system based on a new neural architecture search technique called Neural Channel Expansion may be provided.

A method and system may be provided for searching for a neural network structure that is more robust to uniform-precision quantization errors while maintaining hardware constraints.

A neural network data quantization method may include: searching for a neural network structure in which channel information is adjusted, in order to derive a neural network model robust to quantization errors; and deriving a quantized neural network model through training on the searched neural network structure.

The searching for the neural network structure may include selectively adjusting the number of channels based on a neural channel expansion technique.

The searching for the neural network structure may include expanding the channels of a layer constituting the neural network by using a search space in which channels can be reduced or expanded.

The searching for the neural network structure may include constructing a search space over the number of channels using search parameters, and updating the search parameters in the constructed search space through channel selection based on each layer's sensitivity to quantization and on hardware constraints.

The searching for the neural network structure may include updating, in the search space, weight parameters so as to reduce the training loss under single-bit quantization or multi-bit quantization.

The searching for the neural network structure may include comparing the updated search parameter associated with the maximum number of channels to which the neural network can be expanded with a specific threshold, in order to check whether each layer requires channel expansion.

The searching for the neural network structure may include, when the comparison shows that the updated search parameter associated with the maximum number of channels exceeds the specific threshold, activating channel expansion for the layer, adding the updated weight parameters to the layer for which channel expansion is activated, and copying the updated search parameter.

A quantization system for neural network data quantization may include: a structure search unit that searches for a neural network structure in which channel information is adjusted, in order to derive a neural network model robust to quantization errors; and a model derivation unit that derives a quantized neural network model through training on the searched neural network structure.

The structure search unit may selectively adjust the number of channels based on a neural channel expansion technique.

The structure search unit may expand the channels of a layer constituting the neural network by using a search space in which channels can be reduced or expanded.

Using the new neural architecture search technique called neural channel expansion, uniform-precision quantization can be performed through channel expansion, and the number of channels can be adjusted uniformly across the layers.

Using the new neural architecture search technique called neural channel expansion, quantization errors can be readily compensated through selective channel expansion.

Quantization accuracy can be significantly improved by modifying the structure of the target neural network.

FIG. 1 is an example of an algorithm for describing a neural network data quantization operation according to an embodiment.
FIG. 2 is a graph for explaining the effect of channel expansion on the dynamic range of activations according to an embodiment.
FIG. 3 is a graph for explaining the effect of channel expansion on channel search according to an embodiment.
FIG. 4 illustrates a neural channel expansion operation according to an embodiment.
FIG. 5 is a block diagram illustrating a configuration of a quantization system according to an embodiment.
FIG. 6 is a flowchart illustrating a method for quantizing neural network data in a quantization system according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings.

The embodiments describe a new neural architecture search-based quantization operation called Neural Channel Expansion. Because neural channel expansion is equipped with a simple yet innovative channel expansion mechanism, it can balance the number of channels across layers under uniform-precision quantization. An in-depth analysis of neural channel expansion shows how channel expansion affects the compensation of quantization errors. Quantization accuracy can be significantly improved by modifying the structure of the target neural network.

FIG. 1 is an example of an algorithm for describing a neural network data quantization operation according to an embodiment.

The quantization system may perform neural network data quantization based on a new neural architecture search technique called Neural Channel Expansion. The neural channel expansion operation is described first. The quantization system may use search parameters to construct a search space over the number of channels C = {1 : c_out}. The output activation may be computed as a weighted sum of activations that are sampled with different numbers of channels and aligned through channel-wise interpolation.

Equation 1:

Here, the output activation is computed from the input activation X and the weight parameters W, each quantized by the quantizer Q, and I is a sampled subset of C.
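The formula of Equation 1 does not survive in this text. One form consistent with the surrounding description is sketched below. The symbols are assumptions introduced for illustration and are not taken from the publication: α denotes the search parameters, p_j a softmax-normalized mixing weight, CWI the channel-wise interpolation that aligns activations of different widths, and Ô the output activation.

```latex
\hat{O} = \sum_{j \in I} p_j \cdot \mathrm{CWI}\!\left(O_{1:j}\right),
\qquad
p_j = \frac{\exp(\alpha_j)}{\sum_{k \in I} \exp(\alpha_k)},
\qquad
O_{1:j} = Q\!\left(W_{1:j}\right) * Q(X)
```

Under these assumptions, each sampled channel count j in I contributes the activation computed from the first j quantized output channels, and the contributions are blended by the normalized search parameters.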

During the search, the search parameters may be updated through channel selection based on the trade-off between the cross-entropy loss and the hardware constraint loss. In the prior art, the number of channels is fixed, so the search is limited to pruning. In the embodiment, channel expansion of an individual layer may be activated when the search parameter associated with the maximum number of channels exceeds a specific threshold. If a layer is vulnerable to quantization errors, its search parameters are updated toward a preference for more channels in order to reduce the cross-entropy loss. With this simple expansion condition, channels can be expanded in the layers most affected by quantization errors while channels are pruned in other layers that are robust to quantization. As a result, the overall hardware constraint can still be satisfied.
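The expansion condition described above can be sketched as follows. This is a minimal illustration, not the patented procedure: it assumes the search parameters are softmax-normalized before being compared with the threshold, and the names `maybe_expand`, `threshold`, and `step` are hypothetical. Adding the corresponding weight parameters is only noted in a comment.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def maybe_expand(search_params, channel_options, threshold, step):
    """Add one wider channel option to a layer if its widest option dominates.

    Assumption: "exceeds a threshold" is tested on the softmax probability of
    the search parameter tied to the current maximum channel count.
    """
    probs = softmax(search_params)
    if probs[-1] > threshold:
        # Copy the search parameter of the widest option for the new option;
        # a real implementation would also append weights for the new channels.
        new_params = search_params + [search_params[-1]]
        new_options = channel_options + [channel_options[-1] + step]
        return new_params, new_options
    return search_params, channel_options


# A layer that strongly prefers its widest option gets expanded.
params, options = maybe_expand([0.1, 0.2, 2.5], [8, 12, 16], threshold=0.5, step=4)
print(options)
# A layer that prefers few channels is left unchanged (and can be pruned).
robust_params, robust_options = maybe_expand([2.0, 0.1, 0.1], [8, 12, 16], threshold=0.5, step=4)
print(robust_options)
```

Because only layers that cross the threshold grow, the extra channels are spent where quantization hurts most, which is what lets the hardware constraint be met overall.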

FIG. 1 is an algorithm summarizing the overall operation for neural network data quantization. Neural channel expansion may consist of three phases: warm-up, search, and training. In the warm-up phase, the entire supernet is warmed up so that all supernet weight parameters are reasonably initialized. In the search phase, the weights (w) and the search parameters are updated iteratively through bi-level optimization. The updated search parameter associated with the maximum number of channels may be compared with a threshold (predetermined as a hyperparameter) to check whether each layer requires channel expansion. When channel expansion occurs, weight parameters are added to the corresponding layer and the search parameter is copied, so that the number of channels increases. When the search is complete, the candidate model is derived by a winner-take-all strategy. In other words, for each layer, the number of channels whose search parameter has the largest value is selected.
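The winner-take-all step at the end of the search can be sketched as a per-layer argmax over the search parameters. The function name and the sample values below are hypothetical and serve only to illustrate the selection rule.

```python
def winner_take_all(search_params_per_layer, channel_options_per_layer):
    """Pick, for each layer, the channel count whose search parameter is largest."""
    selected = []
    for params, options in zip(search_params_per_layer, channel_options_per_layer):
        best = max(range(len(params)), key=lambda i: params[i])
        selected.append(options[best])
    return selected


# Hypothetical search result for two layers; the second layer was expanded once.
layer_params = [[0.3, 1.2, 0.5], [0.1, 0.2, 2.5, 2.7]]
layer_options = [[8, 12, 16], [8, 12, 16, 20]]
architecture = winner_take_all(layer_params, layer_options)
print(architecture)  # [12, 20]
```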

The operation by which the quantization system, based on neural channel expansion, searches for a network structure that is more robust to uniform-precision quantization errors while maintaining hardware constraints is described in detail below. First, a structure with expanded channels can reduce the dynamic range of the input activations and suppress quantization errors. In other words, the network structure adapted by neural channel expansion can be trained independently from scratch and exhibits robustness to quantization. The channels selected through neural channel expansion can also facilitate the compensation of quantization errors.

FIG. 2 is a graph for explaining the effect of channel expansion on the dynamic range of activations according to an embodiment.

FIGS. 2(a) and 2(b) show the per-layer standard deviation in ResNet20 with 1X or 2X channels, FIG. 2(c) shows the per-layer standard deviation in VGG16 with 1X or 2X channels, and FIG. 2(d) shows the test accuracy of ResNet20 with 1X or 2X channels (accuracy degradation from W32A32 to W2A2).

Channel splitting can reduce the dynamic range of the weight parameters. However, it is not clear how much the structure itself affects quantization during neural network training. To understand the effect of an expanded channel structure on the quantization of a neural network (for example, a DNN), it is first shown that quantization applied to a given network increases the dynamic range of the activations, which hinders the derivation of a successful quantized DNN (QDNN). FIG. 2(a) shows the standard deviation (STDEV) of the input activations of ResNet20 trained from scratch on the CIFAR10 dataset, with and without a quantizer during training. W{X} and A{Y} indicate that the weights and activations are quantized to X bits and Y bits, respectively. The large increase in standard deviation for W32A2 and W2A2 means that large quantization errors can occur when the input activations are quantized. This increase in dynamic range may partly explain why quantization with a fixed model structure often suffers significant accuracy degradation when ultra-low-bit precision is applied uniformly. The same phenomenon can be observed, for example, in the VGG16 model of FIG. 2(c).
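The link between dynamic range and quantization error can be made concrete with a short calculation. The sketch assumes a uniform quantizer that spans the full dynamic range, in which case the step size is range / (2^bits - 1) and the worst-case rounding error is half a step. The function name and the range values are hypothetical.

```python
def max_rounding_error(dynamic_range, num_bits):
    """Worst-case rounding error of a uniform quantizer spanning dynamic_range."""
    step = dynamic_range / (2 ** num_bits - 1)
    return step / 2


# Hypothetical activation ranges at the same 2-bit precision (A2).
narrow = max_rounding_error(dynamic_range=3.0, num_bits=2)
wide = max_rounding_error(dynamic_range=6.0, num_bits=2)
print(narrow, wide)  # 0.5 1.0
```

At a fixed bit width, doubling the range doubles the worst-case error, which is why a structure that keeps the activation range small is easier to quantize.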

FIGS. 2(b) and 2(c) show the effect of the expanded channel structure on the dynamic range. A "2X" model is one in which the number of channels in each layer is doubled; all hyperparameter settings are otherwise identical. With 2-bit quantization (W2A2), both the 2X ResNet20 and the 2X VGG16 models reduce the standard deviation to the level of the full-precision 1X model.

The reduced dynamic range might be attributed to the weight initialization of the 2X channels, since the dynamic range of the initial weights may be determined by the number of channels. However, the initial weight parameters can be shown to have little effect on the dynamic range of the input activations. As shown in FIG. 2(a), the 2X ResNet20 with W2A2 quantization still shows a reduced standard deviation even when its weights are initialized in the same way as those of the 1X ResNet20 model.

Finally, it can be confirmed that reducing the dynamic range of the input activations improves test accuracy. As shown in FIG. 2(d), the 1X ResNet20 model suffers a large accuracy degradation (-2.22%) due to quantization, whereas the 2X ResNet20 model suffers a relatively small accuracy degradation (-0.84%). This increased robustness to quantization errors results from the mechanism by which the expanded channel structure reduces the dynamic range of the activations.

According to an embodiment, the expanded channel structure can play a decisive role in compensating for quantization errors. Accordingly, when channel expansion is used within a neural architecture search (NAS) framework, the channels of each layer can be adjusted to find a new network structure that is more robust to uniform-precision quantization when trained from scratch.

FIG. 3 is a graph for explaining the effect of channel expansion on channel search according to an embodiment.

FIG. 3 shows experimental results for CIFAR10-ResNet20, plotting the gradient of the cross-entropy loss with respect to a layer's search parameters during the search. FIG. 3(a) uses full precision, FIG. 3(b) uses 2-bit quantization, and FIG. 3(c) shows the Kendall rank correlation scores of all layers.

The quantization system may selectively allow channel expansion only when selecting a larger number of channels is desirable for compensating quantization errors. Otherwise, channels may be pruned to satisfy the overall hardware constraint. FIG. 3 is an example illustrating this compensation mechanism of neural channel expansion.

First, the channel selection preference can be observed as the gradient with respect to the search parameters. The key trade-off explored during the search is between the cross-entropy loss and the hardware constraint loss. In particular, a layer that is sensitive to the cross-entropy loss may select a large number of channels. In other words, the search parameters associated with large numbers of channels receive large negative gradients. For example, FIG. 3(a) shows the gradients of the search parameters during a conventional search (TAS search) of CIFAR10-ResNet20 at full precision. The search parameter associated with the maximum number of channels initially receives a negative gradient. Conversely, the search parameter associated with the smallest number of channels receives a positive gradient. This suggests that the layer initially prefers a large number of channels, but that the preference weakens over the search epochs.

Next, it is shown that quantization during the search strengthens the preference for a larger number of channels. In the same experimental setup, applying quantization during the search makes the preference for many channels more pronounced, as shown in FIG. 3(b). This is because quantization errors have a greater effect on the cross-entropy loss. To quantify this channel preference, a Kendall rank correlation score may be computed for the gradients of the search parameters averaged over the search epochs. The more consistent the preference for many channels, the higher the Kendall score. FIG. 3(c) shows the per-layer Kendall scores with and without quantization. The Kendall score increases when quantization is applied during the neural architecture search, which means that quantization drives the search parameters toward a strong preference for a larger number of channels. The search space may therefore be enlarged to enable channel expansion. With this new search space, channel expansion can be applied selectively to the layers that prefer many channels. The new search space also obtains a higher Kendall score, as shown in FIG. 3(c), which shows that the search can favor an even larger number of channels. In other words, a new search space that selectively expands channels relaxes the limit on how many channels can be selected and can thereby compensate for quantization errors.
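A Kendall rank correlation score of this kind can be computed as in the sketch below. How the publication pairs the quantities is not fully specified, so the pairing used here is an assumption: candidate channel counts are ranked against the negated, epoch-averaged gradient of each search parameter, so that a score of 1.0 means the layer consistently prefers more channels. The function implements the simple tau-a variant and the sample gradients are hypothetical.

```python
def kendall_tau(xs, ys):
    """Kendall rank correlation (tau-a) between two equal-length sequences."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


channel_counts = [4, 8, 12, 16]
# Hypothetical epoch-averaged gradients; a negative gradient increases the parameter.
avg_gradients = [0.30, 0.10, -0.05, -0.40]
preference = [-g for g in avg_gradients]
score = kendall_tau(channel_counts, preference)
print(score)  # 1.0
```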

FIG. 4 illustrates a neural channel expansion operation according to an embodiment.

Whereas FIGS. 2 and 3 describe how channel expansion is promoted to compensate for quantization errors, FIG. 4 describes the advantages of neural channel expansion.

The benefit of quantization-aware structure search is described first. With neural channel expansion, the quantization system can take quantization into account during the architecture search in order to find a neural network structure that is better suited to quantization. Quantization affects the gradients with respect to the search parameters and therefore leads to a different network structure after the search.

FIG. 4(a) shows the test accuracy after searching with and without quantization, FIG. 4(b) shows the model structure after searching with and without quantization, and FIG. 4(c) is a graph of accuracy versus hardware constraint for the channel expansion strategies.

FIG. 4(a) shows neural channel expansion run on CIFAR10-ResNet20 with and without W2A2 quantization during the search. The network found by each search is then trained from scratch, with or without W2A2 quantization. After full-precision training, both networks (searched with and without quantization) reach the same level of accuracy. With W2A2 training, however, the network searched with quantization gains more in average accuracy than the network searched without quantization. FIG. 4(b) shows the difference in channel selection between the model searched with quantization (W2A2) and the model searched without it (W32A32). The W2A2 model prefers more channels in the later layers. This shows that neural channel expansion can perform quantization-aware neural architecture search.

The benefit of selective channel expansion is described next. There are two options for searching over expanded channels. One option is to start with the expanded channels as the search space and then prune; the other is to expand channels selectively, as neural channel expansion does. To understand the effect of selective expansion, an experiment may be set up that compares two searches: a 1X-channel search with eight search parameters per layer and neural channel expansion enabled, and a 2X-channel search with sixteen search parameters per layer. If 1X neural channel expansion finds a suitable network structure, the 2X search should find one as well. However, the 2X search results turn out to be inferior to those of neural channel expansion. As shown in FIG. 4(c), for the same target hardware constraint, the network structure found by the 2X search may have lower test accuracy than the one found by neural channel expansion.

According to an embodiment, the heterogeneous sensitivity of individual layers can be addressed under uniform-precision quantization by using a neural architecture search technique.

According to an embodiment, providing a neural network structure with channel-expanded layers can reduce the activation range during quantization-aware training.

FIG. 5 is a block diagram illustrating a configuration of a quantization system according to an embodiment, and FIG. 6 is a flowchart illustrating a method for quantizing neural network data in the quantization system according to an embodiment.

The processor of the quantization system 100 may include a structure search unit 510 and a model derivation unit 520. These components of the processor may be representations of different functions performed by the processor according to control instructions provided by program code stored in the quantization system. The processor and its components may control the quantization system to perform steps 610 to 620 included in the neural network data quantization method of FIG. 6. The processor and its components may be implemented to execute instructions according to the code of the operating system and the code of at least one program included in the memory.

The processor may load into memory the program code stored in a program file for the neural network data quantization method. For example, when the program is executed in the quantization system, the processor may control the quantization system to load the program code from the program file into memory under the control of the operating system. The structure search unit 510 and the model derivation unit 520 may each be a different functional representation of the processor that executes the instructions of the corresponding part of the loaded program code in order to perform the subsequent steps 610 to 620.

In step 610, the structure search unit 510 may search for a neural network structure in which channel information is adjusted, in order to derive a neural network model that is robust against quantization errors. The structure search unit 510 may selectively adjust the number of channels based on the neural channel expansion technique. The structure search unit 510 may expand the channels of the layers constituting the neural network by using a search space in which channels can be shrunk or expanded. The structure search unit 510 may construct a search space for the number of channels by using search parameters and, in the constructed search space, update the search parameters through channel selection based on the sensitivity of each layer to quantization and on hardware constraints. In the search space, the structure search unit 510 may update the weight parameters so as to reduce the training loss under single-bit or multi-bit quantization. To check whether channel expansion is required in each layer, the structure search unit 510 may compare the updated search parameter related to the maximum number of channels to which the neural network can be expanded with a specific threshold. When the comparison shows that the updated search parameter related to the maximum number of channels exceeds the specific threshold, the structure search unit 510 may activate channel expansion for that layer, add the updated weight parameters to the layer in which channel expansion is activated, and copy the updated search parameter.
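
The expansion decision described for step 610 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `LayerState`, `maybe_expand`, `threshold`, and `step` are hypothetical, the "size" of a search parameter is assumed here to be its softmax probability, and the way new channel weights are initialized is likewise an assumption. The sketch only shows the control flow of comparing the search parameter tied to the maximum channel count against a threshold, then adding weights and copying the search parameter when expansion is activated.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class LayerState:
    """Hypothetical per-layer search state; all field names are illustrative."""
    channel_options: list  # candidate channel counts, in ascending order
    alphas: list           # one search parameter per candidate channel count
    weights: list          # one placeholder weight per channel (max count)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def maybe_expand(layer, threshold, step, rng):
    """Activate channel expansion if the max-channel option dominates.

    Assumption: the search parameter is compared with the threshold after
    a softmax; the source only states that it is compared with a threshold.
    """
    prob_max = softmax(layer.alphas)[-1]
    if prob_max <= threshold:
        return False  # expansion stays inactive for this layer
    # Offer a wider candidate and copy the search parameter of the old maximum.
    layer.channel_options.append(layer.channel_options[-1] + step)
    layer.alphas.append(layer.alphas[-1])
    # Add weights for the new channels (small random init is an assumption).
    layer.weights.extend(rng.gauss(0.0, 0.01) for _ in range(step))
    return True


rng = random.Random(0)
# A layer whose search strongly prefers the widest option, and one that does not.
sensitive = LayerState([8, 16], [0.1, 2.5], [0.0] * 16)
robust = LayerState([8, 16], [1.5, 0.2], [0.0] * 16)
expanded = [maybe_expand(layer, threshold=0.8, step=8, rng=rng)
            for layer in (sensitive, robust)]
```

In this toy run only the first layer is widened, which mirrors the idea that expansion is enabled per layer rather than uniformly across the network.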

In step 620, the model derivation unit 520 may derive a quantized neural network model through training on the searched neural network structure. Accordingly, the neural network structure produced by neural channel expansion can be trained independently from scratch and exhibits robustness against quantization.
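
To make the notion of a quantized model concrete, the following Python sketch shows one possible way to quantize a list of weights to a given bit width. The description does not fix a particular quantizer, so both schemes here are assumptions chosen for illustration: a sign-based 1-bit quantizer scaled by the mean magnitude, and a symmetric uniform quantizer for multi-bit widths. The function names are hypothetical.

```python
def binarize(values):
    """1-bit quantization (assumed scheme): sign times mean magnitude."""
    scale = sum(abs(v) for v in values) / len(values)
    return [scale if v >= 0 else -scale for v in values]


def quantize_uniform(values, bits):
    """Symmetric uniform quantization to `bits` bits (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1      # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels          # step size between adjacent levels
    return [round(v / scale) * scale for v in values]


def quantize(values, bits):
    """Dispatch to the single-bit or multi-bit quantizer."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return binarize(values) if bits == 1 else quantize_uniform(values, bits)


def mean_squared_error(a, b):
    """Average squared difference between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [0.9, -0.35, 0.12, -0.77, 0.5]
# Quantization error of the same weights at several bit widths.
errors = {bits: mean_squared_error(weights, quantize(weights, bits))
          for bits in (1, 2, 4, 8)}
```

For these example weights the error shrinks as the bit width grows, which is the error that a structure robust against quantization is meant to tolerate at low bit widths.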

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set out below.

Claims

A neural network data quantization method performed by a quantization system including a structure search unit and a model derivation unit, the method comprising:
searching, in the structure search unit, based on a target neural network configured to derive a neural network model robust against quantization errors, for a neural network structure in which channel information is adjusted through a search parameter related to the maximum number of channels to which the target neural network can be expanded; and
deriving, in the model derivation unit, a quantized neural network model through training on the searched neural network structure,
wherein searching for the neural network structure comprises:
in a search space that is configured for the number of channels using the search parameter and in which a weight parameter is initialized, updating the search parameter through channel selection based on the sensitivity of each layer to quantization and on hardware constraints, updating the weight parameter to reduce training loss, and comparing the magnitude of the updated search parameter with a specific threshold predefined by a hyperparameter of the target neural network to determine whether to activate channel expansion for each layer,
and wherein the search parameter is updated through channel selection based on a trade-off between a cross-entropy loss and a hardware constraint loss.

The method of claim 1,
wherein searching for the neural network structure comprises:
selectively adjusting the number of channels based on a neural channel expansion technique.

The method of claim 2,
wherein searching for the neural network structure comprises:
expanding the channels of the layers constituting the neural network using a search space in which channels can be shrunk or expanded.

The method of claim 1,
wherein searching for the neural network structure comprises:
constructing a search space for the number of channels using the search parameter.

The method of claim 3,
wherein searching for the neural network structure comprises:
updating, in the search space, a weight parameter to reduce training loss through single-bit quantization or multi-bit quantization.

The method of claim 4,
wherein searching for the neural network structure comprises:
comparing the updated search parameter related to the maximum number of channels to which the target neural network can be expanded with a specific threshold predetermined by a hyperparameter of the target neural network, in order to determine whether channel expansion is required in each layer.

The method of claim 6,
wherein searching for the neural network structure comprises:
when the comparison shows that the magnitude of the updated search parameter related to the maximum number of channels exceeds the specific threshold, activating channel expansion for each layer, adding the updated weight parameter to the layer in which channel expansion is activated, and copying the updated search parameter.

A quantization system for quantizing neural network data, comprising:
a structure search unit configured to search, based on a target neural network configured to derive a neural network model robust against quantization errors, for a neural network structure in which channel information is adjusted through a search parameter related to the maximum number of channels to which the target neural network can be expanded; and
a model derivation unit configured to derive a quantized neural network model through training on the searched neural network structure,
wherein the structure search unit,
in a search space that is configured for the number of channels using the search parameter and in which a weight parameter is initialized, updates the search parameter through channel selection based on the sensitivity of each layer to quantization and on hardware constraints, updates the weight parameter to reduce training loss, and compares the magnitude of the updated search parameter with a specific threshold predefined by a hyperparameter of the target neural network to determine whether to activate channel expansion for each layer,
and wherein the search parameter is updated through channel selection based on a trade-off between a cross-entropy loss and a hardware constraint loss.

The quantization system of claim 8,
wherein the structure search unit
selectively adjusts the number of channels based on a neural channel expansion technique.

The quantization system of claim 8,
wherein the structure search unit
expands the channels of the layers constituting the neural network using a search space in which channels can be shrunk or expanded.