KR20240047195A

KR20240047195A - Method and system for restoring accuracy of quantization model to address problem of accuracy drop in low bit neural network

Info

Publication number: KR20240047195A
Application number: KR1020220126552A
Authority: KR
Inventors: Park Jun-gyu (박준규)
Original assignee: Nota Inc. (주식회사 노타)
Priority date: 2022-10-04
Filing date: 2022-10-04
Publication date: 2024-04-12

Abstract

Disclosed are a method and a system for adjusting a quantization model to address the performance loss that accompanies low-bit neural networks. According to an embodiment, the quantization model adjustment method may include: extracting the quantization parameters of an input quantization model; constructing a search space for a quantization parameter search algorithm by matching a candidate group of quantization parameters to each quantization parameter adjustment target, the targets being selected from among the layers of the quantization model based on their sensitivity to adjustment of the quantization parameters; searching, using the quantization parameter search algorithm and the search space, for the quantization parameters to be applied to each adjustment target; and generating an adjusted quantization model by applying the found quantization parameters to the quantization model.

Description

Quantization model accuracy restoration method and system for solving performance loss problems accompanying low bit neural networks {METHOD AND SYSTEM FOR RESTORING ACCURACY OF QUANTIZATION MODEL TO ADDRESS PROBLEM OF ACCURACY DROP IN LOW BIT NEURAL NETWORK}

Embodiments of the present invention relate to a method and system for restoring the accuracy of a quantization model, addressing the performance loss that accompanies low-bit neural networks.

Deep learning models are increasingly used to perform tasks such as image processing and natural language processing. However, running them requires large amounts of computation and memory, which is a heavy burden not only for resource-constrained embedded devices but also for high-performance servers.

Various model compression (lightweighting) techniques, such as pruning, filter decomposition, and quantization, are therefore used to run models more efficiently. Among these, quantization refers to representing numbers that are, by default, expressed in 32-bit floating point using fewer bits.

Deep learning models are created using deep learning compilers such as TensorFlow or PyTorch. These compilers also ship with a pre-implemented quantization process and expose functions that let users quantize their models. Because quantization touches many parts of a deep learning compiler, implementing it oneself is very difficult, so users typically rely on the quantization functions provided by compilers such as TensorFlow Lite or TensorRT. A deep learning compiler creates a quantized model by computing quantization parameters (such as the scale and zero point) from formulas and storing them together with the model's weights, biases, and activations.

Because a quantized model uses fewer bits than a 32-bit floating-point model, quantization loss occurs, and this loss reduces the accuracy of the deep learning model. The problem is that a deep learning compiler is a very complex system, so it is not easy for users to modify the compiler's quantization process directly.

This invention is related to the 2022 D-Unicorn Project.

[Prior Art Document]

Korean Patent No. 10-2261715

The embodiments can provide an accuracy restoration method and system that recover the accuracy lost by a quantization model generated by a compiler.

Provided is a method of adjusting a quantization model, performed by a computer device including at least one processor, the method comprising: extracting, by the at least one processor, the quantization parameters of an input quantization model; constructing, by the at least one processor, a search space for a quantization parameter search algorithm by matching a candidate group of quantization parameters to each quantization parameter adjustment target, the adjustment targets being selected from among the layers of the quantization model based on their sensitivity to adjustment of the quantization parameters; searching, by the at least one processor, for the quantization parameters to be applied to each adjustment target using the quantization parameter search algorithm and the search space; and generating, by the at least one processor, an adjusted quantization model by applying the found quantization parameters to the quantization model.
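The four claimed steps can be sketched end to end as follows. This is a toy illustration under stated assumptions, not the patent's implementation: a "model" is reduced to a hypothetical dict from layer name to (scale, zero point), sensitivities are given, candidates are hypothetical multiplicative factors on the scale, and the search is a simple greedy pass chosen here for brevity (the patent does not specify its search algorithm).

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class QParams:
    scale: float
    zero_point: int


# Hypothetical stand-ins: a "model" is just layer name -> quantization parameters.
QuantModel = Dict[str, QParams]
SearchSpace = Dict[str, List[QParams]]


def extract_qparams(model: QuantModel) -> QuantModel:
    """Step 1: read the quantization parameters stored in the input model."""
    return dict(model)


def build_search_space(qparams: QuantModel, sensitivity: Dict[str, float],
                       top_k: int, factors: Tuple[float, ...]) -> SearchSpace:
    """Step 2: pick the top_k most sensitive layers and pair each with candidate scales."""
    targets = sorted(sensitivity, key=sensitivity.get, reverse=True)[:top_k]
    return {name: [replace(qparams[name], scale=qparams[name].scale * f) for f in factors]
            for name in targets}


def search(qparams: QuantModel, space: SearchSpace,
           evaluate: Callable[[QuantModel], float]) -> QuantModel:
    """Step 3: greedy, layer-by-layer search; keep a candidate only if the score improves."""
    best, best_score = dict(qparams), evaluate(qparams)
    for name, candidates in space.items():
        for cand in candidates:
            trial = {**best, name: cand}
            score = evaluate(trial)
            if score > best_score:
                best, best_score = trial, score
    return best


def adjust(model: QuantModel, sensitivity: Dict[str, float],
           evaluate: Callable[[QuantModel], float], top_k: int = 1,
           factors: Tuple[float, ...] = (0.8, 1.0, 1.2)) -> QuantModel:
    """Step 4: return a new model with the searched parameters applied."""
    qparams = extract_qparams(model)
    space = build_search_space(qparams, sensitivity, top_k, factors)
    return {**model, **search(qparams, space, evaluate)}


# Toy usage: the score prefers scales close to a made-up "ideal" scale per layer.
ideal = {"conv1": 0.12, "conv2": 0.05}


def score(m: QuantModel) -> float:
    return -sum(abs(math.log(m[k].scale / ideal[k])) for k in m)


model = {"conv1": QParams(0.1, 0), "conv2": QParams(0.05, 0)}
adjusted = adjust(model, {"conv1": 0.9, "conv2": 0.1}, score)
print(adjusted["conv1"].scale)  # close to 0.12
```

In practice `evaluate` would run the quantized model on a data set; here it is a closed-form stand-in so the sketch runs on its own. Only the most sensitive layer is searched, which is how restricting adjustment targets keeps the search space small.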

According to one aspect, constructing the search space may include: measuring, using the quantization model and a data set, the sensitivity of each of at least one layer of the quantization model to adjustment of the quantization parameters; selecting, from among the at least one layer and using the measured sensitivities, the adjustment target layers whose quantization parameters are to be adjusted; matching a quantization parameter candidate group to each adjustment target layer; and generating the search space as pairs of an adjustment target layer and its quantization parameter candidate group.

According to another aspect, constructing the search space may include: measuring, using the quantization model and a data set, the sensitivity of each of at least one layer group of the quantization model to adjustment of the quantization parameters; selecting, from among the at least one layer group and using the measured sensitivities, the adjustment target layer groups whose quantization parameters are to be adjusted; matching a quantization parameter candidate group to each adjustment target layer group; and generating the search space as pairs of an adjustment target layer group and its quantization parameter candidate group.

According to another aspect, constructing the search space may further include: determining whether the data set is large enough to perform the search algorithm; and, if it is not, expanding the data set.

According to yet another aspect, expanding the data set may include adding at least part of another data set to the data set.
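The size check and expansion can be sketched as a simple top-up. The threshold `min_size` is a hypothetical parameter; the patent does not state how "sufficient" is decided, and real code might also sample the extra data rather than take a prefix.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ensure_dataset(calib: Sequence[T], extra: Sequence[T], min_size: int) -> List[T]:
    """Return calib as-is if it has at least min_size items; otherwise top it up from extra."""
    data = list(calib)
    if len(data) >= min_size:          # sufficient: no expansion needed
        return data
    needed = min_size - len(data)
    # Borrow (at most) the missing number of items from another data set.
    return data + list(extra[:needed])


print(ensure_dataset([1, 2], [3, 4, 5], min_size=4))  # [1, 2, 3, 4]
```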

According to yet another aspect, measuring the sensitivity may include adjusting the quantization parameters of one of the at least one layer and then measuring that layer's sensitivity based on the resulting change in a reference value of the quantization model.
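One-layer-at-a-time perturbation can be sketched like this. Assumptions: each layer is represented only by its scale, the adjustment is a hypothetical multiplicative factor of 1.1, and `reference` is any callable returning the reference value (in practice, accuracy or loss measured on the data set).

```python
from typing import Callable, Dict


def measure_sensitivity(scales: Dict[str, float],
                        reference: Callable[[Dict[str, float]], float],
                        factor: float = 1.1) -> Dict[str, float]:
    """Perturb one layer's scale at a time and record how far the reference value moves."""
    base = reference(scales)
    sensitivity = {}
    for name in scales:
        perturbed = {**scales, name: scales[name] * factor}  # adjust only this layer
        sensitivity[name] = abs(reference(perturbed) - base)
    return sensitivity


# Toy reference value in which layer "a" matters ten times more than layer "b".
def reference(s: Dict[str, float]) -> float:
    return 10 * s["a"] + s["b"]


sens = measure_sensitivity({"a": 1.0, "b": 1.0}, reference)
print(sens)  # "a" moves the reference value far more than "b"
```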

According to yet another aspect, the reference value may include at least one of: the accuracy of the quantization model, the loss of the quantization model, and the activation difference between the quantization model and the model before quantization.
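The three kinds of reference value can be computed as follows. This is a minimal sketch assuming a classification model whose outputs are logits; cross-entropy as the loss and mean squared error as the activation difference are common choices assumed here, not ones the patent specifies.

```python
import numpy as np


def accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of samples whose highest logit matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy loss."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def activation_gap(q_act: np.ndarray, fp_act: np.ndarray) -> float:
    """Mean squared difference between quantized-model and float-model activations."""
    return float(np.mean((q_act - fp_act) ** 2))


logits = np.array([[2.0, 0.0], [0.0, 3.0]])
labels = np.array([0, 1])
print(accuracy(logits, labels), cross_entropy(logits, labels))
```

The activation difference has the practical advantage of needing no labels, since it only compares the quantized model with its own pre-quantization model.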

According to yet another aspect, the at least one layer may consist of all layers of the quantization model except those excluded in advance according to their operation type.
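Filtering by operation type is a one-line selection. The excluded set below is purely hypothetical; the patent does not list which operation types are preset for exclusion.

```python
from typing import Dict, FrozenSet, List

# Hypothetical operation types excluded from the sensitivity scan (not from the source).
EXCLUDED_OPS: FrozenSet[str] = frozenset({"Reshape", "Transpose", "Concat"})


def candidate_layers(layer_ops: Dict[str, str],
                     excluded: FrozenSet[str] = EXCLUDED_OPS) -> List[str]:
    """Keep layers whose operation type is not excluded, preserving model order."""
    return [name for name, op in layer_ops.items() if op not in excluded]


print(candidate_layers({"conv1": "Conv", "reshape1": "Reshape", "fc": "Gemm"}))
```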

According to yet another aspect, matching the candidate groups may include setting the number or the distribution of quantization parameter candidates differently for each adjustment target layer based on that layer's sensitivity.
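One possible policy, assumed here for illustration only, gives more sensitive layers more candidates over the same relative range (and therefore a finer spacing). The bounds `min_n`, `max_n`, and `spread` are hypothetical parameters.

```python
from typing import Dict, List

import numpy as np


def match_candidates(base_scale: Dict[str, float], sensitivity: Dict[str, float],
                     min_n: int = 3, max_n: int = 9,
                     spread: float = 0.2) -> Dict[str, List[float]]:
    """Assign each layer candidate scales; more sensitive layers get more, finer-spaced ones."""
    s_max = max(sensitivity.values()) or 1.0  # avoid division by zero if all are 0
    out = {}
    for name, scale in base_scale.items():
        rel = sensitivity[name] / s_max                   # relative sensitivity in [0, 1]
        n = int(round(min_n + rel * (max_n - min_n)))
        if n % 2 == 0:
            n += 1                                        # odd count keeps a factor of about 1.0
        factors = np.linspace(1.0 - spread, 1.0 + spread, n)
        out[name] = [scale * float(f) for f in factors]
    return out


cands = match_candidates({"a": 1.0, "b": 1.0}, {"a": 1.0, "b": 0.0})
print(len(cands["a"]), len(cands["b"]))  # 9 3
```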

According to yet another aspect, searching for the quantization parameters may include: checking whether sufficient resources are available to run the search algorithm; if they are, running the quantization parameter search algorithm on the search space; and, if they are not, either selecting one of the quantization parameter candidates in the search space or computing the quantization parameters collectively according to layer type.
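The resource check can be sketched with an evaluation budget. Assumptions: "resources" are measured as a hypothetical count of allowed model evaluations, the search is a one-pass greedy search, and the fallback picks each layer's middle candidate. The patent's other fallback, computing parameters collectively by layer type, is not shown.

```python
from typing import Callable, Dict, List


def choose_params(space: Dict[str, List[float]],
                  evaluate: Callable[[Dict[str, float]], float],
                  budget: int) -> Dict[str, float]:
    """Run a greedy search if the evaluation budget allows it; otherwise fall back."""
    chosen = {name: c[len(c) // 2] for name, c in space.items()}  # default: middle candidate
    cost = sum(len(c) for c in space.values())  # evaluations one greedy pass needs
    if cost > budget:
        return chosen  # insufficient resources: keep the default pick, no evaluation
    for name, cands in space.items():
        # Try every candidate for this layer with the other layers held fixed.
        chosen[name] = max(cands, key=lambda v: evaluate({**chosen, name: v}))
    return chosen


space = {"a": [0.5, 1.0, 1.5]}
prefer_15 = lambda p: -abs(p["a"] - 1.5)
print(choose_params(space, prefer_15, budget=10), choose_params(space, prefer_15, budget=1))
```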

According to yet another aspect, constructing the search space and searching for the quantization parameters may be repeated at least once.

Provided is a computer program, stored on a computer-readable recording medium, that is combined with a computer device to cause the computer device to execute the method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the method is recorded.

Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor: extracts the quantization parameters of an input quantization model; constructs a search space for a quantization parameter search algorithm by matching a candidate group of quantization parameters to each quantization parameter adjustment target, the targets being selected from among the layers of the quantization model based on sensitivity to adjustment of the quantization parameters; searches for the quantization parameters to be applied to each adjustment target using the search algorithm and the search space; and generates an adjusted quantization model by applying the found quantization parameters to the quantization model.

The accuracy lost by a quantization model generated by a compiler can be recovered. To this end, the performance of the iterative search used to update the quantization parameters inside the quantization model can be improved. For example, the approach can restore more accuracy than existing quantization parameter search methods while also shrinking the search space, which reduces the time needed to find optimal quantization parameters.

Figure 1 is a block diagram showing an example of a computer device according to an embodiment of the present invention.
Figure 2 is a flowchart showing an example of a quantization model accuracy restoration method according to an embodiment of the present invention.
Figure 3 is a flowchart showing an example of a process for constructing a search space according to an embodiment of the present invention.
Figure 4 is a flowchart illustrating an example of a process for searching quantization parameters according to an embodiment of the present invention.
Figure 5 is a diagram showing an example of quantization parameter candidates allocated to each layer, according to an embodiment of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

A typical deep neural network model performs its operations on 32-bit floating-point values. To use the memory of the device running these operations more efficiently while also reducing execution latency, a quantization technique can be applied that approximates these 32-bit floating-point values with fewer bits.

More specifically, the weights, biases, and activations involved in the computation of a deep neural network model are the targets of quantization. Because quantization approximates the original 32-bit floating-point values with low-bit values that carry less information, the quantized model's outputs differ from those of the model before quantization, which can cause a loss of accuracy.

In general, quantization proceeds as follows. First, a clipping range is set based on the distribution of the element values of each weight (or activation) tensor of the deep neural network model. Values outside the clipping range are first moved into it, so that every floating-point value lies within the clipping range. Quantization then refers to scaling the values within the clipping range onto the range that the lower bit width can represent (for example, [-128, 127] for an 8-bit integer type).

More specifically, as shown in Equation 1 below, the clipping range is the real interval set for a weight or activation tensor r, expressed in 32 bits, before it is converted to a lower bit width. It takes the form [a, b]: every value of r smaller than a is replaced with a, and every value larger than b is replaced with b.
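Equation 1 itself is not reproduced in this text. Assuming it is the usual clipping function described in the preceding paragraph, it can be written as follows (a reconstruction, not necessarily the patent's exact notation):

```latex
\operatorname{clip}(r; a, b) =
\begin{cases}
a, & r < a \\
r, & a \le r \le b \\
b, & r > b
\end{cases}
\qquad\text{equivalently}\qquad
\operatorname{clip}(r; a, b) = \min\bigl(\max(r, a),\, b\bigr)
```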

Because of this process, a quantization model converted from an existing deep neural network model can contain parameters in addition to the elements of the original model, such as weights, activations, and biases. These are called quantization parameters or quantization metadata.

Quantization methods are generally divided into uniform and non-uniform quantization according to how the representative values are distributed within the clipping range.

Uniform quantization divides the clipping range [a, b] into equal intervals. It can be expressed by Equation 3 using the two quantization parameters computed by Equation 2: the scale factor s and the zero point z.
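Equations 2 and 3 are not reproduced here. A commonly used asymmetric formulation, assumed for this sketch, maps [a, b] onto signed n-bit integers [q_min, q_max] with s = (b - a) / (q_max - q_min), z = q_min - round(a / s), q = clip(round(r / s) + z, q_min, q_max), and dequantization r_hat = s * (q - z). The patent's exact equations may differ (for example, unsigned ranges or symmetric quantization).

```python
import numpy as np


def qparams(a: float, b: float, n_bits: int = 8):
    """Scale s and zero point z mapping [a, b] onto signed n-bit integers."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    s = (b - a) / (qmax - qmin)
    z = int(round(qmin - a / s))  # integer that represents real 0.0 exactly
    return s, z, qmin, qmax


def quantize(r: np.ndarray, a: float, b: float, n_bits: int = 8) -> np.ndarray:
    s, z, qmin, qmax = qparams(a, b, n_bits)
    clipped = np.clip(r, a, b)  # clipping step
    return np.clip(np.round(clipped / s) + z, qmin, qmax).astype(np.int32)


def dequantize(q: np.ndarray, a: float, b: float, n_bits: int = 8) -> np.ndarray:
    s, z, _, _ = qparams(a, b, n_bits)
    return s * (q.astype(np.float64) - z)


r = np.linspace(-1.0, 1.0, 101)
q = quantize(r, -1.0, 1.0)
err = np.max(np.abs(dequantize(q, -1.0, 1.0) - r))
print(q.min(), q.max(), err)  # error stays within about half a step
```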

Non-uniform quantization divides the clipping range [a, b] into unequal intervals. Unlike Equation 2, its conversion formula can include parameters other than the scale factor and the zero point.
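The source does not say which non-uniform scheme it has in mind. As one well-known example, assumed here and not necessarily the patent's choice, power-of-two (logarithmic) quantization places levels at ±b·2^(-k), so levels are dense near zero, where most weights lie, and sparse near the clipping bound. The number of exponent levels is an extra parameter beyond s and z; rounding is to the nearest power of two in the log domain.

```python
import numpy as np


def log2_quantize(r: np.ndarray, b: float, n_levels: int = 8) -> np.ndarray:
    """Power-of-two quantization of r clipped to [-b, b].

    Levels are sign * b * 2**(-k) for k = 0..n_levels-1, plus zero.
    """
    mag = np.clip(np.abs(r), 0.0, b)
    with np.errstate(divide="ignore"):          # log2(0) -> -inf is handled below
        k = np.round(-np.log2(mag / b))         # nearest exponent in the log domain
    k = np.clip(k, 0, n_levels - 1)
    out = np.sign(r) * b * 2.0 ** (-k)
    out[mag < b * 2.0 ** (-n_levels)] = 0.0     # tiny magnitudes snap to zero
    return out


print(log2_quantize(np.array([1.0, 0.5, -0.25, 0.3, 0.0, 3.0]), b=1.0))
```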

Two main types of error arise when quantizing a deep neural network.

The first is clipping error, which arises during clipping, when values outside the clipping range described above are replaced with a value different from their actual value (the maximum or minimum of the clipping range). The closer the clipping range's maximum and minimum are to the actual maximum and minimum of the input tensor, the smaller the clipping error; the narrower the clipping range, the larger the clipping error.

The second is quantization error, which arises when values in the clipping range are converted to a lower bit width. The fewer bits used compared with 32, the coarser the approximation and the larger the difference from the original floating-point value. The number of target bits (n) and the quantization error are therefore inversely related. In addition, a wider clipping range increases the quantization error, because the same number of representable levels must cover a larger interval.
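Both effects on quantization error can be checked numerically. The sketch below uses round-to-nearest uniform quantization with step s = (b - a) / (2^n - 1) on uniformly distributed data (an assumption for illustration); for dense data the error is roughly s^2 / 12, so it falls as bits are added and grows as the range widens.

```python
import numpy as np


def uniform_mse(x: np.ndarray, a: float, b: float, n_bits: int) -> float:
    """MSE of round-to-nearest uniform quantization of x, which lies inside [a, b]."""
    s = (b - a) / (2 ** n_bits - 1)  # step size: shrinks with more bits, grows with range
    x_hat = a + s * np.round((x - a) / s)
    return float(np.mean((x - x_hat) ** 2))


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)

by_bits = [uniform_mse(x, -1.0, 1.0, n) for n in range(2, 9)]  # 2..8 bits, fixed range
narrow = uniform_mse(x, -1.0, 1.0, 4)  # 4 bits over [-1, 1]
wide = uniform_mse(x, -4.0, 4.0, 4)    # same 4 bits over a 4x wider range
print(by_bits)
print(narrow, wide)
```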

The accuracy loss of a quantized model can be reduced by minimizing both errors. In general, however, clipping error and quantization error trade off against each other as the size of the clipping range changes. When the clipping range is small, quantization error decreases while clipping error increases; conversely, when the clipping range is large, quantization error increases while clipping error decreases. Choosing a clipping range that minimizes the sum of the two errors is therefore an important factor in securing the accuracy of the quantization model.
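The trade-off can be made concrete by sweeping a symmetric clipping bound c over synthetic data. Assumptions: Laplace-distributed values stand in for a tensor with heavy-ish tails, quantization is symmetric 4-bit with integer levels -7..7, and the two error terms are measured separately. Their sum is close to the total error because clipped values land exactly on the outermost level.

```python
import numpy as np


def clip_quant_mse(x: np.ndarray, c: float, n_bits: int = 4):
    """Return (clipping MSE, quantization MSE) for symmetric quantization over [-c, c]."""
    levels = 2 ** (n_bits - 1) - 1  # 7 for 4 bits: integer levels -7..7
    s = c / levels
    clipped = np.clip(x, -c, c)
    x_hat = s * np.round(clipped / s)
    clip_err = float(np.mean((x - clipped) ** 2))
    quant_err = float(np.mean((clipped - x_hat) ** 2))
    return clip_err, quant_err


rng = np.random.default_rng(0)
x = rng.laplace(0.0, 1.0, 50_000)
cs = np.linspace(0.5, np.abs(x).max(), 60)        # candidate clipping bounds
totals = [sum(clip_quant_mse(x, c)) for c in cs]
best_c = float(cs[int(np.argmin(totals))])
print(best_c, float(np.abs(x).max()))  # the best bound lies well inside the data range
```

Neither extreme wins: a tiny c clips most of the mass, while c = max|x| wastes levels on rare outliers.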

Setting the clipping range [a, b] is generally called calibration, and it relies on statistics of the input distribution. Commonly used methods include max calibration, entropy calibration, and percentile calibration.
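Max and percentile calibration reduce to simple statistics, as this sketch shows. The 99.99th percentile is an illustrative default, not a value from the source. Entropy calibration, which typically chooses the threshold minimizing the KL divergence between the original and quantized histograms, is omitted for brevity.

```python
from typing import Tuple

import numpy as np


def max_calibration(x: np.ndarray) -> Tuple[float, float]:
    """Clipping range = observed minimum and maximum."""
    return float(x.min()), float(x.max())


def percentile_calibration(x: np.ndarray, pct: float = 99.99) -> Tuple[float, float]:
    """Symmetric clipping range from a high percentile of |x|, ignoring rare outliers."""
    t = float(np.percentile(np.abs(x), pct))
    return -t, t


x = np.array([-1.0, 0.0, 2.0, 100.0])   # one large outlier
print(max_calibration(x))               # the outlier stretches the range
print(percentile_calibration(x, 50.0))  # a low percentile ignores it
```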

The calibration method that best limits performance degradation under quantization is known to differ from one deep neural network model to another. Moreover, setting the clipping range purely from statistics of the input distribution does not directly account for how the range affects the final model's performance, so it may not be an effective way to minimize that degradation.

In addition, deep learning compilers that support neural network quantization, such as TensorFlow, TensorFlow Lite, PyTorch, and TensorRT, each implement it in different and complex ways. It is therefore very difficult for users to find and implement the optimal calibration method for each combination of compiler and model.

To overcome these calibration difficulties, the lost accuracy can be recovered using the quantized model generated by the compiler and the quantization parameters inside it, without modifying the compiler's internal code. Depending on how many times the quantization parameters are updated and whether data is available, either a zero-shot approach or a search approach can be used.

The zero-shot approach updates the quantization parameters in the quantization model produced by the compiler to new values without any additional input data. Because it uses no training or validation data, it yields a quantization model with improved performance in a very short time. However, it cannot provide quantization parameter updates optimized for the specific input model.

The search approach is an iterative method that updates the quantization parameters several times using data in addition to the input quantization model. A search algorithm can be used to find the best parameter combination among many quantization parameter candidates. Because it finds a combination tailored to the input model, it can outperform the zero-shot approach. The simplest way to find the optimal combination would be to evaluate every possible combination and pick the best one, but the time complexity makes this practically impossible. Search algorithms are therefore used instead, yet a generic search can take too long for large models. Moreover, when only a small amount of data is available, simply applying a generic search algorithm cannot guarantee the performance of the quantization model obtained with the new parameters. An appropriate search method for quantization parameters is therefore needed.
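The infeasibility of exhaustive search is simple arithmetic. Assuming each of L layers independently picks one of k candidates, there are k^L combinations, whereas a single coordinate-wise greedy pass (one example of a cheaper search, chosen here for comparison) needs only about k·L evaluations. The layer and candidate counts below are illustrative.

```python
def exhaustive_evals(n_layers: int, n_candidates: int) -> int:
    """Every layer independently picks one candidate: n_candidates ** n_layers combinations."""
    return n_candidates ** n_layers


def greedy_evals(n_layers: int, n_candidates: int) -> int:
    """One coordinate-wise pass: each candidate of each layer once, plus the baseline."""
    return n_layers * n_candidates + 1


print(f"{exhaustive_evals(50, 5):.2e}", greedy_evals(50, 5))
```

With 50 layers and 5 candidates each, exhaustive search needs about 8.9 × 10^34 evaluations against 251 for the greedy pass, and restricting the search to the most sensitive layers shrinks the greedy cost further.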

The quantization model accuracy restoration system according to the embodiments may be implemented by at least one computer device, and the accuracy restoration method according to the embodiments may be performed by at least one computer device implementing that system. A computer program according to an embodiment may be installed and run on the computer device, and the computer device may perform the method under the control of the running program. The computer program may be stored on a computer-readable recording medium so that, combined with the computer device, it causes the computer to execute the quantization model accuracy restoration method.

Figure 1 is a block diagram showing an example of a computer device according to an embodiment of the present invention. As shown in Figure 1, the computer device 100 may include a memory 110, a processor 120, a communication interface 130, and an input/output (I/O) interface 140. The memory 110 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. A permanent mass storage device such as a ROM or disk drive may also be included in the computer device 100 as a separate permanent storage device distinct from the memory 110. The memory 110 may also store an operating system and at least one program code. These software components may be loaded into the memory 110 from a computer-readable recording medium separate from the memory 110, such as a floppy drive, disk, tape, DVD/CD-ROM drive, or memory card.
In another embodiment, the software components may be loaded into the memory 110 through the communication interface 130 rather than from a computer-readable recording medium. For example, the software components may be loaded into the memory 110 of the computer device 100 based on a computer program installed from files received over the network 160.

The processor 120 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to the processor 120 by the memory 110 or the communication interface 130. For example, the processor 120 may be configured to execute received instructions according to program code stored in a recording device such as the memory 110.

The communication interface 130 may provide a function for the computer device 100 to communicate with other devices over the network 160. For example, requests, commands, data, or files generated by the processor 120 according to program code stored in a recording device such as the memory 110 may be transmitted to other devices over the network 160 under the control of the communication interface 130. Conversely, signals, commands, data, or files from other devices may be received by the computer device 100 through its communication interface 130 via the network 160. Signals, commands, and data received through the communication interface 130 may be passed to the processor 120 or the memory 110, and files may be stored in a storage medium (the permanent storage device described above) that the computer device 100 may further include.

The input/output interface 140 may be a means for interfacing with an input/output (I/O) device 150. For example, input devices may include a microphone, keyboard, or mouse, and output devices may include a display or speaker. As another example, the input/output interface 140 may be a means for interfacing with a device that integrates input and output functions, such as a touchscreen. The I/O device 150 may also be integrated with the computer device 100 into a single device.

Additionally, in other embodiments, the computer device 100 may include fewer or more components than those shown in FIG. 1. However, most prior-art components need not be explicitly illustrated. For example, the computer device 100 may be implemented to include at least some of the I/O devices 150 described above, or may further include other components such as a transceiver or a database.

FIG. 2 is a flowchart showing an example of a quantization model accuracy restoration method according to an embodiment of the present invention. The method according to this embodiment may be performed by the computer device 100 described with reference to FIG. 1. Here, the processor 120 of the computer device 100 may be implemented to execute control instructions according to the code of an operating system, or the code of at least one computer program, included in the memory 110. The processor 120 may control the computer device 100 so that it performs steps 210 to 240 of the method of FIG. 2 according to the control instructions provided by the code stored in the computer device 100.

In step 210, the computer device 100 may extract the quantization parameters of the input quantization model. The input quantization model may be a model that has already been quantized by a deep neural network compiler. Inside such a model, the parameters required for computation (e.g., weights and biases) may be stored with fewer bits than the 32 bits used by the original model. Alternatively, some or all of the operations in the quantization model may be performed at a lower bit width than the 32 bits of the original model.

In this case, the computer device 100 may analyze the input quantization model to extract its quantization parameters. For example, the computer device 100 may parse the input quantization model and extract, from the information stored in it, the information usable in later steps. Such information may include parameters that the original model had before quantization, such as its weights and/or biases. It may also include the per-layer quantization parameters that the model acquired through quantization. For a model obtained by uniform quantization, the quantization parameters may be the minimum and maximum values of the clipping range together with the scale factor and zero point, i.e., the internal values used in Equation 1 or 3 above. For a model obtained by non-uniform quantization, on the other hand, parameters other than the scale factor and zero point may also be included among the quantization parameters.
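The per-layer parameters of a uniformly quantized model can be illustrated with a minimal Python sketch. It assumes the common asymmetric (affine) convention for b-bit unsigned integers, scale = (max - min) / (2^b - 1) and zero_point = round(-min / scale), which may differ in detail from Equation 1 or 3 of this document. The parsed-model dictionary format and the names `QuantParams`, `uniform_qparams`, and `extract_qparams` are hypothetical stand-ins for whatever a real compiler's model format exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantParams:
    """Per-layer parameters of uniform (affine) quantization."""
    clip_min: float
    clip_max: float
    scale: float
    zero_point: int


def uniform_qparams(clip_min: float, clip_max: float, num_bits: int = 8) -> QuantParams:
    """Derive scale and zero point from a clipping range [clip_min, clip_max].

    Assumes the asymmetric unsigned convention; the document's own equations
    may differ.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to contain 0 so that real 0.0 maps exactly to an integer.
    clip_min, clip_max = min(clip_min, 0.0), max(clip_max, 0.0)
    scale = (clip_max - clip_min) / (qmax - qmin) or 1.0  # guard an all-zero range
    zero_point = int(round(qmin - clip_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # keep it representable
    return QuantParams(clip_min, clip_max, scale, zero_point)


def extract_qparams(parsed_model: dict) -> dict:
    """Step 210 sketch: collect per-layer quantization parameters.

    `parsed_model` is a hypothetical {layer_name: {"min", "max", "bits"}} mapping.
    """
    return {
        name: uniform_qparams(info["min"], info["max"], info.get("bits", 8))
        for name, info in parsed_model.items()
    }
```

For example, `extract_qparams({"conv1": {"min": -128.0, "max": 127.0}})` gives `conv1` a scale of 1.0 and a zero point of 128.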

In step 220, the computer device 100 may build a search space for a quantization parameter search algorithm by matching a group of quantization parameter candidates to each quantization parameter adjustment target, where the targets are selected from among the layers of the quantization model based on their sensitivity to quantization parameter adjustment. An adjustment target may be a layer or a layer group, and a layer group may include one or more layers. For example, the search space may consist of pairs, each pairing one of the layers (or layer groups) of the quantization model on which the quantization parameter search is to be performed with the group of candidate quantization parameters to be newly recalculated for it. The process of building this search space is described in more detail below with reference to FIG. 3.

In step 230, the computer device 100 may search for quantization parameters for each adjustment target using the quantization parameter search algorithm and the search space. To improve the performance of the input quantization model, new quantization parameters may be searched for each adjustment target in the search space; for example, the computer device 100 may obtain recalculated quantization parameters using the search algorithm. The search process is described in more detail below with reference to FIG. 4.

In step 240, the computer device 100 may generate an adjusted quantization model by applying the found quantization parameters to the quantization model. The adjusted quantization model may be the input quantization model with its quantization parameters replaced by the new quantization parameters found in step 230.
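Step 240 amounts to overwriting the stored parameters of each adjusted target. The sketch below assumes a hypothetical dictionary representation of the model ({layer_name: {"qparams": ...}}) and a hypothetical helper name `apply_qparams`; it copies the model so that the input quantization model stays unchanged and the two can be compared afterwards.

```python
import copy


def apply_qparams(model: dict, found: dict) -> dict:
    """Step 240 sketch: return a copy of `model` with the searched parameters applied.

    `model` is a hypothetical {layer_name: {"qparams": ...}} mapping and `found`
    maps each adjusted layer to its new quantization parameters. Layers absent
    from `found` keep their original parameters. For a layer group, every member
    layer would appear in `found`.
    """
    adjusted = copy.deepcopy(model)  # leave the input quantization model untouched
    for layer, qparams in found.items():
        if layer not in adjusted:
            raise KeyError(f"unknown layer: {layer}")
        adjusted[layer]["qparams"] = qparams
    return adjusted
```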

FIG. 3 is a flowchart showing an example of a process for building the search space according to an embodiment of the present invention. Steps 310 to 360 of FIG. 3 may be included in step 220 described above with reference to FIG. 2 and performed by the computer device 100.

In step 310, the computer device 100 may determine whether the data set is large enough to run the search algorithm. The search algorithm may require a data set on which to evaluate candidate conditions during the search, and the performance of the new quantization model built from the recalculated quantization parameters may vary with the size of that data set. It is therefore necessary to determine whether the data set size is adequate for the search algorithm to be used. If a data set large enough to ensure the search algorithm's performance is not available, or if the algorithm's performance increases with the size of the data set it uses, it may be appropriate to enlarge the data set. Accordingly, the computer device 100 may perform step 320 when the data set is not large enough to run the search algorithm, and step 330 when it is.

In step 320, the computer device 100 may expand the data set. For example, the computer device 100 may expand the data set by adding at least part of another data set to it. More specifically, the computer device 100 may add data from the same domain as the quantization model (in-domain data) or data from other domains (out-of-domain data) to form an open-domain data set. As another example, the computer device 100 may add synthetic data to the data set. Synthetic data refers to data created artificially with a machine learning algorithm or a generative model, rather than real data (e.g., photographs or pictures of real objects used as training or validation data for an ordinary model).
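Steps 310 and 320 can be sketched as a size check followed by topping up the data set. This is a minimal sketch under stated assumptions: the sufficiency criterion is reduced to a hypothetical `min_size` threshold (the document leaves the criterion to the chosen search algorithm), other data sets are consumed in the order given (e.g. in-domain before out-of-domain), and `synthesize` is a hypothetical stand-in for a generative model.

```python
import random


def ensure_dataset_size(dataset, min_size, extra_sources=(), synthesize=None, seed=0):
    """Steps 310-320 sketch: enlarge `dataset` until it holds `min_size` samples.

    Samples are drawn first from `extra_sources` (e.g. in-domain, then
    out-of-domain data) and, if still short, from `synthesize(rng)`. If neither
    suffices, the (still too small) data set is returned as-is.
    """
    expanded = list(dataset)
    if len(expanded) >= min_size:  # step 310: sufficient, proceed to step 330
        return expanded
    for source in extra_sources:  # step 320: borrow from other data sets
        for sample in source:
            if len(expanded) >= min_size:
                return expanded
            expanded.append(sample)
    rng = random.Random(seed)  # seeded so synthetic samples are reproducible
    while synthesize is not None and len(expanded) < min_size:
        expanded.append(synthesize(rng))  # synthetic data
    return expanded
```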

In step 330, the computer device 100 may use the quantization model and the data set to measure, for each of at least one layer (or at least one layer group) of the quantization model, its sensitivity to quantization parameter adjustment. That is, using the quantization model and the data set, the computer device 100 can check how updating the quantization parameters of each layer (or layer group) affects the output of the whole quantization model; this effect can be regarded as the sensitivity of that layer (or layer group) to quantization parameter adjustment. To measure it, the computer device 100 may, for at least one layer among all the layers of the existing quantization model (or at least one layer group among all its layer groups), adjust the quantization parameters of that specific layer or layer group and then observe the change in a reference value. Adjusting the quantization parameters of a layer group may mean adjusting the quantization parameters of at least one of the layers it contains. The reference value for sensitivity may be, for example, a performance measure of the quantization model such as its accuracy and/or loss, or the difference in activations between the quantization model and the pre-quantization model. For example, when the accuracy of the quantization model is used as the reference value, if adjusting the quantization parameters of a specific layer lowers the accuracy of the modified model, that layer can be regarded as highly sensitive to quantization parameter adjustment. Conversely, if the accuracy improves, the layer can be regarded as having low sensitivity.
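The one-target-at-a-time measurement can be sketched as follows, using accuracy as the reference value. For brevity the sketch assumes each target's quantization parameter is a single number (e.g. a clipping maximum); `evaluate` and `perturb` are hypothetical callables standing in for running the model on the data set and for the adjustment rule. Keys may be layer names or tuples of names for layer groups. An activation-difference reference value would fit the same shape if `evaluate` returned, say, the negative mean squared activation difference.

```python
from typing import Callable, Dict, Hashable


def measure_sensitivity(
    qparams: Dict[Hashable, float],
    evaluate: Callable[[Dict[Hashable, float]], float],
    perturb: Callable[[float], float],
) -> Dict[Hashable, float]:
    """Step 330 sketch: sensitivity = drop in the reference value per adjusted target.

    `evaluate` returns the reference value (higher is better) of the model built
    from `qparams`. A positive score means the adjustment hurt (high sensitivity);
    a negative score means it helped (low sensitivity).
    """
    baseline = evaluate(qparams)
    sensitivity = {}
    for target, value in qparams.items():
        trial = dict(qparams)
        trial[target] = perturb(value)  # adjust only this layer (or layer group)
        sensitivity[target] = baseline - evaluate(trial)
    return sensitivity
```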

In step 340, the computer device 100 may use the measured sensitivities to select, from among the at least one layer, adjustment target layers (or adjustment target layer groups) whose quantization parameters are to be adjusted. For example, the computer device 100 may use the per-layer or per-layer-group sensitivities measured in step 330 to select the layers whose quantization parameters will subsequently be adjusted. In one selection scheme, the lower a layer's sensitivity, the more likely it is to be selected as an adjustment target, and the higher its sensitivity, the less likely it is to be selected. In another scheme, a sensitivity threshold may be set and layers whose sensitivity is at or below the threshold may be selected as adjustment targets. Meanwhile, depending on the model, some layers, such as padding layers, are particularly sensitive. Accordingly, when selecting the layers whose quantization parameters are to be adjusted, the computer device 100 may exclude layers preset according to their operation type. In other words, the layers subject to sensitivity measurement may include all layers of the quantization model except those preset according to operation type. This selection works the same way when the adjustment target is a layer group rather than an individual layer.
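Both selection schemes can be sketched in a few lines. The exclusion set `{"Pad"}`, the softmax-style weighting exp(-(s - min s) / T) used for the probabilistic scheme, and the function names are hypothetical choices made for illustration; the document only requires that lower sensitivity makes selection more likely.

```python
import math
import random


def select_by_threshold(sensitivity, op_types, threshold, excluded_ops=frozenset({"Pad"})):
    """Keep targets whose sensitivity is at or below `threshold`, skipping preset op types."""
    return [
        t for t, s in sensitivity.items()
        if op_types.get(t) not in excluded_ops and s <= threshold
    ]


def select_by_sampling(sensitivity, op_types, k, temperature=1.0, seed=0,
                       excluded_ops=frozenset({"Pad"})):
    """Sample up to k targets without replacement; lower sensitivity -> higher probability."""
    rng = random.Random(seed)
    pool = {t: s for t, s in sensitivity.items() if op_types.get(t) not in excluded_ops}
    chosen = []
    for _ in range(min(k, len(pool))):
        targets = list(pool)
        lowest = min(pool.values())
        # Shifting by the minimum keeps every weight in (0, 1] and avoids overflow.
        weights = [math.exp(-(pool[t] - lowest) / temperature) for t in targets]
        pick = rng.choices(targets, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]
    return chosen
```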

In step 350, the computer device 100 may match a group of quantization parameter candidates to each adjustment target layer. The time required by the quantization parameter search algorithm can vary greatly with the number of candidates assigned to each adjustment target layer, so an appropriate number of candidates needs to be specified. For example, the lower an adjustment target layer's sensitivity to quantization parameter adjustment, the more candidates may be assigned to it; the higher its sensitivity, the fewer. In other words, the computer device 100 may set the number or distribution of new quantization parameter candidates differently for each adjustment target layer based on its sensitivity. Candidate matching works the same way when the adjustment target is a layer group rather than an individual layer.

In step 360, the computer device 100 may generate a search space consisting of pairs of adjustment target layers and their quantization parameter candidate groups. The generated search space may be used by the quantization parameter search algorithm, as described below with reference to FIG. 4.
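Steps 350 and 360 together can be sketched as: map each target's sensitivity to a candidate count (least sensitive gets the most candidates), generate that many candidates, and emit (target, candidates) pairs. The linear mapping, the bounds of 2 and 8 candidates, and the candidate generator (clipping maxima spread over 0.5x to 1.0x of the current value) are all hypothetical choices for illustration, not the document's method.

```python
def num_candidates(s, s_min, s_max, fewest=2, most=8):
    """Linearly map a sensitivity to a candidate count: least sensitive -> `most`."""
    if s_max == s_min:
        return most
    frac = (s - s_min) / (s_max - s_min)  # 0 = least sensitive, 1 = most sensitive
    return round(most - frac * (most - fewest))


def candidate_clip_maxima(clip_max, n, low=0.5, high=1.0):
    """Hypothetical generator: n clipping maxima evenly spread over [low, high] * clip_max."""
    if n == 1:
        return [clip_max * high]
    step = (high - low) / (n - 1)
    return [clip_max * (low + i * step) for i in range(n)]


def build_search_space(targets, sensitivity, clip_max):
    """Steps 350-360 sketch: pair each adjustment target with its candidate group."""
    if not targets:
        return []
    s_vals = [sensitivity[t] for t in targets]
    s_min, s_max = min(s_vals), max(s_vals)
    return [
        (t, candidate_clip_maxima(clip_max[t], num_candidates(sensitivity[t], s_min, s_max)))
        for t in targets
    ]
```

With sensitivities 0.0, 0.5, and 1.0, the three targets receive 8, 5, and 2 candidates respectively.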

FIG. 4 is a flowchart illustrating an example of a process for searching for quantization parameters according to an embodiment of the present invention. Steps 410 to 430 of FIG. 4 may be included in step 230 described above with reference to FIG. 2 and performed by the computer device 100.

In step 410, the computer device 100 may check whether sufficient resources are available to run the search algorithm. Because the way the search space is used for the quantization parameter search may depend on the available resources, the computer device 100 determines whether those resources are sufficient. Examples of such resources include the computing power and memory size of the computer device 100 on which the algorithm will run, and/or the time the algorithm is expected to take.
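One way to make the resource check concrete is to bound the number of model evaluations the search procedure described below (steps 1 to 5) would need. If target i has c_i candidates and K combinations are kept, the first target costs c_1 evaluations and each later target at most K * c_i. The sketch below assumes, hypothetically, that time is the only resource considered and that each evaluation takes a known number of seconds; memory and compute limits are omitted.

```python
def beam_search_cost(candidate_counts, beam_width):
    """Upper bound on model evaluations for a layer-by-layer beam search."""
    if not candidate_counts:
        return 0
    first, *rest = candidate_counts
    # The first target expands a single starting point; later ones expand up to K beams.
    return first + beam_width * sum(rest)


def has_enough_resources(candidate_counts, beam_width, seconds_per_eval, time_budget_s):
    """Step 410 sketch: run the search (step 420) only if it fits the time budget."""
    return beam_search_cost(candidate_counts, beam_width) * seconds_per_eval <= time_budget_s
```

For candidate counts of 8, 5, and 2 with K = 2, the bound is 8 + 2 * (5 + 2) = 22 evaluations.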

In step 420, the computer device 100 may run the quantization parameter search algorithm using the search space. In other words, if the computer device 100 determines that sufficient resources are available, it may run the quantization parameter search algorithm over the search space. The quantization parameter search algorithm finds the quantization parameters that best improve the performance of the input quantization model.

As a more specific example, the computer device 100 may find the optimal quantization parameters with the quantization parameter search algorithm as follows.

1. The computer device 100 may start the search from the layer or layer group in the search space that is closest to the input layer of the quantization model.

2. The computer device 100 may replace the quantization parameters of each layer or layer group with each of its assigned new quantization parameter candidates, one at a time, and measure the performance (or reference value) of the quantization model updated with each candidate.

3. After checking the performance change for each candidate in the group, the computer device 100 may select the K candidates (K being a natural number) with the best performance.

4. For each of the K candidates, the computer device 100 performs process 2 above on the next layer or layer group. At this point, the computer device 100 may keep the K candidate combinations with the best cumulative performance.

5. The computer device 100 may repeat processes 1 to 4 above for every layer or layer group in the search space.
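Processes 1 to 5 describe a layer-by-layer beam search. The following is a minimal sketch under the assumption that each target's quantization parameter is a single number and that `evaluate` (hypothetical) returns the reference value, higher being better, of the model built from a full assignment; targets not yet visited keep their original values from `base`.

```python
import heapq
from typing import Callable, Dict, Hashable, Sequence, Tuple


def beam_search(
    search_space: Sequence[Tuple[Hashable, Sequence[float]]],
    base: Dict[Hashable, float],
    evaluate: Callable[[Dict[Hashable, float]], float],
    beam_width: int = 2,
) -> Tuple[Dict[Hashable, float], float]:
    """Sketch of processes 1-5: keep the K best cumulative combinations per target.

    `search_space` lists (target, candidates) pairs ordered from the input layer
    toward the output layer; `base` holds the original quantization parameters.
    """
    beams = [(evaluate(base), dict(base))]
    for target, candidates in search_space:      # processes 1 and 5: walk targets in order
        expanded = []
        for _, assignment in beams:              # process 4: extend every kept combination
            for cand in candidates:              # process 2: try each candidate in turn
                trial = dict(assignment)
                trial[target] = cand
                expanded.append((evaluate(trial), trial))
        # Process 3: keep the K combinations with the best cumulative performance.
        beams = heapq.nlargest(beam_width, expanded, key=lambda item: item[0])
    best_score, best = max(beams, key=lambda item: item[0])
    return best, best_score
```

With the toy evaluator `lambda p: -(p["a"] - 2) ** 2 - 2 * (p["b"] - p["a"]) ** 2`, base `{"a": 0.0, "b": 0.0}`, and search space `[("a", [1.0, 2.0, 3.0]), ("b", [1.0, 2.0])]`, `beam_width=1` (plain greedy search) settles on a = 1.0, b = 1.0 with score -1.0, while `beam_width=2` keeps a = 2.0 alive and reaches a = 2.0, b = 2.0 with score 0.0. Keeping K > 1 combinations guards against a choice that looks best only because later layers still hold their old parameters.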

In step 430, the computer device 100 may select one of the quantization parameter candidates in the search space, or compute quantization parameters in bulk according to layer type. In other words, if it determines that the resources to run the search algorithm are insufficient, the computer device 100 may use the search space to determine new quantization parameters without running the search algorithm. The computer device 100 may compute new quantization parameters for all or some of the adjustment target layers included in the search space. Each layer's quantization parameter may be chosen as one of that layer's candidates in the search space, or computed in bulk by a simple rule according to the layer's type (e.g., Convolution, Fully-Connected, Add, or Multiplication layer). For example, a quantization parameter calculation method may be preset for each layer type, and the computer device 100 may simply compute the quantization parameters of all layers of the same type in bulk using the method preset for that type.
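Both fallback options of step 430 can be sketched without any model evaluation. The choice of the middle candidate, the per-type rules (a 99.9th-percentile clipping bound for convolution and fully-connected layers, the absolute maximum for element-wise layers), and the op-type names "Conv", "Gemm", "Add", and "Mul" are hypothetical examples of preset rules, not rules stated in the document.

```python
import math


def pick_middle_candidate(search_space):
    """Option 1: take one candidate per target without evaluation (here: the middle one)."""
    return {target: cands[len(cands) // 2] for target, cands in search_space}


def percentile(values, q):
    """Nearest-rank percentile of `values` for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Option 2: a preset rule per layer type (hypothetical examples).
TYPE_RULES = {
    "Conv": lambda xs: percentile([abs(x) for x in xs], 99.9),
    "Gemm": lambda xs: percentile([abs(x) for x in xs], 99.9),
    "Add": lambda xs: max(abs(x) for x in xs),
    "Mul": lambda xs: max(abs(x) for x in xs),
}


def batch_compute_by_type(op_types, activations, rules=TYPE_RULES):
    """Compute a clipping bound in bulk for every layer whose type has a preset rule."""
    return {
        layer: rules[op](activations[layer])
        for layer, op in op_types.items()
        if op in rules
    }
```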

FIG. 5 is a diagram showing an example of quantization parameter candidates allocated to each layer according to an embodiment of the present invention. For example, FIG. 5 shows a case in which, in the embodiment of FIG. 4, the computer device 100 selects two candidate combinations for each layer in step 420 (that is, K = 2).

Other methods for finding the optimal parameter combination, such as evolutionary algorithms, neural architecture search, reinforcement learning, or greedy search, may also be applied.

As described above, embodiments of the present invention can recover the accuracy lost by a quantization model generated by a compiler. To this end, they improve the iterative search used to update the quantization parameters inside the quantization model: compared with existing quantization parameter search methods, they can restore more accuracy while also shrinking the search space, which reduces the time required to find optimal quantization parameters.

The system or device described above may be implemented with hardware components or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will appreciate that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by a processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may store a computer-executable program permanently, or temporarily for execution or download. The medium may also be any of various recording or storage means in which one or more pieces of hardware are combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and other media configured to store program instructions. Other examples include recording or storage media managed by app stores that distribute applications, or by sites and servers that supply or distribute various other software. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order than described, and/or components of the described system, structure, device, circuit, etc. are coupled or combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for adjusting a quantization model, performed by a computer device including at least one processor, the method comprising:
extracting, by the at least one processor, quantization parameters of an input quantization model;
building, by the at least one processor, a search space for a quantization parameter search algorithm by matching a quantization parameter candidate group to each quantization parameter adjustment target, the adjustment targets being selected from among the layers included in the quantization model based on sensitivity to adjustment of the quantization parameters;
searching, by the at least one processor, for quantization parameters to be applied to each of the quantization parameter adjustment targets using the quantization parameter search algorithm and the search space; and
generating, by the at least one processor, an adjusted quantization model by reflecting the searched quantization parameters in the quantization model.
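The four steps of the preceding claim can be illustrated with a minimal Python sketch. It assumes a toy model in which each layer is a list of float weights and its quantization parameter is a single symmetric scale. The 4-bit grid, the max-abs starting scales, the use of quantization error as the sensitivity proxy, and every function name here are illustrative assumptions, not details taken from the patent.

```python
from typing import Dict, List

Model = Dict[str, List[float]]  # hypothetical: layer name -> float weights
Params = Dict[str, float]       # hypothetical: layer name -> quantization scale
BITS = 4                        # assumed low-bit width
QMAX = 2 ** (BITS - 1) - 1


def fake_quant(w: List[float], scale: float) -> List[float]:
    """Symmetric uniform quantize-then-dequantize onto a signed BITS-bit grid."""
    return [max(-QMAX - 1, min(QMAX, round(x / scale))) * scale for x in w]


def quant_error(w: List[float], scale: float) -> float:
    """Mean squared error between the float weights and their quantized copy."""
    return sum((a - b) ** 2 for a, b in zip(w, fake_quant(w, scale))) / len(w)


def extract_params(model: Model) -> Params:
    """Step 1: read the input model's scales (here: naive max-abs calibration)."""
    return {n: max(abs(x) for x in w) / QMAX for n, w in model.items()}


def build_search_space(model: Model, params: Params, top_k: int = 1,
                       factors=(0.5, 0.75, 1.0)) -> Dict[str, List[float]]:
    """Step 2: pick the top_k most sensitive layers and attach candidate scales."""
    sens = {n: quant_error(w, params[n]) for n, w in model.items()}  # assumed proxy
    targets = sorted(sens, key=sens.get, reverse=True)[:top_k]
    return {n: [params[n] * f for f in factors] for n in targets}


def search(model: Model, space: Dict[str, List[float]]) -> Params:
    """Step 3: exhaustive per-layer search; layer errors are independent here."""
    return {n: min(c, key=lambda s: quant_error(model[n], s)) for n, c in space.items()}


def adjust_quantization_model(model: Model, top_k: int = 1) -> Params:
    """Step 4: write the searched scales back over the original ones."""
    params = extract_params(model)
    found = search(model, build_search_space(model, params, top_k))
    return {**params, **found}
```

Because the factor 1.0 is always a candidate, the per-layer error of an adjusted layer can only stay the same or decrease in this toy setting; a real model would instead evaluate a task metric on data.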

The method of claim 1, wherein building the search space comprises:
measuring, using the quantization model and a data set, the sensitivity to adjustment of the quantization parameter for each of at least one layer of the quantization model;
selecting, using the measured sensitivity, an adjustment target layer whose quantization parameter is to be adjusted from among the at least one layer;
matching the quantization parameter candidate group to each adjustment target layer; and
generating the search space consisting of pairs of an adjustment target layer and its quantization parameter candidate group.
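The per-layer search-space construction in the preceding claim can be sketched as follows. This is a hedged illustration that assumes each layer's quantization parameter is one scale, that the reference value is a loss computed by a caller-supplied `loss_fn` (standing in for evaluation on the data set), and that targets are chosen by a fixed threshold. The perturbation size `delta`, the candidate `factors`, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]  # layer name -> quantization scale (assumed parameterisation)


def measure_sensitivity(params: Params, loss_fn: Callable[[Params], float],
                        delta: float = 0.1) -> Dict[str, float]:
    """Perturb one layer's scale at a time and record how far the loss moves."""
    base = loss_fn(params)
    sens = {}
    for name in params:
        perturbed = dict(params)
        perturbed[name] *= 1.0 + delta
        sens[name] = abs(loss_fn(perturbed) - base)
    return sens


def select_targets(sens: Dict[str, float], threshold: float) -> List[str]:
    """Layers whose sensitivity exceeds the threshold become adjustment targets."""
    return [n for n, s in sens.items() if s > threshold]


def build_search_space(params: Params, targets: List[str],
                       factors=(0.8, 0.9, 1.0, 1.1, 1.2)) -> List[Tuple[str, List[float]]]:
    """Pair each target layer with a candidate group of scales around its current value."""
    return [(n, [params[n] * f for f in factors]) for n in targets]
```

Returning a list of `(layer, candidates)` tuples mirrors the claim's wording of a search space made of pairs; a dict would serve equally well.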

The method of claim 1, wherein building the search space comprises:
measuring, using the quantization model and a data set, the sensitivity to adjustment of the quantization parameter for each of at least one layer group of the quantization model;
selecting, using the measured sensitivity, an adjustment target layer group whose quantization parameter is to be adjusted from among the at least one layer group;
matching the quantization parameter candidate group to each adjustment target layer group; and
generating the search space consisting of pairs of an adjustment target layer group and its quantization parameter candidate group.
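The layer-group variant in the preceding claim differs mainly in that a group is perturbed and adjusted as a unit. The sketch below assumes groups are given as lists of member layers (for example, a convolution and the batch-norm that follows it); the claim does not say how groups are formed. It also assumes that one candidate is a multiplicative factor shared by every member scale. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]       # layer name -> quantization scale (assumed)
Groups = Dict[str, List[str]]   # group name -> member layer names (hypothetical)


def group_sensitivity(params: Params, groups: Groups,
                      loss_fn: Callable[[Params], float],
                      delta: float = 0.1) -> Dict[str, float]:
    """Scale every member of a group by (1 + delta) together and measure the loss change."""
    base = loss_fn(params)
    sens = {}
    for g, members in groups.items():
        perturbed = dict(params)
        for m in members:
            perturbed[m] *= 1.0 + delta
        sens[g] = abs(loss_fn(perturbed) - base)
    return sens


def build_group_space(groups: Groups, sens: Dict[str, float], threshold: float,
                      factors=(0.8, 0.9, 1.0, 1.1, 1.2)) -> List[Tuple[str, List[float]]]:
    """Pair each sufficiently sensitive group with shared multiplicative candidates."""
    return [(g, list(factors)) for g in groups if sens[g] > threshold]


def apply_group_factor(params: Params, members: List[str], factor: float) -> Params:
    """Applying one candidate moves all scales in the group by the same factor."""
    out = dict(params)
    for m in members:
        out[m] *= factor
    return out
```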

The method of claim 2, wherein building the search space further comprises:
determining whether the size of the data set is sufficient to perform the search algorithm; and
expanding the data set if its size is not sufficient to perform the search algorithm.

The method of claim 4, wherein expanding the data set comprises adding at least part of another data set to the data set.
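The data-set sufficiency check and expansion in the two preceding claims can be sketched as a simple top-up. The criterion of a minimum sample count (`min_samples`) and drawing the extra samples at random from a second data set are assumptions; the claims do not specify either.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ensure_dataset_size(data: Sequence[T], other: Sequence[T], min_samples: int,
                        seed: int = 0) -> List[T]:
    """Return `data`, topped up with samples drawn from `other` if it is too small.

    `min_samples` is a hypothetical stand-in for whatever sufficiency criterion
    the search algorithm actually needs.
    """
    if len(data) >= min_samples:          # already sufficient: use as-is
        return list(data)
    needed = min(min_samples - len(data), len(other))
    rng = random.Random(seed)             # fixed seed keeps the expanded set reproducible
    return list(data) + rng.sample(list(other), needed)
```

If the other data set is itself too small, the sketch simply adds all of it rather than failing.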

The method of claim 2, wherein measuring the sensitivity comprises:
adjusting the quantization parameter of one of the at least one layer, and then measuring the sensitivity of that layer based on a change in a reference value of the quantization model.

The method of claim 6, wherein the reference value includes at least one of an accuracy of the quantization model, a loss of the quantization model, and an activation difference between the quantization model and its pre-quantization (floating-point) model.
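The three reference values named in the two preceding claims can be computed as follows. This minimal sketch uses plain lists of logits and activations. It assumes top-1 accuracy, softmax cross-entropy as the loss, and mean squared error as the activation difference; the patent does not fix these definitions. It also assumes sensitivity is the absolute change in the chosen reference value.

```python
import math
from typing import List, Sequence


def accuracy(logits: List[List[float]], labels: Sequence[int]) -> float:
    """Top-1 accuracy: fraction of rows whose argmax equals the label."""
    hits = sum(max(range(len(row)), key=row.__getitem__) == y
               for row, y in zip(logits, labels))
    return hits / len(labels)


def cross_entropy(logits: List[List[float]], labels: Sequence[int]) -> float:
    """Mean softmax cross-entropy, using the log-sum-exp trick for stability."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)


def activation_mse(fp_acts: List[List[float]], q_acts: List[List[float]]) -> float:
    """Mean squared difference between float-model and quantized-model activations."""
    diffs = [(a - b) ** 2 for fa, qa in zip(fp_acts, q_acts) for a, b in zip(fa, qa)]
    return sum(diffs) / len(diffs)


def sensitivity(ref_before: float, ref_after: float) -> float:
    """Sensitivity as the magnitude of the reference-value change (assumed form)."""
    return abs(ref_after - ref_before)
```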

The method of claim 2, wherein the at least one layer comprises the layers that remain among all layers of the quantization model after excluding layers preset according to operation type.
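The operation-type filter in the preceding claim amounts to skipping a preset set of op types before sensitivity is measured. The op-type names below, and the idea of excluding data-movement ops, are chosen purely for illustration; the patent does not list which types are preset.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical preset of op types excluded from sensitivity measurement,
# chosen here for illustration only.
EXCLUDED_OP_TYPES: FrozenSet[str] = frozenset({"Reshape", "Transpose", "Concat"})


def layers_to_measure(layers: Iterable[Tuple[str, str]],
                      excluded: FrozenSet[str] = EXCLUDED_OP_TYPES) -> List[str]:
    """Return the names of (name, op_type) layers whose op type is not excluded."""
    return [name for name, op_type in layers if op_type not in excluded]
```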

The method of claim 2, wherein matching the candidate group comprises matching a different number or distribution of quantization parameter candidates to each adjustment target layer based on the sensitivity of that adjustment target layer.
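One possible policy for the sensitivity-dependent candidate matching in the preceding claim gives more sensitive layers more candidates, and therefore finer spacing, within a fixed band around the current scale. The linear mapping from normalized sensitivity to candidate count, the band `span`, and the bounds `min_n`/`max_n` are all assumptions of this sketch.

```python
from typing import Dict, List


def match_candidates(scales: Dict[str, float], sens: Dict[str, float],
                     min_n: int = 3, max_n: int = 9,
                     span: float = 0.3) -> Dict[str, List[float]]:
    """Give more sensitive layers more (and more finely spaced) candidate scales.

    Candidates are evenly spaced factors in [1 - span, 1 + span] times the current
    scale; the count grows linearly with normalized sensitivity (assumed policy).
    """
    lo, hi = min(sens.values()), max(sens.values())
    out = {}
    for name, s in sens.items():
        t = 0.0 if hi == lo else (s - lo) / (hi - lo)
        n = max(3, min_n + round(t * (max_n - min_n)))
        if n % 2 == 0:
            n += 1  # odd count puts the middle candidate at the current scale
        step = 2 * span / (n - 1)
        out[name] = [scales[name] * (1 - span + i * step) for i in range(n)]
    return out
```

Varying `span` per layer instead of the count would be an equally valid reading of "distribution" in the claim.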

The method of claim 1, wherein searching for the quantization parameters comprises:
checking whether sufficient resources are available to perform the search algorithm;
if the resources are sufficient, performing the quantization parameter search algorithm using the search space; and
if the resources are not sufficient, selecting one of the quantization parameter candidates in the search space, or collectively calculating quantization parameters according to layer type.
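The resource check in the preceding claim can be sketched by modelling "resources" as an evaluation budget. Here `budget_evals` is a hypothetical parameter, and exhaustive grid search is a stand-in for whatever search algorithm is actually used. When the joint search space fits the budget, every combination is evaluated. Otherwise the sketch falls back to picking the middle candidate per layer without evaluation; the claim's other fallback, batch calculation by layer type, is not shown.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Space = Dict[str, List[float]]  # layer name -> candidate scales (assumed)


def search_or_fallback(space: Space, loss_fn: Callable[[Dict[str, float]], float],
                       budget_evals: int) -> Tuple[Dict[str, float], str]:
    """Exhaustive search if the evaluation budget allows it, otherwise a cheap fallback."""
    names = list(space)
    cost = math.prod(len(space[n]) for n in names)  # size of the joint search space
    if cost <= budget_evals:
        best = min(itertools.product(*(space[n] for n in names)),
                   key=lambda combo: loss_fn(dict(zip(names, combo))))
        return dict(zip(names, best)), "search"
    # Fallback: take one candidate per layer (the middle one) without evaluating.
    return {n: space[n][len(space[n]) // 2] for n in names}, "fallback"
```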

The method of claim 1, wherein building the search space and searching for the quantization parameters are repeated at least once.
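Repeating search-space construction and search, as in the preceding claim, can be sketched as coarse-to-fine refinement. Each round rebuilds a narrower candidate band around the current scales and searches it. Coordinate-wise search (one layer at a time), the geometric shrinking of `span`, and all names are assumptions of this sketch rather than the patent's stated procedure.

```python
from typing import Callable, Dict

Params = Dict[str, float]  # layer name -> quantization scale (assumed)


def iterative_adjust(params: Params, loss_fn: Callable[[Params], float],
                     rounds: int = 3, span: float = 0.5, shrink: float = 0.5,
                     n: int = 5) -> Params:
    """Rebuild a narrower search space around the current scales each round, then search it.

    `n` should be odd so the current scale stays a candidate, which keeps the
    loss from increasing between rounds.
    """
    current = dict(params)
    for _ in range(rounds):
        for name in current:  # coordinate-wise: search one layer at a time
            cands = [current[name] * (1 - span + 2 * span * i / (n - 1)) for i in range(n)]
            current[name] = min(cands, key=lambda s: loss_fn({**current, name: s}))
        span *= shrink  # the next round's search space is narrower
    return current
```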

A computer-readable recording medium storing a computer program that causes a computer device to execute the method of any one of claims 1 to 11.

A computer device comprising:
at least one processor implemented to execute computer-readable instructions,
wherein the at least one processor is configured to:
extract quantization parameters of an input quantization model;
build a search space for a quantization parameter search algorithm by matching a quantization parameter candidate group to each quantization parameter adjustment target, the adjustment targets being selected from among the layers included in the quantization model based on sensitivity to adjustment of the quantization parameters;
search for quantization parameters to be applied to each quantization parameter adjustment target using the quantization parameter search algorithm and the search space; and
generate an adjusted quantization model by reflecting the searched quantization parameters in the quantization model.

The computer device of claim 13, wherein, to build the search space, the at least one processor is configured to:
measure, using the quantization model and a data set, the sensitivity to adjustment of the quantization parameter for each of at least one layer of the quantization model;
select, using the measured sensitivity, an adjustment target layer whose quantization parameter is to be adjusted from among the at least one layer;
match the quantization parameter candidate group to each adjustment target layer; and
generate the search space consisting of pairs of an adjustment target layer and its quantization parameter candidate group.

The computer device of claim 13, wherein, to build the search space, the at least one processor is configured to:
measure, using the quantization model and a data set, the sensitivity to adjustment of the quantization parameter for each of at least one layer group of the quantization model;
select, using the measured sensitivity, an adjustment target layer group whose quantization parameter is to be adjusted from among the at least one layer group;
match the quantization parameter candidate group to each adjustment target layer group; and
generate the search space consisting of pairs of an adjustment target layer group and its quantization parameter candidate group.

The computer device of claim 14, wherein, to search for the quantization parameters, the at least one processor is configured to:
check whether sufficient resources are available to perform the search algorithm;
if the resources are sufficient, perform the quantization parameter search algorithm using the search space; and
if the resources are not sufficient, select one of the quantization parameter candidates in the search space, or collectively calculate quantization parameters according to layer type.

The computer device of claim 13, wherein the at least one processor is configured to repeat, at least once, a first process of building the search space and a second process of searching for the quantization parameters.