KR20190098671A

KR20190098671A - High speed processing method of neural network and apparatus using thereof

Info

Publication number: KR20190098671A
Application number: KR1020180094311A
Authority: KR
Inventors: 손창용; 손진우; 정상일; 최창규; 한재준
Original assignee: 삼성전자주식회사
Priority date: 2018-02-14
Filing date: 2018-08-13
Publication date: 2019-08-22
Also published as: KR102655950B1

Abstract

Disclosed are a method and an apparatus for processing a neural network based on lightened data. According to one embodiment, a processing method using a neural network includes: generating output maps of a current layer by performing a convolution operation between input maps of the current layer of the neural network and weight kernels of the current layer; determining a lightweight format for the output maps of the current layer based on a distribution of at least some of the activation data processed in the neural network; and lightening the activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

Description

High speed processing method of neural network and apparatus using the method {HIGH SPEED PROCESSING METHOD OF NEURAL NETWORK AND APPARATUS USING THEREOF}

This disclosure relates to a high-speed processing method for a neural network and an apparatus using the method.

Recently, research has been actively conducted on applying the efficient pattern recognition of humans to computers as a way to solve the problem of classifying input patterns into specific groups. One such line of research concerns the artificial neural network, which models the characteristics of human biological neurons with mathematical expressions. To classify input patterns into specific groups, an artificial neural network uses an algorithm that mimics the human ability to learn. Through this algorithm, the artificial neural network can generate a mapping between input patterns and output patterns, which is described as the artificial neural network having the ability to learn. In addition, based on what it has learned, an artificial neural network has a generalization ability that lets it generate relatively correct outputs for input patterns that were not used in learning.

According to one embodiment, a processing method using a neural network includes: generating output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer; determining a lightweight format for the output maps of the current layer based on a distribution of at least some of the activation data processed in the neural network; and lightening activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

The determining of the lightweight format may include determining the lightweight format for the output maps based on a maximum value of the output maps of the current layer.

The lightening may include lightening, based on the determined lightweight format, input maps of a next layer that correspond to the output maps of the current layer to the low bit width.

The lightening may include performing a shift operation on the input maps of the next layer by a value corresponding to the lightweight format, thereby lightening the input maps of the next layer that correspond to the output maps of the current layer to the low bit width.

The processing method may further include: loading the output maps of the current layer from a memory; and updating, based on the loaded output maps of the current layer, a register that stores the maximum value of the output maps of the current layer. The determining of the lightweight format may be performed based on the value stored in the register.
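The register update described above can be pictured as a running maximum over the values loaded from memory. The following is a minimal sketch only: the function name `update_max_register`, the chunked loading, and the choice to track the magnitude (so that negative values are covered) are assumptions made for illustration and are not taken from the specification.

```python
def update_max_register(register, loaded_values):
    """Return the register value after one loaded chunk of an output map.

    Assumption: the register tracks the largest magnitude seen so far.
    """
    for value in loaded_values:
        magnitude = abs(value)
        if magnitude > register:
            register = magnitude
    return register


# Hypothetical output map of the current layer, loaded from memory in chunks.
chunks = [[12, -7, 30], [-45, 3, 18], [22, 41, -9]]

max_register = 0
for chunk in chunks:
    max_register = update_max_register(max_register, chunk)

print(max_register)
```

Once every chunk has been seen, the value left in `max_register` is what a format decision could be based on.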

The determining of the lightweight format may include: predicting a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network; and determining the lightweight format for the output maps of the current layer based on the predicted maximum value.

The lightening may include lightening the output maps of the current layer to the low bit width based on the determined lightweight format.

The lightening may include performing a shift operation on the output maps of the current layer by a value corresponding to the lightweight format, thereby lightening the output maps of the current layer from a high bit width to the low bit width.
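As a rough illustration of lightening by a shift operation, the sketch below right-shifts high-bit-width integers and keeps the result within a signed 8-bit range. The function name `lighten_by_shift`, the 8-bit target, and the saturation step are assumptions for this example; the text above only states that a shift by a value corresponding to the lightweight format is performed.

```python
def lighten_by_shift(values, shift, low_bits=8):
    """Right-shift high-bit-width integers into a signed low_bits range."""
    lowest = -(1 << (low_bits - 1))
    highest = (1 << (low_bits - 1)) - 1
    lightened = []
    for value in values:
        shifted = value >> shift  # arithmetic shift, rounds toward minus infinity
        # Saturation is an added safeguard (assumption), not taken from the source.
        lightened.append(max(lowest, min(highest, shifted)))
    return lightened


high_bit_width_outputs = [20000, -15000, 512, 7]
low_bit_width_outputs = lighten_by_shift(high_bit_width_outputs, shift=8)
print(low_bit_width_outputs)  # [78, -59, 2, 0]
```

A larger shift keeps large values representable at the cost of discarding more low-order bits, which is why the shift amount is tied to the data range.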

The processing method may further include updating, based on the output maps of the current layer generated by the convolution operation, a register that stores the maximum value of the output maps of the current layer. A maximum value of output maps of a next layer of the neural network may be predicted based on the value stored in the register.
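The passage above does not say how the next layer's maximum is predicted from the register. The sketch below therefore uses a placeholder rule, a hypothetical safety `margin` multiplied by the stored maximum, purely to show where such a prediction would feed into the choice of shift amount. Both `predict_next_max` and `shift_for` are illustrative names, not part of the source.

```python
def predict_next_max(register_value, margin=2):
    """Placeholder predictor (assumption): next maximum <= margin * stored maximum."""
    return register_value * margin


def shift_for(max_value, low_bits=8):
    """Smallest right shift that fits max_value into a signed low_bits integer."""
    magnitude_bits = low_bits - 1
    return max(0, max_value.bit_length() - magnitude_bits)


# Update the register from the current layer's convolution outputs.
register = 0
current_outputs = [1200, -800, 3100, 40]
for value in current_outputs:
    register = max(register, abs(value))

# Use the stored value to pick a shift for the next layer before it is computed.
predicted_max = predict_next_max(register)
shift = shift_for(predicted_max)
print(register, predicted_max, shift)
```

Because the format is chosen before the next layer's outputs exist, those outputs can be lightened as they are produced rather than after a separate pass over memory.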

The processing method may further include obtaining a first weight kernel corresponding to a first output channel currently being processed in the current layer by referring to a database that contains weight kernels for each layer and output channel. The generating of the output maps of the current layer may include generating a first output map corresponding to the first output channel by performing a convolution operation between the input maps of the current layer and the first weight kernel. The first weight kernel may be determined independently of a second weight kernel corresponding to a second channel of the current layer.

The input maps of the current layer and the weight kernels of the current layer may have the low bit width, and the output maps of the current layer may have a high bit width.

According to one embodiment, a processing apparatus using a neural network includes a processor and a memory containing instructions readable by the processor. When the instructions are executed by the processor, the processor generates output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer, determines a lightweight format for the output maps of the current layer based on a distribution of at least some of the activation data processed in the neural network, and lightens activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

According to another embodiment, a processing method using a neural network includes: starting a neural network that includes a plurality of layers; generating output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer; determining a lightweight format for the output maps of the current layer, the lightweight format not having been determined before the neural network was started; and lightening activation data corresponding to the output maps of the current layer based on the determined lightweight format.

The starting of the neural network may include inputting input data to the neural network for inference on the input data.

FIG. 1 illustrates a processing apparatus and a neural network according to an embodiment.
FIG. 2 illustrates the structure of a 3D convolutional neural network according to an embodiment.
FIG. 3 illustrates a lightweight format according to an embodiment.
FIG. 4 illustrates lightening of a weight kernel according to an embodiment.
FIG. 5 illustrates a lookup table containing lightened data according to an embodiment.
FIG. 6 illustrates a dynamic lightening process for activation data according to an embodiment.
FIG. 7 illustrates a dynamic lightening process for activation data according to another embodiment.
FIG. 8 is a graph illustrating a distribution of maximum values of input maps according to an embodiment.
FIG. 9 is a block diagram illustrating a training apparatus according to an embodiment.
FIG. 10 is a block diagram illustrating a processing apparatus according to an embodiment.
FIG. 11 is a flowchart illustrating a processing method according to an embodiment.
FIG. 12 is a flowchart illustrating a processing method according to another embodiment.

The specific structures and functions disclosed below are presented only to explain technical concepts; they may be implemented in various forms other than those disclosed below and do not limit the embodiments of this specification.

Terms such as "first" and "second" may be used to describe various components, but these terms should be understood only as distinguishing one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component.

A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" specify the presence of stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1 illustrates a processing apparatus and a neural network according to an embodiment. Referring to FIG. 1, the processing apparatus 100 lightens data for a neural network 110 by representing it at a low bit width, and processes operations of the neural network 110 using the lightened data. For example, the operations of the neural network 110 may include recognizing or authenticating an object in an input image. At least some of the processing operations related to the neural network 110 described below, including the lightening, may be implemented in software, in hardware that includes a neural processor, or in a combination of software and hardware.

The neural network 110 may include a convolutional neural network (CNN). The neural network 110 may perform object recognition, object authentication, and the like by mapping input data and output data that are in a nonlinear relationship to each other based on deep learning. Deep learning is a machine learning technique for solving problems such as image or speech recognition from big data sets. Deep learning may be understood as the process of solving an optimization problem: finding the point at which energy is minimized while training the neural network 110 with prepared training data. Through supervised or unsupervised deep learning, weights corresponding to the structure, or model, of the neural network 110 can be obtained, and the input data and output data can be mapped to each other through these weights.

The neural network 110 may include a plurality of layers. The plurality of layers may include an input layer, at least one hidden layer, and an output layer. The first layer 111 and the second layer 112 may be at least some of the plurality of layers. In the following, it is assumed that the second layer 112 is the layer next to the first layer 111 and that the second layer 112 is processed after the first layer 111. Although FIG. 1 shows two layers 111 and 112, this is for convenience of description, and the neural network 110 may include more layers in addition to the two layers 111 and 112.

In a CNN, the data input to each layer may be referred to as an input feature map, and the data output from each layer may be referred to as an output feature map. Below, an input feature map may be referred to simply as an input map, and an output feature map simply as an output map. Depending on the embodiment, an output map may correspond to the result of the convolution operation in a layer or to the result of applying the activation function in a layer. Input maps and output maps may be referred to as activation data. For example, the result of the convolution operation in a layer, or the result of applying the activation function in a layer, may be referred to as activation data. At the input layer, the input map may correspond to the image data of the input image.

To process operations of the neural network 110, the processing apparatus 100 may perform, for each layer, a convolution operation between an input map and a weight kernel, and may generate an output map based on the result of the convolution operation. In a CNN, deep learning may be performed on the convolutional layers. The processing apparatus 100 may generate the output map by applying an activation function to the result of the convolution operation. The activation function may include sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU), and the activation function gives the neural network 110 nonlinearity. If the width and depth of the neural network 110 are sufficiently large, it may have enough capacity to implement an arbitrary function. When the neural network 110 learns a sufficiently large amount of training data through an appropriate training process, optimal performance can be achieved.

A CNN may be well suited to processing 2D data such as images. In a CNN, a convolution operation is performed between an input map and a weight kernel to process 2D data, and performing this convolution operation may consume considerable time and resources in a resource-constrained environment such as a mobile terminal.

According to an embodiment, the processing apparatus 100 may perform the convolution operation using lightened data. Lightening means converting data of a high bit width into data of a low bit width. The low bit width has fewer bits than the high bit width. For example, when the high bit width is 32 bits, the low bit width may be 16, 8, or 4 bits, and when the high bit width is 16 bits, the low bit width may be 8 or 4 bits. The specific values of the high bit width and the low bit width are not limited to these examples, and various values may be used depending on the embodiment.

The processing apparatus 100 may lighten data based on fixed-point conversion. In fixed-point conversion, multiplying a floating-point variable by a fixed exponent converts the variable to an integer. The exponent used as the multiplier may be defined as a Q-format, and the Q-format for converting high-bit-width data to a low bit width may be defined as the lightweight format. The lightweight format is described in detail later.
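A small sketch may help make the fixed-point conversion concrete. It assumes the conventional reading of a Q-format, in which a format Qn scales a real value by 2^n before rounding to an integer; the helper names `to_fixed` and `to_float` are hypothetical and chosen only for this example.

```python
def to_fixed(x, q):
    """Convert a real value to a fixed-point integer with q fractional bits."""
    return int(round(x * (1 << q)))


def to_float(n, q):
    """Convert a fixed-point integer with q fractional bits back to a real value."""
    return n / (1 << q)


q = 6                        # Q6: values are stored in steps of 1/64
raw = to_fixed(0.7, q)       # 0.7 * 64 = 44.8, stored as 45
approx = to_float(raw, q)    # 45 / 64 = 0.703125
print(raw, approx)
```

The difference between `approx` and the original value is at most half a step, which is why a larger Q gives a finer representation as long as the integer still fits in the chosen bit width.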

The neural network 110 may be trained on training data in a training phase, and may perform inference operations such as classification, recognition, and detection on input data in an inference phase. Once the weight kernels are determined through the training phase, they may be lightened to a low-bit-width format and stored. Training may be performed in an offline phase or an online phase. With the recent advent of hardware capable of accelerating training, such as neural processors, online training has become possible. The weight kernels may be described as determined "in advance", where "in advance" may mean before input data for inference is input to the neural network 110.

According to an embodiment, the weight kernels may be lightened per layer and per channel. The neural network 110 may include a plurality of layers, and each layer may include a plurality of channels according to the number of weight kernels. The weight kernels may be lightened per layer and channel, and the lightened weight kernels may be stored per layer and channel in a database. For example, the database may include a lookup table.
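One way to picture such a lookup table is a mapping keyed by layer and output channel. The layout below, including the `q_format` and `kernel` field names and all stored values, is a hypothetical illustration rather than the structure used in the specification.

```python
# Hypothetical lookup table: (layer index, output channel index) -> lightened kernel.
lookup_table = {
    (0, 0): {"q_format": 7, "kernel": [12, -90, 45, 3]},
    (0, 1): {"q_format": 5, "kernel": [100, -27, 64, -8]},
    (1, 0): {"q_format": 6, "kernel": [-5, 33, 71, -120]},
}


def get_weight_kernel(layer, out_channel):
    """Fetch the lightened kernel and its lightweight format for one output channel."""
    entry = lookup_table[(layer, out_channel)]
    return entry["kernel"], entry["q_format"]


kernel, q_format = get_weight_kernel(0, 1)
# Recover the real-valued weights that the stored integers stand for.
real_weights = [w / (1 << q_format) for w in kernel]
print(q_format, real_weights)
```

Storing the format next to each kernel lets every output channel use the format that suits its own value range.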

If, in the i-th layer, the size of a weight kernel is K_i * K_i, the number of input channels is C_i, and the number of output channels is D_i, the weight kernels of the i-th layer may be expressed as ((K_i * K_i) * C_i * D_i). If the number of layers in the CNN is I, the weight kernels of the CNN may be expressed as ((K_i * K_i) * C_i * D_i) * I. When the convolution operation is performed as a matrix multiplication between the input maps and the weight kernels, the weight kernel needed to generate a single output map may be expressed as (K * K) * C. Because a single output channel is determined from a (K * K) * C weight kernel, lightening the weight kernels in units of (K * K) * C may be described as lightening the weight kernels per output channel.
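To put numbers on these sizes, the sketch below counts the weights in one per-output-channel unit and in a whole layer. The layer dimensions (a 3 x 3 kernel, 64 input channels, 128 output channels) are made up for the example, and `kernel_counts` is a hypothetical helper.

```python
def kernel_counts(k, c, d):
    """Return (weights per output channel, weights per layer) for a K*K kernel."""
    per_output_channel = k * k * c        # the (K * K) * C unit
    per_layer = per_output_channel * d    # (K * K) * C * D
    return per_output_channel, per_layer


per_channel, per_layer = kernel_counts(k=3, c=64, d=128)
print(per_channel, per_layer)  # 576 73728
```

In this example each lightweight format would cover 576 weights when chosen per output channel, compared with 73728 weights when chosen per layer.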

It is desirable for the values within a minimum-unit weight kernel to share the same lightweight format. When the weight kernels are lightened per channel, which is the minimum unit, the resolution that can be represented with a given number of bits is maximized. For example, when the weight kernels are lightened per layer, the lightweight format may have to be set low to prevent overflow, which can cause numerical errors. Lightening per channel takes the data distribution of a smaller unit into account than lightening per layer, so information loss can be reduced. According to an embodiment, the lightweight format is determined in consideration of the data distribution of each channel's weight kernel, and the weight kernels are lightened per minimum unit accordingly. As a result, wasted bits and information loss can be minimized.
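The effect of the lightening unit can be checked with a toy comparison. The sketch assumes signed 8-bit storage and picks, for a given maximum magnitude, the largest Q whose scaled maximum still fits. The selection rule `choose_q`, the error measure `max_error`, and the weight values are all assumptions for illustration.

```python
def choose_q(max_abs, bits=8, q_cap=31):
    """Largest Q (up to q_cap) such that max_abs * 2**Q fits in signed `bits`."""
    limit = (1 << (bits - 1)) - 1
    q = 0
    while q < q_cap and round(max_abs * (1 << (q + 1))) <= limit:
        q += 1
    return q


def max_error(values, q):
    """Worst-case rounding error when values are stored with q fractional bits."""
    scale = 1 << q
    return max(abs(v - round(v * scale) / scale) for v in values)


# Two hypothetical output channels of one layer with very different ranges.
channels = [[0.05, -0.03, 0.012], [3.0, -1.5, 0.75]]

layer_q = choose_q(max(abs(v) for ch in channels for v in ch))
channel_qs = [choose_q(max(abs(v) for v in ch)) for ch in channels]

err_layer = max_error(channels[0], layer_q)          # channel 0 with the layer format
err_channel = max_error(channels[0], channel_qs[0])  # channel 0 with its own format
print(layer_q, channel_qs, err_layer, err_channel)
```

With a single per-layer format the small-valued channel is forced onto the coarse grid required by the large-valued channel, while a per-channel format gives it a much finer grid.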

Because a convolution operation is a multiply-and-accumulate (MAC) operation, the Q-format or lightweight format of the data (in particular, the weight kernel) needs to be the same across the range over which additions are accumulated in a register. If the Q-formats or lightweight formats of the data being accumulated are not aligned, an additional shift operation may be needed to align them. According to an embodiment, when the weight kernel of a given channel has a single Q-format or lightweight format, the shift operation for aligning formats can be omitted in the convolution operation between the input maps of that channel and the weight kernel of that channel.
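The alignment issue can be seen in a few lines of integer arithmetic. The sketch assumes the usual fixed-point rule that multiplying a Qa value by a Qb value yields a Q(a+b) product; the function names and the sample values are hypothetical.

```python
def mac(inputs, weights):
    """Accumulate products whose operands all share one Q-format: no shifts needed."""
    acc = 0
    for a, w in zip(inputs, weights):
        acc += a * w
    return acc


def mac_mixed(inputs, weights_with_q, q_in, q_acc):
    """Accumulate products whose weights have differing Q-formats.

    Assumes q_acc >= q_in + q_w for every weight.
    """
    acc = 0
    for a, (w, q_w) in zip(inputs, weights_with_q):
        product = a * w                            # format Q(q_in + q_w)
        acc += product << (q_acc - (q_in + q_w))   # extra alignment shift
    return acc


Q_IN, Q_W = 4, 6
inputs = [16, 8, -4]      # 1.0, 0.5, -0.25 in Q4
weights = [32, -64, 16]   # 0.5, -1.0, 0.25 in Q6

acc = mac(inputs, weights)               # accumulator is in Q10
value = acc / (1 << (Q_IN + Q_W))

# Same real weights, but the middle one is stored in Q5 instead of Q6.
mixed = mac_mixed(inputs, [(32, 6), (-32, 5), (16, 6)], q_in=Q_IN, q_acc=10)
print(acc, value, mixed)
```

Both paths reach the same accumulator value, but the mixed-format path pays for a shift on every product, which is the cost a shared per-channel format avoids.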

If the lightweight formats for the input maps and output maps were determined in advance in the offline phase, the resolution of the data representing the input maps and output maps in the online phase could be greatly reduced. Input maps and output maps have a very large dynamic range, so a low lightweight format would have to be used in a fixed manner to cope with the limited length of the data representation and to prevent overflow of operation results. With a fixed low lightweight format, the number of bits available to represent the data is limited.

The processing apparatus 100 may adaptively determine the lightweight formats for the input maps and output maps in order to increase resolution and suppress numerical error. Adaptively determining a lightweight format may mean determining, after the neural network 110 has started, a lightweight format that was not determined before the neural network 110 started. The neural network 110 having started may mean that the neural network 110 is ready for inference. For example, it may include the neural network 110 having been loaded into memory, or input data for inference having been input to the neural network 110 after it was loaded into memory.

In FIG. 1, graph 131 shows the data distribution of the pixel values of input image 130, graph 141 shows the data distribution of the pixel values of input image 140, and graph 151 shows the data distribution of the pixel values of input image 150. Input image 130 contains data with relatively small values, and input image 150 contains data with relatively large values. When processing each of the input images 130 to 150 with the neural network 110, the processing apparatus 100 may adaptively set a different lightweight format for each of them. For example, the processing apparatus 100 may apply a high lightweight format to a data set with small values, such as input image 130, and a low lightweight format to a data set with large values, such as input image 150.

For example, when the data set corresponding to graph 161 is represented with 16 bits, a resolution of 1/64 step can be secured with lightweight format Q6. Lightweight format Q6 and a 1/64-step resolution mean that six fractional places are available (six bits after the binary point, since 1/64 = 2^-6). The larger the lightweight format and the smaller the step, the higher the resolution that can be represented. Because the values in the data set corresponding to graph 131 are small, a resolution of 1/64 step can be secured with lightweight format Q6 even when the data is represented with 8 bits. In this way, depending on its distribution, data can be represented relatively accurately even at a low bit width. The data of graph 141 has larger values than the data of graph 131, so lightweight format Q4 and a 1/16-step resolution may be applied when it is represented with 8 bits; the data of graph 151 has larger values than the data of graph 141, so lightweight format Q3 and a 1/8-step resolution may be applied when it is represented with 8 bits. This adaptive lightening may be applied per layer of the neural network 110.
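The step sizes quoted above can be reproduced with a short calculation. The sketch assumes signed two's-complement storage, which the passage does not state explicitly; `q_format_range` is a hypothetical helper that reports the step and the representable range for a given bit width and Q-format.

```python
def q_format_range(bits, q):
    """Return (step, lowest, highest) for a signed `bits`-wide value in format Qq."""
    step = 1 / (1 << q)
    lowest = -(1 << (bits - 1)) * step
    highest = ((1 << (bits - 1)) - 1) * step
    return step, lowest, highest


for bits, q in [(16, 6), (8, 6), (8, 4), (8, 3)]:
    step, lowest, highest = q_format_range(bits, q)
    print(f"{bits}-bit Q{q}: step={step}, range=[{lowest}, {highest}]")
```

Under this assumption, 8-bit Q6 keeps the 1/64 step but only covers roughly -2 to 2, whereas 8-bit Q4 and Q3 trade a coarser step for ranges of roughly -8 to 8 and -16 to 16, matching the idea that larger-valued data needs a lower format.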

For dynamic lightening, the processing apparatus 100 may generate output maps of a current layer by performing a convolution operation between input maps of the current layer of the neural network 110 and weight kernels of the current layer, and may determine a lightweight format for the output maps of the current layer based on a distribution of at least part of the activation data processed in the neural network 110. The processing apparatus 100 may lighten the activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

According to one embodiment, the processing apparatus 100 may determine the lightweight format for the output maps of the current layer based on a maximum value of the output maps of the current layer, and may lighten the input maps of the next layer, which correspond to the output maps of the current layer, to a low bit width based on the determined lightweight format. According to another embodiment, the processing apparatus 100 may predict the maximum value of the output maps of the current layer based on a maximum value of the output maps of the previous layer, determine the lightweight format for the output maps of the current layer based on the predicted maximum value, and lighten the output maps of the current layer to a low bit width based on the determined lightweight format.

Adaptive lightening of the input maps and output maps may be performed in a training phase and in an inference phase. In the training phase, input maps and output maps based on training data may be lightened, and in the inference phase, input maps and output maps based on the input data to be inferred may be lightened. Training of the neural network 110 may be performed in at least one of an offline phase and an online phase. In other words, adaptive lightening according to the embodiments may be applied to the training data used in offline training and online training, and to the input data used in the inference phase.

Lightening a data set such as an input map or an output map may additionally require a first memory access operation for detecting the maximum value of the data set, and a second memory access operation for applying a lightweight format to the data set based on the detected maximum value. If such additional operations are performed to lighten the data set, additional computing resources may be consumed and the data processing speed may decrease. According to the embodiments, these additional operations may be minimized when the input maps and output maps are lightened.

According to one embodiment, the processing apparatus 100 may obtain the maximum value of the high-bit-width output map of the first layer 111 when storing that output map from a register to memory, and, before the convolution operation of the second layer 112, may load the high-bit-width input map of the second layer 112 and lighten it to a low-bit-width input map based on the obtained maximum value. With this operation, the first memory access operation described above may be omitted.

According to another embodiment, the processing apparatus 100 may predict the maximum value of the output map of the second layer 112 using the maximum value of the output map of the first layer 111, and may lighten the output map of the second layer 112 using the predicted maximum value. With this operation, the first memory access operation and the second memory access operation described above may be omitted.

The embodiments may effectively implement recognition and authentication techniques in a constrained embedded environment such as a smartphone by maximizing processing speed or memory utilization. In addition, the embodiments may accelerate a deep neural network while minimizing its performance degradation, and may be used to design an effective hardware accelerator structure.

FIG. 2 is a diagram illustrating the structure of a 3D convolutional neural network according to an embodiment. The 3D convolutional neural network of FIG. 2 may correspond to any one layer in the neural network 110 of FIG. 1.

Referring to FIG. 2, output maps 230 are generated based on a convolution operation between weight kernels 210 and input maps 220. The size of a single weight kernel in the weight kernels 211 is K*K, and the weight kernel group 211 corresponding to one output channel consists of C sub-kernels. For example, in the first layer, the C sub-kernels may correspond to a red (R) component, a green (G) component, and a blue (B) component, respectively. C may correspond to the number of input channels. The number of weight kernel groups in the weight kernels 210 is D. D may correspond to the number of output channels. A region 231 in the output map 232 is determined based on a convolution operation between the weight kernel group 211 and a region 221 of the input maps 220, and the output map 232 is generated as the convolution operation between the weight kernel group 211 and the input maps 220 is performed sequentially for the remaining regions of the output map 232. The size of an input map is W1*H1, and the size of an output map is W2*H2. The size of the output map may be smaller than the size of the input map. The input maps 220 include C input maps, and the output maps 230 include D output maps.

The input maps 220 may be represented as a matrix 225. One column of the matrix 225 corresponds to the region 221 and may be expressed as K^2*C. The number of columns of the matrix 225, W1*H1, represents the total area of the input maps 220 over which the scan operation is performed. The matrix 225 may be transposed to represent input maps 240. In the input maps 240, the length of a vector 241 is K^2*C, and N denotes the number of convolution operations required to generate one output map. Output maps 260 are generated based on a convolution operation between the input maps 240 and weight kernels 250. The weight kernels 250 correspond to the weight kernels 210, and the output maps 260 correspond to the output maps 230. The size of a weight kernel group 251 corresponds to K^2*C, and the weight kernels 250 include D weight kernel groups. The size of an output map 261 corresponds to W2*H2, and the output maps 260 include D output maps. Accordingly, D output channels may be formed from D weight kernel groups, and the size of the weight kernel group for generating one output map is K^2*C.
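The matrix view described above can be sketched in plain Python as follows. This is an illustration only: it assumes stride 1 and no padding, so that N = W2*H2 with W2 = W1-K+1 and H2 = H1-K+1, and the function names `im2col` and `conv_as_matmul` are hypothetical rather than taken from the embodiments.

```python
def im2col(inputs, k):
    """inputs: C input maps, each an H1 x W1 list of lists.
    Returns N row vectors of length K*K*C (one per output position) and (H2, W2)."""
    c, h1, w1 = len(inputs), len(inputs[0]), len(inputs[0][0])
    h2, w2 = h1 - k + 1, w1 - k + 1          # assumes stride 1, no padding
    rows = []
    for y in range(h2):
        for x in range(w2):
            rows.append([inputs[ch][y + dy][x + dx]
                         for ch in range(c)
                         for dy in range(k)
                         for dx in range(k)])
    return rows, (h2, w2)


def conv_as_matmul(inputs, kernel_groups, k):
    """kernel_groups: D flattened weight kernel groups, each of length K*K*C,
    laid out in the same channel-major order as the im2col rows.
    Returns D output maps of size H2 x W2."""
    rows, (h2, w2) = im2col(inputs, k)
    outputs = []
    for group in kernel_groups:              # one group per output channel
        flat = [sum(a * b for a, b in zip(row, group)) for row in rows]
        outputs.append([flat[y * w2:(y + 1) * w2] for y in range(h2)])
    return outputs


# C = 1 input map of size 3x3, D = 1 kernel group with K = 2.
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
print(conv_as_matmul(x, [[1, 0, 0, 1]], k=2))  # [[[6, 8], [12, 14]]]
```

Each row produced by `im2col` plays the role of the vector 241, and each flattened kernel group plays the role of the weight kernel group 251.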

FIG. 3 is a diagram illustrating a lightweight format according to an embodiment. In general, data used in a neural network may be represented as a 32-bit floating-point type, and the convolution operation that processes it may be performed as a 32bit*32bit floating-point multiplication and accumulation (MAC) operation. To improve data processing speed and save memory, an embedded system may perform operations after converting the floating-point data type to a fixed-point data type. This conversion may be referred to as fixed-point conversion. Fixed-point conversion refers to the process of redefining functions implemented with fractional numbers as functions of integer arithmetic and then converting every floating-point operation in the floating-point source code to an integer operation. Multiplying a floating-point variable by a suitable value to make it an integer allows integer operations using integer operators to be performed. Dividing the result by the value that was multiplied earlier converts it back to a floating-point variable.

The processing apparatus according to the embodiments may lighten data based on fixed-point conversion. In the fixed-point conversion process, a floating-point variable may be converted to an integer by multiplying it by a certain exponent, and the exponent that is multiplied may be defined as the lightweight format. According to one embodiment, because a computer processes data in binary, a power of 2 may be multiplied to convert the floating-point variable to an integer. In this case, the exponent of 2 may be referred to as the lightweight format. For example, when the variable X is multiplied by 2^q to convert it to an integer, the lightweight format of the variable X is q. Because an exponent of 2 is used as the lightweight format, the lightweight format corresponds to a shift operation, and the operation speed may increase accordingly.
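The relationship between the lightweight format q, multiplication by 2^q, and the shift operation can be illustrated with a short Python sketch. The helper names `to_fixed`, `to_float` and `fixed_mul` are hypothetical, and rounding to the nearest integer is an assumption; the text does not state a rounding rule.

```python
def to_fixed(x: float, q: int) -> int:
    """Convert x to an integer by multiplying by 2**q (lightweight format q)."""
    return round(x * (1 << q))


def to_float(n: int, q: int) -> float:
    """Undo the conversion by dividing by the value multiplied earlier."""
    return n / (1 << q)


def fixed_mul(a: int, b: int, q: int) -> int:
    """Multiply two format-q integers. The raw product carries 2*q fractional
    bits, so a right shift by q brings it back to format q."""
    return (a * b) >> q


x = to_fixed(1.5, 6)    # 1.5 * 64 = 96
y = to_fixed(2.0, 6)    # 2.0 * 64 = 128
print(to_float(fixed_mul(x, y, 6), 6))  # 3.0
```

Because the scale factor is a power of 2, the rescaling after the multiplication is a single shift rather than a division.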

Referring to FIG. 3, data 300 includes integer bits and mantissa bits. The data 300 may correspond to a weight kernel, an input map, and an output map. By determining an appropriate lightweight format according to the data 300, the resolution that the data can represent may be increased. According to the embodiments, the lightweight format of the weight kernels is determined for each layer and channel, and the lightweight formats of the input maps and output maps are determined adaptively, so the representation of the data may be optimized. The maximum value of a data set and the distribution of the data set may be considered in determining the lightweight format. The distribution of the data set may include the variance of the data set. For example, the lightweight format may be determined with reference to the maximum value of the elements, and may be determined, according to the distribution of the data set, within a range in which the results of operations between data do not overflow.

FIG. 4 is a diagram illustrating lightening of weight kernels according to an embodiment. Referring to FIG. 4, a training result may be obtained by training a neural network 410. The training result may include a weight kernel for each layer and each channel. Lightweight data resulting from lightening the weight kernels may be stored in a memory 420. The lightweight data may include the lightweight formats of the weight kernels and the lightened weight kernels. The lightweight data may be stored for each layer and channel. According to one embodiment, the lightweight data may be stored in the memory 420 as a database such as a lookup table.

FIG. 5 is a diagram illustrating a lookup table including lightweight data according to an embodiment. Referring to FIG. 5, a lookup table 500 includes lightweight data for each layer and channel. The lightweight data represents a lightweight format and a lightened weight kernel. As described above, a neural network according to the embodiments may include a plurality of layers, and each layer may include a plurality of channels. In the lookup table 500, Lu denotes a layer and Cuv denotes a channel, where u is the index of the layer and v is the index of the channel. n denotes the number of layers, and m denotes the number of channels included in the layer L1. For example, the layer L1 may include a plurality of channels C11 to C1m.

A weight kernel may be determined for each layer and channel according to the training result of the neural network, and lightweight data for the determined weight kernel may be determined. The lightened weight kernel WK11 corresponds to the channel C11 of the layer L1, and the lightened weight kernel WK12 corresponds to the channel C12 of the layer L1. Here, the lightened weight kernel WK11 and the lightened weight kernel WK12 may be determined independently. For example, when a weight kernel is determined for the channel C11, the determined weight kernel may be converted into the lightweight format Q11 and the lightened weight kernel WK11 and recorded in the lookup table 500. Similarly, the lightweight format Q12 and the lightened weight kernel WK12 may be recorded for the channel C12, and the lightweight format Q1m and the lightened weight kernel WK1m may be recorded for the channel C1m. Lightweight formats and lightened weight kernels may be determined for the remaining layers and their channels and then stored in the lookup table 500.

The lookup table 500 may be stored in a memory of the processing apparatus according to the embodiments, and the processing apparatus may perform convolution operations using the lookup table 500. The processing apparatus may obtain the lightweight format Quv and the lightened weight kernel WKuv from the lookup table 500 and perform the convolution operation for the channel Cuv of the layer Lu.
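A lookup table of this kind can be sketched as a Python dictionary keyed by the layer index u and the channel index v. The sketch below is an assumption-laden illustration: the names `lighten_kernel` and `build_lookup_table`, the 8-bit signed range, the cap `q_max`, the rule for choosing each channel's format from its largest weight, and the sample weights are all hypothetical.

```python
def lighten_kernel(weights, bits=8, q_max=15):
    """Return (q, lightened weights) for one channel's flattened weight kernel.
    q is chosen independently per channel from the largest absolute weight."""
    limit = (1 << (bits - 1)) - 1
    max_abs = max(abs(w) for w in weights)
    q = 0
    while q < q_max and round(max_abs * (1 << (q + 1))) <= limit:
        q += 1                                # raise q while the maximum still fits
    lightened = [max(-limit - 1, min(limit, round(w * (1 << q)))) for w in weights]
    return q, lightened


def build_lookup_table(trained_kernels):
    """trained_kernels: {(u, v): flattened float weights}. Returns {(u, v): (Quv, WKuv)}."""
    return {key: lighten_kernel(w) for key, w in trained_kernels.items()}


# Hypothetical trained weights for the channels C11 and C12 of the layer L1.
trained = {(1, 1): [0.5, -0.25, 0.125], (1, 2): [3.0, -1.5, 0.75]}
table = build_lookup_table(trained)
print(table[(1, 1)])  # (7, [64, -32, 16])
print(table[(1, 2)])  # (5, [96, -48, 24])
```

The two channels end up with different formats because their weight ranges differ, which is the point of determining the lightened weight kernels independently per channel.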

FIG. 6 is a diagram illustrating a dynamic lightening process for activation data according to an embodiment. The first layer and the second layer are described below, and operations corresponding to those of the second layer may be performed for the layers after the second layer. Hereinafter, operations of an ALU 602 may also be understood as operations of the processing apparatus.

Operations related to the first layer are described below.

A memory 601 stores image data 611, a weight kernel 612, and a lightweight format 613 of the weight kernel 612. The image data 611 and the weight kernel 612 may both have a low bit width. The first layer may correspond to the input layer of the neural network. In this case, the image data 611 of an input image acquired through an imaging device may be processed in place of an input map. The processing apparatus may load the image data 611 and the weight kernel 612 into a register 603 having a size corresponding to the low bit width. In FIG. 6, LD denotes an operation of loading data from memory, and ST denotes an operation of storing data to memory.

Weight kernels and lightweight formats may exist in the memory 601 for each layer and output channel. For example, the memory 601 may store the lookup table described with reference to FIG. 5. The processing apparatus may load the weight kernel and lightweight format for the channel currently being processed from the memory 601. For example, when the first output channel of the first layer is being processed, a first weight kernel corresponding to the first output channel may be loaded from the memory 601, and a convolution operation between the image data 611 and the first weight kernel may be performed. When the second output channel of the first layer is being processed, a second weight kernel corresponding to the second output channel may be loaded from the memory 601, and a convolution operation between the image data 611 and the second weight kernel may be performed.

In block 614, the arithmetic logic unit (ALU) 602 may generate an output map 615 by processing the convolution operation between the image data 611 and the weight kernel 612. When the data has been lightened to 8 bits, the convolution operation becomes an 8*8 operation, and when the data has been lightened to 4 bits, the convolution operation becomes a 4*4 operation. The result of the convolution operation, that is, the output map 615, may be represented in a high bit width. For example, when an 8*8 operation is performed, the convolution result may be represented in 16 bits. The processing apparatus may store the output map 615 in the memory 601 through a register 604 having a size corresponding to the high bit width. The processing apparatus loads the output map 615 from the memory 601, and in block 616 the ALU 602 generates an output map 618 by applying an activation function to the output map 615. The processing apparatus may store the high-bit-width output map 618 in the memory 601 through the high-bit-width register 604.

In block 617, the processing apparatus updates the maximum value of the output maps of the first layer. For example, there may be a register for storing the maximum value of the output maps of a particular layer. The processing apparatus may compare an activation function output with the existing maximum value stored in the register, and when the activation function output is greater than the existing maximum value, may update the register with the activation function output. When all output maps of the first layer have been processed in this manner, the maximum value 630 of the output maps of the first layer may be finally determined. Because the activation function output is compared with a register value, the processing apparatus can determine the maximum value 630 without separately accessing the memory 601. The maximum value 630 may be used to lighten the input map in the second layer.
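The running-maximum update of block 617 can be sketched as follows. The sketch assumes a ReLU activation, which the text does not specify, and models the maximum-value register as a plain variable; the function name `activate_and_track_max` and the sample values are hypothetical.

```python
def relu(v: int) -> int:
    # ReLU is an assumption; the text only refers to "an activation function".
    return v if v > 0 else 0


def activate_and_track_max(conv_outputs, running_max=0):
    """Apply the activation to one channel's convolution outputs and update the
    running maximum while each value is still at hand, so that no separate
    pass over memory is needed to find the maximum."""
    activated = []
    for v in conv_outputs:
        a = relu(v)
        if a > running_max:       # compare with the value held in the register
            running_max = a
        activated.append(a)       # this is where the value would be stored (ST)
    return activated, running_max


layer_max = 0
for channel in ([120, -40, 900], [3000, 15, -7]):   # two hypothetical output channels
    _, layer_max = activate_and_track_max(channel, layer_max)
print(layer_max)  # 3000
```

After the last output channel has been processed, `layer_max` corresponds to the maximum value 630 that the second layer uses.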

Operations related to the second layer are described below.

The ALU 602 loads an input map 619 from the memory 601. In block 620, the ALU 602 lightens the input map 619 based on the maximum value 630 of the output maps of the first layer. For example, the processing apparatus may determine the lightweight format of the input map 619 based on the maximum value 630, and may generate an input map 621 by lightening the high-bit-width input map 619 to a low bit width based on the determined lightweight format. In other words, the input map 621 is a lightened version of the input map 619. The processing apparatus may lighten the high-bit-width input map 619 to a low bit width by performing a shift operation on it by a value corresponding to the determined lightweight format. Alternatively, the processing apparatus may lighten the input map 619 into the input map 621 by multiplying or dividing the input map 619 by an exponent corresponding to the lightweight format.
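The shift-based lightening of block 620 can be sketched as follows. It is assumed here that the high-bit-width values are non-negative integers (for example, after a ReLU-like activation) carrying `q_high` fractional bits, and that the shift amount is derived from the bit length of the layer maximum; the function name and the sample values are hypothetical.

```python
def lighten_by_shift(values, layer_max: int, q_high: int, bits: int = 8):
    """Right-shift high-bit-width integers so that layer_max fits in a signed
    `bits`-wide integer. Returns the lightened values and their new format."""
    shift = max(0, layer_max.bit_length() - (bits - 1))
    lightened = [v >> shift for v in values]
    return lightened, q_high - shift          # each shifted-out bit lowers the format by one


# Hypothetical 16-bit activations in format Q10 whose layer maximum is 12000.
low, q_low = lighten_by_shift([12000, 256, 37], layer_max=12000, q_high=10)
print(low, q_low)  # [93, 2, 0] 3
```

The maximum 12000 needs 14 bits, so a shift by 7 brings it within the 7 magnitude bits of a signed 8-bit value, and the format drops from Q10 to Q3.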

Because the output of the first layer becomes the input of the second layer, the output map 618 and the input map 619 may indicate the same activation data. Accordingly, the lightening process for the input map 619 may also be described as a lightening process for the output map 618.

In blocks 624, 626, and 627, operations corresponding to those described for blocks 614, 616, and 617 may be performed.

The memory 601 stores the input map 621, a weight kernel 622, and a lightweight format 623 of the weight kernel 622. The input map 621 and the weight kernel 622 may both have a low bit width. Because the second layer receives the output of the first layer, it may process the input map 621 instead of image data. The processing apparatus may load the input map 621 and the weight kernel 622 into the register 603 having a size corresponding to the low bit width.

In block 624, the ALU 602 may generate an output map 625 by processing the convolution operation between the input map 621 and the weight kernel 622. The processing apparatus may store the output map 625 in the memory 601 through the register 604 having a size corresponding to the high bit width. The processing apparatus loads the output map 625 from the memory 601, and in block 626 the ALU 602 generates an output map 628 by applying the activation function to the output map 625. The processing apparatus may store the high-bit-width output map 628 in the memory 601 through the high-bit-width register 604.

In block 627, the processing apparatus updates the maximum value of the output maps of the second layer. When all output maps of the second layer have been processed, the maximum value 631 of the output maps of the second layer may be determined, and the maximum value 631 may be used to lighten the input map in the third layer, which is the layer following the second layer.

FIG. 7 is a diagram illustrating a dynamic lightening process for activation data according to another embodiment. The second layer and the third layer are described below, and operations corresponding to those of the second layer and the third layer may be performed for the layers after the third layer. Hereinafter, operations of an ALU 702 may also be understood as operations of the processing apparatus.

A memory 701 stores an input map 711, a weight kernel 712, and a lightweight format 713 of the weight kernel 712. The input map 711 and the weight kernel 712 may both have a low bit width. The processing apparatus may load the input map 711 and the weight kernel 712 into a register 703 having a size corresponding to the low bit width. Weight kernels and lightweight formats may exist in the memory 701 for each layer and output channel. For example, the memory 701 may store the lookup table described with reference to FIG. 5. In FIG. 7, LD denotes an operation of loading data from memory, and ST denotes an operation of storing data to memory.

In block 714, the ALU 702 processes the convolution operation between the input map 711 and the weight kernel 712. The result of the convolution operation, that is, the output map, may be represented in a high bit width and may be stored in a register 704 having a size corresponding to the high bit width. In block 715, the ALU 702 updates the maximum value of the output maps of the second layer. For example, there may be a register for storing the maximum value of the output maps of a particular layer, and the ALU 702 may update the maximum value of the output maps of the second layer based on a comparison between the convolution result and the existing maximum value stored in the register. When all output maps of the second layer have been processed in this manner, the maximum value 731 of the output maps of the second layer may be finally determined. The maximum value 731 may be used for prediction-based lightening of the output maps in the third layer.

In block 716, the ALU 702 generates an activation function output by applying the activation function to the convolution result. In block 717, the ALU 702 performs prediction-based lightening. For example, the ALU 702 may predict the maximum value of the output maps of the second layer based on the maximum value 730 of the output maps of the first layer, determine the lightweight format for the output maps of the second layer based on the predicted maximum value, and lighten the high-bit-width activation function output to a low bit width based on the determined lightweight format.

Lightening an output map requires knowing the maximum value of the output map. If that maximum is determined only after waiting for the processing results of all output channels, additional memory accesses are required to determine it. According to the embodiment, the maximum value of the output map of the current layer is predicted based on the maximum value of the output map of the previous layer, so the activation function output, that is, the output map, can be lightened immediately without waiting for the processing results of all output channels.
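A minimal sketch of prediction-based lightening follows. Several details are assumptions made only for illustration and are not stated in this passage: the low bit width is taken to be a signed 8 bits, the lightweight format is modeled as a number of fractional bits (a Q-format), and the names `fractional_bits`, `lighten`, `lighten_with_prediction`, and `margin` are hypothetical.

```python
import math

LOW_BITS = 8  # ASSUMPTION: signed 8-bit low bit width


def fractional_bits(max_value, bits=LOW_BITS):
    """Choose how many fractional bits leave room for max_value in a signed `bits`-bit integer."""
    if max_value <= 0:
        return bits - 1
    _, exp = math.frexp(max_value)  # max_value < 2**exp
    return bits - 1 - max(0, exp)


def lighten(values, frac_bits, bits=LOW_BITS):
    """Scale by 2**frac_bits, round, and saturate to the signed `bits`-bit range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def lighten_with_prediction(activations, prev_layer_max, margin=1.0):
    """Pick the format from the previous layer's maximum instead of the current layer's."""
    predicted_max = prev_layer_max * margin  # prediction for the current layer
    fmt = fractional_bits(predicted_max)
    return fmt, lighten(activations, fmt)
```

With `prev_layer_max=3.0`, the sketch selects 5 fractional bits, so the activations `[0.5, 1.0, 3.0]` become `[16, 32, 96]`. Values that exceed the predicted maximum saturate at 127, which is the cost of a wrong prediction in this simplified model.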

The lightened activation function output has the low bit width and is stored in the register 703, whose size corresponds to the low bit width. The processing apparatus stores the lightened activation function output in the memory 701 as the output map 718.

Hereinafter, operations related to the third layer are described.

The memory 701 stores the input map 719, the weight kernel 720, and the lightweight format 721 of the weight kernel 720. Both the input map 719 and the weight kernel 720 may have the low bit width. This is because the output map 718 was already lightened in the second layer, and the input map 719 corresponds to the output map 718. The processing apparatus may load the input map 719 and the weight kernel 720 into the register 703, whose size corresponds to the low bit width.

At block 722, the ALU 702 performs a convolution operation between the input map 719 and the weight kernel 720. The convolution result, that is, the output map, may be represented with a high bit width and stored in the register 704, whose size corresponds to the high bit width. At block 723, the ALU 702 updates the maximum value of the output map of the third layer. Once all output maps of the third layer have been processed, the maximum value 732 of the output map of the third layer is finally determined. The maximum value 732 may be used in the fourth layer for prediction-based lightening of the output map. The fourth layer is the layer that follows the third layer. When a following layer predicts the maximum value of its own output map, it uses the exact maximum value of the previous layer's output map, so a prediction error may not propagate beyond one layer.

At block 724, the ALU 702 applies the activation function to the convolution result to generate an activation function output. At block 725, the ALU 702 predicts the maximum value of the third-layer output map based on the maximum value 731 of the output map of the second layer, and lightens the activation function output based on the predicted maximum value of the third-layer output map. The lightened activation function output has the low bit width and is stored in the register 703, whose size corresponds to the low bit width. The processing apparatus stores the lightened activation function output in the memory 701 as the output map 726.
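The layer-to-layer hand-off of maximum values described for the second and third layers can be sketched as a loop. This is an illustrative model only: `run_layers` is a hypothetical name, each layer's high-bit-width outputs are given as a ready-made list rather than computed by a real convolution, and the prediction is simply taken to equal the previous layer's exact maximum.

```python
def run_layers(layer_outputs, first_layer_max):
    """Return one (predicted_max, exact_max) pair per layer.

    layer_outputs: one list of high-bit-width output values per layer, starting
    with the second layer (toy stand-in for real convolution results).
    first_layer_max: known maximum of the first layer's output map.
    """
    history = []
    prev_exact_max = first_layer_max
    for outputs in layer_outputs:
        predicted_max = prev_exact_max  # used to lighten the current layer right away
        exact_max = max(outputs)        # finalized once all outputs are processed
        history.append((predicted_max, exact_max))
        prev_exact_max = exact_max      # the next layer receives the exact value
    return history
```

Each prediction is derived from an exact maximum rather than from an earlier prediction, which is why an error made in one layer does not carry over to the layers after the next one.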

In addition, the maximum value 730 of the output map of the first layer may be determined in various ways depending on the embodiment. According to one embodiment, the maximum value 730 of the output map of the first layer may be determined in advance, in the training stage, based on various training data. According to another embodiment, the first layer of the embodiment of FIG. 6 may be the first layer of the embodiment of FIG. 7, in which case the maximum value 630 of the output map of the first layer of FIG. 6 may correspond to the maximum value 730 of the output map of the first layer of FIG. 7.

FIG. 8 is a graph illustrating the distribution of maximum values of input maps according to an embodiment. Referring to FIG. 8, the maximum values of the input maps may follow a certain pattern. Because the output map of a given layer corresponds to the input map of the next layer, the output maps may be understood to follow the same pattern as the input maps. The data of the first image may have relatively large values, as in a high-illuminance image, and the data of the second image may have relatively small values, as in a low-illuminance image. The input maps of the first image and the input maps of the second image may both follow similar patterns.

The maximum value of the output map of the current layer may be determined within a reference range based on the maximum value of the output map of the previous layer. The reference range may be set conservatively to minimize risks such as numerical errors, or aggressively to maximize performance such as resolution. For example, the criterion for setting the reference range may be based on the position of the current layer in the network. For instance, in layers toward the input side, the data varies more than on the output side, so the reference range may be set relatively conservatively; in layers toward the output side, the data varies less than on the input side, so the reference range may be set relatively aggressively. As one example, in the second and third layers the maximum value of the output map of the current layer may be set to +10% of the maximum value of the output map of the previous layer, in the fourth layer it may be set to -20~30% of the maximum value of the output map of the previous layer, and from the fifth layer onward it may be set equal to the maximum value of the output map of the previous layer.
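The layer-dependent rule in the example above can be sketched as a small function. The function name `predicted_max` is hypothetical, and two readings are assumptions: "+10%" is taken as the previous maximum increased by 10%, and "-20~30%" is taken as a reduction of 20 to 30%, represented here by its 25% midpoint. The text does not state either reading explicitly.

```python
def predicted_max(layer_index, prev_max):
    """Predict the current layer's maximum from the previous layer's exact maximum.

    layer_index is 1-based; the first layer has no previous layer, so it is rejected.
    """
    if layer_index < 2:
        raise ValueError("prediction needs a previous layer")
    if layer_index in (2, 3):
        return prev_max * 1.10  # ASSUMPTION: "+10%" means 10% above the previous maximum
    if layer_index == 4:
        return prev_max * 0.75  # ASSUMPTION: midpoint of a 20 to 30% reduction
    return prev_max             # fifth layer onward: same as the previous maximum
```

A wider margin lowers the risk of saturation at the cost of resolution, which corresponds to the conservative setting in the paragraph above; a narrower or negative margin trades the other way.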

FIG. 9 is a block diagram illustrating a training apparatus according to an embodiment. Referring to FIG. 9, the training apparatus 900 includes a memory 910 and a processor 920. The memory 910 stores a neural network 911, lightweight data 912, and instructions readable by the processor 920. When the instructions are executed by the processor 920, the processor 920 may perform a training operation for the neural network 911. Here, the training operation for the neural network 911 may be referred to as the training stage. For example, the processor 920 may input training data into the neural network 911 and train the weight kernels of the neural network 911. The processor 920 may lighten the trained weight kernels for each layer and channel, and store the lightweight data 912 in the memory 910. The lightweight data may include the lightened weight kernels and the lightweight formats of the lightened weight kernels. The lightweight data may be stored in the memory 910 in the form of a lookup table. In addition, the description provided with reference to FIGS. 1 to 8 may apply to the training apparatus 900.
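Lightening the trained weight kernels per layer and channel, and storing the result as a lookup table, might look like the sketch below. It is an illustration under assumptions: kernels are flattened to plain lists of floats, the low bit width is a signed 8 bits, the lightweight format is modeled as a count of fractional bits, and `lighten_kernel` and `build_lookup_table` are hypothetical names.

```python
import math


def lighten_kernel(kernel, bits=8):
    """Quantize one kernel with a format chosen from its own largest magnitude."""
    max_abs = max(abs(w) for w in kernel)
    _, exp = math.frexp(max_abs)        # max_abs < 2**exp (exp is 0 when max_abs is 0)
    frac_bits = bits - 1 - max(0, exp)  # ASSUMPTION: format = number of fractional bits
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** frac_bits
    quantized = [min(hi, max(lo, round(w * scale))) for w in kernel]
    return quantized, frac_bits


def build_lookup_table(trained):
    """trained maps a layer index to a list of kernels, one per output channel."""
    table = {}
    for layer, kernels in trained.items():
        for channel, kernel in enumerate(kernels):
            quantized, fmt = lighten_kernel(kernel)
            table[(layer, channel)] = {"kernel": quantized, "format": fmt}
    return table
```

Keying the table by `(layer, channel)` lets each output channel keep its own format, so a channel with small weights is not forced to share the coarser format of a channel with large weights.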

FIG. 10 is a block diagram illustrating a processing apparatus according to an embodiment. Referring to FIG. 10, the processing apparatus 1000 includes a memory 1010 and a processor 1020. The memory 1010 stores a neural network 1011, lightweight data 1012, and instructions readable by the processor 1020. When the instructions are executed by the processor 1020, the processor 1020 may perform a processing operation using the neural network 1011. Here, the processing operation using the neural network 1011 may be referred to as the inference stage. For example, the processor 1020 may input an input image into the neural network 1011 and output a processing result based on the output of the neural network 1011. The processing result may include a recognition result or an authentication result.

When the instructions are executed by the processor 1020, the processor 1020 may generate output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer, determine a lightweight format for the output maps of the current layer based on a distribution of at least some activation data processed within the neural network, and lighten activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format. In addition, the description provided with reference to FIGS. 1 to 9 may apply to the processing apparatus 1000.

FIG. 11 is a flowchart illustrating a processing method according to an embodiment. Referring to FIG. 11, in step 1110 the processing apparatus generates output maps of a current layer of a neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer; in step 1120 it determines a lightweight format for the output maps of the current layer based on a distribution of at least some activation data processed within the neural network; and in step 1130 it lightens activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format. In addition, the description provided with reference to FIGS. 1 to 10 may apply to the processing method.
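The three steps of FIG. 11 can be sketched end to end with integer arithmetic. This is a simplified illustration, not the claimed implementation: a one-dimensional convolution stands in for the real one, the low bit width is assumed to be a signed 8 bits, the lightweight format is modeled as a right-shift amount derived from the largest output magnitude, and `conv1d`, `shift_for`, and `process_layer` are hypothetical names.

```python
def conv1d(inputs, kernel):
    """Valid 1-D convolution (no kernel flip) on integers; results need a high bit width."""
    n = len(inputs) - len(kernel) + 1
    return [sum(inputs[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


def shift_for(max_abs, bits=8):
    """Smallest right shift that makes max_abs fit in a signed `bits`-bit integer."""
    limit = (1 << (bits - 1)) - 1
    shift = 0
    while (max_abs >> shift) > limit:
        shift += 1
    return shift


def process_layer(inputs, kernel, bits=8):
    outputs = conv1d(inputs, kernel)           # step 1110: high-bit-width output map
    max_abs = max(abs(v) for v in outputs)     # step 1120: inspect the activation data
    shift = shift_for(max_abs, bits)           #            and choose the format
    lightened = [v >> shift for v in outputs]  # step 1130: shift down to the low bit width
    return shift, lightened
```

For example, `process_layer([100, 120, 90, 60], [50, -20, 30])` produces the high-bit-width outputs 5300 and 6000, selects a shift of 6, and returns `(6, [82, 93])`.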

FIG. 12 is a flowchart illustrating a processing method according to another embodiment. Referring to FIG. 12, in step 1210 the processing apparatus starts a neural network including a plurality of layers; in step 1220 it generates output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer; in step 1230 it determines a lightweight format for the output maps of the current layer, the lightweight format not having been determined before the neural network was started; and in step 1240 it lightens activation data corresponding to the output maps of the current layer based on the determined lightweight format. In addition, the description provided with reference to FIGS. 1 to 11 may apply to the processing method.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing apparatus may run an operating system (OS) and one or more software applications that run on the operating system. The processing apparatus may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing apparatus is sometimes described as a single unit, but a person of ordinary skill in the art will appreciate that it may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing apparatus may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing apparatus to operate as desired or may instruct the processing apparatus independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing apparatus or to provide instructions or data to the processing apparatus. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Claims

A processing method using a neural network, the method comprising:
generating output maps of a current layer of a neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer;
determining a lightweight format for the output maps of the current layer based on a distribution of at least some activation data processed within the neural network; and
lightening activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

The method of claim 1,
wherein the determining of the lightweight format comprises determining the lightweight format for the output maps based on a maximum value of the output maps of the current layer.

The method of claim 1,
wherein the lightening comprises lightening input maps of a next layer, which correspond to the output maps of the current layer, to the low bit width based on the determined lightweight format.

The method of claim 1,
wherein the lightening comprises lightening input maps of a next layer, which correspond to the output maps of the current layer, to the low bit width by performing a shift operation on the input maps of the next layer with a value corresponding to the lightweight format.

The method of claim 1, further comprising:
loading the output maps of the current layer from a memory; and
updating a register that stores a maximum value of the output maps of the current layer based on the loaded output maps of the current layer,
wherein the determining of the lightweight format is performed based on the value stored in the register.

The method of claim 1,
wherein the determining of the lightweight format comprises:
predicting a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network; and
determining the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer.

The method of claim 1,
wherein the lightening comprises lightening the output maps of the current layer to the low bit width based on the determined lightweight format.

The method of claim 1,
wherein the lightening comprises lightening the high-bit-width output maps of the current layer to the low bit width by performing a shift operation on the output maps of the current layer with a value corresponding to the lightweight format.

The method of claim 1, further comprising:
updating a register that stores a maximum value of the output maps of the current layer based on the output maps of the current layer generated by the convolution operation,
wherein a maximum value of output maps of a next layer of the neural network is predicted based on the value stored in the register.

The method of claim 1, further comprising:
obtaining a first weight kernel corresponding to a first output channel currently being processed in the current layer by referring to a database that includes weight kernels for each layer and output channel,
wherein the generating of the output maps of the current layer comprises generating a first output map corresponding to the first output channel by performing a convolution operation between the input maps of the current layer and the first weight kernel.

The method of claim 10,
wherein the first weight kernel is determined independently of a second weight kernel corresponding to a second channel of the current layer.

The method of claim 1,
wherein the input maps of the current layer and the weight kernels of the current layer have the low bit width, and the output maps of the current layer have a high bit width.

A computer-readable storage medium having stored thereon one or more programs comprising instructions for performing the method of claim 1.

A processing apparatus using a neural network, the apparatus comprising:
a processor; and
a memory storing instructions readable by the processor,
wherein, when the instructions are executed by the processor, the processor:
generates output maps of a current layer of a neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer,
determines a lightweight format for the output maps of the current layer based on a distribution of at least some activation data processed within the neural network, and
lightens activation data corresponding to the output maps of the current layer to a low bit width based on the determined lightweight format.

The processing apparatus of claim 14,
wherein the processor determines the lightweight format for the output maps based on a maximum value of the output maps of the current layer.

The processing apparatus of claim 14,
wherein the processor obtains input maps of a next layer of the neural network based on the output maps of the current layer, and lightens the input maps of the next layer to the low bit width based on the determined lightweight format.

The processing apparatus of claim 14,
wherein the processor obtains input maps of a next layer of the neural network based on the output maps of the current layer, and lightens the high-bit-width input maps of the next layer to the low bit width by performing a shift operation on the input maps of the next layer with a value corresponding to the lightweight format.

The processing apparatus of claim 14,
wherein the processor predicts a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network, and determines the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer.

The processing apparatus of claim 14,
wherein the processor lightens the output maps of the current layer to the low bit width based on the determined lightweight format.

The processing apparatus of claim 14,
wherein the processor lightens the high-bit-width output maps of the current layer to the low bit width by performing a shift operation on the output maps of the current layer with a value corresponding to the lightweight format.

A processing method using a neural network, the method comprising:
starting a neural network including a plurality of layers;
generating output maps of a current layer of the neural network by performing a convolution operation between input maps of the current layer and weight kernels of the current layer;
determining a lightweight format for the output maps of the current layer, the lightweight format not having been determined before the neural network was started; and
lightening activation data corresponding to the output maps of the current layer based on the determined lightweight format.

The method of claim 21,
wherein the starting of the neural network comprises inputting input data into the neural network for inference on the input data.

The method of claim 21,
wherein the determining of the lightweight format comprises determining the lightweight format for the output maps of the current layer based on a distribution of at least some activation data processed within the neural network.

The method of claim 21,
wherein the determining of the lightweight format comprises determining the lightweight format for the output maps based on a maximum value of the output maps of the current layer, and
the lightening comprises lightening input maps of a next layer, which correspond to the output maps of the current layer, to a low bit width based on the determined lightweight format.

The method of claim 21,
wherein the determining of the lightweight format comprises:
predicting a maximum value of the output maps of the current layer based on a maximum value of output maps of a previous layer of the neural network; and
determining the lightweight format for the output maps of the current layer based on the predicted maximum value of the output maps of the current layer, and
the lightening comprises lightening the output maps of the current layer to a low bit width based on the determined lightweight format.