KR20220125114A

KR20220125114A - Method and device for encoding

Info

Publication number: KR20220125114A
Application number: KR1020210034835A
Authority: KR
Inventors: 최영재; 최승규; 김이섭; 신재강
Original assignee: 삼성전자주식회사; 한국과학기술원
Priority date: 2021-03-04
Filing date: 2021-03-17
Publication date: 2022-09-14

Abstract

An encoding method and device are disclosed. According to one embodiment of the present invention, the encoding method includes the steps of: receiving input data represented by a 16-bit floating point; adjusting a number of bits of an exponent and a mantissa of the input data to split the input data into 4-bit units; and encoding the input data in which the number of bits has been adjusted such that the exponent is a multiple of 4.

Description

Encoding method and apparatus

The following embodiments relate to an encoding method and apparatus.

An artificial neural network is implemented with reference to a computational architecture. As artificial neural network technology has advanced, research on using artificial neural networks in various types of electronic systems to analyze input data and extract valid information has been actively conducted. A device that processes an artificial neural network must perform a large amount of computation on complex input data. Therefore, to analyze a large amount of input data in real time and extract desired information using an artificial neural network, a technology capable of efficiently processing neural network operations is required.

An encoding method according to an embodiment includes: receiving input data expressed in 16-bit half floating point; adjusting the number of bits of an exponent and a mantissa of the input data so that the input data can be divided into 4-bit units; and encoding the bit-adjusted input data so that the exponent is a multiple of 4.

The adjusting of the number of bits may include: allocating 4 bits to the exponent; and allocating 11 bits to the mantissa.

The encoding may include: calculating a quotient and a remainder obtained by dividing, by 4, the value obtained by adding 4 to the exponent of the input data; encoding the exponent based on the quotient; and encoding the mantissa based on the remainder.

The encoding of the exponent may include encoding the exponent based on the quotient and a bias.

The encoding of the mantissa may include determining a first bit value of the mantissa to be 1 when the remainder is 0.

The encoding of the mantissa may include determining, when the remainder is 1, the first bit value of the mantissa to be 0 and a second bit value of the mantissa to be 1.

The encoding of the mantissa may include determining, when the remainder is 2, the first bit value of the mantissa to be 0, the second bit value of the mantissa to be 0, and a third bit value of the mantissa to be 1.

The encoding of the mantissa may include determining, when the remainder is 3, the first bit value of the mantissa to be 0, the second bit value of the mantissa to be 0, the third bit value of the mantissa to be 0, and a fourth bit value of the mantissa to be 1.
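
The quotient and remainder rule described above can be sketched as follows. This is a minimal illustration in Python, not the disclosed implementation: the function name `split_exponent` and the use of a plain integer exponent are assumptions, and the way the bias is applied to the quotient and the way the remaining mantissa bits follow the prefix are left out because this passage does not specify them.

```python
def split_exponent(exponent: int) -> tuple:
    """Return (quotient, mantissa_prefix) for an integer exponent.

    The quotient and remainder come from dividing (exponent + 4) by 4.
    The prefix is `remainder` zeros followed by a single 1, which matches
    the first mantissa bits listed for remainders 0, 1, 2, and 3.
    """
    quotient, remainder = divmod(exponent + 4, 4)
    prefix = "0" * remainder + "1"
    return quotient, prefix


# Exponents 8 to 11 share one quotient and differ only in the prefix.
for e in (8, 9, 10, 11):
    print(e, split_exponent(e))
```

Under these assumptions, four consecutive exponents map to the same quotient, and the position of the leading 1 in the mantissa carries the remainder.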

An operation method according to an embodiment may include: receiving first operand data expressed in 4-bit fixed point; receiving second operand data having a size of 16 bits; determining a data type of the second operand data; encoding the second operand data when the second operand data is of a floating-point type; splitting the encoded second operand data into four 4-bit bricks; and performing a MAC operation between the second operand data split into the four bricks and the first operand data.

The encoding may include: adjusting the number of bits of the exponent and the mantissa of the second operand data so that the second operand data can be divided into 4-bit units; and encoding the bit-adjusted second operand data so that the exponent is a multiple of 4.

The splitting may include splitting the encoded second operand data into one exponent brick and three mantissa bricks.

The performing of the MAC operation may include: performing a multiplication operation between each of the three mantissa bricks and the first operand data; comparing accumulated exponent data stored in an exponent register with the exponent brick; and accumulating, based on the comparison result, the results of the multiplication operation onto accumulated mantissa data stored in each of three mantissa registers.

The accumulating may include aligning, based on the comparison result, the accumulation positions of the accumulated mantissa data stored in each of the three mantissa registers and the results of the multiplication operation.
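
One way to picture the alignment step is the sketch below. It assumes, purely for illustration, that one exponent step corresponds to a 4-bit shift (because the exponent is encoded in steps of 4), that mantissas are unsigned integers, and that the term with the smaller exponent is shifted right with truncation. The names `align_and_accumulate`, `acc_exp`, and `acc_mant` are hypothetical; the actual register behavior is not described in this passage.

```python
BRICK_BITS = 4  # assumed shift per exponent step


def align_and_accumulate(acc_exp, acc_mant, prod_exp, prod_mant):
    """Add a product to an accumulator after aligning both to the larger exponent."""
    diff = prod_exp - acc_exp
    if diff > 0:
        # The incoming product has the larger exponent: shift the accumulator down.
        acc_mant >>= BRICK_BITS * diff
        acc_exp = prod_exp
    elif diff < 0:
        # The accumulator has the larger exponent: shift the product down.
        prod_mant >>= BRICK_BITS * (-diff)
    return acc_exp, acc_mant + prod_mant


# Accumulator at exponent 2, product at exponent 1: the product moves one brick right.
print(align_and_accumulate(2, 0x100, 1, 0x30))
```

Shifting in whole bricks keeps the comparison logic small, since only the difference between two 4-bit exponent values has to be examined.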

The operation method according to an embodiment may further include, when the second operand data is of a fixed-point type, splitting the second operand data into four 4-bit bricks for parallel data operation.
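
The fixed-point case can be illustrated with a short sketch. It assumes unsigned operands and a least-significant-first brick order, neither of which is stated in the text, and the function names `split_bricks` and `multiply_by_bricks` are made up for this example. It shows that four independent 4-bit by 4-bit products, recombined with shifts, give the same result as one 4-bit by 16-bit multiplication.

```python
def split_bricks(word):
    """Split a 16-bit word into four 4-bit bricks, least significant first."""
    return [(word >> (4 * i)) & 0xF for i in range(4)]


def multiply_by_bricks(a4, b16):
    """Multiply a 4-bit value by a 16-bit value using four 4x4-bit products."""
    # Each partial product depends on one brick only, so they could run in parallel.
    partials = [a4 * brick for brick in split_bricks(b16)]
    # Brick i carries a weight of 16**i, which is a left shift by 4*i bits.
    return sum(p << (4 * i) for i, p in enumerate(partials))


print(split_bricks(0xABCD))
print(multiply_by_bricks(7, 0xABCD) == 7 * 0xABCD)
```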

An encoding apparatus according to an embodiment may include a processor that receives input data expressed in 16-bit floating point, adjusts the number of bits of an exponent and a mantissa of the input data so that the input data can be divided into 4-bit units, and encodes the bit-adjusted input data so that the exponent is a multiple of 4.

The processor may allocate 4 bits to the exponent and 11 bits to the mantissa.

The processor may calculate a quotient and a remainder obtained by dividing, by 4, the value obtained by adding 4 to the exponent of the input data, encode the exponent based on the quotient, and encode the mantissa based on the remainder.

An operation apparatus according to an embodiment may include a processor that receives first operand data expressed in 4-bit fixed point, receives second operand data having a size of 16 bits, determines a data type of the second operand data, encodes the second operand data when the second operand data is of a floating-point type, splits the encoded second operand data into four 4-bit bricks, and performs a MAC operation between the second operand data split into the four bricks and the first operand data.

The processor may adjust the number of bits of the exponent and the mantissa of the second operand data so that the second operand data can be divided into 4-bit units, and may encode the bit-adjusted second operand data so that the exponent is a multiple of 4.

The processor may split the encoded second operand data into one exponent brick and three mantissa bricks.

The processor may perform a multiplication operation between each of the three mantissa bricks and the first operand data, compare accumulated exponent data stored in an exponent register with the exponent brick, and, based on the comparison result, accumulate the results of the multiplication operation onto accumulated mantissa data stored in each of three mantissa registers.

The processor may align, based on the comparison result, the accumulation positions of the accumulated mantissa data stored in each of the three mantissa registers and the results of the multiplication operation.

When the second operand data is of a fixed-point type, the processor may split the second operand data into four 4-bit bricks for parallel data operation.

FIG. 1A is a diagram for explaining a deep learning operation method using an artificial neural network.
FIG. 1B is a diagram for explaining the data of an input feature map and the filters provided as inputs to a deep learning operation.
FIG. 1C is a diagram for explaining a process of performing a convolution operation in deep learning.
FIG. 1D is a diagram for explaining a method of performing a convolution operation using a systolic array.
FIG. 2 is a flowchart illustrating an encoding method according to an embodiment.
FIG. 3 is a diagram for explaining an encoding method according to an embodiment.
FIG. 4 is a diagram for explaining an operation method according to an embodiment.
FIG. 5 is a diagram for explaining a method of performing a MAC operation between first operand data expressed in 4-bit fixed point and second operand data expressed in 16-bit half floating point according to an embodiment.
FIG. 6 is a diagram illustrating a method of aligning data according to an exponent difference according to an embodiment.
FIG. 7 is a block diagram illustrating an operation apparatus according to an embodiment.

The specific structural or functional descriptions disclosed in this specification are provided only to illustrate embodiments according to the technical concept. The embodiments may be implemented in various other forms and are not limited to the described structures or functions.

Terms such as "first" and "second" may be used to describe various components, but these terms should be understood only as distinguishing one component from another. For example, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to the other component, or intervening components may be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present. Expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

The embodiments may be implemented in various types of products, such as data centers, servers, personal computers, laptop computers, tablet computers, smartphones, televisions, smart home appliances, intelligent vehicles, kiosks, and wearable devices. Hereinafter, the embodiments are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements.

FIG. 1A is a diagram for explaining a deep learning operation method using an artificial neural network.

Artificial intelligence (AI) algorithms, including deep learning, are characterized by feeding input data 10 into an artificial neural network (ANN) and learning output data 30 through operations such as convolution. An artificial neural network may refer to a computational architecture that models a biological brain. Within an artificial neural network, nodes corresponding to the neurons of the brain are connected to one another and operate collectively to process input data. Examples of the various types of neural networks include a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network (DBN), and a restricted Boltzmann machine (RBM), but are not limited thereto. In a feed-forward neural network, the neurons of the neural network have links to other neurons. Such links may extend through the neural network in one direction, for example, in the forward direction.

Referring to FIG. 1A, a structure is shown in which input data 10 is input to an artificial neural network and output data 30 is output through an artificial neural network including one or more layers (for example, a convolutional neural network (CNN) 20). The artificial neural network may be a deep neural network having two or more layers.

The convolutional neural network 20 may be used to extract "features" such as borders and line colors from the input data 10. The convolutional neural network 20 may include a plurality of layers. Each layer may receive data and may process the data input to that layer to generate the data output from that layer. The data output from a layer may be a feature map generated by performing a convolution operation between an image or feature map input to the convolutional neural network 20 and the weight values of one or more filters. The initial layers of the convolutional neural network 20 may operate to extract low-level features, such as edges or gradients, from the input. Subsequent layers of the convolutional neural network 20 may extract progressively more complex features, such as the eyes and nose in an image.

FIG. 1B is a diagram for explaining the data of an input feature map and the filters provided as inputs to a deep learning operation.

Referring to FIG. 1B, the input feature map 100 may be a set of pixel values or numerical data of an image input to the artificial neural network, but is not limited thereto. In FIG. 1B, the input feature map 100 may be defined by the pixel values of an image to be learned through the artificial neural network. For example, the input feature map 100 may have 256×256 pixels and a depth of K. However, these values are exemplary, and the pixel size of the input feature map 100 is not limited to this example.

There may be N filters 110-1 to 110-n. Each of the plurality of filters 110-1 to 110-n may include n×n weight values. For example, each of the plurality of filters 110-1 to 110-n may have 3×3 pixels and a depth of K. However, this filter size is exemplary, and the size of each of the plurality of filters 110-1 to 110-n is not limited to this example.

FIG. 1C is a diagram for explaining a process of performing a convolution operation in deep learning.

Referring to FIG. 1C, the process of performing a convolution operation in an artificial neural network may refer to a process of generating output values by multiplying and adding the input feature map 100 and the filter 110 in each layer, and generating the output feature map 120 by accumulating and summing the output values.

The convolution operation is a process of performing multiplication and addition operations by applying a filter 110 of a fixed size, that is, n×n, from the upper left to the lower right of the input feature map 100 in the current layer. The following describes the process of performing the convolution operation when the size of the filter 110 is 3×3.

For example, first, in the first region 101 at the upper left of the input feature map 100, a total of nine data items (x11 to x33), that is, 3×3 items consisting of three items in a first direction and three items in a second direction, are each multiplied by the weight values (w11 to w33) of the filter 110. Then, accumulating and summing all the outputs of the multiplication, that is, x11*w11, x12*w12, x13*w13, x21*w21, x22*w22, x23*w23, x31*w31, x32*w32, and x33*w33, generates the 1-1 output data (y11) of the output feature map 120.

Thereafter, the operation proceeds while moving by a unit of data from the first region 101 at the upper left of the input feature map 100 to the second region 102. The number of data items by which the window moves within the input feature map 100 during the convolution operation is called the stride, and the size of the generated output feature map 120 may be determined by the size of the stride. For example, when the stride is 1, a total of nine input data items (x12 to x34) included in the second region 102 are multiplied by the weight values (w11 to w33) of the filter 110, and accumulating and summing all the outputs of the multiplication, that is, x12*w11, x13*w12, x14*w13, x22*w21, x23*w22, x24*w23, x32*w31, x33*w32, and x34*w33, generates the 1-2 output data (y12) of the output feature map 120.
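
The sliding-window computation of y11 and y12 can be written out in a few lines. The sketch below is only an illustration: it assumes a single depth channel (K = 1), no padding, and made-up 4×4 input and 3×3 weight values, and the function name `conv2d` is chosen for this example.

```python
def conv2d(x, w, stride=1):
    """Multiply-and-accumulate an n x n filter over every window of x (no padding)."""
    n = len(w)
    out_rows = (len(x) - n) // stride + 1
    out_cols = (len(x[0]) - n) // stride + 1
    y = [[0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            acc = 0
            for i in range(n):
                for j in range(n):
                    # For r = c = 0 this sums x11*w11 + x12*w12 + ... + x33*w33.
                    acc += x[r * stride + i][c * stride + j] * w[i][j]
            y[r][c] = acc
    return y


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
w = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]

print(conv2d(x, w))            # stride 1 gives a 2 x 2 output: y11, y12, y21, y22
print(conv2d(x, w, stride=2))  # a larger stride shrinks the output
```

With a stride of 1 the 4×4 input yields a 2×2 output, and with a stride of 2 it yields a single value, which shows how the stride determines the size of the output feature map.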

FIG. 1D is a diagram for explaining a method of performing a convolution operation using a systolic array.

Referring to FIG. 1D, each data item of the input feature map 130 may be mapped to a systolic array in which the data is sequentially input to processing elements (PEs) 141 to 149 according to a clock having a constant latency. Each processing element may be a multiply-add operator.

At the first clock, the 1-1 data (x11) of the first row (①) of the systolic array may be input to the first processing element 141. Although not shown in FIG. 1D, the 1-1 data (x11) may be multiplied by the weight value w11 at the first clock. At the second clock, the 1-1 data (x11) is input to the second processing element 142, the 2-1 data (x21) is input to the first processing element 141, and the 1-2 data (x12) may be input to the fourth processing element 144. Similarly, at the third clock, the 1-1 data (x11) is input to the third processing element 143, the 2-1 data (x21) is input to the second processing element 142, and the 1-2 data (x12) may be input to the fifth processing element 145. Also at the third clock, the 3-1 data (x31) is input to the first processing element 141, the 2-2 data (x22) is input to the fourth processing element 144, and the 1-3 data (x13) may be input to the seventh processing element 147.

As described above, the input feature map 130 is input to each of the processing elements 141 to 149 according to sequential clocks, and multiplication and addition operations with the weight values input at each clock may be performed. The output feature map may be generated by accumulating and summing the values output through the multiplication and addition of the sequentially input data of the input feature map 130 and the weight values.
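
The arrival pattern listed for the first three clocks can be reproduced with a small schedule function. The rule used here, that data item x(i, j) reaches the k-th processing element of group j at clock i + (j - 1) + (k - 1), is inferred from those three clocks only and is an assumption, as are the 3×3 data size and the name `arrivals`. Processing elements 141 to 149 are numbered 1 to 9.

```python
def arrivals(clock, rows=3, cols=3, pes_per_group=3):
    """Return {pe_number: data_label} for the data entering each PE at `clock` (1-based)."""
    out = {}
    for j in range(1, cols + 1):                # second index of x selects a PE group
        for k in range(1, pes_per_group + 1):   # position of the PE inside the group
            i = clock - (j - 1) - (k - 1)       # first index of x arriving now
            if 1 <= i <= rows:
                out[pes_per_group * (j - 1) + k] = f"x{i}{j}"
    return out


for t in (1, 2, 3):
    print(t, arrivals(t))
```

Each data item is delayed by one clock per group and per position, which produces the diagonal wavefront that is typical of a systolic array.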

FIG. 2 is a flowchart illustrating an encoding method according to an embodiment.

The operations of FIG. 2 may be performed in the order and manner shown, but the order of some operations may be changed or some operations may be omitted without departing from the spirit and scope of the illustrated embodiment. Many of the operations shown in FIG. 2 may be performed in parallel or concurrently. One or more blocks of FIG. 2, and combinations of the blocks, may be implemented by a special-purpose hardware-based computer that performs a specific function, or by a combination of special-purpose hardware and computer instructions.

For operations using a neural network according to an embodiment, the required operation format may differ depending on the type of application. For example, an application that determines the type of an object in an image may need only a bit precision lower than 8 bits, whereas a voice-related application may require a bit precision higher than 8 bits.

The input operands of the multiply-and-accumulate (MAC) operation, which is an essential operator in deep learning, may also have various precisions depending on the situation. For example, the gradient, one of the input operands required for training a neural network, requires a precision of about 16-bit half floating point, while the other input operands, the input feature map and the weights, may be processed in low-precision fixed point.

The basic way to process data with such varied requirements is to build separate hardware components that perform the MAC operation for each input type, which consumes an unnecessarily large amount of hardware resources.

To perform MAC operations on various input types using a single piece of hardware, the hardware operation units must be designed for the data type with the highest complexity. In this case, however, even a low-precision operation has to be executed through an operator built for the most complex, high-precision data, which is inefficient. More specifically, the hardware implementation area increases unnecessarily, and the power consumed by the hardware may also increase unnecessarily.

According to the encoding method and the operation method according to an embodiment, the gradient operations of the training process can be kept at high precision while the low-precision inference process is run efficiently.

단계(210)에서, 일 실시예에 따른 인코딩 장치는 16비트 부동 소수점(floating point)으로 표현된 입력 데이터를 수신한다.In operation 210, the encoding apparatus according to an embodiment receives input data expressed in 16-bit floating point.

단계(220)에서, 일 실시예에 따른 인코딩 장치는 입력 데이터를 4비트(bit) 단위로 분리할 수 있도록, 상기 입력 데이터의 지수부(exponent) 및 가수부(mantissa)의 비트 수를 조정한다. 일 실시예에 따른 인코딩 장치는 기존 16비트 하프 부동 소수점(half floating point)의 비트 분포인 {sign, exponent, mantissa}={1,5,10}을 4비트 단위로 분리할 수 있도록 {sign, exponent, mantissa}={1,4,11}의 형태로 구성 비트 수를 조정할 수 있다. 이에 따라 지수부에 할당된 비트는 하나 줄어들고, 그만큼 가수부 비트가 11비트로 하나 더 늘어나게 된다.In step 220, the encoding apparatus according to an embodiment adjusts the number of bits of the exponent and mantissa of the input data so that the input data can be divided into 4-bit units. . The encoding apparatus according to an embodiment of the present invention provides {sign, The number of constituent bits can be adjusted in the form exponent, mantissa}={1,4,11}. Accordingly, the number of bits allocated to the exponent is reduced by one, and the number of bits allocated to the mantissa is increased by one more to 11 bits.

In step 230, the encoding apparatus according to an embodiment encodes the bit-adjusted input data so that the exponent becomes a multiple of 4. The encoding apparatus may encode the exponent in steps of 4, which secures a wider exponent range than the conventional 16-bit half floating point and at the same time facilitates bit-brick operations. A specific encoding method is described below with reference to FIG. 3.

FIG. 3 is a diagram for describing an encoding method according to an embodiment.

Before describing the encoding method according to an embodiment, a method of representing data in floating point is described first. For example, 263.3 in decimal is 100000111.0100110… in binary, which can be written as 1.0000011101*2⁸. Expressed in floating point, the sign bit (1 bit) is 0 (positive), the exponent bits (5 bits) are 11000 (8 + 16 (bias)), and the mantissa bits (10 bits) are 0000011101, so the value is finally represented as 0110000000011101.
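The decomposition of 263.3 above can be reproduced with a short sketch. The function name `to_float_fields` and its parameters are hypothetical and not part of the disclosure; the bias is left as a parameter and set to 16 only to match the example in the text (IEEE 754 binary16 itself uses a bias of 15). The sketch truncates the mantissa and handles only normal, non-zero values.

```python
import math

def to_float_fields(x: float, exp_bits: int = 5, man_bits: int = 10, bias: int = 16) -> str:
    """Return sign | exponent | mantissa as one bit string.

    Assumptions (not from the source): truncation instead of rounding,
    no handling of zero, subnormals, infinities, or NaN.
    """
    sign = "1" if x < 0 else "0"
    frac, exp = math.frexp(abs(x))        # abs(x) == frac * 2**exp with 0.5 <= frac < 1
    significand = frac * 2                # renormalize to the 1.xxx form
    exponent = exp - 1                    # exponent that goes with the 1.xxx form
    mantissa = int((significand - 1) * (1 << man_bits))  # keep man_bits fraction bits
    return sign + format(exponent + bias, f"0{exp_bits}b") + format(mantissa, f"0{man_bits}b")

print(to_float_fields(263.3))  # 0110000000011101
```

With `bias=15` the same function would produce the exponent field that IEEE 754 binary16 uses, which is one way to compare the two conventions.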

Referring to FIG. 3, the encoding apparatus according to an embodiment may adjust the bit layout to the form {sign, exponent, mantissa} = {1, 4, 11}. For example, 1.0000011101*2⁸ from the example above may be rewritten as 0.10000011101*2⁹, so that 1 bit is allocated to the sign, 4 bits to the exponent, and 11 bits to the mantissa.

The encoding apparatus according to an embodiment may encode the bit-adjusted input data so that the exponent becomes a multiple of 4. More specifically, the encoding apparatus may add 4 to the exponent of the input data, divide the result by 4 to obtain a quotient and a remainder, encode the exponent based on the quotient, and encode the mantissa based on the remainder.

The encoding apparatus according to an embodiment may encode the exponent based on the quotient and a bias.

The encoding apparatus according to an embodiment may set the first bit of the mantissa to 1 when the remainder is 0; set the first bit to 0 and the second bit to 1 when the remainder is 1; set the first and second bits to 0 and the third bit to 1 when the remainder is 2; and set the first, second, and third bits to 0 and the fourth bit to 1 when the remainder is 3. This is summarized in Table 1.

For example, the encoding apparatus may convert 0.10000011101*2⁹ into 0.10000011101*2^(4*3-3) and then into 0.00010000011101*2^(4*3). Based on this, the encoding apparatus may encode the exponent bits (4 bits) as 1011 (3 + 8 (bias)), the sign bit (1 bit) as 0 (positive), and the mantissa bits as 00010000011.
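The worked example can be traced with the following sketch. It is modeled on the example only: the shift is assumed to be the number of leading zeros needed to raise the exponent to the next multiple of 4, and the bias of 8 is taken from "3 + 8 (bias)". The name `encode_step4` and its argument layout are hypothetical, and the sketch does not claim to reproduce the quotient/remainder mapping of Table 1 for every input.

```python
EXP_BIAS = 8    # bias taken from the worked example (3 + 8 -> 1011)
EXP_BITS = 4
MAN_BITS = 11

def encode_step4(sign: int, exponent: int, fraction: str) -> tuple[str, str, str]:
    """Encode sign * 0.<fraction> * 2**exponent with an exponent that is a multiple of 4.

    Assumption (not from the source): the mantissa is padded with leading
    zeros until the exponent reaches the next multiple of 4, then truncated.
    """
    shift = (-exponent) % 4                  # leading zeros to insert, 0..3
    quotient = (exponent + shift) // 4       # exponent expressed in steps of 4
    field = quotient + EXP_BIAS
    if not 0 <= field < (1 << EXP_BITS):
        raise ValueError("exponent does not fit in the 4-bit exponent field")
    mantissa = ("0" * shift + fraction)[:MAN_BITS].ljust(MAN_BITS, "0")
    return str(sign), format(field, f"0{EXP_BITS}b"), mantissa

# 0.10000011101 * 2**9  ->  0.00010000011101 * 2**(4*3)
print(encode_step4(0, 9, "10000011101"))  # ('0', '1011', '00010000011')
```

Truncating to 11 bits drops the three lowest fraction bits in this example, which matches the mantissa 00010000011 given in the text.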

The encoding apparatus according to an embodiment may represent the encoded data as one exponent brick and three mantissa bricks. The three mantissa bricks may be divided into a top brick, a middle brick, and a bottom brick, and the top brick may consist of one sign bit and three mantissa bits. In the example above, the exponent brick data is 1011, the top brick data is 0000, the middle brick data is 1000, and the bottom brick data is 0011.
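As an illustration of the brick layout described above, the sketch below splits an encoded value into the four 4-bit bricks. The function name `split_bricks` and the use of bit strings are assumptions made for readability; hardware would simply route wires.

```python
def split_bricks(sign: str, exponent: str, mantissa: str) -> dict[str, str]:
    """Split a {1, 4, 11} encoded value into one exponent brick and three mantissa bricks."""
    if len(sign) != 1 or len(exponent) != 4 or len(mantissa) != 11:
        raise ValueError("expected a 1-bit sign, 4-bit exponent, and 11-bit mantissa")
    return {
        "exponent": exponent,            # 4-bit exponent brick
        "top": sign + mantissa[:3],      # sign bit followed by the 3 highest mantissa bits
        "middle": mantissa[3:7],         # next 4 mantissa bits
        "bottom": mantissa[7:],          # lowest 4 mantissa bits
    }

print(split_bricks("0", "1011", "00010000011"))
# {'exponent': '1011', 'top': '0000', 'middle': '1000', 'bottom': '0011'}
```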

The 4-bit exponent brick data and the 4-bit top, middle, and bottom brick data according to an embodiment can be easily separated in hardware. In addition, because the exponent difference that floating-point addition must always take into account is always a multiple of 4, a structure becomes possible in which the multiplied values are fused into fixed-point adders without any special shifting.

FIG. 4 is a diagram for describing an operation method according to an embodiment.

Referring to FIG. 4, the computing device according to an embodiment may receive first operand data 410 expressed in 4-bit fixed point and second operand data 420 having a size of 16 bits. The computing device may include the encoding apparatus described above with reference to FIGS. 2 and 3. The first operand data may be a weight and/or an input feature map, and the second operand data may be a gradient.

In step 430, the computing device according to an embodiment may determine the data type of the second operand data.

In step 440-1, when the second operand data 420 is of a fixed-point type, the computing device according to an embodiment may split the second operand data 420 into four 4-bit bricks for parallel data operation.

In step 440-2, when the second operand data 420 is of a floating-point type, the computing device according to an embodiment may encode the second operand data 420 according to the method described above with reference to FIGS. 2 and 3. For example, the computing device may adjust the number of bits of the exponent and the mantissa of the second operand data so that the second operand data 420 can be split into 4-bit units, and may encode the bit-adjusted second operand data so that the exponent becomes a multiple of 4.

In step 450, the computing device according to an embodiment may split the encoded second operand data into four 4-bit bricks. More specifically, the computing device may split the encoded second operand data into one exponent brick and three mantissa bricks.

In step 460, the computing device according to an embodiment may perform a MAC operation between the second operand data split into four bricks and the first operand data 410. The computing device may perform a multiplication between each of the three mantissa bricks and the first operand data 410. A specific method of performing the MAC operation between the second operand data split into four bricks and the first operand data 410 is described below with reference to FIG. 5.

In step 470, the computing device according to an embodiment may determine the data type of the second operand data.

In step 480-1, when the second operand data 420 is of a fixed-point type, the computing device according to an embodiment may sum the four separate outputs.
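The fixed-point path (splitting in step 440-1 and summing in step 480-1) can be sketched as follows. The sketch assumes unsigned operands and assumes that each partial product is weighted by the position of its brick before summation; the text only states that the four outputs are summed. The names `split_nibbles` and `brick_multiply` are hypothetical.

```python
SHIFTS = (12, 8, 4, 0)  # bit positions of the four 4-bit bricks, most significant first

def split_nibbles(value: int) -> list[int]:
    """Split an unsigned 16-bit value into four 4-bit bricks."""
    if not 0 <= value < (1 << 16):
        raise ValueError("value must fit in 16 bits")
    return [(value >> s) & 0xF for s in SHIFTS]

def brick_multiply(value: int, weight: int) -> int:
    """Multiply a 16-bit value by a 4-bit weight using four 4x4 partial products."""
    if not 0 <= weight < (1 << 4):
        raise ValueError("weight must fit in 4 bits")
    partials = [brick * weight for brick in split_nibbles(value)]  # four 4x4 multiplies
    return sum(p << s for p, s in zip(partials, SHIFTS))           # positional sum

print(split_nibbles(0xB083))                      # [11, 0, 8, 3]
print(brick_multiply(0xB083, 5) == 0xB083 * 5)    # True
```

The equality holds for any inputs in range because multiplication distributes over the positional sum of the bricks.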

In step 480-2, when the second operand data 420 is of a floating-point type, the computing device according to an embodiment may compare the accumulated exponent data stored in the exponent register with the exponent brick data and, based on the comparison result, accumulate the multiplication results into the accumulated mantissa data stored in each of the three mantissa registers. More specifically, based on the comparison result, the computing device may align the accumulation positions of the multiplication results with the accumulated mantissa data stored in each of the three mantissa registers and then accumulate them. A specific method of accumulating the multiplication results into the accumulated mantissa data based on the comparison result is described below with reference to FIG. 6.

FIG. 5 is a diagram for describing a method of performing a MAC operation between first operand data expressed in 4-bit fixed point and second operand data expressed in 16-bit half floating point according to an embodiment.

Referring to FIG. 5, the computing device according to an embodiment may include 4*4 multipliers, an exponent register, and three mantissa registers. The three mantissa registers may include a top brick register that stores the operation result of the top brick data, a middle brick register that stores the operation result of the middle brick data, and a bottom brick register that stores the operation result of the bottom brick data.

When the second operand data is of the 16-bit half floating-point type, the computing device according to an embodiment may divide the mantissa into three 4-bit bricks and multiply each of them by the first operand data using the 4*4 multipliers. The three resulting products are aligned according to the exponent difference (Exp diff), which is the difference between the accumulated exponent data stored in the exponent register and the exponent brick data, and each product is accumulated into the accumulated mantissa data stored in the corresponding mantissa register.
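The brick-wise multiplication can be checked numerically with a short sketch. It uses the bricks from the earlier example (top 0000, middle 1000, bottom 0011) and a hypothetical 4-bit operand of 5; sign handling and exponent alignment are deliberately left out, and the function name `mantissa_products` is an assumption.

```python
def mantissa_products(top: int, middle: int, bottom: int, operand: int) -> tuple[int, int, int]:
    """Multiply each mantissa brick by a 4-bit operand, as three 4x4 multipliers would.

    The top brick carries the sign in its highest bit, so only its low
    3 bits take part in the magnitude product (sign handling omitted here).
    """
    return (top & 0b0111) * operand, middle * operand, bottom * operand

top_p, mid_p, bot_p = mantissa_products(0b0000, 0b1000, 0b0011, 5)
print(top_p, mid_p, bot_p)                  # 0 40 15

# Recombining the partial products by brick position gives the full product
# of the 11-bit mantissa 00010000011 and the operand.
recombined = (top_p << 8) + (mid_p << 4) + bot_p
print(recombined == 0b00010000011 * 5)      # True
```

Each partial product fits in 8 bits, which is consistent with the 8-bit multiplier output described for FIG. 6.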

FIG. 6 is a diagram for describing a method of aligning data according to the exponent difference according to an embodiment.

Referring to FIG. 6, each mantissa register provided for accumulating the 8-bit (4 bit * 4 bit) output of the multiplier according to an embodiment consists of 12 bits. The computing device according to an embodiment may accumulate the outputs of the multiplier after positioning them according to the exponent difference.

For example, when the exponent difference is 0 (the exponent of the second operand data is equal to the stored accumulated exponent data), the multiplication result and the accumulated mantissa data stored in each of the three mantissa registers may be aligned at the same position and accumulated.

When the exponent difference is -1 (the exponent of the second operand data is greater than the stored accumulated exponent data), the multiplication result may be aligned so that it is shifted 4 bits to the right relative to the accumulated mantissa data stored in each of the three mantissa registers, and then accumulated.

When the exponent difference is 1 (the exponent of the second operand data is smaller than the stored accumulated exponent data), the multiplication result may be aligned so that it is shifted 4 bits to the left relative to the accumulated mantissa data stored in each of the three mantissa registers, and then accumulated.
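The three alignment cases can be mirrored in a small sketch. It only restates the shift directions given in the text; the exact bit positions inside the 12-bit register are shown in FIG. 6 and are not reproduced here, so the baseline position (the low 8 bits) and the function name `align_product` are assumptions.

```python
SHIFT_STEP = 4  # one exponent step corresponds to 4 bit positions

def align_product(product: int, exp_diff: int) -> int:
    """Position an 8-bit multiplier output according to the exponent difference.

    Assumption (not from the source): with exp_diff == 0 the product sits in
    the low 8 bits; only the three cases described in the text are handled.
    """
    if not 0 <= product < (1 << 8):
        raise ValueError("product must be an 8-bit multiplier output")
    if exp_diff == 0:
        return product                    # same position
    if exp_diff == -1:
        return product >> SHIFT_STEP      # 4 bits to the right
    if exp_diff == 1:
        return product << SHIFT_STEP      # 4 bits to the left
    raise ValueError("only exponent differences of -1, 0, and 1 are sketched")

print(format(align_product(0b10110101, 0), "012b"))    # 000010110101
print(format(align_product(0b10110101, -1), "012b"))   # 000000001011
print(format(align_product(0b10110101, 1), "012b"))    # 101101010000
```

Because every shift is a whole 4-bit step, the aligned product always lands on a brick boundary, which is the property the text relies on for fusing the results into fixed-point adders.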

FIG. 7 is a block diagram for describing a computing device according to an embodiment.

Referring to FIG. 7, a computing device 700 according to an embodiment includes a processor 710. The computing device 700 may further include a memory 730 and a communication interface 750. The processor 710, the memory 730, the communication interface 750, and sensors 770 may communicate with each other via a communication bus 705.

The processor 710 receives first operand data expressed in 4-bit fixed point, receives second operand data having a size of 16 bits, determines the data type of the second operand data, encodes the second operand data when it is of a floating-point type, splits the encoded second operand data into four 4-bit bricks, and performs a MAC operation between the second operand data split into four bricks and the first operand data.

The memory 730 may be a volatile memory or a non-volatile memory.

Depending on the embodiment, the processor 710 may adjust the number of bits of the exponent and the mantissa of the second operand data so that the second operand data can be split into 4-bit units, and may encode the bit-adjusted second operand data so that the exponent becomes a multiple of 4.

The processor 710 may split the encoded second operand data into one exponent brick and three mantissa bricks.

The processor 710 may perform a multiplication between each of the three mantissa bricks and the first operand data, compare the accumulated exponent data stored in the exponent register with the exponent brick data, and, based on the comparison result, accumulate the multiplication results into the accumulated mantissa data stored in each of the three mantissa registers.

Based on the comparison result, the processor 710 may align the accumulation positions of the multiplication results with the accumulated mantissa data stored in each of the three mantissa registers.

In addition, the processor 710 may perform at least one of the methods described above with reference to FIGS. 1A to 6, or an algorithm corresponding to at least one of those methods. The processor 710 may execute a program and control the computing device 700. Program code executed by the processor 710 may be stored in the memory 730. The computing device 700 may be connected to an external device (for example, a personal computer or a network) through an input/output device (not shown) and may exchange data with it. The computing device 700 may be mounted on various computing devices and/or systems such as a smartphone, a tablet computer, a laptop computer, a desktop computer, a television, a wearable device, a security system, and a smart home system.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored in one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. An encoding method comprising:
receiving input data expressed in 16-bit half floating point;
adjusting the number of bits of an exponent and a mantissa of the input data so that the input data can be split into 4-bit units; and
encoding the bit-adjusted input data so that the exponent is a multiple of 4.

2. The method of claim 1, wherein adjusting the number of bits comprises:
allocating 4 bits to the exponent; and
allocating 11 bits to the mantissa.

3. The method of claim 1, wherein the encoding comprises:
calculating a quotient and a remainder obtained by dividing, by 4, a value obtained by adding 4 to the exponent of the input data;
encoding the exponent based on the quotient; and
encoding the mantissa based on the remainder.

4. The method of claim 3, wherein encoding the exponent comprises:
encoding the exponent based on the quotient and a bias.

5. The method of claim 3, wherein encoding the mantissa comprises:
determining a first bit value of the mantissa to be 1 when the remainder is 0.

6. The method of claim 3, wherein encoding the mantissa comprises:
when the remainder is 1, determining a first bit value of the mantissa to be 0 and a second bit value of the mantissa to be 1.

7. The method of claim 3, wherein encoding the mantissa comprises:
when the remainder is 2, determining a first bit value of the mantissa to be 0, a second bit value of the mantissa to be 0, and a third bit value of the mantissa to be 1.

8. The method of claim 3, wherein encoding the mantissa comprises:
when the remainder is 3, determining a first bit value of the mantissa to be 0, a second bit value of the mantissa to be 0, a third bit value of the mantissa to be 0, and a fourth bit value of the mantissa to be 1.

9. An operation method comprising:
receiving first operand data expressed in 4-bit fixed point;
receiving second operand data having a size of 16 bits;
determining a data type of the second operand data;
encoding the second operand data when the second operand data is of a floating-point type;
splitting the encoded second operand data into four 4-bit bricks; and
performing a MAC operation between the second operand data split into the four bricks and the first operand data.

10. The method of claim 9, wherein the encoding comprises:
adjusting the number of bits of an exponent and a mantissa of the second operand data so that the second operand data can be split into 4-bit units; and
encoding the bit-adjusted second operand data so that the exponent is a multiple of 4.

11. The method of claim 9, wherein the splitting comprises:
splitting the encoded second operand data into one exponent brick data and three mantissa brick data.

12. The method of claim 11, wherein performing the MAC operation comprises:
performing a multiplication between each of the three mantissa brick data and the first operand data;
comparing accumulated exponent data stored in an exponent register with the exponent brick data; and
accumulating, based on a result of the comparison, a result of the multiplication into accumulated mantissa data stored in each of three mantissa registers.

13. The method of claim 12, wherein the accumulating comprises:
aligning, based on the result of the comparison, accumulation positions of the result of the multiplication and the accumulated mantissa data stored in each of the three mantissa registers.

14. The method of claim 9, further comprising:
splitting the second operand data into four 4-bit bricks for parallel data operation when the second operand data is of a fixed-point type.

15. A computer-readable storage medium storing one or more programs including instructions for performing the method of any one of claims 1 to 14.

16. An encoding apparatus comprising:
a processor configured to receive input data expressed in 16-bit floating point, adjust the number of bits of an exponent and a mantissa of the input data so that the input data can be split into 4-bit units, and encode the bit-adjusted input data so that the exponent is a multiple of 4.

17. The encoding apparatus of claim 16, wherein the processor allocates 4 bits to the exponent and allocates 11 bits to the mantissa.

18. The encoding apparatus of claim 16, wherein the processor calculates a quotient and a remainder obtained by dividing, by 4, a value obtained by adding 4 to the exponent of the input data, encodes the exponent based on the quotient, and encodes the mantissa based on the remainder.

19. A computing device comprising:
a processor configured to receive first operand data expressed in 4-bit fixed point, receive second operand data having a size of 16 bits, determine a data type of the second operand data, encode the second operand data when the second operand data is of a floating-point type, split the encoded second operand data into four 4-bit bricks, and perform a MAC operation between the second operand data split into the four bricks and the first operand data.

20. The computing device of claim 19, wherein the processor adjusts the number of bits of an exponent and a mantissa of the second operand data so that the second operand data can be split into 4-bit units, and encodes the bit-adjusted second operand data so that the exponent is a multiple of 4.

21. The computing device of claim 19, wherein the processor splits the encoded second operand data into one exponent brick data and three mantissa brick data.

22. The computing device of claim 21, wherein the processor performs a multiplication between each of the three mantissa brick data and the first operand data, compares accumulated exponent data stored in an exponent register with the exponent brick data, and accumulates, based on a result of the comparison, a result of the multiplication into accumulated mantissa data stored in each of three mantissa registers.

23. The computing device of claim 22, wherein the processor aligns, based on the result of the comparison, accumulation positions of the result of the multiplication and the accumulated mantissa data stored in each of the three mantissa registers.

24. The computing device of claim 19, wherein the processor splits the second operand data into four 4-bit bricks for parallel data operation when the second operand data is of a fixed-point type.