KR102360116B1

KR102360116B1 - Artificial intelligent accelerator including compression module and data transmission method using the same

Info

Publication number: KR102360116B1
Application number: KR1020210039858A
Authority: KR
Inventors: 이성주; 이준표
Original assignee: 세종대학교산학협력단
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-02-08

Abstract

The present invention relates to an artificial intelligence accelerator. Specifically, it relates to an artificial intelligence accelerator comprising a compression module configured to compress an inference intermediate value of the accelerator in two stages, first with a run-length coding technique and then with an adaptive Huffman coding technique, before transferring it to a memory, and to a data transfer method using the same. The present invention can thereby provide an artificial intelligence accelerator with both an improved compression ratio and an improved data transfer speed.

Description

Artificial intelligent accelerator including compression module and data transmission method using the same

The present invention relates to an artificial intelligence accelerator, and more particularly to an artificial intelligence accelerator including a compression module configured to compress data using two compression techniques, and to a data transfer method using the same.

In modern society, artificial intelligence has established itself as a core technology driving the fourth industrial revolution. For artificial intelligence to be used industrially, the power consumption problem caused by its enormous amount of computation must be solved. One approach to this problem is to use an artificial intelligence accelerator, a computing device designed to be optimized for artificial neural network operations in order to increase operation speed.

The process of obtaining an inference result with a typical artificial intelligence accelerator is as follows. When a neural network input terminal (e.g., a PC or a camera) writes new data, for which an inference result is desired, into a memory, the processing unit of the accelerator receives it as initial input data and derives an inference result value. When that value is transferred back to the memory, the neural network input terminal reads the inference result value from the corresponding memory address and checks it. Here, the processing unit of the accelerator performs its operations according to an internal operation program. Because computing over all of the data at once would require too much computation, the processing unit generally outputs some data (that is, an inference intermediate value) during the operation, stores it in the memory, and then receives it again as input data, so that the operation is carried out step by step.

However, data transfer between the memory and the artificial intelligence accelerator consumes a great deal of time and energy, which leads directly to a power consumption problem. To address this, much research is being conducted on minimizing data transfer between the memory and the accelerator.

As a representative solution, a compression module is typically provided in the accelerator. The compression module compresses the inference intermediate value output from the processing unit with a predetermined compression technique and transfers it to the memory, and when the value is read back, decompresses it and delivers it to the processing unit, thereby minimizing the data transfer between the memory and the artificial intelligence accelerator.

Conventionally, the compression module has mainly adopted the Huffman coding technique, which is one of the lossless compression techniques.

The Huffman coding technique is an algorithm that counts how often each value occurs in a given set of data and compresses the data with variable-length codes, so that the most frequent value can be sent using the fewest bits. Huffman coding can run its compression algorithm only once a specific number of data items, determined by the filter setting value, has been collected. The compression module therefore waits until that number of inference intermediate values has arrived from the processing unit of the accelerator, and only then compresses them and transfers them to the memory. The idle time spent waiting lowers the data transfer speed.

(Patent Document 1) KR10-2020-0093404 A

The present invention is intended to solve the problem described above. To overcome the speed degradation of conventional Huffman coding, the compression module is configured to compress data by combining two compression techniques, thereby providing an artificial intelligence accelerator in which both the data transfer speed and the compression ratio are improved.

An artificial intelligence accelerator according to the present invention comprises: a compression module that compresses an inference intermediate value, output from a processing module operating on initial input data from a neural network input terminal, in first and second stages using predetermined, mutually different compression schemes and stores it in a memory, and that obtains the compressed inference intermediate value stored in the memory as input data, decompresses it in first and second stages using decompression schemes corresponding to the respective compression schemes, and inputs it to the processing module; and a processing module that obtains the initial input data stored in the memory and performs an inference operation, outputs an inference intermediate value, which is an intermediate value of the inference operation, and stores it in the memory through the compression module, and that obtains the stored inference intermediate value as input data through the compression module and performs the inference operation.

More specifically, the compression module comprises: a compression performing module that compresses the inference intermediate value output from the processing module in first and second stages using predetermined first and second compression schemes, and transfers it to the memory for storage; and a decompression performing module that reads the compressed inference intermediate value stored after the second-stage compression by the compression performing module, obtains it as input data, decompresses it in first and second stages using first and second decompression schemes corresponding to the first and second compression schemes, respectively, and inputs it to the processing module.

The compression performing module comprises: a first compression performing module that performs first-stage compression of the inference intermediate value output from the processing module using the predetermined first compression scheme; and a second compression performing module that performs second-stage compression, using the predetermined second compression scheme, of the inference intermediate value compressed by the first compression performing module, thereby generating a compressed inference intermediate value.

Meanwhile, the decompression performing module comprises: a first decompression performing module that obtains the compressed inference intermediate value stored in the memory as input data and performs first-stage decompression using the second decompression scheme corresponding to the second compression scheme; and a second decompression performing module that performs second-stage decompression, using the first decompression scheme corresponding to the first compression scheme, of the input data decompressed by the first decompression performing module, and inputs the result to the processing module.

Here, the predetermined first compression scheme is a run-length coding technique.

Meanwhile, the predetermined second compression scheme is an adaptive Huffman coding technique.

Meanwhile, the first decompression scheme performs the compression process of the adaptive Huffman coding technique in reverse order.

Meanwhile, the second decompression scheme performs the compression process of the run-length coding technique in reverse order.

A method of transferring data between an artificial intelligence accelerator and a memory according to the present invention comprises: an initial input data acquisition step in which the processing module of the accelerator reads and obtains initial input data from a neural network input terminal that has been stored in the memory in advance; an inference intermediate value output step in which the processing module of the accelerator operates on the initial input data obtained in the initial input data acquisition step and outputs an inference intermediate value, which is an intermediate value of the inference result; an inference intermediate value compression step in which the compression module of the accelerator compresses the inference intermediate value output from the processing module in the inference intermediate value output step in first and second stages using predetermined, mutually different compression schemes, thereby generating a compressed inference intermediate value; and a compressed inference intermediate value storage step in which the compression module of the accelerator transfers the compressed inference intermediate value generated in the inference intermediate value compression step to the memory and stores it there.

The method further comprises: an input data acquisition step in which the compression module of the accelerator reads the compressed inference intermediate value stored in the memory by the compressed inference intermediate value storage step and obtains it as input data; an input data decompression step in which the compression module of the accelerator decompresses the compressed inference intermediate value obtained as input data in the input data acquisition step in first and second stages, using first and second decompression schemes corresponding to the predetermined first and second compression schemes, respectively; and an input data transfer step in which the compression module of the accelerator transfers the input data that has undergone second-stage decompression in the input data decompression step to the processing module.

More specifically, the inference intermediate value compression step comprises: a first compression performing step of performing first-stage compression, using the predetermined first compression scheme, of the inference intermediate value output from the processing module in the inference intermediate value output step; and a second compression performing step of performing second-stage compression, using the predetermined second compression scheme, of the inference intermediate value compressed with the first compression scheme in the first compression performing step.

Meanwhile, the input data decompression step comprises: a first decompression performing step of performing first-stage decompression of the compressed inference intermediate value obtained as input data in the input data acquisition step, using the second decompression scheme corresponding to the predetermined second compression scheme; and a second decompression performing step of performing second-stage decompression of the value decompressed with the second decompression scheme in the first decompression performing step, using the first decompression scheme corresponding to the predetermined first compression scheme.

Meanwhile, the first decompression scheme performs the compression process of the adaptive Huffman coding technique in reverse order.

Meanwhile, the second decompression scheme performs the compression process of the run-length coding technique in reverse order.

In the present invention, the compression module of the accelerator is configured to perform first-stage compression with the run-length coding technique and second-stage compression with the adaptive Huffman coding technique before transferring data to the memory. The two techniques compensate for each other's weaknesses and maximize each other's strengths, so the invention can provide an artificial intelligence accelerator in which both the compression ratio and the data transfer speed are improved.

FIG. 1 is a diagram illustrating an overall data transfer system including an artificial intelligence accelerator according to the present invention.
FIG. 2 is a diagram illustrating a detailed configuration of the compression module of FIG. 1.
FIG. 3 is a diagram showing an example of the run-length coding technique.
FIG. 4 is a diagram showing an example of the adaptive Huffman coding technique.
FIG. 5 is a diagram showing an example of the Huffman coding technique.
FIG. 6 is a diagram showing the flow of a data transfer method between an artificial intelligence accelerator and a memory according to the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts of the drawings that are irrelevant to the description are omitted, and similar reference numerals are given to similar parts throughout the specification.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component. The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise.

Throughout the specification, when a part is said to be “connected” to another part, this includes not only the case where it is “directly connected” but also the case where it is “electrically connected” with another element in between. When a part is said to “include” a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise. As used throughout this specification, the terms “step of (doing) ~” and “step of ~” do not mean “step for ~”.

The terms used in the present invention have been chosen, as far as possible, from general terms that are currently in wide use, taking into account their functions in the present invention; they may nevertheless vary with the intention of those skilled in the art, with precedent, or with the emergence of new technology. In certain cases a term has been arbitrarily selected by the applicant, and in such cases its meaning is described in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined on the basis of their meaning and the overall content of the present invention, not simply by their names.

Hereinafter, the present invention is described in detail with reference to the drawings.

1. Terms used in the present invention

1.1. Inference intermediate value

The inference intermediate value used in the present invention refers to a partial sum that the processing module of the accelerator outputs at intermediate points while it operates on the input data and the weights.

Typically, the processing module of the accelerator performs operations, based on a preset internal operation program, to derive an inference result value for new input data supplied by the neural network input terminal (a PC, a camera, etc.). Because processing all of the operations at once would require too much computation, the processing module stores operation results at intermediate points during processing and uses them again as input data for the next operation step. In this specification, the operation result values output at these intermediate points are referred to as inference intermediate values.

In other words, an inference intermediate value is an intermediate value that the accelerator outputs during the operation process for deriving an inference result value.

1.2. Compressed inference intermediate value

The compressed inference intermediate value is an inference intermediate value that has been compressed twice by the compression module of the accelerator according to the present invention.

Accordingly, when the accelerator stores an inference intermediate value in the memory and reads it back, the value is transferred in the form of a twice-compressed inference intermediate value, so that data can be transferred faster than in the prior art.

2. Artificial intelligence accelerator according to the present invention

The artificial intelligence accelerator according to the present invention includes a compression module configured so that, whenever the accelerator accesses the memory to store the intermediate inference results computed from its input data, those results are compressed as much as possible and transferred quickly, increasing the data transfer speed between the memory and the accelerator. FIG. 1 is a diagram illustrating an overall data transfer system including the accelerator according to the present invention, and FIG. 2 is a diagram illustrating a detailed configuration of the compression module.

Referring to FIGS. 1 and 2, the artificial intelligence accelerator 100 according to the present invention includes the following components.

2.1. Compression module (110)

The compression module 110 is provided between the memory 200 and the processing module 120 in the accelerator 100. It compresses the inference intermediate value output from the processing module 120 twice, using predetermined, mutually different compression schemes, and transfers it to the memory 200 for storage. It also obtains the compressed inference intermediate value stored in the memory 200 again as input data, restores it in two passes using the decompression schemes corresponding to the respective compression schemes, and inputs it to the processing module 120.

The compression module may include the following detailed components.

A. Compression performing module (112)

The compression performing module 112 compresses the inference intermediate value output from the processing module 120 in first and second stages using predetermined, mutually different compression schemes, and transfers it to the memory 200 for storage.

1) First compression performing module (1122)

The first compression performing module 1122 may be configured to perform first-stage compression of the inference intermediate value output from the processing module 120 using the predetermined first compression scheme.

Here, the predetermined first compression scheme may be a run-length coding technique.

The run-length coding technique shortens data by replacing each run of repeated characters with a single character. Put simply, given the text ‘AAAAABBCCCDEEFFGG’, each run is expressed as ‘character × number of repetitions’.

The first compression performing module 1122 of the present invention uses this run-length coding technique to perform first-stage compression of the inference intermediate value from the processing module 120.

FIG. 3 is a diagram showing an example of first-stage compression with the run-length coding technique. Referring to FIG. 3, if the inference intermediate value output from the processing module 120 is as shown in FIG. 3(a), compressing it with the run-length coding technique yields the representation shown in FIG. 3(b): starting from the ‘3’ at the head of the inference intermediate value, the data is expressed in the order value, count, value, count, value, count.

The run-length coding technique is thus an algorithm whose compression ratio rises as more identical values arrive consecutively. In particular, when the activation function is the ReLU function, most of the resulting intermediate values are 0, so the effect of the compression can be maximized further.
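The run-length coding described above can be sketched as follows. This is a minimal illustration, not the circuit of the first compression performing module 1122: the function names `rle_encode` and `rle_decode` are hypothetical, and the `activations` sequence is made-up data chosen to resemble a zero-heavy ReLU output, not the actual contents of FIG. 3.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal consecutive items into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# The text example from the description.
text_pairs = rle_encode("AAAAABBCCCDEEFFGG")
# [('A', 5), ('B', 2), ('C', 3), ('D', 1), ('E', 2), ('F', 2), ('G', 2)]

# Hypothetical zero-heavy intermediate values (value, count order as in the text).
activations = [3] + [0] * 9 + [5] * 4
activation_pairs = rle_encode(activations)
# [(3, 1), (0, 9), (5, 4)]
```

Fourteen values shrink to three pairs here because the zeros are contiguous; a sequence with no repeated neighbors would instead grow, which is why the technique suits sparse activations.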

2) Second compression performing module (1124)

The second compression performing module 1124 may compress the inference intermediate value, already compressed with the predetermined first compression scheme by the first compression performing module 1122, once more using the predetermined second compression scheme, thereby generating a compressed inference intermediate value.

Here, the predetermined second compression scheme may be an adaptive Huffman coding technique.

The adaptive Huffman coding technique encodes while it accumulates the frequencies of the characters: the tree is built in real time, and reading the input values and building the tree take place simultaneously.

FIG. 4 is a diagram showing an example of second-stage compression, using the adaptive Huffman coding technique, of an inference intermediate value that has undergone first-stage compression. Referring to it, if the inference intermediate value compressed with the run-length coding technique is as shown in FIG. 3(b), the values enter the second compression performing module 1124 in the order ‘3, 1, 0, 9, …’, the tree structure is updated in real time as they arrive, and the representation of the compressed values changes accordingly. For example, when ‘4’, the last number in FIG. 3(b), arrives, the tree structure is updated as shown in FIG. 4(a), and the table derived from that tree is as shown in FIG. 4(b).

Accordingly, if a value arriving at the second compression performing module 1124 after ‘4’ is a number already present in the table of FIG. 4(b), it is replaced with the compressed representation given in the table. If a new value that is not in the table arrives, that value is stored in the memory 200 as it is, without compression, and the tree and table of FIGS. 4(a) and 4(b) are updated by adding the new value.
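The table behavior described above (emit the table's code for a known value, pass a new value through uncompressed, then update the tree and table) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: it rebuilds the whole code table from the running counts after every symbol instead of updating the tree incrementally as the FGK or Vitter algorithms do, and it emits tagged tokens (`"raw"` or `"code"`) rather than a packed bitstream. All function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(freqs):
    """Build a Huffman prefix-code table {symbol: bitstring} from symbol counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique sequence number so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def adaptive_encode(values):
    """Encode a stream while learning the table; unseen values pass through raw."""
    counts, table, tokens = Counter(), {}, []
    for v in values:
        if v in table:
            tokens.append(("code", table[v]))  # known value: use current code
        else:
            tokens.append(("raw", v))          # new value: store as is
        counts[v] += 1
        table = build_codes(counts)            # update the table after each symbol
    return tokens


def adaptive_decode(tokens):
    """Mirror the encoder's table updates to recover the original stream."""
    counts, table, out = Counter(), {}, []
    for kind, payload in tokens:
        if kind == "code":
            inverse = {code: sym for sym, code in table.items()}
            v = inverse[payload]
        else:
            v = payload
        out.append(v)
        counts[v] += 1
        table = build_codes(counts)
    return out


stream = [3, 1, 0, 9, 0, 0, 3, 0]
tokens = adaptive_encode(stream)
restored = adaptive_decode(tokens)
```

Because the decoder rebuilds the same table from the same counts after each symbol, no table needs to be transmitted, and encoding can start with the first value instead of waiting for a batch to accumulate.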

As mentioned above, the prior art used Huffman coding, an algorithm that counts the occurrences of the data and compresses it with variable-length codes so that the most frequent data can be sent using the fewest bits. FIG. 5 is a diagram showing an example of the Huffman coding technique. Referring to it, Huffman coding is applied to data in which ‘A’, ‘B’, ‘C’, ‘D’, and ‘E’ appear 15, 7, 6, 6, and 5 times, respectively, as shown in (a). In step (b), ‘D’ and ‘E’, which appear the fewest times, are grouped and represented as a subtree. In step (c), because the combined count of ‘D’ and ‘E’ is now larger than the count of ‘B’, ‘C’ and ‘B’ are grouped. In steps (d) and (e), the remaining ‘A’, which has the highest count, is placed at the top of the tree so that it receives the fewest bits. Viewed in the completed tree of step (e), ‘A’ is therefore represented as ‘1’, ‘B’ as ‘011’, ‘C’ as ‘010’, ‘D’ as ‘001’, and ‘E’ as ‘000’.
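The code lengths in this example can be checked with a short sketch of the standard Huffman construction. The function name `huffman_code_lengths` is hypothetical, and the tie between ‘C’ and ‘D’ (both 6) may be broken differently from FIG. 5, so the pairing inside the tree can differ; the resulting code lengths are the same either way.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    tiebreak = count()  # unique sequence number so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside moves
        # one level deeper, i.e. its code grows by one bit.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
lengths = huffman_code_lengths(freqs)
# {'A': 1, 'B': 3, 'C': 3, 'D': 3, 'E': 3}

huffman_bits = sum(freqs[s] * lengths[s] for s in freqs)  # 15*1 + 24*3 = 87
fixed_bits = 3 * sum(freqs.values())                      # 3 bits each = 117
```

For these 39 symbols the Huffman code needs 87 bits against 117 bits for a fixed 3-bit code, but the frequencies must be known before the first symbol can be encoded.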

The Huffman coding technique can compress only after a preset number of data items, such as the five symbols 'A', 'B', 'C', 'D', and 'E' in step (a), has accumulated. The more data accumulates, the greater the compression effect and the higher the compression ratio; however, the compressor must wait until that number of items has accumulated, which leaves an idle gap during the wait. This slows data transfer from the accelerator to the memory, but the conventional approach accepted that drawback and adopted Huffman coding with a focus on compression ratio.

To improve on this, the present invention, as described above, first compresses the inference intermediate value output from the processing module 120 with the run-length technique, whose compression effect grows with the number of repeated data values. It then compresses the result a second time with the adaptive Huffman technique, which, unlike Huffman coding, updates the tree structure in real time from the incoming data while compressing, and transfers the result to the memory 200. Because each technique compensates for the other's weakness, the advantages of both are maximized, providing an improved compression ratio and a shorter transfer time.

B. Decompression performing module (114)

The decompression performing module 114 reads the compressed inference intermediate value, which was compressed in two stages by the compression performing module 112 and stored in the memory 200, and obtains it again as input data. It then decompresses the value in two stages, using the first and second decompression methods corresponding to the first and second compression methods, and inputs it to the processing module 120.

1) First decompression performing module (1142)

The first decompression performing module 1142 may be configured to obtain the compressed inference intermediate value stored in the memory 200 as input data and to perform primary decompression on it using the second decompression method, which corresponds to the second compression method.

Here, the second decompression method means performing in reverse the compression process of adaptive Huffman coding, which is the last compression method applied when the compression performing module 112 transfers data to the memory 200. That is, the algorithm illustrated in the example of FIG. 4 described above is performed in reverse.

Because the compressed inference intermediate value was first compressed with run-length coding and then compressed with adaptive Huffman coding, restoring it requires the reverse order: the second compression must be undone first, and then the first compression.

Accordingly, the first decompression performing module 1142, which performs the primary decompression of the compressed inference intermediate value, may do so by performing in reverse the compression process of adaptive Huffman coding, which corresponds to the second compression method.
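The mirror relationship between the second compression and the primary decompression can be illustrated with a small Python sketch. It is a simplified variant under stated assumptions, not the hardware of this document: the code table is rebuilt from the running counts before every symbol (rather than updated incrementally as in the FGK or Vitter algorithms), an escape symbol NYT marks a value not yet in the table, and such a value is sent as RAW_BITS uncompressed bits. The names NYT, RAW_BITS, build_codes, encode, and decode are hypothetical.

```python
import heapq
from itertools import count

NYT = -1        # assumed escape symbol: "value not yet in the table"
RAW_BITS = 8    # assumed width of an uncompressed value (0..255)


def build_codes(freq):
    """Deterministically derive a Huffman table from the counts seen so far."""
    order = count()
    heap = [(f, next(order), s) for s, f in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(order), (a, b)))
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


def encode(values):
    freq, out = {NYT: 0}, []
    for v in values:
        codes = build_codes(freq)
        if v in freq:                       # known value: emit its code
            out.append(codes[v])
        else:                               # new value: escape + raw bits
            out.append(codes[NYT] + format(v, f"0{RAW_BITS}b"))
            freq[v] = 0
        freq[v] += 1                        # update the table immediately
    return "".join(out)


def decode(bits):
    freq, out, pos = {NYT: 0}, [], 0
    while pos < len(bits):
        inverse = {c: s for s, c in build_codes(freq).items()}
        prefix = ""
        while prefix not in inverse:        # read bits until a code matches
            prefix += bits[pos]
            pos += 1
        v = inverse[prefix]
        if v == NYT:                        # escape: next bits are the raw value
            v = int(bits[pos:pos + RAW_BITS], 2)
            pos += RAW_BITS
            freq[v] = 0
        freq[v] += 1                        # same update as the encoder
        out.append(v)
    return out


data = [4, 4, 4, 7, 4, 7, 4, 4]
bits = encode(data)
restored = decode(bits)
```

Because the decoder applies exactly the same table update after each symbol, it always holds the same table as the encoder did at that point, so no table has to be transmitted and no batch of data has to be collected first.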

2) Second decompression performing module (1144)

The second decompression performing module 1144 may be configured to perform secondary decompression, using the first decompression method corresponding to the first compression method, on the compressed inference intermediate value that has undergone primary decompression by the first decompression performing module 1142.

Here, the first decompression method means performing in reverse the compression process of run-length coding, the compression method with which the compression performing module 112 first compressed the inference intermediate value from the processing module 120. That is, the algorithm illustrated in the example of FIG. 3 described above is performed in reverse.

When the second decompression performing module 1144 performs secondary decompression on the value that was primarily decompressed by the first decompression performing module 1142, by running the run-length coding compression process in reverse, the value is restored to the original inference intermediate value, that is, to its state before compression in the compression performing module 112, as it was when output from the processing module 120.
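A minimal Python sketch of run-length coding and its reversal is given here for illustration. The (value, run length) pair layout and the names rle_encode and rle_decode are assumptions; the exact format of FIG. 3 is not reproduced.

```python
def rle_encode(values):
    """Collapse each run of equal values into one (value, run_length) pair."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Reverse the encoding by expanding every pair back into its run."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out


original = [0, 0, 0, 0, 5, 5, 3]
runs = rle_encode(original)       # [(0, 4), (5, 2), (3, 1)]
restored = rle_decode(runs)
```

Intermediate values with long runs of the same number, such as zeros, shrink the most, which matches the stated reason for applying run-length coding first.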

The inference intermediate value restored in this way is input to the processing module 120 so that the next stage of computation can be performed.

2.2. Processing module (120)

The processing module 120 derives an inference result value by performing an inference operation on the input data stored in the memory 200, based on a preset internal operation program.

Specifically, it initially reads the initial input data stored in the memory 200 and performs the inference operation. During the operation it outputs inference intermediate values, which are intermediate values of the inference result, and stores them in the memory 200 through the compression module 110. It then takes the stored inference intermediate values back through the compression module 110 as input data, so that the inference operation proceeds stage by stage.

The memory 200 and the neural network input terminal 300, which are well-known components mentioned in describing the accelerator 100 of the present invention, are briefly described below.

2.3. Memory (200)

The memory 200 stores the initial input data received from the neural network input terminal 300 and stores the compressed inference intermediate values transferred from the compression module 110 of the accelerator 100.

With the memory 200, the accelerator 100 can store an intermediate value at each stage of the inference operation and read it back as input data for the next stage, so that the computation of the accelerator 100 runs efficiently.

2.4. Neural network input terminal (300)

The neural network input terminal 300 is the component that seeks to obtain an inference result value for new data using the accelerator 100, and may include, for example, a PC or a camera.

The neural network input terminal 300 stores the new data for which an inference result is sought in the memory 200 as initial input data. When the accelerator 100, having received the initial input data, then outputs an inference intermediate value through the inference operation and the value is stored in the memory 200, the neural network input terminal 300 reads the inference intermediate value from the corresponding memory address to recognize and check it.

3. Data transfer method of the artificial intelligence accelerator according to the present invention

FIG. 6 shows the flow of a method of transferring data to memory using the artificial intelligence accelerator according to the present invention.

Referring to FIG. 6, the data transfer method of the present invention includes the following steps.

3.1. Initial input data acquisition step (S100)

First, the processing module 120 of the accelerator reads and obtains the initial input data stored in the memory 200 by the neural network input terminal 300. The processing module 120 can begin the inference operation from the obtained initial input data.

Here, the initial input data means the new data that the neural network input terminal 300 inputs in order to obtain an inference result.

3.2. Inference intermediate value output step (S200)

The processing module 120 of the accelerator performs the inference intermediate value output step (S200), in which it computes using the initial input data obtained in the initial input data acquisition step (S100) and outputs an inference intermediate value, which is an intermediate value of the inference result.

3.3. Inference intermediate value compression step (S300)

When the processing module 120 outputs an inference intermediate value in the inference intermediate value output step (S200), the compression module 110 of the accelerator performs the inference intermediate value compression step (S300), in which it compresses the output value in two stages using two different predetermined compression methods to generate a compressed inference intermediate value.

The inference intermediate value compression step (S300) includes the following sub-steps.

A. First compression performing step (S310)

First, the first compression performing step (S310) is performed, in which the inference intermediate value output from the processing module 120 in the inference intermediate value output step (S200) undergoes primary compression using a predetermined first compression method.

Here, the predetermined first compression method may be run-length coding.

Primary compression of data using the run-length technique was described in detail in the system configuration above, so a detailed description is omitted here.

The first compression performing step (S310) is executed by the first compression performing module 1122 of the accelerator 100.

B. Second compression performing step (S320)

Once the inference intermediate value from the processing module 120 has undergone primary compression with the predetermined first compression method in the first compression performing step (S310), the second compression performing step (S320) is performed, in which the primarily compressed value undergoes secondary compression using a predetermined second compression method.

Compression of data using adaptive Huffman coding was described in detail in the system configuration above, so a detailed description is omitted here.

The second compression performing step (S320) is executed by the second compression performing module 1124 of the accelerator.

3.4. Compressed inference intermediate value storage step (S400)

The compression module 110 of the accelerator performs the compressed inference intermediate value storage step (S400), in which the inference intermediate value compressed twice in the inference intermediate value compression step (S300) is transferred to the memory 200 and stored.

3.5. Input data acquisition step (S500)

The compression module 110 of the accelerator performs the input data acquisition step (S500), in which it reads the compressed inference intermediate value stored in the memory 200 by the compressed inference intermediate value storage step (S400) and obtains it again as input data.

At this point, the obtained input data is still in the twice-compressed state produced by the first and second compression performing steps (S310, S320) described above.

3.6. Input data decompression step (S600)

The compression module 110 of the accelerator decompresses the compressed inference intermediate value obtained as input data in the input data acquisition step (S500), using the first and second decompression methods that correspond respectively to the predetermined first and second compression methods of the inference intermediate value compression step (S300).

A. First decompression performing step (S610)

First, the compressed inference intermediate value obtained as input data from the memory 200 in the input data acquisition step (S500) undergoes primary decompression using the second decompression method, which corresponds to the predetermined second compression method.

As described above, the predetermined second compression method is adaptive Huffman coding, and the corresponding second decompression method means performing the adaptive Huffman coding compression process in reverse.

This step is performed by the first decompression performing module 1142 of the compression module 110.

B. Second decompression performing step (S620)

Next, once the compressed inference intermediate value from the memory 200 has undergone primary decompression with the second decompression method in the first decompression performing step (S610), it undergoes secondary decompression using the first decompression method, which corresponds to the predetermined first compression method.

As described above, the predetermined first compression method is run-length coding, and the corresponding first decompression method means performing the run-length coding compression process in reverse.

In other words, the inference intermediate value from the processing module 120 is first compressed with run-length coding, then compressed with adaptive Huffman coding, and stored in the memory 200. It is then obtained again as input data and, in reverse order, first decompressed with adaptive Huffman coding and then decompressed with run-length coding, restoring the original inference intermediate value.

This step is performed by the second decompression performing module 1144 of the compression module 110.

3.7. Input data transfer step (S700)

When the compressed inference intermediate value from the memory 200 has been restored to its pre-compression state through the first and second decompression performing steps (S610, S620), the compression module 110 of the accelerator performs the input data transfer step (S700), in which it inputs the restored value to the processing module 120.

The processing module 120 can then perform the next computation using the decompressed inference intermediate value received as input data.

Thereafter, the steps from the inference intermediate value output step (S200) through the input data transfer step (S700) are repeated, so that the accelerator 100 can derive the inference result value.
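The flow of steps S100 to S700 can be sketched as a loop in Python. Everything here is illustrative: the memory is a plain dictionary, the layers are toy functions, and the two stages are stand-ins chosen only so that the block is self-contained and the ordering matters. Byte packing stands in for the first (run-length) stage and zlib stands in for the second (adaptive Huffman) stage; neither is the method of this document. The names store, load, run, STAGES, and LAYERS are hypothetical.

```python
import zlib


def store(memory, key, value, stages):
    """Compress through every stage in order, then write to memory."""
    for compress, _ in stages:               # S310, then S320
        value = compress(value)
    memory[key] = value                      # S400


def load(memory, key, stages):
    """Read from memory, then undo the stages in reverse order."""
    value = memory[key]                      # S500
    for _, decompress in reversed(stages):   # S610, then S620
        value = decompress(value)
    return value                             # handed to the processor in S700


def run(memory, layers, stages):
    data = memory["input"]                   # S100: read the initial input
    for i, layer in enumerate(layers):
        intermediate = layer(data)           # S200: compute an intermediate value
        store(memory, f"mid{i}", intermediate, stages)
        data = load(memory, f"mid{i}", stages)
    return data


# Stand-in stages as (compress, decompress) pairs; values must be in 0..255.
STAGES = [(bytes, list), (zlib.compress, zlib.decompress)]
# Toy layers; the first yields repeated zeros like a thresholding activation.
LAYERS = [lambda xs: [max(x - 3, 0) for x in xs],
          lambda xs: [2 * x for x in xs]]

memory = {"input": [1, 2, 3, 4, 5, 6]}
result = run(memory, LAYERS, STAGES)
```

The point of the sketch is the ordering: the stage applied last when storing is the stage undone first when loading, which is why load iterates over reversed(stages).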

Although the technical idea of the present invention has been described in detail with reference to the above embodiments, it should be noted that the embodiments are for explanation and not for limitation. Those skilled in the art will also understand that various embodiments are possible within the scope of the technical idea of the present invention.

100: artificial intelligence accelerator
110: compression module
112: compression performing module
1122: first compression performing module
1124: second compression performing module
114: decompression performing module
1142: first decompression performing module
1144: second decompression performing module
200: memory
300: neural network input terminal

Claims

An artificial intelligence accelerator comprising:
a compression module that compresses an inference intermediate value output from a processing module, which operates on initial input data from a neural network input terminal, and stores it in a memory, and that obtains the compressed inference intermediate value stored in the memory as input data, decompresses it, and inputs it to the processing module; and
a processing module that obtains the initial input data stored in the memory to perform an inference operation, outputs an inference intermediate value, which is an intermediate value of the inference operation, and stores it in the memory through the compression module, and that performs the inference operation by obtaining the stored inference intermediate value as input data through the compression module,
wherein the compression module comprises:
a compression performing module that sequentially performs primary and secondary compression on the inference intermediate value output from the processing module using predetermined first and second compression methods, and transfers the result to the memory for storage; and
a decompression performing module that reads the compressed inference intermediate value stored after the primary and secondary compression by the compression performing module, obtains it as input data, performs primary and secondary decompression using first and second decompression methods corresponding respectively to the first and second compression methods, and inputs the result to the processing module,
wherein the primary compression applies, as the first compression method, a run-length compression technique that replaces repeated characters with a single character, in order to increase the compression effect on inference intermediate values containing many repeated data values, and
wherein the secondary compression applies, as the second compression method, an adaptive Huffman coding technique that performs compression while updating the tree structure in real time using the input data.

(Deleted)

The artificial intelligence accelerator according to claim 1,
wherein the compression performing module comprises:
a first compression performing module that performs primary compression on the inference intermediate value output from the processing module using the predetermined first compression method; and
a second compression performing module that generates a compressed inference intermediate value by performing secondary compression, using the predetermined second compression method, on the inference intermediate value primarily compressed by the first compression performing module.

The artificial intelligence accelerator according to claim 3,
wherein the decompression performing module comprises:
a first decompression performing module that obtains the compressed inference intermediate value stored in the memory as input data and performs primary decompression using the second decompression method corresponding to the second compression method; and
a second decompression performing module that performs secondary decompression, using the first decompression method corresponding to the first compression method, on the input data primarily decompressed by the first decompression performing module, and inputs the result to the processing module.

(Deleted)

The artificial intelligence accelerator according to claim 4,
wherein the first decompression method performs the compression process of the adaptive Huffman coding technique in reverse order.

The artificial intelligence accelerator according to claim 4,
wherein the second decompression method performs the compression process of the run-length coding technique in reverse order.

A method of transferring data between an artificial intelligence accelerator and a memory, the method comprising:
an initial input data acquisition step in which a processing module of the accelerator reads and obtains initial input data from a neural network input terminal stored in advance in the memory;
an inference intermediate value output step in which the processing module of the accelerator computes using the initial input data obtained in the initial input data acquisition step and outputs an inference intermediate value, which is an intermediate value of the inference result;
an inference intermediate value compression step in which a compression module of the accelerator performs primary compression, using a predetermined first compression method, on the inference intermediate value output from the processing module in the inference intermediate value output step, and performs secondary compression on the primarily compressed value using a predetermined second compression method different from the first compression method;
an input data acquisition step in which the compression module of the accelerator reads the compressed inference intermediate value stored in the memory by the compressed inference intermediate value storage step and obtains it as input data;
an input data decompression step in which the compression module of the accelerator performs primary and secondary decompression on the compressed inference intermediate value obtained as input data in the input data acquisition step; and
an input data transfer step in which the compression module of the accelerator transfers the input data decompressed in the input data decompression step to the processing module,
wherein the primary compression applies, as the first compression method, a run-length compression technique that replaces repeated characters with a single character, in order to increase the compression effect on inference intermediate values containing many repeated data values, and
wherein the secondary compression applies, as the second compression method, an adaptive Huffman coding technique that performs compression while updating the tree structure in real time using the input data.

(Deleted)

The method of transferring data between an artificial intelligence accelerator and a memory according to claim 9,
wherein the input data decompression step comprises:
a first decompression performing step of performing primary decompression, using the second decompression method corresponding to the predetermined second compression method, on the compressed inference intermediate value obtained as input data in the input data acquisition step; and
a second decompression performing step of performing secondary decompression, using the first decompression method corresponding to the predetermined first compression method, on the compressed inference intermediate value primarily decompressed with the second decompression method in the first decompression performing step.

(Deleted)

The method of transferring data between an artificial intelligence accelerator and a memory according to claim 12,
wherein the first decompression method performs the compression process of the adaptive Huffman coding technique in reverse order.

The method of transferring data between an artificial intelligence accelerator and a memory according to claim 12,
wherein the second decompression method performs the compression process of the run-length coding technique in reverse order.