KR20220059396A

KR20220059396A - Neural network accelerator and operating method thereof

Info

Publication number: KR20220059396A
Application number: KR1020210094528A
Authority: KR
Inventors: 양정민
Original assignee: 한국전자통신연구원
Priority date: 2020-11-02
Filing date: 2021-07-20
Publication date: 2022-05-10

Abstract

Disclosed are a neural network accelerator and an operating method thereof. The neural network accelerator comprises: a command analyzer for analyzing a first command, received from an external device, that instructs an operation for a first layer of a neural network algorithm; a polymorphic operator array including a plurality of operators for performing operations for the first layer under the control of the command analyzer; an interface communicating with the external device and an external memory under the control of the command analyzer; an internal memory; a type conversion data mover that includes a type converter and stores data, received from the external memory through the interface, in the internal memory under the control of the command analyzer; and an internal type converter for converting data stored in the internal memory, or data generated by the polymorphic operator array, under the control of the command analyzer. Therefore, the computational efficiency and power efficiency of the neural network accelerator can be increased.

Description

Neural network accelerator and method of operation thereof

The present disclosure relates to a multi-precision neural network accelerator and a method of operating the same.

Artificial intelligence (AI) semiconductor design technology that mimics the human brain has a history of several decades, but it stagnated because of the limited computational capacity of existing silicon-based semiconductors. The neural network, which models the signal transmission of neurons through a process of learning weights for input values, was proposed long ago but received little attention because of the limitations of semiconductor technology. Recently, however, with the continued scaling and advancement of semiconductor processes, AI semiconductor design technology and neural network models are attracting attention again.

An AI semiconductor uses a large amount of input information to implement reasoning, inference, behavior, and actions optimized for a specific service. With the introduction of the concepts of the multi-layer perceptron (MLP) and neural network circuits, the application fields of AI technology are becoming more diverse. Technological innovation driven by AI is expected to reach almost every part of the ICT materials and components field, including devices for intelligent personal services (mobile devices, intelligent smartphones, etc.), autonomous vehicles (self-driving cars, autonomous drones, goods transport), server-type deep learning accelerators, intelligent healthcare (AI doctors, telemedicine, wearable healthcare devices), military equipment (unmanned aerial vehicles, detection robots), and social services (financial prediction services, crime surveillance).

To improve performance, AI computers use distributed computing techniques based on large numbers of CPUs and GPUs. However, the amount of computation required for AI computing is growing beyond what CPU- and GPU-based architectures can process. AI that applies deep learning requires more than 1000 times the performance of current mobile processors. Products manufactured to deliver this performance consume several kilowatts (kW) of power or more, which makes commercialization difficult. Furthermore, semiconductor devices currently face physical limits on process scaling.

In response to the increased amount of computation, massively parallel GPU structures that build on the graphics processing unit (GPU) and accelerator structures dedicated to neural network computation have been proposed. Such structures may be designed to accelerate the multidimensional matrix multiplication and convolution operations that account for a large share of neural network computation.

An object of the present disclosure is to provide a neural network accelerator that supports multi-precision and provides efficient matrix convolution operations, and a method of operating the same.

A neural network accelerator according to an embodiment of the present disclosure may include: a command analyzer that analyzes a first command, received from an external device, instructing an operation for a first layer of a neural network algorithm; a polymorphic operator array including a plurality of operators that perform the operation for the first layer under the control of the command analyzer; an interface that communicates with the external device and an external memory under the control of the command analyzer; an internal memory; a type conversion data mover that stores data received from the external memory through the interface in the internal memory under the control of the command analyzer; and an internal type converter that converts data stored in the internal memory or data generated by the polymorphic operator array under the control of the command analyzer.

According to an embodiment of the present disclosure, a method of operating a neural network accelerator is provided, the neural network accelerator including a polymorphic operator array that performs operations for processing a neural network algorithm, an internal memory, a type conversion data mover that transfers data stored in an external memory to the internal memory, and an internal type converter that performs type conversion of data stored in the internal memory or data generated by the polymorphic operator array. The method may include: analyzing a first command, received from an external device, instructing an operation for a first layer of the neural network algorithm; performing type conversion of result data of a second layer by either the type conversion data mover or the internal type converter; performing the operation for the first layer based on the result data of the second layer; and outputting result data of the first layer including a result of the operation for the first layer.

A neural network accelerator according to an embodiment of the present disclosure may support a different precision for each layer of a neural network. Accordingly, the computational efficiency and power efficiency of the neural network accelerator may be improved. Furthermore, the neural network accelerator according to an embodiment of the present disclosure may perform type conversion of data for each layer, and the point at which the type conversion is performed may be adjusted flexibly. Accordingly, the capacity of the internal memory of the neural network accelerator may be saved, and the bandwidth of the interface through which the neural network accelerator communicates with external devices may be saved.

FIG. 1 illustrates a block diagram of a computing device according to an embodiment of the present disclosure.
FIG. 2 illustrates the polymorphic operator array of FIG. 1 in more detail according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram for explaining the operation of the polymorphic operator array of FIG. 1 according to an embodiment of the present disclosure.
FIG. 4 illustrates the type conversion data mover of FIG. 1 in more detail according to an embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating a method of operating the neural network device of FIG. 1 according to an embodiment of the present disclosure.
FIG. 6 illustrates a method of operating the neural network accelerator of FIG. 1 in more detail according to an embodiment of the present disclosure.
FIG. 7 illustrates a method of operating the neural network accelerator of FIG. 1 in more detail according to another embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described clearly and in detail so that those of ordinary skill in the art can easily practice the present disclosure.

Hereinafter, some embodiments of the present disclosure will be described in more detail with reference to the accompanying drawings. To facilitate overall understanding, like reference numerals are used for like components in the drawings, and duplicate descriptions of like components are omitted.

FIG. 1 illustrates a block diagram of a computing device 10 according to an embodiment of the present disclosure. Referring to FIG. 1, the computing device 10 may include a controller 11, an external memory 12, and a neural network accelerator 100. In some embodiments, the computing device 10 may be a supercomputing device used in various fields such as scientific and technical computation. However, the scope of the present disclosure is not limited thereto, and the computing device 10 may include various types of computing devices configured to support various computing functions. For example, the computing device 10 may include various computing devices or information processing devices such as personal computers, notebook computers, tablets, smartphones, servers, workstations, black boxes, and automotive electronic systems.

The controller 11 may control various operations performed in the computing device 10. For example, the controller 11 may determine the operation data or the neural network algorithm to be used in the neural network accelerator 100. As another example, the controller 11 may store data in the external memory 12 or read data stored in the external memory 12.

The external memory 12 may store data under the control of the controller 11 or the neural network accelerator 100. For example, the external memory 12 may store data to be input to the neural network accelerator 100, or data of various parameters generated or updated during a learning process of the neural network accelerator 100. In some embodiments, the external memory 12 may include dynamic random access memory (DRAM).

Under the control of the controller 11, the neural network accelerator 100 may train a neural network model or algorithm, or may perform inference based on a trained neural network model. The neural network accelerator 100 may perform the various operations needed for training a neural network model or algorithm, or for performing inference based on a trained neural network model. In some embodiments, the neural network accelerator 100 may perform a convolution operation based on operation data and store the result of the operation, as result data, in the internal memory 140 of the neural network accelerator 100 or in the external memory 12.

The neural network accelerator 100 may include a command analyzer 110, an interface 120, a type conversion data mover 130, an internal memory 140, an internal type converter 150, and a polymorphic operator array 160.

The command analyzer 110 may manage the operation of the neural network accelerator 100. For example, the command analyzer 110 may receive, through the interface 120, a command instructing an operation of the neural network accelerator 100 from an external device (for example, the controller 11 or the external memory 12). The command analyzer 110 may analyze the received command and, according to the analyzed command, control the operations of the interface 120, the type conversion data mover 130, the internal memory 140, the internal type converter 150, the polymorphic operator array 160, and so on. In some embodiments, the command analyzer 110 may be referred to as a controller or processor of the neural network accelerator 100.

For example, the command analyzer 110 may receive a command that defines (or indicates) how a given layer of the neural network is to be computed, such as performing a matrix operation, adjusting the precision of data, or outputting result data. The command analyzer 110 may analyze the received command and, in response to the analyzed command, control the other components of the neural network accelerator 100.

The interface 120 may communicate with devices external to the neural network accelerator 100. For example, the interface 120 may receive, from an external device, a command or operation data to be processed by the neural network accelerator 100. The interface 120 may output result data generated by the neural network accelerator 100 to an external device. For example, the interface 120 may store the result data in the external memory 12. In some embodiments, the interface 120 may be implemented as an Advanced eXtensible Interface (AXI), Peripheral Component Interconnect Express (PCIe), or the like.

The type conversion data mover 130 may receive operation data for an operation (for example, a matrix operation) through the interface 120. The type conversion data mover 130 may store data of an external device, received through the interface 120, in the internal memory 140. For example, the type conversion data mover 130 may perform type conversion (or precision conversion) of the data and store the type-converted data in the internal memory 140. In some embodiments, the type conversion data mover 130 may include a direct memory access (DMA) unit that includes a type converter. The type conversion data mover 130 will be described in detail later.

The internal memory 140 may store data input from an external device through the interface 120, commands to be processed by the command analyzer 110, result data generated by the polymorphic operator array 160, and so on. In some embodiments, the internal memory 140 may be implemented as DRAM, static random access memory (SRAM), or the like.

The internal type converter 150 may perform type conversion of data to be input to the polymorphic operator array 160. For example, the internal type converter 150 may convert the precision of operation data stored in the internal memory 140 or of result data of a previous layer stored in the internal memory 140. The internal type converter 150 may transfer the type-converted data to the polymorphic operator array 160.

The polymorphic operator array 160 may perform various operations on operation data. For example, the polymorphic operator array 160 may perform parallel operations or matrix operations on multiple pieces of operation data. The polymorphic operator array 160 may include one or more arrays composed of a plurality of polymorphic operators for accelerating matrix operations. The polymorphic operator array 160 may store the result data generated by performing the operations in the internal memory 140, or transfer it to the interface 120 for storage in the external memory 12. The operation of the polymorphic operator array 160 will be described in detail later.

The operation data to be processed by the polymorphic operator array 160 may include input data and kernel data. The input data may include per-frame image data or per-frame speech matrix data. For a neural network whose training has been completed, the kernel data may be specific matrix data with fixed values. The polymorphic operator array 160 may process data such as real-time video and audio by performing real-time neural network operations on multiple pieces of input data.

Hereinafter, the following premises are assumed for the neural network algorithm accelerated (or processed) by the neural network accelerator 100: 1) the neural network algorithm may be composed of multiple layers, and for each layer the size of the matrix to be computed may be predetermined; 2) for each layer of the neural network algorithm, the values of the kernel data may be determined in advance through training of the neural network algorithm; and 3) for each layer of the neural network algorithm, the input data may have a fixed form, such as an image or speech depending on the application, and a predetermined range of values.

A neural network algorithm accelerated by the neural network accelerator 100 may include multiple layers. For one layer, the neural network accelerator 100 may receive a command instructing its operation from an external device (for example, the controller 11), analyze the received command, and perform a matrix operation (for example, a convolution operation) on operation data. The neural network accelerator 100 may store the operation result in the internal memory 140. The neural network accelerator 100 may output the operation result stored in the internal memory 140 to an external device (for example, the external memory 12 or the controller 11), or may reuse it in the operation for the next layer. Then, for the next layer, the neural network accelerator 100 may again receive a command, analyze the received command to perform an operation, and store or output the result of that operation.
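The per-layer sequence described above (receive a command, analyze it, compute, store the result, then output it or reuse it for the next layer) can be sketched in software as follows. This is only an illustrative model, not the disclosed hardware: the `LayerCommand` fields, the `run_layers` function, and the use of dictionaries to stand in for the internal memory 140 and the external memory 12 are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LayerCommand:
    """Hypothetical per-layer command; the field names are illustrative only."""
    layer: int
    precision: str      # e.g. "INT8" or "FP16"
    write_back: bool    # True: also output the result to external memory


def run_layers(
    commands: List[LayerCommand],
    first_input: List[float],
    compute: Callable[[LayerCommand, List[float]], List[float]],
) -> Tuple[Dict[int, List[float]], Dict[int, List[float]]]:
    """Process one command per layer, reusing each result as the next input."""
    internal_memory: Dict[int, List[float]] = {}   # stands in for memory 140
    external_memory: Dict[int, List[float]] = {}   # stands in for memory 12
    data = first_input
    for cmd in commands:                     # a new command arrives per layer
        result = compute(cmd, data)          # e.g. a matrix or convolution op
        internal_memory[cmd.layer] = result  # result is stored internally
        if cmd.write_back:                   # optionally output it externally
            external_memory[cmd.layer] = result
        data = result                        # reuse for the next layer
    return internal_memory, external_memory
```

For example, calling `run_layers` with two commands and a `compute` callback that doubles every element returns the doubled and quadrupled vectors under keys 0 and 1 of the first dictionary.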

In some embodiments, the precision of the data used (or required) in one layer may differ from the precision of the data used (or required) in a subsequent layer. The neural network accelerator 100 may support a different precision for each layer. For example, for a given layer, the neural network accelerator 100 may independently perform data type conversion on the result data of the previous layer and then perform the operation on the type-converted data. In such embodiments, the neural network accelerator 100 may support operations on various data types. The neural network accelerator 100 may be understood as a neural network accelerator that supports multi-precision.

The layers of the neural network algorithm processed by the neural network accelerator 100 need not all use the same precision. Consequently, it may also be unnecessary to perform every operation at the highest precision, as would be required if a single precision were used for all layers. Accordingly, the computational efficiency and power efficiency of the neural network accelerator 100 may be improved. As a result, the black silicon phenomenon or a loss of temperature control may be prevented from occurring in the polymorphic operator array 160.

In some embodiments, the point at which the neural network accelerator 100 performs type conversion may be varied flexibly. For example, the neural network accelerator 100 may perform type conversion of data in the type conversion data mover 130 between the interface 120 and the internal memory 140, or in the internal type converter 150 between the internal memory 140 and the polymorphic operator array 160. By choosing the conversion point appropriately, the neural network accelerator 100 can store a relatively small amount of data in the internal memory 140. Accordingly, the capacity of the internal memory 140 may be saved.
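To make the memory saving concrete, the following sketch compares how many bits would be held in the internal memory depending on where the conversion happens. The bit widths follow the type names mentioned in this description; the function names and the selection rule (convert wherever the narrower representation ends up in internal memory) are assumptions made for illustration, not a policy stated in the disclosure.

```python
# Bit widths implied by the data type names used in this description.
BITS = {"FP16": 16, "HFP8": 8, "INT8": 8, "INT4": 4, "INT2": 2}


def internal_bits(num_values: int, source_type: str, target_type: str,
                  convert_at: str) -> int:
    """Bits occupied in internal memory for a given conversion point."""
    if convert_at == "data_mover":
        stored = target_type      # converted before reaching internal memory
    elif convert_at == "internal_converter":
        stored = source_type      # stored as received, converted on the way out
    else:
        raise ValueError(f"unknown conversion point: {convert_at}")
    return num_values * BITS[stored]


def pick_conversion_point(source_type: str, target_type: str) -> str:
    """Hypothetical rule: keep the narrower type in internal memory."""
    if BITS[target_type] < BITS[source_type]:
        return "data_mover"
    return "internal_converter"
```

Under these assumptions, 1024 FP16 values destined for an INT8 layer occupy 8192 bits when converted in the data mover but 16384 bits when conversion is deferred to the internal type converter; for data that must be widened (for example INT4 to FP16), deferring the conversion is the smaller option.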

In some embodiments, the neural network accelerator 100 may additionally perform type conversion on the result data generated by the polymorphic operator array 160. The type-converted data may be stored in the internal memory 140 or the external memory 12. Accordingly, the capacity of the internal memory 140 or the bandwidth of the interface 120 may be saved.

FIG. 2 illustrates the polymorphic operator array 160 of FIG. 1 in more detail according to an embodiment of the present disclosure. FIG. 3 is a schematic diagram for explaining the operation of the polymorphic operator array 160 of FIG. 1 according to an embodiment of the present disclosure. The operation of the polymorphic operator array 160 will be described in detail with reference to FIGS. 1 to 3.

The polymorphic operator array 160 may include a plurality of operators 161. For example, the polymorphic operator array 160 may include an N*N polymorphic operator array having N rows and N columns (N is a natural number). Each operator 161 may include one or more sub-operators. For example, the operator 161 may include a 10-bit adder 161a, a 5-bit adder 161b, a 10-bit multiplier 161c, and a 4-bit multiplier 161d. In the embodiment shown in FIGS. 2 and 3, the polymorphic operator array 160 including the operators 161 may support five kinds of operations or five data types. However, the number of data types supported by the polymorphic operator array 160 of the present disclosure is not limited to the illustrated embodiment.

Referring to FIG. 3, the operations of the sub-operators of the operator 161 may be determined. In the embodiment of FIG. 3, the polymorphic operator array 160 of the neural network accelerator 100 may support the following data types: 16-bit floating point (FP16), 8-bit floating point (HFP8), 2-bit fixed point (INT2), 4-bit fixed point (INT4), and 8-bit fixed point (INT8). For example, on the computing device 10 including the neural network accelerator 100, application programs that process these five types of data may run, depending on the layers of the neural network algorithm processed by the neural network accelerator 100.

Each data type may be divided into a sign part, an exponent part, and a mantissa part. The sum of the bit counts of the three parts is the total number of bits of the data. In some embodiments, fixed-point data may include a sign part but no exponent part. The 8-bit floating point may be a newly defined data type that does not follow IEEE 754 (the Institute of Electrical and Electronics Engineers Standard for Floating-Point Arithmetic). Considering that neural network operations do not require high precision in the mantissa part, 8-bit floating-point data may be 16-bit floating-point data from which the mantissa part has been partially or entirely removed. Newly devised data types such as the 8-bit floating point are mostly created by shrinking the mantissa part and enlarging the exponent part relative to the data types of the existing IEEE 754 standard, which widens the representable range of the data while reducing its number of significant digits.
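As an illustration of the sign, exponent, and mantissa split, the sketch below unpacks an IEEE 754 half-precision (FP16) value, whose layout is 1 sign bit, 5 exponent bits, and 10 mantissa bits. It then shows one possible way of removing part of the mantissa to reach 8 bits, by keeping only the upper byte (1 sign, 5 exponent, 2 mantissa bits). That 1-5-2 layout is an assumption chosen for this example; the description does not state the exact bit allocation of its 8-bit floating-point type.

```python
import struct


def fp16_fields(x: float) -> tuple:
    """Split an FP16 value into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF


def truncate_to_8bit(x: float) -> int:
    """Keep the top 8 bits of FP16 (assumed 1-5-2 layout, no rounding)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 8


def expand_from_8bit(b: int) -> float:
    """Restore an FP16 value by zero-filling the dropped mantissa bits."""
    (value,) = struct.unpack("<e", struct.pack("<H", (b & 0xFF) << 8))
    return value
```

With this truncation, a value such as -2.5 survives unchanged because its mantissa fits in two bits, while 3.14159 comes back as 3.0: the range is preserved but significant digits are lost, which matches the trade-off described above.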

The 'addition' and 'multiplication' columns of FIG. 3 may indicate the number of bits actually operated on when an addition operation and a multiplication operation, respectively, are performed on each data type in each operator 161 of the polymorphic operator array 160. In some embodiments, for a fixed-point data type, the number of bits actually operated on during addition and multiplication may equal the total number of bits of the data. For a floating-point data type, on the other hand, only the mantissa parts are added when an addition operation is performed, and the exponent parts are added and the mantissa parts are multiplied when a multiplication operation is performed. The total number of bits of the data type may therefore differ from the number of bits actually operated on.

Given the number of bits actually operated on, all addition operations among the five data types may be performed using the 10-bit adder 161a and the 5-bit adder 161b. The 5-bit adder 161b may also be used for the addition of exponent parts that takes place during a multiplication operation. Likewise, all multiplications between the mantissa parts of the five data types may be performed using the 10-bit multiplier 161c and the 4-bit multiplier 161d. Consequently, when the operator 161 includes the 10-bit adder 161a, the 5-bit adder 161b, the 10-bit multiplier 161c, and the 4-bit multiplier 161d to support the five data types of FIG. 3, the area and power efficiency of the polymorphic operator array 160 may be optimized.

To support a different precision for each layer, the neural network accelerator 100 may convert the type of operation data or result data. For example, the data type conversion may be performed by the type conversion data mover 130 or the internal type converter 150. When the data type conversion is performed, and by which component, will be described in detail below.

Data type conversion may be possible between all data types supported by the neural network accelerator 100. For example, in the embodiment shown in FIGS. 2 and 3, a total of 20 cases of data type conversion may occur (₅P₂ = 5 × 4 = 20). In some embodiments, the data type conversion performed by the neural network accelerator 100 may differ from an ordinary type conversion. For example, because a neural network algorithm does not derive an exact result value but instead selects the candidate with the highest probability, the data type conversion performed by the neural network accelerator 100 may not aim to preserve the significant digits themselves in the algebraic sense.
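The count of 20 conversion cases can be checked by enumerating every ordered pair of distinct data types. The following is a minimal sketch; the list name `DATA_TYPES` is introduced here for illustration only.

```python
from itertools import permutations

# The five data types named in the embodiment of FIG. 3.
DATA_TYPES = ["FP16", "HFP8", "INT2", "INT4", "INT8"]

# Each ordered (source, target) pair of distinct types is one conversion case.
conversions = list(permutations(DATA_TYPES, 2))

print(len(conversions))  # 20
```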

In a type conversion between fixed-point data types in which the total number of bits increases, the most significant bit (MSB) may be extended. For example, when converting from INT2 to INT4, from INT2 to INT8, or from INT4 to INT8, the upper bit(s) of the data may be extended. Accordingly, the existing data value may remain unchanged.
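Extending the upper bits of a fixed-point value can be illustrated as sign extension. The sketch below assumes the fixed-point types are stored as two's-complement bit patterns, which this section does not state; the function names `sign_extend` and `to_signed` are hypothetical.

```python
def sign_extend(raw: int, from_bits: int, to_bits: int) -> int:
    """Widen a two's-complement bit pattern by replicating its sign bit.

    Assumption: the INT types use two's-complement encoding.
    """
    if to_bits < from_bits:
        raise ValueError("sign_extend only widens")
    raw &= (1 << from_bits) - 1
    sign = (raw >> (from_bits - 1)) & 1
    if sign:
        # Fill the newly added upper bits with ones.
        raw |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return raw


def to_signed(raw: int, bits: int) -> int:
    """Interpret a bit pattern of the given width as a signed integer."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


# INT2 pattern 0b10 is -2; after widening to INT8 it is still -2.
widened = sign_extend(0b10, 2, 8)
print(bin(widened), to_signed(widened, 8))  # 0b11111110 -2
```

The value is unchanged after widening, which matches the statement that the existing data value may remain the same.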

In a type conversion between fixed-point data types in which the total number of bits decreases, a type conversion that moves the reference point may be performed. For example, when converting from INT8 to INT4, from INT4 to INT2, or from INT8 to INT2, an ordinary type conversion may erase the relative differences between data (for example, all of the data may be converted to infinity, or all to 0). Therefore, the reference point of the data may be moved when the type conversion is performed.
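The text does not specify how the reference point is moved. One possible reading, shown here purely as an assumption, is to keep the most significant bits of each value with an arithmetic right shift, and to compare that against an ordinary saturating cast. Both function names are hypothetical.

```python
def saturating_narrow(value: int, to_bits: int) -> int:
    """Ordinary narrowing: clamp the value into the target range."""
    lo, hi = -(1 << (to_bits - 1)), (1 << (to_bits - 1)) - 1
    return max(lo, min(hi, value))


def shifted_narrow(value: int, from_bits: int, to_bits: int) -> int:
    """Assumed reference-point shift: keep only the upper to_bits bits.

    Python's >> on int is an arithmetic shift, so the sign is preserved.
    """
    return value >> (from_bits - to_bits)


int8_values = [100, 50, -60, -120]

# Clamping collapses distinct inputs onto the range limits.
print([saturating_narrow(v, 4) for v in int8_values])  # [7, 7, -8, -8]

# Shifting keeps the inputs distinguishable and in the same order.
print([shifted_narrow(v, 8, 4) for v in int8_values])  # [6, 3, -4, -8]
```

Under this assumed scheme the relative differences between the values survive the conversion, although the absolute values do not.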

In a type conversion between floating-point data types in which the total number of bits increases, the upper bits of the mantissa part of the data may be extended. For example, when converting from HFP8 to FP16, the upper bits of the mantissa part of the data may be extended.

In a type conversion between floating-point data types in which the total number of bits decreases, the lower bits of the mantissa part of the data may be truncated. For example, when converting from FP16 to HFP8, the lower bits of the mantissa part of the data may be discarded.
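Mantissa truncation can be sketched as follows. The bit allocation of HFP8 is not given in this section, so the sketch assumes a hypothetical layout of 1 sign bit, 5 exponent bits, and 2 mantissa bits, that is, FP16 with its 8 low mantissa bits removed. The actual HFP8 format of the embodiment may differ, and all function names are illustrative.

```python
import struct


def fp16_bits(value: float) -> int:
    """Return the 16-bit IEEE 754 half-precision pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_to_hfp8(bits16: int) -> int:
    """Discard the 8 low mantissa bits (assumed 1-5-2 HFP8 layout)."""
    return (bits16 >> 8) & 0xFF


def hfp8_to_float(bits8: int) -> float:
    """Decode the assumed 1-5-2 pattern by placing it in the top byte of an FP16 word."""
    return struct.unpack("<e", struct.pack("<H", (bits8 & 0xFF) << 8))[0]


for x in (1.5, 1.2, -3.0):
    narrowed = fp16_to_hfp8(fp16_bits(x))
    print(x, "->", hfp8_to_float(narrowed))
# 1.5 -> 1.5
# 1.2 -> 1.0
# -3.0 -> -3.0
```

Values whose mantissa fits in the retained bits survive unchanged, while 1.2 loses its low mantissa bits and becomes 1.0. The sign and exponent are untouched.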

When a type conversion is performed between a fixed-point data type and a floating-point data type, the conversion method may be determined by the representable range of the data. For example, in a type conversion that widens the representable range (for example, from an INT type to HFP or from an INT type to FP), the existing data may be preserved. As another example, in a type conversion that narrows the representable range (for example, from an FP type to an INT type, or from an HFP type to an INT type), the type conversion may be performed with the aim of preserving the relative differences between data.

FIG. 4 illustrates the type conversion data mover 130 of FIG. 1 in more detail, according to an embodiment of the present disclosure. Referring to FIGS. 1 to 4, the type conversion data mover 130 may include a type converter 131. The type conversion data mover 130 may receive data stored in the external memory 12 through the interface 120, selectively perform type conversion on the received data, and transfer the received data or the type-converted data to the internal memory 140. For convenience of illustration, the interface 120 is omitted from the drawing.

The type converter 131 may include sub-operators for performing type conversion between data types. For example, the type converter 131 may include a round-up unit 131a and a rounder 131b (round to nearest). Unlike what is illustrated, the type converter 131 may further include other sub-operators (for example, a truncator) for performing the conversions between data types described with reference to FIGS. 2 and 3. Through these sub-operators, and under the control of the command analyzer 110, the type converter 131 may perform type conversion on data received from the external memory 12. In some embodiments, the internal type converter 150 may be implemented similarly to the type converter 131 of the type conversion data mover 130 and may operate similarly.

The type conversion data mover 130 may store the data type-converted by the type converter 131 in the internal memory 140. Thereafter, the type-converted data stored in the internal memory 140 may be provided to the polymorphic operator array 160.

FIG. 5 is a flowchart illustrating a method of operating the neural network accelerator 100 of FIG. 1, according to an embodiment of the present disclosure. Referring to FIGS. 1 and 5, the neural network accelerator 100 may perform steps S101 to S105. Hereinafter, it is assumed that the data type of each layer of the neural network algorithm computed by the neural network accelerator 100 is predetermined by the controller 11, the command analyzer 110, or the like.

In step S101, the neural network accelerator 100 may analyze a command for performing an operation. For example, the interface 120 of the neural network accelerator 100 may receive, from an external device (for example, the controller 11), an operation command instructing the operation to be performed by the polymorphic operator array 160 for a first layer. The command analyzer 110 may analyze the received operation command.

In step S102, the neural network accelerator 100 may determine the location of the result data of the previous layer. For example, the neural network accelerator 100 may further receive an input command instructing input of the result data of a second layer. The neural network accelerator 100 may analyze the input command and, in response to the analyzed input command, either receive from the external memory 12 the result data of the second layer, on which the operation was performed one layer before the first layer, or load the result data of the second layer from the internal memory 140.

In step S103, the neural network accelerator 100 may perform type conversion on the result data of the previous layer, based on the location determined in step S102. For example, in response to the data type of the first layer differing from the data type of the second layer, type conversion may be performed on the result data of the second layer. The type conversion of the result data of the second layer may be performed by either the type conversion data mover 130 or the internal type converter 150. Which of the two performs the type conversion may be determined by the result of step S102. The type conversion of the result data of the second layer will be described in detail below with reference to FIG. 6.

In step S104, the neural network accelerator 100 may perform an operation based on the result data of the previous layer. For example, the polymorphic operator array 160 of the neural network accelerator 100 may perform the operation corresponding to the operation command analyzed in step S101 on the result data of the second layer retrieved in step S102 or on the result data of the second layer type-converted in step S103.

In step S105, the neural network accelerator 100 may output result data. For example, in response to the command analyzed in step S101, the neural network accelerator 100 may store the result of the operation performed in step S104 in the internal memory 140 or the external memory 12 as the result data of the first layer. As another example, the neural network accelerator 100 may receive an output command instructing output of data, analyze the output command through the command analyzer 110, and, in response to the analyzed output command, store the result of the operation performed in step S104 in the internal memory 140 or the external memory 12 as the result data of the first layer.

In some embodiments, the neural network accelerator 100 may perform type conversion on the result data in response to the data type of a third layer, on which an operation is to be performed one layer after the first layer, and may store the type-converted result data in the internal memory 140 or the external memory 12. The type conversion of the result data of the first layer will be described in detail below with reference to FIG. 7.

FIG. 6 illustrates the method of operating the neural network accelerator 100 of FIG. 1 in more detail, according to an embodiment of the present disclosure. Referring to FIGS. 1, 5, and 6, in response to the data type of the first layer differing from the data type of the second layer, the neural network accelerator 100 may further perform steps S201 to S205.

In step S201, the neural network accelerator 100 may determine the location of the result data of the previous layer. For example, the neural network accelerator 100 may perform step S201 in a manner similar to step S102. When the result data of the second layer is located in the external memory 12, the neural network accelerator 100 may perform steps S202 to S204. When the result data of the second layer is located in the internal memory 140, the neural network accelerator 100 may perform step S205.

In step S202, the neural network accelerator 100 may determine the kind of type conversion to be performed. For example, the command analyzer 110 of the neural network accelerator 100 may compare the data type of the first layer with the data type of the second layer to determine the kind of type conversion to be performed on the result data of the second layer.

As another example, the neural network accelerator 100 may receive, from an external device, a command indicating the kind of type conversion to be performed by the neural network accelerator 100 on the result data of the second layer. In some embodiments, the command indicating the type conversion may be included in the command received in step S101. Based on the command indicating the type conversion, the neural network accelerator 100 may determine the kind of type conversion.

In response to determining that a type conversion that increases the number of bits of the result data of the second layer is to be performed, the neural network accelerator 100 may perform step S203. For example, the neural network accelerator 100 may perform step S203 in response to the total number of bits of the data type of the first layer being greater than that of the data type of the second layer, or in response to the representable range of the data type of the first layer being wider than that of the data type of the second layer. As another example, the neural network accelerator 100 may perform step S203 in response to determining that a type conversion that increases the total number of bits of the result data of the second layer, or one that widens its representable range, is to be performed.

In step S203, the type conversion of the result data of the second layer may be performed by the internal type converter 150 of the neural network accelerator 100. For example, the type conversion data mover 130 may store the result data of the second layer held in the external memory 12 in the internal memory 140 as it is. That is, the type conversion data mover 130 may not perform type conversion on the result data of the second layer. Thereafter, the internal type converter 150 may perform type conversion on the result data of the second layer stored in the internal memory 140. The result data of the second layer type-converted by the internal type converter 150 may be transferred to the polymorphic operator array 160.

As step S203 is performed, the size of the result data of the second layer stored in the internal memory 140 may be smaller than the size of the data type-converted by the internal type converter 150. Accordingly, relatively small data may be stored in the internal memory 140, and as a result the storage capacity of the internal memory 140 may be saved.

In response to determining that a type conversion that decreases the number of bits of the result data of the second layer is to be performed, the neural network accelerator 100 may perform step S204. For example, the neural network accelerator 100 may perform step S204 in response to the total number of bits of the data type of the first layer being smaller than that of the data type of the second layer, or in response to the representable range of the data type of the first layer being narrower than that of the data type of the second layer. As another example, the neural network accelerator 100 may perform step S204 in response to determining that a type conversion that decreases the total number of bits of the result data of the second layer, or one that narrows its representable range, is to be performed.

In step S204, the type conversion of the result data of the second layer may be performed by the type conversion data mover 130 of the neural network accelerator 100. For example, the type conversion data mover 130 may perform type conversion on the result data of the second layer stored in the external memory 12. The type conversion data mover 130 may store the type-converted result data of the second layer in the internal memory 140. The type-converted result data of the second layer stored in the internal memory 140 may be transferred to the polymorphic operator array 160.

As step S204 is performed, the size of the type-converted result data of the second layer stored in the internal memory 140 may be smaller than the size of the result data of the second layer stored in the external memory 12. Accordingly, relatively small data may be stored in the internal memory 140, and as a result the storage capacity of the internal memory 140 may be saved.

In step S205, the internal type converter 150 of the neural network accelerator 100 may perform type conversion on the result data of the second layer. For example, the internal type converter 150 may perform type conversion on the result data of the second layer stored in the internal memory 140, regardless of the kind of type conversion. Thereafter, the internal type converter 150 may transfer the type-converted data to the polymorphic operator array 160.

In the manner described above, the neural network accelerator 100 may perform data type conversion flexibly. For example, the point at which the neural network accelerator 100 performs data type conversion may be adjusted flexibly based on where the result data of the previous layer is stored and on the kind of type conversion to be performed. Accordingly, the neural network accelerator 100 may be operated freely for each layer, and the capacity of the internal memory 140 may be saved.
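The selection made in steps S201 to S205 can be summarized as a small decision function. This is a minimal sketch rather than the accelerator's control logic: the enum and function names are hypothetical, and the `widening` flag stands for the outcome of step S202 (a conversion that increases the bit count or representable range).

```python
from enum import Enum


class Location(Enum):
    EXTERNAL = "external memory 12"
    INTERNAL = "internal memory 140"


class Converter(Enum):
    DATA_MOVER = "type conversion data mover 130"
    INTERNAL = "internal type converter 150"


def select_converter(location: Location, widening: bool) -> Converter:
    """Pick the unit that converts the previous layer's result data.

    Assumes the two layers' data types already differ, so a
    conversion is known to be needed.
    """
    if location is Location.INTERNAL:
        # S205: data already on chip, converted regardless of kind.
        return Converter.INTERNAL
    if widening:
        # S203: move the narrow data in unchanged, widen it afterwards.
        return Converter.INTERNAL
    # S204: narrow the data while moving it in.
    return Converter.DATA_MOVER


print(select_converter(Location.EXTERNAL, widening=False).value)
# type conversion data mover 130
```

In both external-memory branches the narrower form of the data is the one written to the internal memory 140, which is where the capacity saving comes from.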

FIG. 7 illustrates the method of operating the neural network accelerator 100 of FIG. 1 in more detail, according to another embodiment of the present disclosure. Referring to FIGS. 1, 5, and 7, the neural network accelerator 100 may perform steps S301 to S303. The neural network accelerator 100 may selectively perform type conversion on the result data of the current layer and then store it in the internal memory 140 or the external memory 12.

In step S301, the number of bits of the data type of the next layer may be compared with the number of bits of the data type of the current layer. For example, the command analyzer 110 of the neural network accelerator 100 may determine the number of bits of the data type of the third layer and compare the total number of bits of the data type of the first layer with the total number of bits of the data type of the third layer.

In some embodiments, the command analyzer 110 of the neural network accelerator 100 may determine the number of bits of the data type of the third layer based on information provided in advance by the controller 11. Alternatively, the neural network accelerator 100 may further receive, from an external device, a command including information on the number of bits of the data type of the third layer. In some embodiments, this command may be included in the command received in step S101. Based on this command, the neural network accelerator 100 may determine the number of bits of the data type of the third layer.

In response to the number of bits of the data type of the next layer being smaller than that of the data type of the current layer, the neural network accelerator 100 may perform type conversion on the result data of the current layer in step S302. For example, in response to the number of bits of the data type of the third layer being smaller than that of the data type of the first layer, the internal type converter 150 or the type conversion data mover 130 of the neural network accelerator 100 may perform type conversion on the result data of the first layer. Thereafter, in step S303, the type-converted result data of the first layer may be stored in the internal memory 140 or the external memory 12, respectively. The type-converted result data of the first layer may be smaller than the original result data of the first layer. Accordingly, the type-converted result data of the first layer, rather than the original result data, may be stored in the internal memory 140 or stored in the external memory 12 through the interface 120. As a result, the capacity of the internal memory 140 or the data bandwidth of the interface 120 may be saved.

In response to the number of bits of the data type of the next layer being greater than or equal to that of the data type of the current layer, the result data of the current layer may be stored in step S303 without the neural network accelerator 100 performing type conversion on it. For example, in response to the number of bits of the data type of the third layer being greater than or equal to that of the data type of the first layer, the neural network accelerator 100 may store the result data of the first layer in the internal memory 140 or the external memory 12 as it is, without performing type conversion on it.
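The comparison in step S301 and its effect on the amount of data stored in step S303 can be sketched as follows. The bit widths come from the type names used in this description; the dictionary and function names are hypothetical, and the element count is an arbitrary illustration.

```python
# Total bit width of each supported data type, taken from the type names.
BITS = {"FP16": 16, "HFP8": 8, "INT2": 2, "INT4": 4, "INT8": 8}


def convert_before_store(current_type: str, next_type: str) -> bool:
    """S301: convert (S302) only when the next layer's type is narrower."""
    return BITS[next_type] < BITS[current_type]


def stored_bits(num_elements: int, current_type: str, next_type: str) -> int:
    """Bits written in S303 for num_elements results of the current layer."""
    if convert_before_store(current_type, next_type):
        return num_elements * BITS[next_type]
    return num_elements * BITS[current_type]


# FP16 results followed by an INT8 layer: converted first, half the size.
print(stored_bits(1000, "FP16", "INT8"))  # 8000

# INT4 results followed by an FP16 layer: stored as they are.
print(stored_bits(1000, "INT4", "FP16"))  # 4000
```

In both cases the narrower of the two representations is the one that is stored, which is the source of the memory capacity or interface bandwidth saving.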

In some embodiments, step S301 may be performed by another component of the computing device 10 (for example, the controller 11). In such embodiments, the neural network accelerator 100 may receive an output command instructing it to perform step S302, issued in response to the number of bits of the data type of the next layer being smaller than that of the data type of the current layer, or an output command instructing it to perform step S303, issued in response to the number of bits of the data type of the next layer being greater than or equal to that of the data type of the current layer. In response to the received output command, the neural network accelerator 100 may perform step S302 or step S303.

The foregoing describes specific embodiments for carrying out the present disclosure. The present disclosure covers not only the embodiments described above but also embodiments obtained from them by simple or straightforward design changes. The present disclosure also covers techniques that can be readily modified and practiced using the embodiments. Therefore, the scope of the present disclosure should not be limited to the embodiments described above, but should be defined by the claims below and their equivalents.

100: Neural network accelerator
130: Type conversion data mover
150: Internal type converter

Claims

A neural network accelerator comprising:
a command analyzer that analyzes a first command, received from an external device, instructing an operation for a first layer of a neural network algorithm;
a polymorphic operator array including a plurality of operators that perform the operation for the first layer under the control of the command analyzer;
an interface that communicates with the external device and an external memory under the control of the command analyzer;
an internal memory;
a type conversion data mover that stores data, received from the external memory through the interface, in the internal memory under the control of the command analyzer; and
an internal type converter that converts the type of data stored in the internal memory or data generated by the polymorphic operator array under the control of the command analyzer.

The neural network accelerator of claim 1,
wherein a first operator of the plurality of operators of the polymorphic operator array includes a 10-bit adder, a 5-bit adder, a 10-bit multiplier, and a 4-bit multiplier.

The neural network accelerator of claim 1,
wherein, in response to result data of a second layer, on which an operation is performed one layer earlier than the first layer, being stored in the external memory, the interface reads the result data of the second layer from the external memory and delivers it to the type conversion data mover, and
wherein the result data of the second layer includes a result of the operation on the second layer.

The neural network accelerator of claim 3,
wherein, in response to the total number of bits of the data type required by the first layer being greater than the total number of bits of the data type required by the second layer, the type conversion data mover:
performs type conversion of the result data of the second layer; and
stores the type-converted result data of the second layer in the internal memory.

The neural network accelerator of claim 3,
wherein, in response to the total number of bits of the data type required by the first layer being greater than the total number of bits of the data type required by the second layer, the type conversion data mover stores the result data of the second layer in the internal memory, and
wherein the internal type converter performs type conversion of the result data of the second layer stored in the internal memory, and transfers the type-converted result data of the second layer to the polymorphic operator array.

The neural network accelerator of claim 1,
wherein, in response to result data of a second layer, on which an operation is performed one layer earlier than the first layer, being stored in the internal memory, the internal type converter:
performs type conversion of the result data of the second layer stored in the internal memory; and
transfers the type-converted result data of the second layer to the polymorphic operator array, and
wherein the result data of the second layer includes a result of the operation on the second layer.

The neural network accelerator of claim 1,
wherein, in response to the total number of bits of the data type required by a third layer, on which an operation is to be performed one layer later than the first layer, being greater than the total number of bits of the data type required by the first layer, result data of the first layer is stored in the internal memory or the external memory, and
wherein the result data of the first layer includes a result of the operation on the first layer performed by the polymorphic operator array.

The neural network accelerator of claim 1,
wherein, in response to the total number of bits of the data type required by a third layer, on which an operation is to be performed one layer later than the first layer, being less than the total number of bits of the data type required by the first layer,
the internal type converter:
performs type conversion of result data of the first layer; and
stores the type-converted result data of the first layer in the internal memory, and
wherein the result data of the first layer includes a result of the operation on the first layer performed by the polymorphic operator array.

The neural network accelerator of claim 1,
wherein, in response to the total number of bits of the data type required by a third layer, on which an operation is to be performed one layer later than the first layer, being less than the total number of bits of the data type required by the first layer,
the type conversion data mover:
performs type conversion of result data of the first layer; and
transmits the type-converted result data of the first layer to the interface,
wherein the interface stores the type-converted result data of the first layer in the external memory, and
wherein the result data of the first layer includes a result of the operation on the first layer performed by the polymorphic operator array.

A method of operating a neural network accelerator including a polymorphic operator array that performs operations for processing a neural network algorithm, an internal memory, a type conversion data mover that transfers data stored in an external memory to the internal memory, and an internal type converter that performs type conversion of data stored in the internal memory or data generated by the polymorphic operator array, the method comprising:
analyzing a first command, received from an external device, instructing an operation for a first layer of the neural network algorithm;
performing type conversion of result data of a second layer, on which an operation is performed one layer earlier than the first layer, by either the type conversion data mover or the internal type converter;
performing the operation on the first layer based on the result data of the second layer; and
outputting result data of the first layer including a result of the operation on the first layer.

The method of claim 10,
wherein the result data of the second layer is stored in the external memory, and wherein, in response to the total number of bits of the data type required by the first layer being greater than the total number of bits of the data type required by the second layer, the type conversion of the result data of the second layer is performed by the type conversion data mover.

The method of claim 10,
wherein the result data of the second layer is stored in the external memory, and wherein, in response to the total number of bits of the data type required by the first layer being less than the total number of bits of the data type required by the second layer, the type conversion of the result data of the second layer is performed by the internal type converter.

The method of claim 10,
wherein, in response to the result data of the second layer being stored in the internal memory, the type conversion of the result data of the second layer is performed by the internal type converter.

The method of claim 10,
wherein the outputting of the result data of the first layer includes:
in response to the total number of bits of the data type required by a third layer, on which an operation is to be performed one layer later than the first layer, being greater than the total number of bits of the data type required by the first layer, storing the result data of the first layer in the internal memory or the external memory, and
wherein the result data of the first layer includes a result of the operation on the first layer performed by the polymorphic operator array.

The method of claim 10,
wherein the outputting of the result data of the first layer includes:
in response to the total number of bits of the data type required by a third layer, on which an operation is to be performed one layer later than the first layer, being less than the total number of bits of the data type required by the first layer, performing type conversion of the result data of the first layer; and
storing the type-converted result data of the first layer in the internal memory or the external memory, and
wherein the result data of the first layer includes a result of the operation on the first layer performed by the polymorphic operator array.
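
As an informal illustration of the input-side selection recited in the method claims above (which component converts the result data of the second layer), the following Python sketch may help. It is only a sketch under stated assumptions and forms no part of the claims: the `Location` and `Converter` enumerations and the `select_input_converter` function are hypothetical names introduced here. The claims do not address the case in which both layers require the same number of bits, so the sketch assumes that no conversion is needed in that case and returns `None`.

```python
from enum import Enum
from typing import Optional


class Location(Enum):
    """Where the result data of the second layer is currently stored."""
    EXTERNAL_MEMORY = "external memory"
    INTERNAL_MEMORY = "internal memory"


class Converter(Enum):
    """Component that performs the type conversion."""
    DATA_MOVER = "type conversion data mover"
    INTERNAL = "internal type converter"


def select_input_converter(
    location: Location,
    first_layer_bits: int,
    second_layer_bits: int,
) -> Optional[Converter]:
    """Pick the component that converts the second layer's result data."""
    if location is Location.INTERNAL_MEMORY:
        # Data already in internal memory: the internal type converter is used.
        return Converter.INTERNAL
    if first_layer_bits > second_layer_bits:
        # Data in external memory, first layer needs more bits.
        return Converter.DATA_MOVER
    if first_layer_bits < second_layer_bits:
        # Data in external memory, first layer needs fewer bits.
        return Converter.INTERNAL
    # Equal bit widths: not covered by the claims; assumed to need no conversion.
    return None


# Bit widths below are arbitrary example values.
print(select_input_converter(Location.EXTERNAL_MEMORY, 16, 8))
print(select_input_converter(Location.EXTERNAL_MEMORY, 8, 16))
print(select_input_converter(Location.INTERNAL_MEMORY, 16, 8))
```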