KR20230018972A - Neural network apparatus and processing method performed by the same

Info

Publication number: KR20230018972A
Application number: KR1020210161287A
Authority: KR
Inventors: 이종은; 아자트 아자마트
Original assignee: 삼성전자주식회사; 울산과학기술원
Priority date: 2021-07-30
Filing date: 2021-11-22
Publication date: 2023-02-07

Abstract

Disclosed are a neural network device and a processing method performed by the neural network device. The neural network device may comprise: a random access memory that has a crossbar array structure and generates an analog output signal based on an input and a weight; an analog-to-digital converter (ADC) circuit that generates a digital output signal based on a reference signal and the analog output signal of the random access memory; a first ADC scaler that scales the reference signal of the ADC circuit; and a second ADC scaler that scales the digital output signal generated by the ADC circuit.

Description

Neural network device and processing method performed by the neural network device

Embodiments relate to a neural network device based on in-memory computing and to a processing method performed by the neural network device.

With the recent success of deep neural networks (DNNs), interest in efficient hardware for accelerating DNN algorithms is increasing. A resistive random-access memory (ReRAM) crossbar array (RCA) enables efficient computation of the matrix-vector multiplication (MVM) operations that underlie many recently proposed RCA-based DNN accelerators. An RCA-based DNN accelerator has an architecture in which computation is performed directly where the data is stored, and it provides high throughput by implementing all synaptic elements in dedicated hardware.

A neural network device according to an embodiment may include: a random access memory that has a crossbar array structure and generates an analog output signal based on an input and a weight; an analog-to-digital converter (ADC) circuit that generates a digital output signal based on a reference signal and the analog output signal of the random access memory; a first ADC scaler that scales the reference signal of the ADC circuit; and a second ADC scaler that scales the digital output signal generated by the ADC circuit.

The first ADC scaler and the second ADC scaler may have the same scale factor.

The first ADC scaler may adjust a reference voltage corresponding to the reference signal by dividing the reference voltage by a scale factor in the analog domain.

The second ADC scaler may adjust the digital output signal by multiplying the digital output signal by the scale factor in the digital domain.

A processing method performed by a neural network device according to an embodiment may include: receiving an input and a weight; generating, through a random access memory having a crossbar array structure, an analog output signal based on the input and the weight; generating, through an analog-to-digital converter (ADC) circuit, a digital output signal based on a reference signal scaled by a first ADC scaler and the analog output signal of the random access memory; and scaling, through a second ADC scaler, the digital output signal generated by the ADC circuit.

The processing method performed by the neural network device according to an embodiment may further include generating, by the first ADC scaler, the scaled reference signal by dividing the reference signal by a scale factor in the analog domain.

FIG. 1 is a block diagram showing the components of a neural network device according to an embodiment.
FIG. 2 is a diagram for explaining the processing performed in a neural network device according to an embodiment.
FIG. 3 is a diagram showing the structure of a neural network device including an RCA structure according to an embodiment.
FIG. 4 is a diagram for explaining how weights are split according to an embodiment.
FIG. 5 is a diagram illustrating an ADC circuit according to an embodiment.
FIGS. 6A and 6B are diagrams for explaining how a scale factor is derived based on a quantization technique according to an embodiment.
FIG. 7 is a flowchart illustrating the operations of a processing method performed by a neural network device according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only, and the embodiments may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, and substitutes that fall within the technical idea described through the embodiments.

Although terms such as "first" and "second" may be used to describe various components, these terms should be construed only as distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may be termed a first component.

When a component is referred to as being "connected" to another component, it should be understood that it may be directly connected or coupled to the other component, or that intervening components may be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

The embodiments described in this document may be applied to deep learning hardware devices based on in-memory computing, or to hardware devices that use in-memory matrix-vector multiplication (for example, artificial intelligence hardware applications and signal processing chips). For convenience, the description below refers to embodiments based on resistive memory (ReRAM), but the embodiments of this document are equally applicable to other types of memory that have a crossbar array structure and perform analog operations such as analog addition, including static RAM (SRAM), dynamic RAM (DRAM), phase-change RAM (PRAM), magnetoresistive RAM (MRAM), and ferroelectric RAM (FeRAM).

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the drawing in which they appear, and redundant descriptions thereof will be omitted.

FIG. 1 is a block diagram showing the components of a neural network device according to an embodiment.

Referring to FIG. 1, the neural network device 100 may be a hardware device that implements a neural network using in-memory computing. For example, the neural network device 100 may be a neural network accelerator optimized to process neural network workloads. The neural network device 100 may perform operations using a random access memory (RAM) 110 having a crossbar array structure.

The neural network device 100 may include the random access memory (RAM) 110, an ADC circuit 120, a first ADC scaler 130, and a second ADC scaler 140.

The random access memory 110 may generate an analog output signal based on an input and a weight. The weight is one of the parameters of the neural network implemented by the neural network device 100. The input and the weight may each be quantized (binarization may also be performed) and split to match the crossbar array structure of the random access memory 110. The quantized and split input and weight may be supplied to the random access memory 110, which may generate analog partial sums through operations between the quantized and split input and weight.

The random access memory 110 may be, for example, a resistive memory (ReRAM), but the scope of the embodiments is not limited thereto, and it may be another type of memory having a crossbar array structure. A resistive memory exploits the resistance-change phenomenon observed in transition metal oxides to store a '1' corresponding to a low resistance state or a '0' corresponding to a high resistance state, and it can be implemented in a crossbar array structure. A crossbar array structure can be driven using only two electrodes, a bit line and a word line, without a cell selection transistor for selecting the unit cell that stores data, which gives it an advantage in integration density.

The ADC circuit 120 may convert an input analog signal into a digital signal. The ADC circuit 120 may generate a digital output signal based on a reference signal and the analog output signal of the random access memory 110. The ADC circuit 120 may convert the analog partial sums output from the random access memory 110 into digital values to generate digital partial sums, and may accumulate the digital partial sums to generate the digital output signal. The ADC circuit 120 may include a plurality of comparators, each of which receives the analog output signal and a different reference signal, and each comparator may output a binarized value based on the result of comparing the analog output signal with its reference signal.

The first ADC scaler 130 may scale the reference signal of the ADC circuit 120. Here, "scaling" means adjusting the magnitude of a signal. The first ADC scaler 130 may adjust a reference voltage corresponding to the reference signal (for example, Vref in FIG. 5) by dividing the reference voltage by a set scale factor in the analog domain. The first ADC scaler 130 may scale the reference signal by adjusting the reference voltage applied to resistors connected in series. The second ADC scaler 140 may scale the digital output signal generated by the ADC circuit 120. The second ADC scaler 140 may adjust the digital output signal by multiplying it by the scale factor in the digital domain. The second ADC scaler 140 may include a digital multiplier and a digital adder that output the result of multiplying the digital output signal of the ADC circuit 120 by the scale factor. The first ADC scaler 130 may also be referred to as a pre-ADC scaler, and the second ADC scaler 140 may also be referred to as a post-ADC scaler.

As described above, the first ADC scaler 130 may control the signal scale in the analog domain by controlling the reference voltage, and the second ADC scaler 140 may control the signal scale in the digital domain. The first ADC scaler 130 and the second ADC scaler 140 may be used together to realize a quantization parameter such as a scale factor. The first ADC scaler 130 and the second ADC scaler 140 may have the same scale factor, which reduces overhead; that is, the same value may be used as the parameter of both scalers. The scale factors of the first ADC scaler 130 and the second ADC scaler 140 may be derived by a quantization technique (an optimal value derived from quantization theory), and the scale factor of the first ADC scaler 130, the scale factor of the second ADC scaler 140, and the weights input to the random access memory 110 may be optimized through the same training process. In the training process, optimal values of the scale factors of the first ADC scaler 130 and the second ADC scaler 140 are determined, and the weights applied to the neural network device 100 may be learned based on the determined optimal scale factor.

Neural network hardware based on in-memory computing requires an ADC to perform its computations. In the RCA-based neural network device 100 proposed in this specification, matrix-vector multiplication (MVM) is performed in the analog domain, and an ADC is required to convert the analog-domain result into a digital signal. An ADC generally accounts for a large overhead in terms of area, energy, and power. According to the proposed technique, the first ADC scaler 130 and the second ADC scaler 140 make it possible, without changing the hardware, to reduce the required ADC area while providing high accuracy, and to greatly reduce the overhead of the ADC.

By using the optimal scale factor derived from the quantization technique as a parameter of the ADC circuit 120, the neural network device 100 can reduce the size of the ADC required in in-memory-computing-based neural network hardware while providing high computational accuracy, and can reduce the power consumption and required area of peripheral circuits (for example, the ADC circuit 120). The neural network device 100 may be implemented in the form of a chip or mounted in a device such as a computer or mobile phone.

FIG. 2 is a diagram for explaining the processing performed in a neural network device according to an embodiment.

Referring to FIG. 2, an input 210 of a neural network layer and a weight 230 of the neural network are provided to a neural network device (for example, the neural network device 100 of FIG. 1). The input 210 and the weight 230 may have a tensor data structure, that is, an n-dimensional array.

The input 210 may be binarized and quantized to generate a binarized and quantized input 215. The binarized and quantized input 215 may be split (220) to match the crossbar array structure of the random access memory included in the neural network device (for example, the random access memory 110 of FIG. 1), generating split inputs 225. Similarly, the weight 230 may be binarized and quantized to generate a binarized and quantized weight 235, which may be split (240) to match the crossbar array structure of the random access memory, generating split weights 245. In this way, the binarized and quantized input 215 and the binarized and quantized weight 235 may each be split according to the size of the crossbar array structure of the random access memory.
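The quantize-then-split step can be sketched as follows. This is a minimal illustration only: the uniform unsigned quantizer, the function names `quantize` and `split`, and the parameters `num_bits`, `max_value`, and `rows` are assumptions made for the example and are not taken from the specification.

```python
def quantize(values, num_bits, max_value):
    """Uniformly quantize non-negative values to integer codes in [0, 2**num_bits - 1].

    Assumption: a plain uniform quantizer; the specification does not fix one.
    """
    levels = (1 << num_bits) - 1
    codes = []
    for v in values:
        code = round(v * levels / max_value)
        codes.append(min(levels, max(0, code)))  # clip to the representable range
    return codes


def split(vector, rows):
    """Split a vector into consecutive chunks of at most `rows` elements,
    where `rows` stands for the number of input rows of one crossbar array."""
    return [vector[i:i + rows] for i in range(0, len(vector), rows)]


inputs = [0.0, 0.3, 0.6, 1.0, 0.9]
quantized = quantize(inputs, num_bits=2, max_value=1.0)
chunks = split(quantized, rows=2)
print(quantized)  # [0, 1, 2, 3, 3]
print(chunks)     # [[0, 1], [2, 3], [3]]
```

The same two helpers would apply to the weights; only the chunk size, which follows the crossbar dimensions, ties the two splits together.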

The split inputs 225 and the split weights 245 are supplied to the crossbar array structure of the random access memory, where an operation 250 may be performed on them. The operation 250 is performed in the analog domain and may correspond, for example, to a convolution operation. As a result of each operation 250, analog partial sums 260 are generated by the random access memory and may be input to an ADC circuit (for example, the ADC circuit 120 of FIG. 1). The ADC circuit may convert each of the analog partial sums 260 into a digital value to generate digital partial sums 270. The digital partial sums 270 may then be accumulated (280), and a digital output signal 290 corresponding to the final output may be generated as the accumulation result.
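The flow from split operands to the accumulated digital output can be sketched as below. The ideal ADC model (`adc`), its `full_scale` parameter, and the binary example data are assumptions chosen for illustration; the analog dot product is simulated with ordinary arithmetic.

```python
def adc(value, num_bits, full_scale):
    """Idealized ADC: map a value in [0, full_scale] to an integer code, clipping outside it."""
    levels = (1 << num_bits) - 1
    code = round(value * levels / full_scale)
    return min(levels, max(0, code))


def partial_sums(input_chunks, weight_chunks):
    """One simulated analog partial sum per chunk (stands in for operation 250)."""
    return [sum(x * w for x, w in zip(xs, ws))
            for xs, ws in zip(input_chunks, weight_chunks)]


# Binary inputs and weights, split into chunks of at most 3 rows.
input_chunks = [[1, 0, 1], [1, 1, 1], [0, 1]]
weight_chunks = [[1, 1, 1], [1, 0, 1], [1, 1]]

analog = partial_sums(input_chunks, weight_chunks)            # partial sums 260
digital = [adc(p, num_bits=2, full_scale=3) for p in analog]  # partial sums 270
output = sum(digital)                                         # accumulation 280
print(analog, digital, output)  # [2, 2, 1] [2, 2, 1] 5
```

In this example the 2-bit ADC resolves every possible partial sum of a 3-row chunk, so the accumulated result equals the exact dot product; a narrower ADC would clip or merge levels, which is the precision trade-off the scalers address.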

FIG. 3 is a diagram showing the structure of a neural network device including an RCA structure according to an embodiment.

Referring to FIG. 3, the structure of a neural network device 300 including a resistive memory crossbar array (RCA) structure is shown. The structure of the neural network device 300 may correspond to the neural network device 100 of FIG. 1 with a resistive memory (ReRAM) 310 as its random access memory. The resistive memory 310 has the advantages of compact size and fast operation.

In an embodiment, the resistive memory 310 may include digital-to-analog converters (DACs) 312 that convert digital input values into analog values, a crossbar array structure 314 that stores data according to whether each cell is in a low resistance state or a high resistance state, and sample-and-hold circuits 316 that sample and hold the analog values from the crossbar array structure 314. The resistive memory 310 may perform an operation between the input supplied on its row lines and the weights supplied on its column lines, and may generate analog partial sums.

The analog partial sums output by the resistive memory 310 are passed to analog multipliers 320, which may scale them based on a set scale factor. The analog multipliers 320 may multiply the analog partial sums by a scale factor (for example, 1/s) in the analog domain. Analog-to-digital converter (ADC) circuits 330 may convert each of the analog partial sums into a digital value to generate digital partial sums. Digital multipliers 340 may scale the digital partial sums generated by the ADC circuits 330 by multiplying them by a scale factor (for example, s). The scaled digital partial sums may be accumulated (350) to generate the final digital output signal.
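The scale, convert, rescale sequence described above can be sketched as follows. The round-and-clip model of the ADC (`quantizer`), the value of `s`, and the number of levels are assumptions made for this example.

```python
def quantizer(value, num_levels):
    """Idealized ADC: round to the nearest integer code and clip to [0, num_levels - 1]."""
    return min(num_levels - 1, max(0, round(value)))


def scaled_conversion(partial_sum, s, num_levels):
    scaled = partial_sum * (1.0 / s)      # analog multiplier 320: multiply by 1/s
    code = quantizer(scaled, num_levels)  # ADC circuit 330
    return code * s                       # digital multiplier 340: multiply by s


partial_sums = [3.0, 9.0, 20.0]
outputs = [scaled_conversion(p, s=4.0, num_levels=4) for p in partial_sums]
total = sum(outputs)                      # accumulation 350
print(outputs, total)  # [4.0, 8.0, 12.0] 24.0
```

With s = 4 a 2-bit converter covers inputs up to about 12 in steps of 4; the last partial sum (20.0) exceeds that range and is clipped to 12.0. Choosing s therefore trades resolution against clipping, which is why the scale factor is treated as a quantization parameter.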

FIG. 4 is a diagram for explaining how weights are split according to an embodiment.

FIG. 4 shows, as an example, how a weight 410 having a tensor data structure in a convolution layer is split to match the crossbar array structure of the random access memory.

There are various methods for mapping a convolution layer onto a crossbar array structure. These methods differ in how the three-dimensional structure of the weight 410 is flattened and how it is mapped onto the input rows of the crossbar array structure. When mapping a convolution layer onto the crossbar array structure, a weight having a tensor data structure may be split, for example, into several 1 × 1 convolutions each having P or fewer input channels. The filter used may have a filter size of, for example, K × K (K = 3 in the illustrated example). Each split weight block 420 may have the shape 1 × 1 × P. Here, P denotes the number of input rows of the crossbar array structure and Cin denotes the number of input channels. P × P may correspond to the size of the crossbar array structure.
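The split into 1 × 1 blocks can be illustrated by enumerating the blocks for one filter. The function name `split_conv_weight`, the tuple layout, and the example sizes (Cin = 200, P = 128) are hypothetical and chosen only to show a case where the last block holds fewer than P channels.

```python
from math import ceil


def split_conv_weight(k, c_in, p):
    """List the 1 x 1 blocks of a K x K x Cin filter, each covering at most p input channels.

    Each entry is (ky, kx, channel_start, channel_end) with channel_end exclusive.
    """
    blocks = []
    for ky in range(k):
        for kx in range(k):
            for start in range(0, c_in, p):
                blocks.append((ky, kx, start, min(start + p, c_in)))
    return blocks


blocks = split_conv_weight(k=3, c_in=200, p=128)
print(len(blocks))  # 18, i.e. K * K * ceil(Cin / P)
print(blocks[:2])   # [(0, 0, 0, 128), (0, 0, 128, 200)]
```

Each block corresponds to one group of crossbar input rows, so the number of analog partial sums per output grows with K × K × ceil(Cin / P).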

FIG. 5 is a diagram illustrating an ADC circuit according to an embodiment.

Referring to FIG. 5, a flash ADC circuit is shown as an example of an ADC circuit 500 (for example, the ADC circuit 120 of FIG. 1). The ADC circuit 500 may include a plurality of resistors 512, 514, 516, and 518, comparators 522, 524, and 526, and an encoder 530.

In the ADC circuit 500, the series-connected resistors 512, 514, 516, and 518 generate different voltages from a reference signal (for example, a reference voltage) Vref, and each of these voltages is input to one of the comparators 522, 524, and 526. The values of these voltages may be determined by the voltage division rule based on how the resistors 512, 514, 516, and 518 are connected.
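The voltage division rule for a series resistor ladder can be sketched as below. The equal resistor values, the grounded bottom of the ladder, and the function name `ladder_taps` are assumptions for illustration; the specification does not state resistor values.

```python
def ladder_taps(v_ref, resistors):
    """Tap voltages between series resistors, listed from the grounded end upward.

    Assumption: v_ref is applied across the whole chain and the bottom end is at 0 V.
    """
    total = sum(resistors)
    taps = []
    below = 0.0
    for r in resistors[:-1]:       # one tap after each resistor except the top one
        below += r
        taps.append(v_ref * below / total)
    return taps


# Four equal resistors (like 512-518) give three comparator thresholds.
print(ladder_taps(1.0, [1.0, 1.0, 1.0, 1.0]))  # [0.25, 0.5, 0.75]
# Changing only v_ref rescales every threshold by the same ratio.
print(ladder_taps(0.5, [1.0, 1.0, 1.0, 1.0]))  # [0.125, 0.25, 0.375]
```

The second call shows why adjusting Vref alone is enough to rescale the whole converter: every comparator threshold moves proportionally, with no change to the ladder itself.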

The reference signal Vref may be adjusted by a first ADC scaler (for example, the first ADC scaler 130 of FIG. 1). The first ADC scaler may scale the reference signal Vref by applying a scale factor to it. The scale factor of the first ADC scaler may be learned and determined together with the weights during training of the neural network. By implementing the first ADC scaler, which is the analog-domain scaler, through the reference signal Vref in this way, the ADC circuit 500 can take the place of a quantizer without incurring area cost. Adjusting the reference signal Vref makes it possible to implement scaling in the analog domain without using an analog multiplier.

Each of the comparators 522, 524, and 526 receives the analog output signal Vin (for example, an analog voltage signal) output from the random access memory (for example, the random access memory 110 of FIG. 1) and one of the different voltages generated from the reference signal Vref. Each comparator compares the input analog voltage signal with the reference voltage generated from Vref by the resistors 512, 514, 516, and 518, and outputs a high-level or low-level value depending on which is larger.

The output values of the comparators 522, 524, and 526 are passed to the encoder 530, which may output a digital output signal based on them. The encoder 530 may consist of, for example, a combination of a full adder and an adder, and may convert the comparator output values into a binary code. The final digital output signal of the ADC circuit 500 may be generated by outputting this binary code in parallel.
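The comparator and encoder stages of a flash ADC can be sketched as follows. Modeling the encoder as a count of high comparator outputs is an assumption consistent with an adder-based encoder, not a description of the circuit in FIG. 5; the threshold values are likewise illustrative.

```python
def flash_adc(v_in, thresholds):
    """Compare v_in against every threshold, then encode the result as a binary string."""
    # Each comparator outputs 1 (high) when the input exceeds its threshold.
    comparator_outputs = [1 if v_in > t else 0 for t in thresholds]
    # Encoder modeled as an adder that counts the high outputs.
    count = sum(comparator_outputs)
    num_bits = max(1, len(thresholds).bit_length())
    return comparator_outputs, format(count, f"0{num_bits}b")


thresholds = [0.25, 0.5, 0.75]       # e.g. taps of an equal-resistor ladder with Vref = 1.0
print(flash_adc(0.1, thresholds))    # ([0, 0, 0], '00')
print(flash_adc(0.6, thresholds))    # ([1, 1, 0], '10')
print(flash_adc(0.9, thresholds))    # ([1, 1, 1], '11')
```

Three comparators distinguish four levels, so the output fits in two bits; in general a flash ADC with n output bits needs 2^n - 1 comparators, which is the area cost that motivates using fewer bits.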

FIGS. 6A and 6B are diagrams illustrating the derivation of a scale factor based on a quantization technique according to an embodiment.

A method is provided for determining parameters (e.g., scale factors) of the first ADC scaler and the second ADC scaler so as to optimize the performance of a neural network (e.g., a deep neural network (DNN)) implemented by the neural network device. By treating the problem of reducing ADC area (the ADC reduction problem) as a quantization problem, the proposed technique can optimize the precision of the ADC in a neural network device having an RCA structure.

The first step converts the neural network graph. In this step, the neural network (e.g., DNN) graph is mapped onto RCA blocks 610 and ADC blocks 620, as shown in FIG. 6A. Partitioning the weight matrix of the neural network into RCA matrices is called weight-to-RCA mapping, and it depends on the precision of the weights, the cell precision of the resistive memory, the RCA size, the dimensions of the weight tensor, and how the weights are processed. A fully-connected layer of the neural network may be mapped to RCA blocks 610 as shown in FIG. 6A. Because the RCA blocks 610 process and output analog values, ADC blocks 620 are connected to the RCA blocks 610, followed by a summation block 630 that sums (or accumulates) the digital values output from the respective ADC blocks 620.
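The mapping of a fully-connected layer onto RCA blocks, ADC blocks, and a summation block can be illustrated with a small sketch. It assumes an ideal crossbar, partitioning along the input (row) dimension only, and an identity ADC by default; the function names and the `rca_rows` parameter are hypothetical and chosen only for this example.

```python
def partition_rows(weights, rca_rows):
    """Split a weight matrix (one row per input) into tiles of at most rca_rows rows."""
    return [weights[i:i + rca_rows] for i in range(0, len(weights), rca_rows)]


def rca_mvm(tile, x_slice):
    """Ideal crossbar model: column j outputs sum_i x[i] * w[i][j]."""
    n_cols = len(tile[0])
    return [sum(x * row[j] for x, row in zip(x_slice, tile)) for j in range(n_cols)]


def mapped_layer(weights, x, rca_rows, adc=lambda v: v):
    """RCA blocks -> ADC blocks -> summation block, as in the converted graph."""
    out = [0] * len(weights[0])
    for k, tile in enumerate(partition_rows(weights, rca_rows)):
        x_slice = x[k * rca_rows:(k + 1) * rca_rows]
        partial = rca_mvm(tile, x_slice)        # analog partial sums of one RCA block
        digital = [adc(v) for v in partial]     # one ADC block per RCA block
        out = [o + d for o, d in zip(out, digital)]  # summation block
    return out


weights = [[1, 2], [3, 4], [5, 6], [7, 8]]   # 4 inputs, 2 outputs
x = [1, 0, 1, 1]
result = mapped_layer(weights, x, rca_rows=2)
```

With the identity ADC, the tiled result equals the unpartitioned matrix-vector product, so any difference observed with a real `adc` function is attributable to conversion alone.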

In the second step, the ADC blocks 620 are replaced with quantizer (Q) blocks 640, as shown in FIG. 6B. Quantization is applied to the neural network graph converted in the first step, with its parameters optimized. The quantizer blocks 640 may simulate the first ADC scaler (e.g., the first ADC scaler 130 of FIG. 1) and the second ADC scaler (e.g., the second ADC scaler 140 of FIG. 1). The quantizer blocks 640 quantize the output values of the RCA blocks 610 and restore the scale of the input values. In the quantizer blocks 640, for example, an operation such as Equation 1 below may be performed.

In Equation 1, three quantities appear: the input value to be quantized, the quantized input value, and the result value of the Equation 1 operation. s denotes the scale factor (or step size) parameter, and b denotes the precision (number of bits) of the ADC. Equation 1 uses a round operation and a clip operation, where clip(x, a, b) = min(max(x, a), b). The scale factor s may be determined through the training process of the neural network or by another method (e.g., a statistical method). The input value, which corresponds to the output signal (e.g., an analog output voltage) of the RCA blocks 610, is divided by the scale factor s. This step may be implemented by the first ADC scaler adjusting the reference voltage Vref of the ADC circuit (e.g., the reference voltage Vref of FIG. 5) by a factor of 1/s. Adjusting the reference voltage Vref of the ADC circuit implements scaling in the analog domain without using an analog multiplier.

In the third step, the quantized neural network is mapped back onto an RCA-based neural network (e.g., an RCA-based accelerator). The quantizer blocks 640 contain two scaling operations, performed before and after the round operation of Equation 1: a scaling operation in the analog domain before the round operation (performed by the first ADC scaler) and a scaling operation in the digital domain after the round operation (performed by the second ADC scaler). The scale factor parameter may be shared across an entire layer of the neural network, across an output channel, or within each RCA structure. Using the same scale factor across an entire layer reduces scaling overhead. Alternatively, the scale factor parameter may not be shared, in which case the scale factor of the first ADC scaler may have a unique value.
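The sharing options for the scale factor can be compared with a short sketch. The description leaves the method of choosing s open (training or a statistical method), so the max-based rule used here is a hypothetical stand-in, as are the function names, the signed b-bit range, and the sample data layout `samples[channel][rca]`.

```python
def scale_from_max(values, b):
    """Hypothetical statistical rule: map the largest magnitude to the top code."""
    q_max = 2 ** (b - 1) - 1
    return max(abs(v) for v in values) / q_max


def scales_by_granularity(partials, b, granularity):
    """partials[c][r] holds sample analog partial sums for output channel c, RCA r."""
    if granularity == "layer":
        s = scale_from_max([v for ch in partials for rca in ch for v in rca], b)
        return [[s] * len(ch) for ch in partials]
    if granularity == "channel":
        return [[scale_from_max([v for rca in ch for v in rca], b)] * len(ch)
                for ch in partials]
    if granularity == "rca":
        return [[scale_from_max(rca, b) for rca in ch] for ch in partials]
    raise ValueError(f"unknown granularity: {granularity}")


def count_distinct(scales):
    """Number of different scale factors that would have to be stored and applied."""
    return len({s for ch in scales for s in ch})


samples = [[[1.0, -3.0], [0.5, 6.0]],
           [[2.0, 1.0], [-1.5, 0.3]]]
per_layer = scales_by_granularity(samples, b=3, granularity="layer")
per_channel = scales_by_granularity(samples, b=3, granularity="channel")
per_rca = scales_by_granularity(samples, b=3, granularity="rca")
```

Counting the distinct values in each result gives a rough sense of the trade-off: a single layer-wide factor is cheapest to apply, while per-RCA factors track each block's output range more closely.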

FIG. 7 is a flowchart illustrating operations of a processing method performed by a neural network device according to an embodiment. The operations of the processing method may be performed by the neural network device described herein (e.g., the neural network device 100 of FIG. 1).

Referring to FIG. 7, in operation 710, the neural network device may receive an input and a weight. The input and the weight may each be quantized (binarization is also possible) and partitioned to correspond to the crossbar array structure of a random access memory (e.g., the random access memory 110 of FIG. 1).

In operation 720, the neural network device may generate an analog output signal based on the input and the weight through the random access memory having the crossbar array structure. The random access memory may generate analog-valued partial sums produced by operations between the quantized and partitioned inputs and weights.

In operation 730, the neural network device may generate a scaled reference signal by dividing the reference signal by a scale factor in the analog domain through a first ADC scaler (e.g., the first ADC scaler 130 of FIG. 1). Here, the reference signal may correspond to the reference voltage against which the analog output signal is compared in the ADC circuit.

In operation 740, the neural network device may generate a digital output signal through an analog-to-digital converter (ADC) circuit (e.g., the ADC circuit 120 of FIG. 1), based on the reference signal scaled by the first ADC scaler and the analog output signal of the random access memory. The ADC circuit may convert the analog-valued partial sums into digital values to generate digital-valued partial sums, and may accumulate the digital-valued partial sums to generate the digital output signal.

In operation 750, the neural network device may scale the digital output signal generated by the ADC circuit through a second ADC scaler (e.g., the second ADC scaler 140 of FIG. 1). The second ADC scaler may adjust the digital output signal by multiplying it by a scale factor in the digital domain. In an embodiment, the first ADC scaler and the second ADC scaler may have the same scale factor.
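Operations 710 through 750 can be traced end to end in a behavioral sketch. It is a simplified model under several assumptions: an ideal crossbar, one scale factor `s` shared by both scalers, a signed b-bit code range, and the first ADC scaler represented only by its net effect (division by `s` before rounding) rather than by an explicit reference voltage. The function name and parameters are hypothetical.

```python
def process(x, weights, rca_rows, s, b):
    """Behavioral model of operations 710-750 for one fully-connected layer."""
    q_min, q_max = -(2 ** (b - 1)), 2 ** (b - 1) - 1
    n_out = len(weights[0])
    acc = [0] * n_out
    # 710: inputs and weights are partitioned to match the crossbar size.
    for start in range(0, len(weights), rca_rows):
        tile = weights[start:start + rca_rows]
        xs = x[start:start + rca_rows]
        # 720: the crossbar produces analog partial sums.
        analog = [sum(xi * row[j] for xi, row in zip(xs, tile)) for j in range(n_out)]
        # 730 + 740: conversion against the scaled reference, modeled here as
        # divide-by-s, clip, and round (assumption).
        codes = [round(min(max(v / s, q_min), q_max)) for v in analog]
        # 740: digital partial sums are accumulated.
        acc = [a + c for a, c in zip(acc, codes)]
    # 750: the second ADC scaler multiplies by the scale factor in the digital domain.
    return [a * s for a in acc]


weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 0, 1, 1]
approx = process(x, weights, rca_rows=2, s=3.0, b=4)
exact = [sum(xi * row[j] for xi, row in zip(x, weights)) for j in range(2)]
```

Comparing `approx` with `exact` shows the quantization error introduced by the conversion. Because the same `s` is applied before and after rounding, the output stays on the scale of the original partial sums.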

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that execute on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium, or device, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination, and the program instructions recorded on the medium may be specially designed and configured for the embodiment or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A neural network device using random-access memory (RAM), the device comprising:
a random access memory that has a crossbar array structure and generates an analog output signal based on an input and a weight;
an analog-to-digital converter (ADC) circuit that generates a digital output signal based on a reference signal and the analog output signal of the random access memory;
a first ADC scaler that scales the reference signal of the ADC circuit; and
a second ADC scaler that scales the digital output signal generated by the ADC circuit.

The neural network device of claim 1,
wherein the first ADC scaler and the second ADC scaler have the same scale factor.

The neural network device of claim 1,
wherein the first ADC scaler adjusts a reference voltage corresponding to the reference signal by dividing the reference voltage by a scale factor in the analog domain.

The neural network device of claim 3,
wherein the second ADC scaler adjusts the digital output signal by multiplying the digital output signal by the scale factor in the digital domain.

The neural network device of claim 1,
wherein the first ADC scaler scales the reference signal by adjusting a reference voltage applied to resistors connected in series.

The neural network device of claim 1,
wherein the second ADC scaler includes a digital multiplier that outputs a result of multiplying the digital output signal of the ADC circuit by a scale factor.

The neural network device of claim 1,
wherein the ADC circuit includes a plurality of comparators, each receiving the analog output signal and a different reference signal, and
each of the comparators outputs a binarized output value based on a result of comparing the analog output signal with its reference signal.

The neural network device of claim 1,
wherein the input and the weight are each quantized and partitioned to correspond to the crossbar array structure of the random access memory.

The neural network device of claim 8,
wherein the random access memory generates analog-valued partial sums produced by operations between the quantized and partitioned inputs and weights.

The neural network device of claim 9,
wherein the ADC circuit converts the analog-valued partial sums into digital values to generate digital-valued partial sums, and generates the digital output signal by accumulating the digital-valued partial sums.

The neural network device of claim 1,
wherein the scale factor of the first ADC scaler and the scale factor of the second ADC scaler are derived by a quantization technique.

The neural network device of claim 1,
wherein the random access memory is a resistive RAM.

A processing method performed by a neural network device, the method comprising:
receiving an input and a weight;
generating an analog output signal based on the input and the weight through a random-access memory (RAM) having a crossbar array structure;
generating a digital output signal, through an analog-to-digital converter (ADC) circuit, based on a reference signal scaled by a first ADC scaler and the analog output signal of the random access memory; and
scaling the digital output signal generated by the ADC circuit through a second ADC scaler.

The processing method of claim 13,
wherein the first ADC scaler and the second ADC scaler have the same scale factor.

The processing method of claim 13, further comprising:
generating, by the first ADC scaler, the scaled reference signal by dividing the reference signal by a scale factor in the analog domain.

The processing method of claim 13,
wherein the scaling of the digital output signal comprises adjusting the digital output signal by multiplying the digital output signal by the scale factor in the digital domain.

The processing method of claim 13,
wherein the generating of the analog output signal comprises generating analog-valued partial sums produced by operations between the quantized and partitioned inputs and weights.

The processing method of claim 17,
wherein the generating of the digital output signal comprises converting the analog-valued partial sums into digital values to generate digital-valued partial sums, and generating the digital output signal by accumulating the digital-valued partial sums.

A computer-readable recording medium storing one or more computer programs including instructions for performing the method of any one of claims 13 to 18.