KR102425869B1

KR102425869B1 - CMOS-based crossbar deep learning accelerator

Info

Publication number: KR102425869B1
Application number: KR1020190162662A
Authority: KR
Inventors: 여인준; 이병근; 김정균
Original assignee: 광주과학기술원
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2022-07-28
Also published as: KR20210072391A

Abstract

A CMOS-based crossbar array deep learning accelerator according to the present invention includes: a crossbar array including a plurality of conductors crossing each other in the horizontal and vertical directions; a resistance element connecting the horizontal conductors (rows) and the vertical conductors (columns) where they cross; an update engine for updating and training the neural network provided by the crossbar array; and a processing engine for performing input/output operations of the neural network provided by the crossbar array. The accelerator thereby provides a hardware artificial intelligence device.

Description

CMOS-based crossbar deep learning accelerator

The present invention relates to a CMOS-based crossbar array deep learning accelerator.

Software-based artificial intelligence has reached a considerable level of maturity and can be applied almost anywhere. However, because it runs on general-purpose computing resources, its power efficiency is low and its computation time is long, and advanced applications are not possible without connecting to a server over a network.

To overcome these drawbacks, the hardware itself can be built to suit artificial neural network computation, which allows faster recognition and higher power efficiency than software. One example of such a hardware artificial neural network applies devices to a crossbar array. Because the memory devices themselves form the crossbar array, this approach can improve computational efficiency.

Examples of such devices include the memristor and resistive RAM (RRAM). However, these devices are difficult to design together with CMOS-based circuits, and their conductance response is unreliable.

Accordingly, if a resistance element can be built from CMOS devices, it is expected to be easy to integrate and to have a wider operating range than a memristor.

'Capacitor-based Cross-point Array for Analog Neural Network with Record Symmetry and Linearity', published in the 2018 Symposium on VLSI Technology Digest of Technical Papers, introduces a capacitor-based cross-point array fabricated in a fine-geometry process for implementing artificial intelligence. The paper presents an accelerator that uses a transistor-based crossbar array in a DRAM process.

However, the paper does not describe specifically how the resistance element is constructed.

'Capacitor-based Cross-point Array for Analog Neural Network with Record Symmetry and Linearity', 2018 Symposium on VLSI Technology Digest of Technical Papers

The present invention is proposed against the above background and provides a CMOS-based crossbar array deep learning accelerator in which the resistance element can be set to a controlled conductance value.

The resistance element includes: a first PMOS and a second PMOS connected in series to control the current flowing through them; a second NMOS whose drain is connected to a driving voltage, whose gate is connected to an input voltage, and whose source is connected to the gate of the second PMOS; and a first NMOS that forms a first source follower circuit together with the second NMOS. Because this keeps the second PMOS from entering the saturation region, the pre-programmed learning can be carried out faithfully. It also widens the range over which learning can be performed.

According to the present invention, a wide control range can be implemented.

According to the present invention, the conductance of the resistance element can be adjusted to match the control state.

More specifically, memristors currently suffer from the stuck-at-fault problem, in which the conductance does not stay within the desired range but remains at one of the two extremes (for example, a conductance that makes the circuit open or short). This is a problem of the memristor device itself: when too large a voltage is applied across a memristor device, the insulator inside it can no longer maintain its characteristics and falls into a different state (completely disconnected or completely connected). Because the present invention is implemented in CMOS, it can guarantee stable performance.

In addition, a memristor cannot be fabricated together with the CMOS process, so the memristor-based crossbar array and the peripheral circuits for signal transmission and reception must be fabricated separately. The present invention uses a standard CMOS process rather than memristors, so the peripheral circuits can be integrated on the same chip and manufacturing is simple. In other words, a deep learning accelerator with a crossbar array structure can readily be implemented as a single chip.

In addition, compared with other CMOS-based accelerators, such as an SRAM-based accelerator whose memory is off-chip, the present invention achieves a higher degree of integration because it is an in-memory device in the form of a crossbar array. That is, it can hold more memory capacity than a typical CMOS-based accelerator.

FIG. 1 is a configuration diagram of a CMOS-based crossbar array deep learning accelerator according to an embodiment.
FIG. 2 shows the configuration of the bias generator and the charge pump.
FIG. 3 shows the configuration of the amplifier and the dynamic comparator.
FIG. 4 shows the configuration of the resistance element.
FIGS. 5 and 6 illustrate the operation of the source follower provided in the resistance element.

Hereinafter, specific embodiments of the present invention are described in detail with reference to the drawings. The spirit of the present invention is not limited to the embodiments presented below; those skilled in the art who understand the spirit of the invention may readily propose other embodiments within the same scope by adding, changing, or deleting components, and such embodiments also fall within the scope of the present invention.

FIG. 1 is a configuration diagram of a CMOS-based crossbar array deep learning accelerator according to an embodiment.

Referring to FIG. 1, the accelerator includes a crossbar array 2 having a plurality of conductors that cross each other in the horizontal and vertical directions.

The crossbar array 2 further includes resistance elements 20 that connect the horizontal conductors (rows) and the vertical conductors (columns) where they cross. A resistance element 20 may be provided at each crossing, connecting each column to each row. The resistance element 20 may be operated by a reference signal, that is, the control signal of its column as supplied under each scanning signal of its row.

Here, "resistance element" does not mean a resistor in the ordinary sense; it is shorthand for a resistive computing element that, in effect, operates as a resistance. The resistive computing element may be implemented with transistors and MOS devices.

The resistance element is described in detail later.

The accelerator further includes an update engine 11 for updating and training the neural network provided by the crossbar array 2 including the resistance elements 20, and a processing engine 12 for performing the input/output operations of that neural network.

The overall operation of the deep learning accelerator 1 according to the embodiment is as follows.

First, the input from the row pulse driver 3 of the processing engine 12 is converted into a current suited to the crossbar array 2 and applied to it.

The values computed as the signals pass through the resistance elements 20 of the crossbar array 2 are then integrated by the amplifier 4. The amplifier 4 may be a capacitive trans-impedance amplifier. The dynamic comparator 5 converts the output of the amplifier 4 into a result value.
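The multiply-accumulate step described here can be sketched numerically. The following is a minimal illustration, not the patented circuit: it assumes each row input appears as a voltage across an ideal linear conductance, that the currents on a column simply add, and that the trans-impedance amplifier is an ideal integrator. The function names, the integration capacitor `c_int`, and all numeric values are hypothetical.

```python
def column_currents(row_voltages, conductances):
    """Sum the current each column collects from all rows.

    conductances[i][j] is the (assumed linear) conductance in siemens
    between row i and column j; by Ohm's law it contributes v_i * g_ij.
    """
    totals = [0.0] * len(conductances[0])
    for v, g_row in zip(row_voltages, conductances):
        for j, g in enumerate(g_row):
            totals[j] += v * g
    return totals


def integrate(current, duration, c_int):
    """Ideal integrator: output voltage magnitude after `duration` seconds."""
    return current * duration / c_int


# Hypothetical 2x2 array: two row inputs, two columns.
row_v = [1.0, 0.5]                      # volts
g = [[1e-6, 2e-6],
     [3e-6, 4e-6]]                      # siemens
i_cols = column_currents(row_v, g)      # amperes per column
v_out = [integrate(i, 1e-6, 1e-12) for i in i_cols]
```

Under these assumptions each column current is a weighted sum of the row inputs, which is the matrix-vector product a neural network layer needs.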

Neural network learning is then performed using the result value, after which the update engine 11 changes the resistance values of the resistance elements 20 of the crossbar array 2.

The resistance value is changed in three steps: the master bias 7 outputs a reference current (Im); the programmable bias generator 8 scales the reference current according to a programmed value and outputs a control current (Ib); and the column-parallel charge pump 9 adjusts its charge value according to the control current.

The charge value (CPout) may be input to the resistance element 20 as the reference voltage (Vcon).

As a result, the reference voltage (Vcon) that controls the resistance element is adjusted according to the value programmed in advance in the bias generator 8.

The structure and operation of each component of the deep learning accelerator 1 according to the embodiment are described below.

FIG. 2 shows the configuration of the bias generator and the charge pump.

Referring to FIG. 2, the bias generator 8 includes an R-2R resistor ladder, which divides the reference current (Im) into binary-weighted currents.

The flip-flops (FF) are connected to one another, and currents of 1/2 Im, 1/4 Im, 1/8 Im, and so on flow in the branches in order from the left toward the output side. Programming a 1 or a 0 into each flip-flop switches its branch so that the branch current flows either to the control current (Ib) for the charge pump 9 or to ground. For example, all branch currents whose flip-flops are programmed with 1 are summed and flow as the control current (Ib).

In this way, the control current (Ib), which serves as the bias of the charge pump 9, can be varied according to the value programmed in advance in the bias generator 8.
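The binary weighting of the ladder can be written out as simple arithmetic. The sketch below assumes an ideal R-2R ladder whose k-th branch from the left carries Im / 2^k, and that a flip-flop holding 1 steers its branch into Ib; the function name and the 3-bit example are illustrative only.

```python
def control_current(i_m, bits):
    """Sum the binary-weighted branch currents whose flip-flop holds 1.

    bits[0] is the leftmost branch (Im/2), bits[1] the next (Im/4), ...
    Branches whose bit is 0 are treated as steered to ground.
    """
    return sum(i_m / 2 ** (k + 1) for k, bit in enumerate(bits) if bit)


i_b = control_current(8.0, [1, 0, 1])   # Im/2 + Im/8 with Im = 8 (arbitrary units)
```

With Im = 8 and the code 101, the first and third branches contribute 4 and 1, so Ib is 5 in the same units.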

On every operating clock, a voltage is applied to the up or down input of the charge pump 9 to increase or decrease the charge on its output node. The amount of charge that can be changed per clock depends on the control current (Ib). Through this process the reference voltage (Vcon) can be changed by a large or a small step at a time according to the pre-programmed value, so it can be set to a variety of values.
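How the control current sets the step size can be illustrated with an idealized model. This sketch assumes the pump sources or sinks Ib onto an output capacitor for one clock period; the capacitor `c_out`, the clock period `t_clk`, and the numeric values are hypothetical and do not come from the patent.

```python
def charge_pump_step(v_con, i_b, up, t_clk, c_out):
    """Return Vcon after one clock of an idealized charge pump.

    Assumes the pump sources or sinks the control current i_b onto a
    capacitor c_out for one clock period t_clk, so |dV| = i_b * t_clk / c_out.
    """
    dv = i_b * t_clk / c_out
    return v_con + dv if up else v_con - dv


# Hypothetical values: 1 us clock, 1 pF output capacitor.
small = charge_pump_step(0.5, 1e-9, True, 1e-6, 1e-12)   # fine step
large = charge_pump_step(0.5, 8e-9, True, 1e-6, 1e-12)   # coarse step
```

In this model an eightfold larger Ib moves Vcon eight times as far per clock, which matches the coarse and fine adjustment described in the text.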

Meanwhile, while the charge pump 9 of a given column is operating, the row scanner 10 operates sequentially to select the position to which the reference voltage is applied.

FIG. 3 shows the configuration of the amplifier and the dynamic comparator.

Referring to FIG. 3, two columns of resistance elements may be connected in the crossbar array 2 to implement positive and negative weights for the neural network.

The total current passing through the two columns is integrated by the amplifier 4. The dynamic comparator 5 turns the integrated output into the output of the neural network.
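One common way to read a two-column arrangement is that each signed weight is the difference between a "positive" and a "negative" conductance. The patent text does not spell out how the two column currents are combined, so the subtraction below is an assumption, as are the function name and the example values.

```python
def signed_column_current(row_voltages, g_pos, g_neg):
    """Net current when the negative column is subtracted from the positive one.

    The effective weight of row i is (g_pos[i] - g_neg[i]); the subtraction
    itself is an assumed combining rule, not taken from the patent text.
    """
    return sum(v * (gp - gn) for v, gp, gn in zip(row_voltages, g_pos, g_neg))


net = signed_column_current([1.0, 1.0], [3e-6, 1e-6], [1e-6, 2e-6])
```

Here the first row contributes a positive weight and the second a negative one, so conductances that are always positive can still represent signed weights.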

On every clock, the dynamic comparator 5 compares the integrated output with a random signal and outputs a 0 or a 1. The output of the neural network is determined by how many such outputs occur within a given period.
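The counting scheme can be sketched as a small simulation. This is an illustration under stated assumptions rather than the circuit of the referenced registration: the random signal is taken to be uniformly distributed between `v_min` and `v_max`, and the comparator is taken to output 1 when the integrated value exceeds it. All names and values are hypothetical.

```python
import random


def stochastic_output(v_int, n_clocks, v_min=0.0, v_max=1.0, seed=0):
    """Count the clocks on which the comparator outputs 1.

    Each clock draws a reference uniformly between v_min and v_max (an
    assumed distribution) and outputs 1 when the integrated value exceeds it.
    """
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_clocks):
        reference = rng.uniform(v_min, v_max)
        if v_int > reference:
            ones += 1
    return ones


count_mid = stochastic_output(0.5, 1000)
```

With a uniform reference, the fraction of ones grows roughly in proportion to the integrated value between the two limits, so the count over a period acts as a graded activation.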

The dynamic comparator and its associated operation are described in Registration No. 10-1965850, 'Method for probabilistic implementation of an activation function used in an artificial neural network and system including the same', previously filed by the present inventors, to which reference may be made. The description of that registered patent is incorporated into the description of the present invention to the extent necessary to practice the invention.

FIG. 4 shows the configuration of the resistance element, and FIGS. 5 and 6 illustrate the operation of the source follower provided in the resistance element.

Referring to FIG. 4, the resistance element 20 according to the embodiment includes a first PMOS (P1) 25 and a second PMOS (P2) 26 connected in series to control the current flowing through them. The sources of the first and second PMOS 25 and 26 are connected to each other, the input side is connected to the drain of the first PMOS 25, and the output side is connected to the drain of the second PMOS 26.

In the embodiments, a device referred to as a MOS should be interpreted as any device that can operate as a transistor.

The resistance element 20 according to the embodiment includes a second NMOS 22, whose drain is connected to the driving voltage (Vdd), whose gate is connected to the input voltage (Va), and whose source is connected to the gate of the second PMOS 26, and a first NMOS 21 that forms a first source follower circuit together with the second NMOS 22.

The gate voltage of the second NMOS 22, which is the input of the first source follower, is reflected directly in the voltage at the source of the second NMOS 22 and the drain of the first NMOS 21, which is the output of the source follower. Ideally, the output changes one-to-one with the input.
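The one-to-one tracking can be illustrated with a textbook source-follower model. This sketch assumes matched square-law NMOS devices that both stay in saturation, a bias transistor (the first NMOS) with its source at ground and its gate at Vcon, and no body effect or channel-length modulation. None of these conditions is stated in the patent, and `v_th` is a hypothetical threshold voltage.

```python
def source_follower_output(v_in, v_con, v_th=0.5):
    """Idealized follower output: v_in minus the follower's gate-source drop.

    The bias device sets a current from its overdrive (v_con - v_th); a
    matched follower carrying the same current has the same overdrive, so
    its gate-source voltage is v_th + overdrive.
    """
    overdrive = max(v_con - v_th, 0.0)
    v_gs = v_th + overdrive
    return v_in - v_gs


v1 = source_follower_output(1.5, 0.8)
v2 = source_follower_output(1.7, 0.8)
shift = v2 - v1   # follows the 0.2 V input change one-to-one
```

In this idealization the output sits a fixed offset below the input, so any change in the input appears unchanged at the gate the follower drives.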

The resistance element 20 according to the embodiment also includes a fourth NMOS 24, whose drain is connected to the driving voltage (Vdd), whose gate is connected to the output voltage (Vb), and whose source is connected to the gate of the first PMOS 25, and a third NMOS 23 that forms a second source follower circuit together with the fourth NMOS 24.

The gate voltage of the fourth NMOS 24, which is the input of the second source follower, is reflected directly in the voltage at the source of the fourth NMOS 24 and the drain of the third NMOS 23, which is the output of the source follower. Ideally, the output changes one-to-one with the input.

The reference voltage (Vcon) is connected to the gates of the first NMOS 21 and the third NMOS 23. The drain of the first NMOS 21 is connected to the source of the second NMOS 22, and the drain of the third NMOS 23 is connected to the source of the fourth NMOS 24.

The main operation of the resistance element 20 according to the embodiment is as follows.

When the input voltage (Va) changes, the first source follower senses the change (see FIG. 5). The output voltage of the first source follower is applied to the gate of the second PMOS 26 and is reflected in the output voltage (Vb) (see FIG. 6).

This prevents the second PMOS 26 from entering the saturation region.

As a result, the first and second PMOS 25 and 26 can carry the amount of current they were programmed to carry, so the programmed conductance value is maintained. Furthermore, the range of programmable currents can be widened.

The response to a change in the output voltage (Vb) mirrors the response to a change in the input voltage (Va). For example, the operation of the second source follower prevents the first PMOS 25 from entering the saturation region.

The concept of the resistance element, which is one of the main ideas of the present invention, can be understood more easily through a comparative example.

In the comparative example, the first, second, third, and fourth NMOS 21, 22, 23, and 24 are not used, and the reference voltage (Vcon) is connected directly to the gates of the first and second PMOS 25 and 26.

The embodiment is compared with the comparative example below.

In the comparative example, too, the current can be adjusted according to the reference voltage. However, with the reference voltage fixed, the first and second PMOS 25 and 26 may reach the saturation region depending on their source and drain voltages. For example, when the input voltage (Va) and the source voltage rise, the reference voltage (Vcon) does not change, so the voltage difference between the source and drain of the second PMOS 26 increases. The second PMOS 26 may then enter the saturation region.

The programmed current then cannot flow, and the range of currents that can be programmed shrinks.

By contrast, in the resistance element 20 of the embodiment, when the input voltage (Va) rises, the gate voltage of the second PMOS 26 rises with it. The second PMOS 26 therefore stays in the linear (triode) region without reaching the saturation region.

Accordingly, the resistance element can be controlled across the entire range of currents programmed in advance in the bias generator 8.

According to the present invention, a CMOS-based crossbar array deep learning accelerator can be used stably over a wider range.

20: resistance element

Claims

1. A CMOS-based crossbar array deep learning accelerator comprising:
a crossbar array including a plurality of conductors crossing each other in horizontal and vertical directions;
a resistance element that connects the horizontal conductors (rows) and the vertical conductors (columns) of the plurality of conductors where they cross;
an update engine for updating and training a neural network provided by the resistance element and the crossbar array; and
a processing engine for performing input/output operations of the neural network provided by the resistance element and the crossbar array,
wherein the resistance element includes:
a first PMOS and a second PMOS connected in series, with their sources connected to each other, to control a flowing current;
a second NMOS having a drain connected to a driving voltage, a gate connected to an input voltage, and a source connected to the gate of the second PMOS; and
a first NMOS that forms a first source follower circuit together with the second NMOS,
and wherein the gate voltage of the second NMOS, which is the input of the first source follower circuit, is reflected in the voltage at the source of the second NMOS and the drain of the first NMOS, which is the output of the first source follower circuit.

2. The CMOS-based crossbar array deep learning accelerator of claim 1,
wherein the sources of the first PMOS and the second PMOS are connected to each other, an input side is connected to the drain of the first PMOS, and an output side is connected to the drain of the second PMOS.

3. The CMOS-based crossbar array deep learning accelerator of claim 1,
wherein a reference voltage carrying information programmed in the update engine is applied to the gate of the first NMOS.

4. The CMOS-based crossbar array deep learning accelerator of claim 1, wherein the resistance element further includes:
a fourth NMOS having a drain connected to the driving voltage, a gate connected to an output voltage, and a source connected to the gate of the first PMOS; and
a third NMOS that forms a second source follower circuit together with the fourth NMOS,
and wherein the gate voltage of the fourth NMOS, which is the input of the second source follower circuit, is reflected in the voltage at the source of the fourth NMOS and the drain of the third NMOS, which is the output of the second source follower circuit.

5. The CMOS-based crossbar array deep learning accelerator of claim 4,
wherein a reference voltage carrying information programmed in the update engine is applied to the gate of the third NMOS.