WO2021000469A1 - Binary neural network accumulator circuit based on analogue delay chain - Google Patents


Info

Publication number
WO2021000469A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
input
analog delay
delay
delay chain
Prior art date
Application number
PCT/CN2019/114252
Other languages
French (fr)
Chinese (zh)
Inventor
单伟伟
商新超
Original Assignee
东南大学
Priority date
Filing date
Publication date
Application filed by 东南大学
Publication of WO2021000469A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation using electronic means
    • G06N3/065 Analogue means
    • G06N3/08 Learning methods

Definitions

  • The invention relates to a binary neural network accumulator circuit based on an analog delay chain, i.e. a circuit that uses mixed-signal (digital-analog) techniques to implement neural network accumulation, and belongs to the technical field of basic electronic circuits.
  • Mainstream neural network architectures use a 32-bit data width, with a trend toward 16-bit, 8-bit, or even 4-bit. To reduce power consumption, dynamic precision adjustment can select the operating bit width according to the needs of the application; more aggressive than 4-bit is a 2-bit width, and the most extreme is 1 bit. When the bit width becomes 1 bit, the neural network becomes a special network: the Binary Neural Network (BNN).
  • Because of the particularity of its arithmetic, the multiplication of a binary neural network is equivalent to an XNOR operation, so in an actual chip implementation an XNOR gate can realize the multiplication of the binary neural network.
  • The overall operation of the binary neural network consists of multiplication and accumulation, and the number of 1s in the accumulated result determines the final output. The accumulated result can therefore be evaluated by analog computation.
  • The invention is mainly used for the accumulation calculation of the binary neural network, thereby reducing the power consumption of neural network computation. The analog delay unit designed in this application controls the delay from A to Y through the state of the data terminal D.
  • The purpose of the present invention is to address the shortcomings of the above background art and provide a binary neural network accumulator circuit based on an analog delay chain, which replaces traditional digital accumulation with analog computation, effectively reducing the power consumption of binary neural network accumulation, realizing energy-efficient accumulation for binary neural networks, and solving the technical problem that the energy consumption of binary neural network accumulation needs to be reduced.
  • A binary neural network system based on an analog delay chain comprises a delay chain module and a pulse generating circuit. The delay chain module consists of two delay chains and a D flip-flop, where each delay chain is composed of N analog delay units. The analog delay unit uses 6 MOS transistors and distinguishes input data "0" from "1" by the difference in delay time. The delay chain connects the N analog delay units in series, accumulating the multi-bit input data and determining the number of "1"s.
  • The design method of the binary neural network accumulator circuit based on the analog delay chain of the present invention includes the following steps:
  • (1) Analog delay unit design: first complete the transistor sizing of the analog delay unit, then draw its layout according to the layout design rules for digital standard cells;
  • (2) Delay chain module: after the analog delay unit is designed, join it with cells from the standard cell library to complete the delay chain module.
  • The delay unit is composed of 3 NMOS transistors, 1 PMOS transistor, and an inverter. The peripheral input data A is connected to the gates of the PMOS transistor and the first NMOS transistor, and the peripheral input data D is connected to the gate of the second NMOS transistor. The source of the first NMOS transistor and the drains of the second and third NMOS transistors are connected at node n; the sources of the second and third NMOS transistors are grounded. The drains of the PMOS transistor and the first NMOS transistor are connected at node m, which serves as the input of the inverter; the inverter output is the delayed output signal. The source of the PMOS transistor and the gate of the third NMOS transistor are both connected to the power supply.
  • The delay chain module is composed of two delay chains and a D flip-flop, each delay chain consisting of n analog delay units. The data input terminals of the analog delay units are connected to the peripheral input data; the signal output terminals of delay chain 1 and delay chain 2 are connected to the data terminal and clock terminal of the D flip-flop, respectively; the clock signal output of the pulse generating circuit is connected to the clock signal input of the delay chain module; and the output signal of the D flip-flop is the judgment signal for the delay comparison.
  • The input signals of the delay chain module are a clock input signal and n peripheral input data; the output signal is a data output flag. The n peripheral input data are connected to the data input terminals of the n delay units, the clock input signal is applied to the signal input of the first delay unit, and the output terminal of each delay unit is connected to the input terminal of the next. The output of the nth delay unit of delay chain 1 is connected to the D terminal of the D flip-flop, the output of the nth delay unit of delay chain 2 is connected to the clock (CLK) terminal of the D flip-flop, and the output signal Flag of the D flip-flop is the delay flag signal.
  • The overall operation of the binary neural network includes multiplication and accumulation, and the number of 1s in the accumulated result determines the final output. In practice, only whether the accumulated result is greater or less than 0 matters, so analog computation can be used to judge it.
  • The present invention adopts the above technical solution and has the following beneficial effects: it realizes the neural network's accumulation by analog computation, converting digital signals into analog (time-domain) signals, which effectively reduces the overall power consumption of the chip and works stably over a wide voltage range; at the same time, the proposed delay unit has a small area overhead, so a high power saving can be obtained.
  • Figure 1 is a structural diagram of the delay unit of the present invention.
  • Figure 2 is a working timing diagram of the delay unit of the present invention.
  • Figure 3 is a circuit diagram of the analog delay chain of the present invention.
  • Figure 4 is an overall structure diagram of the delay chain module of the present invention.
  • Figure 5 is a working timing diagram of the delay chain module of the present invention.
  • Figure 6 is an HSPICE simulation timing diagram of the delay chain module of the present invention.
  • Figure 7 is a circuit diagram of the pulse generating circuit of the present invention.
  • Figure 8 is a working timing diagram of the pulse generating circuit of the present invention.
  • The delay unit of the present invention is shown in Figure 1 and is composed of 3 NMOS transistors, 1 PMOS transistor, and an inverter. The peripheral input data A is connected to the gates of PMOS transistor M1 and NMOS transistor M2; the peripheral input data D is connected to the gate of NMOS transistor M3; the source of NMOS transistor M2 and the drains of NMOS transistors M3 and M4 are connected at node n; the sources of NMOS transistors M3 and M4 are grounded; the source of PMOS transistor M1 and the gate of NMOS transistor M4 are both connected to the power supply; and the drains of PMOS transistor M1 and NMOS transistor M2 are connected at node m as the input of inverter U1, whose output is the delayed output signal.
  • The working timing diagram of the delay unit of the present invention is shown in Figure 2. The data input terminal D controls whether MOS transistor M3 conducts. When D is "1", transistor M3 is on, and when input A changes from "0" to "1", node n discharges through transistors M3 and M4 in parallel. When D is "0", transistor M3 is off, and when A changes from "0" to "1", node n can only discharge through transistor M4, increasing the delay from A to Y. The delay from A to Y can therefore be controlled through the data input terminal D.
  • The analog delay chain circuit of the present invention includes two parts: a delay chain module and a pulse generating circuit. The delay chain module is composed of n delay units and a D flip-flop. The data input terminals D of the delay units are connected to the peripheral input data; the signal output terminals Y1 and Y2 of delay chain 1 and delay chain 2 are connected to the data terminal and clock terminal of the D flip-flop, respectively; the clock signal output of the pulse generating circuit is connected to the clock signal input of the delay chain module; and the output signal Flag of the D flip-flop is the judgment signal for the delay comparison.
  • The overall structure of the delay chain module of the present invention is shown in Figure 4. The weights W1, W2, ..., Wn of the neural network are XNORed with the input data X1, X2, ..., Xn, and the results D1, D2, ..., Dn are fed to the data inputs of the delay chain module. The delay chain module is composed of two delay chains and a D flip-flop, each delay chain consisting of n analog delay units. The input data of delay chain 1 are the XNOR results D1, D2, ..., Dn of the weights and the image data; delay chain 2 is the reference chain, whose input data are configured according to the computational needs of each layer of the network. The signal output terminals Y1 and Y2 of delay chain 1 and delay chain 2 are connected to the data terminal and clock terminal of the D flip-flop, respectively; the clock signal output of the pulse generating circuit is connected to the clock signal input of the delay chain module; and the output signal Flag of the D flip-flop is the judgment signal for the delay comparison.
  • During training, the data of each layer is standardized so that the outputs follow a normal distribution N(0, 1), i.e. batch normalization (BN). In the BN formula, γ and β are the scaling factor and bias coefficient, parameters learned during training that apply an affine transformation to the activations so the original input's representation can be restored; x is the input data set; μ_B is the mean of the input data set; σ_B is the standard deviation of the input data set; and ε is a small positive constant added to prevent the denominator from being 0.
  • The binarized neural network binarizes every weight in its weight matrix and every activation value (to +1 or -1). Because of this special structure, batch normalization in a binarized neural network can be simplified; the simplified batch normalization formula for the binary neural network is given as formula 1.2. Since the batch normalization of the binarized network reduces to adding a bias, the bias value can be added directly to reference delay chain 2, whose inputs are configured according to the results of network training.
  • The working timing diagram of the delay chain module is shown in Figure 5. The signals Y1 and Y2 are connected to the data terminal and clock terminal of the D flip-flop, respectively. In the first clock cycle, delay chain 1 contains more "1"s than delay chain 2, so Y1 arrives first and the D flip-flop samples a "1"; in the second clock cycle, delay chain 1 contains fewer "1"s than delay chain 2, so Y2 arrives first and the D flip-flop samples a "0".
  • The HSPICE simulation timing of the delay chain module is shown in Figure 6: when delay chain 1 contains fewer "1"s than delay chain 2, signal Y2 arrives first and the sampled data (Flag) is "0"; when delay chain 1 contains more "1"s than delay chain 2, signal Y1 arrives first and the sampled data (Flag) is "1".
  • The pulse generating circuit of the present invention is shown in Figure 7. It is composed of 3 NAND gates, an inverter, and a delay module. The delay module can be configured with different delay values, thereby adjusting the pulse width.
  • The working timing diagram of the pulse generating circuit of the present invention is shown in Figure 8. The basic principle is as follows: when CLK is low, nodes X and Qb are both high and node Y stays low; when CLK goes from low to high, node Qb first falls from high to low, which drives node Y from low to high and node X from high to low, after which node Qb returns from low to high. The time taken to complete this whole loop is the width of the pulse generated by the circuit; the pulse width is therefore determined by the delay of the delay module and the three NAND gates.
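This relationship between the configurable delay and the output pulse can be sketched behaviorally. All delay values below are assumed placeholders, not figures from the patent; only the structural relationship (pulse width = delay-module delay plus three NAND gate delays) comes from the text:

```python
# Behavioral model of the pulse generator's width. T_NAND is an assumed
# placeholder for a single NAND gate delay; only the structural relation
# stated in the text is modeled.
T_NAND = 0.05  # assumed delay of one NAND gate, in ns

def pulse_width(t_delay_module: float) -> float:
    """Width of the generated pulse for a given delay-module setting:
    the edge must traverse the delay module and three NAND gates."""
    return t_delay_module + 3 * T_NAND

# Configuring a larger delay module widens the pulse:
assert pulse_width(1.0) > pulse_width(0.5)
```

Widening the delay module is thus the single knob for tuning the clock pulse fed to the delay chains.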

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Pulse Circuits (AREA)

Abstract

A binary neural network accumulator circuit based on an analogue delay chain, belonging to the technical field of basic electronic circuits. The circuit comprises a delay chain module with two delay chains and a pulse generation circuit. Each analogue delay chain is composed of multiple analogue delay cells connected in series; each cell uses six MOS transistors and distinguishes "0" from "1" by the magnitude of its delay. Analogue computation replaces the accumulation of a traditional digital circuit design. The accumulator works stably over a wide voltage range and is simple to implement, effectively reducing the power consumption of binary neural network accumulation and greatly improving the energy efficiency of the neural network circuit.

Description

A Binary Neural Network Accumulator Circuit Based on an Analog Delay Chain

Technical field
The invention relates to a binary neural network accumulator circuit based on an analog delay chain, i.e. a circuit that uses mixed-signal (digital-analog) techniques to implement neural network accumulation, and belongs to the technical field of basic electronic circuits.
Background
In recent years, artificial intelligence technology has demonstrated unique advantages in image recognition, face detection, speech recognition, word processing, and game playing. In developed countries, artificial intelligence has become a development priority; the most prominent progress has come in the field of deep learning. The research practice of companies such as Baidu, Google, Microsoft, and Facebook shows that deep learning can reach or exceed human-level performance in image perception. One of the main challenges in deploying deep learning networks is that the large number of operations consumes too much energy and too many hardware resources.
Mainstream neural network architectures use a 32-bit data width, with a trend toward 16-bit, 8-bit, or even 4-bit. To reduce power consumption, dynamic precision adjustment can be used to select the operating bit width according to the needs of the application; more aggressive than 4-bit is a 2-bit width, and the most extreme is 1 bit. When the bit width becomes 1 bit, the neural network becomes a special network: the Binary Neural Network (BNN).
Power consumption is a major bottleneck limiting the application of neural networks, and binary neural networks are an important direction in the "miniaturization" of neural networks. Two parts of a neural network can be binarized: the network's coefficients and its intermediate results. By reducing single-precision floating-point coefficients to +1 or -1, coefficient binarization shrinks storage to 1/32 (about 3%) of the original. If the intermediate results are also binarized, floating-point arithmetic can be replaced by integer bit operations, since most calculations are performed on values of ±1. Compared with a non-binarized network, a binary neural network turns a large number of arithmetic operations into bit operations, greatly reducing the amount of computation and storage and lowering the barrier to applying neural networks.
Because of this special arithmetic, the multiplication of a binary neural network is equivalent to an XNOR operation, so in an actual chip implementation an XNOR gate can realize the multiplication of the binary neural network. The overall operation of the binary neural network consists of multiplication and accumulation, and the number of 1s in the accumulated result determines the final output. The accumulated result can therefore be evaluated by analog computation.
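As an illustration of this mapping (hypothetical helper names, not code from the patent), a minimal Python sketch of a binarized multiply-accumulate: ±1 values are stored as 0/1 bits, multiplication becomes XNOR, and the output sign is decided by counting 1s:

```python
# Illustrative sketch: store -1 as bit 0 and +1 as bit 1, so the product
# of two +/-1 values equals the XNOR of their bit encodings.

def xnor(a: int, b: int) -> int:
    """XNOR of two bits: 1 when they are equal, 0 when they differ."""
    return 1 - (a ^ b)

def bnn_neuron(weights, inputs):
    """Binarized multiply-accumulate: XNOR each weight/input pair,
    then output the sign of the sum by comparing counts of 1s and 0s."""
    products = [xnor(w, x) for w, x in zip(weights, inputs)]
    ones = sum(products)                  # accumulation = counting the 1s
    return 1 if ones > len(products) - ones else 0

# Consistency with +/-1 arithmetic: (-1)*(-1) = +1 maps to xnor(0, 0) = 1,
# and (+1)*(-1) = -1 maps to xnor(1, 0) = 0.
```

The final comparison of 1s against 0s is exactly the judgment the delay-chain accumulator performs in the analog domain.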
The invention is mainly used for the accumulation calculation of the binary neural network, thereby reducing the power consumption of neural network computation. The analog delay unit designed in this application controls the delay from A to Y through the state of the data terminal D.
Summary of the invention
The purpose of the present invention is to address the shortcomings of the above background art and provide a binary neural network accumulator circuit based on an analog delay chain, which replaces traditional digital accumulation with analog computation, effectively reducing the power consumption of binary neural network accumulation, realizing energy-efficient accumulation for binary neural networks, and solving the technical problem that the energy consumption of binary neural network accumulation needs to be reduced.
To achieve the above purpose, the present invention adopts the following technical solution:
A binary neural network system based on an analog delay chain, comprising a delay chain module and a pulse generating circuit. The delay chain module consists of two delay chains and a D flip-flop, where each delay chain is composed of N analog delay units. The analog delay unit uses 6 MOS transistors and distinguishes input data "0" from "1" by the difference in delay time. The delay chain connects the N analog delay units in series, accumulating the multi-bit input data and determining the number of "1"s.
The design method of the binary neural network accumulator circuit based on the analog delay chain of the present invention includes the following steps:
(1) Analog delay unit design: first complete the transistor sizing of the analog delay unit, then draw its layout according to the layout design rules for digital standard cells;
(2) Delay chain module: after the analog delay unit is designed, join it with cells from the standard cell library to complete the delay chain module.
The delay unit is composed of 3 NMOS transistors, 1 PMOS transistor, and an inverter. The peripheral input data A is connected to the gates of the PMOS transistor and the first NMOS transistor, and the peripheral input data D is connected to the gate of the second NMOS transistor. The source of the first NMOS transistor and the drains of the second and third NMOS transistors are connected at node n; the sources of the second and third NMOS transistors are grounded. The drains of the PMOS transistor and the first NMOS transistor are connected at node m, which serves as the input of the inverter; the inverter output is the delayed output signal. The source of the PMOS transistor and the gate of the third NMOS transistor are both connected to the power supply.
The delay chain module is composed of two delay chains and a D flip-flop, each delay chain consisting of n analog delay units. The data input terminals of the analog delay units are connected to the peripheral input data; the signal output terminals of delay chain 1 and delay chain 2 are connected to the data terminal and clock terminal of the D flip-flop, respectively; the clock signal output of the pulse generating circuit is connected to the clock signal input of the delay chain module; and the output signal of the D flip-flop is the judgment signal for the delay comparison.
The input signals of the delay chain module are a clock input signal and n peripheral input data; the output signal is a data output flag. The n peripheral input data are connected to the data input terminals of the n delay units, the clock input signal is applied to the signal input of the first delay unit, and the output terminal of each delay unit is connected to the input terminal of the next. The output of the nth delay unit of delay chain 1 is connected to the D terminal of the D flip-flop, the output of the nth delay unit of delay chain 2 is connected to the clock (CLK) terminal of the D flip-flop, and the output signal Flag of the D flip-flop is the delay flag signal.
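The race between the two chains can be sketched behaviorally in Python. The per-cell delay values below are arbitrary placeholders, not figures from the patent; the only property the model relies on is that a cell is faster when its data bit is 1:

```python
# Behavioral model of the two-chain race. T_FAST / T_SLOW are assumed
# placeholder delays: a cell is fast when its data bit is 1 (M3 and M4
# discharge node n in parallel) and slow when it is 0.
T_FAST, T_SLOW = 1.0, 2.0

def chain_delay(bits):
    """Total propagation delay of an n-cell delay chain."""
    return sum(T_FAST if d == 1 else T_SLOW for d in bits)

def flag(data_bits, ref_bits):
    """D flip-flop decision: Flag = 1 when delay chain 1 (wired to the
    flip-flop's data terminal) beats reference chain 2 (wired to its
    clock terminal), i.e. when the data word holds more 1s."""
    y1 = chain_delay(data_bits)
    y2 = chain_delay(ref_bits)
    return 1 if y1 < y2 else 0
```

With equal chain lengths, Y1 arrives first exactly when chain 1 holds more 1s than the reference, so Flag reports whether the accumulated count exceeds the configured reference count.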
The overall operation of the binary neural network includes multiplication and accumulation, and the number of 1s in the accumulated result determines the final output. In the actual computation, only the binarized accumulation result is needed, i.e. whether the accumulated result is greater or less than 0, so analog computation can be used to judge the accumulated result.
The present invention adopts the above technical solution and has the following beneficial effects: it realizes the neural network's accumulation by analog computation, converting digital signals into analog (time-domain) signals, which effectively reduces the overall power consumption of the chip and works stably over a wide voltage range; at the same time, the proposed delay unit has a small area overhead, so a high power saving can be obtained.
Description of the drawings
Figure 1 is a structural diagram of the delay unit of the present invention.
Figure 2 is a working timing diagram of the delay unit of the present invention.
Figure 3 is a circuit diagram of the analog delay chain of the present invention.
Figure 4 is an overall structure diagram of the delay chain module of the present invention.
Figure 5 is a working timing diagram of the delay chain module of the present invention.
Figure 6 is an HSPICE simulation timing diagram of the delay chain module of the present invention.
Figure 7 is a circuit diagram of the pulse generating circuit of the present invention.
Figure 8 is a working timing diagram of the pulse generating circuit of the present invention.
Detailed description
The technical solution of the invention is described in detail below with reference to the accompanying drawings, but the protection scope of the invention is not limited to the described embodiments.
The delay unit of the present invention is shown in Figure 1 and is composed of 3 NMOS transistors, 1 PMOS transistor, and an inverter. The peripheral input data A is connected to the gates of PMOS transistor M1 and NMOS transistor M2; the peripheral input data D is connected to the gate of NMOS transistor M3; the source of NMOS transistor M2 and the drains of NMOS transistors M3 and M4 are connected at node n; the sources of NMOS transistors M3 and M4 are grounded; the source of PMOS transistor M1 and the gate of NMOS transistor M4 are both connected to the power supply; and the drains of PMOS transistor M1 and NMOS transistor M2 are connected at node m as the input of inverter U1, whose output is the delayed output signal.
The working timing diagram of the delay unit of the present invention is shown in Figure 2. The data input terminal D controls whether MOS transistor M3 conducts. When D is "1", transistor M3 is on, and when input A changes from "0" to "1", node n discharges through transistors M3 and M4 in parallel. When D is "0", transistor M3 is off, and when A changes from "0" to "1", node n can only discharge through transistor M4, increasing the delay from A to Y. The delay from A to Y can therefore be controlled through the data input terminal D.
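A toy RC view of this cell (with assumed, purely illustrative component values) shows why enabling M3 shortens the A-to-Y delay: the two discharge paths act as parallel resistances at node n:

```python
# Toy RC model of the delay cell. R_M3, R_M4, C_NODE are assumed,
# purely illustrative values; only the parallel-resistance effect
# described in the text is being modeled.
R_M3 = 10e3     # assumed on-resistance of M3 (ohms)
R_M4 = 10e3     # assumed on-resistance of M4 (ohms)
C_NODE = 5e-15  # assumed capacitance at node n (farads)

def discharge_delay(d: int) -> float:
    """Approximate the A->Y delay as the RC constant of node n:
    with D=1, M3 and M4 discharge in parallel; with D=0, only M4."""
    r = R_M4 if d == 0 else (R_M3 * R_M4) / (R_M3 + R_M4)
    return r * C_NODE

# D=0 disables M3, so the discharge is slower and the delay larger.
assert discharge_delay(0) > discharge_delay(1)
```

This delay contrast between D=1 and D=0 is the per-cell quantity the chain sums, turning a bit count into a propagation time.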
As shown in Figure 3, the analog delay chain circuit of the present invention includes two parts: a delay chain module and a pulse generating circuit. The delay chain module is composed of n delay units and a D flip-flop. The data input terminals D of the delay units are connected to the peripheral input data; the signal output terminals Y1 and Y2 of delay chain 1 and delay chain 2 are connected to the data terminal and clock terminal of the D flip-flop, respectively; the clock signal output of the pulse generating circuit is connected to the clock signal input of the delay chain module; and the output signal Flag of the D flip-flop is the judgment signal for the delay comparison.
The overall structure of the delay chain module is shown in Fig. 4. The neural network weights W1, W2, ..., Wn and the input data X1, X2, ..., Xn are combined by XNOR operations, and the results D1, D2, ..., Dn are applied to the data inputs of the delay chain module. The module consists of two delay chains and a D flip-flop, each chain comprising n analog delay units. The inputs of delay chain 1 are the XNOR results D1, D2, ..., Dn of the weights and the image data; delay chain 2 is the reference chain, whose inputs are configured according to the computation required by each layer of the network. The outputs Y1 and Y2 of delay chain 1 and delay chain 2 are connected to the data input and clock input of the D flip-flop, respectively; the clock output of the pulse generating circuit is connected to the clock input of the delay chain module; and the D flip-flop output Flag is the decision signal for the delay comparison. During training, the data of each layer of the network is standardized so that the outputs follow the normal distribution N(0, 1), i.e., batch normalization (BN). Batch normalization is computed as shown in Equation 1.1:
y = γ · (x − μ_B) / √(σ_B² + ε) + β        (1.1)
where γ and β are the scaling factor and bias coefficient, parameters learned during training that apply an affine transformation to the activations so that the original input can be recovered; x is the input data set; μ_B is the mean of the input data set; σ_B is its standard deviation; and ε is a parameter added to keep the denominator from being zero, usually a very small positive constant.
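Equation 1.1 can be checked with a short numerical sketch (NumPy-based; the data values are arbitrary illustrative inputs, not from the patent):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Eq. 1.1: y = gamma * (x - mu_B) / sqrt(sigma_B^2 + eps) + beta
    mu_b = x.mean()
    var_b = x.var()
    return gamma * (x - mu_b) / np.sqrt(var_b + eps) + beta

x = np.array([1.0, 2.0, 3.0, 4.0])
y = batch_norm(x, gamma=1.0, beta=0.0)
# With gamma = 1 and beta = 0 the output is normalized to (approximately)
# zero mean and unit standard deviation, i.e. N(0, 1).
```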
A binarized neural network binarizes both the values in its weight matrices and all activation values (to +1 or −1). Because of this special form of computation, its batch normalization can be simplified; the batch normalization of a binarized network is computed as shown in Equation 1.2:
sign(γ · (x − μ_B) / √(σ_B² + ε) + β) = sign(x − b),  where b = μ_B − (β / γ) · √(σ_B² + ε), γ > 0        (1.2)
After this transformation, the batch normalization of the binarized network is absorbed into the bias. The bias value can therefore be applied directly to reference delay chain 2, whose inputs are configured according to the network training results.
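The folding of batch normalization into a single bias can be illustrated numerically. This sketch assumes γ > 0, and all parameter values are hypothetical, not trained results:

```python
import math

gamma, beta = 0.8, 0.3           # assumed trained BN parameters, gamma > 0
mu_b, var_b, eps = 5.0, 4.0, 1e-5

# Folded bias: sign(gamma*(x - mu_B)/sqrt(var_B + eps) + beta) == sign(x - b)
b = mu_b - beta * math.sqrt(var_b + eps) / gamma

def bn_sign(x):
    # Sign of the full batch-normalized activation (Eq. 1.1 followed by sign).
    return 1 if gamma * (x - mu_b) / math.sqrt(var_b + eps) + beta >= 0 else -1

def folded_sign(x):
    # Sign after folding BN into the single bias b (Eq. 1.2).
    return 1 if x - b >= 0 else -1
```

Because only the sign of the activation survives binarization, comparing the accumulation result against the folded bias b is equivalent to applying the full batch normalization, which is why the bias can be loaded onto reference delay chain 2.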
The operating timing diagram of the delay chain module is shown in Fig. 5. To compare the numbers of "1"s in delay chain 1 and delay chain 2, it suffices to compare the order in which signals Y1 and Y2 arrive; Y1 and Y2 are connected to the data input and clock input of the D flip-flop, respectively. In the first clock cycle, delay chain 1 contains more "1"s than delay chain 2, so Y1 arrives first and the D flip-flop captures "1". In the second clock cycle, delay chain 1 contains fewer "1"s than delay chain 2, so Y2 arrives first and the D flip-flop captures "0".
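The comparison mechanism can be modeled behaviorally. As before, the per-unit delay values are illustrative assumptions; a "1" bit selects the faster parallel discharge path:

```python
T_FAST, T_SLOW = 10e-12, 15e-12  # assumed per-unit delays for D = 1 / D = 0

def chain_delay(bits):
    # Total propagation delay of a chain: each '1' contributes the fast delay.
    return sum(T_FAST if b else T_SLOW for b in bits)

def flag(chain1_bits, chain2_bits):
    # Flag = 1 iff Y1 arrives before Y2, i.e. chain 1 holds more '1's
    # than the reference chain 2.
    return 1 if chain_delay(chain1_bits) < chain_delay(chain2_bits) else 0
```

The D flip-flop thus acts as an arbiter: the race between the two chains converts the popcount comparison into a single captured bit.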
The HSPICE simulation timing diagram of the delay chain module is shown in Fig. 6. When the number of "1"s in delay chain 1 is smaller than in delay chain 2, signal Y2 arrives first and the D flip-flop captures Flag = "0"; when it is larger, signal Y1 arrives first and the D flip-flop captures Flag = "1".
The pulse generating circuit of the present invention is shown in Fig. 7. It consists of three NAND gates, an inverter, and a configurable delay module; configuring the delay module with different delays adjusts the pulse width.
The operating timing diagram of the pulse generating circuit is shown in Fig. 8. Its principle is as follows: when CLK is low, nodes X and Qb are both high and node Y remains low. When CLK goes from low to high, node Qb first falls from high to low, which drives node Y from low to high and node X from high to low; node Qb then returns from low to high. The time taken to complete this sequence is the width of the generated pulse, so the pulse width is determined jointly by the delay module and the delays of the three NAND gates.
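As a first-order sketch, the pulse width can be estimated as the sum of the configurable delay and the gate delays along the feedback loop. Both `t_delay` and `t_nand` are hypothetical values, and the single-traversal assumption is an approximation, not a statement from the patent:

```python
def pulse_width(t_delay, t_nand, n_nand=3):
    # First-order estimate: the Qb -> Y -> X -> Qb sequence traverses the
    # configurable delay module once and each of the three NAND gates once.
    return t_delay + n_nand * t_nand

# Enlarging the configurable delay widens the generated pulse.
```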
In a concrete implementation, to demonstrate the advantage in computational power consumption, the design was compared with a conventional adder structure (synthesized from the full adders in the standard cell library provided by the foundry). The accumulation of 64 single-bit values was implemented with both the conventional digital adder structure and the structure designed here, and the number of "1"s after accumulation was determined. Table 1 compares the two structures: for the same 64 single-bit accumulation, the proposed design saves 57% of the power and improves performance by 33.3%.
Table 1. Comparison of the conventional 64-bit single-bit digital accumulation structure with this design (0.81 V, 125 °C, SS corner)
Figure PCTCN2019114252-appb-000003
Although the present invention has been shown and described above with reference to certain preferred embodiments, this should not be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

  1. An analog delay unit, characterized in that a digital input signal controls the delay of a clock input signal, comprising: a PMOS transistor (M1), a first NMOS transistor (M2), a second NMOS transistor (M3), a third NMOS transistor (M4), and an inverter (U1); the gate of the PMOS transistor (M1) and the gate of the first NMOS transistor (M2) are connected together to the clock input signal; the drain of the PMOS transistor (M1) and the drain of the first NMOS transistor (M2) are connected together to the input of the inverter (U1); the gate of the second NMOS transistor (M3) is connected to the digital input signal; the drain of the second NMOS transistor (M3) and the drain of the third NMOS transistor (M4) are connected together to the source of the first NMOS transistor (M2); the source of the PMOS transistor (M1) and the gate of the third NMOS transistor (M4) are both connected to the power supply; and the source of the second NMOS transistor (M3) and the source of the third NMOS transistor (M4) are connected together to ground.
  2. An analog delay chain, characterized in that it is formed by connecting in series a plurality of analog delay units according to claim 1, the digital signal input of each subsequent analog delay unit being connected to the output of the preceding analog delay unit.
  3. A binarized neural network accumulator circuit, characterized by comprising two analog delay chains according to claim 2 and a D flip-flop; the clock signal inputs of the two analog delay chains receive the same pulsed clock signal; the digital data input of each analog delay unit in the first analog delay chain receives the convolution result of the binarized neural network layer's weight parameters and input feature map data; the digital data input of each delay unit in the second analog delay chain receives a reference value corresponding to the computation result of each convolution unit in the binarized neural network layer; the data input of the D flip-flop is connected to the output of the first analog delay chain, and the clock input of the D flip-flop is connected to the output of the second analog delay chain; the D flip-flop compares the arrival order of the output signals of the two analog delay chains and outputs a flag signal.
  4. The binarized neural network accumulator circuit according to claim 3, characterized in that the convolution result of the binarized neural network layer's weight parameters and the input feature map data is obtained by performing an XNOR operation on the weight data and the input feature map data; the digital data input of each analog delay unit in the first analog delay chain is connected to the output of an XNOR gate, the two inputs of which are connected to the weight data and the input feature map data of one convolution unit, respectively.
  5. The binarized neural network accumulator circuit according to claim 3, characterized in that the reference value of the computation result of each convolution unit in the binarized neural network layer is the bias value of each network layer obtained by training.
  6. The binarized neural network accumulator circuit according to claim 3, characterized in that the same pulsed clock signal applied to the clock signal inputs of the two analog delay chains is provided by a pulse generating circuit, the pulse generating circuit comprising first to third NAND gates, a delay module, and an inverter; the two inputs of the first NAND gate are connected to the clock signal and the output of the third NAND gate, respectively; the input of the delay module and the input of the inverter are both connected to the output of the first NAND gate; the two inputs of the second NAND gate are connected to the output of the delay module and the output of the third NAND gate, respectively; the two inputs of the third NAND gate are connected to the output of the second NAND gate and the clock signal, respectively; and the inverter outputs the pulsed clock signal to the clock signal inputs of the two analog delay chains.
PCT/CN2019/114252 2019-07-01 2019-10-30 Binary neural network accumulator circuit based on analogue delay chain WO2021000469A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910584269.1 2019-07-01
CN201910584269.1A CN110428048B (en) 2019-07-01 2019-07-01 Binaryzation neural network accumulator circuit based on analog delay chain

Publications (1)

Publication Number Publication Date
WO2021000469A1 true WO2021000469A1 (en) 2021-01-07

Family

ID=68409900

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114252 WO2021000469A1 (en) 2019-07-01 2019-10-30 Binary neural network accumulator circuit based on analogue delay chain

Country Status (2)

Country Link
CN (1) CN110428048B (en)
WO (1) WO2021000469A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115167093A (en) * 2022-07-20 2022-10-11 星汉时空科技(长沙)有限公司 Time interval precision measurement method and system based on FPGA
CN116720468A (en) * 2023-06-12 2023-09-08 南京邮电大学 Method for constructing unit library time sequence model by combining neural network

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115051700A (en) * 2021-03-09 2022-09-13 长鑫存储技术(上海)有限公司 Interleaved signal generating circuit
EP4203319A1 (en) 2021-03-09 2023-06-28 Changxin Memory Technologies, Inc. Interleaved signal generating circuit
EP4203316A4 (en) 2021-03-09 2024-08-14 Changxin Memory Tech Inc Signal output circuit and delay signal output circuit

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155699A (en) * 1990-04-03 1992-10-13 Samsung Electronics Co., Ltd. Divider using neural network
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN107657312A (en) * 2017-09-18 2018-02-02 东南大学 Towards the two-value real-time performance system of voice everyday words identification
CN110414677A (en) * 2019-07-11 2019-11-05 东南大学 It is a kind of to deposit interior counting circuit suitable for connect binaryzation neural network entirely

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761153B (en) * 2005-11-04 2010-05-05 清华大学 High-speed master-slave type D trigger in low power consumption
US20150269482A1 (en) * 2014-03-24 2015-09-24 Qualcomm Incorporated Artificial neural network and perceptron learning using spiking neurons
CN107194462B (en) * 2016-03-15 2020-05-19 清华大学 Three-value neural network synapse array and neuromorphic computing network system using same
WO2019032870A1 (en) * 2017-08-09 2019-02-14 Google Llc Accelerating neural networks in hardware using interconnected crossbars
CN109635943B (en) * 2018-12-13 2022-03-18 佛山眼图科技有限公司 Digital-analog hybrid neuron circuit

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115167093A (en) * 2022-07-20 2022-10-11 星汉时空科技(长沙)有限公司 Time interval precision measurement method and system based on FPGA
CN115167093B (en) * 2022-07-20 2024-02-20 星汉时空科技(长沙)有限公司 Time interval precise measurement method and system based on FPGA
CN116720468A (en) * 2023-06-12 2023-09-08 南京邮电大学 Method for constructing unit library time sequence model by combining neural network
CN116720468B (en) * 2023-06-12 2024-01-19 南京邮电大学 Method for constructing unit library time sequence model by combining neural network

Also Published As

Publication number Publication date
CN110428048B (en) 2021-11-09
CN110428048A (en) 2019-11-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19936231

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19936231

Country of ref document: EP

Kind code of ref document: A1
