WO2021027214A1 - Spiking convolutional neural network based on a FLASH storage and computing array - Google Patents

Spiking convolutional neural network based on a FLASH storage and computing array

Info

Publication number
WO2021027214A1
WO2021027214A1, PCT/CN2019/126343, CN2019126343W
Authority
WO
WIPO (PCT)
Prior art keywords
flash
neural network
convolutional neural
layer
module
Prior art date
Application number
PCT/CN2019/126343
Other languages
English (en)
French (fr)
Inventor
黄鹏
项亚臣
康晋峰
刘晓彦
韩润泽
Original Assignee
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学 filed Critical 北京大学
Priority to US17/619,395 (published as US20220414427A1)
Publication of WO2021027214A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/065 Analogue means
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/023 Free address space management
    • G06F 12/0238 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F 12/0246 Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory, in block erasable memory, e.g. flash memory

Definitions

  • The present disclosure relates to the field of semiconductor devices and integrated circuits, and in particular to a spiking (pulse-type) convolutional neural network based on a FLASH storage and computing array.
  • Deep learning has achieved great success in image processing and speech recognition, and has been widely used in areas such as autonomous driving and security monitoring.
  • As an important component of deep learning, the convolutional neural network's performance improvement is of great significance to the further development of deep learning.
  • A storage-computing integrated array (storage and computing array) built on FLASH can perform matrix-vector multiplication in parallel, unifying storage and computation and thereby accelerating computation at the hardware level.
  • However, such storage-computing integrated structures introduce a new problem: the large additional hardware overhead of peripheral circuits, especially analog-to-digital/digital-to-analog converters.
  • a pulsed convolutional neural network based on a FLASH storage and calculation array including: a sampling module, a FLASH based storage and calculation array and its corresponding neuron module, and a counter module;
  • the sampling module is used to sample an input image to obtain an input pulse
  • the FLASH-based storage and calculation array stores a weight matrix, which performs a vector matrix multiplication operation on the input pulse and the weight matrix, and the operation result is output in the form of current;
  • the neuron module integrates the operation result of the FLASH-based storage and calculation array to generate an output pulse
  • the counter module counts the number of pulses generated by the neuron module of the output layer, and uses the number of pulses of the neuron module with the largest number of pulses as the recognition result.
  • FIG. 1 is a schematic structural diagram of a pulse-type convolutional neural network based on a FLASH storage and arithmetic array according to an embodiment of the disclosure
  • Figure 2(a) is a fully connected layer based on FLASH storage and computing array
  • Figure 2(b) is a convolutional layer and pooling layer based on FLASH storage and computing array
  • Figure 3 is a schematic diagram of the structure of the neuron module
  • Figure 4 is a schematic diagram of the structure of the counter module.
  • the present invention uses the number of pulses to represent specific numerical information, that is, the input and output of each layer in the convolutional neural network are expressed in binary (1/0).
  • Such a hardware implementation converts the intermediate values of every layer of the convolutional neural network into binary form, eliminating the analog-to-digital/digital-to-analog converters; this effectively removes the hardware overhead of the peripheral circuits and simplifies the hardware implementation of the storage-computing integrated structure.
  • the first embodiment of the present disclosure provides a pulse-type convolutional neural network based on a FLASH storage and arithmetic array.
  • The convolutional neural network includes an input layer, multiple hidden layers and an output layer.
  • The hidden layers include multiple convolutional layers, multiple pooling layers, and one or more fully connected layers.
  • the convolutional layer and the pooling layer perform feature extraction and feature compression on the input data, and the fully connected layer processes the feature images extracted by the convolutional layer and the pooling layer, and outputs the classification or recognition results.
  • the embodiments of the present disclosure provide a pulse-type convolutional neural network based on a FLASH storage and calculation array.
  • As shown in FIG. 1, the spiking convolutional neural network includes a sampling module, multiple layers of FLASH-based storage and computing arrays with their corresponding neuron modules, and a counter module.
  • The sampling module samples the input image using Poisson sampling or Gaussian sampling to obtain binary input pulses.
  • Each layer of FLASH-based storage and computing array together with its corresponding neuron modules corresponds to one layer of the convolutional neural network; that is, a FLASH-based storage and computing array with its neuron modules may serve as the input layer, a convolutional layer, a pooling layer, a fully connected layer or the output layer.
  • The FLASH-based storage and computing array of each layer receives the output pulses of the neuron modules of the previous layer, and the output pulses of the neuron modules of this layer serve as the input of the FLASH-based storage and computing array of the next layer.
  • the FLASH-based storage array performs vector matrix multiplication operations on the input pulse and the weight matrix stored in the storage array, thereby realizing convolution, pooling, and full connection operations at the hardware level, and the operation results are output in the form of current.
  • the neuron module integrates the calculation result (current) of the FLASH storage and calculation array of this layer.
  • When the voltage obtained by the integration exceeds the preset threshold, the pulse generating circuit is triggered to generate a pulse.
  • The neuron module outputs a pulse, and the integrated voltage of the neuron module is then reset to its initial state.
  • When the integrated voltage does not exceed the preset threshold, the pulse generating circuit is not triggered and the neuron module does not output a pulse.
  • the neuron module generates a pulse sequence (1/0) as the output pulse through the above-mentioned method, and as the input pulse for the next layer of the FLASH-based storage and calculation array.
  • Each node of the output layer includes a counter module, that is, each neuron module as the output layer is connected to a counter module.
  • The counter module counts and records the number of pulses generated by each output-layer neuron module during the entire recognition process. Since a single sampling pass cannot guarantee that the input image is sampled completely, the spiking convolutional neural network based on the FLASH storage and computing array of this embodiment performs recognition over many passes; that is, the sampling-computation-integration procedure is repeated throughout the recognition process (an end-to-end behavioural sketch in Python follows this list).
  • At the end of the recognition, the counter modules of the output layer compare the pulse counts of the output-layer neuron modules, and the pulse count of the neuron module with the largest count is the recognition result.
  • the FLASH-based storage and calculation array includes: multiple FLASH cells, multiple word lines, multiple source lines, multiple bit lines, and multiple subtractors.
  • a plurality of FLASH cells form a storage array, wherein the gates of each column of FLASH cells are connected to the same word line, the sources are connected to the same source line, and the drains of each row of FLASH cells are connected to the same bit line.
  • the number of word lines corresponds to the number of columns in the storage array, and input pulses are input to the FLASH unit through the word lines.
  • the number of source lines corresponds to the number of columns of the storage array, and the source lines are all connected to a fixed driving voltage Vds, which is applied to the source of the FLASH unit.
  • the number of bit lines corresponds to the number of rows in the storage array and is used to output the signal of the drain of the FLASH unit.
  • Each bit line superimposes the drain signals of the FLASH cells in every column of its row and outputs the superimposed drain signal as the output signal. That is, the drains of the FLASH cells in each row are connected to the same bit line, and the total current on the bit line is the sum of the output values of the FLASH cells in every column of that row.
  • Figure 2(a) shows the FLASH-based storage and calculation array of the fully connected layer.
  • the input pulse (1/0) is input to the word line in the form of voltage, and is multiplied and accumulated by the weight matrix stored in the FLASH storage array to generate a sum current along the bit line.
  • Figure 2(b) shows the FLASH-based storage and calculation array of the convolutional layer and the pooling layer.
  • The k×k FLASH cells on every two adjacent bit lines store a k×k convolution kernel. The advantage of this arrangement is that computation is parallel: the result of a convolution or pooling operation can be read out directly from the bit lines in a single step.
  • the threshold voltage of the FLASH cell can be set by programming and erasing.
  • When a FLASH cell is programmed, hot electrons are injected and its threshold voltage rises; its storage state is regarded as "0", i.e., the FLASH cell stores the data "0".
  • When a FLASH cell is erased, electrons tunnel out and its threshold voltage drops; its storage state is regarded as "1", i.e., the FLASH cell stores the data "1". Thus, by programming and erasing, a FLASH cell can store the two values "0" and "1".
  • A FLASH cell in storage state "0" is used to represent a "0" of a binary weight, and a FLASH cell in storage state "1" is used to represent a "1" of a binary weight; a storage and computing array composed of multiple FLASH cells can therefore represent the weight matrix.
  • the source lines of the FLASH units are all connected to a fixed driving voltage Vds.
  • the input pulse is input to the FLASH unit via the word line.
  • For a "0" in the input pulse, 0 V is applied to the gate of the FLASH cell through the word line; the ratio of the cell's drain output current to the reference current is then 0, and this drain output current is the product of the input "0" and the data stored in the cell ("0" or "1").
  • For a "1" in the input pulse, Vg is applied to the gate of the FLASH cell through the word line, and the drain of the cell outputs the current "1", i.e., the product of the input "1" and the data stored in the cell.
  • Connect the drains of multiple FLASH units together to output, and the "sum current” reflects the result of the multiplication of the input vector and the matrix stored in the FLASH array, and realizes the matrix vector multiplication operation.
  • Each bit line superimposes the drain signals of the FLASH cells in every column of its row and outputs the superimposed drain signal, the "sum current", as the output signal; that is, the total current on the bit line is the sum of the output signals of the FLASH cells in every column of that row and reflects the result of multiplying the input vector by the weight matrix stored in the FLASH storage and computing array.
  • the neuron module includes: operational amplifier, comparator, pulse generating circuit, reset switch, input resistance, integrating capacitor, and parallel resistance.
  • Each neuron module corresponds to a subtractor based on the FLASH storage array.
  • the negative terminal of the operational amplifier is connected to the output terminal of the subtractor through an input resistance, and its positive terminal is grounded.
  • a reset switch, a parallel resistor and an integrating capacitor are connected in parallel between its negative terminal and its output terminal, and its output terminal is connected to one input terminal of the comparator.
  • Another input terminal of the comparator inputs a preset threshold value, and its output terminal is connected with a reset switch and a pulse generating circuit.
  • the current output from the subtractor of the FLASH-based storage and calculation array is input to the operational amplifier, and the integrating capacitor integrates the current.
  • the comparator compares the output voltage obtained by integration with the preset threshold voltage. If the output voltage exceeds the threshold voltage, the comparator triggers the pulse generating circuit to output pulses, and the reset switch is triggered through the feedback of the comparator to set the neuron module to the initial status. If the output voltage does not exceed the threshold voltage, the comparator will not trigger the pulse generating circuit, and the pulse generating circuit will not output pulses.
  • Figure 4 shows the counter module, which is built from an N-bit shift register.
  • the input terminal of the counter module is connected to the pulse generating circuit of an output-layer neuron module to receive the pulses output by that circuit, and its output terminals are Q0, ..., QN-2, QN-1.
  • the counter of each neuron module counts the number of pulses output by the neuron module, and the number of output pulses corresponding to the neuron module with the largest number of output pulses is the recognition result of the neural network.
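Read together, the items above describe one recognition as many repeated rounds of sampling, in-array multiplication, integration and counting. The following Python sketch ties these modules together behaviourally; the network sizes, thresholds and random weights are hypothetical, and the binary matrix product merely stands in for the FLASH bit-line currents.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_sample(image):
    """One sampling round: pixel intensity in [0, 1] becomes a binary pulse."""
    return (rng.random(image.shape) < image).astype(np.float32)

def integrate_and_fire(current, state, threshold):
    """Accumulate the bit-line current; emit a pulse and reset when the threshold is crossed."""
    state = state + current
    pulses = (state >= threshold).astype(np.float32)
    state = np.where(pulses > 0, 0.0, state)      # reset fired neurons
    return pulses, state

# Hypothetical two-layer network with binary {0, 1} weights (flattened 8x8 input, 10 classes).
W1 = rng.integers(0, 2, size=(64, 32)).astype(np.float32)
W2 = rng.integers(0, 2, size=(32, 10)).astype(np.float32)

image = rng.random(64)                 # stand-in for a normalized input image
states = [np.zeros(32), np.zeros(10)]  # integrator voltages of the two neuron-module layers
counts = np.zeros(10)                  # output-layer spike counters

for _ in range(100):                   # sampling-computation-integration repeated many times
    pulses = poisson_sample(image)
    for i, W in enumerate((W1, W2)):
        current = pulses @ W           # bit-line "sum current" of this layer's FLASH array
        pulses, states[i] = integrate_and_fire(current, states[i], threshold=0.5 * W.shape[0])
    counts += pulses                   # counter modules on the output layer

print("recognition result (class with most pulses):", int(np.argmax(counts)))
```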

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

A spiking convolutional neural network based on a FLASH storage and computing array, comprising: a sampling module, FLASH-based storage and computing arrays with their corresponding neuron modules, and a counter module. The sampling module samples an input image to obtain input pulses. The FLASH-based storage and computing array stores a weight matrix and performs a vector-matrix multiplication between the input pulses and the weight matrix, the result being output as a current. The neuron module integrates the result of the FLASH-based storage and computing array to generate output pulses. The counter module counts the pulses produced by the neuron modules of the output layer and takes the pulse count of the neuron module with the largest count as the recognition result.

Description

Spiking convolutional neural network based on a FLASH storage and computing array

Technical field
The present disclosure relates to the field of semiconductor devices and integrated circuits, and in particular to a spiking convolutional neural network based on a FLASH storage and computing array.
Background
Deep learning has achieved great success in image processing and speech recognition and is widely used in fields such as autonomous driving and security monitoring. As an important component of deep learning, the convolutional neural network's performance improvement is of great significance to the further development of deep learning. A storage-computing integrated array (storage and computing array) built on FLASH can perform matrix-vector multiplication in parallel, unifying storage and computation and thereby accelerating computation at the hardware level. However, such storage-computing integrated structures introduce a new problem: the large additional hardware overhead of peripheral circuits, especially analog-to-digital/digital-to-analog converters.
Summary of the disclosure
According to one aspect of the present disclosure, a spiking convolutional neural network based on a FLASH storage and computing array is provided, comprising: a sampling module, FLASH-based storage and computing arrays with their corresponding neuron modules, and a counter module;
the sampling module is configured to sample an input image to obtain input pulses;
the FLASH-based storage and computing array stores a weight matrix and performs a vector-matrix multiplication between the input pulses and the weight matrix, the result being output as a current;
the neuron module integrates the result of the FLASH-based storage and computing array to generate output pulses;
the counter module counts the pulses produced by the neuron modules of the output layer and takes the pulse count of the neuron module with the largest count as the recognition result.
To make the above objects, features and advantages of the present disclosure more readily understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present disclosure and should not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 is a schematic structural diagram of a spiking convolutional neural network based on a FLASH storage and computing array according to an embodiment of the present disclosure;
FIG. 2(a) shows a fully connected layer based on a FLASH storage and computing array; FIG. 2(b) shows a convolutional layer and a pooling layer based on a FLASH storage and computing array;
FIG. 3 is a schematic structural diagram of the neuron module;
FIG. 4 is a schematic structural diagram of the counter module.
Detailed description
The present invention represents numerical information by the number of pulses, i.e., the inputs and outputs of each layer of the convolutional neural network are expressed in binary (1/0). This hardware implementation converts the intermediate values of every layer of the convolutional neural network into binary form, eliminating the analog-to-digital/digital-to-analog converters, which effectively removes the hardware overhead of the peripheral circuits and simplifies the hardware implementation of the storage-computing integrated structure.
To make the objects, technical solutions and advantages of the present disclosure clearer, the present disclosure is further described in detail below with reference to specific embodiments and the accompanying drawings. Some, but not all, embodiments are shown. Indeed, the various embodiments of the present disclosure can be implemented in many different forms and should not be construed as limited to the embodiments set forth herein. The embodiments of the present disclosure and the features of the embodiments may be combined with one another provided there is no conflict.
A first embodiment of the present disclosure provides a spiking convolutional neural network based on a FLASH storage and computing array. The convolutional neural network comprises an input layer, a plurality of hidden layers and an output layer. The hidden layers comprise multiple convolutional layers, multiple pooling layers, and one or more fully connected layers. The convolutional and pooling layers perform feature extraction and feature compression on the input data, and the fully connected layers process the feature maps extracted by the convolutional and pooling layers and output the classification or recognition result.
An embodiment of the present disclosure provides a spiking convolutional neural network based on a FLASH storage and computing array. As shown in FIG. 1, the spiking convolutional neural network comprises a sampling module, multiple layers of FLASH-based storage and computing arrays with their corresponding neuron modules, and a counter module.
The sampling module samples the input image using Poisson sampling or Gaussian sampling to obtain binary input pulses.
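The disclosure names Poisson or Gaussian sampling but does not spell out the procedure. A common software model of Poisson-style rate coding, given here only as an illustrative assumption, fires a binary pulse in each round with probability equal to the normalized pixel intensity, so that the average over many rounds approaches the pixel value:

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_encode(image, rounds):
    """Rate coding: in each round a pixel fires a 1 with probability equal to its
    normalized intensity, so the mean over many rounds approximates the pixel value."""
    image = np.clip(image, 0.0, 1.0)
    return (rng.random((rounds,) + image.shape) < image).astype(np.uint8)

pixels = np.array([0.1, 0.5, 0.9])        # hypothetical normalized gray levels
spikes = poisson_encode(pixels, rounds=1000)
print(spikes[:5])                         # first five binary input-pulse vectors
print(spikes.mean(axis=0))                # close to [0.1, 0.5, 0.9]
```

Gaussian sampling could be modelled analogously by thresholding Gaussian noise centred on the pixel value.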
Each layer of FLASH-based storage and computing array together with its corresponding neuron modules corresponds to one layer of the convolutional neural network; that is, a FLASH-based storage and computing array with its neuron modules may serve as the input layer, a convolutional layer, a pooling layer, a fully connected layer or the output layer. The FLASH-based storage and computing array of each layer receives the output pulses of the neuron modules of the previous layer, and the output pulses of the neuron modules of the current layer serve as the input of the FLASH-based storage and computing array of the next layer.
The FLASH-based storage and computing array performs a vector-matrix multiplication between the input pulses and the weight matrix stored in the array, thereby realizing convolution, pooling and fully connected operations at the hardware level; the result of the operation is output as a current.
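Functionally, each bit line sums the drain currents of the cells whose word lines carry a "1", and the subtractors take the difference of adjacent bit lines. A minimal numerical model of this readout (the unit cell current and the 4x4 example array below are hypothetical) is:

```python
import numpy as np

def flash_vmm(pulses, cell_states, i_cell=1.0):
    """Model of one FLASH storage and computing array readout.

    pulses      : binary input vector applied to the word lines (one entry per column)
    cell_states : binary matrix of stored data, rows = bit lines, columns = word lines
    i_cell      : assumed drain current of one erased cell driven with Vg ("1" times "1")
    Returns the summed current on every bit line.
    """
    return i_cell * (cell_states @ pulses)

def subtract_pairs(bitline_currents):
    """Adjacent bit lines hold positive/negative weights; subtractors take their difference."""
    pos = bitline_currents[0::2]
    neg = bitline_currents[1::2]
    return pos - neg

pulses = np.array([1, 0, 1, 1])                    # input pulses on 4 word lines
cells = np.array([[1, 0, 1, 0],                    # bit line 0 (positive weights)
                  [0, 1, 0, 1],                    # bit line 1 (negative weights)
                  [1, 1, 1, 0],                    # bit line 2 (positive weights)
                  [0, 0, 0, 1]])                   # bit line 3 (negative weights)

currents = flash_vmm(pulses, cells)
print(currents)                   # per-bit-line sum currents: [2. 1. 2. 1.]
print(subtract_pairs(currents))   # subtractor outputs fed to the neuron modules: [1. 1.]
```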
The neuron module integrates the result (current) of the FLASH storage and computing array of its layer. When the integrated voltage exceeds a preset threshold, the pulse generating circuit is triggered to generate a pulse, the neuron module outputs one pulse, and the integrated voltage of the neuron module is then reset to its initial state. When the integrated voltage does not exceed the preset threshold, the pulse generating circuit is not triggered and the neuron module outputs no pulse. In this way the neuron module produces a pulse train (1/0) as its output, which serves as the input pulses of the next layer's FLASH-based storage and computing array.
Each node of the output layer includes a counter module, i.e., every neuron module serving as an output-layer node is connected to a counter module. The counter module counts and records the number of pulses generated by each output-layer neuron module during the entire recognition process. Since a single sampling pass cannot guarantee that the input image is sampled completely, the spiking convolutional neural network based on the FLASH storage and computing array of this embodiment performs recognition over many passes; that is, the sampling-computation-integration procedure is carried out many times over the course of a recognition. At the end of the recognition process, the counter modules of the output layer compare the pulse counts of the output-layer neuron modules, and the pulse count of the neuron module with the largest count is the recognition result.
As shown in FIG. 2, the FLASH-based storage and computing array comprises a plurality of FLASH cells, a plurality of word lines, a plurality of source lines, a plurality of bit lines and a plurality of subtractors.
The FLASH cells form the storage and computing array, in which the gates of the FLASH cells in each column are connected to the same word line, their sources are connected to the same source line, and the drains of the FLASH cells in each row are connected to the same bit line.
The number of word lines corresponds to the number of columns of the array; the input pulses are fed to the FLASH cells through the word lines.
The number of source lines corresponds to the number of columns of the array; all source lines are connected to a fixed driving voltage Vds, which is applied to the sources of the FLASH cells.
The number of bit lines corresponds to the number of rows of the array and is used to output the drain signals of the FLASH cells. Each bit line sums the drain signals of the FLASH cells of all the columns in its row and outputs the summed drain signal as the output signal. That is, the drains of the FLASH cells in a row are all connected to the same bit line, and the total current on the bit line is the sum of the output values of the FLASH cells of every column in that row.
FIG. 2(a) shows the FLASH-based storage and computing array of a fully connected layer. The input pulses (1/0) are applied to the word lines as voltages, multiplied by the weight matrix stored in the FLASH storage and computing array and accumulated, generating a summed current along each bit line. FIG. 2(b) shows the FLASH-based storage and computing array of a convolutional layer and a pooling layer. The k×k FLASH cells on every two adjacent bit lines store a k×k convolution kernel. The advantage of this arrangement is that computation is parallel: the result of a convolution or pooling operation can be read out directly from the bit lines in a single step.
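To illustrate the kernel-on-adjacent-bit-lines arrangement in software, the sketch below unrolls each k×k input patch onto the word lines and reads the convolution output as the current difference of a positive and a negative bit line. The im2col-style patch extraction and the signed example kernel are assumptions for illustration, not circuit details taken from the disclosure:

```python
import numpy as np

def im2col(image, k):
    """Flatten every k x k patch of a 2-D image into one row (stride 1, no padding)."""
    h, w = image.shape
    patches = [image[i:i + k, j:j + k].ravel()
               for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(patches)

def conv_on_flash(image, kernel):
    """Convolution as parallel array reads: the kernel is split into positive and
    negative parts stored on two adjacent 'bit lines', output = current difference."""
    k = kernel.shape[0]
    w_pos = np.clip(kernel, 0, None).ravel()       # cells on the positive bit line
    w_neg = np.clip(-kernel, 0, None).ravel()      # cells on the negative bit line
    patches = im2col(image, k)                     # binary input pulses, one patch per read
    i_pos = patches @ w_pos                        # summed current on the positive bit line
    i_neg = patches @ w_neg                        # summed current on the negative bit line
    out_size = image.shape[0] - k + 1
    return (i_pos - i_neg).reshape(out_size, out_size)

rng = np.random.default_rng(2)
binary_image = rng.integers(0, 2, size=(5, 5))     # one round of sampled input pulses
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])                    # hypothetical 3 x 3 signed kernel
print(conv_on_flash(binary_image, kernel))
```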
The threshold voltage of a FLASH cell can be set by programming and erasing. When a FLASH cell is programmed, hot electrons are injected and its threshold voltage rises; its storage state is regarded as "0", i.e., the cell stores the data "0". When a FLASH cell is erased, electrons tunnel out and its threshold voltage drops; its storage state is regarded as "1", i.e., the cell stores the data "1". Thus, by programming and erasing, a FLASH cell can store the two values "0" and "1". By converting the weights of the convolutional neural network's weight matrix into binary numbers, representing each "0" of a binary weight by a FLASH cell in state "0" and each "1" by a FLASH cell in state "1", a storage and computing array composed of multiple FLASH cells can represent the weight matrix.
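A trained weight matrix must therefore be reduced to these two cell states. One straightforward mapping, given as an assumption since the disclosure only states that the weights are converted to binary, is to binarize the signed weights and store the positive part on the bit line wired to the subtractor's positive terminal and the negative part on the adjacent bit line:

```python
import numpy as np

def binarize_weights(weights):
    """Map real-valued weights to signed binary {-1, 0, +1}; weights with |w| < 0.5 map to 0."""
    return np.sign(np.round(weights)).astype(int)

def to_cell_states(signed_binary):
    """Split signed binary weights onto two bit lines of erased ('1') / programmed ('0') cells."""
    pos_line = (signed_binary > 0).astype(int)   # cells storing the positive weights
    neg_line = (signed_binary < 0).astype(int)   # cells storing the negative weights
    return pos_line, neg_line

weights = np.array([0.8, -1.2, 0.1, -0.4])       # hypothetical trained weights of one neuron
signed = binarize_weights(weights)               # -> [ 1 -1  0  0]
pos_line, neg_line = to_cell_states(signed)
print(signed, pos_line, neg_line)
# The effective weight seen by the subtractor is pos_line - neg_line = signed.
```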
In the FLASH-based storage and computing array of this embodiment, the source lines of the FLASH cells are all connected to the fixed driving voltage Vds, and the input pulses are fed to the FLASH cells via the word lines. For a "0" in the input pulse, 0 V is applied to the gate of the FLASH cell through the word line; the ratio of the cell's drain output current to the reference current is then 0, and this drain output current is the product of the input "0" and the data stored in the cell ("0" or "1"). For a "1" in the input pulse, Vg is applied to the gate through the word line and the drain of the cell outputs the current "1", i.e., the product of the input "1" and the data stored in the cell. The drains of multiple FLASH cells are connected together at the output, and the summed current reflects the result of multiplying the input vector by the matrix stored in the FLASH array, thereby realizing the matrix-vector multiplication.
Each bit line sums the drain signals of the FLASH cells of all the columns in its row and outputs the summed drain signal (the summed current) as the output signal. That is, the total current on the bit line is the sum of the output signals of the FLASH cells of every column in that row and reflects the result of multiplying the input vector by the weight matrix stored in the FLASH storage and computing array.
As shown in FIG. 3, the neuron module comprises an operational amplifier, a comparator, a pulse generating circuit, a reset switch, an input resistor, an integrating capacitor and a parallel resistor.
Each neuron module corresponds to one subtractor of the FLASH-based storage and computing array. The negative input of the operational amplifier is connected through the input resistor to the output of the subtractor, and its positive input is grounded; the reset switch, the parallel resistor and the integrating capacitor are connected in parallel between its negative input and its output, and its output is connected to one input of the comparator. The other input of the comparator receives the preset threshold, and its output is connected to the reset switch and to the pulse generating circuit.
The current output by the subtractor of the FLASH-based storage and computing array is fed into the operational amplifier, and the integrating capacitor integrates this current. The comparator compares the integrated output voltage with the preset threshold voltage. If the output voltage exceeds the threshold voltage, the comparator triggers the pulse generating circuit to output a pulse and, through its feedback, closes the reset switch to return the neuron module to its initial state. If the output voltage does not exceed the threshold voltage, the comparator does not trigger the pulse generating circuit and no pulse is output.
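Behaviourally, the operational amplifier with its integrating capacitor and parallel resistor acts as a leaky integrator of the subtractor current, and the comparator with the reset switch implements threshold-and-reset. The discrete-time model below illustrates this behaviour; the R, C, time-step and threshold values are made-up illustrative numbers, not component values from the disclosure:

```python
import numpy as np

def neuron_step(v, i_in, dt=1e-6, c=1e-9, r_p=1e6, v_th=1.0):
    """One time step of the leaky integrator: dV/dt = I/C - V/(R_p * C).
    Returns (pulse, new_voltage); the reset switch zeroes the voltage after a pulse."""
    v = v + dt * (i_in / c - v / (r_p * c))
    if v >= v_th:                 # comparator output goes high
        return 1, 0.0             # pulse generating circuit fires, reset switch closes
    return 0, v

v = 0.0
pulses = []
currents = 2e-4 * np.ones(50)     # constant subtractor current of 0.2 mA (hypothetical)
for i_in in currents:
    spike, v = neuron_step(v, i_in)
    pulses.append(spike)
print(pulses)                     # a regular pulse train whose rate tracks the input current
```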
FIG. 4 shows the counter module, which is built from an N-bit shift register. The input of the counter module is connected to the pulse generating circuit of an output-layer neuron module to receive the pulses it outputs, and its outputs are Q0, ..., QN-2, QN-1. The counter of each neuron module counts the pulses output by that neuron module; the output pulse count of the neuron module with the largest count is the recognition result of the neural network.
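The counting logic of the N-bit shift register is not detailed in the disclosure. One plausible behavioural reading, offered purely as an assumption, is that each incoming pulse shifts a "1" into the register, so the number of set outputs Q0...QN-1 equals the number of pulses received:

```python
def shift_register_count(pulse_train, n_bits=16):
    """Assumed model: every '1' pulse shifts a 1 into an n-bit shift register,
    so the count of set bits equals the number of received pulses (up to n_bits)."""
    q = [0] * n_bits
    for pulse in pulse_train:
        if pulse:
            q = [1] + q[:-1]          # shift a 1 in at Q0
    return q, sum(q)

# Hypothetical pulse trains of three output-layer neuron modules over one recognition run.
trains = {"class_0": [1, 0, 1, 1, 0, 1],
          "class_1": [0, 1, 0, 0, 1, 0],
          "class_2": [1, 1, 1, 1, 0, 1]}
counts = {name: shift_register_count(t)[1] for name, t in trains.items()}
print(counts)                                               # {'class_0': 4, 'class_1': 2, 'class_2': 5}
print("recognition result:", max(counts, key=counts.get))   # class_2
```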
The detailed description above has set out numerous embodiments of the above-described spiking convolutional neural network by means of schematic diagrams, flowcharts and/or examples. Where such diagrams, flowcharts or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation in such diagrams, flowcharts or examples can be implemented, individually and/or collectively, by a wide variety of structures, hardware, software, firmware or virtually any combination thereof.
Unless there is a technical obstacle or contradiction, the various embodiments of the present disclosure described above may be freely combined to form further embodiments, all of which fall within the scope of protection of the present disclosure.
Although the present disclosure has been described with reference to the accompanying drawings, the embodiments disclosed in the drawings are intended to illustrate preferred implementations of the present disclosure by way of example and should not be construed as limiting it. The dimensional proportions in the drawings are merely schematic and should not be construed as limiting the present disclosure.
Although some embodiments of the general inventive concept of the present disclosure have been shown and described, those of ordinary skill in the art will understand that changes may be made to these embodiments without departing from the principles and spirit of the disclosed concept; the scope of the present disclosure is defined by the claims and their equivalents.

Claims (11)

  1. A spiking convolutional neural network based on a FLASH storage and computing array, characterized by comprising: a sampling module, FLASH-based storage and computing arrays with their corresponding neuron modules, and a counter module;
    the sampling module is configured to sample an input image to obtain input pulses;
    the FLASH-based storage and computing array stores a weight matrix and performs a vector-matrix multiplication between the input pulses and the weight matrix, the result of the operation being output as a current;
    the neuron module integrates the operation result of the FLASH-based storage and computing array to generate output pulses;
    the counter module counts the number of pulses generated by the neuron modules of the output layer, and takes the pulse count of the neuron module having the largest number of pulses as the recognition result.
  2. The spiking convolutional neural network according to claim 1, wherein the sampling module is configured to sample the input image using Poisson sampling or Gaussian sampling to obtain the input pulses.
  3. The spiking convolutional neural network according to claim 1, wherein each layer of the spiking convolutional neural network comprises the FLASH-based storage and computing array, and the FLASH-based storage and computing array comprises: a plurality of FLASH cells, a plurality of word lines, a plurality of source lines, a plurality of bit lines and a plurality of subtractors;
    the FLASH cells form the storage and computing array, the gates of the FLASH cells in each column being connected to the same word line and their sources to the same source line, and the drains of the FLASH cells in each row being connected to the same bit line; the positive terminal and the negative terminal of each subtractor are connected to two adjacent bit lines, respectively.
  4. The spiking convolutional neural network according to claim 3, wherein
    the number of word lines corresponds to the number of columns of the storage and computing array, the input pulses being fed to the FLASH cells through the word lines;
    the number of source lines corresponds to the number of columns of the storage and computing array, the source lines all being connected to a fixed driving voltage;
    the number of bit lines corresponds to the number of rows of the storage and computing array, each bit line superimposing the drain signals of the FLASH cells in the columns of its row and outputting the superimposed drain signal as the output signal.
  5. The spiking convolutional neural network according to claim 3, wherein the FLASH cells store weight values of the convolutional neural network, and the FLASH-based storage and computing array stores the weight matrix of the convolutional neural network.
  6. The spiking convolutional neural network according to claim 5, wherein when the FLASH cell is programmed, its storage state is regarded as "0", and when the FLASH cell is erased, its storage state is regarded as "1".
  7. The spiking convolutional neural network according to claim 5, wherein the FLASH cells on the bit line connected to the positive terminal of the subtractor store positive weight values, and the FLASH cells on the bit line connected to the negative terminal of the subtractor store negative weight values.
  8. The spiking convolutional neural network according to claim 1, wherein the neuron module comprises: a comparator, a pulse generating circuit, a reset switch and an integrating capacitor;
    the integrating capacitor integrates the operation result, and the comparator compares the output voltage obtained by integration with a preset threshold voltage; if the output voltage exceeds the threshold voltage, the comparator triggers the pulse generating circuit to output a pulse and triggers the reset switch through the comparator's feedback to return the neuron module to its initial state; if the output voltage does not exceed the threshold voltage, the comparator does not trigger the pulse generating circuit and the pulse generating circuit outputs no pulse.
  9. The spiking convolutional neural network according to claim 1, wherein the spiking convolutional neural network comprises: an input layer, a plurality of hidden layers and an output layer;
    the hidden layers comprise multiple convolutional layers, multiple pooling layers, and one or more fully connected layers.
  10. The spiking convolutional neural network according to claim 9, wherein a node of at least one of the input layer, the convolutional layer, the pooling layer, the fully connected layer and the output layer comprises: a FLASH-based storage and computing array and the corresponding neuron module.
  11. The spiking convolutional neural network according to claim 10, wherein the neuron module of each node of the output layer is connected to one said counter module.
PCT/CN2019/126343 2019-08-12 2019-12-18 Spiking convolutional neural network based on FLASH storage and computing array WO2021027214A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/619,395 US20220414427A1 (en) 2019-08-12 2019-12-18 Spiking convolutional neural network based on flash storage and computing array

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910741894.2A CN110543933B (zh) 2019-08-12 2022-10-21 Spiking convolutional neural network based on FLASH storage and computing array
CN201910741894.2 2019-08-12

Publications (1)

Publication Number Publication Date
WO2021027214A1 true WO2021027214A1 (zh) 2021-02-18

Family

ID=68710806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126343 WO2021027214A1 (zh) 2019-08-12 2019-12-18 Spiking convolutional neural network based on FLASH storage and computing array

Country Status (3)

Country Link
US (1) US20220414427A1 (zh)
CN (1) CN110543933B (zh)
WO (1) WO2021027214A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543933B (zh) 2019-08-12 2022-10-21 北京大学 Spiking convolutional neural network based on FLASH storage and computing array
CN113033759A (zh) * 2019-12-09 2021-06-25 南京惟心光电系统有限公司 Spiking convolutional neural network algorithm, integrated circuit, computing device and storage medium
CN113033792A (zh) * 2019-12-24 2021-06-25 财团法人工业技术研究院 Neural network computing device and method
CN111611529B (zh) * 2020-04-03 2023-05-02 深圳市九天睿芯科技有限公司 Multi-bit convolution operation module based on current integration and charge sharing with variable capacitance
CN111144558B (zh) * 2020-04-03 2020-08-18 深圳市九天睿芯科技有限公司 Multi-bit convolution operation module based on time-variable current integration and charge sharing
CN111611528B (zh) * 2020-04-03 2023-05-02 深圳市九天睿芯科技有限公司 Multi-bit convolution operation module based on current integration and charge sharing with variable current value
CN114186676A (zh) * 2020-09-15 2022-03-15 深圳市九天睿芯科技有限公司 In-memory spiking neural network based on current integration
CN112148669A (zh) * 2020-10-01 2020-12-29 北京知存科技有限公司 Spiking storage-computing integrated chip and electronic device
EP4283522A1 (en) * 2021-04-02 2023-11-29 Huawei Technologies Co., Ltd. Spiking neural network circuit and spiking neural network-based calculation method
CN112992232B (zh) * 2021-04-28 2021-08-17 中科院微电子研究所南京智能技术研究院 Multi-bit positive-negative single-bit in-memory computing unit, array and device
WO2023240578A1 (zh) * 2022-06-17 2023-12-21 北京大学 Operation method, apparatus and device for an in-memory computing architecture applied to neural networks
CN115083462B (zh) * 2022-07-14 2022-11-11 中科南京智能技术研究院 Digital in-memory computing device based on SRAM

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805270A (zh) * 2018-05-08 2018-11-13 华中科技大学 Memory-based convolutional neural network system
WO2019020384A1 (fr) * 2017-07-25 2019-01-31 Commissariat A L'energie Atomique Et Aux Energies Alternatives Computer for spiking neural network with maximum aggregation
CN109800876A (zh) * 2019-01-18 2019-05-24 合肥恒烁半导体有限公司 Data operation method for a neural network based on a NOR Flash module
CN109816026A (zh) * 2019-01-29 2019-05-28 清华大学 Fusion structure and method of convolutional neural network and spiking neural network
CN110543933A (zh) * 2019-08-12 2019-12-06 北京大学 Spiking convolutional neural network based on FLASH storage and computing array

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8515885B2 (en) * 2010-10-29 2013-08-20 International Business Machines Corporation Neuromorphic and synaptronic spiking neural network with synaptic weights learned using simulation
CN107239824A (zh) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for implementing a sparse convolutional neural network accelerator
CN108985447B (zh) * 2018-06-15 2020-10-16 华中科技大学 Hardware spiking neural network system
CN109460817B (zh) * 2018-09-11 2021-08-03 华中科技大学 On-chip learning system for convolutional neural networks based on non-volatile memory

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019020384A1 (fr) * 2017-07-25 2019-01-31 Commissariat A L'energie Atomique Et Aux Energies Alternatives Computer for spiking neural network with maximum aggregation
CN108805270A (zh) * 2018-05-08 2018-11-13 华中科技大学 Memory-based convolutional neural network system
CN109800876A (zh) * 2019-01-18 2019-05-24 合肥恒烁半导体有限公司 Data operation method for a neural network based on a NOR Flash module
CN109816026A (zh) * 2019-01-29 2019-05-28 清华大学 Fusion structure and method of convolutional neural network and spiking neural network
CN110543933A (zh) * 2019-08-12 2019-12-06 北京大学 Spiking convolutional neural network based on FLASH storage and computing array

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG YAN, XIAN-FA XU: "Image Segmentation Method Based on Improved Pulse Coupled Neural Network", COMPUTER SCIENCE, 1 July 2019 (2019-07-01), pages 258 - 262, XP055779627, ISSN: 1102-137X *

Also Published As

Publication number Publication date
US20220414427A1 (en) 2022-12-29
CN110543933B (zh) 2022-10-21
CN110543933A (zh) 2019-12-06

Similar Documents

Publication Publication Date Title
WO2021027214A1 (zh) Spiking convolutional neural network based on FLASH storage and computing array
Sun et al. Fully parallel RRAM synaptic array for implementing binary neural network with (+ 1,− 1) weights and (+ 1, 0) neurons
CN109800876B (zh) 一种基于NOR Flash模块的神经网络的数据运算方法
US20180005115A1 (en) Accelerated neural network training using a pipelined resistive processing unit architecture
US11042715B2 (en) Electronic system for performing a multiplication of a matrix and vector
CN111052153B (zh) 使用半导体存储元件的神经网络运算电路及动作方法
US10783432B2 (en) Update management for RPU array
CN111898329B (zh) 基于铁电晶体管FeFET的卷积计算方法
US11922169B2 (en) Refactoring mac operations
JP7475080B2 (ja) 曖昧検索回路
US11309026B2 (en) Convolution operation method based on NOR flash array
US11366874B2 (en) Analog circuit for softmax function
US11907380B2 (en) In-memory computation in homomorphic encryption systems
US20050160130A1 (en) Arithmetic circuit
CN110597487B (zh) 一种矩阵向量乘法电路及计算方法
CN114418080A (zh) 存算一体运算方法、忆阻器神经网络芯片及存储介质
US20220318612A1 (en) Deep neural network based on flash analog flash computing array
Cao et al. Performance analysis of convolutional neural network using multi-level memristor crossbar for edge computing
EP3999957A1 (en) A method for interfacing with hardware accelerators
US11049000B2 (en) Distributed state via cascades of tensor decompositions and neuron activation binding on neuromorphic hardware
CN113988279A (zh) 一种支持负值激励的存算阵列输出电流读出方法及系统
CN114861900A (zh) 用于忆阻器阵列的权重更新方法和处理单元
CN114861902A (zh) 处理单元及其操作方法、计算芯片
Zhou et al. DCBAM: A discrete chainable bidirectional associative memory
CN116523011B (zh) 基于忆阻的二值神经网络层电路及二值神经网络训练方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19941153

Country of ref document: EP

Kind code of ref document: A1