WO2019227518A1 - A memory-based convolutional neural network system - Google Patents

A memory-based convolutional neural network system

Info

Publication number
WO2019227518A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
layer circuit
circuit module
convolution
flash
Prior art date
Application number
PCT/CN2018/090249
Other languages
English (en)
French (fr)
Inventor
李祎
潘文谦
缪向水
Original Assignee
华中科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology (华中科技大学)
Priority to US16/464,977, granted as US11531880B2
Publication of WO2019227518A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present invention relates to the technical field of artificial neural networks, and more particularly, to a memory-based convolutional neural network system.
  • CNN: Convolutional Neural Network.
  • A convolutional neural network is a special deep neural network model. Its particularity is reflected in two aspects: on the one hand, the connections between its neurons are not fully connected; on the other hand, the weights of the connections between certain neurons in the same layer are shared (i.e., identical). This non-fully-connected, weight-sharing network structure makes it more similar to a biological neural network, reduces the complexity of the network model (which is very important for deep structures that are difficult to train), and reduces the number of weights.
  • An object of the present invention is to solve the technical problem that conventional deep convolutional neural networks always require a large amount of training data and use weight parameters on the order of 10^8, so that fetching data from memory, sending it to the CPU and GPU for computation, and sending the results back to storage becomes quite time-consuming, cannot meet the needs of real-time data processing, and may also incur enormous hardware costs.
  • The present invention provides a memory-based convolutional neural network system, including: an input module, a convolution layer circuit module in which a NOR FLASH array serves as the convolution kernels, a pooling circuit module based on a NOR FLASH array, an activation function module, a fully connected layer circuit module in which a NOR FLASH array serves as the synapses, a softmax function module, and an output module.
  • the convolution kernel value or the synaptic weight value is stored in the NOR FLASH unit;
  • The input module converts the input signal into the voltage signal required by the convolutional neural network and passes the result to the convolution layer circuit module. The convolution layer circuit module performs a convolution operation between the voltage signal corresponding to the input signal and the convolution kernel values stored in the NOR FLASH cells and passes the result to the activation function module. The activation function module activates the signal and passes the result to the pooling layer circuit module. The pooling layer circuit performs a pooling operation between the activated signal and the convolution kernel values stored in the NOR FLASH cells and passes the result to the fully connected layer circuit module. The fully connected layer circuit module multiplies the pooled signal by the synaptic weight values stored in the NOR FLASH cells to implement classification, and passes the classification result to the softmax function module. The softmax function module normalizes the output of the fully connected layer circuit module into probability values and passes the result to the output module as the output of the entire network.
  • the memory used in the present invention is NOR FLASH.
  • Those skilled in the art may also select other types of memory according to actual needs; the present invention is not limited in this respect, and this will not be specifically described again below.
  • The system further includes a weight processing module, which connects the convolution layer circuit module, the pooling layer circuit module, and the fully connected layer circuit module. The weight processing module includes a conversion module and a driving module. The conversion module performs the corresponding matrix conversion mapping on the convolution kernel values or synaptic weight values and converts the convolution kernel matrix or weight value matrix into the threshold characteristics of NOR FLASH: if a matrix element is 1 or -1, the NOR FLASH threshold voltage is adjusted to a low threshold and the threshold adjustment signal is set to 1; if the element is 0, the NOR FLASH threshold voltage is adjusted to a high threshold and the threshold adjustment signal is set to 0. The threshold adjustment signal is sent to the driving module, which, according to the signal, sends pulses to the convolution layer circuit module, the pooling layer circuit module, and the fully connected layer circuit module to adjust the NOR FLASH thresholds, that is, to store the convolution kernel matrix elements or synaptic weight values in the NOR FLASH.
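As a minimal software sketch of the conversion rule above (illustrative only; the function name and the string representation of the threshold level are assumptions, not part of the patent):

```python
def threshold_adjust(element):
    """Map one convolution-kernel/weight matrix element to a
    (threshold level, adjustment signal) pair, following the rule
    stated for the conversion module."""
    if element in (1, -1):
        return ("low", 1)    # low threshold voltage, signal set to 1
    if element == 0:
        return ("high", 0)   # high threshold voltage, signal set to 0
    raise ValueError("this scheme only handles elements -1, 0 and 1")

kernel_row = [1, 0, -1]
signals = [threshold_adjust(k)[1] for k in kernel_row]   # [1, 0, 1]
```

The driving module would then translate each adjustment signal into programming or erasing pulses for the corresponding NOR FLASH cell.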
  • The convolution kernel values and synaptic weight values are obtained through offline learning. In an offline learning algorithm there is a training set and a test set; the training set contains multiple training instances, each drawn independently and identically from the instance space according to a probability distribution. The goal is to construct a classifier based on this training set; the trained model is then applied to the test set to evaluate its quality.
  • The input module converts an external input signal into the voltage signal required by the convolutional neural network; the input signal and the voltage signal follow a proportional relationship: the larger the value of the input signal, the larger the corresponding voltage signal, and vice versa.
  • The convolution layer circuit module performs a convolution operation between the signal and the convolution kernel values stored in the NOR FLASH cells. The circuit uses two columns of the NOR FLASH array as one convolution kernel: the kernel is converted into two matrices, K+ and K-, and the corresponding input signal X is converted into a one-dimensional matrix. The output terminal of the K+ array is connected to the positive input terminal of the operational amplifier included in the convolution layer circuit module, and the output terminal of the K- array is connected to its negative input terminal, so the effective convolution kernel value is (K+) - (K-); in this way both positive and negative kernel values can be realized. In order to perform the convolution on the input signal in a single step, without the need for a complex storage layer, when the kernel values determined in software are stored into the NOR FLASH cells they are mapped into a matrix that can perform matrix multiplication with the entire input signal; that is, the convolution kernel is expanded into a large sparse matrix.
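The decomposition of a signed kernel K into the non-negative pair (K+, K-) can be sketched as follows (a numerical illustration with a made-up kernel; in the circuit the subtraction is realized by the operational amplifier):

```python
import numpy as np

K = np.array([[1, -1],
              [0,  1]])        # hypothetical signed 2x2 convolution kernel

K_plus = np.maximum(K, 0)      # positive part, mapped to the K+ columns
K_minus = np.maximum(-K, 0)    # magnitude of the negative part, K- columns

# The op-amp output corresponds to (K+) - (K-), the effective kernel.
assert np.array_equal(K_plus - K_minus, K)
```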
  • The gates of the NOR FLASH cells in the same row are connected together to the input module interface, a driving voltage is applied to the NOR FLASH sources, and the drains of the NOR FLASH cells in the same column are connected together to an input terminal of the operational amplifier. The currents in the same column are summed, implementing the addition; collecting the drain-terminal current of the array yields the result of the convolution operation, and the output of the operational amplifier is passed to the activation function module.
  • The pooling layer circuit module mainly performs either an average pooling operation or a maximum pooling operation. Its circuit structure is the same as that of the convolution layer circuit, but the convolution kernel values corresponding to the two operations are different. The NOR FLASH gate terminals of the same row are connected together to the activation function module, a driving voltage is applied to the NOR FLASH source terminals, and the drains of the NOR FLASH cells in the same column are connected together to the input terminals of the operational amplifier included in the pooling layer circuit module; the output collected from the operational amplifier is the result of the pooling operation. The currents on the same column are brought together to implement the addition, and the result obtained by the operational amplifier is passed to the fully connected layer circuit module.
  • The activation functions included in the activation function module mainly include the sigmoid function, the hyperbolic tangent function, and the rectified linear unit (ReLU) function. The activation function module is connected to the convolution layer circuit module and the pooling layer circuit module, respectively; it activates the result of the convolution operation to obtain the output value y, and at the same time converts the output value into a voltage signal to be used as the input of the pooling layer.
  • the fully connected layer circuit module is connected to the pooling layer module and the softmax function module, respectively.
  • The fully connected layer maps the final output to a linearly separable space, that is, it implements the classification function. The weight distribution of the fully connected layer circuit module differs from that of the convolution layer circuit module: the fully connected layer circuit module stores and computes a weight matrix and completes a simple series of multiplication and addition operations of a perceptron network, whereas the convolution operation circuit stores and computes a group of convolution kernel arrays. In order to realize positive and negative weight values, the fully connected layer circuit module uses two NOR FLASH cells as one synapse; the gates of the NOR FLASH cells are connected to the pooling layer output module and the sources are connected to the driving voltage. The drain terminal of the NOR FLASH cell that stores the positive weight value is connected to the positive input terminal of the operational amplifier, and the other drain terminal is connected to the negative input terminal; the output terminal of the operational amplifier is connected to the softmax function module, and the output of the operational amplifier is the operation result of the fully connected layer. If W1 and W2 are the two weight values stored in the NOR FLASH cells, the effective weight value of the synapse is W1 - W2, so positive and negative synaptic weight values can be realized.
  • The signal after the pooling operation is processed by the fully connected layer and then passes through the softmax function module, which normalizes the output value of the fully connected layer to a probability value; the result is then passed to the output module to obtain the output of the entire network. The softmax function module implements the normalization y_i = e^(X_i) / Σ_j e^(X_j), where X_i denotes the i-th element of the input signal X, the denominator sums the exponentials of all elements of the input signal, and y_i is the output probability value corresponding to X_i.
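In software the same normalization reads as follows (a standard softmax implementation; subtracting the maximum is only for numerical stability and does not change the result):

```python
import numpy as np

def softmax(x):
    """y_i = exp(x_i) / sum_j exp(x_j): normalize scores to probabilities."""
    e = np.exp(x - np.max(x))   # shift by the max to avoid overflow
    return e / e.sum()

y = softmax(np.array([2.0, 1.0, 0.1]))
assert np.isclose(y.sum(), 1.0)   # the outputs form a probability distribution
assert y.argmax() == 0            # the largest input keeps the largest probability
```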
  • The weight processing module converts the convolution kernel and synaptic weight values obtained by software simulation into the NOR FLASH working mode and stores them in the NOR FLASH. The input module receives external information; after the information is computed layer by layer through the convolution operations, activation functions, pooling operations, and fully connected layer, the output module produces the final result.
  • In the convolution layer circuit module composed of a NOR FLASH array, two columns of the array are used as one convolution kernel in order to represent positive and negative kernel values. In the fully connected layer circuit module, two NOR FLASH cells are used as one synapse in order to represent positive and negative synaptic weight values. The multi-level threshold adjustment capability of NOR FLASH is used to simulate the continuous adjustment of convolution kernel and synaptic weight values in convolutional neural networks. Computation under the traditional von Neumann architecture is time-consuming, cannot meet the needs of real-time data processing, and also incurs huge hardware costs; the present invention implements a convolutional neural network in hardware, achieving the integration of storage and computation, and can well overcome these disadvantages.
  • FIG. 1 is a schematic structural diagram of a memory-based convolutional neural network system according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a convolution operation principle according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a synapse composed of two NOR FLASH units according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a weight processing module according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of an input module according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a NOR FLASH-based convolution operation layer circuit module according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the mapping of the convolution kernel matrix and the input matrix according to an embodiment of the present invention; FIG. 7(a) shows the conversion of the convolution kernel matrix K into the matrices K+ and K-; FIG. 7(b) shows the conversion of the input matrix X into a one-dimensional matrix.
  • FIG. 8 is a schematic structural diagram of an activation function module according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a pooling layer circuit module according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a fully connected layer circuit module according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a softmax function module according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a memory-based convolutional neural network system according to an embodiment of the present invention.
  • The system includes: a weight processing module, an input module, a convolution layer circuit module in which a NOR FLASH array serves as the convolution kernels, a pooling layer circuit module based on a NOR FLASH array, an activation function module, a fully connected layer circuit module in which a NOR FLASH array serves as the synapses, a softmax function module, and an output module.
  • the weight processing module converts the convolution kernel value or the fully connected layer weight value obtained by the software into the corresponding NOR FLASH threshold, and stores it in the convolution layer circuit module, the pooling layer circuit module, and the fully connected layer circuit module.
  • the input module converts external input information into a voltage signal required by the convolutional neural network.
  • The input signal and the voltage signal follow a proportional mapping relationship: the larger the input signal value, the larger the corresponding voltage, and vice versa. The voltage signal is passed into the convolution layer circuit module.
  • the convolution layer circuit module performs a convolution operation on the input signal and the convolution kernel value stored in the NOR FLASH unit, and passes the result to the activation function module.
  • the activation function module activates the signal and passes the result to the pooling layer circuit module.
  • the pooling layer circuit performs a pooling operation on the activated signal and the value stored in the NOR FLASH unit, and transmits the result to the fully connected layer circuit module.
  • The fully connected layer circuit module multiplies the previously integrated signal by the weight values stored in the NOR FLASH cells to implement classification, and passes the result to the softmax function module, which normalizes the output into probability values; the probability values are then passed to the output module, giving the output of the entire network.
  • In image processing, a convolution kernel, also called a mask, is a matrix of parameters used to perform operations on the original image. The convolution kernel is usually a square grid structure, and each cell in the grid has a weight value.
  • FIG. 3 is a schematic diagram of a synapse composed of two NOR FLASH cells according to an embodiment of the present invention. A NOR FLASH cell changes its threshold voltage through a programming or erasing operation in order to store different logic values: one multiplicand is stored in the cell by programming or erasing, and the other is loaded on the gate; together they determine the on-state of the NOR FLASH cell. A driving voltage is applied to the source terminal, and different current values are obtained according to the different on-states; collecting the drain-terminal current in each column of NOR FLASH cells yields the result of the multiplication operation. Using two NOR FLASH cells as one synapse makes positive and negative weight values achievable: the gates of the two cells are connected together to the voltage signal X corresponding to the input signal, the sources are connected together to the driving voltage, and the two drain terminals are connected respectively to the inputs of the operational amplifier. If the weight values stored by the two NOR FLASH cells are W+ and W- respectively, the effective weight value is (W+) - (W-), which realizes positive and negative weight values, where X is the voltage signal corresponding to the input signal.
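Numerically, the two-cell synapse behaves as the following idealized model (a sketch that treats each drain current as the input times the stored weight and ignores device physics):

```python
def synapse_output(x, w_plus, w_minus):
    """Idealized two-NOR-FLASH-cell synapse: both gates see the same input
    voltage x, the two drain currents go to the positive and negative
    op-amp inputs, so the output is x * ((W+) - (W-))."""
    return x * w_plus - x * w_minus

# A negative effective weight, impossible with a single non-negative cell:
assert synapse_output(2.0, 0.25, 0.75) == 2.0 * (0.25 - 0.75)   # -1.0
```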
  • FIG. 4 is a schematic structural diagram of the weight processing module provided by the present invention, which connects the convolution layer circuit module, the pooling layer circuit module, and the fully connected layer circuit module, and includes a conversion module and a driving module. The conversion module performs the corresponding matrix conversion mapping on the convolution kernel values and synaptic weight values and converts the convolution kernel matrix and the weight value matrix into the threshold characteristics of NOR FLASH: if an element of the convolution kernel matrix or weight value matrix is 1 or -1, the NOR FLASH threshold voltage is adjusted to a low threshold; if the element is 0, the NOR FLASH threshold voltage is adjusted to a high threshold; and the result is sent to the driving module. The driving module receives the threshold adjustment signal sent by the conversion module and, according to the signal, sends pulses to the convolution layer circuit module, the pooling layer circuit module, and the fully connected layer circuit module to adjust the NOR FLASH thresholds, that is, to store the convolution kernel matrix elements or synaptic weight values in the NOR FLASH. The convolution kernel values and synaptic weight values are obtained by offline training.
  • FIG. 5 is a schematic structural diagram of an input module provided in an embodiment of the present invention, which is connected to a convolution layer circuit module.
  • The input module converts external input information into the voltage signal required by the convolutional neural network through a conversion circuit. The input signal and the voltage signal follow a proportional mapping relationship: the larger the input signal value, the larger the corresponding voltage signal, and vice versa; the voltage signal is passed to the convolution layer circuit module.
  • FIG. 6 is a schematic structural diagram of a NOR FLASH-based convolution layer circuit module provided in an embodiment of the present invention, and a black dot represents a NOR FLASH unit provided by an embodiment of the present invention.
  • The input signal in this embodiment of the present invention uses the MNIST handwritten digit database as an example. The data set contains a training set of 60,000 images and a test set of 10,000 images, 70,000 grayscale images of handwritten digits in total, each of 28 × 28 pixels. The figure shows a convolution operation circuit with an input matrix of 28 × 28, a convolution kernel size of 9 × 9, and an output matrix size of 20 × 20. In this circuit, two columns of the NOR FLASH cell array represent one convolution kernel. The NOR FLASH gate terminals in the same row are connected together to the input module interface, a driving voltage is applied to the NOR FLASH source terminals, and the drains of the NOR FLASH cells in the same column are connected together to the operational amplifier. Collecting the current at the drain terminals of the array yields the result of the convolution operation: the currents on the same column are brought together to implement the addition, and the result obtained by the operational amplifier is passed to the activation function module.
  • The convolution kernel is expanded into a large sparse matrix. In the following, the 2 × 2 convolution kernel matrix K and the 3 × 3 input signal matrix X are taken as an example for demonstration. FIG. 7(a) shows how the NOR FLASH-based convolution kernel matrix K is transformed into the matrices K+ and K- using the proposed method; because the kernel is converted into two matrices, the NOR FLASH array can easily represent convolution kernels with positive and negative values. Since the input signal matrix X has 9 elements, each kernel matrix K+ and K- must have 9 rows. FIG. 7(b) shows how the input matrix X is converted into a one-dimensional matrix and multiplied by K+ and K-, respectively. Since the size of K is 2 × 2 and the size of X is 3 × 3, the size of the output feature is 2 × 2; therefore the expanded kernel matrix must have 8 columns, the output values of each pair of columns are connected to the input terminals of an operational amplifier, and four output values are obtained at the outputs of the operational amplifiers.
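The FIG. 7 example can be checked numerically: expanding the kernel into 9-row, 4-column sparse matrices for K+ and K- and multiplying by the flattened input reproduces a direct sliding-window convolution (illustrative code; the kernel values are invented):

```python
import numpy as np

K = np.array([[1, -1],
              [0,  1]], dtype=float)          # hypothetical 2x2 kernel
X = np.arange(9, dtype=float).reshape(3, 3)   # 3x3 input signal

# Direct valid convolution (sliding dot product, as used in CNNs).
direct = np.array([[(K * X[i:i+2, j:j+2]).sum() for j in range(2)]
                   for i in range(2)])

# Expand K+ and K- into 9x4 sparse matrices: column n holds the kernel
# placed at the n-th sliding position, flattened to match the input vector.
Kp, Km = np.maximum(K, 0), np.maximum(-K, 0)
Mp, Mm = np.zeros((9, 4)), np.zeros((9, 4))
for n, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for part, M in ((Kp, Mp), (Km, Mm)):
        patch = np.zeros((3, 3))
        patch[i:i+2, j:j+2] = part
        M[:, n] = patch.flatten()

x = X.flatten()                               # one-dimensional input matrix
result = (x @ Mp - x @ Mm).reshape(2, 2)      # op-amp difference per column pair
assert np.allclose(result, direct)
```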
  • the conversion method shown in Figure 7 is implemented in the weight processing module.
  • FIG. 8 is a schematic structural diagram of an activation function module in the present invention.
  • The activation function f mainly includes the S-type (sigmoid) function, the hyperbolic tangent (tanh) function, and the rectified linear unit (ReLU) function.
  • the activation function module is respectively connected to the convolution layer circuit module and the pooling layer circuit module, activates the result of the convolution operation and obtains an output value y, and converts the output value into a voltage signal for use as an input of the pooling layer.
  • FIG. 9 is a schematic structural diagram of a pooling layer circuit module provided in an embodiment of the present invention, which is mainly divided into an average pooling operation and a maximum pooling operation.
  • The entire picture is divided into several non-overlapping small blocks of the same size; only the maximum value (or the average value) is taken from each small block, the other nodes are discarded, and the original planar structure is maintained in the output.
  • the pooling layer circuit module is connected to the activation function module and the fully connected layer circuit module respectively.
  • The pooling operation is a simpler convolution operation; the circuit structure is the same as that of the convolution layer circuit, but the convolution kernel values change. The NOR FLASH gate terminals are connected together to the input interface, a driving voltage is applied to the NOR FLASH source terminals, and the drains of the NOR FLASH cells in the same column are connected together to the inputs of the operational amplifier. The currents on the same column are brought together to implement the addition; the output of the operational amplifier is the pooling result and is connected to the fully connected layer circuit module.
  • the pooling operation can very effectively reduce the size of the matrix, thereby reducing the parameters in the final fully connected layer.
  • Using a pooling layer both speeds up the computation and helps prevent overfitting.
  • In this embodiment, a pooling operation with a 2 × 2 matrix size is used: the 20 × 20 output matrix of the convolution operation layer becomes a 10 × 10 output matrix, and the NOR FLASH array of the pooling layer circuit module has a size of 801 × 100. The convolution kernel values of the pooling layer are stored into the NOR FLASH array after passing through the weight processing module, and the result of the pooling operation is passed to the fully connected layer circuit module.
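The 2 × 2 average pooling that maps the 20 × 20 feature map to 10 × 10 can be sketched in software as a stride-2 convolution with a fixed kernel whose entries are all 1/4 (an illustration of the principle, not of the circuit):

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2, stride-2 average pooling expressed as a convolution with a
    fixed kernel of value 1/4 (the kernel the pooling layer would store)."""
    h, w = x.shape
    kernel = np.full((2, 2), 0.25)
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = (kernel * x[i:i+2, j:j+2]).sum()
    return out

feature_map = np.random.rand(20, 20)
assert avg_pool_2x2(feature_map).shape == (10, 10)
```

Max pooling would replace the weighted sum with a maximum over each block; the patent covers both variants via different stored kernel values.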
  • FIG. 10 is a schematic structural diagram of the fully connected layer circuit module provided in an embodiment of the present invention; it is connected to the pooling layer circuit module and the softmax function module, respectively. The fully connected layer is also called the feedforward layer. The fully connected layer circuit module is very similar to the convolution operation circuit, but the weight distribution is different: it completes a simple series of multiplication and addition operations of a perceptron network, and this circuit stores the weight matrix, whereas the convolution operation circuit stores a set of convolution kernel arrays. The weight values of the fully connected layer obtained by software are converted into the corresponding NOR FLASH cell threshold values by the weight processing module and stored in the NOR FLASH. The fully connected layer circuit module uses two NOR FLASH cells as one synapse, storing the positive and negative weight values respectively.
  • The input signal is converted into a one-dimensional matrix. The gates of the NOR FLASH cells are connected to the output of the pooling layer, and the sources are connected to the driving voltage. The drain terminal of the NOR FLASH cell that stores the positive weight value is connected to the positive input of the operational amplifier, the output terminal of the operational amplifier is connected to the softmax function module, and the output of the operational amplifier is the operation result of the fully connected layer. If W1 and W2 are the two weight values stored in the NOR FLASH cells, the effective weight value of the synapse is W1 - W2, so positive and negative synaptic weight values can be realized. The output of the pooling layer is 10 × 10, and the final classification has 10 categories, so the NOR FLASH array of the fully connected layer circuit in this embodiment has a size of 101 × 10. After the signal is processed by the fully connected layer, it passes through the softmax function module, and the result is then passed to the output module to obtain the output of the entire network.
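The classification stage (100 pooled inputs, 10 classes, differential weight pairs, softmax on top) can be sketched as follows. The random weights stand in for trained values, and the extra row of the 101 × 10 array is omitted; this is an assumption-laden software illustration, not the patent's circuit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each synapse is a pair of NOR FLASH cells; the effective weight is W1 - W2.
W1 = rng.random((100, 10))
W2 = rng.random((100, 10))

pooled = rng.random((10, 10))      # 10x10 output of the pooling layer
x = pooled.flatten()               # converted into a one-dimensional matrix

logits = x @ W1 - x @ W2           # fully connected layer with signed weights
probs = np.exp(logits - logits.max())
probs /= probs.sum()               # softmax normalization to probabilities

predicted_class = int(probs.argmax())
assert probs.shape == (10,) and np.isclose(probs.sum(), 1.0)
```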
  • FIG. 11 is a schematic structural diagram of the softmax function module in the present invention, which is connected to the fully connected layer circuit module and the output module, respectively. The softmax function module implements the normalization of the output values into probabilities, y_i = e^(X_i) / Σ_j e^(X_j), where X_i denotes the i-th element of the input signal X, the denominator sums the exponentials of all elements of the input signal, and y_i is the output probability value corresponding to X_i.


Abstract

The present invention discloses a memory-based convolutional neural network system, including: an input module, a convolution layer circuit module, a pooling circuit module, an activation function module, a fully connected layer circuit module, and an output module, wherein the convolution kernel values or synaptic weight values are stored in NOR FLASH cells. The input module converts the input signal into the voltage signal required by the convolutional neural network; the convolution layer circuit module performs a convolution operation between the voltage signal corresponding to the input signal and the convolution kernel values and passes the result to the activation function module; the activation function module activates the signal; the pooling layer circuit performs a pooling operation on the activated signal; the fully connected layer circuit module multiplies the pooled signal by the synaptic weight values to implement classification; and the softmax function module normalizes the classification result into probability values as the output of the entire network. The present invention meets the needs of real-time data processing with low hardware cost.

Description

A memory-based convolutional neural network system
This application claims priority to Chinese patent application No. 201810434049.6, entitled "A NOR-FLASH-based convolutional neural network system", filed with the Patent Office of the State Intellectual Property Office of China on May 8, 2018, the entire contents of which are incorporated herein by reference.
[Technical Field]
The present invention relates to the technical field of artificial neural networks, and more particularly to a memory-based convolutional neural network system.
[Background Art]
The network structure of the convolutional neural network (CNN) was first proposed by Fukushima in 1980, but because its training algorithm was difficult to implement it did not see widespread use. In the 1990s, LeCun et al. applied a gradient-based learning algorithm to CNNs and obtained very good results, after which researchers further refined the CNN and achieved strong results in the field of image recognition. More recently, Ciresan et al. proposed multi-layer CNNs to recognize digits, letters, Chinese characters and traffic signs. CNNs were also designed to mimic human visual processing, and they have a highly optimized structure for processing two-dimensional images; moreover, a CNN can effectively learn to extract and abstract 2D features. Beyond the common advantages of deep neural networks, the CNN has further desirable properties. It is a special deep neural network model whose special character shows in two respects: the connections between its neurons are not fully connected, and the connection weights between certain neurons in the same layer are shared (i.e., identical). Its non-fully-connected, weight-sharing topology makes it more similar to a biological neural network, reduces the complexity of the network model (which is very important for deep structures that are hard to train) and reduces the number of weights.
However, most software convolutional neural network algorithms run on CPUs and GPUs under the traditional von Neumann architecture. Since a deep convolutional neural network always needs a large amount of training data and uses on the order of 10^8 weight parameters, fetching data from memory, sending it to the CPU and GPU for computation and sending the results back to storage becomes quite time-consuming; it cannot meet the demand for real-time data processing and may also incur an enormous hardware cost. With the rapid rise of artificial intelligence, we need efficient hardware implementations of convolutional neural networks — brain-inspired computing architectures beyond the von Neumann architecture that integrate storage and computation — so as to support software algorithms at higher speed and lower hardware cost.
[Summary of the Invention]
In view of the defects of the prior art, the object of the present invention is to solve the technical problem that existing deep convolutional neural networks always need a large amount of training data and use on the order of 10^8 weight parameters, so that fetching data from memory, sending it to the CPU and GPU for computation and sending the results back to storage becomes quite time-consuming, cannot meet the demand for real-time data processing, and may also incur an enormous hardware cost.
To achieve the above object, the present invention provides a memory-based convolutional neural network system, comprising: an input module, a convolutional layer circuit module built from NOR FLASH arrays acting as convolution kernels, a NOR-FLASH-array-based pooling circuit module, an activation function module, a fully connected layer circuit module built from NOR FLASH arrays acting as synapses, a softmax function module and an output module, with the convolution kernel values or synaptic weight values stored in NOR FLASH cells.
The input module converts the input signal into the voltage signals required by the convolutional neural network and passes the result to the convolutional layer circuit module. The convolutional layer circuit module performs a convolution operation between the voltage signals corresponding to the input signal and the kernel values stored in the NOR FLASH cells and passes the result to the activation function module. The activation function module activates the signal and passes the result to the pooling layer circuit module. The pooling layer circuit performs a pooling operation between the activated signal and the kernel values stored in the NOR FLASH cells and passes the result to the fully connected layer circuit module. The fully connected layer circuit module multiplies the pooled signal by the synaptic weight values stored in the NOR FLASH cells to perform classification and passes the classification result to the softmax function module. The softmax function module normalizes the output of the fully connected layer circuit module into probability values and passes the result to the output module as the output of the whole network.
It should be understood that the memory used in the present invention is NOR FLASH; those skilled in the art may also choose other types of memory according to actual needs, and the present invention is not limited in this respect. No further special note is made below.
Optionally, the system further comprises a weight processing module connected to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module. The weight processing module comprises a conversion module and a driving module. The conversion module performs the corresponding matrix transformation and mapping of the kernel values or synaptic weight values and translates the kernel matrix or weight matrix into NOR FLASH threshold characteristics: if a kernel or weight matrix element is 1 or -1, the NOR FLASH threshold voltage is adjusted to a low threshold and the threshold-adjust signal is set to 1; if the element is 0, the threshold voltage is adjusted to a high threshold and the threshold-adjust signal is set to 0. The threshold-adjust signals are sent to the driving module, which, according to these signals, sends pulses to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module to adjust the NOR FLASH thresholds, i.e., to store the kernel matrix elements or synaptic weight values into NOR FLASH.
Optionally, the kernel values and synaptic weight values are obtained by off-line learning. In an off-line learning algorithm there is a training set and a test set; the training set contains multiple training instances, each drawn independently and identically distributed from the instance space according to a probability distribution. The goal is to construct a classifier from this training set; once the model is trained, it is applied to the test set to evaluate its quality.
Optionally, the input module converts the external input signal into the voltage signals required by the convolutional neural network; the input signal and the voltage signal follow a directly proportional relationship: the larger the input signal value, the larger the corresponding voltage signal, and vice versa.
Optionally, the convolutional layer circuit module convolves the signal with the kernel values stored in the NOR FLASH cells. To represent both positive and negative kernel values, the circuit uses two columns of the NOR FLASH array as one convolution kernel: the kernel is converted into two matrices K+ and K-, and correspondingly the input signal X is converted into a one-dimensional matrix. The outputs of the K+ array are connected to the non-inverting input of an operational amplifier included in the convolutional layer circuit module, and the outputs of the K- array to its inverting input, so that the output is y = [(K+) - (K-)] * X; the effective kernel value is (K+) - (K-), which realizes both positive and negative kernel values. So that the input signal can be convolved in a single step without intermediate storage layers, when the kernel values determined in software are stored in the NOR FLASH cells they are mapped into a matrix that can undergo matrix multiplication with the entire input signal; the kernel is thus expanded into a large sparse matrix. The NOR FLASH gate terminals in the same row are connected together to the input module interface, a driving voltage is applied at the NOR FLASH source terminals, and the drains of the NOR FLASH cells in the same column are connected together; collecting the current at the drain terminals of the array yields the result of the convolution, the currents of one column summing together to realize the addition. The drain terminals are connected to operational amplifiers, whose results are passed to the activation function module.
Optionally, the pooling layer circuit module mainly performs either average pooling or max pooling; its circuit structure is that of the convolutional layer, with different kernel values for the two operations. The NOR FLASH gate terminals in the same row are connected together to the activation function module, a driving voltage is applied at the NOR FLASH source terminals, and the drains of the cells in the same column are connected together and to the inputs of the operational amplifiers included in the pooling layer circuit module. Collecting the outputs of the operational amplifiers yields the result of the pooling operation, the currents of one column summing together to realize the addition; the results are then passed to the fully connected layer circuit module.
Optionally, the activation functions included in the activation function module mainly comprise: the sigmoid function, the hyperbolic tangent function and the rectified linear unit function. The activation function module is connected to the convolutional layer circuit module and the pooling layer circuit module respectively; it activates the convolution result to obtain the output value y and converts this output into a voltage signal to serve as the input of the pooling layer.
Optionally, the fully connected layer circuit module is connected to the pooling layer module and the softmax function module respectively. The fully connected layer maps the final output into a linearly separable space, thereby performing classification. It differs from the convolutional layer circuit module in the distribution of its weights: the fully connected layer circuit module carries out the simple series of multiply-add operations of a perceptron network and stores and computes a weight matrix, whereas the convolution circuit stores and computes a set of convolution kernels. To realize positive and negative weight values, the fully connected layer circuit module uses two NOR FLASH cells as one synapse. The gates of the NOR FLASH cells are connected to the pooling layer output module and the sources to a driving voltage; the drain of the cell storing the positive weight value is connected to the non-inverting input of an operational amplifier and that storing the negative weight value to the inverting input, and the amplifier output is connected to the softmax function module. The amplifier output is the fully-connected-layer result y = [W1 - W2] * X, where W1 and W2 are the two weight values stored in the NOR FLASH cells; the effective synaptic weight is W1 - W2, which realizes positive and negative synaptic weights. After the pooled signal is processed by the fully connected layer it passes through the softmax function module, which normalizes the fully connected layer's outputs into probability values; the result is then passed to the output module to obtain the output of the whole network.
Optionally, the softmax function module implements

y = e^(X_i) / Σ_j e^(X_j),

i.e., the function of normalizing the output into probability values, where X_i denotes the i-th element of the input signal X, Σ_j e^(X_j) is the sum of the exponentials of all input signal elements, and y is the probability output corresponding to X_i.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects:
In the memory-based convolutional neural network system provided by the present invention, the weight processing module adapts the software-trained convolution kernels and synaptic weight values to the NOR FLASH operating mode and stores them in NOR FLASH. The input module receives external information, which is computed layer by layer through convolution, pooling, activation functions and the fully connected layer, and the output module finally produces the output result. In the convolutional layer circuit module built from NOR FLASH arrays, two columns of the array are used as one convolution kernel in order to represent positive and negative kernel values; in the fully connected circuit module, two NOR FLASH cells are used as one synapse in order to represent positive and negative synaptic weights. The multi-level threshold tuning capability of NOR FLASH is used to emulate the continuous adjustment of kernel and synaptic weight values in a convolutional neural network. Brain-inspired computation on the traditional von Neumann architecture is time-consuming, cannot meet the demand for real-time data processing, and incurs an enormous hardware cost; in contrast, by implementing the convolutional neural network in hardware the present invention integrates storage and computation and thus overcomes these drawbacks.
[Brief Description of the Drawings]
FIG. 1 is a schematic structural diagram of a memory-based convolutional neural network system provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the principle of the convolution operation provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of one synapse composed of two NOR FLASH cells provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the weight processing module provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the input module provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the NOR-FLASH-based convolutional layer circuit module provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the mapping formulas of the kernel matrix and the input matrix provided by an embodiment of the present invention; FIG. 7(a) shows how the kernel matrix K is converted into the matrices K+ and K-, and FIG. 7(b) shows how the input matrix X is converted into a one-dimensional matrix;
FIG. 8 is a schematic structural diagram of the activation function module provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of the pooling layer circuit module provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of the fully connected layer circuit module provided by an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of the softmax function module provided by an embodiment of the present invention.
[Detailed Description]
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below may be combined with one another as long as they do not conflict with each other.
The object of the present invention is to provide a memory-based convolutional neural network system. FIG. 1 is a schematic structural diagram of such a system provided by an embodiment of the present invention. As shown in FIG. 1, the system comprises: a weight processing module, an input module, a convolutional layer circuit module built from NOR FLASH arrays acting as convolution kernels, a NOR-FLASH-array-based pooling layer circuit module, an activation function module, a fully connected circuit module built from NOR FLASH arrays acting as synapses, a softmax function module and an output module.
The weight processing module converts the software-derived kernel values or fully-connected-layer weight values into corresponding NOR FLASH thresholds and stores them in the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module.
The input module converts external input information into the voltage signals required by the convolutional neural network — the input signal and the voltage signal follow a directly proportional mapping, a larger input value corresponding to a larger voltage and vice versa — and passes the voltage signals to the convolutional layer circuit module.
The convolutional layer circuit module convolves the input signal with the kernel values stored in the NOR FLASH cells and passes the result to the activation function module.
The activation function module activates the signal and passes the result to the pooling layer circuit module. The pooling layer circuit performs a pooling operation between the activated signal and the values stored in the NOR FLASH cells and passes the result to the fully connected layer circuit module.
The fully connected layer circuit module multiplies the integrated signal by the weight values stored in the NOR FLASH cells to perform classification and passes the result to the softmax function module, which normalizes the output into probability values; the probability values are then passed to the output module to obtain the output result of the whole network.
It should be noted that convolution, as a generalized notion of integration, has important applications in image processing and digital signal processing. A convolution kernel (operator), also called a mask in image processing, is the matrix used when processing an image — the parameters operated against the original image. The kernel is usually a square grid with a weight value on each cell. To compute a convolution, the kernel is first rotated by 180°, its center is placed on the pixel to be computed, the products of each kernel element and the image pixel it covers are computed and summed, and the sum is the new pixel value at that position. Moving one position to the right along the row, the overlapping products are summed again to obtain the next pixel value, until the whole row has been covered; then the kernel moves one position along the column direction, returns to the start of the row, and the overlapping products are summed again, until the input matrix has been completely covered by the kernel. For an m × m input matrix and an n × n kernel, the resulting output matrix has size (m - n + 1) × (m - n + 1). FIG. 2 demonstrates the convolution of a 4 × 4 input matrix with a 2 × 2 kernel, yielding a 3 × 3 output matrix.
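The sliding-window procedure described above can be sketched in a few lines of numpy; the input values below are illustrative, not from the patent.

```python
import numpy as np

def convolve2d(x, k):
    """Slide kernel k over input x (stride 1, no padding).

    Following the text: rotate the kernel by 180 degrees, then at each
    placement sum the element-wise products with the covered pixels.
    An m x m input and an n x n kernel give an
    (m - n + 1) x (m - n + 1) output.
    """
    k = np.rot90(k, 2)                      # 180-degree rotation
    m, n = x.shape[0], k.shape[0]
    out = np.zeros((m - n + 1, m - n + 1))
    for i in range(m - n + 1):
        for j in range(m - n + 1):
            out[i, j] = np.sum(x[i:i + n, j:j + n] * k)
    return out

# A 4x4 input and a 2x2 kernel give a 3x3 output, as in FIG. 2.
x = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, 1.0]])
print(convolve2d(x, k).shape)   # (3, 3)
```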
FIG. 3 is a schematic diagram of one synapse composed of two NOR FLASH cells provided by an embodiment of the present invention. A NOR FLASH cell changes its threshold voltage through program or erase operations and thereby stores different logic values. A convolutional neural network involves a large number of multiplications. In a NOR FLASH multiplication, the multiplier is stored into the cell by programming or erasing and the multiplicand is applied at the gate; together they determine the on-state of the cell. With a driving voltage applied at the source, different on-states produce different current values, and collecting the drain current of each NOR FLASH column gives the result of the multiplication. Using two NOR FLASH cells as one synapse realizes positive and negative weight values: the gates of the two cells are connected together to the voltage signal X corresponding to the input, the sources are connected together to the driving voltage, and the two drains are connected to the two inputs of an operational amplifier. If the two cells store the weight values W+ and W-, the effective output at the amplifier is y = [(W+) - (W-)] * X, and the effective synaptic weight is [(W+) - (W-)], realizing positive and negative weights, where X is the voltage signal corresponding to the input.
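The differential-pair arithmetic of the two-cell synapse reduces to a one-line function; the numeric values below are illustrative.

```python
# Differential synapse: two cells store W+ and W-; the op-amp output is
# y = [(W+) - (W-)] * X, so negative effective weights become possible
# even though each cell can only store a non-negative value.
def synapse_output(w_plus, w_minus, x):
    return (w_plus - w_minus) * x

print(synapse_output(0.0, 1.0, 2.0))  # -2.0: an effective weight of -1
print(synapse_output(1.0, 0.0, 2.0))  # 2.0: an effective weight of +1
```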
FIG. 4 is a schematic structural diagram of the weight processing module provided by the present invention, which is connected to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module and comprises a conversion module and a driving module. The conversion module performs the corresponding matrix transformation and mapping of the kernel values and synaptic weight values and translates the kernel and weight matrices into NOR FLASH threshold characteristics: if a kernel or weight matrix element is 1 or -1, the NOR FLASH threshold voltage is adjusted to a low threshold; if it is 0, the threshold voltage is adjusted to a high threshold. The result is sent to the driving module, which receives the threshold-adjust signals from the conversion module and accordingly sends pulses to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module to adjust the NOR FLASH thresholds, i.e., to store the kernel matrix elements or synaptic weight values into NOR FLASH. The kernel values and synaptic weight values are obtained by off-line training.
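The element-to-threshold rule carried out by the conversion module can be sketched as a small mapping; the function name is ours, not the patent's.

```python
def threshold_signals(matrix):
    """Map kernel/weight matrix elements to NOR FLASH threshold-adjust
    signals, per the conversion module: an element of 1 or -1 means a
    low threshold (signal 1); an element of 0 means a high threshold
    (signal 0). The driving module would then pulse the cells
    accordingly."""
    return [[1 if v in (1, -1) else 0 for v in row] for row in matrix]

print(threshold_signals([[1, 0], [-1, 1]]))  # [[1, 0], [1, 1]]
```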
FIG. 5 is a schematic structural diagram of the input module provided in an embodiment of the present invention, which is connected to the convolutional layer circuit module. The input module converts external input information through a conversion circuit module into the voltage signals required by the convolutional neural network; the input signal and the voltage signal follow a directly proportional mapping, a larger input value corresponding to a larger voltage signal and vice versa, and the voltage signals are passed to the convolutional layer circuit module.
FIG. 6 is a schematic structural diagram of the NOR-FLASH-based convolutional layer circuit module provided in an embodiment of the present invention; black dots represent the NOR FLASH cells. The embodiment takes the MNIST handwritten digit database as its input: the data set contains 70,000 grayscale images of handwritten digits — a training set of 60,000 and a test set of 10,000 — each of 28 × 28 pixels. The figure shows a convolution circuit with a 28 × 28 input matrix, a 9 × 9 kernel and a 20 × 20 output matrix. To represent positive and negative kernel values, the circuit uses two columns of NOR FLASH cells per kernel: the kernel is converted into two matrices K+ and K-, and the input signal X into a one-dimensional matrix; the drains of the array storing K+ are connected to the non-inverting input of an operational amplifier and those of the array storing K- to its inverting input, so that the amplifier output is y = [(K+) - (K-)] * X, realizing positive- and negative-valued kernels. Since a bias must also be added in a neural network, the required NOR FLASH array structure has 1569 rows and 400 columns. The NOR FLASH gate terminals in the same row are connected together to the input module interface, a driving voltage is applied at the source terminals, and the drains of the cells in the same column are connected together and to an operational amplifier; collecting the drain currents of the array yields the convolution result, the currents of one column summing together to realize the addition, and the amplifier results are passed to the activation function module. So that the input signal can be convolved in a single step without intermediate storage layers, when the kernel values determined in software are stored in the NOR FLASH cells they must be arranged so that the entire feature map can be obtained from a matrix multiplication; to this end the kernel is expanded into a large sparse matrix. Below we demonstrate this with a 2 × 2 kernel matrix K and a 3 × 3 input matrix X as an example.
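The array dimensions quoted in this embodiment can be checked with a little arithmetic. The row formula used here — two rows per input pixel (for the K+ and K- halves) plus one bias row — is our reading of the stated 1569 × 400 and 801 × 100 sizes, not an explicit formula from the patent.

```python
def array_size(in_side, k_side):
    """NOR FLASH array size as we read the embodiment: rows are
    2 * (input pixels) + 1 bias row; columns are one per output pixel
    of the (in - k + 1)-sided feature map."""
    out_side = in_side - k_side + 1
    return 2 * in_side * in_side + 1, out_side * out_side

print(array_size(28, 9))            # (1569, 400): convolutional layer
print((2 * 20 * 20 + 1, 10 * 10))   # (801, 100): pooling layer on a 20x20 map
```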
FIG. 7(a) shows how, with the proposed NOR-FLASH-based method, the kernel matrix K is converted into the matrices K+ and K-. The kernel is converted into two matrices so that the NOR FLASH array can easily account for kernels with both positive and negative values. Since the input signal matrix X has 9 elements, each kernel matrix K+ and K- must have 9 rows.
FIG. 7(b) shows how the input matrix X is converted into a one-dimensional matrix and multiplied by K+ and K- respectively. Since K is 2 × 2 and X is 3 × 3, the output feature map is 2 × 2; the kernel matrix must therefore have 8 columns, each pair of column outputs being connected to the inputs of an operational amplifier, so that four output values are obtained at the amplifier outputs. The conversion shown in FIG. 7 is carried out in the weight processing module.
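The expansion of a small kernel into the large sparse matrix of FIG. 7 can be sketched as follows. The construction (one row per input pixel, one column per output pixel) is our reconstruction of the mapping; the sample kernel values are illustrative.

```python
import numpy as np

def expand_kernel(k, in_side):
    """Expand an n x n kernel into the large sparse matrix of FIG. 7:
    one row per input pixel and one column per output pixel, so the
    whole feature map comes from a single matrix multiplication."""
    n = k.shape[0]
    out_side = in_side - n + 1
    big = np.zeros((in_side * in_side, out_side * out_side))
    for oi in range(out_side):
        for oj in range(out_side):
            col = oi * out_side + oj
            for ki in range(n):
                for kj in range(n):
                    big[(oi + ki) * in_side + (oj + kj), col] = k[ki, kj]
    return big

k = np.array([[1.0, -2.0], [0.0, 3.0]])     # 2x2 kernel with a negative entry
big = expand_kernel(k, 3)                   # 9 rows (3x3 input), 4 cols (2x2 output)
k_plus, k_minus = np.maximum(big, 0), np.maximum(-big, 0)   # sign split: K+ and K-

x = np.arange(9, dtype=float)               # flattened 3x3 input
y = x @ k_plus - x @ k_minus                # effective kernel (K+) - (K-)
print(big.shape, y.shape)                   # (9, 4) (4,)
```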
FIG. 8 is a schematic structural diagram of the activation function module of the present invention, where the activation function f is mainly one of: the sigmoid function, the hyperbolic tangent (tanh) function and the rectified linear unit (ReLU) function. The activation function module is connected to the convolutional layer circuit module and the pooling layer circuit module respectively; it activates the convolution result to obtain the output value y and converts this output into a voltage signal to serve as the input of the pooling layer.
FIG. 9 is a schematic structural diagram of the pooling layer circuit module provided in an embodiment of the present invention, which mainly performs either average pooling or max pooling. The whole image is divided without overlap into small blocks of equal size; within each block only the maximum value (or the average) is kept and the other nodes are discarded, preserving the original planar structure in the output. The pooling layer circuit module is connected to the activation function module and the fully connected layer circuit module respectively. A pooling operation is a simpler convolution, so the circuit structure is that of the convolutional layer with only the kernel values changed: the NOR FLASH gate terminals in the same row are connected together to the input module interface, a driving voltage is applied at the source terminals, and the drains of the cells in the same column are connected together and to the inputs of operational amplifiers; collecting the amplifier outputs yields the pooling result, the currents of one column summing together to realize the addition, and the amplifier outputs are connected to the fully connected layer circuit module. Pooling shrinks the matrix very effectively and thus reduces the parameters in the final fully connected layer; at the same time, a pooling layer both speeds up computation and helps prevent overfitting. This embodiment uses a 2 × 2 pooling operation; since the output matrix of the convolutional layer is 20 × 20, the pooled output matrix is 10 × 10, so the NOR FLASH array of the pooling layer circuit module has size 801 × 100. If average pooling is used, every value of the pooling kernel is 1/4, and these values are stored in the NOR FLASH array after passing through the weight processing module. After the pooling operation the result is passed to the fully connected layer circuit module.
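Average pooling as a convolution with the constant 1/4 kernel can be sketched as follows; the input values are illustrative.

```python
import numpy as np

def average_pool(x):
    """2x2 non-overlapping average pooling, realized as a convolution
    with the constant kernel [[1/4, 1/4], [1/4, 1/4]] at stride 2,
    as in the average-pooling case of the embodiment."""
    k = np.full((2, 2), 0.25)
    m = x.shape[0]
    out = np.zeros((m // 2, m // 2))
    for i in range(0, m, 2):
        for j in range(0, m, 2):
            out[i // 2, j // 2] = np.sum(x[i:i + 2, j:j + 2] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(average_pool(x))   # each entry is the mean of one 2x2 block
```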
FIG. 10 is a schematic structural diagram of the fully connected layer circuit module provided in an embodiment of the present invention, connected to the pooling layer circuit module and the softmax function module respectively. The fully connected layer (also called the feed-forward layer) maps the final output into a linearly separable space, thereby performing classification. The fully connected layer circuit module is very similar to the convolution circuit, differing only in the distribution of weights: it carries out the simple series of multiply-add operations of a perceptron network. This circuit stores a weight matrix, whereas the convolution circuit stores a set of convolution kernels. The software-derived fully-connected-layer weight values are converted by the weight processing module into corresponding NOR FLASH cell thresholds and stored in NOR FLASH. To realize positive and negative weights the fully connected layer circuit module uses two NOR FLASH cells as one synapse, storing the positive and negative weight values respectively. Likewise, the input signal is converted into a one-dimensional matrix; the NOR FLASH gates are connected to the pooling layer output and the sources to the driving voltage; the drain of the cell storing the positive weight value is connected to the non-inverting input of an operational amplifier and that storing the negative weight value to the inverting input; the amplifier output is connected to the softmax function module, and the amplifier result is the fully-connected-layer result, y = [W1 - W2] * X, where W1 and W2 are the two effective weight values stored in the NOR FLASH cells, so the effective synaptic weight is W1 - W2, realizing positive and negative synaptic weights. The input matrix from the pooling layer is 10 × 10; with 10 final classification categories, the NOR FLASH array of the fully connected layer circuit in this embodiment has size 101 × 10. After the signal is processed by the fully connected layer it passes through the softmax function module, and the result is then passed to the output module to obtain the output of the whole network.
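The fully connected layer's differential multiply-add can be sketched with the embodiment's sizes (100 pooled inputs plus one bias, 10 classes). The weights here are random placeholders, not trained values.

```python
import numpy as np

# Fully connected layer with differential NOR FLASH synapses:
# y = (W1 - W2) @ x, where W1 and W2 hold the positive- and
# negative-weight cells of each synapse pair.
rng = np.random.default_rng(0)
w1 = rng.random((10, 101))              # positive-weight cells (placeholder)
w2 = rng.random((10, 101))              # negative-weight cells (placeholder)
x = np.append(rng.random(100), 1.0)     # 10x10 pooled signal + bias input

y = (w1 - w2) @ x
print(y.shape)   # (10,) one pre-softmax value per class
```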
FIG. 11 is a schematic structural diagram of the softmax function module of the present invention, connected to the fully connected layer circuit module and the output module respectively. The softmax function module implements

y = e^(X_i) / Σ_j e^(X_j),

i.e., the function of normalizing the output into probability values, where X_i denotes the i-th element of the input signal X, Σ_j e^(X_j) is the sum of the exponentials of all input signal elements, and y is the probability output corresponding to X_i.
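The softmax normalization above is directly computable; the max-shift in the sketch below is a standard numerical-stability trick, not part of the patent's formula, and it does not change the result.

```python
import numpy as np

def softmax(x):
    """y_i = exp(x_i) / sum_j exp(x_j). Subtracting max(x) before
    exponentiating avoids overflow and leaves the ratios unchanged."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y)          # three probabilities, largest for the largest input
print(y.sum())    # sums to 1: a normalized probability distribution
```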
Those skilled in the art will readily understand that the above description covers only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

  1. A memory-based convolutional neural network system, characterized by comprising: an input module, a convolutional layer circuit module built from NOR FLASH arrays acting as convolution kernels, a NOR-FLASH-array-based pooling circuit module, an activation function module, a fully connected layer circuit module built from NOR FLASH arrays acting as synapses, a softmax function module and an output module, wherein the convolution kernel values or synaptic weight values are stored in NOR FLASH cells;
    the input module converts the input signal into the voltage signals required by the convolutional neural network and passes the result to the convolutional layer circuit module;
    the convolutional layer circuit module performs a convolution operation between the voltage signals corresponding to the input signal and the kernel values stored in the NOR FLASH cells and passes the result to the activation function module;
    the activation function module activates the signal and passes the result to the pooling layer circuit module;
    the pooling layer circuit performs a pooling operation between the activated signal and the kernel values stored in the NOR FLASH cells and passes the result to the fully connected layer circuit module;
    the fully connected layer circuit module multiplies the pooled signal by the synaptic weight values stored in the NOR FLASH cells to perform classification and passes the classification result to the softmax function module;
    the softmax function module normalizes the output of the fully connected layer circuit module into probability values and passes the result to the output module as the output of the whole network.
  2. The memory-based convolutional neural network system according to claim 1, characterized by further comprising: a weight processing module;
    the weight processing module is connected to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module;
    the weight processing module comprises: a conversion module and a driving module;
    the conversion module performs the corresponding matrix transformation and mapping of the kernel values or synaptic weight values and translates the kernel matrix or weight matrix into NOR FLASH threshold characteristics: if a kernel or weight matrix element is 1 or -1, the NOR FLASH threshold voltage is adjusted to a low threshold and the threshold-adjust signal is set to 1; if the element is 0, the threshold voltage is adjusted to a high threshold and the threshold-adjust signal is set to 0; the threshold-adjust signals are sent to the driving module;
    the driving module receives the threshold-adjust signals sent by the conversion module and accordingly sends pulses to the convolutional layer circuit module, the pooling layer circuit module and the fully connected layer circuit module to adjust the NOR FLASH thresholds, i.e., to store the kernel matrix elements or synaptic weight values into NOR FLASH.
  3. The memory-based convolutional neural network system according to claim 2, characterized in that the kernel values and synaptic weight values are obtained by off-line learning: in the off-line learning algorithm there is a training set and a test set, the training set containing multiple training instances, each drawn independently and identically distributed from the instance space according to a probability distribution; the goal is to construct a classifier from this training set, and once the model is trained it is applied to the test set to evaluate its quality.
  4. The memory-based convolutional neural network system according to claim 3, characterized in that the input module converts the external input signal into the voltage signals required by the convolutional neural network, the input signal and the voltage signal following a directly proportional relationship: the larger the input signal value, the larger the corresponding voltage signal, and vice versa.
  5. The memory-based convolutional neural network system according to claim 1, characterized in that the convolutional layer circuit module convolves the signal with the kernel values stored in the NOR FLASH cells; the circuit uses two columns of the NOR FLASH array as one convolution kernel, the kernel being converted into two matrices K+ and K- and the input signal X correspondingly into a one-dimensional matrix; the outputs of the K+ array are connected to the non-inverting input of an operational amplifier included in the convolutional layer circuit module and the outputs of the K- array to its inverting input, so that the output is y = [(K+) - (K-)] * X and the effective kernel value is (K+) - (K-), realizing positive and negative kernel values; so that the input signal can be convolved in a single step without intermediate storage layers, when the kernel values determined in software are stored in the NOR FLASH cells they are mapped into a matrix that can undergo matrix multiplication with the entire input signal, the kernel being expanded into a large sparse matrix; the NOR FLASH gate terminals in the same row are connected together to the input module interface, a driving voltage is applied at the NOR FLASH source terminals, and the drains of the cells in the same column are connected together; collecting the current at the drain terminals of the array yields the convolution result, the currents of one column summing together to realize the addition; the drain terminals are connected to operational amplifiers, whose results are passed to the activation function module.
  6. The memory-based convolutional neural network system according to claim 1, characterized in that the pooling layer circuit module mainly performs either average pooling or max pooling, its circuit structure being that of the convolutional layer with different kernel values for the two operations; the NOR FLASH gate terminals in the same row are connected together to the activation function module, a driving voltage is applied at the NOR FLASH source terminals, and the drains of the cells in the same column are connected together and to the inputs of the operational amplifiers included in the pooling layer circuit module; collecting the outputs of the operational amplifiers yields the result of the pooling operation, the currents of one column summing together to realize the addition, and the amplifier results are then passed to the fully connected layer circuit module.
  7. The memory-based convolutional neural network system according to claim 1, characterized in that the activation functions included in the activation function module mainly comprise: the sigmoid function, the hyperbolic tangent function and the rectified linear unit function; the activation function module is connected to the convolutional layer circuit module and the pooling layer circuit module respectively, activates the convolution result to obtain the output value y, and converts this output into a voltage signal to serve as the input of the pooling layer.
  8. The memory-based convolutional neural network system according to claim 1, characterized in that the fully connected layer circuit module is connected to the pooling layer module and the softmax function module respectively; the fully connected layer maps the final output into a linearly separable space, thereby performing classification; it differs from the convolutional layer circuit module in the distribution of its weights: the fully connected layer circuit module carries out the simple series of multiply-add operations of a perceptron network and stores and computes a weight matrix, whereas the convolution circuit stores and computes a set of convolution kernels; the fully connected layer circuit module uses two NOR FLASH cells as one synapse, the gates of the NOR FLASH cells being connected to the pooling layer output module and the sources to a driving voltage; the drain of the cell storing the positive weight value is connected to the non-inverting input of an operational amplifier and that storing the negative weight value to the inverting input, and the amplifier output is connected to the softmax function module; the amplifier output is the fully-connected-layer result y = [W1 - W2] * X, where W1 and W2 are the two weight values stored in the NOR FLASH cells, so the effective synaptic weight is W1 - W2, realizing positive and negative synaptic weights; after the pooled signal is processed by the fully connected layer it passes through the softmax function module, which normalizes the fully connected layer's outputs into probability values, and the result is then passed to the output module to obtain the output of the whole network.
  9. The memory-based convolutional neural network system according to claim 1, characterized in that the softmax function module implements

    y = e^(X_i) / Σ_j e^(X_j),

    i.e., the function of normalizing the output into probability values, where X_i denotes the i-th element of the input signal X, Σ_j e^(X_j) is the sum of the exponentials of all input signal elements, and y is the probability output corresponding to X_i.
PCT/CN2018/090249 2018-05-08 2018-06-07 Memory-based convolutional neural network system WO2019227518A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/464,977 US11531880B2 (en) 2018-05-08 2018-06-07 Memory-based convolutional neural network system

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201810434049 2018-05-08
CN201810532151.XA CN108805270B (zh) 2018-05-08 2018-05-29 一种基于存储器的卷积神经网络系统
CN201810532151.X 2018-05-29

Publications (1)

Publication Number Publication Date
WO2019227518A1 true WO2019227518A1 (zh) 2019-12-05

Family

ID=64090795

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090249 WO2019227518A1 (zh) 2018-05-08 2018-06-07 一种基于存储器的卷积神经网络系统

Country Status (3)

Country Link
US (1) US11531880B2 (zh)
CN (1) CN108805270B (zh)
WO (1) WO2019227518A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991623A (zh) * 2019-12-20 2020-04-10 中国科学院自动化研究所 基于数模混合神经元的神经网络运算系统
CN111639757A (zh) * 2020-04-11 2020-09-08 复旦大学 一种基于柔性材料的模拟卷积神经网络
CN112036561A (zh) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 数据处理方法、装置、电子设备及存储介质
CN112115665A (zh) * 2020-09-14 2020-12-22 上海集成电路研发中心有限公司 存算一体存储阵列及其卷积运算方法
CN113132397A (zh) * 2021-04-23 2021-07-16 信阳农林学院 一种基于深度学习的网络加密流量识别方法、装置及设备
CN115902615A (zh) * 2023-01-09 2023-04-04 佰聆数据股份有限公司 电力断路器缺陷分析方法及装置
CN110991623B (zh) * 2019-12-20 2024-05-28 中国科学院自动化研究所 基于数模混合神经元的神经网络运算系统

Families Citing this family (29)

Publication number Priority date Publication date Assignee Title
US11309026B2 (en) * 2017-01-25 2022-04-19 Peking University Convolution operation method based on NOR flash array
CN110785779A (zh) * 2018-11-28 2020-02-11 深圳市大疆创新科技有限公司 神经网络处理装置、控制方法以及计算系统
US11005995B2 (en) * 2018-12-13 2021-05-11 Nice Ltd. System and method for performing agent behavioral analytics
KR20200076083A (ko) * 2018-12-19 2020-06-29 에스케이하이닉스 주식회사 오류 역전파를 이용하여 지도 학습을 수행하는 뉴로모픽 시스템
JP2020113809A (ja) * 2019-01-08 2020-07-27 ソニー株式会社 固体撮像素子およびその信号処理方法、並びに電子機器
CN109784483B (zh) * 2019-01-24 2022-09-09 电子科技大学 基于fd-soi工艺的二值化卷积神经网络内存内计算加速器
DE102019106529A1 (de) 2019-03-14 2020-09-17 Infineon Technologies Ag Fmcw radar mit störsignalunterdrückung mittels künstlichem neuronalen netz
US11907829B2 (en) 2019-03-14 2024-02-20 Infineon Technologies Ag FMCW radar with interference signal suppression using artificial neural network
CN110288078B (zh) * 2019-05-19 2023-03-24 南京惟心光电系统有限公司 一种针对GoogLeNet模型的加速器及其方法
CN110533160A (zh) * 2019-07-22 2019-12-03 北京大学 基于nor flash模拟量计算阵列的深度神经网络
CN110543933B (zh) * 2019-08-12 2022-10-21 北京大学 基于flash存算阵列的脉冲型卷积神经网络
CN110475119A (zh) * 2019-08-12 2019-11-19 北京大学 基于flash存算阵列的图像压缩系统和方法
CN110852429B (zh) * 2019-10-28 2022-02-18 华中科技大学 一种基于1t1r的卷积神经网络电路及其操作方法
CN111222626B (zh) * 2019-11-07 2021-08-10 恒烁半导体(合肥)股份有限公司 一种基于NOR Flash模块的神经网络的数据切分运算方法
CN110991608B (zh) * 2019-11-25 2021-08-13 恒烁半导体(合肥)股份有限公司 一种卷积神经网络量化计算方法及系统
CN113033792A (zh) * 2019-12-24 2021-06-25 财团法人工业技术研究院 神经网络运算装置及方法
CN113496274A (zh) * 2020-03-20 2021-10-12 郑桂忠 基于存储器内运算电路架构的量化方法及其系统
CN111144558B (zh) * 2020-04-03 2020-08-18 深圳市九天睿芯科技有限公司 基于时间可变的电流积分和电荷共享的多位卷积运算模组
CN112163374B (zh) * 2020-09-27 2024-02-20 中国地质调查局自然资源综合调查指挥中心 一种多模态数据中间层融合全连接地质图预测模型的处理方法
CN112992232B (zh) * 2021-04-28 2021-08-17 中科院微电子研究所南京智能技术研究院 一种多位正负单比特存内计算单元、阵列及装置
CN113608063B (zh) * 2021-06-25 2023-04-18 北京智芯微电子科技有限公司 电力线路故障识别方法、装置及电子设备
CN113376172B (zh) * 2021-07-05 2022-06-14 四川大学 一种基于视觉与涡流的焊缝缺陷检测系统及其检测方法
CN113466338B (zh) * 2021-07-19 2024-02-20 中国工程物理研究院计量测试中心 一种基于神经网络的塑封电子元器件缺陷识别系统及方法
CN113592084B (zh) * 2021-07-23 2022-11-11 东南大学 基于反向优化超结构卷积核的片上光子神经网络
CN113672854B (zh) * 2021-08-25 2024-02-06 恒烁半导体(合肥)股份有限公司 一种基于电流镜和存储单元的存内运算方法、装置及其应用
CN113923723B (zh) * 2021-10-15 2023-05-09 中国联合网络通信集团有限公司 流量重构方法、装置、设备及存储介质
KR102517601B1 (ko) * 2021-12-27 2023-04-06 디즈니엔터프라이즈 유한회사 Nft 블록체인을 이용한 디즈니콜라보작품 p2p 거래 서비스 제공 시스템
CN115049885B (zh) * 2022-08-16 2022-12-27 之江实验室 一种存算一体卷积神经网络图像分类装置及方法
CN116147130A (zh) * 2023-04-18 2023-05-23 杭州行至云起科技有限公司 智能家居控制系统及其方法

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106228240A (zh) * 2016-07-30 2016-12-14 复旦大学 基于fpga的深度卷积神经网络实现方法
US9600763B1 (en) * 2015-10-20 2017-03-21 Fujitsu Limited Information processing method, information processing device, and non-transitory recording medium for storing program
CN107463990A (zh) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 一种卷积神经网络的fpga并行加速方法

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US10685262B2 (en) * 2015-03-20 2020-06-16 Intel Corporation Object recognition based on boosting binary convolutional neural network features
EP3427198B1 (en) * 2016-03-07 2020-07-08 Hyla, Inc. Screen damage detection for devices
US9646243B1 (en) * 2016-09-12 2017-05-09 International Business Machines Corporation Convolutional neural networks using resistive processing unit array
US11309026B2 (en) * 2017-01-25 2022-04-19 Peking University Convolution operation method based on NOR flash array
CN107341518A (zh) * 2017-07-07 2017-11-10 东华理工大学 一种基于卷积神经网络的图像分类方法
CN107679622B (zh) * 2017-09-06 2020-08-14 清华大学 一种面向神经网络算法的模拟感知计算架构
US10572760B1 (en) * 2017-11-13 2020-02-25 Amazon Technologies, Inc. Image text localization

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
US9600763B1 (en) * 2015-10-20 2017-03-21 Fujitsu Limited Information processing method, information processing device, and non-transitory recording medium for storing program
CN107463990A (zh) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 一种卷积神经网络的fpga并行加速方法
CN106228240A (zh) * 2016-07-30 2016-12-14 复旦大学 基于fpga的深度卷积神经网络实现方法

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN110991623A (zh) * 2019-12-20 2020-04-10 中国科学院自动化研究所 基于数模混合神经元的神经网络运算系统
CN110991623B (zh) * 2019-12-20 2024-05-28 中国科学院自动化研究所 基于数模混合神经元的神经网络运算系统
CN111639757A (zh) * 2020-04-11 2020-09-08 复旦大学 一种基于柔性材料的模拟卷积神经网络
CN111639757B (zh) * 2020-04-11 2023-04-18 复旦大学 一种基于柔性材料的模拟卷积神经网络
CN112115665A (zh) * 2020-09-14 2020-12-22 上海集成电路研发中心有限公司 存算一体存储阵列及其卷积运算方法
CN112115665B (zh) * 2020-09-14 2023-11-07 上海集成电路研发中心有限公司 存算一体存储阵列及其卷积运算方法
CN112036561A (zh) * 2020-09-30 2020-12-04 北京百度网讯科技有限公司 数据处理方法、装置、电子设备及存储介质
CN112036561B (zh) * 2020-09-30 2024-01-19 北京百度网讯科技有限公司 数据处理方法、装置、电子设备及存储介质
CN113132397A (zh) * 2021-04-23 2021-07-16 信阳农林学院 一种基于深度学习的网络加密流量识别方法、装置及设备
CN115902615A (zh) * 2023-01-09 2023-04-04 佰聆数据股份有限公司 电力断路器缺陷分析方法及装置

Also Published As

Publication number Publication date
CN108805270B (zh) 2021-02-12
CN108805270A (zh) 2018-11-13
US11531880B2 (en) 2022-12-20
US20200285954A1 (en) 2020-09-10

Similar Documents

Publication Publication Date Title
WO2019227518A1 (zh) 一种基于存储器的卷积神经网络系统
WO2021244079A1 (zh) 智能家居环境中图像目标检测方法
US11861489B2 (en) Convolutional neural network on-chip learning system based on non-volatile memory
WO2021042828A1 (zh) 神经网络模型压缩的方法、装置、存储介质和芯片
WO2022252272A1 (zh) 一种基于迁移学习的改进vgg16网络猪的身份识别方法
WO2020238293A1 (zh) 图像分类方法、神经网络的训练方法及装置
CN108399421B (zh) 一种基于词嵌入的深度零样本分类方法
Diehl et al. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing
Dong et al. Sparse fully convolutional network for face labeling
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
Teow Understanding convolutional neural networks using a minimal model for handwritten digit recognition
Arif et al. Study and observation of the variations of accuracies for handwritten digits recognition with various hidden layers and epochs using convolutional neural network
CN109829541A (zh) 基于学习自动机的深度神经网络增量式训练方法及系统
WO2021051987A1 (zh) 神经网络模型训练的方法和装置
CN109086802A (zh) 一种基于八元数卷积神经网络的图像分类方法
Zheng et al. Rethinking the Role of Activation Functions in Deep Convolutional Neural Networks for Image Classification.
Bouchain Character recognition using convolutional neural networks
Sun et al. Low-consumption neuromorphic memristor architecture based on convolutional neural networks
Luan et al. Sunflower seed sorting based on convolutional neural network
CN114882278A (zh) 一种基于注意力机制和迁移学习的轮胎花纹分类方法和装置
Li et al. Performance analysis of fine-tune transferred deep learning
López-Monroy et al. Neural networks and deep learning
Lv et al. Deep learning development review
Xia et al. Efficient synthesis of compact deep neural networks
CN108960275A (zh) 一种基于深度玻尔兹曼机的图像识别方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920526

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.05.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920526

Country of ref document: EP

Kind code of ref document: A1