WO2018205708A1 - Processing system and method applied to a binary weight convolutional network - Google Patents

Processing system and method applied to a binary weight convolutional network

Info

Publication number
WO2018205708A1
WO2018205708A1 (PCT/CN2018/076260)
Authority
WO
WIPO (PCT)
Prior art keywords
unit
value
neural network
data
binary
Prior art date
Application number
PCT/CN2018/076260
Other languages
English (en)
French (fr)
Inventor
韩银和 (Yinhe Han)
许浩博 (Haobo Xu)
王颖 (Ying Wang)
Original Assignee
中国科学院计算技术研究所 (Institute of Computing Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)
Priority to US16/603,340 priority Critical patent/US11551068B2/en
Publication of WO2018205708A1 publication Critical patent/WO2018205708A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50Adding; Subtracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2207/00Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F2207/38Indexing scheme relating to groups G06F7/38 - G06F7/575
    • G06F2207/48Indexing scheme relating to groups G06F7/48 - G06F7/575
    • G06F2207/4802Special implementations
    • G06F2207/4818Threshold devices
    • G06F2207/4824Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a processing system and method applied to a binary weight convolution network.
  • Deep learning technology has developed rapidly in recent years. Deep neural networks, especially convolutional neural networks, have been widely applied in fields such as image recognition, speech recognition, natural language understanding, weather prediction, gene expression analysis, content recommendation, and intelligent robotics.
  • the deep network structure obtained through deep learning is an operation model, which contains a large number of data nodes, each data node is connected with other data nodes, and the connection relationship between the nodes is represented by weights.
  • as the complexity of neural networks keeps increasing, neural network technology suffers in practical applications from problems such as high resource occupation, slow computing speed, and high energy consumption.
  • a binary weighted convolutional neural network model is applied to fields such as image recognition, augmented reality, and virtual reality.
  • Binary weighted convolutional neural networks reduce the data bit width by binarizing the weights (eg, using 1 and -1 to represent weights), greatly reducing parameter capacity and increasing the speed of network model operations.
  • the emergence of binary weighted convolutional neural networks reduces the hardware configuration required for complex systems such as image recognition, and extends the application field of convolutional neural networks.
  • the present invention is directed to a network feature and computational features of a binary weighted convolutional neural network, and provides a processing system and method for a binary weighted convolutional network to overcome the deficiencies of the prior art described above.
  • a processing system for a binary weighted convolutional neural network includes:
  • At least one storage unit for storing data and instructions
  • At least one control unit for obtaining an instruction stored in the storage unit and issuing a control signal
  • At least one calculating unit configured to obtain, from the storage unit, a node value of a layer in the convolutional neural network and corresponding binary weight value data, and obtain a node value of the next layer by performing an addition and subtraction operation.
  • the calculation unit includes a convolution unit and an accumulator, wherein the convolution unit receives the node values of one layer in the convolutional neural network and the corresponding binary weight data, and the output of the convolution unit is coupled to the accumulator.
  • the convolution unit includes a numerical inversion unit, a multiplexer unit, and an adder, wherein the input data is connected to the multiplexer unit both directly and through the numerical inversion unit, the binary weight data is connected to the multiplexer unit to control its signal gating, and the output of the multiplexer unit is connected to the adder.
  • the binary weight value is mapped using the following formula:

    $$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$

    where z denotes the operand and Binarize(z) denotes the mapped value.
  • the binary weight value is further mapped as:

    $$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$

    where z denotes the operand and r(z) denotes the mapped value.
  • a processing method for a binary weighted convolutional neural network includes: obtaining a node value of a layer in the convolutional neural network and corresponding binary weight value data; obtaining a node value of the next layer by performing an addition and subtraction operation.
  • the binary weight value is mapped using the following formula:

    $$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$

    where z denotes the operand and Binarize(z) denotes the mapped value.
  • obtaining the node value of the next layer by performing the addition and subtraction operations includes: when the weight value is 1, transferring the original input data to the adder; and when the weight value is -1, transferring the numerically inverted input data to the adder.
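As an illustrative sketch (not part of the patent text; the function name binary_weight_dot is hypothetical), the add/subtract rule above can be expressed in Python:

```python
def binary_weight_dot(inputs, weights):
    """Dot product with weights restricted to +1 / -1.

    weight == +1 -> pass the original input to the adder;
    weight == -1 -> pass the numerically inverted input.
    """
    acc = 0
    for x, w in zip(inputs, weights):
        acc += x if w == 1 else -x  # multiplexer selects original vs. inverted value
    return acc

# y = x0*w0 + x1*w1 + x2*w2 with weights (1, -1, 1) reduces to y = x0 - x1 + x2
print(binary_weight_dot([3, 5, 2], [1, -1, 1]))  # 0
```

No multiplication is performed anywhere; the weight only decides whether the operand enters the accumulation with its original or inverted sign.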
  • the binary weight value is further mapped as:

    $$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$

    where z denotes the operand and r(z) denotes the mapped value.
  • the present invention has the advantage that a processor or chip oriented to binary convolutional networks can be implemented on the basis of the system of the present invention. By reducing the weight bit width to a single bit, it reduces the overhead of the storage circuitry and the computational complexity, and also lowers the on-chip data transfer bandwidth. Compared with a neural network using an ordinary bit width, the processing system provided by the present invention can effectively reduce chip power consumption and circuit area without losing too much computational accuracy.
  • FIG. 1 shows a schematic diagram of a model of a binary neural network according to an embodiment of the present invention
  • FIG. 2 is a block diagram showing the structure of a neural network processing system according to an embodiment of the present invention
  • FIG. 3 is a block diagram showing the structure of a neural network processing system according to another embodiment of the present invention.
  • FIG. 4 is a block diagram showing the structure of a computing unit in the neural network processing system of the present invention.
  • Figure 5 is a block diagram showing the structure of a convolution unit in a computing unit according to the present invention.
  • FIG. 6 shows a flow chart of a method of processing a neural network in accordance with one embodiment of the present invention.
  • the neural network structure includes an input layer, a plurality of hidden layers, and an output layer.
  • the first-layer input of the multi-layer structure is the original image ("original image" in the present invention refers to the raw data to be processed, not merely an image obtained by taking a photograph in the narrow sense), so the calculation of the first layer (input layer) requires a normal bit width (for example, 8-bit, 16-bit, etc.).
  • the remaining layers can be calculated in a binary manner, that is, the nodes of the next layer are obtained by performing binary operations on the node values of the layer and their corresponding weight values.
  • the present invention aims to provide a processing system, or processor, oriented to binary weight neural networks, which uses basic addition and subtraction operations in the computation of binary weight neural networks instead of the multiply-accumulate operations of traditional convolutional neural networks, thereby improving the computing speed and energy efficiency of the neural network.
  • the neural network processor provided by the present invention is based on a storage-control-computation structure.
  • the storage structure is configured to store data participating in the calculation, neural network weights, and processor operation instructions;
  • the control structure is configured to parse the operation instructions, and generate a control signal for controlling scheduling and storage of data in the processing system and a calculation process of the neural network
  • the computing structure is used to participate in neural network computing operations in the processor to ensure that the data is correctly calculated in the computing unit with the corresponding weights.
  • a processing system 200 for a binary weighted neural network includes at least one storage unit 210, at least one control unit 220, and at least one computing unit 230.
  • the control unit 220 is connected to the storage unit 210 and the calculation unit 230.
  • the computing unit 230 is coupled to the storage unit 210 for reading or writing data from the storage unit 210.
  • the data path between the storage unit 210, the control unit 220, and the computing unit 230 includes interconnection technologies such as H-TREE or FAT-TREE.
  • the storage unit 210 is configured to store data (for example, original feature map data) transmitted from the outside of the neural network processing system or to store data generated during the processing, including processing results or intermediate results generated during the processing, and the results may be from The core computing component or other external computing component inside the neural network processing system.
  • the storage unit can also be used to store instruction information that participates in the calculation (e.g., load data to a computing unit, start calculations, end of calculations, or store calculation results to a storage unit, etc.).
  • the storage unit may be a common storage medium such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a register file, or a new type of storage such as a 3D storage device.
  • the control unit 220 is configured to acquire an instruction stored in the storage unit and perform analysis, and then control the calculation unit 230 to perform a correlation operation of the neural network according to the analyzed control signal.
  • the control unit 220 performs operations such as instruction decoding, data scheduling, process control, and the like.
  • the calculation unit 230 is configured to perform a corresponding neural network calculation according to a control signal obtained from the control unit 220, and the calculation unit 230 is connected to the storage unit 210 to obtain data for calculation and write the calculation result to the storage unit 210.
  • the computing unit 230 can perform most of the calculations in the neural network, such as convolution operations, pooling operations, and the like.
  • the pooling operation is usually performed after the convolution operation, and its role is to reduce the convolution layer feature vector, which usually includes two types: average pooling and maximum pooling.
  • the average pooling method is to calculate the average value of all the elements in the layer as the output result.
  • max pooling outputs the maximum value of all the elements in the layer. The pooling operation can mitigate over-fitting of the layer.
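To make the two pooling variants concrete, here is a small illustrative Python sketch (pool2d and its non-overlapping-window assumption are not from the patent):

```python
def pool2d(fmap, size, op):
    """Apply op over non-overlapping size x size windows of a 2D feature map."""
    out = []
    for i in range(0, len(fmap), size):
        row = []
        for j in range(0, len(fmap[0]), size):
            window = [fmap[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(op(window))
        out.append(row)
    return out

fmap = [[1, 2, 5, 6],
        [3, 4, 7, 8],
        [0, 0, 1, 1],
        [0, 4, 1, 1]]
print(pool2d(fmap, 2, max))                        # max pooling:     [[4, 8], [4, 1]]
print(pool2d(fmap, 2, lambda w: sum(w) / len(w)))  # average pooling: [[2.5, 6.5], [1.0, 1.0]]
```

Passing `max` or an averaging lambda as `op` selects between the two pooling types described above.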
  • the processing system also includes an address addressing function for mapping the input index to the correct storage address to obtain the required data from the storage unit.
  • the address addressing function can be implemented in the control unit or in the form of a separate unit.
  • FIG. 3 is a structural block diagram of a neural network processing system in accordance with another embodiment of the present invention.
  • the difference from the neural network processing system of FIG. 2 is that, in the neural network processing system 300 of FIG. 3 (in which the connections between the units are not shown), the storage is divided into multiple storage units according to the type of data stored, namely an input data storage unit 311, a weight storage unit 312, an instruction storage unit 313, and an output data storage unit 314, and the computing unit comprises multiple sub-computing units 1 to N that can process in parallel.
  • the input data storage unit 311 is configured to store data participating in the calculation, the data including the original feature map data and the data participating in the intermediate layer calculation;
  • the weight storage unit 312 is configured to store the trained neural network weights;
  • the instruction storage unit 313 is configured to store The instruction information participating in the calculation may be parsed by the control unit 320 into a control flow to schedule the calculation of the neural network;
  • the output data storage unit 314 is configured to store the calculated neuron response value.
  • the computational speed of the neural network can be improved by employing multiple parallel computing units.
  • the calculation unit is composed of a convolution unit, an addition unit (or adder), an accumulator unit, an intermediate layer buffer unit, a pooling, and a batch normalization unit, which are sequentially connected.
  • in this text, the convolution unit refers to a unit that produces the convolution result through addition and subtraction operations in its physical implementation.
  • the convolution unit may be composed of units such as a sign-magnitude/two's-complement conversion unit, a multiplexer, and an adder, and is used to perform the convolution of layer data with weights; its output serves as the input data of the addition unit.
  • the accumulator consists of adder units for storing and accumulating partial data and results of the addition unit.
  • the intermediate layer buffer unit is composed of a memory for storing the result of the completion of the convolution operation by a single convolution kernel.
  • the pooling and batch normalization units perform pooling operations on the convolution output layer.
  • the addition unit may be implemented by an OR gate; the inputs of the OR gate are the output results from the convolution unit and the output value is a single-bit value. Implementing the addition unit with an OR gate simplifies the operation and increases operational efficiency.
  • the addition unit can be implemented using a Hamming weight calculation unit.
  • the input of the Hamming weight calculation unit is the output result of the convolution unit, and the output value is the number of logic 1 in the input data, that is, the Hamming weight.
  • the addition unit is implemented by the Hamming weight calculation unit to accurately realize the summation operation.
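A Hamming-weight (population count) addition unit simply counts the logic 1s among its input bits. A minimal sketch (popcount is an assumed helper name, not from the patent):

```python
def popcount(word):
    """Hamming weight: the number of logic 1s in a binary word."""
    return bin(word).count("1")

print(popcount(0b10110110))  # 5
```

Because the count equals the exact sum of the individual bits, this variant of the addition unit realizes the summation exactly, unlike the single-bit OR-gate variant.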
  • the present invention proposes a convolution unit suitable for a binary weighted neural network, as shown in FIG.
  • the convolution unit is composed of a numerical inversion unit, a demultiplexing unit, and an adder unit.
  • the input data (e.g., the node values of one layer in the convolutional neural network) is connected to the numerical inversion unit and to one input of the multiplexer unit, and the numerical inversion unit is connected to the other input of the multiplexer unit; the weight data is connected to the multiplexer unit as the signal-gating input, the output of the multiplexer unit is connected to the adder unit, and the output of the adder unit serves as the output of the convolution unit.
  • the numerical inversion unit is used to reverse the input value.
  • the positive number is represented by the original code
  • the negative number is represented by the complement code
  • the numerical inversion unit can perform the numerical inversion processing of the input data.
  • for example, for the signed binary positive number 0101 (+5), the output of the value inversion unit is the two's complement 1011 (-5);
  • for the signed binary negative number 1010 (-6), represented in two's complement, the output of the value inversion unit is the binary number 0110 (+6).
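The negation behaviour of the value inversion unit can be mimicked with two's-complement arithmetic. This sketch assumes a 4-bit word width; the width and function name are illustrative, not from the patent:

```python
def value_invert(x, bits=4):
    """Negate a two's-complement value of the given bit width:
    invert all bits, add one, and mask to the word width."""
    mask = (1 << bits) - 1
    return (~x + 1) & mask

print(format(value_invert(0b0101), "04b"))  # 1011  (+5 -> -5)
print(format(value_invert(0b1010), "04b"))  # 0110  (-6 -> +6)
```

Applying the unit twice returns the original value, matching the two worked examples above.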
  • the original input data and the inverted input data are connected to the multiplexer.
  • when the weight value is -1, the multiplexer outputs the numerically inverted input data; when the weight value is 1, the multiplexer outputs the original input data.
  • the adder unit is used to perform the addition operation in the convolution operation.
  • the weight data may be further reduced.
  • the specific process is as follows:
  • for a binary weight convolutional neural network, the weights can be represented by the values 1 and -1. Therefore, when applying weight data of normal bit width from a traditional convolutional neural network to a binary weight convolutional neural network, the layer must be binarized according to equation (1):

    $$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}\qquad(1)$$
  • z represents the input operand, and Binarize(z) represents the mapping result. The operation expressed by equation (1) can be understood as follows: when the input operand is greater than or equal to zero, the operand is binarized to 1; when it is less than zero, the operand is binarized to -1.
  • mapping may be performed in other manners, for example, by using a probabilistic method to determine the mapping to be 1 or -1.
  • a two-bit binary number can be used to describe the binarized weight data in the binary weight neural network, where the high bit is the sign bit and the low bit is the data bit; the two-bit code of 1 is 01, and the two's complement of -1 is 11.
  • the weight data represented by the two bits described above may be remapped; the remapping function r(z) is:

    $$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}\qquad(2)$$
  • equation (2) can be understood as that when the input operand is equal to 1, the operand keeps the value 1 unchanged; when the operand is -1, the operand is mapped to the value 0.
  • the binary weighted neural network processing system proposed by the present invention can also use a value of 0 to represent a weight value of -1 in a binary weighted neural network, and a value of 1 to represent a weight value of 1 in a binary weighted neural network.
  • the weight values loaded into the binary weight neural network processor need to be preprocessed off-chip, i.e., remapped according to the function r(z). In this way, the weight value represented by two bits can be reduced to a single bit.
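The off-chip preprocessing chain (binarize to +1/-1, remap through r(z), then pack one bit per weight) can be sketched as follows; the function names and the LSB-first packing order are assumptions for illustration:

```python
def binarize(z):
    """Equation (1): z >= 0 maps to +1, z < 0 maps to -1."""
    return 1 if z >= 0 else -1

def remap(z):
    """Equation (2): r(z) keeps +1 as 1 and maps -1 to 0."""
    return 1 if z == 1 else 0

def pack_weights(weights):
    """Binarize, remap, and pack the weights into one integer,
    one bit per weight (least significant bit first)."""
    word = 0
    for i, w in enumerate(weights):
        word |= remap(binarize(w)) << i
    return word

print(bin(pack_weights([0.7, -1.2, 0.0, -0.3])))  # 0b101
```

Each full-precision weight thus occupies a single bit in storage, which is the bit-width reduction the text describes.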
  • FIG. 6 is a flow chart of a method for performing neural network calculation using the neural network processing system of FIG. 3, the method comprising:
  • Step S1: the control unit addresses the storage unit, then reads and parses the instruction to be executed next;
  • Step S2: input data is obtained from the storage unit according to the storage address obtained by parsing the instruction;
  • Step S3: the data and the weights are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit;
  • Step S4: the computing unit performs the operations of the neural network computation, including convolution operations, pooling operations, and the like;
  • Step S5: the output data is stored back to the storage unit.
  • based on the characteristic that the weight values in a binary weight neural network are 1 and -1, the invention provides a processing system applied to binary weight convolutional neural networks, which reduces the data bit width in the neural network computation process, increases the speed of convolution operations, and lowers storage capacity requirements and operating energy consumption.
  • the convolutional neural network processor of the present invention can be applied to various electronic devices such as mobile phones, embedded electronic devices, and the like.
  • the invention can be a system, method and/or computer program product.
  • the computer program product can comprise a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement various aspects of the present invention.
  • the computer readable storage medium can be a tangible device that retains and stores the instructions used by the instruction execution device.
  • a computer readable storage medium may include, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of computer readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punch cards or raised structures in grooves on which instructions are stored, as well as any suitable combination of the above.

Abstract

The present invention provides a processing system applied to a binary weight convolutional neural network. The system comprises: at least one storage unit for storing data and instructions; at least one control unit for obtaining the instructions stored in the storage unit and issuing control signals; and at least one computing unit for obtaining, from the storage unit, the node values of one layer of the convolutional neural network and the corresponding binary weight data, and obtaining the node values of the next layer by performing addition and subtraction operations. The system of the present invention reduces the data bit width in the computation of convolutional neural networks, increases the speed of convolution operations, and lowers storage capacity requirements and operating energy consumption.

Description

Processing system and method applied to a binary weight convolutional network
Technical field
The present invention relates to the field of computer technology, and in particular to a processing system and method applied to a binary weight convolutional network.
Background art
Deep learning technology has developed rapidly in recent years. Deep neural networks, especially convolutional neural networks, have been widely applied in fields such as image recognition, speech recognition, natural language understanding, weather prediction, gene expression analysis, content recommendation, and intelligent robotics. The deep network structure obtained through deep learning is a computational model that contains a large number of data nodes; each data node is connected to other data nodes, and the connections between nodes are represented by weights. As the complexity of neural networks keeps increasing, neural network technology suffers in practical applications from problems such as high resource occupation, slow computing speed, and high energy consumption.
In the prior art, to solve the above problems, binary weight convolutional neural network models have been applied to fields such as image recognition, augmented reality, and virtual reality. Binary weight convolutional neural networks reduce the data bit width by binarizing the weights (for example, representing weights with 1 and -1), which greatly reduces parameter capacity and increases the operating speed of network models. The emergence of binary weight convolutional neural networks has lowered the hardware requirements for running complex systems such as image recognition and has extended the application domains of convolutional neural networks.
However, most current deep learning applications are implemented with central processing units and graphics processing units. These technologies are not energy-efficient; when applied in fields such as embedded devices or low-overhead data centers, they face serious energy-efficiency problems and computing-speed bottlenecks and can hardly meet application performance requirements. It is therefore difficult to apply them to miniaturized, lightweight devices such as mobile phones and embedded electronic devices.
Summary of the invention
In view of the network characteristics and computational characteristics of binary weight convolutional neural networks, the present invention provides a processing system and method applied to binary weight convolutional networks, so as to overcome the above deficiencies of the prior art.
According to one aspect of the present invention, a processing system applied to a binary weight convolutional neural network is provided. The system comprises:
at least one storage unit for storing data and instructions;
at least one control unit for obtaining the instructions stored in the storage unit and issuing control signals;
at least one computing unit for obtaining, from the storage unit, the node values of one layer of the convolutional neural network and the corresponding binary weight data, and obtaining the node values of the next layer by performing addition and subtraction operations.
In the system of the present invention, the computing unit comprises a convolution unit and an accumulator, wherein the convolution unit receives the node values of one layer of the convolutional neural network and the corresponding binary weight data, and the output of the convolution unit is coupled to the accumulator.
In the system of the present invention, the convolution unit comprises a numerical inversion unit, a multiplexer unit, and an adder, wherein the input data is connected to the multiplexer unit both directly and through the numerical inversion unit, the binary weight data is connected to the multiplexer unit to control its signal gating, and the output of the multiplexer unit is connected to the adder.
In the system of the present invention, the binary weight value is mapped using the following formula:

$$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$

where z denotes the operand and Binarize(z) denotes the mapped value.
In the system of the present invention, the binary weight value is further mapped as:

$$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$

where z denotes the operand and r(z) denotes the mapped value.
According to a second aspect of the present invention, a processing method applied to a binary weight convolutional neural network is provided. The method comprises: obtaining the node values of one layer of the convolutional neural network and the corresponding binary weight data; and obtaining the node values of the next layer by performing addition and subtraction operations.
In the method of the present invention, the binary weight value is mapped using the following formula:

$$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$

where z denotes the operand and Binarize(z) denotes the mapped value.
In the method of the present invention, obtaining the node values of the next layer by performing addition and subtraction operations comprises: when the weight value is 1, transferring the original input data to the adder; and when the weight value is -1, transferring the numerically inverted input data to the adder.
In the method of the present invention, the binary weight value is further mapped as:

$$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$

where z denotes the operand and r(z) denotes the mapped value.
Compared with the prior art, the advantage of the present invention is that a processor or chip oriented to binary convolutional networks can be implemented on the basis of the system of the present invention. By reducing the weight bit width to a single bit, the overhead of the storage circuitry and the computational complexity are reduced, and the on-chip data transfer bandwidth is lowered as well. Compared with neural networks that use an ordinary bit width, the processing system provided by the present invention can effectively reduce chip power consumption and circuit area without losing too much computational accuracy.
Brief description of the drawings
The following drawings merely illustrate and explain the present invention schematically and are not intended to limit its scope, in which:
FIG. 1 shows a schematic diagram of a binary neural network model according to an embodiment of the present invention;
FIG. 2 shows a structural block diagram of a neural network processing system according to an embodiment of the present invention;
FIG. 3 shows a structural block diagram of a neural network processing system according to another embodiment of the present invention;
FIG. 4 shows a structural block diagram of the computing unit in the neural network processing system of the present invention;
FIG. 5 shows a structural block diagram of the convolution unit in the computing unit according to the present invention;
FIG. 6 shows a flowchart of a neural network processing method according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, design methods, and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are intended only to explain the present invention and not to limit it.
A neural network structure comprises an input layer, multiple hidden layers, and an output layer. In a binary weight convolutional neural network, the first-layer input of the multi-layer structure is the original image ("original image" in the present invention refers to the raw data to be processed, not merely an image obtained by taking a photograph in the narrow sense), so the computation of the first layer (input layer) needs to use a normal bit width (for example, 8-bit, 16-bit, etc.), while the remaining layers can be computed in a binary manner, that is, the nodes of the next layer are obtained by performing binary operations on the node values of the current layer and their corresponding weight values.
Referring to the schematic diagram of the binary neural network model shown in FIG. 1, assume that $x_1, x_2, \ldots, x_n$ denote several nodes of one layer of the neural network, which are connected to a node y of the next layer, and that $w_1, w_2, \ldots, w_n$ denote the weights of the corresponding connections. Since all weights are binary data, the two binary values can, for example, be represented by 1 and -1. If the contribution to y is computed through a function f, one can define y = x × w. For the parameters of each layer, the weight value w is binary data: when w is 1, the result of f is x, and when w is -1, the result of f is -x. The operations of each layer therefore involve a large number of multiply-accumulate operations.
The present invention aims to provide a processing system, or processor, oriented to binary weight neural networks. In the computation of binary weight neural networks, this system uses basic addition and subtraction operations instead of the multiply-accumulate operations of traditional convolutional neural networks, thereby improving the computing speed and energy efficiency of the neural network.
FIG. 2 shows a block diagram of a processing system applied to binary weight neural networks according to an embodiment of the present invention. In general, the neural network processor provided by the present invention is based on a storage-control-computation structure. The storage structure stores the data participating in computation, the neural network weights, and the processor operation instructions; the control structure parses the operation instructions and generates control signals used to control the scheduling and storage of data within the processing system and the computation process of the neural network; the computation structure participates in the neural network computation operations in the processor, ensuring that the data is correctly computed with the corresponding weights in the computing unit.
Specifically, referring to the embodiment of FIG. 2, a processing system 200 oriented to binary weight neural networks is provided, which comprises at least one storage unit 210, at least one control unit 220, and at least one computing unit 230. The control unit 220 is connected to the storage unit 210 and the computing unit 230. The computing unit 230 is connected to the storage unit 210 to read data from or write data to the storage unit 210. The data paths between the storage unit 210, the control unit 220, and the computing unit 230 may use interconnect technologies such as H-TREE or FAT-TREE.
The storage unit 210 stores data transferred from outside the neural network processing system (for example, original feature map data) or data generated during processing, including processing results or intermediate results generated during processing; these results may come from the core computing components inside the neural network processing system or from other external computing components. In addition, the storage unit may also store instruction information participating in computation (for example, load data into the computing unit, start computation, end computation, or store computation results back to the storage unit). The storage unit may be a common storage medium such as static random access memory (SRAM), dynamic random access memory (DRAM), or a register file, or a new type of storage device such as a 3D memory device.
The control unit 220 obtains the instructions stored in the storage unit, parses them, and then controls the computing unit 230 to perform the relevant neural network operations according to the parsed control signals. The control unit 220 performs instruction decoding, data scheduling, process control, and similar tasks.
The computing unit 230 performs the corresponding neural network computations according to the control signals obtained from the control unit 220; it is connected to the storage unit 210 to obtain data for computation and to write the computation results to the storage unit 210. The computing unit 230 can perform most of the computation in the neural network, such as convolution operations and pooling operations. The pooling operation is usually performed after the convolution operation; its role is to reduce the convolution layer feature vector, and it usually includes two types: average pooling and maximum pooling. Average pooling outputs the average value of all elements in the layer, and maximum pooling outputs the maximum value of all elements in the layer. The pooling operation can mitigate over-fitting of the layer.
Those skilled in the art should understand that, although not shown in FIG. 2, the processing system also includes an address-addressing function for mapping an input index to the correct storage address so as to obtain the required data or instructions from the storage unit. The address-addressing function may be implemented in the control unit or in the form of a separate unit.
FIG. 3 is a structural block diagram of a neural network processing system according to another embodiment of the present invention. The difference from the neural network processing system of FIG. 2 is that, in the neural network processing system 300 of FIG. 3 (in which the connections between the units are not shown), the storage is divided into multiple storage units according to the type of data stored, namely an input data storage unit 311, a weight storage unit 312, an instruction storage unit 313, and an output data storage unit 314, and the computing unit comprises multiple sub-computing units 1 to N that can process in parallel.
The input data storage unit 311 stores the data participating in computation, including original feature map data and data participating in intermediate-layer computation; the weight storage unit 312 stores the trained neural network weights; the instruction storage unit 313 stores the instruction information participating in computation, and the instructions can be parsed by the control unit 320 into a control flow to schedule the computation of the neural network; the output data storage unit 314 stores the computed neuron response values. By subdividing the storage units, data of largely uniform types can be stored together, making it easier to choose a suitable storage medium and simplifying operations such as data addressing.
In addition, by employing multiple parallel computing units, the computing speed of the neural network can be improved.
FIG. 4 shows a structural block diagram and connection diagram of the computing unit in FIG. 2 and FIG. 3. As shown in FIG. 4, the computing unit is composed of sequentially connected components: a convolution unit, an addition unit (or adder), an accumulator unit, an intermediate-layer buffer unit, and a pooling and batch normalization unit. It should be noted that, in this text, the convolution unit refers to a unit that produces the convolution result through addition and subtraction operations in its physical implementation.
The convolution unit may be composed of units such as a sign-magnitude/two's-complement conversion unit, a multiplexer, and an adder, and is used to perform the convolution of layer data with weights; its output serves as the input data of the addition unit.
The accumulator is composed of adder units and is used to hold and accumulate partial data and results from the addition unit.
The intermediate-layer buffer unit is composed of memory and is used to store the result after a single convolution kernel completes its convolution operation.
The pooling and batch normalization unit performs the pooling operation on the convolution output layer.
In an embodiment of the present invention, the addition unit may be implemented with an OR gate whose inputs are the output results from the convolution unit and whose output is a single-bit value; implementing the addition unit with an OR gate simplifies the operation and increases operational efficiency. In another embodiment, the addition unit may be implemented with a Hamming-weight calculation unit, whose input is the output result of the convolution unit and whose output is the number of logic 1s in the input data, that is, the Hamming weight. Implementing the addition unit with a Hamming-weight calculation unit realizes the summation operation exactly.
Further, the present invention proposes a convolution unit suitable for binary weight neural networks, as shown in FIG. 5. The convolution unit is composed of a numerical inversion unit, a multiplexer unit, and an adder unit. The input data (for example, the node values of one layer of the convolutional neural network) is connected to the numerical inversion unit and to one input of the multiplexer unit; the numerical inversion unit is connected to the other input of the multiplexer unit; the weight data is connected to the multiplexer unit as the signal-gating input; the output of the multiplexer unit is connected to the adder unit; and the output of the adder unit is the output of the convolution unit.
The numerical inversion unit negates the input value. In the binary weight convolutional neural network processing provided by the present invention, positive numbers are represented in sign-magnitude form and negative numbers in two's-complement form, and the numerical inversion unit negates the input data. For example, for the signed binary positive number 0101 (+5), the output after the numerical inversion unit is the two's complement 1011 (-5); for the signed binary negative number 1010 (-6), represented in two's complement, the output after the numerical inversion unit is the binary number 0110 (+6).
In the convolution unit, the original input data and the numerically inverted input data are connected to the multiplexer. When the weight value is -1, the multiplexer outputs the numerically inverted input data; when the weight value is 1, the multiplexer outputs the original input data. The adder unit performs the addition operations in the convolution computation.
Specifically, when the convolution unit of the present invention is used, taking $y = x_0 w_0 + x_1 w_1 + x_2 w_2$ for one layer of the convolutional neural network as an example, when $w_0 = 1$, $w_1 = -1$, and $w_2 = 1$, y can be expressed as $y = x_0 - x_1 + x_2$, that is, the multiply-accumulate process is transformed into addition and subtraction.
Furthermore, in the binary weight neural network processing system provided by the present invention, in order to reduce storage space and improve computational efficiency, the weight data may be further compressed in another implementation of the present invention. The specific process is as follows:
For a binary weight convolutional neural network, the weights can be represented by the values 1 and -1. Therefore, when applying weight data of normal bit width from a traditional convolutional neural network to a binary weight convolutional neural network, the layer must be binarized according to the following formula:

$$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}\qquad(1)$$

where z denotes the input operand and Binarize(z) denotes the mapping result. The operation expressed by equation (1) can be understood as follows: when the input operand is greater than or equal to zero, the operand is binarized to 1; when it is less than zero, the operand is binarized to -1.
Those skilled in the art should understand that, besides the deterministic binarization of equation (1), the mapping may be performed in other ways, for example by a probabilistic method that decides whether to map to 1 or -1.
Usually, a two-bit binary number can be used to describe the binarized weight data in a binary weight neural network, where the high bit is the sign bit and the low bit is the data bit; the two-bit code of 1 is 01, and the two's complement of -1 is 11.
The weight data represented with two bits as described above can be remapped; the remapping function r(z) is:

$$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}\qquad(2)$$

The operation expressed by equation (2) can be understood as follows: when the input operand equals 1, the operand keeps the value 1 unchanged; when the operand is -1, it is mapped to the value 0.
Therefore, the binary weight neural network processing system proposed by the present invention can also use the value 0 to represent the weight value -1 in a binary weight neural network and the value 1 to represent the weight value 1. The weight values loaded into the binary weight neural network processor need to be preprocessed off-chip, that is, remapped according to the function r(z). In this way, the weight value represented with two bits can be reduced to a single bit.
FIG. 6 is a flowchart of a method of performing neural network computation using the neural network processing system of FIG. 3 according to the present invention. The method comprises:
Step S1: the control unit addresses the storage unit, then reads and parses the instruction to be executed next;
Step S2: input data is obtained from the storage unit according to the storage address obtained by parsing the instruction;
Step S3: the data and the weights are loaded from the input storage unit and the weight storage unit, respectively, into the computing unit;
Step S4: the computing unit performs the operations of the neural network computation, including convolution operations, pooling operations, and the like;
Step S5: the output data is stored back to the storage unit.
Based on the characteristic that the weight values in a binary weight neural network are 1 and -1, the present invention provides a processing system applied to binary weight convolutional neural networks, which reduces the data bit width in the neural network computation process, increases the speed of convolution operations, and lowers storage capacity requirements and operating energy consumption.
The convolutional neural network processor of the present invention can be applied to various electronic devices, for example mobile phones, embedded electronic devices, and the like.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium carrying computer readable program instructions for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the above.
The embodiments of the present invention have been described above. The foregoing description is exemplary rather than exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or their technical improvements over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

  1. A processing system applied to a binary weight convolutional neural network, characterized by comprising:
    at least one storage unit for storing data and instructions;
    at least one control unit for obtaining the instructions stored in the storage unit and issuing control signals;
    at least one computing unit for obtaining, from the storage unit, the node values of one layer of the convolutional neural network and the corresponding binary weight data, and obtaining the node values of the next layer by performing addition and subtraction operations.
  2. The system according to claim 1, characterized in that the computing unit comprises a convolution unit and an accumulator, wherein the convolution unit receives the node values of one layer of the convolutional neural network and the corresponding binary weight data, and the output of the convolution unit is coupled to the accumulator.
  3. The system according to claim 2, characterized in that the convolution unit comprises a numerical inversion unit, a multiplexer unit, and an adder, wherein the input data is connected to the multiplexer unit both directly and through the numerical inversion unit, the binary weight data is connected to the multiplexer unit to control its signal gating, and the output of the multiplexer unit is connected to the adder.
  4. The system according to claim 1, characterized in that the binary weight value is mapped using the following formula:
    $$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$
    where z denotes the operand and Binarize(z) denotes the mapped value.
  5. The system according to claim 4, characterized in that the binary weight value is further mapped as:
    $$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$
    where z denotes the operand and r(z) denotes the mapped value.
  6. A processing method applied to a binary weight convolutional neural network, characterized by comprising:
    obtaining the node values of one layer of the convolutional neural network and the corresponding binary weight data;
    obtaining the node values of the next layer by performing addition and subtraction operations.
  7. The processing method according to claim 6, wherein the binary weight value is mapped using the following formula:
    $$\mathrm{Binarize}(z)=\begin{cases}+1, & z\ge 0\\ -1, & z<0\end{cases}$$
    where z denotes the operand and Binarize(z) denotes the mapped value.
  8. The processing method according to claim 7, wherein obtaining the node values of the next layer by performing addition and subtraction operations comprises:
    when the weight value is 1, transferring the original input data to the adder; and
    when the weight value is -1, transferring the numerically inverted input data to the adder.
  9. The processing method according to claim 7, wherein the binary weight value is further mapped as:
    $$r(z)=\begin{cases}1, & z=1\\ 0, & z=-1\end{cases}$$
    where z denotes the operand and r(z) denotes the mapped value.
  10. A computer readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the steps of the method according to any one of claims 6 to 9.
PCT/CN2018/076260 2017-05-08 2018-02-11 Processing system and method applied to a binary weight convolutional network WO2018205708A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/603,340 US11551068B2 (en) 2017-05-08 2018-02-11 Processing system and method for binary weight convolutional neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710315998.8A CN107169563B (zh) 2017-05-08 2017-05-08 应用于二值权重卷积网络的处理系统及方法
CN201710315998.8 2017-05-08

Publications (1)

Publication Number Publication Date
WO2018205708A1 true WO2018205708A1 (zh) 2018-11-15

Family

ID=59812476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/076260 WO2018205708A1 (zh) 2017-05-08 2018-02-11 应用于二值权重卷积网络的处理系统及方法

Country Status (3)

Country Link
US (1) US11551068B2 (zh)
CN (1) CN107169563B (zh)
WO (1) WO2018205708A1 (zh)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169563B (zh) * 2017-05-08 2018-11-30 中国科学院计算技术研究所 Processing system and method applied to binary-weight convolutional networks
CN108205704B (zh) * 2017-09-27 2021-10-29 深圳市商汤科技有限公司 A neural network chip
CN107704923A (zh) * 2017-10-19 2018-02-16 珠海格力电器股份有限公司 Convolutional neural network operation circuit
CN109726807B (zh) * 2017-10-31 2023-11-24 上海寒武纪信息科技有限公司 Neural network processor, operation method, and storage medium
CN107944545B (zh) * 2017-11-10 2020-07-31 中国科学院计算技术研究所 Computing method and computing device applied to neural networks
CN109978156B (zh) * 2017-12-28 2020-06-12 中科寒武纪科技股份有限公司 Integrated circuit chip device and related products
CN109993272B (zh) * 2017-12-29 2019-12-06 北京中科寒武纪科技有限公司 Convolution and downsampling operation unit, neural network operation unit, and field-programmable gate array integrated circuit
CN108256638B (zh) * 2018-01-05 2021-06-22 上海兆芯集成电路有限公司 Microprocessor circuit and method for performing neural network operations
CN108256644B (zh) 2018-01-05 2021-06-22 上海兆芯集成电路有限公司 Microprocessor circuit and method for performing neural network operations
CN108875922B (zh) * 2018-02-02 2022-06-10 北京旷视科技有限公司 Storage method, apparatus, system, and medium
CN108665063B (zh) * 2018-05-18 2022-03-18 南京大学 Bidirectional parallel-processing convolution acceleration system for a BNN hardware accelerator
CN109325582B (zh) * 2018-09-07 2020-10-30 中国科学院计算技术研究所 Computing device and method for binary neural networks
CN109359730B (zh) * 2018-09-26 2020-12-29 中国科学院计算技术研究所 Neural network processor for fixed-output-paradigm Winograd convolution
CN110135563B (zh) * 2019-05-13 2022-07-26 北京航空航天大学 A convolutional neural network binarization method and operation circuit
CN111985602A (zh) * 2019-05-24 2020-11-24 华为技术有限公司 Neural network computing device, method, and computing device
KR102345409B1 (ko) * 2019-08-29 2021-12-30 주식회사 하이퍼커넥트 Processor for accelerating convolution operations in a convolutional neural network and operating method of the processor
CN110717387B (zh) * 2019-09-02 2022-07-08 东南大学 A real-time vehicle detection method based on a UAV platform
US11854536B2 (en) 2019-09-06 2023-12-26 Hyperconnect Inc. Keyword spotting apparatus, method, and computer-readable recording medium thereof
CN113177638A (zh) * 2020-12-11 2021-07-27 联合微电子中心(香港)有限公司 Processor and method for generating binarized weights for a neural network

Citations (3)

Publication number Priority date Publication date Assignee Title
CN105654176A (zh) * 2014-11-14 2016-06-08 富士通株式会社 Neural network system, and training apparatus and method for a neural network system
CN106127170A (zh) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A training method, recognition method, and system fusing key feature points
CN107169563A (zh) * 2017-05-08 2017-09-15 中国科学院计算技术研究所 Processing system and method applied to binary-weight convolutional networks

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
KR20130090147A (ko) * 2012-02-03 2013-08-13 안병익 Neural network computing apparatus and system, and method therefor
CN107430677B (zh) * 2015-03-20 2022-04-12 英特尔公司 Object recognition based on boosting binary convolutional neural network features
CN105260776B (zh) * 2015-09-10 2018-03-27 华为技术有限公司 Neural network processor and convolutional neural network processor
CN105681628B (zh) * 2016-01-05 2018-12-07 西安交通大学 A convolutional network operation unit, a reconfigurable convolutional neural network processor, and a method for image denoising
CN105844330B (zh) * 2016-03-22 2019-06-28 华为技术有限公司 Data processing method of a neural network processor, and neural network processor
US10311342B1 (en) * 2016-04-14 2019-06-04 XNOR.ai, Inc. System and methods for efficiently implementing a convolutional neural network incorporating binarized filter and convolution operation for performing image classification
CN106203621B (zh) * 2016-07-11 2019-04-30 北京深鉴智能科技有限公司 Processor for convolutional neural network computation
US10417560B2 (en) * 2016-12-01 2019-09-17 Via Alliance Semiconductor Co., Ltd. Neural network unit that performs efficient 3-dimensional convolutions
WO2018119785A1 (en) * 2016-12-28 2018-07-05 Intel Corporation Method and apparatus for a binary neural network mapping scheme utilizing a gate array architecture
KR102592721B1 (ko) * 2017-01-11 2023-10-25 한국전자통신연구원 Convolutional neural network system with binary parameters and operating method thereof
EP3631691A4 (en) * 2017-05-23 2021-03-31 Intel Corporation Methods and apparatus for improving a binary weight neural network using a dependency tree
US11669585B2 (en) * 2019-06-25 2023-06-06 Apple Inc. Optimizing binary convolutional neural networks
US20210125063A1 (en) * 2019-10-23 2021-04-29 Electronics And Telecommunications Research Institute Apparatus and method for generating binary neural network
US11657259B2 (en) * 2019-12-20 2023-05-23 Sandisk Technologies Llc Kernel transformation techniques to reduce power consumption of binary input, binary weight in-memory convolutional neural network inference engine

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN105654176A (zh) * 2014-11-14 2016-06-08 富士通株式会社 Neural network system, and training apparatus and method for a neural network system
CN106127170A (zh) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A training method, recognition method, and system fusing key feature points
CN107169563A (zh) * 2017-05-08 2017-09-15 中国科学院计算技术研究所 Processing system and method applied to binary-weight convolutional networks

Also Published As

Publication number Publication date
US20210089871A1 (en) 2021-03-25
CN107169563B (zh) 2018-11-30
CN107169563A (zh) 2017-09-15
US11551068B2 (en) 2023-01-10

Similar Documents

Publication Publication Date Title
WO2018205708A1 (zh) Processing system and method applied to binary-weight convolutional networks
CN107256424B (zh) Ternary-weight convolutional network processing system and method
CN108427990B (zh) Neural network computing system and method
CN107729989B (zh) An apparatus and method for performing forward operations of an artificial neural network
US11880768B2 (en) Method and apparatus with bit-serial data processing of a neural network
WO2022037257A1 (zh) Convolution computing engine, artificial intelligence chip, and data processing method
US11775807B2 (en) Artificial neural network and method of controlling fixed point in the same
KR20210065830A (ko) 에너지 효율적인 컴퓨팅 니어 메모리 이진 신경망 회로들
US11277149B2 (en) Bit string compression
Reis et al. A fast and energy efficient computing-in-memory architecture for few-shot learning applications
CN111930681A (zh) A computing device and related products
CN115516463A (zh) Neurons using posits
US20230244923A1 (en) Neuromorphic operations using posits
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
TWI814734B (zh) Computing device and computing method for performing convolution
CN112132273B (zh) Computing device, processor, electronic apparatus, and computing method
US11782711B2 (en) Dynamic precision bit string accumulation
CN112132272B (zh) Computing device, processor, and electronic apparatus for a neural network
JP2020021208A (ja) Neural network processor, neural network processing method, and program
WO2021081854A1 (zh) A convolution operation circuit and convolution operation method
JP2022539554A (ja) High-precision neural processing elements
WO2019127480A1 (zh) Method, device, and computer-readable storage medium for processing numerical data
US20220215235A1 (en) Memory system to train neural networks
US20240086153A1 (en) Multi-bit accumulator and in-memory computing processor with same
US20220058471A1 (en) Neuron using posits

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18798156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18798156

Country of ref document: EP

Kind code of ref document: A1