WO2019165679A1 - Neural network processor including a bit conversion device, and method thereof - Google Patents

Neural network processor including a bit conversion device, and method thereof

Info

Publication number
WO2019165679A1
WO2019165679A1 (PCT/CN2018/082179)
Authority
WO
WIPO (PCT)
Prior art keywords
bit
data
neural network
bit conversion
conversion
Prior art date
Application number
PCT/CN2018/082179
Other languages
English (en)
French (fr)
Inventor
韩银和
闵丰
许浩博
王颖
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Publication of WO2019165679A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G06N3/048 — Activation functions

Definitions

  • The present invention relates to the field of artificial intelligence, and more particularly to improvements in neural network processors.
  • The deep learning technology of artificial intelligence has developed rapidly in recent years. It has been widely applied to solving high-level abstract cognitive problems such as image recognition, speech recognition, natural language understanding, weather prediction, gene expression, content recommendation, and intelligent robotics, and has proven to deliver excellent performance. This has made the development and improvement of artificial intelligence technology a research hotspot in academia and industry.
  • Deep neural networks are among the most advanced perceptual models in the field of artificial intelligence. Such networks simulate the neural connection structure of the human brain by building a model, and describe data features through multiple transformation stages, bringing breakthrough progress to large-scale data processing tasks involving images, video, and audio.
  • A deep neural network model is a computational model that contains a large number of nodes interconnected in a mesh structure; these nodes are called the neurons of the deep neural network.
  • The strength of the connection between two nodes represents the weighted value of the signal passing between them, i.e., the weight, corresponding to memory in a biological neural network.
  • Dedicated processors for neural network computation, i.e., neural network processors, have also been developed accordingly.
  • In practical neural network computation, a large amount of data must repeatedly undergo convolution, activation, pooling, and other operations, which consumes a great deal of computation time and seriously affects the user experience. How to reduce the computation time of a neural network has therefore become an improvement strategy for neural network processors.
  • To this end, a neural network processor including a bit conversion device is provided, the bit conversion device comprising:
  • an input interface, a control unit, a data conversion unit, and an output interface;
  • the control unit is configured to generate a control signal for the data conversion unit;
  • the input interface is configured to receive original data;
  • the data conversion unit is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
  • the output interface is configured to output the bit conversion result from the bit conversion device.
  • The control unit is configured to determine a rule for performing bit conversion according to a preset parameter or an input parameter, so as to generate the control signal;
  • the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
  • The data conversion unit is configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to determine the bit conversion result according to the reserved bits of the original data and the highest bit of the truncated bits of the original data.
  • The data conversion unit may alternatively be configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to use the reserved bits of the original data as the bit conversion result.
  • The data conversion unit may be configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using half the original number of bits.
  • In the corresponding method, the control unit generates a control signal for the data conversion unit;
  • the input interface receives, from outside the bit conversion device, original data on which bit conversion needs to be performed;
  • the data conversion unit performs bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
  • the output interface outputs the bit conversion result from the bit conversion device.
  • Step 1) comprises:
  • the control unit determines a rule for performing bit conversion according to a preset parameter or an input parameter;
  • the control unit generates a control signal corresponding to the rule;
  • the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
  • Step 3) may comprise:
  • the data conversion unit determining the bit conversion result, according to the control signal, based on the reserved bits of the original data and the highest bit of the truncated bits of the original data.
  • Step 3) may alternatively comprise:
  • the data conversion unit, according to the control signal, using the reserved bits of the original data as the bit conversion result.
  • The buffered neural network data is input to the bit conversion device to perform steps 1)-4) when buffering of the neural network data has been completed and the convolution operation has not yet been performed, or
  • the result of the convolution operation is input to the bit conversion device to perform steps 1)-4) when the convolution operation has been completed and the activation operation has not yet been performed.
  • A computer-readable storage medium has stored therein a computer program which, when executed, implements any of the above methods.
  • The present invention provides a bit conversion device for a neural network processor that can be used to adjust the number of bits used to express data in the various computation processes of a neural network. By reducing the number of bits used to express data, the hardware cost required for computation can be reduced, computation speed can be increased, the neural network processor's need for data storage space can be reduced, and the energy consumption of performing neural network computations can be lowered.
  • FIG. 1 shows a block diagram of a bit conversion device in accordance with one embodiment of the present invention;
  • FIG. 2 is a connection diagram of the respective units in a bit conversion device according to an embodiment of the present invention;
  • FIG. 3 is a flow chart of a method of performing bit conversion on neural network data using the bit conversion device shown in FIG. 1, according to an embodiment of the present invention;
  • FIG. 4a is a hardware configuration diagram for performing bit conversion in the "rounding mode" in the data conversion unit of a bit conversion device according to an embodiment of the present invention;
  • FIG. 4b is a hardware configuration diagram for performing bit conversion in the "direct truncation mode" in the data conversion unit of a bit conversion device according to an embodiment of the present invention.
  • The inventors believe that by appropriately reducing the number of bits of the data involved in neural network computation, for example by using fewer bits to represent data that would otherwise require more bits, the amount of computation and hence the computation time of the neural network can be reduced. This is because the inventors found, in studying the prior art, that neural network algorithms have relatively high fault tolerance with respect to intermediate results: even though representing data with fewer bits changes the precision of the data involved in the computation and thus the accuracy of the intermediate results obtained, this does not have a large impact on the final output of the neural network.
  • This manner of reducing the bits of the data used for computation is referred to as a "clipping operation" on the data.
  • The process of adjusting the number of binary bits required to express a value is referred to as "bit conversion".
  • For example, the decimal value 0.5 expressed as Q7 fixed-point data is 01000000 (here Q7 uses the leftmost of the 8 bits as the sign bit and the remaining 7 bits to represent the fractional part, and can thus represent fractions between -1 and 1 with 7 bits of precision). When performing bit conversion, the result originally expressed in Q7 can be modified to be expressed in Q3, giving the result 0100 (like Q7, Q3 also uses the leftmost bit as the sign bit, except that it uses 3 bits to represent the fractional part, and can represent fractions between -1 and 1 with 3 bits of precision).
  • On this basis, the present invention proposes a bit conversion device for a neural network processor.
  • The bit conversion device can determine a rule for performing bit conversion according to a preset parameter or a parameter based on user input, and perform bit conversion on the data accordingly.
  • Through such conversion, the neural network processor can process a relatively smaller amount of data, thereby increasing processing speed and reducing the energy consumption of the neural network processor.
  • The inventors believe that in combinational logic circuits the speed of a data operation is inversely proportional to the number of bits in the numerical representation, while the energy consumption of a data operation is proportional to that number of bits; therefore, after bit conversion of the data, the effects of accelerated computation and reduced power consumption can be achieved.
  • FIG. 1 shows a bit conversion device 101 according to an embodiment of the present invention, comprising: an input bus unit 102 as an input interface, a data conversion unit 103, an output bus unit 104 as an output interface, and a control unit 105.
  • The input bus unit 102 is configured to acquire the neural network data that needs to be bit-converted and provide it to the data conversion unit 103.
  • The input bus unit 102 can receive and/or transmit a plurality of data to be converted in parallel.
  • The data conversion unit 103 is configured to perform bit conversion on the neural network data from the input bus unit 102 according to a rule for performing bit conversion determined, for example, from a preset parameter or a parameter input by the user.
  • The output bus unit 104 is configured to output the bit conversion result obtained by the data conversion unit 103 from the bit conversion device 101, so as to provide it to a device for performing subsequent processing in the neural network processing.
  • The control unit 105 is configured to determine the rule of bit conversion and select a corresponding bit conversion mode to control the data conversion unit 103 in performing bit conversion.
  • The control unit 105 may determine the rule for performing bit conversion by analyzing preset parameters or parameters input by the user, so as to select from among various conversion modes set in advance.
  • The parameters may include the number of bits of the data to be converted and the number of bits of the converted data, or the binary representation used by the data to be converted and the binary representation desired for the converted data, such as Q7, Q3, and so on. For example, based on the parameters entered by the user, it is determined that neural network data represented in Q7 is to be converted to a Q3 representation.
  • The input bus unit 102 and/or the output bus unit 104 can receive and/or transmit a plurality of data to be converted in parallel.
  • In the embodiment of FIG. 2, the input bus unit is 128 bits wide and the output bus is 64 bits wide.
  • The control unit receives parameters input by the user from outside the bit conversion device and uses them, according to the determined bit conversion rule, to generate a mode switching signal for the data conversion unit, so that the data conversion unit knows which mode is to be used to perform bit conversion in the current situation. The control unit may further generate an input control signal for controlling the input bus unit to start or pause receiving data, and an output control signal for controlling the output bus unit to start or pause outputting the bit conversion result.
  • A method of performing bit conversion on neural network data using the bit conversion device shown in FIG. 1 is described below by way of an embodiment. Referring to FIG. 3, the method includes:
  • Step 1. The control unit 105 in the bit conversion device 101 determines the bit conversion rule to be used, based on the set conversion requirement parameters or parameters input by the user.
  • The set conversion requirement parameters or user-input parameters include information related to the number of bits of the neural network data to be converted and the number of bits of the converted data.
  • The set conversion requirement parameters or user-input parameters may further include the truncation rule used when performing bit conversion, such as a "rounding" or "direct truncation" rule.
  • Based on these, the control unit 105 selects from preset bit conversion modes.
  • The bit conversion modes comprise a "rounding mode" and a "direct truncation mode"; the processing in these two different modes is described in the subsequent steps.
  • Step 2. The input bus unit 102 in the bit conversion device 101 supplies the neural network data it has obtained that requires bit conversion to the data conversion unit 103.
  • The input bus unit 102 here may include a plurality of interfaces capable of receiving data in parallel, so as to receive in parallel, from outside the bit conversion device 101, the neural network data that needs to undergo bit conversion. Similarly, the input bus unit 102 may also include a plurality of interfaces capable of outputting data in parallel, thereby providing the data to the data conversion unit 103 in parallel for bit conversion processing.
  • Step 3. The data conversion unit 103 performs bit conversion on the neural network data requiring bit conversion, in accordance with the bit conversion rule determined by the control unit 105.
  • A control signal from the control unit 105 can be received by the data conversion unit 103 so as to perform bit conversion in accordance with the rule.
  • The inventors have found that, when reducing the number of bits of the data used for computation, if the reduced number of bits is greater than or equal to half the number of bits of the original data, the neural network processor can achieve a compromise among hardware cost, processing speed, and accuracy. Therefore, in the present invention it is preferable to reduce the number of bits of the neural network data requiring bit conversion to half the original number, for example using a fixed hardware structure to perform bit conversion that converts 32-bit data into 16 bits, 16-bit data into 8 bits, 8-bit data into 4 bits, 4-bit data into 2 bits, and 2-bit data into 1 bit.
  • In performing bit conversion, each bit of the neural network data requiring bit conversion may be divided, according to the rule, into reserved bits and truncated bits, where the reserved bits are the higher one or more bits of the neural network data and the truncated bits are the remaining bits. For example, for the 8-bit data 10101111, if the number of bits is reduced to half the original, the reserved bits are 1010 and the truncated bits are 1111.
  • FIG. 4a shows a hardware structure for performing bit conversion in the "rounding mode" in the data conversion unit 103 according to an embodiment of the present invention, in which 16 items of 8-bit neural network data requiring bit conversion are input into the data conversion unit 103 in parallel. In each item of 8-bit neural network data, the bits of the 4 reserved bits other than the sign bit (for example, a1, a2, a3) and the highest bit of the corresponding truncated bits (for example, a4) are used as the two inputs of an adder, and the output of the adder together with the sign bit of the neural network data is used as the result of performing bit conversion on the 8-bit neural network data.
  • For example, suppose the neural network data input to the data conversion unit 103 is 10101111 (two's complement), representing -0.6328125 in decimal; its truncated bits are 1111. The highest truncated bit, 1, is added to the three bits 010 of the reserved bits other than the sign bit, and the bit conversion result obtained from the sign bit of the neural network data together with the adder output is 1011 (two's complement), representing -0.625 in decimal.
  • FIG. 4b shows a hardware structure for performing bit conversion in the "direct truncation mode" in the data conversion unit 103 according to an embodiment of the present invention, in which 16 items of 8-bit neural network data requiring bit conversion are input into the data conversion unit 103 in parallel, and the 4 reserved bits of each item of 8-bit neural network data (for example, a0, a1, a2, a3) are used directly as the result of performing bit conversion on that 8-bit neural network data. For example, if the input data is 10101111 (two's complement), the result after bit conversion is 1010.
  • Step 4. The output bus unit 104 outputs the bit conversion result obtained by the data conversion unit 103 from the bit conversion device 101, so as to provide it to a device for performing subsequent processing in the neural network processing.
  • The bit conversion device provided by the above embodiments of the present invention can be used as part of a neural network processor in the various computation processes of a neural network.
  • For example, when buffering of the neural network data has been completed and the convolution operation has not yet been performed, the buffered neural network data may be bit-converted using the bit conversion device. This is because different network layers of a neural network may have different requirements on the number of bits used for data; to match the required computation speed and the expected energy consumption, the bit conversion device may perform bit conversion on the buffered neural network data, and the result obtained by the bit conversion is supplied to a unit for performing the convolution operation.
  • As another example, when the convolution operation on the data has been completed and the activation operation has not yet been performed, the bit conversion device may be used to perform bit conversion on the result of the convolution operation. This is because the accumulation in the convolution operation unit tends to increase the number of bits of the convolution result obtained; to meet the bit-width requirements of subsequent operations (for example, for some hardware-implemented activation units, the number of bits used is fixed), the result of the convolution operation needs to be bit-converted.
  • Thus the present invention provides a bit conversion device for a neural network processor that can be used to adjust the number of bits used to express data in the various computation processes of a neural network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

A neural network processor, and a method for performing bit conversion on data of a neural network using the neural network processor. The neural network processor includes a bit conversion device (101), which comprises an input interface (102), a control unit (105), a data conversion unit (103), and an output interface (104). The control unit (105) is configured to generate a control signal for the data conversion unit (103); the input interface (102) is configured to receive original data; the data conversion unit (103) is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits; and the output interface (104) is configured to output the bit conversion result from the bit conversion device (101). The method can reduce the number of bits used to express data, lower the hardware cost and energy consumption required for computation, and increase computation speed.

Description

Neural network processor including a bit conversion device, and method thereof
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to improvements to neural network processors.
Background Art
Deep learning technology in artificial intelligence has developed rapidly in recent years. It has been widely applied to solving high-level abstract cognitive problems such as image recognition, speech recognition, natural language understanding, weather prediction, gene expression, content recommendation, and intelligent robotics, and has proven to deliver excellent performance. This has made the development and improvement of artificial intelligence technology a research hotspot in academia and industry.
Deep neural networks are among the most advanced perceptual models in the field of artificial intelligence. Such networks simulate the neural connection structure of the human brain by building a model, and describe data features hierarchically through multiple transformation stages, bringing breakthrough progress to large-scale data processing tasks involving images, video, audio, and the like. A deep neural network model is a computational model containing a large number of nodes interconnected in a mesh structure; these nodes are called the neurons of the deep neural network. The strength of the connection between two nodes represents the weighted value of the signal passing between them, i.e., the weight, corresponding to memory in a biological neural network.
Dedicated processors for neural network computation, i.e., neural network processors, have also developed accordingly. In practical neural network computation, a large amount of data must repeatedly undergo convolution, activation, pooling, and other operations, which consumes an extremely large amount of computation time and seriously affects the user experience. How to reduce the computation time of a neural network has therefore become an improvement strategy for neural network processors.
Summary of the Invention
It is therefore an object of the present invention to overcome the above-described defects of the prior art and to provide a neural network processor that includes a bit conversion device, the bit conversion device comprising:
an input interface, a control unit, a data conversion unit, and an output interface;
wherein,
the control unit is configured to generate a control signal for the data conversion unit;
the input interface is configured to receive original data;
the data conversion unit is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
the output interface is configured to output the bit conversion result from the bit conversion device.
Preferably, in the neural network processor, the control unit is configured to determine a rule for performing bit conversion according to a preset parameter or an input parameter, so as to generate the control signal;
wherein the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
Preferably, in the neural network processor, the data conversion unit is configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to determine the bit conversion result according to the reserved bits of the original data and the highest bit of the truncated bits of the original data.
Preferably, in the neural network processor, the data conversion unit is configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to use the reserved bits of the original data as the bit conversion result.
Preferably, in the neural network processor, the data conversion unit is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using half the original number of bits.
A method for performing bit conversion on data of a neural network using any one of the neural network processors described above comprises:
1) the control unit generating a control signal for the data conversion unit;
2) the input interface receiving, from outside the bit conversion device, original data on which bit conversion needs to be performed;
3) the data conversion unit performing bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
4) the output interface outputting the bit conversion result from the bit conversion device.
Preferably, in the method, step 1) comprises:
1-1) the control unit determining a rule for performing bit conversion according to a preset parameter or an input parameter;
1-2) the control unit generating a control signal corresponding to the rule;
wherein the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
Preferably, in the method, step 3) comprises:
the data conversion unit determining the bit conversion result, according to the control signal, based on the reserved bits of the original data and the highest bit of the truncated bits of the original data.
Preferably, in the method, step 3) comprises:
the data conversion unit, according to the control signal, using the reserved bits of the original data as the bit conversion result.
Preferably, according to the method, when buffering of the neural network data has been completed and the convolution operation has not yet been completed, the buffered neural network data is input into the bit conversion device to perform steps 1)-4); or, when the convolution operation on the data has been completed and the activation operation has not yet been completed, the result of the convolution operation is input into the bit conversion device to perform steps 1)-4).
A computer-readable storage medium has a computer program stored therein, the computer program, when executed, implementing any one of the methods described above.
Compared with the prior art, the advantages of the present invention are as follows:
The present invention provides a bit conversion device for a neural network processor, which can be used to adjust the number of bits used to express data in the various computation processes of a neural network. By reducing the number of bits used to express data, the hardware cost required for computation can be lowered, computation speed can be increased, the neural network processor's need for data storage space can be reduced, and the energy consumption of performing neural network computations can be lowered.
Brief Description of the Drawings
Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a bit conversion device according to an embodiment of the present invention;
FIG. 2 is a diagram of the connections between the units of a bit conversion device according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method for performing bit conversion on neural network data using the bit conversion device shown in FIG. 1, according to an embodiment of the present invention;
FIG. 4a is a diagram of the hardware structure for performing bit conversion in the "rounding mode" in the data conversion unit of a bit conversion device according to an embodiment of the present invention;
FIG. 4b is a diagram of the hardware structure for performing bit conversion in the "direct truncation mode" in the data conversion unit of a bit conversion device according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
As described above, when designing a neural network processor it is desirable to reduce the computation time of the neural network. To this end, the inventors believe that the amount of computation, and hence the computation time of the neural network, can be reduced by appropriately reducing the number of bits of the data involved in neural network computation, for example by using fewer bits to represent data that would otherwise require more bits. This is because the inventors found, in studying the prior art, that neural network algorithms have relatively high fault tolerance with respect to intermediate results of the computation: even though representing data with fewer bits changes the precision of the data involved in the computation and thereby affects the accuracy of the intermediate results obtained, this does not have a large impact on the final output of the neural network.
In the present invention, this manner of reducing the bits of the data used in computation is referred to as a "clipping operation" on the data, and the process of adjusting the number of binary bits required to express a value is referred to as "bit conversion". For example, the decimal value 0.5 expressed as Q7 fixed-point data is 01000000 (here Q7 uses the leftmost of the 8 bits as the sign bit and the remaining 7 bits for the fractional part, and can thus represent fractions between -1 and 1 with 7 bits of precision). In bit conversion, the result originally expressed in Q7 can be modified to be expressed in Q3, giving the result 0100 (like Q7, Q3 also uses the leftmost bit as the sign bit, except that it uses 3 bits for the fractional part, and can represent fractions between -1 and 1 with 3 bits of precision).
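To make the Q-format example above concrete, the following Python sketch (not part of the patent; the function names and the use of Python's round() are our own assumptions) encodes a value as two's-complement fixed-point data and re-expresses it with fewer fractional bits:

```python
def to_q(value: float, frac_bits: int) -> int:
    """Encode a value in [-1, 1) as two's-complement fixed point:
    1 sign bit followed by frac_bits fractional bits (e.g. Q7, Q3)."""
    total_bits = frac_bits + 1
    raw = round(value * (1 << frac_bits))
    return raw & ((1 << total_bits) - 1)   # wrap negatives into the bit field

def from_q(bits: int, frac_bits: int) -> float:
    """Decode a two's-complement fixed-point bit pattern back to a float."""
    total_bits = frac_bits + 1
    if bits >= 1 << (total_bits - 1):      # sign bit set -> negative value
        bits -= 1 << total_bits
    return bits / (1 << frac_bits)

q7 = to_q(0.5, 7)
print(f"{q7:08b}")    # 01000000, the Q7 encoding of 0.5 from the text
q3 = q7 >> 4          # re-express in Q3 by keeping the 4 high bits
print(f"{q3:04b}")    # 0100
print(from_q(q3, 3))  # 0.5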
Based on the above analysis, the present invention proposes a bit conversion device for a neural network processor. The bit conversion device can determine a rule for performing bit conversion according to preset parameters or parameters based on user input, and perform bit conversion on the data accordingly. Through such conversion, the neural network processor can process a relatively smaller amount of data, thereby increasing processing speed and reducing the energy consumption of the neural network processor. The inventors believe that in combinational logic circuits the speed of a data operation is inversely proportional to the number of bits in the numerical representation, while the energy consumption of a data operation is proportional to that number of bits; hence, after bit conversion of the data, the effects of accelerated computation and reduced power consumption can be achieved.
FIG. 1 shows a bit conversion device 101 according to an embodiment of the present invention, comprising an input bus unit 102 serving as an input interface, a data conversion unit 103, an output bus unit 104 serving as an output interface, and a control unit 105.
The input bus unit 102 is configured to acquire the neural network data requiring bit conversion and provide it to the data conversion unit 103. In some embodiments, the input bus unit 102 can receive and/or transmit a plurality of data to be converted in parallel.
The data conversion unit 103 is configured to perform bit conversion on the neural network data from the input bus unit 102 according to a rule for performing bit conversion determined, for example, from preset parameters or parameters based on user input.
The output bus unit 104 is configured to output the bit conversion result obtained by the data conversion unit 103 from the bit conversion device 101, so as to provide it to a device used for subsequent processing in the neural network processing.
The control unit 105 is configured to determine the rule of bit conversion and select a corresponding bit conversion mode to control the data conversion unit 103 in performing the bit conversion operation. The control unit 105 may determine the rule for performing bit conversion by analyzing preset parameters or parameters input by the user, so as to select from various preset conversion modes. Here the parameters may include the number of bits of the data to be converted and the number of bits of the converted data, or the binary representation used by the data to be converted and the binary representation desired for the converted data, such as Q7, Q3, and so on. For example, based on parameters input by the user, it is determined that neural network data represented in Q7 is to be converted to a Q3 representation. When reducing the bits used for representation, a "rounding" approach may be adopted, for example converting 01011000 into 0110, or a "direct truncation" approach, for example converting 01011000 into 0101. Conversion modes such as "rounding" or "direct truncation" may either be input by the user or be fixed in advance.
In some embodiments, the input bus unit 102 and/or the output bus unit 104 can receive and/or transmit a plurality of data to be converted in parallel.
FIG. 2 is a diagram of the connections between the units in a bit conversion device according to an embodiment of the present invention, in which the input bus unit is 128 bits wide and the output bus is 64 bits wide. The control unit receives user-input parameters from outside the bit conversion device and uses them, according to the determined bit conversion rule, to generate a mode switching signal for the data conversion unit, so that the data conversion unit knows which mode should be used to perform bit conversion in the current situation. In addition, the control unit may also generate an input control signal for controlling the input bus unit to start or pause receiving data, and an output control signal for controlling the output bus unit to start or pause outputting the bit conversion results.
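As an illustration of the bus widths just described — sixteen 8-bit values on a 128-bit input bus and sixteen 4-bit results on a 64-bit output bus — here is a minimal sketch. The patent does not specify a packing order, so placing the first item in the least significant bits is an assumption:

```python
def unpack_bus(word: int, n_items: int, item_bits: int) -> list[int]:
    """Split one wide bus word into n_items fields of item_bits each
    (first item in the least significant bits -- an assumed ordering)."""
    mask = (1 << item_bits) - 1
    return [(word >> (i * item_bits)) & mask for i in range(n_items)]

def pack_bus(items: list[int], item_bits: int) -> int:
    """Pack the per-item results back into one wide bus word."""
    word = 0
    for i, item in enumerate(items):
        word |= (item & ((1 << item_bits) - 1)) << (i * item_bits)
    return word

bus_in = pack_bus([0b10101111] * 16, 8)  # one 128-bit input bus word
values = unpack_bus(bus_in, 16, 8)       # 16 parallel 8-bit data items
results = [v >> 4 for v in values]       # 8-bit -> 4-bit conversion per item
bus_out = pack_bus(results, 4)           # one 64-bit output bus word
print(f"{bus_out:064b}")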
A method for performing bit conversion on neural network data using the bit conversion device shown in FIG. 1 is described below by way of an embodiment. Referring to FIG. 3, the method includes:
Step 1. The control unit 105 in the bit conversion device 101 determines the bit conversion rule to be used, based on preset conversion requirement parameters or parameters input by the user. The preset conversion requirement parameters or user-input parameters include information related to the number of bits of the neural network data to be converted and the number of bits of the converted data. They may further include the truncation rule to be used in bit conversion, such as a "rounding" or "direct truncation" rule.
Based on the above rule, the control unit 105 may select from preset bit conversion modes. According to an embodiment of the present invention, the bit conversion modes include a "rounding mode" and a "direct truncation mode"; the processing in these two different modes is described in the subsequent steps.
Step 2. The input bus unit 102 in the bit conversion device 101 provides the neural network data it has obtained that requires bit conversion to the data conversion unit 103.
The input bus unit 102 here may include a plurality of interfaces capable of receiving data in parallel, so as to receive in parallel, from outside the bit conversion device 101, the neural network data on which bit conversion needs to be performed. Similarly, the input bus unit 102 may also include a plurality of interfaces capable of outputting data in parallel, so as to provide the data to the data conversion unit 103 in parallel for bit conversion processing.
Step 3. The data conversion unit 103 performs bit conversion on the neural network data requiring bit conversion, in accordance with the bit conversion rule determined by the control unit 105.
In this step, the data conversion unit 103 may receive a control signal from the control unit 105 and perform bit conversion in accordance with the rule.
The inventors have found that, when reducing the number of bits of the data used in computation, if the reduced number of bits is greater than or equal to half the number of bits of the original data, the neural network processor can achieve a compromise among hardware cost, processing speed, and accuracy. Therefore, in the present invention it is preferable to reduce the number of bits of the neural network data requiring bit conversion to half the original number, for example using a fixed hardware structure to perform bit conversion that converts 32-bit data into 16 bits, 16-bit data into 8 bits, 8-bit data into 4 bits, 4-bit data into 2 bits, 2-bit data into 1 bit, and so on.
In performing bit conversion, each bit of the neural network data requiring bit conversion may be divided, according to the rule, into reserved bits and truncated bits, where the reserved bits are the higher one or more bits of the neural network data and the truncated bits are the remaining bits. For example, for the 8-bit data 10101111, if its number of bits is reduced to half the original, its reserved bits are 1010 and its truncated bits are 1111.
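A minimal sketch of this reserved/truncated split for the bit-halving case (the helper name split_bits is illustrative, not from the patent):

```python
def split_bits(data: int, n_bits: int) -> tuple[int, int]:
    """Split an n_bits value into its high half (reserved bits)
    and its low half (truncated bits), as in the halving case above."""
    keep = n_bits // 2
    reserved = data >> (n_bits - keep)
    truncated = data & ((1 << (n_bits - keep)) - 1)
    return reserved, truncated

reserved, truncated = split_bits(0b10101111, 8)
print(f"{reserved:04b} {truncated:04b}")  # 1010 1111, matching the example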
FIG. 4a shows the hardware structure used in the data conversion unit 103 to perform bit conversion in the "rounding mode" according to an embodiment of the present invention, in which 16 items of 8-bit neural network data requiring bit conversion are input into the data conversion unit 103 in parallel. In each item of 8-bit neural network data, the bits of the 4 reserved bits other than the sign bit (e.g., a1, a2, a3) and the highest bit of the corresponding truncated bits (e.g., a4) are used respectively as the two inputs of an adder, and the output of the adder together with the sign bit of the neural network data is used as the result of performing bit conversion on that 8-bit neural network data.
As an example with reference to FIG. 4a, in the "rounding mode", suppose the neural network data input to the data conversion unit 103 is 10101111 (two's complement), representing -0.6328125 in decimal; its truncated bits are 1111. The highest truncated bit, 1, is added to the three bits 010 of the reserved bits other than the sign bit, and the bit conversion result obtained from the sign bit of the neural network data together with the adder output is 1011 (two's complement), representing -0.625 in decimal.
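The "rounding mode" of FIG. 4a can be sketched as follows; the handling of adder overflow is not specified in the patent, so the wrap-around below is an assumption:

```python
def round_convert(data: int, n_bits: int) -> int:
    """Rounding mode (FIG. 4a) for the halving case: add the highest
    truncated bit to the non-sign reserved bits and keep the sign bit."""
    keep = n_bits // 2
    sign = data >> (n_bits - 1)                      # sign bit
    reserved = data >> (n_bits - keep)               # high half of the bits
    body = reserved & ((1 << (keep - 1)) - 1)        # reserved bits minus sign
    carry = (data >> (n_bits - keep - 1)) & 1        # highest truncated bit
    body = (body + carry) & ((1 << (keep - 1)) - 1)  # adder; wrap assumed
    return (sign << (keep - 1)) | body

print(f"{round_convert(0b10101111, 8):04b}")  # 1011, i.e. -0.625 in Q3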
FIG. 4b shows the hardware structure used in the data conversion unit 103 to perform bit conversion in the "direct truncation mode" according to an embodiment of the present invention, in which 16 items of 8-bit neural network data requiring bit conversion are input into the data conversion unit 103 in parallel, and the 4 reserved bits of each item of 8-bit neural network data (e.g., a0, a1, a2, a3) are used directly as the result of performing bit conversion on that 8-bit neural network data.
As an example with reference to FIG. 4b, in the "direct truncation mode", suppose the neural network data input to the data conversion unit 103 is 10101111 (two's complement); the result after bit conversion is then 1010.
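Correspondingly, a sketch of the "direct truncation mode" of FIG. 4b, which simply keeps the reserved (high) half of the bits:

```python
def truncate_convert(data: int, n_bits: int) -> int:
    """Direct truncation mode (FIG. 4b): keep only the reserved
    (high) half of the bits and discard the truncated bits."""
    keep = n_bits // 2
    return data >> (n_bits - keep)

print(f"{truncate_convert(0b10101111, 8):04b}")  # 1010, as in the example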
Step 4. The output bus unit 104 outputs the bit conversion result obtained by the data conversion unit 103 from the bit conversion device 101, so as to provide it to a device used for subsequent processing in the neural network processing.
The bit conversion device provided by the above embodiments of the present invention can be used as part of a neural network processor in various computation processes for neural networks.
For example, when buffering of the neural network data has been completed and the convolution operation has not yet been completed, the bit conversion device may be used to perform bit conversion on the buffered neural network data. The reason for this is that different network layers of a neural network may have different requirements on the number of bits used for data; to match the required computation speed and the expected energy consumption, the bit conversion device may perform bit conversion on the buffered neural network data and provide the result obtained by the bit conversion to a unit for performing the convolution operation.
As another example, when the convolution operation on the data has been completed and the activation operation has not yet been completed, the bit conversion device may be used to perform bit conversion on the result of the convolution operation. The reason for this is that the accumulation in the convolution operation unit tends to increase the number of bits of the convolution result obtained; to meet the bit-width requirements of subsequent operations (for example, for some hardware-implemented activation units, the number of bits used is fixed), the result of the convolution operation needs to be bit-converted.
Based on the above embodiments, the present invention provides a bit conversion device for a neural network processor, which can be used to adjust the number of bits used to express data in the various computation processes of a neural network. By reducing the number of bits used to express data, the hardware cost required for computation can be lowered, computation speed can be increased, the neural network processor's need for data storage space can be reduced, and the energy consumption of performing neural network computations can be lowered.
It should be noted that not all of the steps described in the above embodiments are necessary; those skilled in the art may make appropriate omissions, substitutions, modifications, and the like according to actual needs.
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present invention and are not limiting. Although the present invention has been described in detail above with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the scope of the claims of the present invention.

Claims (11)

  1. A neural network processor, the neural network processor including a bit conversion device, the bit conversion device comprising:
    an input interface, a control unit, a data conversion unit, and an output interface;
    wherein,
    the control unit is configured to generate a control signal for the data conversion unit;
    the input interface is configured to receive original data;
    the data conversion unit is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
    the output interface is configured to output the bit conversion result from the bit conversion device.
  2. The neural network processor according to claim 1, wherein the control unit is configured to determine a rule for performing bit conversion according to a preset parameter or an input parameter, so as to generate the control signal;
    wherein the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
  3. The neural network processor according to claim 2, wherein the data conversion unit is configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to determine the bit conversion result according to the reserved bits of the original data and the highest bit of the truncated bits of the original data.
  4. The neural network processor according to claim 2, wherein the data conversion unit is configured to determine, according to the control signal, reserved bits and truncated bits in the original data, and to use the reserved bits of the original data as the bit conversion result.
  5. The neural network processor according to claim 1, wherein the data conversion unit is configured to perform bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using half the original number of bits.
  6. A method for performing bit conversion on data of a neural network using the neural network processor according to any one of claims 1-5, comprising:
    1) the control unit generating a control signal for the data conversion unit;
    2) the input interface receiving, from outside the bit conversion device, original data on which bit conversion needs to be performed;
    3) the data conversion unit performing bit conversion on the original data according to the control signal, so as to convert the original data into a bit conversion result expressed using a smaller number of bits;
    4) the output interface outputting the bit conversion result from the bit conversion device.
  7. The method according to claim 6, wherein step 1) comprises:
    1-1) the control unit determining a rule for performing bit conversion according to a preset parameter or an input parameter;
    1-2) the control unit generating a control signal corresponding to the rule;
    wherein the parameter includes information related to the number of bits of the original data and the number of bits of the bit conversion result.
  8. The method according to claim 7, wherein step 3) comprises:
    the data conversion unit determining the bit conversion result, according to the control signal, based on the reserved bits of the original data and the highest bit of the truncated bits of the original data.
  9. The method according to claim 7, wherein step 3) comprises:
    the data conversion unit, according to the control signal, using the reserved bits of the original data as the bit conversion result.
  10. The method according to any one of claims 6-9, wherein, when buffering of the neural network data has been completed and the convolution operation has not yet been completed, the buffered neural network data is input into the bit conversion device to perform steps 1)-4); or, when the convolution operation on the data has been completed and the activation operation has not yet been completed, the result of the convolution operation is input into the bit conversion device to perform steps 1)-4).
  11. A computer-readable storage medium having a computer program stored therein, the computer program, when executed, implementing the method according to any one of claims 6-10.
PCT/CN2018/082179 2018-03-01 2018-04-08 Neural network processor including a bit conversion device, and method thereof WO2019165679A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810170612.3A CN108345938A (zh) 2018-03-01 2018-03-01 Neural network processor including a bit conversion device, and method thereof
CN201810170612.3 2018-03-01

Publications (1)

Publication Number Publication Date
WO2019165679A1 (zh)

Family

ID=62959552

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082179 WO2019165679A1 (zh) 2018-04-08 Neural network processor including a bit conversion device, and method thereof

Country Status (2)

Country Link
CN (1) CN108345938A (zh)
WO (1) WO2019165679A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392954B (zh) * 2020-03-13 2023-01-24 华为技术有限公司 Data processing method and apparatus for a terminal network model, terminal, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106447034A (zh) * 2016-10-27 2017-02-22 中国科学院计算技术研究所 Neural network processor based on data compression, design method, and chip
CN107145939A (zh) * 2017-06-21 2017-09-08 北京图森未来科技有限公司 Neural network optimization method and apparatus
CN107330515A (zh) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 Apparatus and method for performing the forward operation of an artificial neural network
CN107423816A (zh) * 2017-03-24 2017-12-01 中国科学院计算技术研究所 Multi-precision neural network processing method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106796668B (zh) * 2016-03-16 2019-06-14 香港应用科技研究院有限公司 Method and system for bit-depth reduction in artificial neural networks
CN107340993B (zh) * 2016-04-28 2021-07-16 中科寒武纪科技股份有限公司 Computing apparatus and method
CN107203808B (zh) * 2017-05-08 2018-06-01 中国科学院计算技术研究所 Binary convolution device and corresponding binary convolutional neural network processor
CN107292458B (zh) * 2017-08-07 2021-09-10 北京中星微人工智能芯片技术有限公司 Prediction method and prediction apparatus applied to a neural network chip


Also Published As

Publication number Publication date
CN108345938A (zh) 2018-07-31

Similar Documents

Publication Publication Date Title
JP7273108B2 (ja) Model training method, apparatus, electronic device, storage medium, and program
CN107451658B (zh) Method and system for fixed-point conversion of floating-point operations
US20230048031A1 (en) Data processing method and apparatus
CN109478144A (zh) Data processing apparatus and method
US20220335304A1 (en) System and Method for Automated Design Space Determination for Deep Neural Networks
KR102655950B1 (ko) High-speed neural network processing method and apparatus using the same
CN108345934B (zh) Activation apparatus and method for a neural network processor
JP2022118263A (ja) Training method for a natural language processing model, natural language processing method, apparatus, electronic device, storage medium, and program
CN111047045B (zh) Allocation system and method for machine learning operations
JP2022173453A (ja) Deep learning model training method, natural language processing method and apparatus, electronic device, storage medium, and computer program
CN110781686A (zh) Sentence similarity calculation method, apparatus, and computer device
CN115017178A (zh) Training method and apparatus for a data-to-text generation model
CN112884146A (зh) Method and system for training a model based on data quantization and hardware acceleration
WO2020093885A1 (zh) Heterogeneous collaborative computing system
EP3444758B1 (en) Discrete data representation-supporting apparatus and method for back-training of artificial neural network
WO2019165679A1 (zh) Neural network processor including a bit conversion device, and method thereof
CN116762080A (zh) Neural network generation apparatus, neural network computation apparatus, edge device, neural network control method, and software generation program
CN117151178A (zh) FPGA-oriented quantization acceleration method for customized CNN networks
CN112183744A (zh) Neural network pruning method and apparatus
CN116976461A (zh) Federated learning method, apparatus, device, and medium
JP2022078286A (ja) Data processing model training method, apparatus, electronic device, and storage medium
CN114462592A (zh) Model training method, apparatus, electronic device, and computer-readable storage medium
CN113935456A (zh) Method and device for intra-layer data processing in a spiking neural network, and processing chip
CN114692865A (зh) Neural network quantization training method, apparatus, and related products
CN108416435B (zh) Neural network processor with a low-bandwidth activation device and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18908091

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18908091

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 280921)
