WO2021068469A1 - Quantization and fixed-point fusion method and apparatus for neural network - Google Patents

Quantization and fixed-point fusion method and apparatus for neural network

Info

Publication number
WO2021068469A1
Authority
WO
WIPO (PCT)
Prior art keywords
current layer
scale
calculation
fixed point
Application number
PCT/CN2020/083797
Other languages
French (fr)
Chinese (zh)
Inventor
Qi Nan (齐南)
Original Assignee
Baidu Online Network Technology (Beijing) Co., Ltd.
Application filed by Baidu Online Network Technology (Beijing) Co., Ltd.
Publication of WO2021068469A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • the preset processing parameters include at least one of: the quantization amplitude of the input data of the current layer, the quantization amplitude of the weights of the current layer, the quantization amplitude of the output data of the current layer, the batch-normalization scale of the current layer, and the batch-normalization bias value of the current layer.
  • at least one processor
  • Figure 4 is a schematic structural diagram of a neural network quantization and fixed point fusion device according to an embodiment of the present application
  • Solution (3) usually quantizes only the parameter part, and cannot achieve the desired effect in terms of network acceleration and operating efficiency.
  • the input data of the current layer of the neural network in FIG. 1 may include an input feature map (fm).
  • the "int_n" on the connecting line between "fm" and "compute" indicates that the input feature map of the current layer is quantized, and the subscript n indicates the number of quantized bits.
  • the input data and weights of the current layer of the neural network can be quantized.
  • the environmental parameters of the parking space, the heading angle of the vehicle body, and the weights are quantized.
  • in the current layer, the quantized weights are used to perform a calculation operation on the quantized input data to obtain the calculation result.
  • the result of the calculation operation may be a steering-wheel angle capable of completing automatic parking control.
  • fixed-point processing is performed on the preset processing parameters.
  • the preset processing parameters include the first bias value of the current layer.
  • the post-processing operation includes adding the fixed-point first bias value to the result of the calculation operation.
  • Fig. 3 is a flow chart of convolution calculation of a neural network quantization and fixed-point fusion method according to an embodiment of the present application.
  • a convolutional layer is taken as an example to show the calculation process of the quantization and fixed-point fusion method.
  • the calculation process includes an accelerated calculation process and an initialization calculation process.
  • the rectangular box labeled 1 in FIG. 3 represents the accelerated calculation process, and the rectangular box labeled 2 represents the initialization calculation process.
  • the calculation operation includes a convolution operation
  • the fixed-point unit 300 is configured to:
  • the memory 502 may include a storage program area and a storage data area.
  • the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created from use of the electronic device that executes the neural network quantization and fixed-point fusion method, etc.
  • the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 502 may optionally include memories arranged remotely relative to the processor 501, and these remote memories may be connected via a network to an electronic device that executes the neural network quantization and fixed-point fusion method. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A neural network quantization and fixed-point fusion method and apparatus, an electronic device, and a storage medium, relating to the field of artificial intelligence and in particular to autonomous driving. The method comprises: quantizing the input data and weights of the current layer of the neural network (S110); in the current layer, performing a calculation operation on the quantized input data using the quantized weights to obtain a calculation result (S120); performing fixed-point processing on preset processing parameters (S130); and performing a post-processing operation on the calculation result using the fixed-point preset processing parameters to obtain the output of the current layer (S140). By fusing quantization with fixed-point processing, the method significantly reduces the bandwidth required for data transmission between operators, effectively reduces the computation load of the acceleration unit, makes full use of the acceleration unit's strength in fixed-point calculation, lowers the resource requirements of the computation, and improves computational efficiency while saving resources.

Description

Quantization and fixed-point fusion method and apparatus for a neural network
This application claims priority to Chinese patent application No. 201910966512.6, filed with the Chinese Patent Office on October 11, 2019 and entitled "Quantization and fixed-point fusion method and apparatus for a neural network", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of information technology, and in particular to the field of artificial intelligence, especially autonomous driving (including autonomous parking).
Background
Traditional neural network computation is based on high-bit floating-point operations, which wastes a large amount of computing resources and is prone to overfitting, reducing the generalization ability of the model. In traditional neural network acceleration methods, even when low-bit floating-point or integer arithmetic is used, precision is wasted in the intermediate floating-point operations, so the final result must be truncated before use in subsequent steps. This both wastes precision and reduces computing power.
The same problem exists in the field of artificial intelligence, especially autonomous driving. For example, in application scenarios in autonomous parking, traditional neural network computation is based on high-bit floating-point operations, wasting a large amount of computing resources. Likewise, in traditional neural network acceleration methods, even when low-bit floating-point or integer arithmetic is used, precision is wasted in the intermediate floating-point operations, which both wastes precision and reduces computing power.
Summary of the invention
The embodiments of the present application provide a neural network quantization and fixed-point fusion method, apparatus, electronic device, and storage medium, to solve at least the above technical problems in the prior art.
In a first aspect, an embodiment of the present application provides a neural network quantization and fixed-point fusion method, including:
quantizing the input data and weights of the current layer of the neural network;
in the current layer, performing a calculation operation on the quantized input data using the quantized weights to obtain a calculation result;
performing fixed-point processing on preset processing parameters; and
performing a post-processing operation on the calculation result using the fixed-point preset processing parameters to obtain the output of the current layer.
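As a rough illustration, the four steps above can be sketched in scalar form. The uniform quantization scheme, the power-of-two scale, and the function names below are assumptions for illustration, not the patent's exact design:

```python
# Illustrative scalar sketch of steps S110-S140. The uniform quantization
# scheme, the power-of-two scale, and the function names are assumptions.

def quantize(x, scale, n_bits=8):
    """S110: map a float onto an n-bit signed integer grid using a quantization amplitude (scale)."""
    q_max = 2 ** (n_bits - 1) - 1
    return max(-q_max - 1, min(q_max, round(x / scale)))

def compute(q_input, q_weight):
    """S120: integer calculation operation on quantized data (a single multiply here)."""
    return q_input * q_weight

def to_fixed_point(value, q_frac=16):
    """S130: fixed-point processing -- an integer with q_frac fractional bits."""
    return round(value * (1 << q_frac))

def post_process(result, fixed_bias, q_frac=16):
    """S140: align the integer result to the fixed-point format, then add the bias."""
    return (result << q_frac) + fixed_bias

q_in = quantize(0.5, scale=1 / 128)   # 64
q_w = quantize(0.25, scale=1 / 128)   # 32
acc = compute(q_in, q_w)              # 2048
out = post_process(acc, to_fixed_point(0.25))
```

All arithmetic after S110 stays in integers, which is the point of the fusion: only the initial quantization touches floats.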
In the embodiments of this application, fusing quantization with fixed-point processing significantly reduces the bandwidth required for data transmission between operators, effectively reduces the computation load of the acceleration unit, and makes full use of the acceleration unit's strength in fixed-point calculation. At the same time, quantization and fixed-point processing lower the resource requirements of the computation, saving resources while improving computational efficiency.
In one embodiment, the calculation operation includes multiplying the quantized input data by the quantized weights;
the preset processing parameters include a first bias value of the current layer; and
the post-processing operation includes adding the fixed-point first bias value to the calculation result.
In the embodiments of the present application, the quantization of the input data and weights used in the calculation operation is fused with the fixed-point processing of the preset processing parameters used in the post-processing operation, achieving a good neural network acceleration effect.
In one embodiment, the calculation operation includes a convolution operation; and
the preset processing parameters include at least one of: the quantization amplitude of the input data of the current layer, the quantization amplitude of the weights of the current layer, the quantization amplitude of the output data of the current layer, the batch-normalization scale of the current layer, and the batch-normalization bias value of the current layer.
In the embodiments of the present application, quantization and fixed-point processing are fused in the convolution calculation flow, and the input data, weights, and output data of the current layer are used for accelerated processing, improving computational efficiency.
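To illustrate the kind of integer arithmetic in the convolution flow, here is a hypothetical sketch of a quantized 1-D convolution; the valid-mode, cross-correlation form (as in most deep-learning frameworks), the helper name, and the sample values are assumptions:

```python
# Hypothetical sketch of the quantized convolution: int_n inputs and weights
# multiply-accumulate into a wider int_m result. The 1-D valid-mode form
# (implemented as cross-correlation, as in most deep-learning frameworks) and
# the sample values are assumptions.

def quantized_conv1d(q_input, q_weight):
    """Valid-mode 1-D convolution over already-quantized integer data."""
    k = len(q_weight)
    return [
        sum(q_input[i + j] * q_weight[j] for j in range(k))
        for i in range(len(q_input) - k + 1)
    ]

# int8-range operands; the accumulated results need a wider (int_m) range.
acc = quantized_conv1d([10, -5, 7, 3], [2, -1])  # [25, -17, 11]
```

The accumulator widening is why the text distinguishes the n-bit operand type from the m-bit result type with m > n.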
In one embodiment, performing fixed-point processing on the preset processing parameters includes:
performing a fusion calculation on at least two preset processing parameters to obtain a scale value of the current layer and a second bias value of the current layer.
In the embodiments of the present application, the preset processing parameters are fused by calculation, and the fusion result serves as the data basis for the subsequent fixed-point processing; fixed-point processing increases the effective computing power of the acceleration unit.
In one embodiment, the post-processing operation includes: multiplying the calculation result by the scale value of the current layer, and then adding the second bias value of the current layer.
In the embodiments of the present application, the fixed-point fusion result is used for the post-processing operation, merging quantization with fixed-point processing, which greatly improves both the acceleration effect and operating efficiency.
In one embodiment, the fusion calculation on at least two preset processing parameters uses the following formulas:
new_scale = bn_scale * input_scale * weight_scale / output_scale;
new_bias = bn_bias / output_scale,
where new_scale is the scale value of the current layer, bn_scale is the batch-normalization scale of the current layer, input_scale is the quantization amplitude of the input data of the current layer, weight_scale is the quantization amplitude of the weights of the current layer, output_scale is the quantization amplitude of the output data of the current layer, new_bias is the second bias value of the current layer, and bn_bias is the batch-normalization bias value of the current layer.
In the embodiments of this application, the above preset processing parameters are fused by calculation, and the fusion result serves as the data basis for the subsequent fixed-point processing. Fixed-point processing effectively reduces the computation load of the acceleration unit and makes full use of its strength in fixed-point calculation.
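The two fusion formulas above can be sketched directly in code; the variable names follow the text, while the numeric parameter values are made up for illustration:

```python
# The fusion formulas from the text, with made-up parameter values.
# Variable names follow the patent (bn_scale, input_scale, weight_scale,
# output_scale, bn_bias).

def fuse_parameters(bn_scale, input_scale, weight_scale, output_scale, bn_bias):
    """Fuse the preset processing parameters into the layer scale and second bias."""
    new_scale = bn_scale * input_scale * weight_scale / output_scale
    new_bias = bn_bias / output_scale
    return new_scale, new_bias

def post_process(conv_result, new_scale, new_bias):
    """Post-processing: multiply the calculation result by the layer scale, then add the second bias."""
    return conv_result * new_scale + new_bias

new_scale, new_bias = fuse_parameters(
    bn_scale=1.5, input_scale=0.02, weight_scale=0.05, output_scale=0.1, bn_bias=0.4)
y = post_process(1000, new_scale, new_bias)  # approx. 1000 * 0.015 + 4.0
```

In deployment, new_scale and new_bias would themselves be converted to fixed-point values offline, so the per-layer post-processing stays in integer arithmetic.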
In a second aspect, an embodiment of the present application provides a neural network quantization and fixed-point fusion apparatus, including:
a quantization unit, configured to quantize the input data and weights of the current layer of the neural network;
a first operation unit, configured to, in the current layer, perform a calculation operation on the quantized input data using the quantized weights to obtain a calculation result;
a fixed-point unit, configured to perform fixed-point processing on preset processing parameters; and
a second operation unit, configured to perform a post-processing operation on the calculation result using the fixed-point preset processing parameters to obtain the output of the current layer.
In one embodiment, the calculation operation includes multiplying the quantized input data by the quantized weights;
the preset processing parameters include a first bias value of the current layer; and
the post-processing operation includes adding the fixed-point first bias value to the calculation result.
In one embodiment, the calculation operation includes a convolution operation; and
the preset processing parameters include at least one of: the quantization amplitude of the input data of the current layer, the quantization amplitude of the weights of the current layer, the quantization amplitude of the output data of the current layer, the quantization amplitude of the batch-normalization scale of the current layer, and the quantization amplitude of the batch-normalization bias value of the current layer.
In one embodiment, the fixed-point unit is configured to:
perform a fusion calculation on at least two preset processing parameters to obtain a scale value of the current layer and a second bias value of the current layer.
In one embodiment, the post-processing operation includes: multiplying the calculation result by the scale value of the current layer, and then adding the second bias value of the current layer.
In one embodiment, the fixed-point unit performs the fusion calculation using the following formulas:
new_scale = bn_scale * input_scale * weight_scale / output_scale;
new_bias = bn_bias / output_scale,
where new_scale is the scale value of the current layer, bn_scale is the batch-normalization scale of the current layer, input_scale is the quantization amplitude of the input data of the current layer, weight_scale is the quantization amplitude of the weights of the current layer, output_scale is the quantization amplitude of the output data of the current layer, new_bias is the second bias value of the current layer, and bn_bias is the batch-normalization bias value of the current layer.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method provided in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method provided in any embodiment of the present application.
An embodiment of the above application has the following advantages or beneficial effects: through the fusion of quantization and fixed-point processing, the bandwidth required for data transmission between operators is significantly reduced, the computation load of the acceleration unit is effectively reduced, and the acceleration unit's strength in fixed-point calculation is fully exploited. At the same time, quantization and fixed-point processing lower the resource requirements of the computation, saving resources while improving computational efficiency.
Other effects of the above optional implementations will be described below in conjunction with specific embodiments.
Description of the drawings
The drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:
Fig. 1 is a flowchart of a neural network quantization and fixed-point fusion method according to an embodiment of the present application;
Fig. 2 is a diagram of the quantization and fixed-point fusion relationship in the method according to an embodiment of the present application;
Fig. 3 is a convolution calculation flowchart of the method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a neural network quantization and fixed-point fusion apparatus according to an embodiment of the present application;
Fig. 5 is a block diagram of an electronic device for implementing the neural network quantization and fixed-point fusion method according to an embodiment of the present application.
Detailed description of embodiments
Exemplary embodiments of the present application are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
Current solutions for neural network inference on embedded platforms include: (1) selecting a small network; (2) model pruning and compression; (3) parameter quantization.
These solutions have the following drawbacks.
Solution (1) can only handle simple tasks and suits simple scenarios; complex tasks require models with more complex structures, so this method cannot meet their needs.
Solution (2) reduces the internal branches of a larger network. It appears able to handle both simple and complex tasks, but it is not suited to acceleration units that run with a high degree of parallelism: because the network structure is altered, parallel implementation becomes problematic, and the acceleration capability of the acceleration unit cannot be exploited.
Solution (3) usually quantizes only the parameter part, and cannot achieve the desired effect in terms of network acceleration and operating efficiency.
Fig. 1 is a flowchart of the neural network quantization and fixed-point fusion method according to an embodiment of the present application. Referring to Fig. 1, the method includes:
Step S110: quantize the input data and weights of the current layer of the neural network;
Step S120: in the current layer, perform a calculation operation on the quantized input data using the quantized weights to obtain a calculation result;
Step S130: perform fixed-point processing on preset processing parameters;
Step S140: perform a post-processing operation on the calculation result using the fixed-point preset processing parameters to obtain the output of the current layer.
In concrete implementations of neural networks, acceleration techniques can be used to improve operating efficiency. Taking convolutional neural networks as an example, acceleration methods may include quantization, which uses low numerical precision in computation to increase speed. Specifically, quantization compresses the original network by reducing the number of bits needed to represent each weight. In one example, N bits can represent 2 to the power N values, i.e., the weights of the network are modified so that they can take only 2 to the power N distinct values. Low-bit quantization quantizes the data of the computation and of the processing steps before and after it while preserving accuracy, reducing the representation range of the data. Taking embedded platforms as an example: in current neural network inference, on the one hand, the inputs, outputs, and corresponding weights of each layer tend to be normally distributed or sparse; on the other hand, high-precision data representation brings effects similar to overfitting and wastes result precision. Neural network inference on embedded platforms therefore usually adopts low-bit quantization to improve the robustness of the model and the computing power of the platform.
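A minimal sketch of the low-bit quantization idea, assuming a symmetric per-tensor scheme (the text states only that inputs and weights are quantized to n bits, not the exact scheme):

```python
# Minimal sketch of low-bit quantization: with n bits a tensor is restricted to
# 2**n integer levels. The symmetric per-tensor scheme is an assumption; the
# patent only states that inputs and weights are quantized.

def quantization_amplitude(values, n_bits):
    """Amplitude (scale) that maps the largest magnitude onto the largest n-bit integer."""
    q_max = 2 ** (n_bits - 1) - 1
    return max(abs(v) for v in values) / q_max

def quantize_tensor(values, n_bits):
    scale = quantization_amplitude(values, n_bits)
    q_max = 2 ** (n_bits - 1) - 1
    return [max(-q_max, min(q_max, round(v / scale))) for v in values], scale

weights = [0.8, -0.3, 0.05, -0.8]
q_weights, scale = quantize_tensor(weights, n_bits=8)
# q * scale approximately reconstructs the original float values
```

The per-tensor amplitude here plays the role of the "quantization amplitude" parameters (input_scale, weight_scale, output_scale) that the method later fuses.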
Fixed-point processing converts data from floating-point representation to fixed-point representation. Acceleration units on embedded platforms, such as FPGAs (Field Programmable Gate Arrays), DSPs (digital signal processors), and GPUs (Graphics Processing Units), all support fixed-point calculation well and execute it more efficiently. Fixed-point processing not only fully preserves result accuracy but also maximizes the effective computing power of the acceleration unit.
Most numerical data processed by computers contain fractions, and there are generally two ways to represent the radix point. One fixes the radix point of all numerical data at an agreed position; this is fixed-point notation, and such numbers are fixed-point numbers. The other lets the position of the radix point float; this is floating-point notation, and such numbers are floating-point numbers. In other words, a fixed-point number has a fixed radix point: no bit in the computer is dedicated to the radix point, whose position is agreed by convention. A floating-point number is one whose radix point position can move. To widen the representable range and prevent overflow, some application scenarios represent data as floating-point numbers; floating-point notation is similar to scientific notation.
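The fixed p.q idea (p integer bits, q fractional bits, introduced with Fig. 2 below) can be sketched as follows; the choice of q = 8 fractional bits is illustrative only:

```python
# Sketch of fixed-point p.q representation: an integer whose low q bits hold the
# fractional part. The choice q = 8 is illustrative only.

Q_FRAC = 8  # number of fractional bits (the "q" in fixed p.q)

def float_to_fixed(x, q_frac=Q_FRAC):
    """Convert a float into a fixed-point integer with q_frac fractional bits."""
    return round(x * (1 << q_frac))

def fixed_to_float(f, q_frac=Q_FRAC):
    return f / (1 << q_frac)

def fixed_mul(a, b, q_frac=Q_FRAC):
    """Multiplying two p.q numbers yields 2q fractional bits; shift back down to q."""
    return (a * b) >> q_frac

a = float_to_fixed(1.5)   # 384
b = float_to_fixed(0.25)  # 64
prod = fixed_mul(a, b)    # 96, i.e. 0.375
```

Because every operation is plain integer arithmetic plus shifts, this maps directly onto the FPGA/DSP/GPU fixed-point units mentioned above.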
Taking embedded platforms as an example, accelerating the neural network is essential to improving computational efficiency. Combining the characteristics of the algorithm itself with those of embedded acceleration units, the embodiments of the present invention provide an efficient computation-optimization method that fuses quantization with fixed-point processing. Fig. 2 is a diagram of the quantization and fixed-point fusion relationship in the method according to an embodiment of the present application. As shown in Fig. 2, the method mainly comprises two calculation steps, low-bit quantized calculation and fixed-point calculation; the two steps depend on and influence each other, and combining them effectively achieves a more efficient acceleration effect.
As shown in Fig. 2, the low-bit quantized calculation step involves "fm", "conv weight", and "compute". Here "fm" stands for feature map and denotes the input feature map of the current layer; "conv weight" denotes the weights of the convolutional layer; and "compute" denotes the calculation operation of the current layer, such as multiplication by the weights or the convolution operation of a convolutional layer.
Referring to Figs. 1 and 2, in one example "fm" itself may be an integer. The input data of the current layer of the neural network in Fig. 1 may include an input feature map (fm). In Fig. 2, the "int_n" on the connecting line between "fm" and "compute" indicates that the input feature map of the current layer is quantized, with the subscript n denoting the number of quantized bits.
Referring to Fig. 1 and Fig. 2, when the current layer is a convolutional layer, the weights of the current layer of the neural network in Fig. 1 are the weights of the convolutional layer (conv weight). In Fig. 2, the "int_n" on the line connecting "conv weight" and "compute" indicates that the weights of the current layer are quantized, with the subscript n denoting the bit width after quantization. The output of "compute" in Fig. 2 is the calculation-operation result of Fig. 1. The "int_m" on the line connecting "compute" and "post compute" indicates that the output of "compute" is integer data, with the subscript m denoting its bit width. In one example, n may be 8, 4 or 2, and m > n.
As shown in Fig. 2, the fixed-point calculation step involves "post compute" and "bias". "Post compute" denotes the post-processing operations in the current layer, such as adding a bias or normalization. Referring to Fig. 1 and Fig. 2, the preset processing parameters in Fig. 1 may include bias, i.e. the bias value among the network parameters of the current layer. The "bias" in Fig. 2 indicates that fixed-point conversion of the bias produces a fixed p.q value, where the subscripts p and q denote the numbers of bits occupied by the integer part and the fractional part of the fixed-point representation, respectively.
Referring to Fig. 2, the fixed p.q value is fed into "post compute" for the post-processing operation, producing the calculation result of the current layer. This result may be more precise than required; that is, its precision may exceed the precision required of the output. The result therefore needs further quantization.
As shown in Fig. 2, "quant" denotes quantizing the result of the post-processing operation, for example by truncating fractional bits or carrying. A common carrying scheme is rounding to the nearest value.
As shown in Fig. 2, the result of the "quant" step serves as the input of the next layer of the neural network, i.e. "fm(next)" in Fig. 2.
As noted above, the embodiments of the present invention do not restrict the quantization bit width, which may be 8, 4 or even 2 bits. Once the bit width of the low-bit quantization is determined, the fixed-point parameters can be determined adaptively; for example, the integer width p and fractional width q of the fixed-point representation can be chosen according to the data-precision requirements of the specific application. In one implementation, the acceleration of neural network inference may also be achieved through quantization alone or through fixed-point conversion alone.
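The int_n quantization and the fixed p.q representation described above can be sketched as follows. The function names and the symmetric range mapping are illustrative assumptions for this sketch, not the exact scheme of the embodiment; `scale` plays the role of the quantization amplitude collected during training.

```python
import numpy as np

def quantize(x, scale, n):
    """Map floating-point values into n-bit signed integers (int_n).

    `scale` is the quantization amplitude (e.g. the max |x| observed in
    training); values are scaled into the n-bit signed range and rounded.
    """
    qmax = 2 ** (n - 1) - 1
    q = np.round(x / scale * qmax)
    return np.clip(q, -qmax - 1, qmax).astype(np.int64)

def to_fixed_point(x, p, q):
    """Fixed p.q encoding: p integer bits, q fractional bits.

    The value is stored as the integer round(x * 2^q), saturated to the
    representable range (one sign bit assumed inside p).
    """
    scaled = np.round(x * (1 << q))
    limit = (1 << (p + q - 1)) - 1
    return np.clip(scaled, -limit - 1, limit).astype(np.int64)

def from_fixed_point(xq, q):
    """Decode a fixed-point integer back to a floating-point value."""
    return np.asarray(xq, dtype=np.float64) / (1 << q)
```

With n = 8 this reproduces the usual int8 range; smaller n (4 or 2 bits) only changes `qmax`.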
In the embodiments of this application, fusing quantization with fixed-point processing significantly lowers the bandwidth demanded by data transfers between operators, effectively reduces the computation load of the acceleration unit, and fully exploits the unit's fixed-point computing strengths. Because the bit width of the data transferred between operators drops exponentially, computing resources are saved as well; and since each operator can start computing directly on quantized data, the resources previously consumed by intermediate data conversion are eliminated. Together, quantization and fixed-point conversion lower the resource requirements of the computation, improving efficiency while saving resources.
In one implementation, the calculation operation in Fig. 1 includes multiplying the quantized input data by the quantized weights;
the preset processing parameters include a first bias value of the current layer; and
the post-processing operation includes adding the calculation-operation result to the first bias value after fixed-point conversion.
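The multiply-then-add structure of this implementation can be sketched as below. The function name and the assumption that the bias has already been aligned to the accumulator's fixed-point domain are illustrative, not the embodiment's exact arrangement.

```python
import numpy as np

def multiply_then_bias(input_q, weight_q, bias_fp):
    """Low-bit multiply followed by a fixed-point bias add (a sketch).

    input_q and weight_q are int_n quantized tensors; their products
    accumulate into a wider integer (int_m, m > n). bias_fp is the first
    bias value, assumed pre-converted to the same fixed-point domain as
    the accumulator, so the post-processing is a plain integer add.
    """
    acc = input_q.astype(np.int64) @ weight_q.astype(np.int64)
    return acc + bias_fp
```

The key property is that no floating-point arithmetic appears between the quantized inputs and the biased result.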
Referring to Fig. 1 and Fig. 2, in this implementation the "low-bit quantization calculation" in Fig. 2 may specifically include multiplying the quantized input data by the quantized weights, where "compute" may specifically include multiplying the quantized input feature map by the quantized weights.
Referring to Fig. 1 and Fig. 2, the preset processing parameters in Fig. 1 may include the bias value among the network parameters of the current layer, also called the first bias value. The "fixed-point calculation" in Fig. 2 may specifically include adding the result of the calculation operation to the first bias value after fixed-point conversion, where "post compute" may specifically include adding the fixed p.q value to the result of the multiplication above.
In the embodiments of this application, quantization of the input data and weights used in the calculation operation is fused with fixed-point conversion of the preset processing parameters used in the post-processing operation, achieving a strong neural network acceleration effect.
The embodiments of this application can be applied in the field of artificial intelligence, and especially in autonomous driving. For example, in an autonomous parking scenario, the environmental parameters of the parking space and the heading angle of the parking vehicle can serve as input data, and the steering-wheel angle as the output. The environmental parameters of the parking space may include its size and position. In one example, the input data may also include the vehicle-body position, such as the coordinates of the right-rear point of the body. The steering-wheel angle is computed by the trained neural network, and automatic parking control is then completed based on the computed steering-wheel angle, the parking-space data and the body position.
In the autonomous parking scenario, the input data and weights of the current layer of the neural network can be quantized, for example the environmental parameters of the parking space, the heading angle of the vehicle, and the weights. In the current layer, the quantized weights are applied to the quantized input data in a calculation operation to obtain the calculation-operation result; for example, the result may be a steering-wheel angle capable of completing automatic parking control. The preset processing parameters, which may include the first bias value of the current layer, are then converted to fixed point. Using the fixed-point preset processing parameters, a post-processing operation is applied to the calculation-operation result to obtain the output of the current layer; for example, the post-processing operation includes adding the calculation-operation result to the fixed-point first bias value.
In the autonomous parking scenario, quantization of the input data and weights used in the calculation operation is fused with fixed-point conversion of the preset processing parameters used in the post-processing operation, achieving a strong neural network acceleration effect. This fusion significantly lowers the bandwidth demanded by data transfers between operators, effectively reduces the computation load of the acceleration unit, and fully exploits the unit's fixed-point computing strengths, improving computing efficiency while saving resources.
Fig. 3 is a convolution-calculation flow chart of the quantization and fixed-point fusion method for a neural network according to an embodiment of the present application. Fig. 3 takes a convolutional layer as an example to show the calculation flow of the fusion method, which comprises an accelerated calculation flow and an initialization calculation flow: the rectangle labeled 1 in Fig. 3 represents the accelerated calculation flow, and the rectangle labeled 2 represents the initialization calculation flow.
Referring to Fig. 1 and Fig. 3, step S110 in Fig. 1 may specifically include the following in the accelerated calculation flow: "weight int_n" indicates quantizing the weights of the convolutional layer, with the subscript n denoting the bit width after quantization, and "input int_n" indicates quantizing the input data of the convolutional layer, again with the subscript n denoting the quantized bit width.
In one implementation, the calculation operation includes a convolution operation.
Referring to Fig. 1 and Fig. 3, in a convolutional layer the calculation operation of step S120 in Fig. 1 may specifically include a convolution operation. "Multi compute" in Fig. 3 represents the convolution operation; its result, "Conv result (int_m)", is integer data whose bit width is denoted by the subscript m.
Referring to Fig. 1 and Fig. 3, the result of the fixed-point conversion of the preset processing parameters in step S130 in Fig. 1 may specifically include the final "New weight" result produced in the initialization calculation flow. The final "New weight" result comprises New scale (the scale value of the current layer) and New bias (the second bias value of the current layer).
In one implementation, the post-processing operation includes multiplying the calculation-operation result by the scale value of the current layer and then adding the second bias value of the current layer.
Referring to Fig. 1 and Fig. 3, the post-processing operation of step S140 in Fig. 1 may specifically include the operation performed by "Multi and add" in the accelerated calculation flow: multiplying "Conv result" by New scale and then adding New bias. This operation yields the output data "output (int_n)" of the current layer, integer data whose bit width is denoted by the subscript n.
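The "Multi and add" step can be sketched as follows, assuming (as an illustration, not the embodiment's exact format) that New scale and New bias have been converted to fixed point with `q_frac` fractional bits; the rounding right-shift at the end plays the role of the final requantization to int_n.

```python
import numpy as np

def multi_and_add(conv_result, new_scale_fp, new_bias_fp, q_frac, n):
    """Rescale the int_m convolution result and requantize to int_n.

    new_scale_fp and new_bias_fp are the fused parameters in fixed-point
    form with q_frac fractional bits (the bias pre-scaled into the same
    domain as the product). The shift rounds to nearest.
    """
    acc = conv_result.astype(np.int64) * new_scale_fp + new_bias_fp
    half = 1 << (q_frac - 1)          # rounding offset
    out = (acc + half) >> q_frac      # drop the fractional bits
    qmax = 2 ** (n - 1) - 1
    return np.clip(out, -qmax - 1, qmax).astype(np.int64)
```

For example, a conv result of 10 with New scale = 0.5 (128 in fixed 8.8) and New bias = 2 (512 in the same domain) yields an int8 output of 7.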
In one example, n may be 8, 4 or 2, and m > n.
In the embodiments of this application, the post-processing operation uses the fixed-point result of the fusion calculation, so that quantization and fixed-point conversion are fused, yielding substantial gains in both acceleration and running efficiency.
In one implementation, the preset processing parameters include at least one of: the quantization amplitude of the input data of the current layer, the quantization amplitude of the weights of the current layer, the quantization amplitude of the output data of the current layer, the batch-normalization scale of the current layer, and the batch-normalization bias value of the current layer.
Referring to Fig. 1 and Fig. 3, the preset processing parameters in Fig. 1 may specifically include "input scale", "Weight scale", "output scale", "bn scale" and "bn bias" in Fig. 3. In the initialization calculation flow, these preset processing parameters mean the following:
"input scale" is the quantization amplitude of the input data of the current layer collected as statistics during training, for example the maximum of the input data of the current layer.
"Weight scale" is the quantization amplitude of the weights of the current layer collected during training, for example the maximum of the weights of the current layer.
"output scale" is the quantization amplitude of the output data of the current layer collected during training, for example the maximum of the output data of the current layer.
"bn scale" is the batch-normalization scale of the current layer, one of the batch normalization (bn) parameters produced by training.
"bn bias" is the batch-normalization bias value of the current layer, also one of the bn parameters produced by training.
In the embodiments of this application, quantization and fixed-point conversion are fused within the convolution calculation flow, and the input data, weights and output data of the current layer are used in the acceleration processing, improving computing efficiency.
The above is an example of the preset processing parameters of a convolutional layer. In specific applications, the preset processing parameters can be adjusted according to the network structure.
In one implementation, the fixed-point conversion of the preset processing parameters in step S130 in Fig. 1 includes:
performing a fusion calculation on at least two preset processing parameters to obtain the scale value of the current layer and the second bias value of the current layer.
Referring to Fig. 1 and Fig. 3, "Weight fusion" in Fig. 3 denotes the fusion calculation over the preset processing parameters.
In the embodiments of this application, the preset processing parameters are combined in a fusion calculation whose result serves as the data basis for the subsequent fixed-point conversion; fixed-point conversion increases the effective computing power of the acceleration unit.
In one implementation, the fusion calculation over at least two preset processing parameters uses the following formulas:
new_scale=bn_scale*input_scale*weight_scale/output_scale;new_scale=bn_scale*input_scale*weight_scale/output_scale;
new_bias=bn_bias/output_scale,new_bias=bn_bias/output_scale,
where new_scale is the scale value of the current layer, bn_scale is the batch-normalization scale of the current layer, input_scale is the quantization amplitude of the input data of the current layer, weight_scale is the quantization amplitude of the weights of the current layer, output_scale is the quantization amplitude of the output data of the current layer, new_bias is the second bias value of the current layer, and bn_bias is the batch-normalization bias value of the current layer.
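The two formulas can be transcribed directly; as the text describes, the fusion is computed offline at full precision and the results are only afterwards truncated to fixed point. The `to_fixed` helper and its `q_frac` parameter are illustrative assumptions about that truncation step.

```python
def fuse_parameters(bn_scale, bn_bias, input_scale, weight_scale, output_scale):
    """Offline fusion of the preset processing parameters (full precision).

    Implements new_scale and new_bias exactly as given by the formulas
    above; the accelerator later consumes fixed-point versions of both.
    """
    new_scale = bn_scale * input_scale * weight_scale / output_scale
    new_bias = bn_bias / output_scale
    return new_scale, new_bias

def to_fixed(value, q_frac):
    """Truncate a fused result to fixed point with q_frac fractional bits."""
    return round(value * (1 << q_frac))
```

Because everything here runs in the initialization flow, it costs no time on the acceleration unit at inference.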
In the embodiments of this application, the fusion calculation over these preset processing parameters yields a result that serves as the data basis for the subsequent fixed-point conversion; fixed-point conversion effectively reduces the computation load of the acceleration unit and fully exploits the unit's fixed-point computing strengths.
Referring to Fig. 3, "Fix point" denotes the fixed-point conversion. In one implementation, the "Weight fusion" calculation produces an intermediate "New weight" result, which is then converted to fixed point to obtain the final "New weight" result.
As shown in Fig. 3, the operation flows or data labeled 3 represent low-bit data and the corresponding calculations. The operation flows or data labeled 4 represent the initialization calculation flow, which is based on floating-point arithmetic and does not occupy the acceleration unit; the acceleration unit may include hardware devices such as an FPGA (Field Programmable Gate Array) or a GPU (Graphics Processing Unit). The operation flows or data labeled 5 represent fixed-point data and the related computations.
A neural network is usually a multi-layer structure in which each layer performs one processing operation, and each processing operation corresponds to one operator. Still taking the embedded acceleration platform as an example, the low-bit quantization processing in the embodiments of this application adds quantization rules to the weight parameters of the neural network during training, so that the trained weights are represented as low-bit data; during training, quantization-parameter statistics are collected on the inputs and outputs of each operator, and the collected quantization parameters are applied during inference of the neural network on the embedded acceleration platform. Given this premise, during fixed-point conversion the precision required of each operator's inputs and outputs need not be high: reaching the quantized low-bit representation suffices, so the operator's output does not need full-precision computation as long as the precision of the quantized output is guaranteed. During fixed-point processing, the low-bit calculation result is combined with the preset processing parameters to obtain the final low-bit result; the preset processing parameters may, for example, be computed at full precision in the initialization calculation flow and then converted to fixed point.
The computation order of the acceleration unit, that is, the actual computation order after operator fusion, must be considered during training of the neural network. Take, for example, a network structure of conv (convolution) + add + bn + relu (Rectified Linear Unit): the four operators correspond to four layers of the network and can be fused into one operator for acceleration. Such a structure can be processed in a manner similar to Fig. 2; the actual computation first performs the multiply-accumulate of the convolution, then folds the add operation into bn, with the low-bit quantization parameters also folded into bn. Because the preset processing parameters involve the input data, the weights and the output data, this order differs from the traditional computation order, which usually contains only separate computation steps over the input data or the weights. At the same time, the fixed-point operations must be taken into account during training, meaning that special quantization must be performed at certain positions.
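The fused conv + add + bn + relu order can be sketched as a single integer-domain operator. The assumption that the add and the quantization parameters have already been folded into a fused scale and bias during initialization follows the text; the function name and fixed-point format are illustrative.

```python
import numpy as np

def fused_conv_add_bn_relu(x_q, w_q, new_scale_fp, new_bias_fp, q_frac, n):
    """One fused operator replacing conv + add + bn + relu (a sketch).

    x_q, w_q: int_n quantized tensors. new_scale_fp / new_bias_fp carry
    the folded add, bn and quantization parameters in fixed point with
    q_frac fractional bits.
    """
    acc = x_q.astype(np.int64) @ w_q.astype(np.int64)   # conv multiply-accumulate
    acc = acc * new_scale_fp + new_bias_fp              # add + bn in fixed point
    acc = np.maximum(acc, 0)                            # relu in the integer domain
    half = 1 << (q_frac - 1)
    out = (acc + half) >> q_frac                        # requantize to int_n
    return np.clip(out, 0, 2 ** (n - 1) - 1).astype(np.int64)
```

The whole chain stays in integer arithmetic, which is what lets the four layers occupy the acceleration unit as a single operator.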
The embodiments of this application determine the training process according to the actual inference process: operator fusion must be considered during training so that the operators are computed in the agreed order, keeping the computation order of training consistent with that of inference. Moreover, during back-propagation, the gradients must be computed using the fixed quantization scales of the inputs and outputs to update the weight information; after the weight information is computed, the input and output quantization scales are updated using statistical methods. In other words, both the updating of the quantization scales and the acquisition of the weights are completed over the course of training. During training, the fixed-point representation can be updated automatically into the preset processing parameters by constraining the data range, where the constrained data range may specify the bit widths occupied by the integer part and the fractional part of the fixed-point representation.
During inference, the entire computation can be carried out according to the low-bit and fixed-point requirements to ensure accurate results. To keep inference consistent with training, the computation order must first be kept the same; then, when the preset processing parameters are fused, they are first computed offline at full precision, and in actual use they are truncated according to the fixed-point requirements to guarantee data correctness.
In the embodiments of this application, low-bit quantization and fixed-point conversion are fused together not only in the forward inference stage: how to fuse and fully adapt the two must also be considered during network training. For different neural-network frameworks, the operator-fusion part must be considered during training, including how to incorporate the above quantization and fixed-point conversion accurately into the operator-fusion process, for example by designing in advance at which step quantization is performed and at which step fixed-point conversion is performed, so as to guarantee the correctness of both the data and the process.
Fig. 4 is a schematic structural diagram of a quantization and fixed-point fusion apparatus for a neural network according to an embodiment of the present application. As shown in Fig. 4, the apparatus includes:
a quantization unit 100, configured to quantize the input data and weights of the current layer of the neural network;
a first operation unit 200, configured to apply, in the current layer, the quantized weights to the quantized input data in a calculation operation to obtain a calculation-operation result;
a fixed-point unit 300, configured to perform fixed-point conversion on preset processing parameters; and
a second operation unit 400, configured to perform a post-processing operation on the calculation-operation result using the fixed-point preset processing parameters, to obtain the output result of the current layer.
In one implementation, the calculation operation includes multiplying the quantized input data by the quantized weights;
the preset processing parameters include a first bias value of the current layer; and
the post-processing operation includes adding the calculation-operation result to the first bias value after fixed-point conversion.
In one implementation, the calculation operation includes a convolution operation; and
the preset processing parameters include at least one of: the quantization amplitude of the input data of the current layer, the quantization amplitude of the weights of the current layer, the quantization amplitude of the output data of the current layer, the batch-normalization scale of the current layer, and the batch-normalization bias value of the current layer.
In one implementation, the fixed-point unit 300 is configured to:
perform a fusion calculation on at least two preset processing parameters to obtain the scale value of the current layer and the second bias value of the current layer.
In one implementation, the post-processing operation includes multiplying the calculation-operation result by the scale value of the current layer and then adding the second bias value of the current layer.
In one implementation, the fixed-point unit 300 is configured to perform the fusion calculation using the following formulas:
new_scale=bn_scale*input_scale*weight_scale/output_scale;new_scale=bn_scale*input_scale*weight_scale/output_scale;
new_bias=bn_bias/output_scale,new_bias=bn_bias/output_scale,
where new_scale is the scale value of the current layer, bn_scale is the batch-normalization scale of the current layer, input_scale is the quantization amplitude of the input data of the current layer, weight_scale is the quantization amplitude of the weights of the current layer, output_scale is the quantization amplitude of the output data of the current layer, new_bias is the second bias value of the current layer, and bn_bias is the batch-normalization bias value of the current layer.
For the functions of the units in the quantization and fixed-point fusion apparatus for a neural network of the embodiments of this application, refer to the corresponding descriptions in the method above; they are not repeated here.
According to the embodiments of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 5 is a block diagram of an electronic device for the quantization and fixed-point fusion method for a neural network according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers and other suitable computers. Electronic devices may also represent various forms of mobile apparatus, such as personal digital assistants, cellular telephones, smart phones, wearable devices and other similar computing apparatus. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or mounted in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface (GUI) on an external input/output device (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In FIG. 5, one processor 501 is taken as an example.
The memory 502 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the neural network quantization and fixed-point fusion method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions, which are used to cause a computer to perform the neural network quantization and fixed-point fusion method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 502 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the neural network quantization and fixed-point fusion method in the embodiments of the present application (for example, the quantization unit 100, the first operation unit 200, the fixed-point unit 300, and the second operation unit 400 shown in FIG. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the neural network quantization and fixed-point fusion method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the electronic device that performs the neural network quantization and fixed-point fusion method, and the like. In addition, the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memories arranged remotely with respect to the processor 501, and these remote memories may be connected through a network to the electronic device that performs the neural network quantization and fixed-point fusion method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device that performs the neural network quantization and fixed-point fusion method may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; in FIG. 5, connection by a bus is taken as an example.
The input device 503 may receive input digital or character information and generate key signal inputs related to the user settings and function control of the electronic device that performs the neural network quantization and fixed-point fusion method, and may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, or another input device. The output device 504 may include a display device, an auxiliary lighting device (for example, an LED), a tactile feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein may be realized in digital electronic circuit systems, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented using a high-level procedural and/or object-oriented programming language and/or an assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, a magnetic disk, an optical disc, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
According to the technical solutions of the embodiments of the present application, the fusion of quantization processing and fixed-point processing significantly lowers the bandwidth requirement for data transmission between operators, effectively reduces the amount of computation on the acceleration unit, and makes full use of the advantages of the acceleration unit's fixed-point computation. At the same time, quantization processing and fixed-point processing reduce the resource requirements of the computation, improving computational efficiency while saving resources.
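For illustration only, the fused per-layer flow of this technical solution (quantize the inputs and weights, perform the calculation operation in integer arithmetic, then apply a single fused post-processing step) might be sketched as follows. The symmetric int8 scheme, the data types, and the function names are assumptions for the sketch, not details fixed by the application:

```python
import numpy as np

def quantize(x, amplitude):
    # Illustrative symmetric quantization of a float tensor to int8
    # using a per-tensor quantization amplitude.
    return np.clip(np.round(x / amplitude), -128, 127).astype(np.int8)

def fused_layer(x_q, w_q, new_scale, new_bias):
    # Integer accumulation (the fixed-point calculation operation),
    # followed by one fused post-processing step:
    # multiply by new_scale, then add new_bias.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * new_scale + new_bias

x_q = quantize(np.array([[0.5, -1.0]]), amplitude=0.5)   # -> [[1, -2]]
w_q = np.array([[2], [3]], dtype=np.int8)
out = fused_layer(x_q, w_q, new_scale=0.5, new_bias=1.0)
```

Because the per-element accumulation here is 1*2 + (-2)*3 = -4, the fused post-processing gives -4 * 0.5 + 1.0 = -1.0; only this final step touches floating point, which is what lets the operator-to-operator traffic stay in low-bit integers.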
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (14)

  1. A neural network quantization and fixed-point fusion method, characterized by comprising:
    quantizing input data and weights of a current layer of a neural network;
    in the current layer, performing a calculation operation on the quantized input data using the quantized weights, to obtain a calculation operation result;
    performing fixed-point processing on preset processing parameters; and
    performing a post-processing operation on the calculation operation result using the fixed-point-processed preset processing parameters, to obtain an output result of the current layer.
  2. The method according to claim 1, characterized in that:
    the calculation operation comprises multiplying the quantized input data by the quantized weights;
    the preset processing parameters comprise a first bias value of the current layer; and
    the post-processing operation comprises adding the calculation operation result to the fixed-point-processed first bias value.
  3. The method according to claim 1, characterized in that:
    the calculation operation comprises a convolution operation; and
    the preset processing parameters comprise at least one of: a quantization amplitude of the input data of the current layer, a quantization amplitude of the weights of the current layer, a quantization amplitude of output data of the current layer, a batch-normalization scale of the current layer, and a batch-normalization bias value of the current layer.
  4. The method according to claim 3, characterized in that performing fixed-point processing on the preset processing parameters comprises:
    performing a fusion calculation on at least two of the preset processing parameters to obtain a scale value of the current layer and a second bias value of the current layer.
  5. The method according to claim 4, characterized in that the post-processing operation comprises: multiplying the calculation operation result by the scale value of the current layer, and then adding the second bias value of the current layer.
  6. The method according to claim 4, characterized in that performing the fusion calculation on at least two of the preset processing parameters comprises performing the fusion calculation using the following formulas:
    new_scale=bn_scale*input_scale*weight_scale/output_scale;
    new_bias=bn_bias/output_scale,
    wherein new_scale denotes the scale value of the current layer, bn_scale denotes the batch-normalization scale of the current layer, input_scale denotes the quantization amplitude of the input data of the current layer, weight_scale denotes the quantization amplitude of the weights of the current layer, output_scale denotes the quantization amplitude of the output data of the current layer, new_bias denotes the second bias value of the current layer, and bn_bias denotes the batch-normalization bias value of the current layer.
  7. A neural network quantization and fixed-point fusion apparatus, characterized by comprising:
    a quantization unit, configured to quantize input data and weights of a current layer of a neural network;
    a first operation unit, configured to: in the current layer, perform a calculation operation on the quantized input data using the quantized weights, to obtain a calculation operation result;
    a fixed-point unit, configured to perform fixed-point processing on preset processing parameters; and
    a second operation unit, configured to perform a post-processing operation on the calculation operation result using the fixed-point-processed preset processing parameters, to obtain an output result of the current layer.
  8. The apparatus according to claim 7, characterized in that:
    the calculation operation comprises multiplying the quantized input data by the quantized weights;
    the preset processing parameters comprise a first bias value of the current layer; and
    the post-processing operation comprises adding the calculation operation result to the fixed-point-processed first bias value.
  9. The apparatus according to claim 7, characterized in that:
    the calculation operation comprises a convolution operation; and
    the preset processing parameters comprise at least one of: a quantization amplitude of the input data of the current layer, a quantization amplitude of the weights of the current layer, a quantization amplitude of output data of the current layer, a quantization amplitude of the batch-normalization scale of the current layer, and a quantization amplitude of the batch-normalization bias value of the current layer.
  10. The apparatus according to claim 9, characterized in that the fixed-point unit is configured to:
    perform a fusion calculation on at least two of the preset processing parameters to obtain a scale value of the current layer and a second bias value of the current layer.
  11. The apparatus according to claim 10, characterized in that the post-processing operation comprises: multiplying the calculation operation result by the scale value of the current layer, and then adding the second bias value of the current layer.
  12. The apparatus according to claim 10, characterized in that the fixed-point unit is configured to perform the fusion calculation using the following formulas:
    new_scale=bn_scale*input_scale*weight_scale/output_scale;
    new_bias=bn_bias/output_scale,
    wherein new_scale denotes the scale value of the current layer, bn_scale denotes the batch-normalization scale of the current layer, input_scale denotes the quantization amplitude of the input data of the current layer, weight_scale denotes the quantization amplitude of the weights of the current layer, output_scale denotes the quantization amplitude of the output data of the current layer, new_bias denotes the second bias value of the current layer, and bn_bias denotes the batch-normalization bias value of the current layer.
  13. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-6.
PCT/CN2020/083797 2019-10-11 2020-04-08 Quantization and fixed-point fusion method and apparatus for neural network WO2021068469A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910966512.6A CN110705696B (en) 2019-10-11 2019-10-11 Quantization and fixed-point fusion method and device for neural network
CN201910966512.6 2019-10-11

Publications (1)

Publication Number Publication Date
WO2021068469A1 true WO2021068469A1 (en) 2021-04-15

Family

ID=69198512

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083797 WO2021068469A1 (en) 2019-10-11 2020-04-08 Quantization and fixed-point fusion method and apparatus for neural network

Country Status (2)

Country Link
CN (1) CN110705696B (en)
WO (1) WO2021068469A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019150A (en) * 2022-08-03 2022-09-06 深圳比特微电子科技有限公司 Target detection fixed point model establishing method and device and readable storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705696B (en) * 2019-10-11 2022-06-28 阿波罗智能技术(北京)有限公司 Quantization and fixed-point fusion method and device for neural network
CN113222097A (en) * 2020-01-21 2021-08-06 上海商汤智能科技有限公司 Data processing method and related product
CN113326942B (en) * 2020-02-28 2024-06-11 上海商汤智能科技有限公司 Model reasoning method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328645A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Reduced computational complexity for fixed point neural network
CN108345939A (en) * 2017-01-25 2018-07-31 微软技术许可有限责任公司 Neural network based on fixed-point calculation
CN109902745A (en) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 A kind of low precision training based on CNN and 8 integers quantization inference methods
CN110008952A (en) * 2019-03-26 2019-07-12 深兰科技(上海)有限公司 A kind of target identification method and equipment
CN110705696A (en) * 2019-10-11 2020-01-17 百度在线网络技术(北京)有限公司 Quantization and fixed-point fusion method and device for neural network

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN115688877A (en) * 2017-06-06 2023-02-03 格兰菲智能科技有限公司 Method and computing device for fixed-point processing of data to be quantized
CN107748915A (en) * 2017-11-02 2018-03-02 北京智能管家科技有限公司 Compression method, device, equipment and the medium of deep neural network DNN models
CN110059811A (en) * 2017-11-06 2019-07-26 畅想科技有限公司 Weight buffer
CN108334945B (en) * 2018-01-30 2020-12-25 中国科学院自动化研究所 Acceleration and compression method and device of deep neural network

Also Published As

Publication number Publication date
CN110705696B (en) 2022-06-28
CN110705696A (en) 2020-01-17

Similar Documents

Publication Publication Date Title
WO2021068469A1 (en) Quantization and fixed-point fusion method and apparatus for neural network
KR102396936B1 (en) Method, device, electronic device and storage medium for acquiring reading comprehension model
CN106990937B (en) Floating point number processing device and processing method
CN111832701B (en) Model distillation method, model distillation device, electronic equipment and storage medium
KR20210114853A (en) Method and apparatus for updating parameter of model
US11216615B2 (en) Method, device and storage medium for predicting punctuation in text
CN110807331B (en) Polyphone pronunciation prediction method and device and electronic equipment
KR20210148918A (en) Method, device, equipment and storage medium for acquiring word vector based on language model
JP7210830B2 (en) Speech processing system, speech processing method, electronic device and readable storage medium
CN111666077B (en) Operator processing method and device, electronic equipment and storage medium
WO2022057502A1 (en) Method and device for implementing dot product operation, electronic device, and storage medium
KR20220003444A (en) Optimizer learning method and apparatus, electronic device and readable storage medium
CN112529189A (en) Model compression method and device, electronic equipment and storage medium
US11243743B2 (en) Optimization of neural networks using hardware calculation efficiency and adjustment factors
US20220113943A1 (en) Method for multiply-add operations for neural network
CN112036561B (en) Data processing method, device, electronic equipment and storage medium
WO2022027862A1 (en) Method and device for quantifying neural network model
CN110782029B (en) Neural network prediction method and device, electronic equipment and automatic driving system
CN112507692B (en) Method and device for establishing style text generation model
WO2022206138A1 (en) Operation method and apparatus based on neural network
CN115237991A (en) Data format conversion method and device and matrix processing method and device
CN115952790A (en) Information extraction method and device
JP2023103419A (en) Operation method, device, chip, electronic equipment, and storage medium
CN115237992A (en) Data format conversion method and device and matrix processing method and device
CN115951860A (en) Data processing device, data processing method and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20875017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20875017

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.03.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20875017

Country of ref document: EP

Kind code of ref document: A1