WO2020211783A1 - Adjusting method for quantization frequency of operational data and related product - Google Patents

Adjusting method for quantization frequency of operational data and related product

Info

Publication number
WO2020211783A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
quantization accuracy
quantization
neural network
accuracy
Application number
PCT/CN2020/084943
Other languages
French (fr)
Chinese (zh)
Inventor
刘少礼
张曦珊
曾洪博
孟小甫
Original Assignee
上海寒武纪信息科技有限公司
Priority claimed from CN201910306478.XA external-priority patent/CN111832710A/en
Priority claimed from CN201910306477.5A external-priority patent/CN111832709A/en
Priority claimed from CN201910306479.4A external-priority patent/CN111832711A/en
Priority claimed from CN201910307675.3A external-priority patent/CN111832696A/en
Priority claimed from CN201910307672.XA external-priority patent/CN111832695A/en
Priority claimed from CN201910306480.7A external-priority patent/CN111832712A/en
Application filed by 上海寒武纪信息科技有限公司
Publication of WO2020211783A1 publication Critical patent/WO2020211783A1/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The present disclosure provides a method for adjusting the quantization frequency of operational data of a neural network, and a related product. By adjusting how often the quantization accuracy is updated during training, the method keeps the quantization parameters matched to the operational data, improving calculation accuracy or reducing the amount of calculation.

Description

Adjusting Method for Quantization Frequency of Operational Data and Related Product

Technical Field
The present disclosure relates to the field of neural networks, and in particular to a method for adjusting the quantization frequency of operational data, and a related product.
Background
The artificial neural network (ANN) has been a research hotspot in the field of artificial intelligence since the 1980s. It abstracts the neuron network of the human brain from an information-processing perspective, establishes a simple model, and forms different networks according to different connection modes. In engineering and academia it is often simply called a neural network. A neural network is a computational model composed of a large number of interconnected nodes (neurons). Existing neural network computation relies on a CPU (Central Processing Unit) or GPU (Graphics Processing Unit) to implement inference (i.e., the forward operation) and backward training. To reduce the amount of computation during training, the operational data are quantized, but existing quantization schemes never change the frequency of quantization. As a result, the quantization parameters may no longer match the operational data, which degrades the quantization effect and hurts either the calculation accuracy or the amount of calculation.
Summary
Embodiments of the present disclosure provide a method for adjusting the quantization frequency of operational data, and a related product, which can improve calculation accuracy or reduce the amount of calculation.
In a first aspect, a method for adjusting the quantization frequency of operational data is provided. The method is applied to an artificial intelligence processor and includes the following steps:
determining the operational data of the neural network;
acquiring a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy;
acquiring training parameters of the neural network, and determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
Optionally, the operational data includes one of, or any combination of, the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy.
Optionally, the training parameters include a training iteration or a training epoch.
Optionally, determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy specifically includes:
adjusting the quantization accuracy once every δ training iterations, where δ is fixed;
or adjusting the quantization accuracy once every δ training epochs, where δ is fixed;
or adjusting the quantization accuracy once every step iterations or epochs, where step = αδ and α is greater than 1;
or adjusting the quantization accuracy once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases.
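The four schedules above can be sketched as a single trigger predicate. This is an illustrative reading rather than code from the disclosure; `counter` counts iterations or epochs, and `delta`, `alpha`, and `decay` are placeholder names:

```python
def should_adjust(counter, delta, alpha=1.0, decay=None):
    """Decide whether the quantization accuracy is adjusted at this
    iteration/epoch counter.

    - fixed interval:     every `delta` steps (alpha=1, decay=None)
    - stretched interval: every `step = alpha * delta` steps (alpha > 1)
    - decaying interval:  `decay(counter)` supplies the current, shrinking delta
    """
    if decay is not None:
        delta = decay(counter)
    step = int(alpha * delta)
    return counter > 0 and counter % step == 0
```

With delta = 100 this triggers at iterations 100, 200, and so on; with alpha = 2 only at 200, 400, and so on.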
Optionally, the method for determining the quantization accuracy specifically includes:
determining the exponent s of the discrete quantization accuracy, or the continuous quantization accuracy f, according to the maximum absolute value of the operational data;
or determining s or f according to the minimum absolute value of the operational data;
or determining s or f according to the quantization-accuracy relationship between different data;
or determining s or f according to an empirical constant.
Optionally, determining s or f according to the maximum absolute value of the operational data specifically includes:
for discrete quantization accuracy, determining s by formula 1-1:
s_a = [log2(a_max) - bitnum + 1]   (formula 1-1)
for continuous quantization accuracy, determining f by formula 2-1 (formula 2-1 appears only as an image in the source and is not reproduced here);
where c is a constant, bitnum is the number of bits of the quantized data, and a_max is the maximum absolute value of the operational data.
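Formula 1-1 can be sketched as follows, reading the enclosing square bracket as a ceiling (an assumption; the source typography does not say). Formula 2-1 for f survives only as an image in the source, so no sketch of f is attempted:

```python
import math

def discrete_exponent(a_max, bitnum):
    """Formula 1-1: s = ceil(log2(a_max) - bitnum + 1), so that values up to
    a_max fit into `bitnum` bits of signed fixed-point data at scale 2**s."""
    return math.ceil(math.log2(a_max) - bitnum + 1)
```

For a_max = 6.0 and bitnum = 8 this gives s = -4, and the largest representable value (2**7 - 1) * 2**-4 = 7.9375 indeed covers 6.0.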
Optionally, determining s or f according to the minimum absolute value of the operational data specifically includes:
for discrete quantization accuracy, determining s by formula 1-2 (formula 1-2 appears only as an image in the source and is not reproduced here);
for continuous quantization accuracy, determining f by formula 2-2:
f_a = a_min * d   (formula 2-2)
where d is a constant and a_min is the minimum absolute value of the operational data.
Optionally, the maximum or minimum absolute value of the operational data is determined by:
searching for the maximum or minimum absolute value across all layers together;
or searching for it per layer and per data category;
or searching for it per layer, per data category, and per group.
Optionally, determining s or f according to the quantization accuracy of different data types specifically includes:
for discrete quantization accuracy, determining the discrete fixed-point precision s_a^(l) by formula 1-3:
s_a^(l) = Σ_{b≠a} α_b s_b^(l) + β_b   (formula 1-3)
where s_b^(l) is the known discrete fixed-point precision of the data b^(l) in the same layer as the data a^(l);
for continuous quantization accuracy, determining the continuous quantization accuracy f_a^(l) by formula 2-3:
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b   (formula 2-3)
where f_b^(l) is the known continuous quantization accuracy of the data b^(l) in the same layer as the data a^(l).
Optionally, determining s or f according to an empirical constant specifically includes:
setting s_a^(l) = C, where C is an integer constant;
or setting f_a^(l) = C, where C is a rational constant;
where s_a^(l) is the exponent of the discrete quantization accuracy of the data a^(l), and f_a^(l) is the continuous quantization accuracy of the data a^(l).
In a second aspect, an artificial intelligence processor is provided. The artificial intelligence processor includes:
a processing unit, configured to determine the operational data of the neural network;
an acquiring unit, configured to acquire a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy, and to acquire training parameters of the neural network;
the processing unit is further configured to determine the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
In a third aspect, a neural network computing device is provided. The neural network computing device includes one or more of the chips provided in the second aspect.
In a fourth aspect, a combined processing device is provided. The combined processing device includes the neural network computing device provided in the third aspect, a universal interconnection interface, and a general-purpose processing device;
the neural network computing device is connected to the general-purpose processing device through the universal interconnection interface.
In a fifth aspect, an electronic device is provided, which includes the chip provided in the second aspect or the neural network computing device provided in the third aspect.
In a sixth aspect, a computer-readable storage medium is provided, which stores a computer program for electronic data exchange, where the computer program causes a computer to execute the method provided in the first aspect.
In a seventh aspect, a computer program product is provided, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method provided in the first aspect.
In a first aspect, a hybrid quantization method for operational data is provided. The method is applied to an artificial intelligence processor and includes the following steps:
determining the operational data; acquiring a quantization command, where the quantization command includes the data type of the quantization accuracy, and extracting, from a computation library, a quantization function corresponding to the data type; dividing the operational data into g groups of data, and applying a hybrid quantization operation to the g groups of data according to the data type of the quantization accuracy to obtain quantized data, so that the artificial intelligence processor performs operations on the quantized data, where g is an integer greater than or equal to 2.
Optionally, the operational data includes one of, or any combination of, the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative;
the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy.
Optionally, applying a hybrid quantization operation to the g groups of data according to the data type of the quantization accuracy specifically includes:
quantizing the g groups of data using at least two data types of quantization accuracy, where all data within a single group use the same data type of quantization accuracy.
Optionally, dividing the operational data into g groups of data specifically includes:
dividing the operational data into g groups according to the network layers of the neural network;
or dividing the operational data into g groups according to the number of cores of the artificial intelligence processor.
Optionally, quantizing the g groups of data using at least two data types of quantization accuracy specifically includes:
quantizing one part of the g groups of data with discrete quantization accuracy, and quantizing another part with continuous quantization accuracy.
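A minimal sketch of the grouped hybrid quantization, under the usual reading of the two schemes (an assumption, since the disclosure only names them): discrete accuracy rounds to an integer code at scale 2**s and clamps to the signed bitnum-bit range, while continuous accuracy rounds to a multiple of the step f:

```python
def quantize_discrete(x, s, bitnum=8):
    # fixed-point: integer code at scale 2**s, clamped to the signed range
    q = round(x / 2 ** s)
    lo, hi = -(2 ** (bitnum - 1)), 2 ** (bitnum - 1) - 1
    return max(lo, min(hi, q))

def mixed_quantize(groups, precisions):
    """Quantize g groups of data, each group with its own scheme:
    ("discrete", s) or ("continuous", f). All values inside one group
    share the same data type of quantization accuracy."""
    out = []
    for data, (kind, p) in zip(groups, precisions):
        if kind == "discrete":
            out.append([quantize_discrete(v, p) for v in data])
        else:
            out.append([round(v / p) * p for v in data])
    return out
```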
Optionally, extracting the quantization function corresponding to the data type from the computation library specifically includes:
if the data type of the quantization accuracy is the exponent s of discrete quantization accuracy, computing the quantized data from the discrete quantization accuracy and the element values of the operational data;
if the data type of the quantization accuracy is continuous quantization accuracy, computing the quantized data from the continuous quantization accuracy and the element values of the operational data.
Optionally, before dividing the operational data into g groups, the method further includes determining the quantization accuracy of the operational data according to the operational data and the data type, which specifically includes:
determining s or f according to the maximum absolute value of the operational data;
or determining s or f according to the minimum absolute value of the operational data;
or determining s or f according to the quantization accuracy of different data;
or determining s or f according to an empirical constant.
Optionally, determining s or f according to the maximum absolute value of the operational data specifically includes:
determining s by formula 1-1:
s = [log2(a_max) - bitnum + 1]   (formula 1-1)
or determining f by formula 2-1 (formula 2-1 appears only as an image in the source and is not reproduced here);
where c is a constant, bitnum is the number of bits of the quantized data, and a_max is the maximum absolute value of the operational data.
Optionally, determining s or f according to the minimum absolute value of the operational data specifically includes:
determining s by formula 1-2 (formula 1-2 appears only as an image in the source and is not reproduced here);
or determining f by formula 2-2:
f = a_min * d   (formula 2-2)
where d is a constant and a_min is the minimum absolute value of the operational data.
Optionally, the maximum or minimum absolute value of the operational data is determined by:
searching for the maximum or minimum absolute value across all layers together;
or searching for it per layer and per data category;
or searching for it per layer, per data category, and per group.
Optionally, determining s or f according to the quantization accuracy of different data specifically includes:
determining the discrete quantization accuracy s_a^(l) by formula 1-3:
s_a^(l) = Σ_{b≠a} α_b s_b^(l) + β_b   (formula 1-3)
where s_b^(l) is the known exponent of the discrete quantization accuracy of the data b^(l) in the same layer as the data a^(l);
or determining the continuous quantization accuracy f_a^(l) by formula 2-3:
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b   (formula 2-3)
where f_b^(l) is the known continuous quantization accuracy of the data b^(l) in the same layer as the data a^(l), and the superscript l denotes the l-th layer.
Optionally, determining s or f according to an empirical constant specifically includes:
setting s_a^(l) = C, where C is an integer constant;
or setting f_a^(l) = C, where C is a rational constant;
where s_a^(l) is the exponent of the discrete quantization accuracy of the data a^(l) in the l-th layer, and f_a^(l) is the continuous quantization accuracy of the data a^(l) in the l-th layer.
Optionally, the method further includes:
dynamically adjusting s or f.
Optionally, dynamically adjusting s or f specifically includes:
adjusting s or f upward according to the maximum absolute value of the data to be quantized;
or adjusting s or f upward step by step according to the maximum absolute value of the data to be quantized;
or adjusting s or f upward in a single step according to the distribution of the data to be quantized;
or adjusting s or f upward step by step according to the distribution of the data to be quantized;
or adjusting s or f downward according to the maximum absolute value of the data to be quantized.
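One reading of "adjusting s upward according to the maximum absolute value of the data to be quantized" is to recompute the exponent whenever the current range no longer covers a_max; the single-step variant moves s by at most one per trigger. Both variants are illustrative assumptions, reusing formula 1-1 with the bracket read as a ceiling:

```python
import math

def adjust_exponent(s, a_max, bitnum, single_step=False):
    """Move the discrete-precision exponent toward the value implied by the
    current a_max. With single_step=True, s changes by at most 1 per call."""
    target = math.ceil(math.log2(a_max) - bitnum + 1)
    if not single_step:
        return target
    if target > s:
        return s + 1
    if target < s:
        return s - 1
    return s
```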
Optionally, the method further includes:
dynamically adjusting the trigger frequency of adjusting s or f.
Optionally, dynamically adjusting the trigger frequency of s or f specifically includes:
adjusting s or f once every δ training iterations, where δ is fixed;
or adjusting s or f once every δ training epochs, where δ is fixed;
or adjusting s or f once every step iterations or epochs, where step = αδ and α is greater than 1;
or adjusting s or f once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a neural network training method.
Fig. 2 is a schematic flowchart of a method for adjusting the quantization frequency of operational data.
Fig. 3a is a schematic representation of discrete fixed-point data.
Fig. 3b is a schematic representation of continuous fixed-point data.
Fig. 4a is a schematic diagram of a chip.
Fig. 4b is a schematic diagram of another chip.
Fig. 5a is a schematic structural diagram of a combined processing device disclosed herein.
Fig. 5b is another schematic structural diagram of a combined processing device disclosed herein.
Fig. 5c is a schematic structural diagram of a neural network processor board card provided by an embodiment of the disclosure.
Fig. 5d is a schematic structural diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
Fig. 5e is a schematic structural diagram of a neural network chip provided by an embodiment of the disclosure.
Fig. 6 is a schematic diagram of a neural network chip packaging structure provided by an embodiment of the disclosure.
Fig. 6a is a schematic diagram of another neural network chip packaging structure provided by an embodiment of the disclosure.
Detailed Description
To enable those skilled in the art to better understand the solutions of this disclosure, the technical solutions in the embodiments of this disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of this disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this disclosure without creative work fall within the protection scope of this disclosure.
The quantization method for operational data provided in this application can run on a processor. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a graphics processing unit (GPU), and the method may of course also be implemented in an artificial intelligence processor. This application does not limit the specific form of the processor.
Referring to Fig. 1, Fig. 1 is a training schematic diagram of one operation layer of a neural network provided by this application. As shown in Fig. 1, the operation layer may be a fully connected layer or a convolutional layer: a fully connected layer corresponds to a fully connected operation, such as the matrix multiplication shown in Fig. 1, while a convolutional layer corresponds to a convolution operation. Training consists of forward inference (inference for short) and backward training; in Fig. 1 the solid lines show the forward inference process and the dashed lines show the backward training process. In forward inference, the input data of the operation layer and the weights are operated on to produce the output data of the layer, and this output data can serve as the input data of the next layer. In backward training, the output-data gradient of the layer and the weights are operated on to produce the input-data gradient, and the output-data gradient and the input data are operated on to produce the weight gradient. The weight gradient is used to update the weights of this layer, and the input-data gradient serves as the output-data gradient of the next layer.
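The forward and backward data flow of Fig. 1 for a fully connected layer can be sketched as below; the naive matrix helpers and the plain SGD weight update are illustrative assumptions, not the disclosed implementation:

```python
def matmul(a, b):
    # naive matrix product for small lists of lists
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def fc_backward(x, w, grad_out, lr=0.01):
    """One backward-training step: the output-data gradient and the weights
    give the input-data gradient (passed on as the next layer's output-data
    gradient); the input and the output-data gradient give the weight
    gradient, which updates this layer's weights."""
    grad_in = matmul(grad_out, transpose(w))
    grad_w = matmul(transpose(x), grad_out)
    w_new = [[w[i][j] - lr * grad_w[i][j] for j in range(len(w[0]))]
             for i in range(len(w))]
    return grad_in, grad_w, w_new
```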
Referring to Fig. 2, Fig. 2 provides a method for adjusting the quantization frequency of operational data. The operation may include forward inference and/or backward training. The method is executed by a computing chip, which may be a general-purpose processor such as a central processing unit or a graphics processing unit, or a dedicated neural network processor. The method may of course also be executed by a device containing the computing chip. As shown in Fig. 2, the method includes the following steps:
Step S201: the computing chip determines the operational data of the neural network.
Optionally, the operational data in step S201 includes, but is not limited to, one of or any combination of the input neuron A, the output neuron B, the weight W, the input neuron derivative, the output neuron derivative, and the weight derivative.
Step S202: the computing chip acquires a quantization command, where the quantization command includes the data type of the quantization accuracy and the quantization accuracy.
Optionally, the data type of the quantization accuracy specifically includes discrete quantization accuracy or continuous quantization accuracy f. The discrete quantization accuracy can be expressed as 2^s, where s is the exponent of the discrete quantization accuracy. Fig. 3a is a schematic representation of discrete fixed-point data, and Fig. 3b is a schematic representation of continuous fixed-point data.
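Under the fixed-point reading of discrete quantization accuracy 2^s (a one-sign-bit layout assumed from Fig. 3a, which is not reproduced here), a signed bitnum-bit value covers the following range:

```python
def fixed_point_range(s, bitnum):
    """Smallest value, largest value, and resolution of signed `bitnum`-bit
    fixed-point data with discrete quantization accuracy 2**s."""
    step = 2 ** s
    return -(2 ** (bitnum - 1)) * step, (2 ** (bitnum - 1) - 1) * step, step
```

For example, 8-bit data with s = -2 represents values from -32.0 to 31.75 in steps of 0.25.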
Step S203: the computing chip acquires training parameters of the neural network and determines the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
Optionally, the training parameters include a training iteration or a training epoch. A training iteration means that one training sample completes one pass of computation, i.e., one forward operation and one backward training pass. A training epoch means that all samples in the training set are trained once, i.e., one forward operation and one backward training pass over every sample in the training set.
In the technical solution provided by this application, after the operational data of the neural network is determined, the type of the quantization accuracy and the quantization accuracy are determined through the quantization command, the training parameters are then acquired, and the adjustment frequency of the quantization accuracy is determined according to the training parameters, so that the artificial intelligence processor adjusts the quantization accuracy of the neural network data at that frequency. This avoids the mismatch between quantization accuracy and operational data that arises when the quantization accuracy is left unadjusted for a long time, keeps the quantization accuracy better matched to the operational data, and improves calculation accuracy or reduces the amount of calculation.
Optionally, determining the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision specifically includes:
a) Never trigger an adjustment, i.e., keep s and f fixed.
b) Adjust the quantization precision once every δ training iterations, with δ fixed; the adjustment itself can follow the methods for adjusting s and f described above. One iteration means that one training sample completes one forward operation and one backward training pass.
Preferably, δ can be set to different values for different data types; for example, δ can be set to 100 for data types such as input neurons, output neurons, and weights, and to 20 for neuron-derivative data types.
c) Adjust the quantization precision once every δ training epochs, with δ fixed. One epoch means that every sample in the training set has been trained once, i.e., one forward operation and one backward training pass have been performed for all samples of the training set.
d) Adjust the quantization precision once every step training iterations or epochs, where step = αδ and α is greater than 1; that is, the interval between adjustments is stretched from δ to αδ.
e) Adjust the quantization precision once every δ training iterations or epochs, where δ gradually decreases as the number of training passes increases; for example, δ = 100 when the training count reaches 100, δ = 90 when it reaches 180, and δ = 80 when it reaches 260.
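The trigger strategies b), c), and e) above can be sketched as simple predicates. The breakpoints mirror the example values given in the text, but the helper names and the behaviour between breakpoints are illustrative assumptions:

```python
def should_adjust(count: int, delta: int) -> bool:
    """Strategies b)/c): trigger an adjustment of the quantization precision
    once every delta iterations (or epochs), with delta fixed."""
    return count > 0 and count % delta == 0

def decayed_delta(train_count: int) -> int:
    """Strategy e): delta shrinks as training proceeds. Breakpoints follow the
    example in the text (count 100 -> delta 100, 180 -> 90, 260 -> 80); the
    values between breakpoints are an assumption."""
    for threshold, delta in ((260, 80), (180, 90), (100, 100)):
        if train_count >= threshold:
            return delta
    return 100
```

Strategy d) is the same predicate with the fixed interval delta replaced by the stretched interval step = alpha * delta.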
Optionally, the quantization precision can be determined by any one of the following methods:
Method A: determine s and f from the maximum absolute value of the operation data;
Specifically, the computing chip determines the maximum absolute value a_max of the operation data and determines s or f by the following formulas;
s can be determined by formula 1-1:
S_a = [log2(a_max) - bitnum + 1]    formula (1-1)
f can be determined by formula 2-1:
Figure PCTCN2020084943-appb-000034
where c is a constant that may be any rational number; preferably, c is a rational number in [1, 1.2], although c may also lie outside this range, as long as it is rational. bitnum is the number of bits of the quantized data; referring to Fig. 3a and Fig. 3b, Fig. 3a is a schematic representation of discrete fixed-point data and Fig. 3b of continuous fixed-point data; for discrete fixed-point data, bitnum can take the value 8, 16, 24, or 32.
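Formula 1-1 can be sketched as follows. Reading the square brackets as a ceiling is an interpretation, and `quantize_fixed` is an illustrative companion (not from the application) showing that the resulting shift s lets values up to a_max fit in bitnum bits:

```python
import math

def shift_from_absmax(a_max: float, bitnum: int) -> int:
    """Formula 1-1: s = [log2(a_max) - bitnum + 1], with the brackets read as
    a ceiling (an assumption). 2**s is the step size of the fixed-point grid."""
    return math.ceil(math.log2(a_max) - bitnum + 1)

def quantize_fixed(x: float, s: int, bitnum: int) -> int:
    """Round x to the nearest multiple of 2**s and clamp to the signed
    bitnum-bit integer range."""
    q = round(x / (2 ** s))
    lo, hi = -(2 ** (bitnum - 1)), 2 ** (bitnum - 1) - 1
    return max(lo, min(hi, q))
```

For example, a_max = 6.0 with bitnum = 8 gives s = -4, i.e., a step of 1/16, and 6.0 quantizes to the integer code 96, within the 8-bit range.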
Optionally, a_max can be selected in several ways. Specifically, a_max can be searched for by data category; it can also be searched for by layer, by category, or by group.
A1. The computing chip can search for the maximum absolute value by category across all layers. The computing chip treats each element of the data to be operated on as a_i^(l), an element of the layer-l data a^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives. It traverses all layers of the neural network and finds, for each category of data, the maximum absolute value over all layers.
A2. The computing chip can search for the maximum absolute value by layer and by category. The computing chip treats each element of the data to be operated on as a_i^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives, and finds the maximum absolute value of each category of data within each layer. Of course, in practical applications, data spanning more than a single layer l can also be extracted; for example, the maximum absolute value of each category of data over λ layers can be extracted, where λ is an integer greater than or equal to 2.
A3. The computing chip can search for the maximum absolute value by layer, by category, and by group. The computing chip treats each element of the data to be operated on as a_i^(l), where a^(l) may be the layer-l values of the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives. It divides each category of data in each layer into g groups (g may be an empirical value or a user-set value), traverses all layers of the neural network, and finds the maximum absolute value within each of the g groups of each category of each layer. The input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, and the weight derivatives each constitute one category of data.
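The search granularities A1 and A3 above can be sketched as follows; the dict-of-lists layout for per-layer data is a hypothetical representation, not one fixed by the application:

```python
def absmax_by_category(layers):
    """A1: one maximum absolute value per data category, taken over all layers.
    `layers` is a list (one entry per layer) of dicts mapping a category name
    such as "weight" to a flat list of that layer's values."""
    out = {}
    for layer in layers:
        for name, values in layer.items():
            m = max(abs(v) for v in values)
            out[name] = max(m, out.get(name, 0.0))
    return out

def absmax_by_group(values, g):
    """A3, within one layer and one category: split the values into g
    contiguous groups and return one maximum absolute value per group."""
    size = -(-len(values) // g)  # ceiling division
    return [max(abs(v) for v in values[i:i + size])
            for i in range(0, len(values), size)]
```

A2 corresponds to taking max(abs(...)) over a single layer's entry for one category, i.e., the inner step of `absmax_by_category` without merging across layers.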
Method B: determine s and f from the minimum absolute value of the operation data;
Specifically, the computing chip determines the minimum absolute value a_min of the operation data and determines the fixed-point precision s or f from a_min.
The precision s can be determined by formula 1-2:
Figure PCTCN2020084943-appb-000053
The precision f can be determined by formula 2-2:
f_a = a_min * d    formula (2-2)
where d is a constant that may be any rational number.
The above a_min can be searched for by data category; it can also be searched for by layer, by category, or by group. The specific search can follow method A1, A2, or A3, simply replacing a_max with a_min.
Method C: determine s and f from the relationships between different data types:
The fixed-point precisions s of different data types within the same layer are correlated. For example, the fixed-point precision S_a^(l) of the layer-l data a^(l) can be determined from the fixed-point precision S_b^(l) of the layer-l data b^(l) according to formula 1-3.
S_a^(l) = Σ_{b≠a} α_b S_b^(l) + β_b    formula (1-3)
Similarly, the fixed-point precision f_a^(l) of the layer-l data a^(l) can be determined from the fixed-point precision f_b^(l) of the layer-l data b^(l) according to formula 2-3.
f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b    formula (2-3)
where α_b and β_b are constants: for formula 1-3 they are integer constants, and for formula 2-3 they are rational constants.
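Formulas 1-3 and 2-3 are plain linear relations and can be sketched as below. The coefficient values in the example are illustrative, and reading each β_b as summed alongside its α_b term follows the flattened formula text:

```python
def precision_from_related_types(other_precisions, alpha, beta):
    """Formulas 1-3 / 2-3: derive the precision of one data type a in a layer
    from the precisions of the other data types b of the same layer,
    S_a = sum over b != a of (alpha_b * S_b + beta_b)."""
    return sum(alpha[b] * other_precisions[b] + beta[b]
               for b in other_precisions)
```

With integer coefficients this yields the integer precision s of formula 1-3; with rational coefficients it yields the continuous precision f of formula 2-3.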
The data type a^(l) can be one of the input neurons X^(l), the output neurons Y^(l), the weights W^(l), the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives; the data type b^(l) can be another one of the input neurons X^(l), the output neurons Y^(l), the weights W^(l), the input-neuron derivatives, the output-neuron derivatives, or the weight derivatives.
Method D: the computing chip determines s and f from empirical constants:
Specifically, the fixed-point precision S_a^(l) of the layer-l data type a^(l) can be set manually, S_a^(l) = C, where C is an integer constant.
Likewise, the fixed-point precision f_a^(l) of the layer-l data type a^(l) can be set manually, f_a^(l) = C, where C is a rational constant.
This application further provides an artificial intelligence processor, the artificial intelligence processor including:
a processing unit, configured to determine the operation data of a neural network;
an obtaining unit, configured to obtain a quantization command, the quantization command including the data type of the quantization precision and the quantization precision, and to obtain the training parameters of the neural network;
the processing unit being further configured to determine the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision, so that the artificial intelligence processor adjusts the quantization precision at that frequency.
The present disclosure further discloses a neural network computing device, which includes one or more chips as shown in Fig. 4a or Fig. 4b, and may also include one or more of the artificial intelligence processors described above. The device obtains the data to be operated on and control information from other processing devices, performs the specified neural network operations, and passes the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces, and servers. When more than one chip as shown in Fig. 4a or Fig. 4b is included, the chips can be linked together and transfer data through a specific structure, for example interconnected via a PCIe bus, to support larger-scale neural network operations. In that case, the chips may share one control system or have their own control systems, and they may share memory or each accelerator may have its own memory. Furthermore, any interconnection topology may be used.
The neural network computing device has high compatibility and can be connected to various types of servers through a PCIe interface.
The present disclosure further discloses a combined processing device, which includes the above neural network computing device, a universal interconnection interface, and another processing device (i.e., a general-purpose processing device). The neural network computing device interacts with the other processing device to jointly complete the operation specified by the user. Fig. 5a is a schematic diagram of the combined processing device.
The other processing device includes one or more processor types among general-purpose/special-purpose processors such as central processing units (CPU), graphics processing units (GPU), and neural network processors; the number of processors it includes is not limited. The other processing device serves as the interface between the neural network computing device and external data and control, performs data transfers, and carries out basic control of the neural network computing device such as starting and stopping it; the other processing device can also cooperate with the neural network computing device to complete computing tasks.
The universal interconnection interface is configured to transmit data and control instructions between the neural network computing device and the other processing device. The neural network computing device obtains the required input data from the other processing device and writes it to the on-chip storage of the neural network computing device; it can obtain control instructions from the other processing device and write them to an on-chip control cache; it can also read the data in a storage module of the neural network computing device and transmit it to the other processing device.
As shown in Fig. 5b, the structure optionally further includes a storage device for storing data needed by this computing unit/computing device or by other computing units; it is especially suitable for data to be operated on that cannot be stored entirely in the internal storage of the neural network computing device or the other processing device.
The combined processing device can serve as a system-on-chip (SoC) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
Referring to Fig. 5c, Fig. 5c is a schematic structural diagram of a neural network processor board card provided by an embodiment of the present disclosure. As shown in Fig. 5c, the neural network processor board card 10 includes a neural network chip package structure 11, a first electrical and non-electrical connection device 12, and a first substrate 13.
The present disclosure does not limit the specific structure of the neural network chip package structure 11. Optionally, as shown in Fig. 5d, the neural network chip package structure 11 includes a neural network chip 111, a second electrical and non-electrical connection device 112, and a second substrate 113.
The specific form of the neural network chip 111 involved in the present disclosure is not limited. The neural network chip 111 includes, but is not limited to, a neural network die integrating a neural network processor; the die may be made of silicon, germanium, quantum materials, molecular materials, or the like. Depending on the actual situation (for example, a harsh environment) and different application requirements, the neural network die can be packaged so that most of it is enclosed, with the pins on the die connected to the outside of the package structure through conductors such as gold wires for circuit connection with outer layers.
The present disclosure does not limit the specific structure of the neural network chip 111; optionally, refer to the device shown in Fig. 4a or Fig. 4b.
The present disclosure does not limit the types of the first substrate 13 and the second substrate 113; they may be printed circuit boards (PCB), printed wiring boards (PWB), or other circuit boards. The materials used to make the PCB are not limited either.
The second substrate 113 involved in the present disclosure is used to carry the neural network chip 111. The neural network chip package structure 11, obtained by connecting the neural network chip 111 and the second substrate 113 through the second electrical and non-electrical connection device 112, protects the neural network chip 111 and facilitates further packaging of the neural network chip package structure 11 with the first substrate 13.
The specific packaging method of the second electrical and non-electrical connection device 112, and the structure corresponding to that packaging method, are not limited; a suitable packaging method may be selected and simply adapted according to the actual situation and different application requirements, for example flip-chip ball grid array packaging (FCBGA), low-profile quad flat packaging (LQFP), quad flat packaging with heat sink (HQFP), quad flat no-lead packaging (QFN), or fine-pitch ball grid array packaging (FBGA).
Flip-chip packaging is suitable when a small packaged area is required or when the inductance of the leads and the signal transmission time are critical. In addition, wire bonding may be used to reduce cost and increase the flexibility of the package structure.
A ball grid array can provide more pins with a short average lead length per pin, enabling high-speed signal transmission; the package may alternatively use a pin grid array (PGA), zero insertion force (ZIF), single edge contact connection (SECC), land grid array (LGA), or the like.
Optionally, the neural network chip 111 and the second substrate 113 are packaged using flip-chip ball grid array packaging; a schematic diagram of the resulting neural network chip package structure is shown in Fig. 6. As shown in Fig. 6, the neural network chip package structure includes a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, and pins 26.
The pads 22 are connected to the neural network chip 21, and solder balls 23 are formed by soldering between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24 and thereby packaging the neural network chip 21.
The pins 26 are used for connection with external circuits of the package structure (for example, the first substrate 13 of the neural network processor board card 10), enabling transmission of external and internal data and allowing the neural network chip 21, or the neural network processor corresponding to it, to process the data. The present disclosure does not limit the type or number of pins; different pin forms may be selected according to different packaging technologies and arranged according to certain rules.
Optionally, the neural network chip package structure further includes an insulating filler placed in the gaps between the pads 22, the solder balls 23, and the connection points 25 to prevent interference between adjacent solder balls.
The material of the insulating filler may be silicon nitride, silicon oxide, or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, and the like.
Optionally, the neural network chip package structure further includes a heat dissipation device for dissipating the heat generated while the neural network chip 21 is running. The heat dissipation device may be a metal sheet with good thermal conductivity, a heat sink, or a cooler such as a fan.
For example, as shown in Fig. 6a, the neural network chip package structure 11 includes a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal paste 28, and a metal-housing heat sink 29. The thermal paste 28 and the metal-housing heat sink 29 are used to dissipate the heat generated while the neural network chip 21 is running.
Optionally, the neural network chip package structure 11 further includes a reinforcing structure connected to the pads 22 and embedded in the solder balls 23 to enhance the connection strength between the solder balls 23 and the pads 22.
The reinforcing structure may be a metal wire structure or a columnar structure, which is not limited here.
The present disclosure does not limit the specific form of the first electrical and non-electrical connection device 12 either; with reference to the description of the second electrical and non-electrical connection device 112, the neural network chip package structure 11 may be attached by soldering, or the second substrate 113 and the first substrate 13 may be connected by connecting wires or in a pluggable manner, facilitating later replacement of the first substrate 13 or the neural network chip package structure 11.
Optionally, the first substrate 13 includes interfaces for memory units that expand the storage capacity, for example synchronous dynamic random-access memory (SDRAM) or double data rate SDRAM (DDR); expanding the memory improves the processing capability of the neural network processor.
The first substrate 13 may also include a Peripheral Component Interconnect Express (PCI-E or PCIe) interface, a small form-factor pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, or the like for data transmission between the package structure and external circuits, which improves the operation speed and the convenience of operation.
The neural network processor is packaged into the neural network chip 111, the neural network chip 111 is packaged into the neural network chip package structure 11, and the neural network chip package structure 11 is packaged into the neural network processor board card 10. The board card exchanges data with external circuits (for example, a computer motherboard) through its interface (a slot or a ferrule); that is, the function of the neural network processor is realized directly by using the neural network processor board card 10, which also protects the neural network chip 111. Other modules may also be added to the neural network processor board card 10, which increases the range of application and the operating efficiency of the neural network processor.
In one embodiment, the present disclosure discloses an electronic device, which includes the neural network processor board card 10 or the neural network chip package structure 11 described above.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument, and/or an electrocardiograph.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present disclosure in detail. It should be understood that the above are merely specific embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this disclosure shall fall within its scope of protection.

Claims (16)

  1. A method for adjusting the quantization frequency of operation data, characterized in that the method is applied to an artificial intelligence processor and includes the following steps:
    determining the operation data of a neural network;
    obtaining a quantization command, the quantization command including the data type of the quantization precision and the quantization precision;
    obtaining training parameters of the neural network, and determining the adjustment frequency of the quantization precision according to the training parameters and the data type of the quantization precision, so that the artificial intelligence processor adjusts the quantization precision at that frequency.
  2. The method according to claim 1, characterized in that
    the operation data include one of, or any combination of, the input neurons A, the output neurons B, the weights W, the input-neuron derivatives, the output-neuron derivatives, and the weight derivatives;
    the data type of the quantization precision specifically includes discrete quantization precision or continuous quantization precision.
  3. The method according to claim 1 or 2, characterized in that
    the training parameters include the training iteration or the training epoch.
  4. The method according to claim 3, wherein determining the adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy specifically comprises:
    adjusting the quantization accuracy once every δ training iterations, where δ is fixed;
    or adjusting the quantization accuracy once every δ training epochs, where δ is fixed;
    or adjusting the quantization accuracy once every step iterations or epochs, where step = αδ and α is greater than 1;
    or adjusting the quantization accuracy once every δ training iterations or epochs, where δ gradually decreases as the number of training steps increases.
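The four schedules of claim 4 can be expressed as a single predicate. This is a minimal sketch: the helper name, default values, and the particular decay rule are assumptions, since the claim only requires that δ decrease over training.

```python
def should_adjust(iteration, delta=100, alpha=None, decay_every=None):
    """True when quantization accuracy should be adjusted at this iteration.

    - fixed interval: every `delta` iterations (or epochs)
    - step schedule: every step = alpha * delta iterations, with alpha > 1
    - decaying delta: the interval shrinks as training progresses
      (the shrink rule below is one illustrative choice)
    """
    if alpha is not None:                  # step = alpha * delta
        interval = int(alpha * delta)
    elif decay_every is not None:          # delta decreases with training
        interval = max(1, delta // (1 + iteration // decay_every))
    else:                                  # fixed delta
        interval = delta
    return iteration > 0 and iteration % interval == 0
```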
  5. The method according to claim 4, wherein the method for determining the quantization accuracy specifically comprises:
    determining the exponent s of the discrete quantization accuracy, or the continuous quantization accuracy f, according to the maximum absolute value of the operation data;
    or determining s and f according to the minimum absolute value of the operation data;
    or determining s and f according to the quantization-accuracy relationship between different data;
    or determining s and f according to an empirical constant.
  6. The method according to claim 5, wherein determining s or f according to the maximum absolute value of the operation data specifically comprises:
    for discrete quantization accuracy, determining s by Formula 1-1:
    S_a = [log2(a_max) - bitnum + 1]  Formula (1-1)
    for continuous quantization accuracy, determining f by Formula 2-1:
    Figure PCTCN2020084943-appb-100004  Formula (2-1)
    where c is a constant, bitnum is the bit width of the quantized data, and a_max is the maximum absolute value of the operation data.
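Formula 1-1 can be computed directly. In this sketch the square bracket in the formula is read as a ceiling, which is an assumption (the original does not spell out the rounding mode), and the function name is illustrative.

```python
import math

def discrete_exponent(a_max, bitnum):
    """Formula (1-1): S_a = [log2(a_max) - bitnum + 1].

    The square bracket is read here as a ceiling (an assumption);
    bitnum is the bit width of the quantized data.
    """
    return math.ceil(math.log2(a_max) - bitnum + 1)

# a_max = 6.0, bitnum = 8: log2(6) is about 2.585, so S_a = ceil(-4.415) = -4,
# i.e. quantized values are integers scaled by 2**(-4) = 1/16.
```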
  7. The method according to claim 5, wherein determining s and f according to the minimum absolute value of the operation data specifically comprises:
    for discrete quantization accuracy, determining the accuracy s by Formula 1-2:
    Figure PCTCN2020084943-appb-100005  Formula (1-2)
    for continuous quantization accuracy, determining the accuracy f by Formula 2-2:
    f_a = a_min * d  Formula (2-2)
    where d is a constant and a_min is the minimum absolute value of the operation data.
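Formula 2-2 is a direct scaling of the minimum absolute value. The claim only states that d is a constant; the default value used in this sketch is an illustrative assumption.

```python
def continuous_accuracy_from_min(a_min, d=1.0):
    """Formula (2-2): f_a = a_min * d, with d a constant.

    The default d is an illustrative assumption; the claim leaves its
    value unspecified.
    """
    return a_min * d

# With a_min = 0.004 and d = 0.5, the continuous quantization accuracy
# (i.e. the quantization step) is 0.002.
```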
  8. The method according to claim 6 or 7, wherein the manner of determining the maximum or minimum absolute value of the operation data specifically comprises:
    finding the maximum or minimum absolute value across all layers and all categories together;
    or finding the maximum or minimum absolute value layer by layer and category by category;
    or finding the maximum or minimum absolute value layer by layer, category by category, and group by group.
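The three search granularities of claim 8 can be sketched over a nested layout (layer, then category, then group, then values). The data layout and function name are assumptions for illustration.

```python
def abs_extremes(net):
    """net: dict layer -> dict category -> dict group -> list of floats.

    Returns (overall, per_layer_category, per_layer_category_group) maps of
    (max |v|, min |v|), matching the three granularities of claim 8.
    """
    flat = [(layer, cat, grp, abs(v))
            for layer, cats in net.items()
            for cat, groups in cats.items()
            for grp, vals in groups.items()
            for v in vals]
    all_abs = [a for *_, a in flat]
    overall = (max(all_abs), min(all_abs))       # across all layers together

    per_lc, per_lcg = {}, {}
    for layer, cat, grp, a in flat:
        for key, table in (((layer, cat), per_lc),
                           ((layer, cat, grp), per_lcg)):
            hi, lo = table.get(key, (a, a))
            table[key] = (max(hi, a), min(lo, a))
    return overall, per_lc, per_lcg

net = {"conv1": {"weight": {"g0": [0.5, -2.0]}, "neuron": {"g0": [1.0]}}}
overall, per_lc, per_lcg = abs_extremes(net)
```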
  9. The method according to claim 5, wherein determining s and f according to the quantization accuracy of different data types specifically comprises:
    for discrete quantization accuracy, determining the discrete fixed-point accuracy S_a^(l) by Formula 1-3:
    S_a^(l) = Σ_{b≠a} α_b S_b^(l) + β_b  Formula (1-3)
    where S_b^(l) is the discrete fixed-point accuracy of data b^(l) in the same layer as data a^(l), and S_b^(l) is known;
    for continuous quantization accuracy, determining the continuous quantization accuracy f_a^(l) by Formula 2-3:
    f_a^(l) = Σ_{b≠a} α_b f_b^(l) + β_b  Formula (2-3)
    where f_b^(l) is the continuous quantization accuracy of data b^(l) in the same layer as data a^(l), and f_b^(l) is known.
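Formulas 1-3 and 2-3 infer the precision of one datum from the known precisions of same-layer data through per-datum affine terms. In this sketch the β_b term is placed inside the sum, following the formula as printed, and the coefficient values in the example are illustrative assumptions.

```python
def precision_from_same_layer(known, alpha, beta):
    """Formula (1-3)/(2-3): S_a = sum over b != a of (alpha_b * S_b + beta_b).

    `known` maps each same-layer datum b to its known precision S_b (or f_b);
    `alpha` and `beta` hold the per-datum coefficients. Coefficient values
    are application-specific; those used below are illustrative only.
    """
    return sum(alpha[b] * known[b] + beta[b] for b in known)

# Infer the precision exponent of output neurons from weights and inputs:
s_out = precision_from_same_layer(
    known={"weight": -6, "input": -4},
    alpha={"weight": 0.5, "input": 0.5},
    beta={"weight": 0, "input": 0},
)
# s_out = 0.5 * (-6) + 0.5 * (-4) = -5.0
```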
  10. The method according to claim 5, wherein determining s and f according to an empirical constant specifically comprises:
    setting s_a^(l) = C, where C is an integer constant;
    or setting f_a^(l) = C, where C is a rational constant;
    where s_a^(l) is the exponent of the discrete quantization accuracy of data a^(l), and f_a^(l) is the continuous quantization accuracy of data a^(l).
  11. An artificial intelligence processor, wherein the artificial intelligence processor comprises:
    a processing unit, configured to determine operation data of a neural network;
    an acquiring unit, configured to acquire a quantization command, the quantization command comprising: a data type of quantization accuracy and the quantization accuracy, and to acquire training parameters of the neural network;
    the processing unit being further configured to determine an adjustment frequency of the quantization accuracy according to the training parameters and the data type of the quantization accuracy, so that the artificial intelligence processor adjusts the quantization accuracy according to the adjustment frequency.
  12. A neural network operation device, wherein the neural network operation device comprises one or more artificial intelligence processors according to claim 11.
  13. A combined processing device, wherein the combined processing device comprises: the neural network operation device according to claim 12, a universal interconnection interface, and a general-purpose processing device;
    the neural network operation device being connected to the general-purpose processing device through the universal interconnection interface.
  14. An electronic device, wherein the electronic device comprises the artificial intelligence processor according to claim 11 or the neural network operation device according to claim 12.
  15. A computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-10.
  16. A computer program product, wherein the computer program product comprises a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method according to any one of claims 1-10.
PCT/CN2020/084943 2019-04-16 2020-04-15 Adjusting method for quantization frequency of operational data and related product WO2020211783A1 (en)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
CN201910307672.X 2019-04-16
CN201910306478.XA CN111832710A (en) 2019-04-16 2019-04-16 Method for adjusting quantization frequency of operation data and related product
CN201910306477.5A CN111832709A (en) 2019-04-16 2019-04-16 Mixed quantization method of operation data and related product
CN201910306479.4A CN111832711A (en) 2019-04-16 2019-04-16 Method for quantizing operation data and related product
CN201910306480.7 2019-04-16
CN201910307675.3A CN111832696A (en) 2019-04-16 2019-04-16 Neural network operation method and related product
CN201910306479.4 2019-04-16
CN201910306478.X 2019-04-16
CN201910306477.5 2019-04-16
CN201910307672.XA CN111832695A (en) 2019-04-16 2019-04-16 Method for adjusting quantization precision of operation data and related product
CN201910306480.7A CN111832712A (en) 2019-04-16 2019-04-16 Method for quantizing operation data and related product
CN201910307675.3 2019-04-16

Publications (1)

Publication Number Publication Date
WO2020211783A1 true WO2020211783A1 (en) 2020-10-22

Family

ID=72837027


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103995593A (en) * 2014-05-22 2014-08-20 无锡爱维特信息技术有限公司 Dynamic position data uploading method based on acceleration action sensor
CN104683804A (en) * 2015-02-14 2015-06-03 北京航空航天大学 Parameter-adaptive multidimensional bit rate control method based on video content characteristics
US20180285736A1 (en) * 2017-04-04 2018-10-04 Hailo Technologies Ltd. Data Driven Quantization Optimization Of Weights And Input Data In An Artificial Neural Network
US20180341857A1 (en) * 2017-05-25 2018-11-29 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN109190754A (en) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 Quantitative model generation method, device and electronic equipment

Similar Documents

Publication Publication Date Title
US20200104693A1 (en) Processing method and accelerating device
TWI771539B (en) Apparatus and method for neural network operation
TWI793225B (en) Method for neural network training and related product
US11748601B2 (en) Integrated circuit chip device
TWI791725B (en) Neural network operation method, integrated circuit chip device and related products
TWI768159B (en) Integrated circuit chip apparatus and related product
EP3770824A1 (en) Computation method and related products of recurrent neural network
TWI767098B (en) Method for neural network forward computation and related product
TWI793224B (en) Integrated circuit chip apparatus and related product
WO2020211783A1 (en) Adjusting method for quantization frequency of operational data and related product
CN110490315B (en) Reverse operation sparse method of neural network and related products
CN111832709A (en) Mixed quantization method of operation data and related product
CN110490314B (en) Neural network sparseness method and related products
WO2019165946A1 (en) Integrated circuit chip device, board card and related product
CN111832710A (en) Method for adjusting quantization frequency of operation data and related product
CN111832712A (en) Method for quantizing operation data and related product
CN111832696A (en) Neural network operation method and related product
CN111832711A (en) Method for quantizing operation data and related product
CN111832695A (en) Method for adjusting quantization precision of operation data and related product
TWI768160B (en) Integrated circuit chip apparatus and related product
TWI767097B (en) Integrated circuit chip apparatus and related product
CN110472735A (en) The Sparse methods and Related product of neural network
TW201937415A (en) Integrated circuit chip device and related product has the advantages of small amount of calculation and low power consumption

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 210122)

122 Ep: pct application non-entry in european phase

Ref document number: 20790521

Country of ref document: EP

Kind code of ref document: A1